Gate-to-Gate Energy Consumption in Chemical Batch Plants


Gate-to-gate energy consumption in chemical batch plants: Statistical models based on reaction synthesis type. Cecilia Pereira, Ines Hauner, Konrad Hungerbuehler, and Stavros Papadokonstantakis. ACS Sustainable Chem. Eng., Just Accepted Manuscript. DOI: 10.1021/acssuschemeng.7b03769. Publication Date (Web): 04 Apr 2018. Downloaded from http://pubs.acs.org on April 4, 2018.


ACS Sustainable Chemistry & Engineering

Gate-to-gate energy consumption in chemical batch plants: Statistical models based on reaction synthesis type


Cecilia Pereira1, Ines Hauner1, Konrad Hungerbühler1, Stavros Papadokonstantakis2,*

1 ETH Zurich, Institute for Chemical and Bioengineering, Vladimir-Prelog-Weg 1, 8093 Zurich, Switzerland

2 Chalmers University of Technology, Division of Energy Technology, Hörsalsvägen 7B, 41296 Gothenburg, Sweden

* Corresponding author: [email protected] / +46 708129280


ABSTRACT

Energy consumption in the chemical industry is an important operating cost and environmental impact factor, and reducing it is explicitly listed among the key principles of green chemistry. Energy consumption has therefore been included as a key metric in diverse process design and evaluation tools. However, measurements of energy consumption at the process equipment level are scarce, especially in fine chemical production, which is typically performed in multiproduct and multipurpose batch plants. In this work we present a short-cut approach based on statistical models, namely probability density functions (PDF) and classification trees, for estimating steam consumption, which typically represents the largest energy utility consumption in batch plants. The outputs of these models take the form of intervals derived from the PDF interquartile ranges and of classes derived from the classification trees, respectively. The validation results (i.e., goodness of fit, cross validation and case studies) show that the models provide satisfactory interval estimates of steam consumption for benchmarking chemical reaction types and performing uncertainty analysis. The models are intended primarily for screening at early design stages, where the reaction type is the minimum input information required; the classification trees additionally allow an analysis of the predictor variables (i.e., reaction type and operating parameters) that most influence steam consumption.

This study also demonstrates the application of the PDF models to a previously published case study on the production of the intermediate 4-(2-methoxyethyl)phenol, which can be produced via seven different synthesis routes. The ranking of the synthesis routes according to the PDF models shows trends similar to those of an Energy Loss Index proxy indicator, which, however, requires more detailed chemical and process information.

Keywords

Green chemistry; batch processes; energy consumption benchmarking; life cycle energy inventories; early design phase metrics; classification trees.

INTRODUCTION

Energy-related impacts often account for over 50% of the total environmental impact of the chemical industry 1. Minimising energy use in chemical production, which is recognized as one of the twelve principles of green chemistry, is therefore a key target for the chemical sector and for environmental regulations 2. As a result, evaluation methods that include energy use as a design metric have been developed in sustainable process design 3-7, life cycle assessment 8-14, efficient heat transfer 15-17, and pinch analysis 18-20, both in academia and in industry (e.g., BASF 21 and GlaxoSmithKline 22). These methods can be used to compare different design alternatives as part of scenario-based analysis and multi-objective optimization. Screening alternatives at early phases of process design can be more beneficial than at later stages, because more degrees of freedom remain, providing flexibility for modifications and improvements at significantly lower cost. However, the aforementioned evaluation methods require process energy consumption data, which can be scarce or aggregated, because energy flow measurements at the process equipment level are more complicated and costly than material flow measurements (i.e., they require mass flow, temperature and pressure measurements of the energy utilities).

Particularly in multiproduct and multipurpose batch plants, energy measurements are highly aggregated (i.e., typically only available at the “production building” level), partly because of the complex process dynamics of batch operation and partly because material flows have traditionally been more important for batch plant economics. Consequently, process-specific energy data often have to be modelled or estimated by process experts using rules of thumb.

Rigorous energy consumption models for batch processes have been addressed in previous works 23-24, which proposed bottom-up approaches for modelling energy utility consumption in multipurpose batch plants based on extensive process documentation and sensor data. These models are suitable when high resolution is needed for the dynamic optimisation of energy efficiency. Shortcut models of steam consumption in multipurpose batch plants based on standard process documentation, rules of thumb, expert opinion, classical thermodynamics and model parameterisation have also been proposed 25. These documentation-based models still require detailed process information as input (e.g., standard operating procedures), which may also be partially confidential. Therefore, another modelling approach is needed when standard operating procedures are not available, notably in early phases of process design, or when very fast estimates are required for screening purposes, for instance to streamline environmental assessment studies. In this context, this study develops models of steam consumption based on statistical analysis of production data provided by a consortium of industrial partners representing leading companies in fine chemical and pharmaceutical production.

Statistical analysis has previously been applied to diverse kinds of data and applications relevant to process design and environmental assessment. Examples include relationships between parameterization factors of life cycle inventories and design parameters 26, guidelines for building process and material alternatives within a life cycle inventory 27, identification of the most influential data elements in the life cycle inventory stage 28, multivariate data analysis to characterize the environmental impacts of power production technologies and detect cause-effect relationships 29, hierarchical cluster analysis and principal component analysis for benchmarking the relative sustainability of pharmaceutical production processes 30, statistically rigorous reconciliation of life cycle inventory data to satisfy thermodynamic principles 31, and uncertainty analysis in LCA inventories 32.

The models of steam consumption proposed in this work are based on probability density functions (PDF) and classification trees (Witten and Frank, 2005; Breiman et al., 1984). In both cases the model output can take the form of intervals rather than point estimates. The hypothesis behind these generic models is that energy utility consumption depends more on the production processes than on the specific reactants, auxiliaries and products. Part of the estimation error of these generic models is therefore expected to stem, among other factors, from this simplification.

The PDF models describe the variability of the gate-to-gate steam consumption for the production of one kilogram of product for a particular reaction type. The type of distribution and the values of its parameters were selected so as to maximize the probability of generating the sample data, by means of the maximum likelihood method 33. The goodness of fit was evaluated using standard statistical tests (Youden, 1950) and the Akaike information criterion 34. The PDF models can provide interval-based estimates, for instance by determining the respective interquartile ranges. The classification trees directly provide intervals (i.e., in the form of output classes) of the gate-to-gate steam consumption for the production of one kilogram of product, given a particular set of reaction and process attributes that depends on the information available at the respective stage of process design.
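The fitting and selection step described above can be illustrated with a minimal, standard-library-only sketch. It assumes a hypothetical sample of steam consumption values for one reaction type and a hypothetical candidate set (normal, lognormal, exponential); the study's actual candidate distributions, software and goodness-of-fit tests are not reproduced. Each candidate is fitted by maximum likelihood, ranked by AIC = 2k - 2 ln L, and the interquartile range of the winner is reported as the interval estimate.

```python
import math
from statistics import NormalDist, fmean

# Hypothetical steam consumption sample (kg steam at 6 bar per kg product)
# for a single reaction type; these numbers are illustrative, NOT study data.
SAMPLE = [1.2, 2.5, 3.1, 4.0, 4.8, 5.5, 6.9, 8.4, 11.0, 15.2]


def fit_normal(x):
    """MLE normal fit; returns (parameters, log-likelihood, quantile function)."""
    mu = fmean(x)
    sigma = math.sqrt(fmean([(v - mu) ** 2 for v in x]))  # MLE divides by n
    dist = NormalDist(mu, sigma)
    loglik = sum(math.log(dist.pdf(v)) for v in x)
    return (mu, sigma), loglik, dist.inv_cdf


def fit_lognormal(x):
    """MLE lognormal fit: normal MLE on ln(x) plus the Jacobian term -sum(ln x)."""
    logs = [math.log(v) for v in x]
    (mu, sigma), loglik_logs, log_quantile = fit_normal(logs)
    loglik = loglik_logs - sum(logs)
    return (mu, sigma), loglik, lambda p: math.exp(log_quantile(p))


def fit_exponential(x):
    """MLE exponential fit: the rate is the reciprocal of the sample mean."""
    rate = 1.0 / fmean(x)
    loglik = len(x) * math.log(rate) - rate * sum(x)
    return (rate,), loglik, lambda p: -math.log(1.0 - p) / rate


def aic(n_params, loglik):
    """Akaike information criterion, AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * loglik


# Hypothetical candidate set; the study's own list of distributions may differ.
CANDIDATES = {"normal": fit_normal, "lognormal": fit_lognormal,
              "exponential": fit_exponential}


def select_distribution(x):
    """Fit every candidate by maximum likelihood and keep the lowest-AIC one."""
    results = {}
    for name, fitter in CANDIDATES.items():
        params, loglik, quantile = fitter(x)
        results[name] = (aic(len(params), loglik), params, quantile)
    best = min(results, key=lambda name: results[name][0])
    return best, results


best, results = select_distribution(SAMPLE)
best_aic, best_params, quantile = results[best]
q1, q3 = quantile(0.25), quantile(0.75)
print(f"{best}: AIC = {best_aic:.2f}, IQR = [{q1:.2f}, {q3:.2f}] kg/kg")
```

AIC only ranks the candidates relative to each other; an absolute check of fit quality, such as the standard statistical tests mentioned above, would still be needed before trusting the selected interval.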

The PDF models can be used for benchmarking production processes on the basis of the reaction type, and they facilitate rigorous uncertainty analysis and fast predictions for screening purposes. The classification trees can be used for a finer categorisation of the steam consumption when additional process information is available besides the reaction type. Consequently, both types of model should be valuable in early phases of process design, for streamlining LCA, and as benchmarking tools for labelling chemical reactions.

METHODS

Definition of system boundaries

As depicted in Figure 1, a chemical synthesis route can in general include one to n reaction steps, each followed by zero to m work-up (recovery) processes. The processes for which steam consumption models are developed in this work are defined by the grey boxes in Figure 1, corresponding to a single reaction step plus the work-up processes that immediately follow it, if any. In the rest of the document this system is referred to as “reaction”. The empirical yield ranges of the reaction systems in this work are given in Table S4 in the Supporting Information.

Special purification steps (e.g., because of unusually strict purity requirements for a chemical used in the next synthesis step) and drying of the product (e.g., because of the special form in which the chemical must be stored, packaged or sold) are not included in the system, because they may depend on the position of the reaction step within the synthesis path. Thus, the steam consumption models of this work are independent of whether the corresponding reaction is performed as the last step of a synthesis route, in which case additional work-up processes may be required to deliver the traded form of the chemical and meet customer-specific purities. The available dataset did not allow the definition of sufficiently large sets for a separate statistical analysis of reaction types performed as last synthesis steps. Therefore, if a reaction is performed as the last synthesis step, the special purification and other processes required for the chemical to take its final marketed form should be calculated separately. Moreover, the production of raw materials and auxiliaries, solvent recovery and waste treatment lie outside the reaction system boundaries. The respective models and tools for the scope of a comprehensive cradle-to-gate LCA are generally available elsewhere (e.g., Ecoinvent 35, Finechem 8, Ecosolvent 36) and are complementary to the models presented in this work. Thus, for LCA purposes, a chemical synthesis path can be modelled as a sequence of distinct reaction steps, where the individual reaction models proposed here fill the gate-to-gate steam consumption data gaps and typical LCA tools and databases provide the cradle-to-gate life cycle inventories.
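Chaining per-reaction PDF models along such a sequence of reaction steps lends itself to Monte Carlo uncertainty analysis. The following is a minimal sketch under stated assumptions: lognormal step models with hypothetical parameters, statistically independent steps, and hypothetical mass ratios `kg_per_kg_final` (kg of each step's product needed per kg of final product). None of these values come from the study, and processes outside the reaction boundary (special purification, drying, raw materials, solvent recovery, waste treatment) are deliberately excluded.

```python
import random
import statistics

# Hypothetical three-step route. mu and sigma are the parameters of
# ln(kg steam per kg of the step's own product); kg_per_kg_final is an
# assumed mass ratio. None of these values are taken from Table 4.
STEPS = [
    {"name": "step 1", "mu": 1.0, "sigma": 0.6, "kg_per_kg_final": 1.8},
    {"name": "step 2", "mu": 1.5, "sigma": 0.5, "kg_per_kg_final": 1.3},
    {"name": "step 3", "mu": 0.8, "sigma": 0.7, "kg_per_kg_final": 1.0},
]


def sample_route_steam(steps, n_draws=10_000, seed=42):
    """Monte Carlo draws of total gate-to-gate steam per kg of final product.

    Each step model gives steam per kg of that step's product, so every draw is
    scaled by the kg of that intermediate needed per kg of final product.
    Steps are sampled independently (an assumption, not a result of the study).
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_draws):
        total = sum(
            rng.lognormvariate(s["mu"], s["sigma"]) * s["kg_per_kg_final"]
            for s in steps
        )
        totals.append(total)
    return totals


totals = sample_route_steam(STEPS)
q1, median, q3 = statistics.quantiles(totals, n=4)
print(f"route steam: median {median:.1f}, IQR [{q1:.1f}, {q3:.1f}] kg/kg final product")
```

If the steps of a real route share equipment or operating conditions, their steam demands may be correlated, and independent sampling would then understate the spread of the total.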

Data sources, predictor variables and output classes

The data for model development were provided by nine industrial partners in Switzerland, Germany, France and the United States, covering sectors from basic chemicals to pesticides and pharmaceutical products. In most cases the primary data were in the form of standard operating procedures describing in detail how the unit operations are performed in the production plants. These data were used to model the steam consumption (kilograms of steam at 6 bar per kilogram of product) for the reaction system boundary described above, according to the method presented elsewhere 25. Each of these modelled values represents one data point in the training dataset (250 points) for the development of the PDF and classification tree models. Additional data from the industrial partners that were not used for model development served for testing and comparing the performance of the classification trees and the PDF models. This test set consists of 17 modelled steam consumption values for chemical production in multiproduct/multipurpose batch plants similar to those in the training set, not all of which, however, were obtained with the same modelling approach 25.



The collected data cover a wide range of reactions frequently performed in the chemical industry. Nevertheless, this is not a comprehensive study of all reactions in chemical production sites, but rather a methodology and set of models for benchmarking the most common reaction types with respect to their energy (i.e., steam) consumption, while motivating practitioners in academia and industry to extend it to other reactions of interest. The selected reaction classes and the reaction classification procedure are shown and explained in the Supporting Information (Figure S1). The reaction type constitutes the main predictor variable for the PDF models and one of the predictor variables for the classification trees. Classification trees may include several other predictor variables of nominal, binary and continuous type, depending on the stage of process design (Table 1). At the earliest stage of process design (S1), only information regarding the reaction type is available. At the subsequent stages, the type of work-up processes (S2), operating parameters such as operation time and temperatures (S3), and mass flows (S4) become known in turn. All these variables can be determined in conceptual design stages based on laboratory experiments and simple scale-up calculations (e.g., which kinds of separation are most likely applicable, and what the maximum expected temperatures and the expected mass efficiencies are). The last design stage (S5) assumes that the steam consumption of the energy-intensive distillation processes is available. The classification trees developed for S5 can therefore be used to benchmark the performance of the S1-S4 models, which lack the important predictor of distillation steam consumption.


To test the effect of the number of predictor variables on the performance of the classification trees 37, two different training datasets (dataset-1 and dataset-2) were used: dataset-1 (Table 1) comprises only a subset of the predictors of the more inclusive dataset-2 (Table S2), selected based on empirical knowledge about their influence on steam consumption.

For all classification models, the target variable (i.e., steam consumption) is discretized into a certain number of intervals (classes); thus all classification trees output a range of steam consumption rather than a point estimate. The number and width of the steam consumption intervals were specified from a histogram of the steam consumption data, as a trade-off between classes of equal size and a sufficient sample size in every interval. To test the influence of the number and size of the classes, two scenarios were tested, with three (Table 2) and five classes (Table S3), respectively.
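The discretization of the target variable can be sketched as follows. The cut points, the class labels and the minimum class size `min_per_class` are hypothetical placeholders, not the limits of Table 2; the sketch only shows how continuous steam consumption values map onto ordered classes and how sparsely populated classes could be flagged when tuning the number and width of the intervals.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical class limits in kg steam per kg product (NOT the Table 2 values):
# low < 5.0 <= middle < 15.0 <= high
CUT_POINTS = [5.0, 15.0]
LABELS = ["low", "middle", "high"]


def to_class(steam, cut_points=CUT_POINTS, labels=LABELS):
    """Map a steam consumption value onto its output class (interval)."""
    return labels[bisect_right(cut_points, steam)]


def class_counts(values, min_per_class=10):
    """Count samples per class and list classes with too few samples to train on."""
    counts = Counter(to_class(v) for v in values)
    sparse = [label for label in LABELS if counts[label] < min_per_class]
    return counts, sparse


values = [0.8, 3.2, 4.9, 5.0, 7.5, 12.0, 16.3, 40.0]
counts, sparse = class_counts(values, min_per_class=3)
print(dict(counts), sparse)
```

Using `bisect_right` makes each interval closed on the left, so a value exactly at a cut point falls into the higher class.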

Model development and evaluation metrics

Classification trees represent the rules underlying data as hierarchical, sequential structures. These rules partition the data at every tree node based on the value of a particular predictor variable. At every node, the split is chosen to optimize the classification at the respective tree depth. The tree is typically grown to its full size, achieving maximum classification performance on the training data (e.g., using the CART algorithm 38), and then pruned back according to a cross validation procedure to safeguard against overfitting. In this study, cross validation was performed by randomly dividing the data into ten equal partitions, each approximately preserving the overall class proportions (i.e., stratified ten-fold cross validation), and keeping in turn one tenth of the data for testing and the remaining nine tenths for training. Thus, the training procedure is performed ten times on different datasets, and each time the cross validation defines the degree to which the tree must be pruned (e.g., based on the accuracy, see Eq. 1). From the average performance over the ten folds, the pruning degree of the tree and the modelling accuracy and generalisation metrics are inferred. This pruning degree is then imposed on a tree trained with all available data to obtain the final classification tree. This is a standard procedure when the data for training and testing are limited 37.
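The stratified ten-fold partitioning step can be sketched with the standard library alone. This is a minimal illustration, not the authors' implementation: indices of each class are shuffled and dealt round-robin over the folds, so every fold receives roughly a tenth of every class, and each fold then serves once as the test set. Growing and pruning the tree itself is not shown; in practice a CART implementation (for example scikit-learn's `DecisionTreeClassifier`, whose `ccp_alpha` parameter controls cost-complexity pruning) would be trained on each `train` split.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Split sample indices into k folds with approximately equal class proportions.

    Indices of each class are shuffled and dealt round-robin over the folds,
    continuing where the previous class stopped so fold sizes stay balanced.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[(offset + j) % k].append(idx)
        offset += len(idxs)
    return folds


def train_test_splits(labels, k=10, seed=0):
    """Yield (train, test) index lists; every fold is the test set exactly once."""
    folds = stratified_k_fold(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical labels with the low class about twice as frequent as the others
labels = ["low"] * 20 + ["middle"] * 10 + ["high"] * 10
for train, test in train_test_splits(labels):
    print(len(train), len(test), sorted(labels[i] for i in test))
```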

Three metrics, namely sensitivity, specificity and accuracy, are calculated to assess the classification performance for each output class and the overall performance of the classification tree. These metrics can be expressed as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{2}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{3}$$

where TP (true positives) is the number of instances that belong to a class and are predicted within that class, FN (false negatives) is the number of instances that belong to a class but are not predicted within it, FP (false positives) is the number of instances predicted to be in a class to which they do not belong, and TN (true negatives) is the number of instances that neither belong to nor are predicted into that class. While high sensitivity for a certain output class implies a very general model for this class, namely many data points fitting into few rules, low sensitivity denotes a very specific model with many rules, each containing a small portion of the data. This trade-off is best depicted in plots of sensitivity against (1-specificity), also called receiver operating characteristic (ROC) plots 39.
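Eqs. 1-3 can be evaluated per class from a multi-class confusion matrix by treating each class in turn against all others. The sketch below assumes the convention `cm[i][j]` = number of samples with true class i predicted as class j, and uses a hypothetical matrix. It also returns the class's ROC point together with two distances of the kind reported in Table S5; interpreting them as the Euclidean distance to the (0,1) corner and the signed perpendicular distance to the random line y = x is an assumption of this sketch.

```python
import math


def one_vs_rest_counts(cm, c):
    """Return (TP, FN, FP, TN) for class c, treating all other classes as 'rest'."""
    n = sum(sum(row) for row in cm)
    tp = cm[c][c]
    fn = sum(cm[c]) - tp                  # class c predicted as another class
    fp = sum(row[c] for row in cm) - tp   # other classes predicted as c
    tn = n - tp - fn - fp
    return tp, fn, fp, tn


def class_metrics(cm, c):
    """Eqs. 1-3 for class c, plus its ROC point distances."""
    tp, fn, fp, tn = one_vs_rest_counts(cm, c)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    x, y = 1 - specificity, sensitivity   # ROC coordinates
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "dist_to_ideal": math.hypot(x, 1 - y),     # Euclidean distance to (0, 1)
        "dist_to_random": (y - x) / math.sqrt(2),  # > 0 means above y = x
    }


def overall_accuracy(cm):
    """Share of all samples on the diagonal, i.e. correctly classified."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)


cm = [[8, 1, 1],   # hypothetical counts, true low
      [2, 5, 1],   # true middle
      [0, 1, 6]]   # true high
for c, name in enumerate(["low", "middle", "high"]):
    print(name, {k: round(v, 3) for k, v in class_metrics(cm, c).items()})
print("overall accuracy:", overall_accuracy(cm))
```

Note that the per-class accuracy of Eq. 1 counts true negatives and therefore differs from the overall share of correctly classified samples.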

Besides the performance of the tree for every output class and overall, the extraction of important logical rules is a key step towards model interpretability and transparency. The importance of these rules can be considered in relation to the importance of the predictor variables, in the sense that an important rule should also contain important predictors. Predictor importance can be quantified from the risk reduction from parent to child nodes caused by splitting on each predictor variable (see also the formulas in the Supporting Information, Figure S3).
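The risk-reduction idea can be illustrated with a minimal sketch. The exact formulas used in the study are given in the Supporting Information; here, as an assumption, node risk is taken to be the Gini impurity weighted by the node's share of all training samples, as is common in CART. Each split is described by a hypothetical tuple of predictor name and per-class counts of the parent and its two children, and the risk drops are summed per predictor.

```python
from collections import defaultdict


def gini(counts):
    """Gini impurity of a node given its per-class sample counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def node_risk(counts, n_total):
    """Node risk, assumed here to be Gini impurity weighted by the node's sample share."""
    return sum(counts) / n_total * gini(counts)


def predictor_importance(splits, n_total):
    """Sum, per predictor, the risk reduction from parent to child nodes over all splits."""
    importance = defaultdict(float)
    for predictor, parent, left, right in splits:
        drop = (node_risk(parent, n_total)
                - node_risk(left, n_total) - node_risk(right, n_total))
        importance[predictor] += drop
    return dict(importance)


# Hypothetical two-class tree with 20 training samples and two splits
splits = [
    ("reaction type", [10, 10], [9, 1], [1, 9]),
    ("operation time", [9, 1], [9, 0], [0, 1]),
]
print(predictor_importance(splits, n_total=20))
```

Predictors that are used near the root and separate the classes cleanly accumulate the largest risk reductions, which is why an important rule is expected to contain important predictors.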

The selection of important predictors can also be used for further parameterisation of the PDF models, the second type of model developed in this work. These models are probability density functions fitted to different datasets, each defined by one reaction type. Parameterising these models means that, for a specific reaction type, the dataset is divided into subsets according to process parameters such as temperature or operation time. An example of this parameterisation procedure is the partition of the condensation reaction dataset into two subsets: one for condensations in which distillation processes take place and one for condensations in which they do not. In general, further parameterisation is needed when the intervals of the initial PDF models are very broad or when the goodness-of-fit assessment indicates a poor fit to the data.

RESULTS AND DISCUSSION

Classification trees

To analyze the performance of the classification trees, we start by considering dataset-1 as the training set with three output classes. The choice of priors (i.e., estimates of the probability that randomly sampling an instance from a population will yield the given class) did not influence the model performance at any of the design stages considered in this work. The cross validation results for the five classification trees (S1 to S5) are depicted as an ROC plot in Figure S4a in the Supporting Information. As expected, performance improves from S1 to S5 for both training and test sets, since more process information becomes available to the models. More importantly, overfitting is avoided, since the model performance on the training and test sets is similar, differing by less than 11% in sensitivity and 6% in specificity. The classification trees with five output classes (Figure S4b in the Supporting Information) follow similar trends but generally have lower sensitivity values. Moreover, the additional resolution of five output classes is not well supported by the per-class analysis, as discussed later in this section. Overall, the preliminary analysis indicates that the case of three output classes and eight predictor variables (dataset-1) offers the best performance and interpretability, owing to its lower model complexity (see also the Supporting Information and Figure S2 for further discussion of the model selection).

Figure 2 shows the performance of the selected model per output class. The low and high classes tend to form clusters, while the middle class is rather scattered. The low class presents higher sensitivity and slightly lower specificity than the high and middle classes. This is because the low class contains more data points than each of the other classes, in a ratio of approximately 2 to 1. In general, when class sizes are unequal, the model favours the larger class in terms of sensitivity and overall success rate (accuracy) but performs less well in terms of specificity. After the low class, the high class performs better than the middle class. Moreover, the middle class improves from S1 to S5 in terms of sensitivity, while the other two classes improve in terms of specificity. Overall, these results show that, except for the middle class in S1, all classes at the different stages of process design appear on the left side of the ROC plot, indicating satisfactory model performance. Similarly, the classification trees with five output classes (Figure S5 in the Supporting Information) perform poorly for the respective middle classes, namely the middle-low, middle, and middle-high classes. These results also support the decision to use three output classes instead of five, since the higher resolution of the middle classes did not improve the performance of either the overall tree or the individual output classes. The sensitivity and specificity values, as well as the distances to the (0,1) point and to the random line, for the three-output-class trees built from dataset-1 are given in Table S5 in the Supporting Information.

Table 3 summarizes the most important classification rules for the models at each of the stages S1 to S5, each rule corresponding to a path in the decision tree (the full classification trees for every design stage are presented in Tables S8-S12 in the Supporting Information). For every output class, the most sensitive rule was selected as the most important one, since sensitivity was found to be the most critical metric, especially for the middle class. For example, Figure 3 shows the classification tree developed for S4, with a total of seven paths corresponding to seven rules. The path leading to the highlighted high-class output corresponds to rule-3 for S4 and can be stated as follows:

IF the reaction type is acylation OR alkylation OR complexation OR condensation OR hydrolysis OR polymerisation OR reduction, AND the operation time is higher than 18 hours, THEN the steam consumption is high.
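Because each rule is a path through the tree, it can be written directly as a predicate. The sketch below encodes only rule-3 of the S4 tree as stated above; the function name is hypothetical, and the other six S4 rules of Figure 3 are not reproduced, so a `False` result does not imply a low or middle class.

```python
# Reaction types listed in rule-3 of the S4 classification tree
RULE3_REACTION_TYPES = frozenset({
    "acylation", "alkylation", "complexation", "condensation",
    "hydrolysis", "polymerisation", "reduction",
})


def s4_rule3_predicts_high(reaction_type: str, operation_time_h: float) -> bool:
    """Return True when rule-3 of the S4 tree fires (steam consumption 'high').

    Only this single path is encoded; False means the rule does not apply,
    not that the steam consumption is low or middle.
    """
    return (reaction_type.lower() in RULE3_REACTION_TYPES
            and operation_time_h > 18.0)


print(s4_rule3_predicts_high("Condensation", 24.0))
print(s4_rule3_predicts_high("condensation", 12.0))
```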

The performance of this model is presented in Figure 4a as a resubstitution performance, with the training data on the x-axis and the predicted classes on the y-axis. In agreement with Figure 2, the S4 model performs very well for the low and high classes and satisfactorily for the middle class. Moreover, combining the information in Figure 4a with that of Table 2, the following can be inferred for the performance of the S4 model. When the model output is “low steam consumption”, there is an 11% probability that this is an underestimation. When the model output is “medium steam consumption”, there is an 18% probability that this is an underestimation and a 9% probability that it is an overestimation. When the model output is “high steam consumption”, there is a 26% probability that this is an overestimation. The more “severe” misclassification errors (i.e., high-to-low or low-to-high) have a probability of 4% and 5% when the model predicts “high” or “low” steam consumption, respectively.
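Probabilities of this kind can be read off a confusion matrix column by column: among all samples predicted in a given class, the share whose true class is higher is the underestimation probability, and the share whose true class is lower is the overestimation probability. The sketch below assumes this column-wise reading and uses a hypothetical matrix, not the counts behind Figure 4a.

```python
# Output classes ordered from lowest to highest steam consumption
CLASSES = ["low", "medium", "high"]


def prediction_error_profile(cm):
    """Per predicted class: P(underestimation), P(overestimation), P(severe error).

    cm[i][j] counts samples whose true class is i and predicted class is j,
    ordered as in CLASSES. A 'severe' error skips the middle class entirely.
    """
    k = len(CLASSES)
    profile = {}
    for j, name in enumerate(CLASSES):
        column = [cm[i][j] for i in range(k)]  # all samples predicted as class j
        n_pred = sum(column)
        if j == 0:
            severe = column[-1] / n_pred       # predicted low, actually high
        elif j == k - 1:
            severe = column[0] / n_pred        # predicted high, actually low
        else:
            severe = 0.0                       # with three classes, at most one class off
        profile[name] = {
            "under": sum(column[j + 1:]) / n_pred,
            "over": sum(column[:j]) / n_pred,
            "severe": severe,
        }
    return profile


cm = [[40, 5, 2],    # hypothetical counts, true low
      [4, 20, 3],    # true medium
      [1, 4, 12]]    # true high
for name, p in prediction_error_profile(cm).items():
    print(name, {key: round(val, 2) for key, val in p.items()})
```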

Considering that these models would mainly be used at early stages of process design, for prescreening synthesis options or for fast estimates (i.e., before detailed process simulations and pilot plant experimentation), these numbers are satisfactory, also taking into account the inaccuracies of other life cycle assessment metrics, cost estimates etc. at these early stages.

Moreover, if the goal of prescreening process options is mainly to avoid losing “interesting” process options (i.e., those with low steam consumption), then only 5% of such options would be screened out. However, it should be noted that the specificity of the medium and high consumption classes could be improved by increasing the number of data points in these categories in future studies.

In Table 3, the reaction type appears in all rules from S1 to S4, indicating the high importance of this predictor variable. This is confirmed by the predictor importance plot (Figure S3 in the Supporting Information), in which the reaction type has the highest importance among all attributes from S1 to S4. The operation time, which appears in two rules in S3 and S4, also has a high influence, as can also be seen in Figure S3 in the Supporting Information. In contrast, the rules derived for S5 follow a different pattern. As expected, the distillation steam consumption, the main additional predictor in S5, dominates the classification rules, as it constitutes a large part of the target attribute. The S5 classification trees also differ in that the reaction type is not part of the rules, while the temperature, which did not appear in S3 and S4, is now included in two of the rules. Again, the same trends can be observed in the predictor importance plot in Figure S3 in the Supporting Information. In summary, the most sensitive rules of the classification trees were verified to include the most important predictor variables.

PDF models

The reaction type was shown to be the most important predictor variable for S1 to S4. This supports the decision to construct PDF models on this minimal process information. In addition, the rules extracted from the classification trees helped to further parameterize the PDF models where necessary. Table 4 shows the empirical median, minimum and maximum steam consumption values of the datasets corresponding to the reaction types (first column) and to the further parameterized subsets (second column). It also includes the fitted PDF with its parameters, together with the median, first and third quartiles, and 2.5th and 97.5th percentiles of the fitted distribution. The assessment of the goodness of fit is also given and further discussed in Table S6 in the Supporting Information. Figure 4b depicts the performance of the PDF models when they provide interval estimations of the steam consumption based on the interquartile ranges.

The resubstitution performance is similar for most reaction types (on average 60% true positives). The exceptions are the polymerisation and elimination reactions, which perform more poorly (40% true positives), and the reduction reaction, which performs slightly better than the rest (70% true positives). This indicates that, for polymerisation and elimination reactions, the hypothesis that steam consumption can be predicted well from the reaction type is not supported. In general, the predictive performance of the PDF models is inferior to that of the classification trees with additional process-related predictor variables. Nevertheless, the PDF models can still provide useful interval estimations based on the interquartile ranges. Additionally, the PDF models can be used to benchmark chemical reaction types performed in industrial operations according to their position in the distribution of the same reaction type family. Furthermore, by sampling from the respective distributions, the PDF models allow for a more rigorous uncertainty analysis than the interval estimations.
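The interval estimation and benchmarking described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It makes three assumptions:

- It uses a lognormal PDF model (one of the families appearing in Table 4) with hypothetical parameters.
- It reads a "true positive" as an observed value that falls inside the predicted interquartile interval.
- It expresses the benchmark position of a single process as the value of the fitted cumulative distribution function.

All function names are illustrative, and only the Python standard library is used.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mu=0, sigma=1)


def lognormal_quantile(p, mu, sigma):
    """Quantile p of a lognormal whose natural logarithm is normal(mu, sigma)."""
    return math.exp(mu + sigma * STD_NORMAL.inv_cdf(p))


def interquartile_interval(mu, sigma):
    """25th and 75th percentiles of the fitted PDF, used as the interval estimate."""
    return lognormal_quantile(0.25, mu, sigma), lognormal_quantile(0.75, mu, sigma)


def true_positive_rate(observed, interval):
    """Share of observed values that fall inside the predicted interval."""
    low, high = interval
    hits = sum(1 for value in observed if low <= value <= high)
    return hits / len(observed)


def benchmark_position(value, mu, sigma):
    """Cumulative probability of one process's steam value under the fitted PDF."""
    if value <= 0:
        return 0.0
    return STD_NORMAL.cdf((math.log(value) - mu) / sigma)


if __name__ == "__main__":
    mu, sigma = 0.0, 1.0                    # hypothetical fitted parameters
    observed = [0.2, 0.6, 0.9, 1.5, 3.0]    # hypothetical steam values, kg/kg product
    interval = interquartile_interval(mu, sigma)
    print("interquartile interval:", interval)
    print("true-positive rate:", true_positive_rate(observed, interval))  # 0.6
    print("benchmark position of 1.0 kg/kg:", benchmark_position(1.0, mu, sigma))  # 0.5
```

With gamma or Weibull models, only the quantile and CDF functions change; the interval and true-positive logic stays the same.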

Further parameterisation was performed for 5 of the 13 reaction PDF models. In these cases, the original dataset for a reaction type was partitioned into two subsets. In all cases, the partition results in one lower and one higher interquartile interval. These interquartile intervals are the same width as or narrower than the original interquartile intervals and overlap considerably with them. The polymerisation reaction is the only case in which no overlap was observed, because of a gap in the values of the empirical distribution of this reaction group. As a result of the parameterisation, the PDF models for alkylation, condensation and polymerisation maintain the same percentage of true positives with narrower interval predictions. The only exception is the family of acylation reactions, where the narrower intervals are combined with a decrease in the number of true positives (see Table S7 in the Supporting Information). In this case, a comparison between the parameterized and the parent PDF models is therefore not straightforward. Overall, these results indicate that the further parameterisation increases the resolution of the models. Continuing the parameterisation with additional predictor variables could in principle improve the PDF model performance. However, the available amount of data is not sufficient to support such a parameterisation with statistically significant inference.
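A minimal sketch of such a partition is shown below. It compares the empirical interquartile interval of a parent dataset with those of its two subsets. The sketch rests on these assumptions:

- The partition uses one rule on a continuous predictor. Here that is an operation-time threshold of 18 h, loosely modelled on the acylation split in Table 4; the direction of the comparison is our assumption.
- All data are hypothetical.
- Using `statistics.quantiles` with its default 'exclusive' method is our choice of quantile estimator.

```python
from statistics import quantiles

# Hypothetical (operation time in h, steam in kg steam / kg product) records
# for one reaction type; values are illustrative, not taken from the dataset.
RECORDS = [
    (4, 0.1), (6, 0.2), (8, 0.4), (10, 0.6), (12, 0.8), (14, 1.0), (16, 1.3),
    (18, 1.5), (20, 2.0), (24, 2.4), (30, 3.1), (36, 3.4), (40, 3.8), (48, 9.0),
]


def interquartile(values):
    """Empirical 25th and 75th percentiles (default 'exclusive' method)."""
    q1, _median, q3 = quantiles(values, n=4)
    return q1, q3


def split_by_rule(records, predicate):
    """Partition steam values into two subsets according to one rule on a predictor."""
    first = [steam for x, steam in records if predicate(x)]
    second = [steam for x, steam in records if not predicate(x)]
    return first, second


if __name__ == "__main__":
    parent = interquartile([steam for _, steam in RECORDS])
    short, long_ = split_by_rule(RECORDS, lambda hours: hours < 18)  # hypothetical rule
    for label, iqr in (("parent", parent), ("time < 18 h", interquartile(short)),
                       ("time >= 18 h", interquartile(long_))):
        print(f"{label}: {iqr[0]:.2f}-{iqr[1]:.2f}, width {iqr[1] - iqr[0]:.2f}")
```

With these illustrative data, the split yields one lower and one higher interval. Both are narrower than the parent interval and overlap with it, which mirrors the qualitative pattern described above.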

15

Therefore, for prediction purposes, the use of higher order classification trees should be

16

preferred, assuming availability of the respective predictor variable values. On the other hand,

17

when the input information of the classification trees is comparable with the one of the PDF

18

models, as for instance in the cases of the S1 classification trees, a lower prediction

19

performance compared to the S4 classification tree is expected (Figure S6 in the Supporting


Page 20 of 59

Page 21 of 59 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


1 2

Information), which is much closer to the level of the PDF models. Case study results

Both types of models were applied to the additional case study dataset, and Figure 5 presents the results. The 30% error limit is considered acceptable for shortcut models at early design stages (refs 40, 41). The top of Figure 5 (a) shows the respective resubstitution performance of the models (training set), and the bottom (b) shows the performance on the external case study dataset (not used for training). In both cases, the PDF models and the S1 classification tree perform similarly, within a difference of 13% (i.e., regarding the respective sections of the bars). The remaining trends are also similar between (a) and (b), although the models generally perform worse on the external dataset. One possible reason is that not all steam consumption target values in the external dataset were derived from standard operating procedure documents, which were always available for the training set. Nevertheless, in both cases more than 80% of the predictions did not underestimate the steam consumption by more than 30%. This is an additional positive feature regarding the robustness of the models, since it safeguards the predictions against severe underestimation of steam consumption.
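The error bands discussed above can be illustrated with a short sketch that classifies each prediction by its relative error against the observed value, using the 30% limit. The sketch assumes the following:

- The (predicted, observed) pairs are hypothetical.
- Zero observations are rejected, because a relative error is undefined there. This handling is our assumption; this section does not state how the original study treated such cases.

```python
from collections import Counter


def error_category(predicted, observed, tol=0.30):
    """Classify a prediction by its relative error with respect to the observed value."""
    if observed <= 0:
        raise ValueError("relative error is undefined for non-positive observed values")
    relative_error = (predicted - observed) / observed
    if relative_error < -tol:
        return "underestimated"
    if relative_error > tol:
        return "overestimated"
    return "within"


def summarize(pairs, tol=0.30):
    """Shares of predictions per error category for (predicted, observed) pairs."""
    counts = Counter(error_category(p, o, tol) for p, o in pairs)
    n = len(pairs)
    shares = {cat: counts[cat] / n for cat in ("within", "overestimated", "underestimated")}
    # Share of predictions that do not underestimate by more than the tolerance
    shares["not underestimated"] = (n - counts["underestimated"]) / n
    return shares


if __name__ == "__main__":
    pairs = [(1.0, 1.2), (0.5, 1.0), (2.0, 1.0), (1.1, 1.0), (3.0, 2.5)]  # hypothetical
    print(summarize(pairs))
```

The "not underestimated" share corresponds to the robustness criterion quoted in the text (more than 80% of predictions not underestimated by more than 30%).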

In summary, the PDF models or the S1 trees can be used for a first approximate estimation of steam consumption and for fast screening purposes. Higher-order classification trees (S4-S5) provide more satisfactory steam consumption estimations. They can also reveal key process features that largely determine the level of energy consumption (high, middle, low).

Demonstration of the PDF models on a case study for the production of 4-(2-methoxyethyl)-phenol

The PDF models were applied to a previously published case study (Albrecht et al., 2010) on the production of 4-(2-methoxyethyl)-phenol via seven different synthesis routes (Figure 6). The procedure for estimating the steam consumption of the different synthesis routes with the PDF models is demonstrated in detail. The resulting ranking of the alternatives is compared to the one derived from the proxy indicators of Albrecht et al. (2010). Only the PDF models are used here, in order to show what one should expect when the only available information is the chemical synthesis path. Table 5 lists the reaction types at every step of the seven synthesis routes. Of the 30 reactions, 13 correspond exactly to reactions considered in the dataset used to develop the PDF statistical models. Seven reaction steps cannot be matched exactly to any of the reactions in the dataset but belong to one of the reaction categories (e.g., the O-alkylation with dimethyl sulfate as reactant in step A-3). In such cases, the data gap is filled by estimating the steam consumption with the PDF model of the respective category (here, the PDF model for alkylation). The remaining ten reactions could not be assigned to any of the reactions included in the training dataset (e.g., the nitration in step A-1). For these cases, the data gaps were filled with a default value of 1.2 kg per kg of product, calculated as the average of the median values of all PDF models.
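The three-level gap-filling rule above can be sketched as a simple lookup: first an exact match, then the reaction category, then the default value. This is an illustrative sketch, not the authors' code. Note the following assumptions:

- The medians are the fitted model medians of the reaction-type rows visible in the Table 4 extract, with subset models excluded.
- The category mapping is hypothetical.
- The default is computed only over those entries, whereas the paper averages over all PDF models.

```python
from statistics import mean

# Fitted medians (kg steam / kg product) of the reaction-type PDF models visible
# in the Table 4 extract; the paper's default averages over all PDF models.
PDF_MEDIANS = {
    "acylation": 1.2,
    "acylation (cyanur chloride)": 0.0,
    "alkylation": 1.7,
    "azo-coupling": 0.0,
    "complexation": 3.1,
}

# Hypothetical mapping from specific reactions to their reaction category
CATEGORY_OF = {"O-alkylation (dimethyl sulfate)": "alkylation"}

# Default for unmatched reactions: average of the available model medians
DEFAULT_MEDIAN = mean(PDF_MEDIANS.values())


def step_median(reaction):
    """Return (median, basis) for one reaction step using the gap-filling order."""
    if reaction in PDF_MEDIANS:  # exact match with a modelled reaction
        return PDF_MEDIANS[reaction], "exact"
    category = CATEGORY_OF.get(reaction)
    if category in PDF_MEDIANS:  # same reaction category
        return PDF_MEDIANS[category], "category"
    return DEFAULT_MEDIAN, "default"  # no match at all


if __name__ == "__main__":
    for step in ("alkylation", "O-alkylation (dimethyl sulfate)", "nitration"):
        print(step, step_median(step))
```

Returning the basis alongside the value makes it easy to see how much of a route rests on default values, which matters for the route D discussion further below.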


The steam consumption of a complete synthesis route can be estimated by adding the median values derived from the PDF models for every reaction step. The model values, given in kilograms of steam per kilogram of product, were multiplied by the corresponding reaction yields, assuming stoichiometric ratios. As Figures 7a and 7b show, route E is the best alternative according to the PDF models for steam consumption. It is also the best according to the Mass Loss Index (MLI) and the Energy Loss Index (ELI) defined by Albrecht et al. (2010). The MLI is defined here as the sum of the mass ratios of all coupled products and by-products to the intermediate or end product. This definition of the MLI excludes other input materials such as solvents and auxiliaries, since this information is not available at the earliest design stages. The ELI proxy indicator is calculated from four parameters:

- the concentration of water at the reactor outlet;
- the difference in boiling point between the product and the substance whose boiling point is closest to that of the product;
- the MLI value of each reaction step;
- the reaction energy.

All these parameters are first scaled according to empirical criteria and then weighted and aggregated to give the ELI value.

The rankings given by the PDF models and the ELI indicator generally agree, with the exception of routes C and D. Overall, the ELI-based ranking is more similar to the PDF model predictions than the MLI-based ranking. This is consistent with the fact that an index based only on the reaction mass yield does not necessarily correlate with energy consumption. Jimenez-Gonzalez et al. (2011) observed a fairly good correlation between a mass index, namely the Process Mass Intensity (PMI), and an energy-related indicator, such as the cradle-to-gate Global Warming Potential (GWP). However, the PMI in that case includes the total mass of materials per mass of product (e.g., reactants, reagents, solvents used for reaction and separation, and catalysts). This makes it a more robust but also more data-intensive indicator than the MLI as defined by Albrecht et al. (2010).
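A minimal sketch of the route-level point estimate and ranking follows. It applies the description above literally: each step median is multiplied by the step's reaction yield, and the products are summed over the route. The routes, medians and yields are hypothetical. This section does not detail how yields propagate across multi-step routes, so the per-step weighting is an assumption.

```python
def route_point_estimate(steps):
    """Sum over steps of the PDF median (kg/kg product) multiplied by the step yield."""
    return sum(median * reaction_yield for median, reaction_yield in steps)


def rank_routes(routes):
    """Return route names ordered from lowest to highest estimated steam consumption."""
    totals = {name: route_point_estimate(steps) for name, steps in routes.items()}
    return sorted(totals, key=totals.get), totals


if __name__ == "__main__":
    # Hypothetical routes: each step is (median steam, kg/kg product; reaction yield, 0-1)
    routes = {
        "route_1": [(1.2, 0.90), (1.7, 0.80)],
        "route_2": [(0.0, 0.95), (1.2, 0.70)],
        "route_3": [(3.1, 0.85)],
    }
    order, totals = rank_routes(routes)
    print(order)  # ['route_2', 'route_1', 'route_3']
```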

To evaluate the uncertainty of the ranking results, a Monte Carlo simulation was carried out. For each reaction step of the synthesis routes, steam consumption values were repeatedly sampled from the PDF models. For the reaction steps without a corresponding PDF model, the sample was generated from a uniform distribution with minimum and maximum values of 0 and 2.4 kg per kg of product, respectively. These bounds were chosen so that the mean of the distribution equals the default value of 1.2 kg per kg of product. The sample for each synthesis route was obtained by adding the sampled values of all reaction steps within the route. Standard ANOVA tests were then performed to analyze the differences between the means of these route samples.
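The Monte Carlo procedure can be sketched with Python's `random` module. Each step is sampled from its distribution, or from the uniform default on [0, 2.4] kg/kg. The draws are summed per route, and a one-way ANOVA F statistic compares the route means. Note the following assumptions and limitations:

- The distribution parameters and routes are hypothetical.
- The parameters follow the conventions of `random.gammavariate`, `random.lognormvariate` and `random.weibullvariate`, which may differ from the p1/p2 columns of Table 4.
- Only the F statistic is computed. A p-value would additionally require the F distribution, for example via `scipy.stats.f_oneway`.

```python
import random
from statistics import fmean

# Default for steps without a PDF model: uniform on [0, 2.4] kg/kg (mean 1.2 kg/kg)
DEFAULT_STEP = ("uniform", (0.0, 2.4))


def sample_step(spec, rng):
    """Draw one steam consumption value (kg steam / kg product) for a reaction step."""
    family, params = spec
    if family == "gamma":
        shape, scale = params
        return rng.gammavariate(shape, scale)
    if family == "lognormal":
        mu, sigma = params
        return rng.lognormvariate(mu, sigma)
    if family == "weibull":
        scale, shape = params
        return rng.weibullvariate(scale, shape)
    if family == "uniform":
        low, high = params
        return rng.uniform(low, high)
    raise ValueError(f"unsupported distribution family: {family}")


def sample_route(step_specs, n_samples, rng):
    """Route sample: sum of one draw per step, repeated n_samples times."""
    return [sum(sample_step(spec, rng) for spec in step_specs) for _ in range(n_samples)]


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA comparing the means of several samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [fmean(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for reproducibility
    routes = {  # hypothetical routes, each a list of step distributions
        "route_1": [("gamma", (0.6, 4.0)), DEFAULT_STEP],
        "route_2": [("lognormal", (0.9, 0.8)), ("weibull", (5.0, 1.5)), DEFAULT_STEP],
        "route_3": [("gamma", (0.4, 2.0)), ("gamma", (0.5, 1.0))],
    }
    samples = {name: sample_route(steps, 2000, rng) for name, steps in routes.items()}
    for name, values in samples.items():
        print(name, round(fmean(values), 2))
    print("F =", round(one_way_anova_f(list(samples.values())), 1))
```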

Figure 8 shows a box plot of the samples of the different synthesis routes. Two groups of synthesis routes can be observed: E, G, D and A, with lower median values and interval ranges, and F, B and C, with higher median values and interval ranges. The results of the ANOVA test (not presented here) showed that some routes within these two groups can be further discriminated. Route C is significantly different from B and F, route E from route G, and route A from route D.

The results of this case study show that the PDF models, using either median values or intervals, are applicable for a fast ranking of alternative chemical synthesis routes with respect to their energy consumption at early stages of process design. The PDF models provide a ranking similar to that obtained with a more complex and process-data-intensive indicator such as the ELI. Obviously, the PDF models cannot be a “perfect” match to the more information-rich ELI index. Moreover, in this specific case study, the prediction for route D is somewhat uncertain. Route D is one of the two routes not ranked in agreement with the ELI. Half of the reactions in its synthesis path do not match any of the reactions for which PDF models were developed, so their values are estimated from default values.

Finally, one should consider the proper use of the PDF-based ranking (i.e., not only the ranking based on the point estimations but also the Monte Carlo and ANOVA analysis). Its purpose is to prioritize the most promising routes for the next level of more detailed analysis. For instance, looking at the ELI-based ranking in Figure 8, one would prioritize routes E and G from an energy consumption perspective, followed by C and A. Routes F, B and D would most probably be discarded from the next analysis steps. Based on the point estimations of the PDF models, one would also clearly prioritize E and G, followed by D and A. Given the aforementioned data gaps of the PDF models for route D, one would probably still consider this route for the next stage of analysis and try to fill the data gaps by process simulations, etc. Route D would thus also pass to the next stage of detailed process simulation (or perhaps to an ELI-based or other index-based prioritization). This should not be considered a drawback of the PDF models. Their aim at this stage is to avoid losing any major opportunity, while still reducing the number of options for the next stage of analysis through a relatively simple calculation procedure.

CONCLUSIONS

We have proposed shortcut models of steam consumption for production processes in chemical batch plants. The models take the form of generic intervals based on the modelling and statistical analysis of industrial production data. Specifically, they are probability density functions (PDFs) and classification trees, and they can be used at different levels of information about the chemical process under investigation, the minimum being the reaction type. The resubstitution performance of the PDF models indicates that, for most reaction types except polymerisation and elimination, the interquartile ranges can provide satisfactory interval estimations when the reaction type is the only available process information. This can be of particular interest when a first screening of various chemical synthesis paths is required before proceeding to experimentation or laborious process design calculations. Some process options can then be eliminated if statistically significant differences in the estimated steam consumption are detected. Additionally, the generalisation capability of the PDF models and the classification trees was validated in a case study. On average, more than 80% of the predictions did not underestimate the steam consumption by more than 30%, which is a satisfactory performance for shortcut models at early design stages.

The new shortcut steam models developed in this work represent a potentially useful tool for estimating the steam consumption of production processes. This applies when limited process information is available or when a large number of processes has to be screened in a short time. Although the PDF models allow for reasonable predictions of steam consumption, their most interesting applications are likely to be benchmarking chemical reaction types and facilitating rigorous uncertainty analysis. Thus, the shortcut models presented here are especially suitable for applications in process design and for streamlining LCA studies.

ACKNOWLEDGEMENTS

We thank Professor Stefanie Hellweg, the Swiss Federal Office for the Environment, the Swiss Federal Office for Energy, and our industrial partners for their support in this project.

REFERENCES

1. Wernet, G.; Mutel, C.; Hellweg, S.; Hungerbuehler, K., The Environmental Importance of Energy Use in Chemical Production. J. Ind. Ecol. 2011, 15 (1), 96-107, doi 10.1111/j.1530-9290.2010.00294.x

2. Jenck, J. F.; Agterberg, F.; Droescher, M. J., Products and processes for a sustainable chemical industry: a review of achievements and prospects. Green Chem. 2004, 6 (11), 544-556, doi 10.1039/b406854h

3. Albrecht, T.; Papadokonstantakis, S.; Sugiyama, H.; Hungerbühler, K., Demonstrating multi-objective screening of chemical batch process alternatives during early design phases. Chem. Eng. Res. Des. 2010, 88 (5-6A), 529-550, doi 10.1016/j.cherd.2009.11.009

4. Cano-Ruiz, J. A.; McRae, G. J., Environmentally conscious chemical process design. Annu. Rev. Energy Env. 1998, 23, 499-536, doi 10.1146/annurev.energy.23.1.499

5. Chen, H.; Shonnard, D. R., Systematic framework for environmentally conscious chemical process design: Early and detailed design stages. Ind. Eng. Chem. Res. 2004, 43 (2), 535-552, doi 10.1021/ie0304356

6. Ruiz-Mercado, G. J.; Smith, R. L.; Gonzalez, M. A., Sustainability indicators for chemical processes: I. Taxonomy. Ind. Eng. Chem. Res. 2012, 51 (5), 2309-2328, doi 10.1021/ie102116e

7. Sugiyama, H.; Fischer, U.; Hungerbuhler, K.; Hirao, M., Decision framework for chemical process design including different stages environmental, health, and safety assessment. AIChE J. 2008, 54 (4), 1037-1053, doi 10.1002/aic.11430

8. Wernet, G.; Papadokonstantakis, S.; Hellweg, S.; Hungerbuhler, K., Bridging data gaps in environmental assessments: Modeling impacts of fine and basic chemical production. Green Chem. 2009, 11 (11), 1826-1831, doi 10.1039/b905558d

9. Kim, S.; Overcash, M., Energy in chemical manufacturing processes: gate-to-gate information for life cycle assessment. J. Chem. Technol. Biotechnol. 2003, 78 (9), 995-1005, doi 10.1002/jctb.821

10. Kniel, G. E.; Delmarco, K.; Petrie, J. G., Life cycle assessment applied to process design: Environmental and economic analysis and optimization of a nitric acid plant. Environ. Prog. 1996, 15, 221-228, doi 10.1002/ep.670150410

11. Bretz, R.; Frankhauser, P., Screening LCA for large numbers of products. Int. J. Life Cycle Assess. 1996, 1 (3), 139-146, doi 10.1007/BF02978941

12. Van der Vorst, G.; Van Langenhove, H.; De Paep, F.; Aelterman, W.; Dingenen, J.; Dewulf, J., Exergetic life cycle analysis for the selection of chromatographic separation processes in the pharmaceutical industry: preparative HPLC versus preparative SFC. Green Chem. 2009, 11 (7), 1007-1012, doi 10.1039/b901151j

13. Romero-Hernandez, O., To treat or not to treat? Applying chemical engineering tools and a life cycle approach to assessing the level of sustainability of a clean-up technology. Green Chem. 2004, 6 (8), 395-400, doi 10.1039/b401871k

14. Hellweg, S.; Fischer, U.; Scheringer, M.; Hungerbuhler, K., Environmental assessment of chemicals: methods and application to a case study of organic solvents. Green Chem. 2004, 6 (8), 418-427, doi 10.1039/b402807b

15. Vaklieva-Bancheva, N.; Ivanov, B. B.; Shah, N.; Pantelides, C. C., Heat exchanger network design for multipurpose batch plants. Comput. Chem. Eng. 1996, 20 (8), 989-1001, doi 10.1016/0098-1354(95)00217-0

16. Phillips, C. H.; Lauschke, G.; Peerhossaini, H., Intensification of batch chemical processes by using integrated chemical reactor-heat exchangers. Appl. Therm. Eng. 1997, 17 (8-10), 809-824, doi 10.1016/s1359-4311(96)00061-0

17. Oppenheimer, O.; Sorensen, E., Comparative energy consumption in batch and continuous distillation. Comput. Chem. Eng. 1997, 21, 529-534, doi 10.1016/s0098-1354(97)87556-4

18. Smith, R., Chemical process design. McGraw Hill: New York, 1995.

19. Shenoy, U. V., Heat exchanger network synthesis: process optimization by energy and resource analysis. Gulf Publishing Co.: Houston, 1995.

20. Linnhoff, B., Pinch analysis - A state-of-the-art overview. Chem. Eng. Res. Des. 1993, 71 (A5), 503-522.

21. Saling, P.; Kicherer, A.; Dittrich-Krämer, B.; Wittlinger, R.; Zombik, W.; Schmidt, I.; Schrott, W.; Schmidt, S., Eco-efficiency analysis by BASF: The method. Int. J. Life Cycle Assess. 2002, 1-16, doi 10.1065/lca2002.06.083.1

22. Jiménez-González, C.; Constable, D. J. C.; Curzons, A. D.; Cunningham, V. L., Developing GSK’s green technology guidance: methodology for case-scenario comparison of technologies. Clean Techn. Environ. Policy 2002, 4, 44-53, doi 10.1007/s10098-001-0134-7

23. Bieler, P. S.; Fischer, U.; Hungerbuhler, K., Modeling the energy consumption of chemical batch plants: Bottom-up approach. Ind. Eng. Chem. Res. 2004, 43 (24), 7785-7795, doi 10.1021/ie049641j

24. Szijjarto, A.; Papadokonstantakis, S.; Fischer, U.; Hungerbühler, K., Bottom-up modeling of the steam consumption in multipurpose chemical batch plants focusing on identification of the optimization potential. Ind. Eng. Chem. Res. 2008, 47, 7323-7334, doi 10.1021/ie071291o

25. Pereira, C.; Papadokonstantakis, S.; Rerat, C.; Hungerbühler, K., Industrial documentation-based approach for modeling the process steam consumption in chemical batch plants. Ind. Eng. Chem. Res. 2013, 52 (44), 15635-15647, doi 10.1021/ie401198w

26. Mueller, K. G.; Lamperth, M. U.; Kimura, F., Parameterised inventories for life cycle assessment - Systematically relating design parameters to the life cycle inventory. Int. J. Life Cycle Assess. 2004, 9 (4), 227-235, doi 10.1065/lca2004.03.147

27. Cooper, J.; Godwin, C.; Hall, E. S., Modeling process and material alternatives in life cycle assessments. Int. J. Life Cycle Assess. 2008, 13 (2), 115-123, doi 10.1065/lca2007.06.341

28. Canter, K. G.; Kennedy, D. J.; Montgomery, D. C.; Keats, J. B.; Carlyle, W. M., Screening stochastic life cycle assessment inventory models. Int. J. Life Cycle Assess. 2002, 7 (1), 18-26, doi 10.1065/lca2001.08.063

29. Cosmi, C.; Loperte, S.; Macchiato, M.; Pietrapertosa, F.; Ragosta, M.; Salvia, M., Life cycle assessment and multivariate data analysis for an integrated characterisation of the technologies for electric energy production. Adv. Air Pollut. Ser. 2004, 14, 67-75.

30. Curzons, A. D.; Jimenez-Gonzalez, C.; Duncan, A. L.; Constable, D. J. C.; Cunningham, V. L., Fast life cycle assessment of synthetic chemistry (FLASC (TM)) tool. Int. J. Life Cycle Assess. 2007, 12 (4), 272-280, doi 10.1065/lca2007.03.315

31. Hau, J. L.; Yi, H. S.; Bakshi, B. R., Enhancing life-cycle inventories via reconciliation with the laws of thermodynamics. J. Ind. Ecol. 2007, 11 (4), 5-25, doi 10.1162/jiec.2007.1165

32. Maurice, B.; Frischknecht, R.; Coelho-Schwirtz, V.; Hungerbühler, K., Uncertainty analysis in life cycle inventory. Application to the production of electricity with French coal power plants. J. Clean. Prod. 2000, 8, 95-108, doi 10.1016/S0959-6526(99)00324-8

33. Myung, I. J., Tutorial on maximum likelihood estimation. J. Math. Psychol. 2003, 47 (1), 90-100, doi 10.1016/s0022-2496(02)00028-7

34. Akaike, H., New look at statistical-model identification. IEEE T. Automat. Contr. 1974, AC19 (6), 716-723, doi 10.1109/tac.1974.1100705

35. Frischknecht, R.; Jungbluth, N.; Althaus, H. J.; Doka, G.; Dones, R.; Heck, T.; Hellweg, S.; Hischier, R.; Nemecek, T.; Rebitzer, G.; Spielmann, M., The ecoinvent database: Overview and methodological framework. Int. J. Life Cycle Assess. 2005, 10 (1), 3-9, doi 10.1065/lca2004.10.181.1

36. Capello, C.; Hellweg, S.; Badertscher, B.; Betschart, H.; Hungerbuhler, K., Part 1: The ecosolvent tool - Environmental assessment of waste-solvent treatment options. J. Ind. Ecol. 2007, 11 (4), 26-38, doi 10.1162/jiec.2007.1231

37. Witten, I.; Frank, E., Data mining, 2nd ed.; Morgan Kaufmann Publishers: Massachusetts, 2005.

38. Breiman, L.; Friedman, J. H.; Olshen, R. A.; Stone, C. J., Classification and regression trees. Chapman and Hall: New York, 1984.

39. Youden, W. J., Index for rating diagnostic tests. Biometrics 1950, 6 (2), 172-173, doi 10.2307/3001825

40. Bumann, A. A.; Papadokonstantakis, S.; Sugiyama, H.; Fischer, U.; Hungerbuehler, K., Evaluation and analysis of a proxy indicator for the estimation of gate-to-gate energy consumption in the early process design phases: The case of organic solvent production. Energy 2010, 35 (6), 2407-2418, doi 10.1016/j.energy.2010.02.023

41. Turton, R.; Bailie, R. C.; Whiting, W. B.; Shaeiwitz, J. A., Analysis, synthesis, and design of chemical processes. Prentice Hall: New Jersey, 1998.

SUPPORTING INFORMATION

Methodology followed for chemical reaction classification
Statistics of reaction yields in the dataset
Description of predictor variables, and discretization of steam consumption
Further insights into model selection and predictor importance
Assessment of the “goodness of fit” for the PDF models
Detailed description of the classification trees in the form of IF-THEN rules

TABLES

Table 1: Definition of the predictor variables in dataset-1.

Stage | Predictor variable | Type | Description | Total*
--- | --- | --- | --- | ---
S1 | reaction type | categorical | Reaction type defined according to Figure S1 | 2
S1 | mechanism | categorical | Reaction mechanism defined according to Table S1 |
S2 | distillation | binary | Indicates the presence or absence of distillation processes during the reaction work-up, i.e., simple evaporation or distillation under reflux conditions (yes/no) | 4
S2 | reflux | binary | Indicates the presence or absence of reflux conditions during the reaction synthesis or the reaction work-up (yes/no) |
S3 | Tmax | continuous | Maximal operation temperature in °C | 6
S3 | time | continuous | Total time in hours required for heating the reaction mixture, evaporating solvent, and keeping the temperature constant above ambient temperature (with or without reflux) during the reaction synthesis and work-up processes within the defined system boundary |
S4 | PMI | continuous | Process Mass Intensity¹ | 7
S5 | Steamdist | continuous | Steam consumption during distillation processes | 8

*The total number of predictor variables per design stage is cumulative, i.e., at a given stage i, the variables of the previous stages are also present.

¹ $\mathrm{PMI} = \dfrac{m_{\mathrm{total}}}{m_{\mathrm{product}}}$, where $m_{\mathrm{total}}$ is the total input mass of raw materials and $m_{\mathrm{product}}$ is the mass of product.

Table 2: Discretized intervals of steam consumption (target values) considering three output classes. The values are given in kilograms of steam consumption per kilogram of product.

Class label | Interval (kg/kg) | Number of data points
--- | --- | ---
High | 3-16 | 61
Middle | 1-3 | 51
Low | 0-1 | 122
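As a sketch, the discretization of Table 2 can be written as a threshold function. The table does not state which class the boundary values 1 and 3 kg/kg belong to. Assigning them to the upper class is our assumption, and the upper limit of 16 kg/kg is not enforced.

```python
def steam_class(steam_kg_per_kg):
    """Map a steam consumption value (kg steam / kg product) to a Table 2 class."""
    if steam_kg_per_kg < 0:
        raise ValueError("steam consumption cannot be negative")
    if steam_kg_per_kg < 1.0:   # interval 0-1
        return "low"
    if steam_kg_per_kg < 3.0:   # interval 1-3 (boundary 1.0 assigned here, assumed)
        return "middle"
    return "high"               # interval 3-16 (boundary 3.0 assigned here, assumed)


if __name__ == "__main__":
    print([steam_class(v) for v in (0.4, 1.0, 2.5, 3.0, 12.0)])
    # ['low', 'middle', 'middle', 'high', 'high']
```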

Table 3: Most important rules of the classification trees at the five stages of process design (S1 to S5).

Class*

Steamdist

PMI

T max

Time

Distillation

acylation (cyanur chloride), azo-coupling, diazotisation,

Reflux

Reaction

1

***

Rule

S1

Mechanism

Stage

low

elimination, halogenation, sulfonation 2

acylation,

alkylation,

complexation,

condensation, HC,SN1,SN2,SNAr

high

hydrolysis, polymerisation, reduction S2

1

acylation (cyanur chloride), azo-coupling, diazotisation,

low


acylation,

alkylation,

complexation,

condensation, HC,SNAr

no

middle

condensation,

yes

high

hydrolysis, polymerisation, reduction 3

acylation,

alkylation,

complexation,


1


low


acylation,

alkylation,

complexation,

condensation, HC,SN2,SNAr

no

18

high


acylation,

alkylation,

complexation,

condensation,

hydrolysis, polymerisation, reduction


S4


1


low


acylation,

alkylation,

complexation,

condensation, HC,SN2,SNAr

no

18

high


acylation,

alkylation,

complexation,

condensation,


1

1.5

high

3

In this table, each row corresponds to one rule. Each column from the third one onward corresponds to a predictor variable, and the last column to the output class. The grey areas indicate that a predictor variable is not present at the corresponding design stage. For the categorical predictor variables, reaction and mechanism, the listed values are combined with the logical operator “OR”. Different predictors are combined with the logical operator “AND”. For example, the second rule of S1 can be formulated as follows: IF the reaction type is equal to acylation OR alkylation OR complexation OR condensation OR hydrolysis OR polymerisation OR reduction, AND the mechanism is equal to HC OR SN1 OR SN2 OR SNAr, THEN the steam consumption is high. Reaction mechanisms are included only in cases where at least one of the reaction types can proceed via more than one mechanism. In this way, the reaction mechanism is treated as additional information to the reaction type. This choice is consistent with the predictor importance depicted in Figure S3 in the Supporting Information, which shows a higher relevance of the reaction type compared to the mechanism at all five stages of process design.
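The rule logic described in this note can be sketched as set membership: OR among the values of a categorical predictor, AND across predictors. The dictionary-of-sets representation and the record format are our own illustrative choices. The rule content is the second S1 rule as formulated above.

```python
# Second rule of stage S1 as formulated in the note to Table 3:
# OR between the listed values of one predictor, AND between predictors.
S1_RULE_2 = {
    "reaction": {"acylation", "alkylation", "complexation", "condensation",
                 "hydrolysis", "polymerisation", "reduction"},
    "mechanism": {"HC", "SN1", "SN2", "SNAr"},
}


def rule_matches(rule, record):
    """True if every predictor in the rule takes one of its allowed values."""
    return all(record.get(predictor) in allowed for predictor, allowed in rule.items())


if __name__ == "__main__":
    step = {"reaction": "alkylation", "mechanism": "SN2"}  # hypothetical reaction step
    print("high" if rule_matches(S1_RULE_2, step) else "rule does not apply")  # high
```

A record that lacks one of the rule's predictors does not match, which mirrors the requirement that all AND-combined conditions hold.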


ACS Sustainable Chemistry & Engineering

Table 4: Empirical statistics and probability density function (PDF) model results per reaction type. The values are given in kilograms of steam consumption per kilogram of product.

Reaction type | Parameterisation | n | Empirical median | Empirical min | Empirical max | PDF | p1 | p2 | Model median | 25th | 75th | 2.5th | 97.5th
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Acylation | - | 33 | 1.6 | 0.0 | 9.3 | gamma | 0.6 | 4.1 | 1.2 | 0.3 | 3.2 | 0.0 | 11.1
Acylation | Time (18 h split) | 21 | 0.9 | 0.0 | 6.0 | gamma | 0.5 | 3.8 | 0.7 | 0.14 | 2.3 | 0.0 | 9.1
Acylation | Time (18 h split) | 12 | 2.4 | 0.79 | 9.3 | lognormal | 0.9 | 0.8 | 2.5 | 1.5 | 4.4 | 0.5 | 12.3
Acylation (cyanur chloride) | - | 22 | 0.0 | 0.0 | 0.8 | lognormal | -4.1 | 2.9 | 0.0 | 0.0 | 0.1 | 0.0 | 4.9
Alkylation | - | 33 | 2.5 | 0.0 | 11.8 | gamma | 0.6 | 5.2 | 1.7 | 0.5 | 4.2 | 0.0 | 14.4
Alkylation | no distillation | 12 | 1.0 | 0.0 | 3.4 | gamma | 0.4 | 2.3 | 0.3 | 0.1 | 1.2 | 0.0 | 5.1
Alkylation | distillation | 21 | 3.0 | 0.3 | 11.8 | weibull | 4.9 | 1.5 | 3.8 | 2.1 | 6.1 | 0.4 | 11.7
Azo-coupling | - | 25 | 0.1 | 0.0 | 5.3 | gamma | 0.1 | 6.6 | 0.0 | 0.0 | 0.2 | 0.0 | 6.4
Complexation | - | 9 | 2.8 | 0.2 | 14.9 | exponential⁴ | 0.23 | - | 3.1 | 1.3 | 6.1 | 0.1 | 16.3
