A Dynamic Response Surface Model for Polymer Grade Transitions in

Subscriber access provided by UNIV OF NEW ENGLAND ARMIDALE

Process Systems Engineering

A Dynamic Response Surface Model for Polymer Grade Transitions in Industrial Plants Zhenyu Wang, and Christos Georgakis Ind. Eng. Chem. Res., Just Accepted Manuscript • DOI: 10.1021/acs.iecr.8b04491 • Publication Date (Web): 14 Feb 2019 Downloaded from http://pubs.acs.org on February 15, 2019

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

248x194mm (96 x 96 DPI)

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 1: Example of the process responses (lower) to two different input profiles (upper) parameterized by 3 sub-factors in the Academic simulation. The one in the left shows small overshoot 203x152mm (120 x 120 DPI)


Page 2 of 50

Page 3 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


Figure 2: Comparison of the nominal predictions of the quadratic DRSM model, estimated from the data of regular (―, blue) and the augmented (---, green) design of experiments for the Academic simulation against the simulated data (●). The corresponding 95% prediction intervals of the models (···) are plotted in two doted lines. 185x138mm (96 x 96 DPI)



Figure 3: Comparison of the nominal predictions of the cubic DRSM model, estimated from the data of regular (―, blue) and the augmented (---, green) design of experiments for the Academic simulation against the simulated data (●). The corresponding 95% prediction intervals of the models (···) are plotted in two doted lines 185x138mm (96 x 96 DPI)


Page 4 of 50

Page 5 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


Figure 4: Residual analysis plots of quadratic (blue) and cubic (red) DRSM models estimated using augmented design for the Academic simulation 203x152mm (120 x 120 DPI)



Figure 5: Residual analysis plots of 2FI DRSM model estimated for the Industrial simulation 203x152mm (120 x 120 DPI)


Page 6 of 50

Page 7 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


Figure 6: Residual analysis plots of quadratic (in blue) and cubic (in red) DRSM models estimated using augmented design for the Industrial simulation 203x152mm (120 x 120 DPI)



Figure 7: Comparison of the quadratic (-·-) and cubic (―) DRSM models for the Industrial simulation. The data (--) is plotted as well. 203x152mm (120 x 120 DPI)


Page 8 of 50

Page 9 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


Figure 8: Average distances between each of the four cross-validation runs to the n nearest experiments (Euclidean distances between the factor values) 203x152mm (120 x 120 DPI)



A Dynamic Response Surface Model for Polymer Grade Transitions in Industrial Plants Zhenyu Wang and Christos Georgakisa Department of Chemical and Biological Engineering and Systems Research Institute for Chemical and Biological Processes Tufts University, Medford, MA 02155 USA

Abstract The Dynamic Response Surface Methodology (DRSM) has recently been applied to accurately model the dynamics of polymer grade transitions (Wang and Georgakis, Ind. Eng. Chem. Res. 2017). Following this publication, we propose here two methodological improvements which greatly reduce the required number of experiments and still achieve similar model accuracy. First, a novel design of the experiments is defined to combine the previous separate input domains incorporating transitions that increase and decrease a measure of polymer grade. This enables using a single DRSM model, instead of the prior two separate ones, and reduces the number of experiments by half. Furthermore, a sequential modeling strategy is proposed, appending experiments to an initial simpler design to estimate a more complex DRSM model. As the desired DRSM model complexity is not known a priori, the sequential modeling strategy is critical in achieving the desired model accuracy with the minimum number of experiments. These methodological advances have been tested against an academic and an industrial process simulation. Keywords: Data-Driven Modeling, Dynamics, Response Surface Methodology, Nonlinear Process, Polymerization

a

Corresponding Author: Science and Technology Center, Tufts University, 4 Colby Street, Medford, MA 02155; Phone: 617-627-2573 e-mail: [email protected] Page 1 of 40


Page 10 of 50

Page 11 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


1. Introduction Polyolefins of different grades, whose quality is often measured by their Melt Flow Index (MFI) value, are indispensable modern materials. A single continuous plant is used to produce polymers with different MFI values under different operating conditions. As the polymers produced during the grade transition periods, say from product A to product B, are not meeting the product specification of either A or B, they are considered off-spec product and their amount should be minimized. The optimization of the polymer grade transition relies on accurate mathematical models. Due to the complex mechanism of the polymerization processes, large-scale knowledge-driven models consisting of Differential Algebraic Equations (DAE), have been developed to describe the process behaviors

1-2.

The grade transitions

problem is usually solved using model-based optimization approaches 3-4 necessitating the availability of an accurate knowledge-driven process model. The development of such knowledge-driven models is not possible for processes whose inner details are not understood sufficiently. Alternatively, a data-driven model might be estimated from a set of systematically designed experiments or from historical data. The Design of Dynamic Experiments (DoDE) 5-7, as a generalization of the classic Design of Experiments (DoE) methodology

8-9,

is often used to design experiments with time-varying inputs to estimate a

Response Surface Methodology (RSM) model or its generalized dynamic form, the Dynamic Response Surface Methodology (DRSM) model10-11. The DRSM modeling methodology has been successfully applied to model the polymer grade transitions of an in silico polypropylene process12 whose knowledge-driven model, based on prior publications1, 13, was used to generate the data. This process will be denoted as the academic one in this paper. In our previous publication12, in order to represent the nonlinear process dynamics accurately, two different DRSM models were estimated, one for increasing MFI values and one for decreasing ones.

With such DRSM models at hand, one can use them for both optimization and

control purposes10, saving experimental costs for the development of separate models for these two tasks.

Page 2 of 40



In this paper, we present two methodological improvements of the previously published methodology 12 with the primary aim of reducing the number of experiments. This is motivated by our collaboration with colleagues from a member company of our industrial consortium. We first combine the previous two design domains for increasing and decreasing MFI cases, respectively, into a single domain and develop one DRSM model for all possible MFI transitions, both increasing and decreasing. In this way, the required number of experiments is reduced by half compared to the separate DRSM models discussed previously12. As the model accuracy cannot be ascertained before the model is developed, an additional contribution of this paper is the use of an augmented experiment design approach to reduce the total number of experiments in arriving at the desired model accuracy. The initial set of experiments is designed aiming to estimate a two-factor interaction (2FI) DRSM model, which is the simplest nonlinear DRSM model and requires the smallest number of experiments. If the 2FI DRSM model is not accurate enough, additional runs are appended to enable the estimation of a quadratic model and then again some more experiments to estimate a cubic model, if necessary. We also introduce an analysis of the model residuals to see whether they are independent of the transient time and of the values of the factors in the experiment. This will also help us decide whether a model is accurate enough. To demonstrate the applicability of the proposed approach, we test it against an academic as well as an industrial simulation. The industrial example is a proprietary simulation of an actual polymerization process, similar to the academic one but much more detailed. Colleagues in the member company of our industrial consortium run the industrial simulation for us. They have indicated that it accurately represent industrial operations. The data-driven model we developed was with complete ignorance of the simulation’s details. If the proposed DRSM modeling methodology is capable of accurately representing the polymer grade transitions of this industrial simulation, we can claim with considerable certainty that a similar success can be achieved with real industrial data from a process for which a knowledge-driven model does not exist. Page 3 of 40


Page 12 of 50

Page 13 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


The paper is organized as follows. We first introduce the DRSM methodology. Following that, a new model selection criterion based on residual analysis for arriving at the final model is discussed. Then we propose the combined input domains for both the increasing and decreasing MFI cases. The augmented design of experiments used to reduce their total number is discussed in section 5. In the results section, we first compare the DRSM models obtained for the academic simulation via the newly proposed methodology against the ones estimated in our previous publication12. Then the efficacy of the proposed method is further validated by demonstrating DRSM model’s accuracy against the industrial simulation.

2. The DRSM Methodology We recently proposed a new and improved DRSM methodology12, denoted as DRSM-2, which generalizes the original DRSM methodology11, now denoted as DRSM-1, to model both continuous and batch processes. An example of a quadratic DRSM-2 model is given as follows: 𝑛

𝑦(𝜃) = 𝛽0(𝜃0) +

𝑛

𝑛

𝑛

∑𝛽 (𝜃 )𝑥 + ∑∑𝛽 (𝜃 )𝑥 𝑥 + ∑𝛽 (𝜃 )𝑥 𝑖

𝑖=1

𝑖

𝑖

𝑖𝑗

𝑖𝑗

𝑖 𝑗

𝑖 = 1𝑖 < 𝑗

𝑖𝑖

𝑖𝑖

2 𝑖

(1)

𝑖=1

Here 𝑦(𝜃) is the time-varying output variable of interest and is a function of transformed time variable 𝜃 defined in eq. (3). The variables 𝑥𝑖’s are the traditional DoE factors or the dynamic sub-factors in the DoDE. We will use here the quadratic model class as an example. Other model classes, such as linear, two-factor interaction (2FI), or cubic, are of interest as well. The time-varying parametric function 𝛽𝑞(𝜃𝑞) is expressed as follows: 𝛽𝑞(𝜃𝑞) = 𝛾𝑞,1𝑃0(𝜃𝑞) + 𝛾𝑞,2𝑃1(𝜃𝑞) + ⋯ + 𝛾𝑞,𝑅𝑞 + 1𝑃𝑅(𝜃𝑞)

(2)

with 𝑞 = 0, 𝑖, 𝑖𝑗, 𝑜𝑟 𝑖𝑖. Also 𝑃𝑖( ∙ ) is the 𝑖𝑡ℎ shifted Legendre polynomial, a (𝑖 ― 1)𝑡ℎ order polynomial in the independent variable, 𝜃𝑞, while the 𝛾𝑞′𝑠 are the DRSM model parameters to be estimated. The independent variable, 𝜃𝑞, is an exponential function of the actual time, 𝑡, and is defined as follows:

Page 4 of 40



𝜃𝑞 = 1 ― 𝑒𝑥𝑝 [ ― (𝑡 ― 𝑑𝑞) 𝑡𝑐]

Page 14 of 50

(3)

Here 𝑡𝑐 is an approximate value for the time constant characterizing the long-term dynamic changes in the output variable in response to input changes. In most of the prior estimations of DRSM models the value of 𝑑𝑞 is equal to zero as all factors are active at the beginning of the experiment at 𝑡 = 0. In contrast here, the values for 𝑑𝑞 are not all zero but, instead, represent the input delays from 𝑡 = 0 of when the 𝑞𝑡ℎ input change takes place. As the time 𝑡 increases from 𝑑𝑞 to infinity, the transformed time variable 𝜃𝑞 increases from zero to one, 𝜃𝑞 ∈ [0, 1], the interval in which the shifted Legendre polynomials are defined and are orthogonal with each other. The model parameters, 𝛾𝑞,𝑖, are estimated via stepwise regression by solving the linear least-squares problem defined before12. The three major decision variables in the development of the DRSM model here are the time constant, 𝑡𝑐, the order of shifted Legendre polynomials, 𝑅𝑞, and the model class, 𝐶. The choices of model classes that are considered here are Linear, 2FI, Quadratic and Cubic. The selection of the DRSM model class is mostly influenced by the number of experiments available and the corresponding values of the factors or subfactors at which the experiments were executed. The best values for the other two decision parameters, 𝑡𝑐 and 𝑅𝑞, are selected by examining several combinations and selecting the ones which results in the smallest value of the Bayesian Information Criterion (BIC)14 and also in a nonsignificant Lack-of-Fit (LoF) p-statistic12. On the LoF test the p-value should be larger than a threshold, α, which is usually selected at 0.05. The LoF statistic calculation requires that the number of independent runs is larger than the number of model parameters by at least three, if not more. Furthermore, three or more replicate runs are required so that one can estimate the normal variability of the process. In the DRSM, the number of parameters counted against the number of experiments are the number of 𝛽(𝜃) parametric functions of 𝜃, not the number of 𝛾 parameters.

Page 5 of 40


Page 15 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


3. Residual Analysis for Selecting the Final Model In the previously published model selection criterion12, we mainly focused on the LoF statistic9, which compares the unmodeled sum of squared errors resulting from the parameter estimation against the sum of squares that characterizes the normal variability of the process. If the unmolded sum of squared errors is not significantly larger than the normal process variability, the obtained DRSM model is considered sufficiently accurate. However, in practice, we noticed that the LoF statistic is a very strict criterion. A little lower prediction accuracy in the estimated DRSM model might not be that harmful but it will most likely result into a p-value for the LoF statistic that is significant, namely smaller than 0.05. If the initially selected model class fails to yield an acceptable LoF statistic for all the considered reasonable values of the decision parameters, 𝑡𝑐 and 𝑅, then a more complex model class might be tried, moving, for example, from a quadratic to a cubic DRSM. This necessitates quite a few additional experiments which is not a burden if they are done in silico. However, they are a substantial challenge if they are to be done in the industrial unit. Most importantly, if the eventual use of these DRSM models is in model based feedback control, their accuracy might be judged by its close-loop performance. If the DRSM model cannot be qualified through the LoF statistic alternative performance criteria might be considered. The desired new. Here we utilize the Analysis of Residuals15 as the alternative method for selecting the model class, replacing the previously used LoF statistic. In residual analysis, we first calculate the residual between the measured and predicted values as below 𝑒=𝑦―𝑦

(4)

Where 𝑦 and 𝑦 are the measured and predicted values, respectively. For the industrial simulation, the data, 𝑦 is scaled in the range of [-1, +1] for proprietary reason. But the residual, 𝑒, is calculated directly using eq.(4) without scaling. These residuals resulted from the best DRSM model should be normally distributed and independent of the values of the experimental factors as well as time. The analysis of Page 6 of 40



Page 16 of 50

residuals can be conducted qualitatively or quantitatively. The qualitative approach, such as the scatter plot16, is considered as a direct and often unambiguous basis for judging the accuracy of the estimated model15. Here we will plot the histogram of residuals to examine if the residuals are normally distributed. Scatter plots of the magnitudes of the residuals against the variable magnitudes will be plotted to examine whether they are independent of those variables. The application of the residual analysis will be discussed in detail in the results section.

4. A Combined Input Domain for Polymer Grade Transition The polymerization process is a continuous one in a loop reactor and the high recirculation rate permits us to model it as a CSTR. Hydrogen, monomer, catalyst and solvent are fed into the reactor while the produced polymer, characterized by the Melt Flow Index (MFI), is measured at the outlet of the reactor. The hydrogen concentration in the feeding stream is the manipulated input that controls the average chain length in the polymer product and thus its corresponding Melt Flow Index (MFI) value. The process is highly nonlinear because its settling time and steady state gain varies with state variables and input variables magnitudes as visualized in Figure 3 of the previous publication12. For some details about the polymerization process, please refer to the Appendix. In our previous publication, we used four dynamic sub-factors to parameterize the input profile12. However, feedback from our industrial collaborators has convinced us that an input characterization by only three sub-factors is sufficient for the optimization of the polymer grade transitions. This results in a decrease in the number of experiments required for the estimation of the parameters of a given DRSM model class. The mathematical expression for the input profile’s dependence of time examined in this paper is given by

{

𝑢1 + 𝑎𝑡, 0 ≤ 𝑡 ≤ 𝑡1 𝑢(𝑡) = 𝑢2 ― 𝑎𝑡, 𝑡1 < 𝑡 ≤ 𝑡2 𝑢3, 𝑡2 < 𝑡 ≤ ∞

(5)

Page 7 of 40


Page 17 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


where 𝑢1 and 𝑢3 are the steady-state input values at the beginning and the end of the polymer grade transition, respectively. These guarantee the desired initial and final output values of the MFI. The value of 𝑢2 is an intermediate value that, if properly chosen, helps to achieve the transition in minimum time. If 𝑢3 is larger than 𝑢1, then 𝑢2 is higher than 𝑢3. Conversely, 𝑢2 is smaller than 𝑢3, if 𝑢3 is smaller than 𝑢1. The input profile 𝑢(𝑡) is piecewise linear, i.e. 𝑢1 +𝑎𝑡1 = 𝑢2 ―𝑎𝑡1 and 𝑢2 ―𝑎𝑡2 = 𝑢3. The ramp rate, 𝑎, has the same magnitude, say 𝑎0, for the increasing and decreasing MFI cases, but its sign varies based on the relative values of 𝑢1 and 𝑢3 as given below. = |𝑎 |, 𝑖𝑓 𝑢 > 𝑢 {𝑎 =𝑎―|𝑎 |, 𝑖𝑓 𝑢 < 𝑢 0

0

3

1

3

(6)

1

Two example input profiles and the corresponding output profiles are given in Figure 1. The durations of the two MFI transitions, one decreasing and one increasing, are 15.4 h and 13.4 h, respectively. The end of the transition is defined by the time instant when the output has reached 98% of its final value. Location of Figure 1 The sub-factors 𝑥𝑖’s for the DoDE experiments vary in the interval of [ ― 1, + 1] and are defined as follows 𝑢𝑖 = 𝑢𝑖,0 + ∆𝑢𝑖𝑥𝑖 for 𝑖 = 1, 2, 3

(7)

Where 𝑢𝑖,0 is the reference value of 𝑢𝑖 while ∆𝑢𝑖 specifies the deviation from the reference value. Additional constraints are needed to eliminate some irrelevant designs of the input profiles. We start with separate constraints for increasing and decreasing MFI cases, respectively,ß and we show that they can be combined into a single set of constraints. First, for the case of increasing MFI, we have the following constraints 𝑥3 ― 𝑥1 > 0 𝑥2 ― 𝑥3 > 0 𝑥2 ― 𝑥3 < 𝛿(𝑥3 ― 𝑥1)

(8)

Page 8 of 40



Page 18 of 50

The first constraint forces the final input value to be larger than the initial input value, which guarantees an increased MFI value at the new steady state. The second constraint forces temporarily the input to higher values that the eventual one. This is motivated by the desire to induce a quicker transition by initially telling the system that we are aiming for a higher final MFI that the eventual final value is. The third constraint, with a properly selected ratio δ, excludes experiments which will exhibit a substantial MFI “overshoot” during the transient period. The “overshoot” here refers to the phenomenon that the MFI value during the transition exceeds the final steady-state MFI. It includes two cases: the transient MFI value larger than the final MFI value for the operations increasing MFI and the transient MFI value lower than the final value for the decreasing MFI operations. The runs with large “overshoot” are not desired and their absence from the modeling data will increase the accuracy of the resulting model for the most reasonable transitions. The value of 𝛿 can be selected by prior process knowledge or historical data. Here it is set to 𝛿 = 1. Therefore, constraints in eq. (8) is rewritten as 𝑥3 ― 𝑥1 > 0 𝑥2 ― 𝑥3 > 0 2𝑥3 ― 𝑥1 ― 𝑥2 > 0

(9)

Software used to design a set of experiments, such as MatLab 17, do not allow strict inequality constraints. Therefore, the above constraints will lead to the selection of a few irrelevant experiments, such as the one characterized by 𝑥1 = 𝑥2 = 𝑥3 representing no transition at all. To avoid these irrelevant runs, we modify the above constraints as follows: 𝑥 3 ― 𝑥 1 > 𝜀1 𝑥 2 ― 𝑥 3 > 𝜀1 ― 𝑥1 ― 𝑥2 + 2𝑥3 > 𝜀1

(10)

Here we choose 𝜀1 = 0.1. Other values for 𝜀1 can be chosen, based on process knowledge or information from historical data. The corresponding constraints for the transition to a new steady state with smaller final MFI value are given by:

Page 9 of 40


Page 19 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


𝑥 1 ― 𝑥 3 > 𝜀1 𝑥 3 ― 𝑥 2 > 𝜀1 𝑥1 + 𝑥2 ― 2𝑥3 > 𝜀1

(11)

By combining the constraints in eq. (10) and (11), we obtain the following constraints for both increasing and decreasing MFI.

(𝑥3 ― 𝑥1)(𝑥2 ― 𝑥3) > 𝜀2 (𝑥3 ― 𝑥1)[𝑥1 + 𝑥2 ― 2𝑥3] < ― 𝜀2

(12)

Here we set 𝜀2 = 𝜀21 = 0.01. In the input domain specified by constraints in eq. (12), we design experiments using the MatLab Statistics and Machine Learning toolbox17. We will start with the simplest nonlinear DRSM model, the 2FI one, and augment these experiments with additional ones to be able to estimate a more detailed model, quadratic or cubic. This is compared to the design approach where no augmentation takes place and each design, quadratic or cubic, is defined from scratch.

5. Augmented Design of Experiments In practice, one does not know how accurate the model of a certain class, linear 2FI, quadratic or cubic, might be until the corresponding experiments are conducted and the model is developed. To minimize the initial number of experiments, one might start with a design for a simpler model, say a two-factor interaction (2FI) model. If a 2FI model does not meet the expectation on the model accuracy, one has to design an absolutely new set of experiments for the estimation of a more complicated model, such as a quadratic model using a regular optimal design algorithm. This wastes the experimental effort for the initial 2FI design. To utilize the data collected from the prior simpler design for the estimation of a more complex model, one can augment the existing experiments with a minimal number of additional runs. The regular and augmented design optimize the same optimal criterion9. Here we choose the D-Optimality criterion as given below Page 10 of 40



𝐗 = arg 𝑚𝑎𝑥|𝐗′𝐗|

Page 20 of 50

(13)

where the 𝑀 × 𝑝 matrix 𝐗 represents the values of the factors at the M experiments used to estimate a DRSM model with 𝑝 parametric functions, 𝛽𝑞(𝜃𝑞). For example, estimating a linear DRSM model with n factors, the mth row representing the mth experiment in 𝐗 is given by 𝒙 = [1 𝑥𝑚,1 𝑥𝑚,2⋯𝑥𝑚,𝑛]. In the augmented design, the prior set of experiments are fixed. Only the additional experiments will be selected to maximize the optimality criterion. Therefore, the augmented design has a lower optimality value compared to the regular design, designed from scratch. However, the lower optimality value might not necessarily lead to significant lower model accuracy as discussed in section 6.1. The augmented experiments as well as the initial ones can be designed using commercial software, such as MATLAB17 and JMP18. For the detailed algorithm regarding the augmented design of experiments please refer to the following publications19-20. The numbers of experiments of the regular designs for estimating the 2FI, quadratic or a cubic DRSM model are listed in the first row of Table 1 while the ones of the augmented designs are given in the second row. The 2FI design, as the initial set of experiments, is identical for both the regular and augmented designs. It consists of 13 experiments. This includes 7 experiments for estimating the same number of parametric function, 𝛽𝑞(𝜃𝑞), 3 additional distinct experiments enabling the calculation of the LoF statistic and 3 replicates to quantify the process variability. The regular design for a quadratic model, consisting of 16 experiments, is selected independently from the existing 2FI design. However, in the specific design we performed, there are 4 experiments in the quadratic design that are identical to the existing 2FI design. Therefore, the additional number of experiments to be conducted is 12 (= 16-4) and is reported parenthetically in column 3 of Table 1. The factors of these 4 prior existing experiments are: (-1.00, 1.00, 0.95), (0.05, -1.00, 0.50), (1.00, -1.00, -0.95) and (1.00, -0.95, 0.00). For the regular design of a cubic DRSM, the number of experiments is 26 as there are 10 additional cubic terms of the type 𝑥2𝑖 𝑥𝑗, 𝑥𝑖𝑥2𝑗 ,𝑥3𝑖 and 𝑥1 𝑥2𝑥3. Here again, and for our specific design, there are 6 prior existing experiments with the following Page 11 of 40


Page 21 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


values of the factors: (-0.75, -1.00, -0.95), (1.00, -1.00, -0.95), (0.35, 1.00, -0.35), (-0.75,-1.00,-0.95), (1.00,-0.75,-0.80) and (-1.00,-0.75,-0.80). The last two experiments are one of the three pairs of replicated experiments. Thus the actually needed additional experiments here were 20, noted parenthetically in Table 1. The total number of experiments needed for the independent estimation of the three DRSM models are 55 and because of the existence of identical experiments in our specific designs, there are actually 45 distinct experiments that are needed, noted parenthetically. Turning our attention to the augmented designs, the additional experiments for estimating a quadratic model from those of the existing 2FI design are only 3, for the pure quadratic regression terms 𝑥2𝑖 . The additional experiments needed for estimating a cubic model given the quadratic design are 10. It is thus clear that the total number of experiments to arrive at the cubic design via the augmented approach is 26, reported in the last column of Table 1. This is about half of the 55 (or 45) experiments of the regular design. In order to reduce the duration of the plant experiments, the augmented design appears to be the most desirable way to proceed as we do not know a priori what type of DRSM model will yield sufficient accuracy. Location of Table 1 The values of the three factors of the designed experiments for the 2FI model as well as the augmented runs to estimate a quadratic or a cubic model are given in Table 2. The last four rows of the table are four cross-validation runs to examine the prediction accuracy of the obtained DRSM models and are denoted as CV-i for i = 1, 2, 3, 4. Among them, two experiments, CV-1 & 2 are for validating the increasing MFI case while the other two, CV-3 & 4, are for examining the decreasing MFI case. These cross-validation runs were designed using the augmentation task after the runs of the cubic design were selected. This ensures that these four runs are quite different from the ones used in estimating any of the DRSM models. This approach ensures that the most stringent set of cross validation runs are selected. A less stringent design

Page 12 of 40



would have been to select the four cross-validation runs by appending them to the set designed for the 2FI model. They might have been less challenging to match for the quadratic and cubic models. Location of Table 2 In addition, we should note that the number of experiments is only determined by the model class and the number of factors and is independent from the number of parameters parameterizing each parametric function 𝛽𝑞(𝜃𝑞), which are only related to the time resolved measurements at each run. We will examine the proposed approach using the academic and the industrial simulations in the following section.

6. Results and Discussion In this section, we examine the efficacy of the proposed methods using two polymerization simulations. One is developed from a literature model12 and is denoted as the Academic Simulation. The reactions involved in the simulation as well as the kinetic constants are listed in the Appendix. The industrial process simulation is owned and was run for us by one of the member companies in our industrial consortium. The industrial simulation is much more complicated than the academic simulation and is very similar to the real process. At present, it serves as the ultimate proof for the efficacy of the proposed method. We first examine modeling the polymer grade transition of the Academic Simulation using a single DRSM model both for increased and decreased MFI values. We confirm that the single DRSM model accurately represents the time-varying output variable in all experiments, compared to the two separate DRSM models for increasing and decreasing MFI values published previously12. Moreover, we demonstrate that the models estimated via augmented and regular designs provide similar prediction accuracy. In addition, we verify that the residual analysis is a reliable criterion for selecting the most accurate model. We then examine the proposed DRSM method against the Industrial Simulation with the augmented experimental designs. This also proves to be quite accurate. Page 13 of 40


Page 22 of 50

Page 23 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


6.1.

Augmented Design vs Regular Design in Academic Simulation

In the DRSM models for the Academic Simulation using the regular experimental designs as well as the augmented designs, the input is parametrized using 3 sub-factors as given in eq. (5). The combined input domain is described in eq. (12). To the simulated output values, 𝑦𝑠, we add fractional normally distributed noise as given below 𝑦𝑚 = 𝑦𝑠(1 + 0.02𝑁(0,1))

(14)

The same fractional noise has been added to all the data for estimating the DRSM models throughout this paper. With the present noise level the analysis of residuals, to be presented in the following, requires a cubic model to have an accurate representation of all the non-random information in the data. If the added noise is increased the most likely result will be that the quadratic or the 2FI model might become accurate enough. With the collected data from the academic simulation, we estimate three classes of DRSM models (2FI, quadratic and cubic model) using the regular designs. Starting with the 2FI model estimated using regular design, we estimate two classes of DRSM models (quadratic and cubic) using the augmented designs. To evaluate the prediction accuracy of the models, we examine them using the four cross-validation experiments given in the last four rows of Table 2. The model accuracy for predicting the cross-validation experiments is measured using the Mean Squared Error (MSE) defined as follows 1 MSE = 𝐾𝑇

𝑀

𝐾𝑚

∑ ∑ (𝑦

𝐷𝑅𝑆𝑀,𝑚,𝑘

― 𝑦𝑚,𝑘)2

(15)

𝑚 = 1𝑘 = 1

Here 𝑦𝐷𝑅𝑆𝑀,𝑚,𝑘 and 𝑦𝑚,𝑘 are the model predicted and measured values at time instance k in the 𝑚𝑡ℎ experiment of the cross-validation set, respectively. The total number of measurements in the cross-

Page 14 of 40



Page 24 of 50

validation set is 𝐾𝑇 = 𝐾𝑚𝑀. 𝐾𝑚 is the number of measurements in each of the 𝑚 experiments and the number of cross-validation experiments, 𝑀, is equals to 4. We report the statistics of the obtained DRSM models in Table 3. It is shown in column 4 and 5 of Table 3 that for the cubic models, the one estimated using regular design has smaller MSE than the one via augmented design (0.0023 < 0.0044). But for quadratic models, the one obtained using regular design has, unexpectedly, a larger MSE compared to the one estimated using the augmented design (0.0390 > 0.0261) as shown in column 2 and 3. These results indicate that the regular design does not necessarily provide a consistent advantage in prediction accuracy over the augmented design. Location of Table 3 A visual comparison of the nominal predictions for the four cross-validation runs of the quadratic DRSM models, obtained using data from the regular and the augmented design of experiments are shown in Figures 2. Figure 3 shows the same but for the cubic DRSM model. The predictions from the regular and the augmented experiments are given in blue solid lines and green dashed lines, respectively. The values of the factors parameterizing each input profile is shown in the title of each sub-figure. The durations of the four runs are different ranging from 12.4 h (lower left) to 16.0 h (lower right). So that we do not crowd the figure, we only plot every 8th measured data point (in red). We also plot, in dot-lines, the 95% prediction intervals of the two models. They are in the same color with the corresponding nominal model predictions. The prediction interval is calculated is a similar way to that of the confidence interval.21 It quantifies the prediction uncertainty for a new observation, not included in the training dataset. For a 100(1-α)% confidence level, it is given by9: 𝑦 ― 𝑡𝛼 2,𝑛 ― 𝑝 𝜎2(1 + 𝒙′0(𝑿′𝑿)

―1

𝒙0 ≤ y ≤ 𝑦 + 𝑡𝛼 2,𝑛 ― 𝑝 𝜎2(1 + 𝒙′0(𝑿′𝑿)

―1

𝒙0

(16)

Here 𝑦 is the predicted normal value for the new observation 𝑦, while 𝒙0 represents the corresponding factor values. To verify the accuracy of the estimated model, the observed experimental values should Page 15 of 40


Page 25 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


fall inside the prediction intervals. It is the case for the cubic models developed using regular and augmented design as shown in Figure 3. While in Figure 2, some of the data fall outside the confidence interval of the model estimated using regular design. The cubic model is clearly more accurate. In addition, though the cubic model estimated using the regular design has a smaller MSE value compared to that of the model estimated using the augmented design, the difference in the prediction accuracy of the two models is not significant (Figure 3). Considering the 43% fewer number of experiments in the augmented design to arrive the final cubic model, we recommend using the augmented design. Location of Figures 2 and 3

6.2.

Residual Analysis for Model Selection in the Academic Simulation

We should note that both the quadratic and cubic models estimated using the augmented design have a significant LoF statistic; the corresponding p-values as shown in Table 3 are almost zero. Strictly speaking, according to the LoF statistic alone, the cubic model is not sufficiently accurate to be selected as the final model, having not modeled all nonrandom information in the data. However, the obtained model appears to provide quite accurate predictions on the cross-validation runs as shown in Figure 3. This indicates that the LoF criterion may be too strict for selecting the final model. Alternatively, the added error in eq. (15) is too small and results in a small sum of squares for the normal variability of the process. This motivates the application of alternative mean of assessing the model accuracy and we here examine the use of residual analysis. Nevertheless, one needs to also remark that if the primary use of the DRSM model is for feedback control purposes, a perfectly accurate model with an insignificant LoF statistic may not be necessary. Instead, a less accurate model, but with uniform accuracy over the entire input domain may be preferable.

Page 16 of 40



Based on the MSE values of the quadratic and cubic models estimated by the augmented design, we already know that the cubic model is a superior to the quadratic. Therefore, the residual analysis, if it is an appropriate approach, should discriminate the cubic model from the quadratic one. In addition, it might also indicate that the cubic model is the final model to be selected. The results of residual analysis for the quadratic and cubic model estimated using augmented design are visualized in Figure 4. The upper left sub-plot indicate whether the residuals resulted by the two models are normally distributed. The subplot on the top right depicts the correlation of the residuals with time while the bottom three sub-plots exhibit the dependencies of the residuals against the values of the factors. Location of Figure 4 As seen in the top right subplot and in the blue color, the residuals by the quadratic model are somewhat dependent on time. The observed residuals are larger during the first 50 time instants. In addition, the residuals are also correlated with the values of the factors. For example, as shown in the bottom left plot, the quadratic residuals (blue) have a nonlinear dependence on the 𝑥1 values, hinting that the quadratic DRSM model estimated using augmented design is not consistently accurate over the entire input domain. In contrast, the residuals by the cubic DRSM model (in red) are independent of time, as seen in the top right sub-plot. and independent of the factor values as seen in the bottom subplots in red. This residual analysis clearly indicates that the cubic model superior to the quadratic one. This analysis of the model residuals is the evidence that the cubic model should be the final model, despite the unsatisfactory LoF result. Because the cubic model requires a substantial number of experiments for its estimation, there is considerable motivation to settle with a less accurate DRSM model such as the 2FI or the quadratic one if it is to be used in a feedback controller, such as a nonlinear MPC. Whether such sacrifice in the accuracy of the DRSM model will be impactful to the closed-loop behavior of the controller is not known at present.

Page 17 of 40


Page 26 of 50

Page 27 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


6.3.

DRSM Model for the Industrial Simulation

In this section we apply the DRSM-2 approach for modeling polymer grade transitions of the industrial simulation provided by our industrial partner. The augmented designs given in Table 2 has been conducted in the industrial simulation. After we were provided with the data from these simulations, we estimated three DRSM models: 2FI, quadratic and cubic. The selected DRSM models with the smallest BIC values are given in Table 4. For confidentiality reasons, the input and corresponding output are coded in the range of [-1, +1]. The sampling interval is 1 time instant. The obtained time constant for DRSM model, 𝑡𝑐, is therefore in time instants as well. Location of Table 4 It can been seen in row 5 of Table 4, that the LoF statistics for the 2FI, quadratic and cubic models are all significant. The corresponding p-values are all almost zero. The final model is not going to be selected based on the LoF statistic. Instead, we select the final model using the analysis of residuals. The results of this residual analysis for the 2FI model is given in Figure 5. Location of Figure 5 The plot of residuals against time on the upper right clearly indicates that they are not of the same size for all times after the start of the transient. The largest residuals are in the first 100 time instants of the transient. The bottom three plots also show that the residuals are dependent on the values of the factors. Therefore, a DRSM model of higher model class should be estimated. The residual analysis results of the quadratic and cubic models are shown in Figure 6. The residuals of both models are normally distributed. However, as seen in the top-right figure, the residuals by the quadratic model (in blue) are still dependent on time. Though the largest residual is smaller than the one by the 2FI model, the residuals during the first 150 time instants are larger than the ones in the remaining Page 18 of 40



time period. In contrast, the residuals of the cubic model (in red) are much smaller than the ones of the quadratic model. More importantly, they are independent of time. In addition, the residuals of the cubic model are also independent of the factors as shown in the bottom three sub-figures. This is not the case for the ones of the quadratic model. According to the results of residual analysis, the cubic model is of sufficient accuracy to be selected as the final model. Location of Figure 6 We further examine the obtained DRSM models using the four cross-validation experiments. The average prediction errors for the cross-validation set by the DRSM models of different model classes are given in row 6 of Table 4. The MSE for the 2FI model is 0.0107, higher than the ones of quadratic and cubic DRSM models, which are 0.0075 and 0.0069, respectively. The comparison of the predictions by the quadratic (in green dashed lines) and cubic (in blue solid lines) DRSM models, as well as the data (in red dashed lines), are plotted in Figure 7. The values of the factors for each run are given in the title of each subfigure. Location of Figure 7 It is seen from this figure that the cubic model is more accurate than the quadratic ones. The cubic DRSM model is quite accurate as the predicted values by the cubic model almost overlaps with the data. We do notice that the predictions for the transient period of the second cross-validation run (top right) is less accurate than the predictions for other runs. There, the predicted behavior of the DRSM model is, unexpectedly, more complex than the simulated run. This is because the second cross-validation run (CV-2) is located the furthest from the experiments used to estimate the model parameters. The augmented design reduces the number of experiments, but with the cost of lower D-optimality value compared to the regular D-optimal design. This might have resulted in the experiments not covering the whole design space to be able to accurately model the CV-2 run. Page 19 of 40


Page 28 of 50

Page 29 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


To demonstrate that the CV-2 locates further away from the training experiments than other crossvalidation runs, we calculate two types of Euclidian distances for each of the four cross-validation runs. The first one is the distance between each of the cross-validation runs and the “center point” of the designed experiments. While the second one is the average distance between each of the cross-validation runs and the n nearest experiments. Though we express the design domain in a set of generalized constraints, the design domains for increasing and decreasing MFI cases are not compact. So there are two “center points”, one for each case. Such “center points” are calculated by taking the arithmetic mean of each of the three sub-factors in the set of experiments for increasing or decreasing MFI. The center of increasing MFI runs is 𝒙𝑖𝑛 = (0.50, ― 0.70, ― 0.42 ) and that for decreasing MFI runs 𝒙𝑑𝑒 = ( ― 0.55, 0.62, 0.34 ). The distances of the four cross validation runs from these “center points” are calculated as follows

‖𝒙𝐶𝑉 ― 𝑖 ― 𝒙𝑖𝑛‖2 𝑓𝑜𝑟 𝑖 = 1, 2 𝐿𝐶𝑉 ― 𝑖 = ‖𝒙 𝐶𝑉 ― 𝑖 ― 𝒙𝑑𝑒‖2 𝑓𝑜𝑟 𝑖 = 3, 4

{

(17)

The 𝒙𝐶𝑉 ― 𝑖 is a 3 × 1 column vector with the values of the sub-factors in the 𝑖𝑡ℎ cross-validation run. The calculated distances for the four cross-validation runs are 0.23, 0.70, 0.61 and 0.34, respectively. It is clear that the second cross validation run is the further away from the center point. In addition, we calculate the average Euclidian distances of the each of the cross-validation runs from the 𝑛 nearest modeling experiments for 𝑛 = 1, 2, 3, 4, 5 and plot these distances in Figure 8. The distances associated with the CV-2 is given in blue dots. It is seen that the distance between CV-2 and the nearest designed experiment is at least two times that between other cross-validation runs and their corresponding nearest experiments. Though CV-3 has a similar distance from the center point of the modeling experiments as the CV-2 does, its distances from the nearest 1 or 2 neighbors are only half the distances of CV-2. Therefore, the CV-2 might be on a location of the design space near which there might

Page 20 of 40



not have been enough modeling runs. This explains why the prediction for CV-2 is less accurate than the predictions for other cross-validation runs in the industrial simulation. Location of Figure 8

7. Conclusions Even though the Dynamic Response Surface Methodology (DRSM) has been successfully applied to model polypropylene grade transitions12, the present paper has presented significant improvements in the modeling methodology. We proposed ways to combine the input domains of both increasing and decreasing MFI cases and thus reduced the number of experiments by half. To further reduce the number of experiments required to arrive at the final DRSM model, we discuss the utilization of the augmented design, which utilizes the experiments, used in the estimation of a simpler DRSM model, and only appends a minimal number of additional experiments to enable the estimation of a more complex DRSM model. In addition, we also proposed a model evaluation approach using the analysis of residuals. The efficacy of the proposed methods has been examined against an academic simulation and a more complex industrial simulation. It has been shown that the obtained cubic DRSM model accurately represents the dynamic behaviors of the polymer grade transitions in all of the modeling runs as well as in the cross-validation runs. This demonstrates that the DRSM approach is an effective data-driven modeling methodology. We do notice that the DRSM model prediction for the second cross-validation run in the industrial simulation, denoted as CV-2 here, is less accurate. This is partially because CV-2 has the largest distance than any of the other cross-validation runs to the closest modeling experiments. It is likely that the augmented design, with a lower optimality value compared to the regular design, might not have covered the full design space adequately enough This is possibly the price we have paid for using the minimum number of modeling runs through the augmented design. Furthermore, for each type of model, we used only three modeling runs more than the number of 𝛽𝑞(𝜃) parametric functions. This was done Page 21 of 40


Page 30 of 50

Page 31 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


not for the purpose of reducing the simulation task, but we envision the application of the methodology in a real industrial process where there is a need to limit the experimentation time.

8. Acknowledgements We are thankful for the financial support by the industrial partners in the Data-Driven Modeling Consortium at Tufts University. The authors also gratefully acknowledge the input and feedback provided by Dr. Cara Touretzky, Dr. Kiran Sheth and Dr. David Schmidt, at ExxonMobil.

9. Nomenclature DRSM MFI DAE DoDE DoE LoF CV MSE 𝑦 𝑥 𝜃 𝛽 𝑃𝑖 𝛾 𝑑 𝑢 𝑎

Dynamic Response Surface Methodology Melt Flow Index Differential Algebraic Equations Design of Dynamic Experiments Design of Experiments Lack-of-Fit Cross Validation Mean Squared Error Output variable DoE or DoDE factor Transformed time variable Parametric function of DRSM model i-th shifted Legendre polynomial DRSM model parameter to be estimated Time delay Hydrogen concentration Changing rate of hydrogen concentration

Page 22 of 40



10. Appendix Here we detail the reactions involved in the Academic simulation of the propylene polymerization process. This simulation includes five types of reactions; chain initiation, chain propagation, chain transfer, site activation and site deactivation. The reactions are described in column 2 of Table 5 while the kinetic constants and the corresponding units are given in column 3 and 4, respectively. Position of Table 5

Page 23 of 40


Page 32 of 50

Page 33 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


11. References 1. Zacca, J. J.; Ray, W. H., Modeling of the liquid-phase polymerization of olefins in loop reactors. Chemical Engineering Science 1993, 48 (22), 3743-3765. 2. Dunnebier, G.; van Hessem, D.; Kadam, J. V.; Klatt, K. U.; Schlegel, M., Optimization and control of polymerization processes. Chemical Engineering & Technology 2005, 28 (5), 575-580. 3. Shi, J.; Biegler, L. T.; Hamdan, I., Optimization of grade transitions in polyethylene solution polymerization processes. Aiche Journal 2016, 62 (4), 1126-1142. 4. Cervantes, A. M.; Tonelli, S.; Brandolin, A.; Bandoni, J. A.; Biegler, L. T., Large-scale dynamic optimization for grade transitions in a low density polyethylene plant. Computers & Chemical Engineering 2002, 26 (2), 227-237. 5. Georgakis, C., Design of Dynamic Experiments: A Data-Driven Methodology for the Optimization of Time-Varying Processes. Industrial & Engineering Chemistry Research 2013, 52 (35), 12369-12382. 6. Wang, Z.; Georgakis, C., An in silico evaluation of data‐driven optimization of biopharmaceutical processes. AIChE Journal 2017, 63 (7), 2796-2805. 7. Wang, Z.; Georgakis, C. In On the Performance of DoDE in a class of in silico Fermentation Processes and the Impact of the Input Domain, 12th IFAC Symposium on Computer Applications in Biotechnology, Mumbai, India, 16-18, 2013, December; Mumbai, India, 2013. 8. Box, G. E. P.; Wilson, K. B., On the experimental attainment of optimum conditions. Journal of the Royal Statistical Society Series B-Statistical Methodology 1951, 13 (1), 1-45. 9.

Montgomery, D. C., Design and Analysis of Experiments. 8th ed.; Wiley: New York, 2013.

10. Wang, Z.; Klebanov, N.; Georgakis, C., DRSM Model for the Optimization and Control of Batch Processes. In Dynamics and Control of Process Systems, including Biosystems/IFAC2016, Trondhelm, Norway, 2016. 11. Klebanov, N.; Georgakis, C., Dynamic Response Surface Models: A Data-Driven Approach for the Analysis of Time-Varying Process Outputs. Ind. & Eng. Chem. Res. 2016, 55 (14), 4022-4034. 12. Wang, Z.; Georgakis, C., New Dynamic Response Surface Methodology for Modeling Nonlinear Processes over Semi-infinite Time Horizons. Ind. & Eng. Chem. Res. 2017, 56 (38), 10770-10782. 13. de Lucca, E. A.; Filho, R. M.; Melo, P. A.; Pinto, J. C., Modeling and Simulation of Liquid Phase Propylene Polymerizations in Industrial Loop Reactors. Macromolecular Symposia 2008, 271 (1), 8-14. 14.

Bishop, C. M., Pattern Recognition and Machine Learning. Springer: 2006.

15. Straume, M.; Johnson, M. L., Analysis of Residuals: Criteria for determining goodness-of-fit. Methods in enzymology 1992, 210, 87-105. 16. Armitage, P.; Berry, G.; Matthews, J. N. S., Statistical methods in medical research. 4th ed.; Blackwell Science: 2002. 17. MathWorks, Statistics and Machine Learning Toolbox™ User's Guide (2017a). Natick,MA, MathWorks Inc. 2017. 18.

JMP®, V., SAS Institute Inc.,: Cary, NC, 1989-2007. Page 24 of 40



19. Meyer, R. D.; Steinberg, D. M.; Box, G., Follow-up designs to resolve confounding in multifactor experiments. Technometrics 1996, 38 (4), 303-313. 20.

Atkinson, A. C.; Donev, A. N., Optimum Experimental Designs. Clarendon Press: 1992.

21. Ogunnaike, B. A., Random phenomena: fundamentals of probability and statistics for engineers. CRC Press: 2009.

Page 25 of 40


Page 34 of 50

Page 35 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


List of Tables Table 1: Number of experiments of regular and augmented designs for estimating DRSM models of different model classes Table 2: List of sub-factors in augmented experiments for input parameterized by 3 sub-factors Table 3: Comparison of DRSM models estimated using regular and augmented designs of the academic simulation. Table 4: Single DRSM models estimated using augmented designs with smallest BIC values for the industrial simulation Table 5:Kinetic mechanism and rate constants of the academic simulation

Page 26 of 40



Page 36 of 50

Table 1: Number of experiments of regular and augmented designs for estimating DRSM models of different model classes Regular Design Augmented Design

2FI 13 13

Quadratic, 12 3

Page 27 of 40


Cubic 20 10

Total 45 26

Page 37 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


Table 2: List of sub-factors in augmented experiments for input parameterized by 3 sub-factors 𝑥1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27

𝑥2

Initial Design, 2FI -1.00 0.05 -1.00 0.95 -1.00 1.00 -1.00 1.00 -0.75 -1.00 0.05 -1.00 1.00 -1.00 -0.35 1.00 1.00 -0.95 1.00 -0.05 Augmented Experiments for Quadratic 1.00 -1.00 0.75 1.00 -0.15 0.10 Augmented Experiments for Cubic -1.00 1.00 0.50 -0.55 1.00 0.05 -0.15 -1.00 -0.40 1.00 -0.40 1.00 0.35 0.60 1.00 -1.00 -1.00 -0.75 0.55 -0.55 Cross-Validation Set -1.00 1.00 -0.80 0.60 0.50 -0.50 1.00 -1.00

Page 28 of 40


𝑥3 0.00 0.00 0.05 0.95 -0.95 -0.50 -0.95 0.35 0.00 0.00 -0.50 0.95 0.05 0.70 -0.05 0.50 -0.95 0.95 0.65 0.55 -0.65 -0.80 -0.50 0.50 0.10 -0.30 -0.05


Page 38 of 50

Table 3: Comparison of DRSM models estimated using regular and augmented designs of the academic simulation. Model 𝑡𝑐 𝑅𝑞 BIC LoF MSE

Quadratic, Regular 3 9 -1.37 x 10+3 < 10-16 0.0390

Quadratic, Augmented 3 9 -1.27 x 10+3 < 10-16 0.0261

Cubic, Regular 3 9 -7.38 x 10+3 0.051 0.0023

Page 29 of 40


Cubic, Augmented 3 9 -7.24 x 10+3 6.2 x 10-9 0.0044

Page 39 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


Table 4: Single DRSM models estimated using augmented designs with smallest BIC values for the industrial simulation Model 𝑡𝑐 𝑅𝑞 BIC LoF MSE

2FI 25 10 -1.1 x 104 < 10-16 0.0107

Quadratic 25 10 -1.3 x 104 < 10-16 0.0075

Page 30 of 40


Cubic 25 11 -3.7 x 104 2.1 x 10-15 0.0069


Page 40 of 50

Table 5:Kinetic mechanism and rate constants of the academic simulation Reaction Step

Reaction

Site Activation

𝑆 𝑃 + 𝐴 𝑃0

Chain Initiation

𝑃0 + 𝑀 𝑃1

Chain Propagation

𝑃𝑖 + 𝑀 𝑃𝑖 + 1

𝑘𝐴

𝑘𝑃

𝑘𝑃

𝑘𝑇𝐻

𝑃𝑖 + 𝐻2 Chain Transfer

Rate Constant

𝑃𝑖 + 𝑀

𝑘𝑇𝑀

Unit

𝑘𝐴 = 7.04 ∙ 102𝑒

―12000 𝑅 𝑇 𝑔

l/gmol/s

𝑘𝑃 = 6.30 ∙ 108𝑒

―10000 𝑅 𝑇 𝑔

l/gmol/s

𝑘𝑃 = 6.30 ∙ 108𝑒

―10000 𝑅 𝑇 𝑔

l/gmol/s

𝑃0 + 𝑄 𝑖

𝑘𝑇𝐻 = 2.22 ∙ 1010𝑒

―14000 𝑅 𝑇 𝑔

l/gmol/s

𝑃1 + 𝑄 𝑖

𝑘𝑇𝑀 = 2.76 ∙ 107𝑒

―14000 𝑅 𝑇 𝑔

l/gmol/s

𝑘𝑇𝑆 = 1.72 ∙ 103𝑒

―14000 𝑅 𝑇 𝑔

1/s

𝑘𝐷 = 7.92 ∙ 103𝑒

―12000 𝑅 𝑇 𝑔

1/s

𝑘𝑇𝑆

𝑃𝑖 𝑃0 + 𝑄 𝑖 𝑘𝐷

Site Deactivation

𝑃𝑖 𝑆 𝐷 + 𝑄 𝑖 𝑘𝐷 𝑃0 𝑆 𝐷

Page 31 of 40


Page 41 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


List of Figures Figure 1: Example of the process responses (lower) to two different input profiles (upper) parameterized by 3 sub-factors in the Academic simulation. The one in the left shows small overshoot Figure 2: Comparison of the nominal predictions of the quadratic DRSM model, estimated from the data of regular (―, blue) and the augmented (---, green) design of experiments for the Academic simulation against the simulated data (●). The corresponding 95% prediction intervals of the models (···) are plotted in two doted lines. Figure 3: Comparison of the nominal predictions of the cubic DRSM model, estimated from the data of regular (―, blue) and the augmented (---, green) design of experiments for the Academic simulation against the simulated data (●). The corresponding 95% prediction intervals of the models (···) are plotted in two doted lines. Figure 4: Residual analysis plots of quadratic (blue) and cubic (red) DRSM models estimated using augmented design for the Academic simulation Figure 5: Residual analysis plots of 2FI DRSM model estimated for the Industrial simulation Figure 6: Residual analysis plots of quadratic (in blue) and cubic (in red) DRSM models estimated using augmented design for the Industrial simulation Figure 7: Comparison of the quadratic (-·-) and cubic (―) DRSM models for the Industrial simulation. The data (--) is plotted as well. Figure 8: Average distances between each of the four cross-validation runs to the n nearest experiments (Euclidean distances between the factor values)

Page 32 of 40



Figure 1: Example of the process responses (lower) to two different input profiles (upper) parameterized by 3 sub-factors in the Academic simulation. The one in the left shows small overshoot.

Page 33 of 40


Page 42 of 50

Page 43 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


Figure 2: Comparison of the nominal predictions of the quadratic DRSM model, estimated from the data of regular (―, blue) and the augmented (---, green) design of experiments for the Academic simulation against the simulated data (●). The corresponding 95% prediction intervals of the models (···) are plotted in two doted lines.

Page 34 of 40



Figure 3: Comparison of the nominal predictions of the cubic DRSM model, estimated from the data of regular (―, blue) and the augmented (---, green) design of experiments for the Academic simulation against the simulated data (●). The corresponding 95% prediction intervals of the models (···) are plotted in two doted lines.

Page 35 of 40


Page 44 of 50

Page 45 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


Figure 4: Residual analysis plots of quadratic (blue) and cubic (red) DRSM models estimated using augmented design for the Academic simulation

Page 36 of 40



Figure 5: Residual analysis plots of 2FI DRSM model estimated for the Industrial simulation

Page 37 of 40


Page 46 of 50

Page 47 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


Figure 6: Residual analysis plots of quadratic (in blue) and cubic (in red) DRSM models estimated using augmented design for the Industrial simulation

Page 38 of 40



Figure 7: Comparison of the quadratic (-·-; green) and cubic (―; blue) DRSM models for the Industrial simulation. The data (---; red) is plotted as well.

Page 39 of 40


Page 48 of 50

Page 49 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


Figure 8: Average distances between each of the four cross-validation runs to the n nearest experiments (Euclidean distances between the factor values)

Page 40 of 40



248x194mm (96 x 96 DPI)


Page 50 of 50

A Dynamic Response Surface Model for Polymer Grade Transitions in

Recommend Documents