Subscriber access provided by the Library Service | Stellenbosch University
Process Systems Engineering
Gaussian Process Regression and Bayesian Inference Based Operating Performance Assessment for Multiphase Batch Processes Yan Liu, Xiaojun Wang, Fuli Wang, and Furong Gao Ind. Eng. Chem. Res., Just Accepted Manuscript • DOI: 10.1021/acs.iecr.8b00234 • Publication Date (Web): 03 May 2018 Downloaded from http://pubs.acs.org on May 5, 2018
Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.
is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Page 1 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Industrial & Engineering Chemistry Research
1
Gaussian Process Regression and Bayesian Inference Based
2
Operating Performance Assessment for Multiphase Batch Processes
3
Yan Liu a, b, c*, Xiaojun Wang d, Fuli Wang a, b,1Furong Gao c
4
(a. College of Information Science & Engineering, Northeastern University, Shenyang, Liaoning, 110819, China;
5
b. State Key Laboratory of Synthetical Automation for Process Industries (Northeastern University), Shenyang,
6
Liaoning, 110819, China;
7
c. Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology,
8
Clear Water Bay, Kowloon, Hong Kong;
9
d. Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University,
10
Dalian, 116622, China)
11
Abstract: Batch processes have been playing a significant role in modern industrial
12
processes. However, even if the operating conditions are normal, the process
13
operating performance may still deteriorate away from optimal level, and this may
14
reduce the benefits of production, so it is crucial to develop an effective operating
15
performance assessment method for batch processes. In this study, a novel operating
16
performance assessment method of batch processes is proposed based on both
17
Gaussian process regression (GPR) and Bayesian inference. It is committed to solving
18
the challenges of multiphase, process dynamics and batch-to-batch uncertainty that
19
contain in most of batch processes. To characterize different dynamic relationships
20
within each individual phase, multiple localized GPR-based assessment models are
21
built firstly. Furthermore, the phase attribution of each new sample is determined, and
22
two different identification results are obtained, i.e., a certain interval and a fuzzy
23
interval between two adjacent phases. Then different online assessment strategies are
24
designed correspondingly. When the operating performance is nonoptimal, cause *
Corresponding author at: College of Information Science & Engineering, Northeastern University,
3 Lane 11, Wenhua Road, Heping District, Shenyang, Liaoning, 110819, China. Tel.: +86 024 83680351; Fax: +86 024 23890912. Email:
[email protected] 1
ACS Paragon Plus Environment
Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
1
variables are identified by variable contributions. Finally, the effectiveness of the
2
proposed method is demonstrated by the fed-batch penicillin fermentation process.
3
Keywords: Batch processes, operating performance assessment, Gaussian process
4
regression, Bayesian inference, nonoptimal cause identification
5
1. Introduction
6
Batch processes are commonly used in chemical, materials, pharmaceutical,
7
biotechnology and semiconductor industries for the production of low-volume while
8
high-value-added commodities, and they play a more and more important role in
9
modern industrial production.1,
2
However, as time goes on, process operating
10
performance may deteriorate away from optimal state due to process disturbances,
11
noise, and other uncertainties, and this may cancel the benefits of preliminary designs
12
for process optimization and result in a degraded operating behavior. Therefore, it is
13
very necessary to develop an effective operating performance assessment method for
14
batch processes.
15
In the past several decades, many process and quality monitoring approaches have
16
been proposed for batch processes.3-7 Among them, multiway principal component
17
analysis (MPCA) and multiway partial least squares (MPLS) are most widely used. 6-9
18
They extend the applications of principal component analysis (PCA) and partial least
19
squares (PLS) techniques from continuous processes to batch processes, and allow
20
process variable trajectory information to be projected into low-dimensional latent
21
variable spaces. Therefore, batch process performance and product quality can be
22
easily analyzed and monitored in the reduced space. As the study moving on,
23
extensions to taking into account various factors are available, such as dynamic
24
characteristics, 10, 11 nonlinearity, 12, 13 non-Gaussian distributions, 14, 15 and multiscale
25
16
26
MPCA/MPLS model cannot efficiently reveal the multiphase data behavior, even
27
though it is very common in many batch processes. The phases defined by Undey and
28
Cinar
29
caused by operational or phenomenological (chemical reactions, microbial activities,
of batch processes. However, it has recently been explored that the traditional
17
is that ‘‘Steps occurring in a single processing unit as succession of events
2
ACS Paragon Plus Environment
Page 2 of 37
Page 3 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Industrial & Engineering Chemistry Research
18
1
etc.) regimes’’. Furthermore, Lu et al.
defined phases from the perspective of
2
process monitoring, where phases are determined according to the changes in the
3
underlying process correlations. In the present work, we apply the phase definition
4
provided by Lu et al.18 Being different from stages that lay their focuses on describing
5
the physical operation units of a batch process, phases put the emphasis on the
6
expression of process characteristics, such as variable relationships, data distributions,
7
dynamic trajectories, amplitudes of measurements and so on. Compared with the
8
stages, although the phases are more abstract, it helps learn more about the intrinsic
9
process characteristics and enhance process understanding. Therefore, a batch process
10
is desired to be divided into several phases to improve the modeling performance. In
11
recent years, different phase division methods have been proposed, 19-21 and different
12
modeling methods have been developed that take the phase effects into consideration.
13
11, 22, 23
14
work.
The pioneer work has provided abundant theoretical bases for our following
15
In actual processes, the main task of process monitoring is to maintain the
16
production process under normal operating conditions, but it can’t satisfy the quest by
17
enterprises for profits any longer. For most of plants, the production goal is to profit,
18
and an effective way is to ensure that the process operates on optimal level throughout
19
the batch production. By this point, the operating performance assessment of
20
industrial processes came into being. 24 The purpose of process operating performance
21
assessment is to get a measure on how far the current operating condition is from the
22
optimum (or how optimal the current operating state is), and the potential assumption
23
is that the operating conditions are normal. According to the process characteristics
24
and plant personnel’s attitudes of the operating performance, the performance levels
25
can be divided into several grades, such as optimal, suboptimal, general, and poor.
26
Through operating performance assessment, operators and managers can make a
27
deeper understanding and mastering with the process operating performance, and
28
propose reference suggestions on the operating adjustment and performance
29
improvement. Despite a rich body of literatures in process monitoring, 25-27 studies in
30
operating performance assessment of industrial processes are still in its infancy. In our 3
ACS Paragon Plus Environment
Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
1
previous work, some methods in respect to continuous and multimode continuous
2
processes have been developed on this issue. 28-30
3
This paper is devoted to the operating performance assessment for multiphase batch
4
processes with the considerations of process dynamics and batch-to-batch uncertainty.
5
It is generally believed that the process operating performance has a close relationship
6
with the comprehensive economic index, such as cost, profit, total revenue, product
7
quality or the weighted integration of several production indices. If the comprehensive
8
economic index approaches to or reaches the history optimal level, the process
9
operating performance is usually considered as optimality. Thus, the comprehensive
10
economic index is applicable to evaluate the process operating performance. However,
11
it is hard to get online and usually obtained at the end of a batch production, which
12
seriously affects the timeliness of online assessment. As an alternative way, the
13
predicted value of the comprehensive economic index can be used in operating
14
performance assessment. Many quality prediction methods of batch processes have
15
been proposed so far.
16
processes, the quality prediction methods depend on the phase division strategies to
17
some extent. For instance, the multivariate statistics based quality predictions are
18
often applied to the batch processes whose phases are divided according to the
19
variable relationships or covariance structures.
20
characterized by non-stationary, batch-to-batch uncertainty and process variables
21
frequently comprise deterministic trends, the process data of the whole batch often
22
present non-Gaussian distribution. When the process variables have different change
23
rates or amplitudes in two adjacent phases and the change rates are not very large in
24
each individual phase, the local process data will show different distributions in a
25
cycle of the batch production. In this sense, the batch processes can be divided into
26
multiple phases based on the distribution characteristics, and each individual phase is
27
actually represented by an approximate multivariate Gaussian distribution of the
28
unfolded process data. This is quite consistent with the methodology of Gaussian
29
mixture model (GMM), and some offline phase division algorithms based on GMM
30
have been established, such as MPCA-GMM, 34 multiway Gaussian mixture model
31, 32
Considering the multiphase characteristics of batch
33
Since batch processes are usually
4
ACS Paragon Plus Environment
Page 4 of 37
Page 5 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Industrial & Engineering Chemistry Research
35
and GMM-based phase successive division (GMM-PSD). 36 Under the
1
(MGMM)
2
framework of probability, Gaussian process regression (GPR)
3
appropriate quality prediction method matching GMM. Compared with the
4
deterministic modeling methods, GPR is more suitable for characterizing the complex
5
relationships between the process and quality variables caused by process stochastic
6
feature and system uncertainty.
37
is regarded as an
7
In the present study, with the consideration of process dynamics and batch-to-batch
8
uncertainty, a new operating performance assessment method based on GPR and
9
Bayesian inference is proposed for multiphase batch processes. Considering the
10
advantages of GMM-PSD in dealing with the uneven-length batch processes, it is
11
used in offline phase division firstly. Then multiple local GPR-based assessment
12
models are developed to characterize different dynamic relationships of the identified
13
multiple phases, and the predictive distribution of the comprehensive economic index
14
is estimated by averaging and weighting all the regression model parameter values
15
with their posterior probabilities. This kind of assessment models can not only
16
effectively handle the stochastic feature caused by process dynamics and
17
batch-to-batch uncertainty but also ensure the accuracy of the assessment result. In
18
online assessment, based on the result of offline phase division, the phase type can be
19
identified as a certain interval or a fuzzy interval between two adjacent phases. In a
20
certain interval, the phase type of the new sample can be uniquely identified, and only
21
the corresponding assessment model is invoked for online assessment. While, if the
22
new sample falls into a fuzzy internal between two adjacent phases, the assessment
23
result is the weighted sum of those from two adjacent phases with the Bayesian
24
inference-based posterior probabilities as the adaptive weights. It effectively reduces
25
the error rate caused by misclassification and is very challenging for traditional
26
assessment methods. When the process operating performance degenerates, the
27
possible cause variables can be identified based on variable contributions, which helps
28
managers and operators take appropriate operating adjustment strategy on production
29
improvement. The contributions of the proposed method are summarized as follows: (i) to ensure
30
5
ACS Paragon Plus Environment
Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 6 of 37
1
the accuracy of the assessment result, the GPR-based assessment models are
2
developed to deal with process dynamics and batch-to-batch uncertainty; (ii) for the
3
new sample whose phase type is unknown, its posterior probabilities with respect to
4
the assessment models of two adjacent phases are set as the adaptive weights, and
5
then integrate the corresponding local assessment results for online operating
6
performance
7
misclassification; (iii) the cause variables responsible for the nonoptimal operating
8
performance can be determined by variable contributions.
assessment,
which
avoids
the
error
evaluation
caused
by
9
The remainder of this article is organized as follows. Section 2 briefly reviews the
10
methodologies of GMM and GPR. Then the proposed operating performance
11
assessment method is introduced in Section 3. Section 4 demonstrates the
12
effectiveness of the proposed assessment approach by the fed-batch penicillin
13
fermentation process. Finally, some conclusions are drawn in Section 5.
14
2. Preliminaries
15
2.1 Gaussian mixture model
16
GMM is usually used to model a collection of random variables arising from a
17
number of latent classes or states.38 Consider a J dimensional random variable
18
x R J 1 , the GMM with M components is written as:
19
G ( x Θ) ωm g ( x θm ),
M
(1)
m 1
20
where ωm , m 1, 2,..., M is the prior probabilities of the m th Gaussian component Cm
21
and satisfies the conditions of 0 ωm 1 and
M
ω m 1
m
1 ; θm μm , Σ m and
22
Θ = ω1 , ω2 ,..., ωM , θ1 , θ2 ,..., θM separately represent the local and global Gaussian
23
model parameters; μm and Σ m are the mean vector and covariance matrix of the m th
24
Gaussian component, respectively; g ( x θm ) is the corresponding multivariate
25
Gaussian density function and expressed as follows:
6
ACS Paragon Plus Environment
Page 7 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Industrial & Engineering Chemistry Research
g ( x θm )
1
1 (2π )
J 2
Σm
12
1 exp ( x - μm )T Σ m1 ( x - μm ) , m 1, 2,..., M . 2
(2)
2
In order to establish the GMM, the unknown model parameters Θ need to be
3
estimated. Some learning methods, such as maximum likelihood estimation (MLE),
4
expectation maximization (EM), and Figueiredo–Jain (F–J) algorithm, are usually
5
used in parameter estimation of mixture model.
6
algorithm
7
and estimating their statistical distribution parameters, it is adopted for model
8
parameters estimation in this study. With the initialized model parameters
9
Θ(0) = ω1(0) , ω2(0) ,..., ωM(0) , θ1(0) , θ2(0) ,..., θM(0) , the s th two-step iteration involved in the
10
40
39, 40
In view of the ability of F-J
in automatically optimizing the number of Gaussian model components
F-J algorithm is as follows: E-step:
11
P ( s ) (Cm xn )
12
ωm( s ) g ( xn μm( s ) , Σ (ms ) ) M
ω m 1
(s) m
(s) m
(3)
.
(s) m
g ( xn μ , Σ )
M-step:
13
N
( s 1) m
μ
14
P n 1 N
(s)
P
(Cm xn ) xn ,
(s)
n 1
N
Σ (ms 1)
15
P n 1
(s)
ωm( s 1)
(Cm xn )
(Cm xn )( xn μm( s 1) )( xn μm( s 1) )T N
P n 1
16
(4)
, (s)
(5)
(Cm xn )
N max 0, P ( s ) (Cm xn ) V 2 n 1 , M N max 0, P ( s ) (Cm xn ) V 2 m 1 n 1
(6)
17
where xn is the n th sample of the modeling data X = x1 , x2 ,.., x N R N J ;
18
V J 2 2 3J 2 is the number of scalar parameters specifying each Gaussian
19
component; μm( s 1) , Σ(ms 1) , and ωm( s 1) are the mean vector, covariance matrix, and prior
T
7
ACS Paragon Plus Environment
Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 8 of 37
1
probability of the m th Gaussian component at the ( s 1) th iteration, respectively.
2
The F-J algorithm is implemented in an iterative way until all parameters converge to
3
the optimal solution. 40 For an arbitrary new sample xnew , the posterior probability corresponding to the m
4 5
th Gaussian component can be represented as follows:
P(Cm xnew )
6
ωm g ( xnew μm , Σ m ) M
ω m 1
m
.
(7)
g ( xnew μm , Σ m )
7
For a class of multiphase batch processes, where process variables have different
8
change rates or amplitudes in two adjacent phases and the change rates are not very
9
large in each individual phase, the phases can be characterized by their distribution
10
functions. Although some variables show particular trends throughout the batch
11
production, these trends can be significantly weakened in each individual phase. Thus,
12
it can be considered that the unfolded process data from each individual phase
13
approximately follow a multivariate Gaussian distribution, and GMM is applicable to
14
cluster the batch process data from different phases.
15
2.2 Gaussian process regression
16
In many industrial processes, process variables often show stochastic feature owing
17
to process dynamics and batch-to-batch uncertainty. Thus, the deterministic modeling
18
strategies become ill-suited for characterizing the complex relationships between
19
process variables and comprehensive economic index. To deal with this issue, GPR 37
20
is proposed. GPR focus its interest in two aspects: (i) making inferences about the
21
relationship between process inputs and outputs and (ii) obtaining the conditional
22
distribution of the outputs given the inputs, rather than the multivariate distribution of
23
process inputs. Denote the comprehensive economic index data corresponding to X
24
as
25
y = y1 , y2 ,.., yN R N 1 . Assuming that the relationships between process variables
26
and comprehensive economic index are approximately linear, and then the regression
27
model between X and y can be described as below:
T
8
ACS Paragon Plus Environment
Page 9 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Industrial & Engineering Chemistry Research
1
yn f ( xn ) xnT α ,
2
where α R J 1 is the regression parameter vector and follows a Gaussian prior
3
distribution with zero mean and covariance ; is the Gaussian noise with zero
4
mean and standard deviation .37 Moreover, given the process inputs and regression model parameter, the conditional
5 6
(8)
probability density function of the output is represented as follows 37: N
P ( y X , ) g ( yn x n , ) n 1
( yn xnT ) 2 1 exp 2 2 2 n 1 1 2 1 exp 2 y X , 1 2 2 (2 ) N 2 2 I N
7
(9)
( X , 2 I ) 8
where I R N N is a unit matrix.
9
According to the posterior distribution over the regression model parameter, the
10
posterior probability density function can be estimated through Bayesian inference
11
strategy as follows:
P ( X , y )
12
13
P( y X , ) P( ) P( y X , ) P( ) , P( y X ) ( y X , ) ( ) P P d
(10)
and the corresponding distribution can be expressed as P( X , y ) ( , A1 ),
14 15
where A 2 X T X Σ α1 ; 2 A1 X T y and
16
covariance matrix, respectively.
(11)
A1 are the mean vector and
17
In deterministic modeling strategies, a single parameter is typically chosen by some
18
criteria to predict the output of the new sample xnew . However, under the consideration
19
of process stochastic feature, all possible regression model parameters are weighted
20
with their posterior probabilities. Thus, the predictive distribution of output
21
yˆnew f ( xnew ) is deduced as follows: 9
ACS Paragon Plus Environment
Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
P ( yˆ new xnew , X , y ) P( yˆ new xnew , α )P(α X , y )dα
1
T T ( 2 xnew A1 X T y, xnew A1 xnew ).
(12)
As seen from Eq. (12), the predictive distribution is also Gaussian, and the mean
2 3
Page 10 of 37
T T A1 X T y and xnew A1 xnew , respectively. vector and covariance matrix are 2 xnew
Naturally, the predicted output of xnew is given by
4 5
T yˆ new 2 xnew A1 X T y.
6
3. Gaussian process regression and Bayesian inference based
7
operating performance assessment of multiphase batch processes
(13)
8
Due to the dynamic characteristics and random uncertainty, 19, 20, 22, 41 the stochastic
9
assessment modeling method is more suitable for multiphase batch processes. In this
10
study, a novel operating performance assessment method for multiphase batch
11
processes is developed by integrating GPR with probabilistic inference strategy. The
12
GMM-PSD algorithm is first used to divide different phases of batch processes.
13
Furthermore, the GPR-based assessment models are established to characterize
14
different dynamic relationships of the identified multiple phases, meanwhile, reveal
15
the influences of different underlying process information on the process operating
16
performance. During online assessment, the phase type of the new sample is identified
17
firstly. If the new sample falls into the certain interval of a phase, only the assessment
18
model of the corresponding phase is invoked for online assessment. While, if it
19
belongs to a fuzzy internal between two adjacent phases, the assessment result can be
20
calculated by incorporating those from two adjacent phases, where the Bayesian
21
inference-based posterior probabilities are used as the adaptive weights. In this way,
22
the error evaluation owing to misclassification is avoided effectively. When the
23
process operating performance degenerates, the possible cause variables responsible
24
for the nonoptimality can be identify based on variable contributions.
25
3.1 Establishment of assessment models
26
For a multiphase batch process, I uneven-length training batches are collected
27
from the historical data. The process data of the i th batch are expressed in the form 10
ACS Paragon Plus Environment
Page 11 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Industrial & Engineering Chemistry Research
T
1
of X i xi ,1 , xi ,2 ,..., xi , Ki , where Ki is the number of samples of the i th batch,
2
i 1, 2,..., I . The corresponding comprehensive economic index measured at the end
3
of the batch is denoted as yi . In order to remove the process noise and dynamics to
4
some extent, all training batches are unfolded into a two-dimensional block matrix
5
X [ X (Tk 1) , X (Tk 2) , , X (Tk Kmin ) , , X (Tk Kmax ) ]T R K × J via variable-wise unfolding as
6
shown in Fig. 1, where K min min( Ki ) and K max max( Ki ) are the minimum and
7
maximum numbers of samples of the training batches, K Ki . Then the
1i I
1i I
I
i 1
8
normalization approach proposed in Refs. 42 and 43 is applied to X , i.e., xk ,i , j
9
xk ,i , j x j sj
(14)
,
10
where xk ,i , j is the j th variable of the k th sample in the i th batch, x j and s j are mean
11
and standard deviation of the j th variable of X .
X ( k=1) X ( k=2)
X Î R K ×J
X ( k = Kmin )
X ( k = Kmax )
12 13
Fig.1. Illustration of the variable-wise unfolding of uneven-length training batches
Since the sample numbers of the comprehensive economic index and process
14
11
ACS Paragon Plus Environment
Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 12 of 37
1
variables must be the same in modeling, the comprehensive economic index of each
2
batch is extended to a vector by stacking it Ki times repeatedly, and then yi is
3
extended as yi yi , yi ,..., yi RKi 1 . Furthermore, the comprehensive economic index
4
data of all training batches are unfolded as y [ y(Tk 1) , y(Tk 2) ,, y(Tk Kmin ) ,, y(Tk Kmax ) ]T RK×1 .
5
For simplicity, the normalized forms are still denoted as X and y , respectively.
T
6
In multiphase batch processes, if process variables have different change rates or
7
amplitudes in two adjacent phases and the change rates are not very large in each
8
individual phase, the unfolded process data approximately follow a multivariate
9
Gaussian distribution in each individual phase. Hence, GMM-PSD strategy
36
is
10
adopted for offline phase division. Firstly, with the unfolded data X , GMM is
11
estimated through the F-J algorithm as formulated in Eqs. (3)~(6). Then the means of
12
the posterior probabilities of the first h samples of each training batch are calculated
13
with respect to each Gaussian component as follows: P (Cm xi , h )
14
ωm g ( xi , h μm , Σ m ) M
ω m 1
m
,
g ( xi ,h μm , Σ m )
(15)
i 1, 2,..., I , m 1, 2,..., M , h 1, 2,..., h,
Pm
15
1 I h P(Cm xi,h ), m 1, 2,..., M . Ih i 1 h1
(16)
16
The target Gaussian component of phase 1 is determined by searching for the
17
Gaussian component with the largest posterior probability, i.e., m arg max( Pm ) .
18
Thereafter, from the h 1 th sample of each training batch, only the posterior
19
probability with respect to the target Gaussian component of phase 1, Cm , is
20
calculated and used to determine whether the sample is still belonging to phase 1.
21
Based on Bayesian inference, the posterior probabilities of the samples of phase 1
22
should be greater than those with respect to other Gaussian components. Hence, if the
23
posterior probability is greater than a given threshold, it can be sure that the sample
24
belongs to phase 1. In this way, the end moments of phase 1 of each batch are
m
12
ACS Paragon Plus Environment
Page 13 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Industrial & Engineering Chemistry Research
1
determined finally. Because the phase may be uneven-length across different training
2
batches, the end moments of phase 1 are different. As phase 1 is identified, the
3
process data belonging to phase 1 are removed from each batch, and then the
4
aforementioned procedures is applied to identify the second phase. This process is
5
conducted iteratively until all phases are divided. Correspondingly, y is divided into
6
different data sets along with X . The detailed steps of offline phase division by
7
GMM-PSD method can be referred in Ref. 36. Assuming that C phases are identified from offline phase division, the process
8 9
data and comprehensive economic index data of phase c T
are denoted as
T
10
X c x1c , x2c ,..., x Nc c R Nc J
y c y1c , y2c ,..., yNc c R Nc 1 , c 1, 2,..., C
11
respectively. Nc is the number of samples of phase c . Then the GPR-based
12
assessment model of phase c is established and formulated as follows:
and
,
Nc
P ( y c X c , c ) g ( ync xnc , c ) n 1 Nc
n 1
13
( ync xncT c ) 2 1 exp 2( c ) 2 2 ( c ) 2 2 1 y c X c c exp c 2 2( )
1 (2 ) Nc
2
( c ) 2 I
(17)
12
( X c c , ( c ) 2 I ), 14
where c is the regression parameter vector and follows a Gaussian prior distribution
15
with zero mean and covariance Σcα ; c is the standard deviation of the Gaussian noise. For a new sample xnew that belongs to phase c , the predictive distribution of
16
c yˆ new f ( xnew ) can be formulated as follows:
17
c c P ( yˆ new xnew , X c , y c ) P ( yˆ new xnew , α c )P (α c X c , y c )dα c
18
19
(( ) x c 2
T new
c 1
cT
c
(A ) X y , x
T new
c 1
( A ) xnew ),
(18)
where Ac ( c ) 2 X cT X c ( Σαc ) 1 .
20
In modeling, some computations require inversions of covariance matrices.
21
However, it doesn’t necessarily exist in real world scenarios for the collinearity of 13
ACS Paragon Plus Environment
Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 14 of 37
1
process variables. In this case, the Moore–Penrose pseudo inverse 44 is calculated to
2
solve the inverse of a singular matrix.
3
3.2 Online operating performance assessment
4
In online assessment, the basic premise is to choose the correct active assessment
5
model, and this involves the issue of online phase identification for the new sample.
6
As all phases have been identified during the offline phase division, the start and end
7
moments of phase c of the i th batch are known and denoted as kic,in and kic,out ,
8
c ] respectively. Then the maximum interval of phase c is written as [kinc , kout
9
c accordingly, where kinc min(kic,in ) and kout max(kic,out ) are the earliest start and latest
10
end moments of phase c . Since the uneven-length duration also presents in the single
11
c 1 phase across different batches, kinc may be less than kout , c 2, 3,..., C 1 , and this
12
c 1 . further results in an overlapping interval between two adjacent phases, i.e., kinc , kout
13
c 1 , kinc 1 ) , the phase type of the new sample can In the nonoverlapping interval, i.e., (kout
14
be identified uniformly, thus it is called as certain interval; while the phase type is
15
unknown if a sample falls into the overlapping interval of two phases, and this kind of
16
interval is described as fuzzy interval. According to the intervals of each phase, the
17
phase type of the new sample can be determined. Table 1 gives a demonstration of the
18
intervals and the corresponding phase identification results.
1 i I
19
Table 1 Intervals and online phase identification result Interval
Phase type
Phase identification result
[1, k ) [k , k ]
certain interval fuzzy interval
1 1 or 2
2 in
2 in
1 out
c 1 ( k out , kinc 1 )
fuzzy interval certain interval
c 1or c
C 1 C ( k out , k out ]
certain interval
C
c 1 [ kinc , k out ]
c
Supposing that the new sample belongs to phase c certainly, the assessment model
20 21
1 i I
of phase c is thus invoked, and the predicted output of xnew is given by 14
ACS Paragon Plus Environment
Page 15 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Industrial & Engineering Chemistry Research
c T yˆ new ( c ) 2 xnew ( Ac )1 X cT y c .
1
(19)
2
In order to simplify the assessment process, the predicted comprehensive economic
3
index should be normalized, and the normalized form is defined as the assessment
4
index and used for online operating performance assessment. Without loss of
5
generality, assuming that the operating performance is optimal when comprehensive
6
economic index is high, and the assessment index, γc , is calculated as follows:
7
c c if yˆ new ymax 1, c c yˆ ymin c c c γc new yˆ new ymax , if ymin , c c y y min max c c ymin 0, if yˆ new
8
c c where ymax max( ync ) and ymin min( ync ) , n 1, 2,..., Nc , are the maximum and
9
minimum values of the comprehensive economic index of phase c , respectively. As
10
seen from Eq.(20), γc is between 0 and 1. When γc is close to 1, it means that the
11
predicted value is close to the optimal value appeared in historical data, and the
12
process operating performance is usually optimal; otherwise, if γc is nearly 0, the
13
process operating performance is likely to be nonoptimal. In order to distinguish
14
optimality from nonoptimality strictly, an assessment index threshold, (0.5 1) ,
15
is introduced. If γc , it means that the process operating performance is optimal. On
16
the contrary, we can say that the process is operating on the nonoptimal performance
17
grade. The value of can be determined through historical data and expert
18
experience. According to the historical data, the maximum and minimum values of the
19
comprehensive economic index can be counted, and then experts who are familiar
20
with the production process can give a dividing value between optimality and
21
nonoptimality. Furthermore, is calculated as in Eq. (20) by replacing the predicted
22
value of the comprehensive economic index with the dividing value.
n
(20)
n
23
c 1 ] , both the Particularly, if the new sample falls into the fuzzy interval [kinc , kout
24
assessment indices γc 1 and γc are calculated, and Bayesian inference-based posterior 15
ACS Paragon Plus Environment
Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 16 of 37
1
probabilities of xnew with respect to phase c 1 and c are set as the adaptive weights to
2
integrate the corresponding local assessment results. The posterior probabilities are calculated as follows:
3
P
4
c 1
ω c 1 g ( xnew μ c 1 , Σ c 1 ) c 1 , ω g ( xnew μ c 1 , Σ c 1 ) ω c g ( xnew μ c , Σ c )
(21)
ω c g ( xnew μ c , Σ c ) , μ c 1 , Σ c 1 ) ω c g ( xnew μ c , Σ c )
(22)
Pc
5
ω c 1 g ( xnew
6
where ω c 1 ωc1 (ωc1 ωc ) and ω c ωc (ωc 1 ωc ) represent the prior probabilities
7
with respect to phase c 1 and c , respectively. Furthermore, the assessment indices can be integrated as follows:
8
γ P c 1γc 1 P c γc .
9
(23)
Then the process operating performance can be evaluated by comparing the
10 11
assessment index γ with the threshold .
12
3.3 Nonoptimal cause identification
13
Although the nonoptimal operating performance is not expected, it may occur due
14
to manual operating error or process characteristics drift in production. Therefore,
15
when the process operating performance is nonoptimal, it is much important to
16
identify the possible cause for actual production processes. Being similar to the
17
contribution plots-based fault diagnosis methods,
18
identification method is proposed based on variable contributions in this section. By
19
approximately decomposing the assessment index into the weighted sum of different
20
process variables, the variable contributions are constructed firstly. Then compare the
21
contribution values with their statistical scopes obtained from the optimal level, and
22
identify the responsible variables according to the given criterion. Since the
23
contribution is a function of the process variable, the nonoptimal cause can be
24
identified as long as the nonoptimality can be reflected through the measurable
25
process variables. Therefore, the proposed method is applicable to nonoptimal
26
conditions caused by sensors and process abnormalities.
45, 46
16
ACS Paragon Plus Environment
the nonoptimal cause
Page 17 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Industrial & Engineering Chemistry Research
1
As known from Eq.(20), the variable contribution to the assessment index γc is
2
c c equivalent to the contribution to yˆ new , so yˆ new is further decomposed into the
3
following form: J
c T yˆ new xnew z c xnew, j z cj ,
4
(24)
j 1
5
where z c = ( c )2 ( Ac )1 X cT y c ; z cj is the j th element of z c , and xnew, j is the j th
6
variable of xnew . Accordingly, the raw contribution of the j th process variable with
7
respect to the assessment index γc is denoted as below: c c contrraw , j xnew, j z j , j 1, 2,..., J .
8
(25)
9
c ˆc Intuitively, small contrraw , j causes small ynew , and further leads to small γc . So it
10
looks like that the process variables with smaller raw contributions should be
11
considered as the responsible variables for nonoptimality. However, even though
12
under the optimal operating performance, the raw contributions of different process
13
variables are not the same, and some of them may be very small or negative.
14
Therefore, it is more reasonable to compare each raw variable contribution with its
15
average level under optimal operating performance, rather than only considering its
16
absolute value. In view of this, I batches under the optimal operating performance
17
are selected from I training batches and used to calculate the mean of the raw
18
contribution of each process variable. The reference data of each phase are denoted as
19
( X 1 , y 1 ),( X 2 , y 2 ),...,( X C , y C ) , where X c [ x1c , x 2c ,..., x Nc ]T and y c [ y1c , y 2c ,... y Nc ]T ,
20
c 1, 2,..., C .
21
related parameters. The mean of the raw contribution of the j th process variable of
22
phase c is calculated as follows:
c
c ) 1 and A c ( c ) 2 X cT X c ( Σ α
z c = ( c ) 2 ( A c )1 X cT y c
c
are the
N c
c c c N c , contrmean , j xn , j z j
23
n 1
24
where z cj is the j th element of z c , and xnc, j is the j th variable of x nc , n 1, 2,..., N c . 17
ACS Paragon Plus Environment
(26)
Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 18 of 37
Then we define the variable contribution as below:
1
c c contrjc contrraw , j -contrmean , j , j 1, 2,..., J .
2
(27)
3
In theory, if contrjc 0 , it means that the j th variable is the responsible variable for
4
the nonoptimal operating performance. Nevertheless, it is difficult to achieve an
5
c c absolutely equal between contrraw , j and contrmean , j in actual processes. Hence, the
6
upper and lower confidence limits of the variable contributions of the reference data
7
are counted by kernel density estimation 47 for different phases, and they are denoted
8
c as CU cj and CLcj , respectively. If CLcj contrjc CU cj , it means that contrraw , j and
9
c contrmean , j are approximately equal statistically; otherwise, it can be concluded that
10
they are obviously unequal. Furthermore, the process variables with contributions that
11
are significantly less than CLcj are considered as the cause variables responsible for
12
the nonoptimal operating performance. For the process variables whose contributions
13
are greater than CU cj , they may play a role in improving the operating performance.
14
c 1 ] , both contrjc 1 and contrjc should be If xnew belongs to the fuzzy interval [kinc , kout
15
calculated, and the posterior probabilities are used as the adaptive weights to integrate
16
them as the final variable contribution, i.e.,
17
contrj P c 1contrjc 1 P c contrjc , j 1, 2,..., J .
18
Correspondingly, the upper and lower confidence limits are constructed as follows: CU j P c 1CU cj 1 P cCU cj ,
19
CL j P c 1CLcj1 P c CLcj , j 1, 2,..., J ,
(28)
(29)
20
The step-by-step procedure of GPR and Bayesian inference based operating
21
performance assessment approach is summarized as below and the flow diagram is
22
shown in Fig. 2.
23
(1) Collect the training batches for the establishment of assessment models.
24
(2) Unfold the three-dimension data into two-dimension matrix via variable-wise
25
unfolding and perform the normalization on the unfolded matrix. (3) Estimate GMM and identify all different phases based on GMM-PSD strategy.
26
18
ACS Paragon Plus Environment
Page 19 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Industrial & Engineering Chemistry Research
1
(4) Develop GPR-based assessment models for different phases.
2
(5) Identify the phase type for new sample and invoke the corresponding model.
3
(6) Predict the comprehensive economic index of the new sample and construct the
4
assessment index with it.
5
(7) Online assessment using the constructed assessment index.
6
(8) Identify the cause variables responsible for the nonoptimal operating
7
performance based on variable contributions.
8 9
Fig.2. Diagram of GPR and Bayesian inference based operating performance assessment approach
19
ACS Paragon Plus Environment
Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
1
4. Case study
2
4.1 Fed-batch penicillin fermentation process description
The fed-batch penicillin fermentation process
3
48, 49
is a typical multiphase batch
4
process and has received wide attentions for its academic and industrial importance.
5
There are two physical stages in this process: preculture and fed-batch. The preculture
6
stage starts with small amounts of biomass and substrate. Most of the initially added
7
substrate is consumed by the microorganisms after about 45 h, and the process is
8
switched from preculture stage to fed-batch stage which usually continues about 355 h.
9
In fed-batch stage, the penicillin increased exponentially until the stationary
10
procedure. Because the biomass growth rate must be kept constant for the aim of
11
optimizing penicillin production, the substrate is supplied continuously into the
12
fermentor, rather than being added all at once in the beginning. Meanwhile, in order to
13
maintain the constant temperature and pH values, two proportional-integral-derivative
14
(PID) control loops are implemented in the fermentor for manipulating the acid/base
15
and hot/cold water flow ratios, respectively. In 2002, a modular simulator (PenSim
16
v2.0) for fed-batch fermentation was developed by Ali Çinar et al. from the
17
Monitoring and Control Group of the Illinois Institute of Technology. It can simulate
18
the concentrations of biomass, CO2, hydrogen ion, penicillin, carbon source, oxygen,
19
and heat generation under various operating conditions. The diagram of the penicillin
20
fermentation process is shown in Fig. 3.
21
Known from the mechanism of penicillin fermentation process, the process
22
operating performance depends on the final penicillin concentration. Under the
23
normal operating conditions, the higher penicillin concentration corresponds to the
24
better operating performance, while the poorer operating performance usually leads to
25
lower penicillin concentration. Thus, the final penicillin concentration is used as the
26
comprehensive economic index in this study. In addition, the process variables related
27
to the penicillin concentration are selected for operating performance assessment and
28
listed in Table 2. Table 3 gives the initial operating conditions of the production.
20
ACS Paragon Plus Environment
Page 20 of 37
Page 21 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Industrial & Engineering Chemistry Research
pH
FC Substrate Acid
Fermentor T
Base FC Cold Water Air
1 2
Hot Water
3
Table 2 Process variables used for operating performance assessment
Fig. 3 Flow diagram of the penicillin fermentation process
No. 1 2 3 4 5 6 7 8 9 10 11 4
Process variables
Normal operating ranges
Aeration rate (L/h) Agitator power (W) Substrate feed rate (L/h) Substrate feed temperature (K) Substrate concentration (g/L) Dissolved oxygen concentration (%) Biomass concentration (g/L) Culture volume (L) Carbon dioxide concentration (mole/L) pH Generated heat (kcal/h)
3-10 20-50 0.035-0.045 296-298 5-50 1.16 0-0.2 100-150 0.5-1 4-6 0
Table 3 Initial operating conditions of penicillin fermentation process Process variables
Initial operating conditions
Substrate concentration(g/L) Dissolved oxygen concentration (%) Biomass concentration (g/L) Penicillin concentration (g/L) Culture volume (L) Carbon dioxide concentration (mole/L) pH Fermentor temperature (K) Generated heat (kcal/h)
15 1.16 0.1 0 100 0.5 5 298 0
To establish the assessment models, a total of 50 training batches are produced
5
21
ACS Paragon Plus Environment
Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
1
under the normal operating conditions, and the durations are between 390h and 420h
2
with the sampling interval of 1h. By using GMM-PSD method, five phases are
3
identified from offline phase division. Table 4 shows the result of offline phase
4
division, and the preculture stage and fed-batch stage are further divided into two and
5
three phases, respectively. This can be attributed to the fact that a stage may contain
6
multiple data distributions caused by the intrinsic biochemical reactions or the
7
changes in external operating conditions. In preculture stage, most of the initially
8
added substrates are consumed quickly by the microorganisms. To capture the
9
regularity of data distribution, meanwhile, ensure that the data of each phase
10
approximately follow a multivariate Gaussian distribution, the preculture stage is
11
further divided into two phases, although its duration is only about 45 h. The
12
fed-batch stage, whose duration is about 355 h and almost eight times length of
13
preculture stage, is just divided into three phases. It is because that the variation trends
14
of the variable trajectories noticeably slows down in this stage, and three phases are
15
enough to describe the diversity of data distributions. In addition, the batch-to-batch
16
variations cause the inconsistent phase durations across different batches, thus form
17
the fuzzy intervals between phases.
18
Table 4
Result of offline phase division
Phase no.
Earliest start moment kinc
Latest end c moment kout
1
1
34
2
30
47
3
43
125
4
106
223
5
195
420
Phase division result Certain interval of phase 1: [1,30) ; Fuzzy interval between phase 1 and 2: [30,34] ; Certain interval of phase 2: (34, 43) ; Fuzzy interval between phase 2 and 3: [43, 47] ; Certain interval of phase 3: (47,106) ; Fuzzy interval between phase 3 and 4: [106,125] ; Certain interval of phase 4: (125,195) ; Fuzzy interval between phase 4 and 5: [195,223] ; Certain interval of phase 5: (223, 420] .
19
To test whether the process data of each phase follow Gaussian distributions, the
20
multivariate normality test algorithm proposed in Ref. 50, i.e., F-Straight Method
21
Based on Mahalanobis Distance (FSMD), is implemented before modeling. The basic
22
idea of FSMD is that, when the process data approximately follow a multivariate 22
ACS Paragon Plus Environment
Page 22 of 37
Page 23 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Industrial & Engineering Chemistry Research
1
Gaussian distribution, the distribution function of the squared Mahalanobis distances
2
of samples, D( n ) , is a specific F distribution, and it can be replaced by its empirical
3
distribution function. Denote the quantile of F distribution as Fn , and the above
4
argument is equivalent to test whether D( n ) Fn . Let the linear regression equation
5
between D( n ) and Fn be Fn a bD( n ) , the significance test is performed on it based on
6
the regression standard deviation, s , and the mean of the quantile, F . If
7
s F 0.15 , the significance test is valid and the linear regression equation can reflect
8
the real relationships between D( n ) and Fn ; otherwise, the significance test is invalid.
9
Furthermore, if both of the conditions a and b 1 are satisfied, where and
10
are usually set as F 5% and 0.2, it is usually considered that the fitting line,
11
Fn a bD( n ) , passes through the original point with the slope of 1. Thus, the
12
hypothesis test that the data follows a Gaussian distribution should not be rejected.
13
Otherwise, reject the null hypothesis. In summary, when the conditions s F 0.15 ,
14
a and b 1 are satisfied at the same time, the data follow a Gaussian
15
distribution. The detailed procedures of FSMD are given in Appendix A. As shown in
16
Table 5 and Fig. 4, the test results illustrate that, in each individual phase, the unfold
17
process data approximately follow a multivariate Gaussian distribution.
18
Table 5 Results of multivariate normality test of each phase Phase no.
s / F (