Quality Prediction in Complex Batch Processes with Just-in-Time

Jul 21, 2015 - A KPI prediction approach with JITL for vehicular Cyber Physical System. Hongpeng Zhou , Hao Ju , Tianyu Tan , Tianyi Gao. 2016,85-90 ...

Industrial & Engineering Chemistry Research

Quality prediction in complex batch processes with just-in-time learning model based on non-Gaussian dissimilarity measure

Xinmin Zhang (a), Yuan Li (a), Manabu Kano (b)

(a) Information Engineering School, Shenyang University of Chemical Technology, Shenyang 110142, P. R. China (Email: [email protected])
(b) Dept. of Systems Science, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto 606-8501, Japan (Email: [email protected])

ABSTRACT

In modern batch processes, soft sensors have been widely used for estimating quality variables. However, their prediction performance is often limited by the inherent limitations of these methods and by the unique characteristics of batch processes, such as time-varying dynamics, nonlinearity, non-Gaussianity, multiple phases and batch-to-batch variations. To cope with these issues, a novel just-in-time learning (JITL) soft sensor based on a non-Gaussian dissimilarity measure is developed in this paper. Unlike the traditional JITL model, which uses a distance-based dissimilarity measure for local modeling, the proposed method uses the non-Gaussian dissimilarity measure to evaluate the statistical dependency of the extracted independent components when constructing the local model, and can therefore capture the non-Gaussian features in the process data. Furthermore, a novel relevant-sample search strategy is introduced into the JITL framework for local modeling, which searches for relevant samples not only along the time axis but also along the batch-to-batch direction. The proposed search strategy ensures that the current query sample and the local modeling data belong to the same phase and have the smallest process trajectory variations. Hence, the proposed soft sensor is suitable for batch processes with uneven phases and batch-to-batch variations. The proposed method can also cope with changes in process characteristics as well as nonlinearity. The reliability and validity of the proposed method are verified on the fed-batch penicillin fermentation process. The application results show better prediction performance than the MPLS and correlation-based JITL methods.

Keywords: Quality prediction; batch processes; independent component analysis; just-in-time learning; non-Gaussian dissimilarity measure


1. INTRODUCTION

In modern industrial processes, batch processes play a critical role in producing low-volume, high-value-added products because of their high operating flexibility and low capital investment, and they are widely used in the fine chemical, biochemical, materials, pharmaceutical and semiconductor industries. Process monitoring and quality control have become crucial tasks for batch processes to improve product quality and to ensure process safety and environmental sustainability.1-4 Although quality variables can be measured by hardware analyzers, most hardware analyzers are costly and difficult to maintain. Furthermore, the large measurement delay associated with these analyzers inevitably degrades the quality control performance.5 Therefore, soft sensor techniques have been developed over the past decades for process quality monitoring and control. Generally, soft sensor techniques can be divided into three categories: the first-principles model method, the statistical (empirical) model method and the gray-box model method. The first-principles model method is usually based on mechanistic or thermodynamic process models. However, it is difficult to construct first-principles models for complicated batch processes.6 Moreover, Zamprogna et al.5 pointed out that this type of estimator is difficult to initialize and tune, and imposes a large computational burden in on-line application. Compared to the first-principles model method, the statistical model method requires neither prior knowledge of the system nor initialization. It is computationally simple and handles high-dimensional data well; thus it has attracted considerable attention in recent years.7-13 Furthermore, the gray-box model method, which integrates the first-principles model and the statistical model, has also been proposed to improve the prediction performance.14 Multiway principal component analysis (MPCA) and multiway partial least-squares (MPLS) are the most popular approaches for process monitoring and quality prediction in batch processes.1, 15, 16 However, some inherent limitations of MPLS/MPCA and some unique characteristics of batch processes, such as finite duration, multiple states, inherent nonlinearity and non-Gaussianity, time-varying dynamics, and batch-to-batch variations, prevent the conventional MPLS/MPCA method from functioning well. On the one hand, MPLS/MPCA is a linear method, which cannot handle process nonlinearity. Additionally, MPLS/MPCA is a second-order method, which means that it takes into account only the mean and the variance or covariance. Thus MPLS/MPCA cannot efficiently extract higher-order statistical information from process data with
non-Gaussian distributions, which are common in actual industrial processes. Moreover, since the traditional MPLS/MPCA usually takes the entire batch as an object to build a global model, the future values of the remainder of the ongoing batch must be estimated for on-line application; this inevitably deteriorates the prediction accuracy. In addition, as has been illustrated in refs 17-20, it cannot efficiently capture the multi-phase feature of most batch processes. In order to obtain better prediction performance, several approaches have been proposed. For example, to handle the nonlinear problem, a series of nonlinear regression methods such as nonlinear partial least squares (NLPLS),21 artificial neural networks,22 and kernel PLS (KPLS)23 have been developed. Compared to NLPLS and neural networks, KPLS avoids nonlinear optimization by introducing a nonlinear kernel function, and it has recently attracted increasing attention in many industries. Though these methods can tackle process nonlinearity, they perform poorly when applied to multi-phase batch processes. This is because these methods usually assume that a batch operates in a single phase under constant conditions throughout its duration. In practice, however, batch processes have no constant operating conditions and often show multi-phase characteristics. To handle the multi-phase problem, some phase-based PLS modeling methods have been developed to improve the quality prediction ability,19, 20, 24 based on the idea that batch process operation can be divided into several separate phases with different phase characteristics. Compared to the MPLS model, phase-based PLS models can enhance data interpretation and process understanding. Furthermore, considering the transition information or interphase relationships between different phases, Zhao et al.25 proposed a phase-based PLS method utilizing transition information for quality prediction. Ge et al.19 improved the prediction performance by proposing a two-level PLS modeling approach, in which a separate intraphase PLS model and a series of interphase PLS models are built. They point out that the interphase relationship or transition information has a certain impact on the final product quality prediction. To handle non-Gaussian data behavior, independent component analysis (ICA) has been developed. ICA can capture more meaningful information on higher-order statistics from the exploratory variables than the PCA-based methods.26-28 In other words, independent components (ICs) are more powerful and essential in interpreting multivariate data than principal components (PCs) because the high-order statistic of
ICs can better reflect the intrinsic properties of the mixed signal. Based on ICA, independent component regression (ICR) was proposed by Cheng and Wang,29 and applied to qualitative analysis and quantitative prediction of NIR spectral data. Similarly, ICA-PLS and ICA-MLR (multiple linear regression, MLR) were also developed to construct a regression model between the ICs and the quality variables by PLS and MLR. It was found that the ICA-based regression models give high prediction power and are easy to interpret for non-Gaussian distributed process data.30-32 Generally, industrial processes are time-varying due to changes in process characteristics, such as catalyst deactivation, equipment aging and changes of raw materials. Hence, it is necessary to update soft sensor models automatically as the process characteristics change in order to maintain the prediction performance. Recently, recursive methods, such as recursive PLS and recursive SVR, have been proposed to adapt the soft sensor model to a new operating condition recursively.33 However, recursive models cannot deal with abrupt changes of the process. When the process operates in a narrow range for a certain period of time, these recursive methods tend to adapt the soft sensor model excessively. They also do not function well in a new operating region until a sufficient period of time has passed, because there is a time delay while the recursive methods adapt the soft sensor to the new operating condition. Alternatively, the just-in-time learning (JITL) method was proposed to cope with these abrupt changes of the process, and it has been widely applied for soft sensor modeling and process monitoring.34-37 In JITL modeling, a local model is built from the historical database using the most relevant samples around the query data point when an estimated value of this point is required.
Different from the traditional offline and global modeling methods, the JITL method constructs a local model online, and thus it can track the current operating state well. Another advantage of JITL modeling is that it can tackle process nonlinearity owing to its local model structure. However, its prediction performance depends mainly on the samples that are selected for local modeling. Therefore, the appropriate selection of local modeling samples is crucial for designing accurate JITL models. Most JITL modeling approaches select the local modeling samples on the basis of distance-based similarity indexes. However, distance-based similarity indexes do not take into account the correlation among process variables. Though Cheng et al.38 define a similarity index by combining the distance and the angle between two samples, it does not always describe the correlation among process variables sufficiently. To overcome this issue,
Fujiwara et al.39 define a correlation-based similarity index by integrating the $T^2$ and $Q$ statistics of PCA to develop a JITL soft sensor model (Co-JITL). Nevertheless, the above similarity index based on PCA only considers the second-order statistical characteristics of process data; the non-Gaussian information cannot be effectively extracted. Hence, using this similarity index to select the local modeling samples may degrade the prediction accuracy of JITL soft sensors for non-Gaussian distributed process data. To handle non-Gaussian data characteristics, Xie et al.40 proposed a support vector data description (SVDD) based JITL soft sensor using a non-Gaussian regression technique, and Fan et al.41 proposed a Gaussian mixture model (GMM) based JITL soft sensor, which is based on the assumption that a non-Gaussian distributed signal can be approximated by a mixture of several Gaussian distributions. Furthermore, the applications of the above JITL soft sensors are confined to typical continuous processes. In this study, to explicitly account for the inherent characteristics of batch processes such as non-Gaussianity, time-varying dynamics and multiple phases, and to take into account the batch-to-batch variation of process trajectories, a novel JITL soft sensor model based on a non-Gaussian dissimilarity measure is proposed to enhance the quality prediction performance. The non-Gaussian dissimilarity index is defined by integrating ICA with multidimensional mutual information to measure the statistical dependency between two IC subspaces. Compared to the correlation-based dissimilarity index, the non-Gaussian dissimilarity index can capture the non-Gaussian process features. Furthermore, a novel relevant-sample search strategy using a moving time window is introduced into the JITL framework for local modeling. The new strategy searches for relevant samples not only along the time axis but also along the batch-to-batch direction. It ensures that the current query sample and the local modeling data belong to the same phase and have the smallest process trajectory variations.
First, ICA models are built on time-series data subsets of each training batch at each specific time interval to extract the IC subspaces. Second, for online prediction, the dissimilarity indexes between the IC subspaces of the training batches and the IC subspace of the current batch in the same time region are calculated at each sampling instant by using a moving time window. The IC subspace that minimizes the dissimilarity value is selected to construct the local soft sensor model. Then, the regression relationship between the
response variable and the selected ICs can be established by using partial least squares (PLS). From the viewpoint of feature extraction and online prediction performance, the proposed method has the following advantages:
1) The proposed method inherits the merits of JITL modeling; it can track the changes in process characteristics regardless of abrupt noises and can cope with process nonlinearity.
2) From the selection of local relevant samples to the construction of the final regression model, the proposed method can efficiently extract higher-order statistical information; thus it is particularly suitable for quality prediction of non-Gaussian process data.
3) The proposed method takes into account the multi-phase and batch-to-batch variation characteristics; thus it can provide better prediction performance and give a reasonable interpretation, especially for the uneven-phase problem of batch processes.
4) Compared to the conventional MPLS, the proposed method does not need to estimate future values when predicting an ongoing batch.
The rest of this paper is organized as follows. Section 2 gives a brief introduction to the correlation-based dissimilarity measure, ICA-PLS and the non-Gaussian dissimilarity measure. The novel just-in-time learning soft sensor model based on the non-Gaussian dissimilarity measure is then presented in Section 3. In Section 4, the proposed method is applied to the fed-batch penicillin fermentation process, and its prediction results are compared with those of the MPLS and Co-JITL approaches. The conclusions of this work are presented in Section 5.

2. PRELIMINARIES

2.1. Correlation based dissimilarity measure

Recently, the correlation-based dissimilarity measure derived from PCA has been proposed.39 Consider a normal training dataset $\mathbf{X} \in \Re^{K \times J}$ whose $k$-th row is $\mathbf{x}_k = [x_1, x_2, \ldots, x_J]$, where $K$ is the number of samples and $J$ is the number of variables. In PCA, the loading matrix $\mathbf{P} \in \Re^{J \times R}$ can be obtained through the singular value decomposition of the matrix $\mathbf{X}$, where $R$ represents the number of principal components retained in the PCA model. The score matrix $\mathbf{T} \in \Re^{K \times R}$ is defined as

$$\mathbf{T} = \mathbf{X}\mathbf{P}. \tag{1}$$


Then, the matrix $\mathbf{X}$ can be reconstructed from $\mathbf{T}$ by the following linear transformation

$$\tilde{\mathbf{X}} = \mathbf{T}\mathbf{P}^T = \mathbf{X}\mathbf{P}\mathbf{P}^T. \tag{2}$$

The reconstruction errors are given by

$$\mathbf{E} = \mathbf{X} - \tilde{\mathbf{X}} = \mathbf{X}(\mathbf{I} - \mathbf{P}\mathbf{P}^T). \tag{3}$$

Based on this, the $Q$ statistic is defined as

$$Q = \sum_{j=1}^{J} (x_j - \tilde{x}_j)^2 \tag{4}$$

where $x_j$ is the $j$-th variable of $\mathbf{x}$, and $\tilde{x}_j$ is the estimate of $x_j$. The $Q$ statistic represents the distance between the sample and the PC subspace, which is regarded as a dissimilarity measure between the sample and the modeling data from the viewpoint of the correlation among variables. Furthermore, to ensure that the sample lies within the region of the modeling data and to avoid extrapolation, Hotelling's $T^2$ statistic is used and defined as

$$T^2 = \sum_{r=1}^{R} \frac{t_r^2}{\sigma_{t_r}^2} \tag{5}$$

where $\sigma_{t_r}$ is the standard deviation of the $r$-th score $t_r$. The $T^2$ statistic represents the Mahalanobis distance from the origin in the PC subspace. In other words, it measures the dissimilarity between the sample and the mean of the modeling data. A correlation-based dissimilarity index is then defined for data selection by integrating the $T^2$ and $Q$ statistics, as proposed by Raich and Cinar:42

$$J = \lambda T^2 + (1 - \lambda) Q \tag{6}$$

where $0 \le \lambda \le 1$.
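The quantities in Eqs. (1)-(6) can be computed in a few lines. The following is a minimal NumPy sketch rather than the implementation of ref 39: the function names `fit_pca` and `dissimilarity_index`, the autoscaling of the data before PCA and the default value $\lambda = 0.5$ are assumptions made for illustration.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit a PCA model on training data X (K samples x J variables)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Xs = (X - mean) / std                      # autoscaling (assumed preprocessing)
    # Rows of Vt are the principal directions of the scaled data.
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    P = Vt[:n_components].T                    # loading matrix, J x R
    T = Xs @ P                                 # score matrix, Eq. (1)
    score_var = T.var(axis=0, ddof=1)          # sigma_{t_r}^2 for each score
    return {"mean": mean, "std": std, "P": P, "score_var": score_var}

def dissimilarity_index(x, model, lam=0.5):
    """Return (Q, T2, J) of Eqs. (4)-(6) for a single sample x."""
    xs = (x - model["mean"]) / model["std"]
    t = xs @ model["P"]                        # scores of the sample
    x_hat = t @ model["P"].T                   # reconstruction, Eq. (2)
    Q = float(np.sum((xs - x_hat) ** 2))       # Eq. (4)
    T2 = float(np.sum(t ** 2 / model["score_var"]))  # Eq. (5)
    J = lam * T2 + (1.0 - lam) * Q             # Eq. (6)
    return Q, T2, J

# Two strongly correlated variables: x2 is roughly 2 * x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X_train = np.column_stack([x1, 2.0 * x1 + 0.05 * rng.normal(size=200)])
model = fit_pca(X_train, n_components=1)

Q_on, T2_on, J_on = dissimilarity_index(np.array([1.0, 2.0]), model)
Q_off, T2_off, J_off = dissimilarity_index(np.array([1.0, -2.0]), model)
```

In this toy example the first query follows the correlation structure of the training data and the second violates it, so `Q_off` is much larger than `Q_on` even though both points lie at a similar Euclidean distance from the data mean.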

2.2. ICA-PLS

ICA is a statistical technique for decomposing the observed data set into linear combinations of components that are as statistically independent of each other as
possible. Suppose the observed dataset $\mathbf{X} = [x_1, x_2, \cdots, x_J]^T$ can be expressed as a linear combination of $m$ ($m \le J$) unknown ICs $\mathbf{S} = [s_1, s_2, \cdots, s_m]^T$; the basic model of ICA can then be written as

$$\mathbf{X} = \mathbf{A}\mathbf{S} + \mathbf{E} \tag{7}$$

where $\mathbf{E}$ is the residual matrix, and $\mathbf{A}$ and $\mathbf{S}$ are the unknown mixing and ICs matrices, respectively. The objective of ICA is to obtain the ICs matrix $\mathbf{S}$ by searching for a demixing matrix $\mathbf{W}$ such that the components of the estimated ICs matrix, denoted by

$$\hat{\mathbf{S}} = \mathbf{W}\mathbf{X}, \tag{8}$$

become as independent of each other as possible. Before performing ICA, data whitening is conducted by classical PCA to eliminate the cross-correlation of the random observation variables. The whitening transformation can be expressed as

$$\mathbf{Z} = \mathbf{Q}\mathbf{X} = \mathbf{Q}\mathbf{A}\mathbf{S} = \mathbf{B}\mathbf{S} \tag{9}$$

where $\mathbf{Q}$ is the whitening matrix and $\mathbf{B}$ is an orthogonal matrix which satisfies $E(\mathbf{z}\mathbf{z}^T) = \mathbf{B}E(\mathbf{s}\mathbf{s}^T)\mathbf{B}^T = \mathbf{B}\mathbf{B}^T = \mathbf{I}$. After the transformation, the ICs can be calculated as follows:

$$\hat{\mathbf{S}} = \mathbf{B}^T\mathbf{Z} = \mathbf{B}^T\mathbf{Q}\mathbf{X}. \tag{10}$$

From Eqs. (8) and (10), the demixing matrix $\mathbf{W}$ can be rewritten as

$$\mathbf{W} = \mathbf{B}^T\mathbf{Q}. \tag{11}$$
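The whitening step and the relation $\mathbf{W} = \mathbf{B}^T\mathbf{Q}$ can be checked numerically. The sketch below is illustrative only: the helper `whitening_matrix`, the mixing matrix `A` and the rotation angle `theta` are hypothetical, and the orthogonal matrix `B` is an arbitrary rotation rather than the solution an ICA algorithm would return. It shows that any orthogonal $\mathbf{B}$ keeps the whitened data uncorrelated, which is why ICA only has to search over orthogonal matrices after whitening.

```python
import numpy as np

def whitening_matrix(X):
    """Return (Q, Xc) with Q @ Xc having identity sample covariance.

    X holds one variable per row (J x n), matching the layout of Eq. (7).
    """
    Xc = X - X.mean(axis=1, keepdims=True)          # center each variable
    cov = Xc @ Xc.T / (X.shape[1] - 1)
    eigval, eigvec = np.linalg.eigh(cov)            # PCA of the covariance
    Q = np.diag(eigval ** -0.5) @ eigvec.T          # whitening matrix
    return Q, Xc

rng = np.random.default_rng(1)
S = rng.uniform(-1.0, 1.0, size=(2, 2000))          # two non-Gaussian sources
A = np.array([[1.0, 0.5],
              [0.3, 2.0]])                          # hypothetical mixing matrix
X = A @ S                                           # Eq. (7) with E = 0

Q, Xc = whitening_matrix(X)
Z = Q @ Xc                                          # Eq. (9)
cov_Z = Z @ Z.T / (Z.shape[1] - 1)                  # identity up to round-off

theta = 0.7                                         # arbitrary rotation angle
B = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])     # orthogonal by construction
W = B.T @ Q                                         # Eq. (11)
S_hat = W @ Xc                                      # Eq. (8), equal to B.T @ Z in Eq. (10)
cov_S_hat = S_hat @ S_hat.T / (S_hat.shape[1] - 1)  # still the identity
```

The components in `S_hat` are uncorrelated for every choice of `theta`, but they are as independent as possible only for particular rotations; selecting such a rotation is the remaining task of the ICA algorithm.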

The FastICA algorithm can be used to calculate $\mathbf{B}$, where each column vector of $\mathbf{B}$ is randomly initialized and updated iteratively such that the $i$-th IC has maximized non-Gaussianity.43 After the calculation of $\mathbf{B}$, the demixing matrix $\mathbf{W}$ and the ICs $\hat{\mathbf{S}}$ can be obtained. After the ICs are extracted, the relationship between the ICs and the response variable can be constructed by the PLS criterion. Suppose the response variable is given as $\mathbf{y}$; the basic ICA-PLS model31 can be built by maximizing the covariance between $\hat{\mathbf{S}}^T$ and $\mathbf{y}$ as follows:

$$\hat{\mathbf{S}}^T = \mathbf{T}\mathbf{P}^T + \mathbf{E} \tag{12}$$

$$\mathbf{y} = \mathbf{T}\mathbf{q} + \mathbf{f} \tag{13}$$


where $\mathbf{T}$ is the score matrix, $\mathbf{P}$ and $\mathbf{q}$ are the loading matrix and vector of $\hat{\mathbf{S}}^T$ and $\mathbf{y}$, respectively, and $\mathbf{E}$ and $\mathbf{f}$ represent the residual matrix and vector of $\hat{\mathbf{S}}^T$ and $\mathbf{y}$, respectively. In order to estimate the response variable $\mathbf{y}$ directly from the input variable $\mathbf{X}$, the regression model can be written as

$$\mathbf{y} = \mathbf{X}^T \mathbf{b}_{X \to y} + \mathrm{const} \tag{14}$$

$$\mathbf{b}_{X \to y} = \mathbf{W}^T (\hat{\mathbf{S}}\hat{\mathbf{S}}^T)^{-1} \hat{\mathbf{S}}\,\mathbf{y} \tag{15}$$

where $\mathbf{b}_{X \to y}$ is the regression coefficient vector. Compared to PCR or PLS, ICA-PLS is more powerful in analyzing multivariate non-Gaussian process data because the extracted ICs carry higher-order statistical information, which may be more informative and may better reflect the intrinsic features of the process data.

2.3. Non-Gaussian dissimilarity measure

Among the measures of statistical dependency between two (groups of) random variables, mutual information (MI) is introduced because of its information-theoretic background.44 Compared to cross-correlation, MI considers the higher-order statistics and can capture the non-Gaussianity of stochastic systems. Consider two random variables $x_1$ and $x_2$. The Shannon entropy of $x_1$ is defined as

$$H(x_1) = -\int u(x_1) \log u(x_1)\,dx_1, \tag{16}$$

where "log" is the natural logarithm, and $u(x_1)$ is the probability density function of $x_1$. The MI between the random variables $x_1$ and $x_2$ can be calculated as follows:

$$I(x_1, x_2) = \iint u(x_1, x_2) \log \frac{u(x_1, x_2)}{u(x_1)\,u(x_2)}\,dx_1\,dx_2 \tag{17}$$

where $u(x_1, x_2)$ is the joint probability density function, whereas $u(x_1)$ and $u(x_2)$ are the marginal probability density functions of $x_1$ and $x_2$, respectively. The above MI can be expressed in terms of entropies as

$$I(x_1, x_2) = H(x_1) + H(x_2) - H(x_1, x_2) \tag{18}$$

where $H(x_1)$ and $H(x_2)$ are the marginal entropies of $x_1$ and $x_2$, and $H(x_1, x_2)$ is the joint entropy given by


$$H(x_1, x_2) = -\iint u(x_1, x_2) \log u(x_1, x_2)\,dx_1\,dx_2. \tag{19}$$
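As a worked illustration of Eqs. (16)-(19), consider a case in which all the integrals have closed forms. The bivariate Gaussian assumption and the value $\rho = 0.9$ below are chosen only for illustration and are not taken from the process data.

```latex
% Assumption for illustration: (x_1, x_2) is bivariate Gaussian with
% zero means, unit variances and correlation coefficient \rho.
\begin{align*}
H(x_1) = H(x_2) &= \tfrac{1}{2}\log(2\pi e), \\
H(x_1, x_2) &= \log(2\pi e) + \tfrac{1}{2}\log(1 - \rho^2), \\
I(x_1, x_2) &= H(x_1) + H(x_2) - H(x_1, x_2) = -\tfrac{1}{2}\log(1 - \rho^2).
\end{align*}
% \rho = 0:    I = 0 (independent variables)
% \rho = 0.9:  I = -\tfrac{1}{2}\log(0.19) \approx 0.83 nats
```

In the Gaussian case MI is a function of $\rho$ alone, so it carries the same information as the correlation coefficient; MI adds information beyond cross-correlation only when the data are non-Gaussian.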

It is worth noting that estimating MI by computing the integrals in Eq. (17) is computationally intensive and inefficient in practice. In order to reduce the computational burden, a nearest neighbor strategy based on the Kozachenko-Leonenko estimator of Shannon entropy has been proposed.45, 46 First, the estimate of the joint entropy through the nearest neighbor technique is defined as follows:

$$H(x_1, x_2) = -\psi(\kappa) + \psi(n) + \log(c_{d_{x_1}} c_{d_{x_2}}) + \frac{d_{x_1} + d_{x_2}}{n} \sum_{j=1}^{n} \varepsilon(j) \tag{20}$$

where $n$ is the number of points, and $\varepsilon(j) = \max\{\varepsilon_{x_1}(j), \varepsilon_{x_2}(j)\}$ is the maximum Euclidean norm of the $j$-th sample point to its $\kappa$-th neighbor in the space $z = (x_1, x_2)$; $\varepsilon_{x_1}(j)/2$ and $\varepsilon_{x_2}(j)/2$ denote the Euclidean distances between the same points projected into the $x_1$ and $x_2$ subspaces. Moreover, $\psi(\upsilon) = \Gamma(\upsilon)^{-1}\,d\Gamma(\upsilon)/d\upsilon$ is the digamma function and $\Gamma(\upsilon) = (\upsilon - 1)!$ is the gamma function. Here $d_{x_1}$ and $d_{x_2}$ are the dimensions of $x_1$ and $x_2$, respectively, and $c_d = \pi^{d/2} / \Gamma(1 + d/2) / 2^d$ is the volume of the $d$-dimensional unit ball for the Euclidean norm.45 The marginal entropy $H(x_1)$ or $H(x_2)$ can be estimated by projection from the joint space as

$$H(x_1) = -\frac{1}{n} \sum_{j=1}^{n} \psi[\tau_{x_1}(j)] + \psi(n) + \log(c_{d_{x_1}}) + \frac{d_{x_1}}{n} \sum_{j=1}^{n} \varepsilon(j) \tag{21}$$

where $\tau_{x_1}(j)$ is the number of points located in its $\kappa$-th nearest neighbor area $\varepsilon_{x_1}(j)$. Substituting Eqs. (20) and (21) into Eq. (18), we obtain

$$I(x_1, x_2) = \psi(\kappa) - \langle \psi[\tau_{x_1}(j)] + \psi[\tau_{x_2}(j)] \rangle + \psi(n) \tag{22}$$

where $\langle \cdot \rangle$ denotes the average over all possible realizations of the random samples.
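A brute-force version of the nearest-neighbor estimate in Eq. (22) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, $\tau(j)$ is taken as the number of points strictly closer than the $\kappa$-th joint-space neighbor plus one (the convention of the standard nearest-neighbor MI estimator), and the digamma function is evaluated only at positive integers, where $\psi(n) = -\gamma + \sum_{i=1}^{n-1} 1/i$ holds exactly.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """Digamma function at a positive integer n: -gamma + sum_{i=1}^{n-1} 1/i."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, int(n)))

def knn_mutual_information(x1, x2, kappa=4):
    """Nearest-neighbor MI estimate (in nats) between paired samples x1 and x2.

    x1 and x2 hold n samples, one per row; each may be multivariate.
    """
    x1 = np.asarray(x1, dtype=float).reshape(len(x1), -1)
    x2 = np.asarray(x2, dtype=float).reshape(len(x2), -1)
    n = x1.shape[0]

    # Pairwise Euclidean distances within each subspace (brute force, O(n^2)).
    d1 = np.linalg.norm(x1[:, None, :] - x1[None, :, :], axis=2)
    d2 = np.linalg.norm(x2[:, None, :] - x2[None, :, :], axis=2)
    np.fill_diagonal(d1, np.inf)    # a point is not its own neighbor
    np.fill_diagonal(d2, np.inf)

    # Joint-space distance: the maximum of the two subspace distances.
    dz = np.maximum(d1, d2)
    eps = np.sort(dz, axis=1)[:, kappa - 1]   # distance to the kappa-th neighbor

    # Points strictly closer than eps in each subspace, plus one (assumed convention).
    tau1 = np.sum(d1 < eps[:, None], axis=1) + 1
    tau2 = np.sum(d2 < eps[:, None], axis=1) + 1

    avg = np.mean([digamma_int(a) + digamma_int(b) for a, b in zip(tau1, tau2)])
    return digamma_int(kappa) - avg + digamma_int(n)   # Eq. (22)

rng = np.random.default_rng(0)
a = rng.normal(size=300)
mi_dependent = knn_mutual_information(a, a + 0.1 * rng.normal(size=300))
mi_independent = knn_mutual_information(a, rng.normal(size=300))
```

With these inputs `mi_dependent` is clearly positive while `mi_independent` stays close to zero. Because the function accepts multivariate rows, the same call also covers the case in which `x1` and `x2` are blocks of several variables each.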

Notice that the terms involving $\varepsilon(j)$ and $c_d$ cancel in the final expression for MI, which shows that the estimate is independent of the data scale. The estimate in Eq. (22) can therefore be extended to compute the multidimensional mutual information (MMI) for variables $x_1$ and $x_2$ of any dimension. Consider two multivariate random IC subspaces $\mathbf{S}_1$ and $\mathbf{S}_2$; the MMI can be written as

$$I(\mathbf{S}_1, \mathbf{S}_2) = \psi(\kappa) - \langle \psi[\tau_{s_1}(j)] + \psi[\tau_{s_2}(j)] \rangle + \psi(n), \tag{23}$$


where $n$ is the size of the IC subspace, and $\tau_{s_1}(j)$ and $\tau_{s_2}(j)$ are the numbers of samples located in the $\kappa$-th nearest neighbor areas $\varepsilon_{s_1}(j)$ and $\varepsilon_{s_2}(j)$, respectively. The non-Gaussian dissimilarity index, which estimates the statistical dependency between two IC subspaces, is defined in association with the MMI as follows:

$$D_{MMI} = \frac{1}{I(\mathbf{S}_1, \mathbf{S}_2)} \cdot \frac{I_{s_1}^2}{I_{s_2}^2} \tag{24}$$

$$I_{s_i}^2 = \frac{1}{n} \mathbf{S}_i^T \mathbf{S}_i, \quad (i = 1, 2) \tag{25}$$

where $I_{s_1}^2$ and $I_{s_2}^2$ are the averages of the $I^2$ statistics of the two IC subspaces. Eq. (24) illustrates that the dissimilarity index is smaller when the statistical dependency between the two IC subspaces is stronger. As illustrated in ref 47, statistical dependency is a much stronger condition than cross-correlation; it depends on the higher-order statistics and can capture the non-Gaussian information of the process data.

3. JUST-IN-TIME LEARNING MODEL BASED ON NON-GAUSSIAN DISSIMILARITY MEASURE

3.1. JITL model

Generally, the prediction performance of a global soft sensor model may deteriorate when the process condition changes frequently, especially for time-varying and multi-phase batch processes. This is because the global model cannot effectively track the time-varying dynamics and cannot reveal the multi-phase characteristic. Therefore, a local modeling strategy that divides a process operating region into several small regions and constructs a local model in each small region has been developed. In order to construct the local model online, the JITL method was proposed and has been widely used for soft sensing and process monitoring. The basic modeling principle of JITL can be summed up in three steps: (1) for a new query sample, relevant samples that match the current query sample are selected from the dataset through some similarity measure; (2) a local model is constructed based on the relevant samples; (3) the response variable is estimated by the current local model, and the constructed local model is then discarded. When the next query sample arrives, a new local model is constructed through the same procedure. Compared to the conventional
global methods, the JITL model can cope with abrupt changes in the process as well as nonlinearity, because the nonlinear process can be represented by a set of local models valid in certain operating regions. However, its prediction performance depends mainly on the samples that are selected for local modeling. Therefore, how to select the appropriate local modeling samples to design an accurate model is a crucial problem of the JITL approach.

3.2. JITL model based on non-Gaussian dissimilarity measure
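As a baseline for the method developed in this section, the generic three-step JITL procedure summarized in Section 3.1 can be sketched as follows. The sketch is illustrative only: the function name `jitl_predict`, the Euclidean distance used for sample selection, the ordinary least-squares local model and the neighborhood size `n_local` are placeholder assumptions, not the dissimilarity measure or the regression method proposed in this paper.

```python
import numpy as np

def jitl_predict(x_query, X_db, y_db, n_local=20):
    """One JITL prediction: select relevant samples, fit locally, predict, discard."""
    # Step (1): select the samples most relevant to the query.
    # Euclidean distance is a placeholder similarity measure.
    dist = np.linalg.norm(X_db - x_query, axis=1)
    idx = np.argsort(dist)[:n_local]

    # Step (2): build a local linear model with intercept by least squares.
    A = np.column_stack([np.ones(len(idx)), X_db[idx]])
    coef, *_ = np.linalg.lstsq(A, y_db[idx], rcond=None)

    # Step (3): predict with the local model; the model is not stored,
    # so it is discarded as soon as the function returns.
    return float(np.concatenate(([1.0], x_query)) @ coef)

# Nonlinear toy relationship handled by purely linear local models.
X_db = np.linspace(0.0, 2.0 * np.pi, 400).reshape(-1, 1)
y_db = np.sin(X_db[:, 0])
y_pred = jitl_predict(np.array([1.0]), X_db, y_db)
```

Each call refits the model around the query, which is how a set of linear local models can follow a nonlinear relationship; replacing the distance in step (1) changes which samples are considered relevant without altering the rest of the loop.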

To improve the performance of the JITL model, we propose a new quality prediction method that can cope with the inherent characteristics of batch processes, such as time-varying dynamics, multiple phases, batch-to-batch variations, nonlinearity and non-Gaussianity. In particular, it should be noted that most batch processes exhibit multi-phase characteristics. For example, a typical fermentation process contains the pre-culture phase and the fed-batch phase. Since each phase may have its own underlying mechanism, different phases may exhibit different variable relationships. Even within one phase, the variable relationships may change due to the time-varying dynamics. Additionally, the switch from one phase to another may occur at different times for different batches. That is to say, different batches may be in different operating phases at the same sampling time. All in all, different batches behave differently; this is termed batch-to-batch variation. In such a situation, neither the traditional MPLS, which builds a global soft sensor based on the entire batch process dataset, nor the time-slice PLS, which constructs the soft sensor model based on the current time-slice matrix, can provide good performance, especially for batch processes with severe uneven-phase and batch-to-batch variations. An illustration of the problems of uneven-phase and batch-to-batch variations in batch processes is given in Figure 1, where the different colors represent batch-to-batch variations. In order to overcome the problems of non-Gaussianity, uneven phases and batch-to-batch variations and to enhance the real-time prediction performance, a new relevant-sample search strategy integrating the non-Gaussian dissimilarity index with the moving time window is proposed in this study. In contrast to the distance-based or
correlation-based dissimilarity index, the non-Gaussian dissimilarity index is defined by integrating ICA with multidimensional mutual information, which takes into account the higher-order statistical information and is well suitable for implementation in non-Gaussian process data. Furthermore, different from the traditional samples search methods, the proposed strategy searches the relevant samples not only along the direction of time axis but also along the direction of batch-to-batch. In other words, the proposed search strategy attaches importance to the effect leaded by the uneven-phase and batch-to-batch variations to the samples selection, which searches the resembling samples that make it possible to guarantee the current query sample and the local modeling data belong to the same phase duration and have the smallest process trajectory variations. Assume

we

have

obtained

I

batches

normal

measurement

data

Z = [ X ( I × J × K ) , Y ( I × M × K ) ] as the training samples. Where K represents the number of

sampling instants in each batch, J and M are the number of predictor and response variables, respectively. First, the three-dimensional batch data matrix is unfolded to a two-dimensional form X( I × JK ) . Afterwards, normalization along batch direction is performed to reduce non-linearity and erase the impacts of variable measuring ranges and units. Therewith, the standardized data matrices are rearranged to form the time-series data matrix X ( K × J ) for each batch. On the off-line training stage, each training batch is divided into several subsets through the moving time window strategy. Each subset consists of a set of successive samples with a fixed length, defined as L . The length of subset should not be too large to guarantee the local modeling samples have the successive and similar process characteristics. On the other hand, it should not be too small to ensure the local model contains adequate process information. Commonly, it should satisfy L ≥ 2 J as recommended in the field of multivariate statistical regression to ensure a reliable statistical model.48 In this paper, we set the L is approximately of two or three times of the number of the process variables, as suggested in work by Lu et al..49 When on-line process modeling and prediction, the time window moves forward and we can search the relevant samples at around the current query sample. The schematic diagram of the relevant samples search strategy is shown in Figure 2. For the current sampling instant k , we can search the relevant samples before or after the current time k . In this study, the 13


search range is from time $k - 2L + 1$ to time $k + L$, and the size of the search range $\theta$ is $I \times (2L - 1)$. Here the step size of the time window defaults to 1. Note that it is not necessary to search the entire historical dataset at every sampling time for samples that match the current query sample, because the current query sample is most likely relevant to the historical samples around the current sampling time. In addition, searching the entire dataset would impose a large computational burden and result in large prediction delays. In this search strategy, the $I \times (2L - 1)$ subsets that are closest to the current query sample are regarded as more similar to it. The most similar subset is then selected from these $I \times (2L - 1)$ subsets by using the non-Gaussian dissimilarity index. Denote the $l$-th subset by $\mathbf{X}_{l} \in \Re^{L \times J}$. Then, build the ICA model on the subset $\mathbf{X}_{l}$ as follows

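$$\hat{\mathbf{S}}^{l} = \mathbf{W}_{l}\mathbf{X}_{l} \tag{26}$$

$$I_{l}^{2} = \frac{1}{L}\sum_{j=1}^{L} \hat{\mathbf{S}}^{l}(j)^{T}\,\hat{\mathbf{S}}^{l}(j) \tag{27}$$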

where $\hat{\mathbf{S}}^{l}$ represents the IC subspace of $\mathbf{X}_{l}$, $\mathbf{W}_{l}$ represents the demixing matrix of the ICA model, and $I_{l}^{2}$ is the average of the $I^{2}$ statistics of $\mathbf{X}_{l}$. Here, the number of ICs is selected according to the cumulative percentage of the Euclidean norms of the demixing matrix. For the current sampling instant $k$, a time-window matrix $\mathbf{X}^{k}_{new}$ is available. This new time-window matrix of the predicted batch is normalized using the mean and variance obtained in the modeling procedure. Similarly, the IC subspace $\hat{\mathbf{S}}^{k}_{new}$ and $I^{2}_{new}(k)$ can be obtained through independent component

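analysis as follows:

$$\hat{\mathbf{S}}^{k}_{new} = \mathbf{W}^{k}_{new}\mathbf{X}^{k}_{new} \tag{28}$$

$$I^{2}_{new}(k) = \frac{1}{L}\sum_{j=1}^{L} \hat{\mathbf{S}}^{k}_{new}(j)^{T}\,\hat{\mathbf{S}}^{k}_{new}(j) \tag{29}$$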
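As an illustration of Eqs. (26)-(29), the sketch below projects one time window onto a low-dimensional subspace and averages $s(j)^{T}s(j)$ over the window. It is not the authors' implementation: the function names and the choice `d = 3` are hypothetical, and a PCA whitening matrix is used as a stand-in for the ICA demixing matrix. When ICA is computed as whitening followed by an orthogonal rotation, the rotation leaves $s(j)^{T}s(j)$ unchanged, so under that assumption the stand-in yields the same $I^{2}$ value.

```python
import numpy as np

def whitening_matrix(X, d):
    """Hypothetical stand-in for an ICA demixing matrix: PCA whitening to d components."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / Xc.shape[0]
    eigval, eigvec = np.linalg.eigh(cov)
    order = np.argsort(eigval)[::-1][:d]                  # indices of the d largest eigenvalues
    return (eigvec[:, order] / np.sqrt(eigval[order])).T  # shape (d, J)

def ic_subspace_and_i2(X, W):
    """Project an L x J window with W and average s(j)^T s(j), as in Eqs. (26)-(27)."""
    S = W @ X.T                                           # shape (d, L); column j is s(j)
    i2 = float(np.mean(np.sum(S ** 2, axis=0)))
    return S, i2

rng = np.random.default_rng(0)
X = rng.normal(size=(25, 8))      # one synthetic window with L = 25, J = 8
Xc = X - X.mean(axis=0)           # centered window (assumption: data already normalized)
W = whitening_matrix(X, d=3)
S, i2 = ic_subspace_and_i2(Xc, W)
print(S.shape, round(i2, 6))
```

For whitened, centered data the averaged statistic equals the number of retained components, which gives a quick sanity check on any stored $I_{l}^{2}$ value.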

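Then the mutual information between the two IC subspaces $\hat{\mathbf{S}}^{l}$ and $\hat{\mathbf{S}}^{k}_{new}$ can be calculated to quantitatively estimate the non-Gaussian statistical dependency between the $l$-th subset and the current predicted one as follows, where $\langle\cdot\rangle$ denotes the average over $j = 1, \ldots, L$:

$$I(\hat{\mathbf{S}}^{l}, \hat{\mathbf{S}}^{k}_{new}) = \psi(\kappa) - \left\langle \psi[\tau_{l}(j)] + \psi[\tau^{k}_{new}(j)] \right\rangle + \psi(L) \tag{30}$$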


where $\tau_{l}(j)$ and $\tau^{k}_{new}(j)$ are the numbers of samples located in the $\kappa$-th nearest neighbor areas $\varepsilon_{l}(j)$ and $\varepsilon^{k}_{new}(j)$, respectively. Naturally, the MMI-based non-Gaussian dissimilarity at the current time can be rewritten as

$$D^{k}_{new}(l) = \frac{I^{2}_{new}(k)}{I_{l}^{2}} \cdot \frac{1}{I(\hat{\mathbf{S}}^{l}, \hat{\mathbf{S}}^{k}_{new})} \tag{31}$$
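The following sketch shows one way the quantities in Eqs. (30)-(31) could be computed. It assumes the nearest-neighbor mutual information estimator of Kraskov et al. (ref 45) with max-norm distances, takes the neighbor counts to include the query point itself, and stores samples row-wise ($L \times d$, the transpose of the IC subspace layout used in the text). The function names, the `floor` guard against non-positive estimates, and the synthetic data are illustrative assumptions rather than part of the published method.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """Digamma psi(n) for a positive integer n: -gamma + sum_{i=1}^{n-1} 1/i."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, n))

def ksg_mutual_information(A, B, kappa=3):
    """Nearest-neighbor MI estimate between paired samples A (L x dA) and B (L x dB)."""
    L = A.shape[0]
    # Max-norm distances between all sample pairs in each subspace.
    dist_a = np.max(np.abs(A[:, None, :] - A[None, :, :]), axis=2)
    dist_b = np.max(np.abs(B[:, None, :] - B[None, :, :]), axis=2)
    dist_joint = np.maximum(dist_a, dist_b)
    np.fill_diagonal(dist_joint, np.inf)          # a point is not its own neighbor
    avg = 0.0
    for j in range(L):
        eps = np.sort(dist_joint[j])[kappa - 1]   # distance to the kappa-th joint neighbor
        tau_a = int(np.sum(dist_a[j] < eps))      # includes point j itself (distance 0)
        tau_b = int(np.sum(dist_b[j] < eps))
        avg += (digamma_int(tau_a) + digamma_int(tau_b)) / L
    return digamma_int(kappa) - avg + digamma_int(L)

def dissimilarity(i2_new, i2_l, mi, floor=1e-6):
    """Form of Eq. (31); `floor` is an added guard, not part of the original formula."""
    return (i2_new / i2_l) / max(mi, floor)

rng = np.random.default_rng(0)
S_new = rng.normal(size=(200, 1))
S_similar = S_new + 0.1 * rng.normal(size=(200, 1))   # strongly dependent on S_new
S_unrelated = rng.normal(size=(200, 1))               # independent of S_new
mi_similar = ksg_mutual_information(S_similar, S_new)
mi_unrelated = ksg_mutual_information(S_unrelated, S_new)
d_similar = dissimilarity(1.0, 1.0, mi_similar)
d_unrelated = dissimilarity(1.0, 1.0, mi_unrelated)
print(d_similar < d_unrelated)
```

A subset that shares more information with the query window receives a smaller dissimilarity value, which is the ordering the selection step relies on.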

From Eq. (31), it can be seen that the MMI-based dissimilarity is built on higher-order statistics by estimating the statistical dependency between the different IC subspaces, and thus it captures the non-Gaussian features of the process data better than the correlation-based dissimilarity. After all the dissimilarity values between the $I \times (2L - 1)$ subsets and $\mathbf{X}^{k}_{new}$ are calculated, a dissimilarity index vector $\mathbf{D}^{k} \in \Re^{I \times (2L - 1)}$ is obtained. $\mathbf{D}^{k}$ is then sorted in descending order, and only the most relevant IC subspace $\hat{\mathbf{S}}^{l}_{R}$, which corresponds to the smallest dissimilarity value, is selected for constructing the local ICA-PLS model. After $\hat{\mathbf{S}}^{l}_{R}$ is determined, the relationship between $\hat{\mathbf{S}}^{l}_{R}$ and the response variable $\mathbf{y}^{l}_{R}$ can be constructed online, and the corresponding prediction for the new measurement sample $\mathbf{x}^{k}_{new}$ can be calculated as follows

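$$(\hat{\mathbf{S}}^{l}_{R})^{T} = \mathbf{T}\mathbf{P}^{T} + \mathbf{E} \tag{32}$$

$$\mathbf{y}^{l}_{R} = \mathbf{T}\mathbf{q} + \mathbf{f} \tag{33}$$

$$\mathbf{b}_{X \to y} = \mathbf{W}^{T}\left[\hat{\mathbf{S}}^{l}_{R}(\hat{\mathbf{S}}^{l}_{R})^{T}\right]^{-1}\hat{\mathbf{S}}^{l}_{R}\,\mathbf{y}^{l}_{R} \tag{34}$$

$$\hat{y}^{k}_{new} = (\mathbf{x}^{k}_{new})^{T}\mathbf{b}_{X \to y} + \mathrm{const} \tag{35}$$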

where $\mathbf{b}_{X \to y}$ is the regression coefficient vector of the online local soft sensor model. After the response variable $\hat{y}^{k}_{new}$ is estimated by the current local model, the constructed local model is discarded. When the next query sample arrives, a new local model is constructed through the same procedure.

The proposed soft sensor approach consists of two parts: off-line modeling and on-line predicting. The flowchart of the proposed technique is given in Figure 3.

Off-line modeling
1) Obtain the normal batch process dataset $Z = [X(I \times J \times K), Y(I \times M \times K)]$, unfold it into the batch-wise form $X(I \times JK)$, and normalize it.


2) Rearrange the normalized dataset into the time-series data matrix $X(K \times J)$ for each batch.
3) Determine the moving time window size $L$, and divide each training batch into several subsets through the moving time window strategy. Denote the $l$-th subset by $\mathbf{X}_{l} \in \Re^{L \times J}$.
4) Build an ICA model on the subset $\mathbf{X}_{l}$ to extract the IC subspace $\hat{\mathbf{S}}^{l}$ and $I_{l}^{2}$.
5) Store the model parameters $\hat{\mathbf{S}}^{l}$ and $I_{l}^{2}$ for each subset.
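The data-handling part of the off-line steps (batch-wise unfolding, normalization across batches, and moving-window subsets with step size 1) can be sketched as below. This is a minimal illustration on synthetic data; the function names and array sizes are hypothetical, and the column ordering of the unfolded matrix is an arbitrary choice that does not affect column-wise normalization.

```python
import numpy as np

def normalize_batchwise(X3):
    """Unfold (I, J, K) data to (I, J*K) and z-score every column across batches."""
    I, J, K = X3.shape
    X2 = X3.reshape(I, J * K)
    mean = X2.mean(axis=0)
    std = X2.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # avoid division by zero for constant columns
    Z2 = (X2 - mean) / std
    return Z2.reshape(I, J, K), mean, std    # mean/std are reused on-line for new data

def moving_windows(Xb, L):
    """Split one batch (K x J time series) into K - L + 1 windows of length L, step 1."""
    K = Xb.shape[0]
    return np.stack([Xb[s:s + L] for s in range(K - L + 1)])

rng = np.random.default_rng(1)
X3 = rng.normal(size=(6, 8, 40))             # synthetic: I = 6 batches, J = 8, K = 40
Z3, mean, std = normalize_batchwise(X3)
windows = moving_windows(Z3[0].T, L=25)      # first batch rearranged to K x J
print(windows.shape)
```

Each window would then be passed to the ICA step, and only the resulting IC subspace and $I_{l}^{2}$ value need to be stored.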

On-line predicting
1) Normalize the newly acquired data $\mathbf{x}^{k}_{new}$ at sampling instant $k$ using the mean and variance of the modeling phase.
2) Construct the current time-window matrix $\mathbf{X}^{k}_{new}$, and extract the IC subspace $\hat{\mathbf{S}}^{k}_{new}$ and $I^{2}_{new}(k)$.
3) Calculate the non-Gaussian dissimilarity index $D^{k}_{new}(l)$ between all the $I \times (2L - 1)$ subsets and $\mathbf{X}^{k}_{new}$.
4) After all $D^{k}_{new}(l)$ are computed, sort them in descending order.
5) The IC subspace $\hat{\mathbf{S}}^{l}$ that minimizes the dissimilarity index is selected as the local modeling sample, denoted $\hat{\mathbf{S}}^{l}_{R}$.
6) Build the online local regression model between $\hat{\mathbf{S}}^{l}_{R}$ and the response variable $\mathbf{y}^{l}_{R}$ according to Eqs. (32)-(35); the corresponding prediction $\hat{y}^{k}_{new}$ is then obtained.
7) After $\hat{y}^{k}_{new}$ is estimated, the constructed local model is discarded; when the next observation sample $\mathbf{x}^{k+1}_{new}$ arrives, return to step 1.
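Steps 4) to 6) can be sketched as follows. The example evaluates the closed form of Eq. (34) directly by least squares in the IC space instead of running the PLS decomposition of Eqs. (32)-(33), and it takes the constant in Eq. (35) to be the mean of the local response; both are simplifying assumptions made for illustration. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def select_most_similar(dissimilarities):
    """Index of the stored subset with the smallest dissimilarity value."""
    return int(np.argmin(dissimilarities))

def local_regression_coefficients(W, S, y):
    """Closed form of Eq. (34): b = W^T [S S^T]^{-1} S y, with y centered.

    W: demixing matrix (d x J), S: selected IC subspace (d x L), y: response (L,).
    Returns the coefficient vector in the original variable space and the offset.
    """
    y_mean = y.mean()
    b_s = np.linalg.solve(S @ S.T, S @ (y - y_mean))   # coefficients in IC space
    return W.T @ b_s, y_mean

def predict(x_new, b, const):
    """Eq. (35): prediction for one normalized measurement vector."""
    return float(x_new @ b + const)

rng = np.random.default_rng(2)
X = rng.normal(size=(25, 8))
X -= X.mean(axis=0)                         # centered local window, L = 25, J = 8
W = rng.normal(size=(3, 8))                 # placeholder demixing matrix, d = 3
S = W @ X.T
beta = np.array([1.0, -2.0, 0.5])
y = S.T @ beta + 4.0                        # synthetic response, linear in the ICs

best = select_most_similar([0.9, 0.2, 0.5])
b, const = local_regression_coefficients(W, S, y)
x_new = rng.normal(size=8)
y_hat = predict(x_new, b, const)
print(best, y_hat)
```

Because the model is rebuilt for every query sample, only this small linear solve is repeated on-line; the ICA results for the training windows are read from storage.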

As shown in Figure 3, the computational load of the proposed method is concentrated mostly in the off-line modeling stage, where $\hat{\mathbf{S}}^{l}$ and $I_{l}^{2}$ are calculated by ICA for all the training batches at each specific time interval; the computational load of the on-line prediction stage is comparatively light. Note that repeated ICA decomposition of the training batches should be avoided, because only $\hat{\mathbf{S}}^{l}$ and $I_{l}^{2}$ are needed in the on-line local modeling. In practice, we only need to store the IC subspaces $\hat{\mathbf{S}}^{l}$ and the statistics $I_{l}^{2}$ in order to reduce computational time for


on-line prediction. Hence, the proposed approach is suitable for practical on-line application.

Figure 1. An illustration of problems of the uneven-phase and batch-to-batch variations in batch processes.

Figure 2. The schematic diagram of the new search strategy.


Figure 3. The flowchart of the just-in-time learning model based on non-Gaussian dissimilarity measure.

4. APPLICATION EXAMPLE
In this section, a benchmark fed-batch Penicillin Fermentation process is used to verify the effectiveness of the proposed online quality prediction method. The Penicillin Fermentation process is a typical nonlinear, multiphase, dynamic and non-Gaussian industrial process.50,51 Implementing an effective online prediction


approach is very important for keeping operating conditions stable and safe, improving product quality and yield, and increasing the economic profit of the Penicillin Fermentation process. Generally, in the typical Penicillin Fermentation process, microorganisms are cultivated in the initial pre-culture phase until they reach cell densities adequate for penicillin production. When most of the initially added substrate has been consumed by the microorganisms, the substrate feed begins. Penicillin usually starts to be produced in the exponential growth phase and continues to be generated until the stationary phase. In order to obtain a high penicillin formation rate, it is necessary to maintain a minimum cell growth rate during the fermentation process. This is why glucose is fed continuously into the reactor instead of being added all at once at the beginning. In addition, two cascaded PID controllers regulate the pH and temperature of the fermenter by adjusting the acid/base flow ratio and the hot/cold water flow ratio, respectively. In contrast, the substrate and air are fed into the fermenter under open-loop operation during the fed-batch mode. A more detailed description of the Penicillin Fermentation process can be found in ref 52. In this paper, the fed-batch Penicillin Fermentation process data sets are generated by the simulator PenSim v2.0, which was developed by the monitoring and control group of the Illinois Institute of Technology.52 In order to illustrate the uneven-phase and batch-to-batch variation issues, perturbations with different ranges are introduced to each batch, and the durations between the pre-culture phase and the fed-batch phase are adjusted to simulate the uneven-phase characteristics. The complete duration of each batch is 400 h with a sampling interval of 0.5 h. A total of eight predictor variables and two response variables are considered to construct the soft sensor model, as shown in Table 1.
The typical process trajectories of the predictor variables are shown in Figure 4. Before the proposed approach is implemented, an appropriate number of normal training batches must be collected to construct a good soft sensor model for quality prediction. If the number of normal training batches is small, the soft sensor model does not contain sufficient process information to reflect the batch-to-batch variations, which means that the selected local modeling samples may not


sufficiently represent the current operating condition, which leads to poor prediction accuracy. However, a large number of modeling batches may contain a lot of redundant information, leading to a heavy computational burden and a large prediction delay. Therefore, selecting an appropriate number of modeling batches is important for the effective implementation of the soft sensor method. Nevertheless, determining the number of normal modeling batches is still an open question without a unified calculation rule. In practice, the required number of normal batches depends on two aspects: one is the real characteristics of the batch process to be predicted; the other is the soft sensor technique to be adopted. In this paper, the number of normal training batches is determined by a tradeoff between the required ability to reflect the batch-to-batch variations and the allowed prediction deviation. In other words, we make sure not only that the batch-to-batch statistical properties of the process trajectories are adequately represented in the penicillin fermentation process, but also that enough samples are available for constructing a real-time and reliable regression model. Figure 5 gives the sensitivity analysis of the prediction results with respect to the number of normal training batches. On this basis, 60 batches are simulated under normal operating conditions for soft sensor modeling.

Figure 4. Trajectories of the fed-batch Penicillin Fermentation process variables


Table 1. Variables used in the fed-batch Penicillin Fermentation process

Variable No.    Variable
1               Aeration rate (L/h)
2               Agitator power (W)
3               Substrate feed rate (L/h)
4               Substrate concentration (g/L)
5               Dissolved oxygen concentration (g/L)
6               Culture volume (L)
7               Carbon dioxide concentration (g/L)
8               Generated heat (cal)
y1              Biomass concentration (g/L)
y2              Penicillin concentration (g/L)

Figure 5. Sensitivity analysis of the normal training batch number for the prediction results

In order to quantitatively compare the prediction performance of different soft sensor approaches, the root mean square error (RMSE) and the correlation coefficient are employed in this paper. The RMSE and the correlation coefficient $r$ are defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n_t}\sum_{k=1}^{n_t}\left(y_k - \hat{y}_k\right)^{2}} \tag{36}$$

$$r = \frac{\sum_{k=1}^{n_t}(y_k - m_0)(\hat{y}_k - \hat{m}_0)}{\sqrt{\sum_{k=1}^{n_t}(y_k - m_0)^{2}}\,\sqrt{\sum_{k=1}^{n_t}(\hat{y}_k - \hat{m}_0)^{2}}} \tag{37}$$

where $n_t$ represents the total number of test samples, $y_k$ is the real value, and $\hat{y}_k$ is the predicted value. $m_0$ and $\hat{m}_0$ are the mean values of the real and predicted variables $y_k$ and $\hat{y}_k$, respectively. The larger the correlation coefficient, the better the prediction performance. In this study, the conventional MPLS and Co-JITL are also applied to demonstrate the advantage of the proposed soft sensor modeling approach. Detailed online quality prediction results of MPLS, Co-JITL and the proposed method are given and analyzed as follows. In the first test case, biomass concentration is chosen as the response variable. The prediction results of MPLS, Co-JITL and the proposed method (NG-JITL) are shown in Figure 6, and the corresponding prediction errors of the three methods are depicted in Figure 8(a). The number of latent variables used in MPLS is determined by trial and error to maximize the prediction performance. Figure 6 shows that MPLS does not perform well: there are significant deviations between the real and predicted values across the complete operating duration. In Figure 8(a), one can readily see that the prediction errors of MPLS are significantly larger than those of the Co-JITL and NG-JITL methods. There are two main reasons. First, MPLS is a linear and global method, which is more suitable for linear and unimodal batch processes, whereas the penicillin fermentation process exhibits severe nonlinear and multiphase characteristics due to process shifts, so the MPLS method is ill-suited. Second, MPLS only considers second-order statistics and cannot efficiently capture the non-Gaussian features of the penicillin fermentation process. In contrast to MPLS, the Co-JITL method displays improved prediction performance, as shown in Figure 6. Here, the model parameters are set as $L = 25$ and $\lambda = 0.85$, which are determined by trial and error to maximize the prediction performance. The number of principal components is chosen as 5, which explains most of the process information. Compared to MPLS, the Co-JITL method builds a local


regression model, and it is constructed online. The Co-JITL method can therefore track the dynamic changes of the penicillin fermentation process well and is also suitable for quality prediction of the nonlinear process. However, the Co-JITL method is based on PCA, which cannot deal with non-Gaussian process data. In contrast, the proposed method not only tracks the dynamic changes but also characterizes the non-Gaussianity of the penicillin fermentation process, and thus provides superior prediction performance, as shown in Figures 6 and 8(a). Here, the moving time window size is also set as $L = 25$. In addition, it should be noted that both the MPLS and Co-JITL methods give poor prediction performance in the transition phase between the pre-culture and fed-batch phases, where the uneven-phase data characteristics have been simulated. This is because the transition phase contains samples from different operation phases of different batches, which cannot reflect the current process characteristics efficiently. The transition phase usually also shows severe nonlinearity and non-Gaussianity. Therefore, in such a situation, both the MPLS and Co-JITL methods display poor prediction performance due to their inherent limitations. In comparison, the proposed method improves the accuracy of local modeling sample selection in the JITL model by using the non-Gaussian dissimilarity measure, and it shows stable and accurate prediction performance in the transition phase.
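As a rough check on the window setting, the values stated in the text ($J = 8$ predictor variables, $I = 60$ training batches, $L = 25$) can be substituted into the window-length rule and the search-range size. The calculation assumes that all 60 training batches are searched for every query sample; it is an illustration, not a figure reported by the authors.

```latex
\begin{aligned}
2J &= 16 \;\le\; L = 25 \;\approx\; 3J = 24, \\
\theta &= I \times (2L - 1) = 60 \times 49 = 2940 .
\end{aligned}
```

Under that assumption, each query sample is compared against 2940 stored subsets rather than against every window of every training batch.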

Figure 6. Biomass concentration prediction results by three methods.


Next, the penicillin concentration is predicted by MPLS, Co-JITL and the proposed method, as shown in Figure 7. The prediction errors of the three methods are shown in Figure 8(b). Figures 7 and 8(b) show that both Co-JITL and the proposed method give more accurate online prediction results than MPLS. In addition, both MPLS and Co-JITL provide oscillatory predictions in the transition phase, whereas the proposed method gives a smooth prediction result there. Figures 6-8 demonstrate that the proposed method is more accurate and reliable for quality prediction of biomass and penicillin concentration, with very small prediction errors. For quantitative comparison, the RMSE and correlation coefficient values are summarized in Table 2. Table 2 indicates that MPLS gives poor prediction performance, with high RMSE values and small correlation coefficients. The main reason is that MPLS treats all the training batch data as a unit to build a global prediction model, which ignores the multiphase characteristic of the Penicillin Fermentation process. In comparison, the Co-JITL method outperforms MPLS in dealing with the time-varying dynamics and nonlinearity of batch processes and shows better prediction performance. However, it still cannot provide reliable and accurate predictions, with relatively high RMSE values and small correlation coefficients, as shown in Table 2. This is because both the MPLS and Co-JITL methods only consider the second-order features of process data and do not effectively capture the non-Gaussian characteristics of the penicillin fermentation process. In contrast, the proposed approach performs best and shows the lowest RMSE values and largest correlation coefficients among the three prediction methods, as seen in Table 2.
This can be attributed to the fact that the proposed method is based on the non-Gaussian dissimilarity measure and ICA-PLS, which can efficiently extract higher-order statistical information and is thus more suitable for quality prediction of non-Gaussian process data. To analyze the offline and online efficiency of the proposed method, Table 3 provides the CPU running time of the three soft sensors for predicting all the test samples. The configuration of the computer is as follows: OS: Windows 7 (32 bit); CPU: Pentium(R) Dual-Core E6600 (3.06 GHz); RAM: 2 GB. The version of


MATLAB is R2012a. It can be seen that the CPU time for offline training of NG-JITL is much larger, whereas the time for online modeling is comparatively light. The average online prediction time for a single sample is 0.2241 seconds. In addition, in practical industrial applications, the industrial computer is much more powerful than a traditional PC, so the proposed method is suitable for online application.

Figure 7. Penicillin concentration prediction results by three methods.

Figure 8. Prediction errors by three methods ((a) Biomass Conc. (b) Penicillin Conc.).


Table 2. Quantitative comparison of three methods for two test cases

Methods    RMSE(y1)   RMSE(y2)   r(y1)    r(y2)
MPLS       0.2847     0.3791     0.9577   0.9278
Co-JITL    0.0659     0.0974     0.9871   0.9860
NG-JITL    0.0231     0.0370     0.9903   0.9885
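The two criteria reported in Table 2 follow the usual definitions of the root mean square error and the Pearson correlation coefficient. A minimal sketch of how they could be computed is given below; the function names and the small example vectors are illustrative and are not data from the paper.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error between real and predicted values."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def corr_coef(y, y_hat):
    """Correlation coefficient between real and predicted values."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    dy, dh = y - y.mean(), y_hat - y_hat.mean()
    return float(np.sum(dy * dh) / np.sqrt(np.sum(dy ** 2) * np.sum(dh ** 2)))

y_true = [1.0, 2.0, 3.0, 4.0]      # made-up values for illustration only
y_pred = [1.0, 2.0, 3.0, 6.0]
print(rmse(y_true, y_pred), corr_coef(y_true, y_pred))
```

A lower RMSE and a correlation coefficient closer to 1 both indicate better agreement between the predicted and real trajectories.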

Table 3. Operation time (second) of three methods

Methods    Operation time
NG-JITL    Offline: 2653.92 s; Online: 179.30 s
Co-JITL    Whole run: 89.175 s
MPLS       Whole run: 18.589 s
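The per-sample online time quoted in the text is consistent with the NG-JITL online total in Table 3 if the test set is taken to be one batch of 400 h sampled every 0.5 h. That test-set size is an assumption inferred from the stated batch duration and sampling interval, not a figure given explicitly:

```latex
n_t = \frac{400\ \mathrm{h}}{0.5\ \mathrm{h}} = 800, \qquad
\frac{179.30\ \mathrm{s}}{800} \approx 0.2241\ \mathrm{s\ per\ sample}.
```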

5. CONCLUSIONS
In this paper, a novel just-in-time learning model based on a non-Gaussian dissimilarity measure is proposed for quality prediction of complex batch processes. Unlike traditional soft sensors such as MPLS, which build a global regression model, the proposed method constructs an online local model based on JITL, which copes well with changes in batch process characteristics as well as nonlinearity. Moreover, for the local JITL modeling, the relevant samples are selected not only along the time axis but also along the batch-to-batch direction, so the process dynamic characteristics and batch-to-batch variations can be effectively captured. Therefore, the novel search strategy is more suitable for tackling the uneven-phase and batch-to-batch variation problems in batch processes. Furthermore, compared to the correlation-based JITL model, the proposed JITL local model based on the non-Gaussian dissimilarity measure can capture non-Gaussian features and is more suitable for non-Gaussian process data. After the local modeling data are selected, the simple and efficient non-Gaussian regression method ICA-PLS is employed to construct the prediction model. The reliability and effectiveness of the proposed approach have been evaluated on the fed-batch penicillin fermentation process; the results show that the proposed soft sensor gives superior prediction performance compared with the MPLS and Co-JITL methods. However, there


are still some open questions, such as the size of the moving time window and the number of training batches, that should be further investigated in future work. In practice, although the model parameters are determined empirically or by cross validation in the present paper and the results show good performance, the parameter tuning process may differ across industrial processes with different data characteristics. The design of a unified parameter tuning approach is therefore currently underway.

ACKNOWLEDGMENTS
This work is supported by the National Natural Science Foundation of China under Grants 61174119, 61034006, and 60774070, and by the fundamental research funds for the key laboratory of Liaoning Province education department, 2015.

REFERENCES
(1) Nomikos, P.; MacGregor, J. F. Monitoring batch processes using multiway principal component analysis. AIChE J. 1994, 40, 1361-1375.
(2) Nomikos, P.; MacGregor, J. F. Multivariate SPC charts for monitoring batch processes. Technometrics. 1995, 37, 41-59.
(3) Yin, S.; Ding, S. X.; Abandan Sari, A. H.; Hao, H. Data-driven monitoring for stochastic systems and its application on batch process. International Journal of Systems Science. 2013, 44, 1366-1376.
(4) Ge, Z.; Song, Z.; Gao, F. Review of recent research on data-based process monitoring. Ind. Eng. Chem. Res. 2013, 52, 3543-3562.
(5) Zamprogna, E.; Barolo, M.; Seborg, D. E. Estimating product composition profiles in batch distillation via partial least squares regression. Control Engineering Practice. 2004, 12, 917-929.
(6) Ge, Z.; Song, Z. Online monitoring and quality prediction of multiphase batch processes with uneven length problem. Ind. Eng. Chem. Res. 2014, 53, 800-811.
(7) Zamprogna, E.; Barolo, M.; Seborg, D. E. Development of a soft sensor for a batch distillation column using linear and nonlinear PLS regression techniques. Control Engineering Practice. 2004, 12, 917-929.
(8) Yin, S.; Ding, S. X.; Haghani, A.; Hao, H.; Zhang, P. A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process. J. Process Contr. 2012, 22, 1567-1581.
(9) Ding, S.; Yin, S.; Peng, K.; Hao, H.; Shen, B. A novel scheme for key performance indicator prediction and diagnosis with application to an industrial hot strip mill. 2012.
(10) Kano, M.; Nakagawa, Y. Data-based process monitoring, process control, and quality improvement: Recent developments and applications in steel industry. Comput. Chem. Eng. 2008, 32, 12-24.
(11) Ge, Z.; Song, Z.; Kano, M. External analysis-based regression model for robust soft sensing of multimode chemical processes. AIChE J. 2014, 60, 136-147.


(12) Zhang, Y.; Teng, Y.; Zhang, Y. Complex process quality prediction using modified kernel partial least squares. Chem. Eng. Sci. 2010, 65, 2153-2158.
(13) Liu, Y.; Gao, Z.; Chen, J. Development of soft-sensors for online quality prediction of sequential-reactor-multi-grade industrial processes. Chem. Eng. Sci. 2013, 102, 602-612.
(14) Ahmad, I.; Kano, M.; Hasebe, S.; Kitada, H.; Murata, N. Gray-box modeling for prediction and control of molten steel temperature in tundish. J. Process Contr. 2014, 24, 375-382.
(15) Nomikos, P.; MacGregor, J. F. Multi-way partial least squares in monitoring batch processes. Chemom. Intell. Lab. Syst. 1995, 30, 97-108.
(16) Undey, C.; Cinar, A. Statistical monitoring of multistage, multiphase batch processes. 2002.
(17) Ge, Z.; Zhao, L.; Yao, Y.; Song, Z.; Gao, F. Utilizing transition information in online quality prediction of multiphase batch processes. J. Process Contr. 2012, 22, 599-611.
(18) Ge, Z.; Song, Z.; Gao, F.; Wang, P. Information-Transfer PLS Model for Quality Prediction in Transition Periods of Batch Processes. Ind. Eng. Chem. Res. 2013, 52, 5507-5511.
(19) Ge, Z.; Song, Z.; Zhao, L.; Gao, F. Two-level PLS model for quality prediction of multiphase batch processes. Chemom. Intell. Lab. Syst. 2014, 130, 29-36.
(20) Zhao, C.; Gao, F. Between-phase-based statistical analysis and modeling for transition monitoring in multiphase batch processes. AIChE J. 2012, 58, 2682-2696.
(21) Wilson, D.; Irwin, G.; Lightbody, G. In Nonlinear PLS modelling using radial basis functions; American Control Conference, 1997; pp 3275-3276.
(22) Blanco, M.; Coello, J.; Iturriaga, H.; Maspoch, S.; Pages, J. NIR calibration in non-linear systems: different PLS approaches and artificial neural networks. Chemom. Intell. Lab. Syst. 2000, 50, 75-82.
(23) Kim, K.; Lee, J.-M.; Lee, I.-B. A novel multivariate regression approach based on kernel partial least squares with orthogonal signal correction. Chemom. Intell. Lab. Syst. 2005, 79, 22-30.
(24) Ge, Z.; Gao, F.; Song, Z. Mixture probabilistic PCR model for soft sensing of multimode processes. Chemom. Intell. Lab. Syst. 2011, 105, 91-105.
(25) Zhao, L.; Zhao, C.; Gao, F. Phase transition analysis based quality prediction for multi-phase batch processes. Chin. J. Chem. Eng. 2012, 20, 1191-1197.
(26) Hyvarinen, A. Fast and robust fixed-point algorithms for independent component analysis. Neural Networks, IEEE Transactions on. 1999, 10, 626-634.
(27) Kano, M.; Tanaka, S.; Hasebe, S.; Hashimoto, I.; Ohno, H. Monitoring independent components for fault detection. AIChE J. 2003, 49, 969-976.
(28) Kano, M.; Hasebe, S.; Hashimoto, I.; Ohno, H. Evolution of multivariate statistical process control: application of independent component analysis and external analysis. Comput. Chem. Eng. 2004, 28, 1157-1166.
(29) Chen, J.; Wang, X. A new approach to near-infrared spectral data analysis using independent component analysis. J. Chem. Inf. Comput. Sci. 2001, 41, 992-1001.
(30) Kaneko, H.; Arakawa, M.; Funatsu, K. Development of a new regression analysis method using independent component analysis. Journal of chemical information and modeling. 2008, 48, 534-541.
(31) Kaneko, H.; Arakawa, M.; Funatsu, K. Development of a new soft sensor method using independent component analysis and partial least squares. AIChE J. 2009, 55, 87-98.
(32) Shao, X.; Wang, W.; Hou, Z.; Cai, W. A new regression method based on independent component analysis. Talanta. 2006, 69, 676-680.
(33) Helland, K.; Berntsen, H. E.; Borgen, O. S.; Martens, H. Recursive algorithm for partial least squares regression. Chemom. Intell. Lab. Syst. 1992, 14, 129-137.


(34) Bontempi, G.; Birattari, M.; Bersini, H. Lazy learning for local modelling and control design. International Journal of Control. 1999, 72, 643-658.
(35) Ge, Z.; Song, Z. A comparative study of just-in-time-learning based methods for online soft sensor modeling. Chemom. Intell. Lab. Syst. 2010, 104, 306-317.
(36) Kim, S.; Kano, M.; Hasebe, S.; Takinami, A.; Seki, T. Long-Term Industrial Applications of Inferential Control Based on Just-In-Time Soft-Sensors: Economical Impact and Challenges. Ind. Eng. Chem. Res. 2013, 52, 12346-12356.
(37) Kano, M.; Fujiwara, K. Virtual sensing technology in process industries: trends and challenges revealed by recent industrial applications. J. Chem. Eng. Jpn. 2013, 46, 1-17.
(38) Cheng, C.; Chiu, M.-S. A new data-based methodology for nonlinear process modeling. Chem. Eng. Sci. 2004, 59, 2801-2810.
(39) Fujiwara, K.; Kano, M.; Hasebe, S.; Takinami, A. Soft-sensor development using correlation-based just-in-time modeling. AIChE J. 2009, 55, 1754-1765.
(40) Xie, L.; Zeng, J.; Gao, C. Novel just-in-time learning-based soft sensor utilizing non-Gaussian information. Control Systems Technology, IEEE Transactions on. 2014, 22, 360-368.
(41) Fan, M.; Ge, Z.; Song, Z. Adaptive Gaussian Mixture Model-based relevant sample selection for JITL soft sensor development. Ind. Eng. Chem. Res. 2014, 53, 19979-19986.
(42) Raich, A.; Cinar, A. Statistical process monitoring and disturbance diagnosis in multivariable continuous processes. AIChE J. 1996, 42, 995-1009.
(43) Hyvärinen, A.; Oja, E. Independent component analysis: algorithms and applications. Neural Networks. 2000, 13, 411-430.
(44) Kraskov, A.; Stögbauer, H.; Andrzejak, R. G.; Grassberger, P. Hierarchical clustering using mutual information. EPL (Europhysics Letters). 2005, 70, 278.
(45) Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Physical Review E. 2004, 69, 066138.
(46) Jung, C.-S.; Seo, H.; Kang, H.-G. Estimating redundancy information of selected features in multi-dimensional pattern classification. Pattern Recog. Lett. 2011, 32, 590-596.
(47) Rashid, M. M.; Yu, J. A new dissimilarity method integrating multidimensional mutual information and independent component analysis for non-Gaussian dynamic process monitoring. Chemom. Intell. Lab. Syst. 2012, 115, 44-58.
(48) Johnson, R. A.; Wichern, D. W., Applied multivariate statistical analysis. Prentice hall Upper Saddle River, NJ: 2002; Vol. 5.
(49) Lu, N.; Yang, Y.; Wang, F.; Gao, F. In A stage-based monitoring method for batch processes with limited reference data; 7th International Symposium on Dynamics and Control of Process Systems (Dycops-7), Boston, USA, 2004.
(50) Yoo, C. K.; Lee, J.-M.; Vanrolleghem, P. A.; Lee, I.-B. On-line monitoring of batch processes using multiway independent component analysis. Chemom. Intell. Lab. Syst. 2004, 71, 151-163.
(51) Jia, Z. Y.; Wang, P.; Gao, X. J. Process Monitoring and Fault Diagnosis of Penicillin Fermentation Based on Improved MICA. Advanced Materials Research. 2012, 591, 1783-1788.
(52) Birol, G.; Ündey, C.; Cinar, A. A modular simulation package for fed-batch fermentation: penicillin production. Comput. Chem. Eng. 2002, 26, 1553-1565.
