Unsupervised-Multiscale-Sequential-Partitioning and Multiple-SVDD

Article Cite This: Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

pubs.acs.org/IECR

Unsupervised-Multiscale-Sequential-Partitioning and Multiple-SVDD-Model-Based Process-Monitoring Method for Multiphase Batch Processes

Jianlin Wang,* Kepeng Qiu, Weimin Liu, Tao Yu, and Liqiang Zhao


College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China

ABSTRACT: For the effective monitoring of batch processes with uneven multiphases, phase partitioning and discriminant analysis are two critical problems. To fully solve these two problems, a systematic strategy including fuzzy phase partitioning and hybrid discriminant analysis is proposed. First, using a new unsupervised, multiscale, sequential partition (UMSP), each batch is divided into phases with transitions using different clustering scales. On this basis, two multiple-support-vector-data-description (SVDD) models are built for online phase partitioning and monitoring, and a hybrid-discriminant-analysis method is then developed for online fault detection. The effectiveness and advantages of the proposed method are illustrated with a 2D, handwritten example and a fed-batch penicillin-fermentation process.

Received: June 3, 2018. Revised: October 3, 2018. Accepted: December 2, 2018. Published: December 3, 2018. DOI: 10.1021/acs.iecr.8b02486

1. INTRODUCTION

Batch and semibatch processes are important manufacturing approaches.1,2 Because they can easily produce products according to customer requirements, they have become increasingly popular in important modern production areas, such as the chemical, semiconductor, food, and biological industries.3−5 However, unavoidable process instability may cause faults and reduce product quality. Thus, online process monitoring is important for ensuring product quality and process security in batch processes.

Because of the rapid development of data-acquisition systems and storage technology, data-driven multivariate-statistical-process-monitoring (MSPM) methods have become a topic of mainstream research. Many single-model MSPM methods6−9 based on multiway principal-component analysis (MPCA),10 multiway partial least squares (MPLS),11 multiway independent-component analysis (MICA),12 and other methods have been put forward. In batch processes, different initial settings, multiple processing units, and complex process disturbances may result in numerous problems, such as multiple operation phases,13 unstable transitions,14 uneven lengths,15 and batch-to-batch variation.16 Because the simple single-model MSPM methods assume that process-variable correlations do not change over the entire batch, the resulting monitoring models may be inaccurate. Thus, effective techniques for uneven multiphase online monitoring of processes that involve transitions are essential.

The phase-partition methods that are currently in use can be divided into two main categories: (1) model identification and (2) cluster analysis. Model identification builds time-varying models using MSPM methods and achieves phase partitioning on the basis of information extracted from the models. Sun et al.17 proposed a phase-partition method based on MPCA models, Wang et al.18 achieved phase partitioning by calculating the variation of features in MPCA models, Zhao and Sun19 proposed a sequential-stepwise-phase-partition (SSPP) algorithm to divide phases automatically according to changes in MPCA monitoring statistics, and Yan20 developed SSPP into a new iterative-two-step-sequential-phase-partition (ITSPP) method using the degradation degree. To solve the sequential phase partition for uneven batch processes, Zhao et al.21 built two PCA models: one was for phase partition, which analyzed the process correlation among uneven-length groups; the other was used for process monitoring, which realized online phase identification and process monitoring. On the basis of SSPP, Li et al.22 built phase models based on MPCA by arranging uneven batch-process data and established online-monitoring strategies based on within-phase regions and between-phase shifting regions; Zhang et al.23 proposed a two-directional-concurrent-mode-identification and sequential-phase-division strategy for multimode and multiphase batch-process monitoring. On the basis of a pseudo-time-slice construction and PCA, this strategy automatically identified each batch's mode information along the batch direction and simultaneously determined the phase segments in the time sequence along the time direction. However, all of these methods are based on PCA, which assumes that the data distribution is Gaussian. Because non-Gaussian data distributions are common in batch processes, PCA-based phase-partition methods reduce the phase-partitioning and monitoring accuracy of batch processes.

Cluster analysis mainly achieves phase partitioning by grouping batch data sets into clusters on the basis of a least-squared-error criterion, a method that places no Gaussian limitation on the data. Lu et al.24 proposed a sub-PCA method in which the load matrix of MPCA models is clustered using k-means (KM), Seng and Srinivasan25 employed a fuzzy c-means (FCM) clustering method to achieve phase partitioning, and Luo et al.26 used a warped K-means (WKM) algorithm that considers the sequentiality of batch data for phase partitioning. These three clustering-based phase-partition methods can produce unreasonable results because they do not consider the transition phases that occur between phases in batch processes. For this reason, Zhao et al.27 extended the sub-PCA method to soft-transition multiple PCA (STMPCA), which is used to calculate membership grades of each phase; Ng and Srinivasan28 used FCM to improve the interpretability of the transition phase in batch processes; Wang et al.29 proposed a two-step stage-division method that employs sub-PCA for steady-phase identification and support-vector data description (SVDD) for transition-phase identification; and Luo et al.30 added a hard sequentiality constraint to extend FCM into a sequence-constrained fuzzy c-means (SCFCM) algorithm.

However, three drawbacks of the above methods reduce the accuracy of phase partitioning in batch processes: (1) the phase-partition results may be poor because of improper input parameters, such as the number of clusters and the initial clustering centers; (2) the procedure of phase partitioning neglects the sequentiality of the data obtained during the batch process; and (3) the online phase identification still lacks theoretical guidance. Therefore, it is important to develop more effective phase-partition methods that can automatically determine the optimal phase-partition results for batch processes based on process-data sequentiality.

In this paper, a novel fuzzy phase-partition method and a hybrid-discriminant-analysis strategy are proposed for phase partitioning and monitoring in batch processes with multiple operation phases. First, an unsupervised-multiscale-sequential-partition (UMSP) method is proposed for phase partitioning of batch processes. This method has four advantages: (1) UMSP ensures a reasonable phase-partition result; (2) UMSP can achieve different phase-partition results through different clustering scales; (3) the entire historical batch data set is divided into several phases using different clustering scales, and the optimal phase number of the batch process is determined according to the sum of the quadratic error (SQE) and the partition-performance combination index (PPCI) calculated for different phase-partition results; and (4) on the basis of the optimal phase number, each historical batch data set is divided into the same number of phases through different clustering scales, and the optimal partitioning result of the batch process is selected according to SQE. Then, to assign a new sample to the corresponding phase and judge whether the new sample is a fault, a hybrid-discriminant-analysis strategy is proposed for online phase partitioning and process monitoring. On the basis of SVDD, the online phase-partition model is built using a historical batch data set of the same phase, rearranged along the variable direction, that contains only phase-sensitive variables, and the online process-monitoring model is built using a historical batch data set of the same phase rearranged along the variable direction. Finally, for comparison, the KM, FCM, WKM, and SCFCM phase-partition methods are used to achieve phase partitioning of batch data. A 2D, handwritten example and a fed-batch penicillin-fermentation process are used to illustrate the effectiveness and the advantages of the proposed methods.

2. PRELIMINARIES

To better analyze the relationship between the multiple-support-vector-data-description (SVDD) modeling method and the SVDD modeling method, a brief introduction to the SVDD modeling method is provided in this section.

2.1. Support-Vector Data Description (SVDD). SVDD is a single-class classifier that attempts to find the minimum sphere that contains all (or most of) the data objects.31 Given a set of training data X = {x1, x2, ···, xn} with n samples based on the nonlinear transformation function Φ(·), the SVDD model is built with the following optimization objective function and constraints:

$$\min_{R,\,a,\,\xi}\; R^2 + \varepsilon \sum_{i=1}^{n} \xi_i \quad \text{s.t.}\quad [\mathrm{knorm}(x_i, a)]^2 \le R^2 + \xi_i,\;\; \xi_i \ge 0,\;\; i = 1, 2, \cdots, n \tag{1}$$

The term knorm(xi, a) refers to the kernel distance between the sample, xi ∈ X, and the sphere center, a, which is defined as follows:

$$\mathrm{knorm}(x_i, a) = \lVert \Phi(x_i) - a \rVert = \left[ K(x_i, x_i) - 2\sum_{j=1}^{n} \alpha_j K(x_i, x_j) + \sum_{j=1}^{n}\sum_{l=1}^{n} \alpha_j \alpha_l K(x_j, x_l) \right]^{1/2} \tag{2}$$

where the kernel function K(xi, xj) = ⟨Φ(xi), Φ(xj)⟩ represents the nonlinear inner product in the high-dimensional space; R is the sphere radius; ξi (i = 1, 2, ···, n) are slack variables; and ε is a penalty term that can be calculated using the parameter D, which is defined in eq 3.

$$D = \frac{1}{n\varepsilon} \tag{3}$$

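To solve eq 1, the Lagrangian-multipliers data sets α = [α1, α2, ···, αn] and β = [β1, β2, ···, βn] are incorporated into eq 1:

$$L(R, a, \xi_i, \alpha, \beta) = R^2 + \varepsilon \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left( R^2 + \xi_i - \lVert \Phi(x_i) - a \rVert^2 \right) - \sum_{i=1}^{n} \beta_i \xi_i, \qquad \alpha_i \ge 0,\; \beta_i \ge 0 \tag{4}$$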

Setting its partial derivatives with respect to R, a, and ξi to 0 results in

$$\begin{cases} \dfrac{\partial L}{\partial R} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i = 1 \\[2mm] \dfrac{\partial L}{\partial a} = 0 \;\Rightarrow\; a = \sum_{i=1}^{n} \alpha_i \Phi(x_i) \\[2mm] \dfrac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; \varepsilon - \alpha_i - \beta_i = 0 \end{cases} \tag{5}$$

According to eq 5, eq 4 can be changed to its dual form:

$$L = \sum_{i=1}^{n} \alpha_i K(x_i, x_i) - \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j K(x_i, x_j) \tag{6}$$

The Lagrangian multipliers data set, α, can be obtained by maximizing eq 6. Generally, as defined by eq 7, the hypersphere radius, Ro (1 ≤ o ≤ n), equals the kernel distance from the center, a, to a sample, xo.

$$a = \sum_{i=1}^{n} \alpha_i \Phi(x_i), \qquad R_o = \mathrm{knorm}(x_o, a) \tag{7}$$

To calculate the classification limit (Rlimit), the hypersphere-radius data set, $R = [R_i]_{i=1}^{n}$, is first sorted in descending order. A confidence, ω, is then introduced to place ω (%) of the training data inside the hypersphere. Finally, as shown in eq 8, the mean value of the sorted radii R1 to Rn0 (where n0 = ⌊n(1 − ω)⌋, and ⌊·⌋ rounds to the nearest integer toward −∞) is taken as the classification limit, Rlimit.

$$R_{\mathrm{limit}} = \frac{\sum_{i=1}^{n_0} R_i}{n_0} \tag{8}$$

In the classification process, to test a new data sample, xtest, the kernel distance knorm(xtest, a) should be calculated. The sample xtest belongs to the same class as the training data when this distance is smaller than or equal to the classification limit.

$$\mathrm{knorm}(x_{\mathrm{test}}, a) \le R_{\mathrm{limit}} \tag{9}$$

SVDD can easily handle one-class classification problems, but it cannot handle online multiphase partitioning and monitoring. Thus, in Section 3, we propose an unsupervised-multiscale-sequential-partition (UMSP) method for offline phase partitioning; the results of this offline partitioning are used to build multiple SVDD models that can be used for online phase identification and monitoring, as shown in Section 4. In addition, because the results of partitioning and monitoring may conflict, a hybrid-discriminant-analysis method for online fault detection is developed in Section 4.

3. UNSUPERVISED, MULTISCALE, SEQUENTIAL PARTITIONING (UMSP)

The fuzzy method is a clustering algorithm based on an objective function. It iteratively corrects the cluster center and membership matrix by minimizing the SQE. Finally, all the training data are divided into clusters corresponding to the maximum membership degree. There are two drawbacks to the use of traditional fuzzy clustering algorithms in batch processes that may lead to degraded segmentation performance. One is that the batch-segmentation results depend heavily on the choice of design parameters, such as the number of clusters and the initial clustering centers; the other is that such clustering algorithms do not explore the sequential property of batch-process data. To overcome these drawbacks, an unsupervised-multiscale-sequential-partition (UMSP) method is proposed. The UMSP method consists of multiscale fuzzy clustering (MFC) and sequential phase partitioning (SPP).

3.1. Multiscale Fuzzy Clustering (MFC). Let X = {x1, x2, ···, xn} be the training data set that consists of n sequential samples (xi, i = 1, 2, ···, n). MFC assumes that all the training data are in the same cluster with cluster center v but that they may possess different membership degrees, U = [u1, u2, ···, un], where ui indicates the possibility of each datum, xi, belonging to this cluster. Let the objective function OMFC of MFC be

$$O_{\mathrm{MFC}}(u, v) = \sum_{i=1}^{n} u_i^{m} (d_i + \eta)^2 \quad \text{s.t.}\quad \sum_{i=1}^{n} u_i - 1 = 0 \tag{10}$$

where η is the so-called clustering scale, m ∈ [1, ∞) refers to the fuzzification degree, and di = ∥xi − v∥. As each term $u_i^m (d_i + \eta)^2$ in OMFC is proportional to $(d_i + \eta)^2$, minimizing OMFC is equivalent to minimizing

$$u_i^{m} (d_i + \eta)^2 \tag{11}$$

The solution of eq 11 is obtained using Lagrange multipliers. Therefore, let its Lagrangian be

$$L(\lambda, u, v) = u_i^{m} (d_i + \eta)^2 - \lambda \left( \sum_{i} u_i - 1 \right) \tag{12}$$

where (λ, u, v) is stationary for L only if $\nabla_{\lambda, u_i, v} L(\lambda, u, v) = 0$. Setting this gradient equal to 0 yields

$$\frac{\partial L}{\partial \lambda} = \left( \sum_{i=1}^{n} u_i - 1 \right) = 0 \tag{13}$$

$$\frac{\partial L}{\partial u_j} = m\, u_j^{m-1} (d_j + \eta)^2 - \lambda = 0 \tag{14}$$

$$\frac{\partial L}{\partial v} = \sum_{i=1}^{n} u_i^{m} \left[ -2 (x_i - v) - 2\eta \right] = 0 \tag{15}$$

From eq 14, we obtain

$$u_j = \left[ \frac{\lambda}{m (d_j + \eta)^2} \right]^{1/(m-1)} \tag{16}$$

Using eq 13, eq 16 becomes

$$\sum_{j=1}^{n} u_j = \sum_{j=1}^{n} \left( \frac{\lambda}{m} \right)^{1/(m-1)} \left[ \frac{1}{d_j + \eta} \right]^{2/(m-1)} = \left( \frac{\lambda}{m} \right)^{1/(m-1)} \left\{ \sum_{j=1}^{n} \left[ \frac{1}{d_j + \eta} \right]^{2/(m-1)} \right\} = 1 \tag{17}$$

Thus,

$$\left( \frac{\lambda}{m} \right)^{1/(m-1)} = \frac{1}{\sum_{j=1}^{n} \left[ \dfrac{1}{d_j + \eta} \right]^{2/(m-1)}} \tag{18}$$

Applying eq 18 to eq 16 results in

$$u_i = \frac{(d_i + \eta)^{2/(1-m)}}{\sum_{j=1}^{n} (d_j + \eta)^{2/(1-m)}}, \quad \forall i \tag{19}$$

From eq 15, we obtain

$$v = \frac{\sum_{i=1}^{n} u_i^{m} (x_i + \eta)}{\sum_{i=1}^{n} u_i^{m}} \tag{20}$$

To calculate the membership, U, and the clustering center, v, of X, the mean center, Π, of sample X can be taken as the initial value of v.

$$\Pi = \frac{\sum_{i=1}^{n} x_i}{n} \tag{21}$$

Although U and v are defined by eqs 19 and 20, each depends on the other, so it is difficult to obtain analytical solutions directly from these equations. Thus, we give an iterative-solution method for which the pseudocode is shown in Algorithm 1.

3.2. Sequential Phase Partitioning (SPP). As shown in Section 3.1, the proposed MFC assumes that all of the training data, X = {x1, x2, ···, xn}, are in the same cluster. The cluster center and the possibility ui of each datum (xi, i = 1, 2, ..., n) belonging to this cluster can be calculated. To fully consider the sequentiality of batch-process data, a sequential-phase-partition (SPP) method is proposed, and the basic procedure of SPP is described as follows.

Step 1: Data Preparation. From the beginning of a sequential data set of a batch process, a sequential training data set, Y1 = {x1}, called the time-segment data set, is built, where the first data sample, x1, also indicates the first time-slice data; the next time-slice training data samples are added one by one to the existing time-segment data set (e.g., Y2 = {Y1, x2}).

Step 2: Time-Segment-Data-Set-Based MFC. The MFC algorithm is performed on the time-segment data set.

Step 3: Transition-Phase Partition. The time t at which the possibility ut of xt satisfies ut < τtransition is found, where τtransition is the transition partition factor. This means that the current time-slice training data, xt, have entered the transition phase.

Step 4: Steady-Phase Partition. When the membership degree of the sample reaches a value less than τsteady, the current data may change from one steady phase to another. The stage-division criteria used to find the stable phase division are as follows:

(1) The MFC algorithm is performed on time-segment data set Yp. When membership degree up of sample xp satisfies up < τsteady, where τsteady is the steady partition factor, xp is replaced with xp+1 in a new data matrix, Yp′.
(2) The MFC algorithm is performed on time-segment data set Yp′. When membership degree up′ of sample xp+1 satisfies up′ < τsteady, xp+1 is replaced with xp+2 in a new data matrix, Yp″.
(3) The MFC algorithm is performed on time-segment data set Yp″. When membership degree up″ of sample xp+2 satisfies up″ < τsteady, the steady-phase-division time is p.

Step 5: Recursive Implementation. The time-segment data set is removed, and the remaining sequential data of the batch process are used as the new input data in Step 1. Steps 1−4 are recursively repeated to find the following phases.

The output is a phase partitioning of the sequential training data set along the time direction. The SPP procedure is shown in Figure 1. The detailed SPP procedure is described by Algorithm 2.

Figure 1. Sequential-phase-partitioning procedure.

In the UMSP method, there are two important parameters: clustering scale η and partition factor τ. By adjusting clustering scale η and partition factor τ, the influence of the data outliers and the noise can be reduced, so that the UMSP method can adaptively determine the optimal number of clusters and
obtain more accurate partition results. Therefore, the UMSP method has higher robustness. The computational complexities of MFC and SPP are O(nc²vm) and O(n²c²vm), respectively. Therefore, the overall computational complexity of UMSP is O(nc²vm + n²c²vm), where n is the number of samples, v is the feature dimension of the sample, m is the number of iterations, and c is the number of clusters. The UMSP method can obtain more accurate partition results without relying on the number of clusters or the initial cluster center.

4. MSVDD-BASED BATCH-PROCESS MONITORING

In this section, on the basis of the proposed UMSP and traditional SVDD, the methodology of UMSP-SVDD for batch-process monitoring is explained in detail. First, data normalization is introduced. Second, the use of UMSP for offline phase partitioning is discussed. Third, the method used to build monitoring models and phase-partition models of different phases using SVDD is explained. Finally, real-time online phase partitioning and monitoring for batch processes are introduced.

4.1. Data Normalization. Batch-process data are usually arranged as a three-way batch data set, $\{X_i(J \times K_i)\}_{i=1}^{I}$, where I is the number of batches, J is the number of variables, and Ki is the number of sampling times of the ith batch. Data normalization is shown in Figure 2. In addition, the mean values and standard deviations of the historical data sets are used for online data normalization.

Figure 2. Data normalization.

4.2. Offline Phase Partitioning Using UMSP. Because a batch process usually involves multiple operating phases, it is better to divide the batch process into phases for better monitoring. It should be noted that not all batch-process variables are sensitive to phase changes. The presence of process variables that are insensitive to phase changes may decrease the accuracy of phase partitioning if these variables fluctuate. Thus, only phase-sensitive process variables are addressed in phase partitioning. In this subsection, offline phase partitioning involves two tasks: (1) global phase partitioning is used to find the optimal phase number of the batch process, and (2) local phase partitioning is used to partition each historical batch.

4.2.1. Global Phase Partitioning Using UMSP. Because each historical batch may have a different length, an uneven-length problem occurs. Thus, as shown in Figure 3, each set of batch data $\bar{X}_i(J \times K_i)$ extracted from $\{\bar{X}_i(J \times K_i)\}_{i=1}^{I}$ with different sampling durations (Ki) is expanded to have the same length (K′) by duplicating the last sample and is rearranged sample-wise to $\tilde{X}(IJ \times K') = \{\tilde{x}_k(IJ \times 1)\}_{k=1}^{K'}$, where K′ is the maximum sampling time of all the historical batches.

Figure 3. Data unfolding.

As stated in Section 3, using UMSP with clustering scale η, $\tilde{X}_s(IJ_s \times K')$, which contains the phase-sensitive variables extracted from $\tilde{X}$, is divided into Cη (1 ≤ Cη ≤ Kmax) steady phases. Its cth (1 ≤ c ≤ Cη) phase's data are denoted $\tilde{X}_s^c = \{\tilde{x}_k(IJ_s \times 1)\}_{k=b_c}^{b_{c+1}-1}$, where bc is the left boundary of the cth phase.

With a different clustering scale, η, it is possible that $\tilde{X}_s$ may be divided into the same number of phases but that the phase-partitioning results will differ. Thus, the sum of the quadratic error (SQE) is used to evaluate the phase-partitioning results obtained for the same phase number. The SQE index, Gη, of $\tilde{X}_s$ with clustering scale η is defined as

$$G_\eta = \sum_{c=1}^{C_\eta} \left( \sum_{k=b_c}^{b_{c+1}-1} \lVert \tilde{x}_k - \mu_c \rVert^2 \right) \tag{22}$$

The cth-phase center, μc, is calculated as

$$\mu_c = \frac{1}{b_{c+1} - b_c} \sum_{k=b_c}^{b_{c+1}-1} \tilde{x}_k \tag{23}$$

When $\tilde{X}_s$ is divided into the same number of phases with different clustering scales, η, the SQE index is calculated for each of the resulting phase partitions. For a given phase number, the partition with the smallest SQE, together with its η, is taken as the optimal phase-partitioning result.
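The SQE criterion of eqs 22 and 23 can be illustrated with a short NumPy sketch. This is only a minimal illustration under stated assumptions, not the authors' implementation: samples are stored as rows (one row per sampling time, the transpose of the column layout used above), phase boundaries are 0-based left boundaries, and the names `sqe` and `best_partition` are hypothetical.

```python
import numpy as np

def sqe(X, boundaries):
    """Sum of quadratic error (eq 22) of a partition.

    X          : array of shape (K, d), one row per sampling time (assumed layout)
    boundaries : 0-based left boundaries b_c of the phases, starting with 0
    """
    X = np.asarray(X, dtype=float)
    edges = list(boundaries) + [len(X)]          # append the right end of the last phase
    total = 0.0
    for start, stop in zip(edges[:-1], edges[1:]):
        phase = X[start:stop]
        mu = phase.mean(axis=0)                  # phase center, eq 23
        total += float(((phase - mu) ** 2).sum())
    return total

def best_partition(X, candidates):
    """Among partitions with the same phase count, pick the clustering scale
    whose partition has the smallest SQE.

    candidates : dict mapping a clustering scale eta to its boundary list
    """
    scores = {eta: sqe(X, b) for eta, b in candidates.items()}
    best_eta = min(scores, key=scores.get)
    return best_eta, scores
```

For example, `best_partition(X, {0.001: [0, 2], 0.005: [0, 1]})` compares two two-phase partitions of the same data and returns the scale whose partition is tighter around its phase centers.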

After the optimal phase-partitioning results under different phase numbers are obtained, a partition-performance combination index (PPCI), with partition number l and corresponding clustering scale ηl, is defined as

$$\mathrm{PPCI}_l = \hat{G}_l + \hat{l} \tag{24}$$

with

$$\hat{G}_l = \frac{\tilde{G}_l - \mathrm{mean}(\overset{\frown}{G})}{\mathrm{std}(\overset{\frown}{G})}, \qquad \hat{l} = \frac{l - \mathrm{mean}(\tilde{L})}{\mathrm{std}(\tilde{L})} \tag{25}$$

$$\tilde{G}_l = \log(G_{\eta_l}) \tag{26}$$

where $\overset{\frown}{G} = [\tilde{G}_l]_{l=1}^{L}$, $\tilde{L} = [l]_{l=1}^{L}$, and L is the largest phase number. The phase number with the smallest PPCI is the optimal phase number, l′. In this way, the optimal phase number (l′) and the phase-partitioning result of the batch process are obtained.

4.2.2. Local Partitioning Using UMSP. After the optimal phase number of the batch process is determined, each set of normalized batch data, $\bar{X}_i(J \times K_i) = \{\bar{x}_k(J \times 1)\}_{k=1}^{K_i}$, should be divided into l′ phases. As stated in Section 3, using UMSP with clustering scale η, let $\breve{X}_s = \{\bar{X}_i(J_s \times K_i)\}_{i=1}^{I}$ denote a batch data set containing phase-sensitive variables. The ith normalized set of batch data, $\bar{X}_i(J_s \times K_i)$, extracted from $\breve{X}_s$ is divided into l′ steady phases. Its cth (1 ≤ c ≤ l′)-phase data are denoted $\breve{X}_i^c = \{\breve{x}_k(J_s \times 1)\}_{k=\bar{b}_{i,c}}^{\bar{b}_{i,c+1}-1}$, where $\bar{b}_{i,c}$ is the left boundary of the cth phase.

With a different clustering scale, η, it is possible that the ith batch may be divided into l′ phases but that the phase-partition results will differ. Thus, the sum of the quadratic error (SQE) is used to evaluate the phase-partitioning results obtained using the same phase number. The SQE index, $\bar{G}_{i,\eta}$, of the ith batch with clustering scale η is defined as

$$\bar{G}_{i,\eta} = \sum_{c=1}^{l'} \left( \sum_{k=\bar{b}_{i,c}}^{\bar{b}_{i,c+1}-1} \lVert \breve{x}_k - \bar{\mu}_{i,c} \rVert^2 \right) \tag{27}$$

The cth-phase center, $\bar{\mu}_{i,c}$, is calculated as

$$\bar{\mu}_{i,c} = \frac{1}{\bar{b}_{i,c+1} - \bar{b}_{i,c}} \sum_{k=\bar{b}_{i,c}}^{\bar{b}_{i,c+1}-1} \breve{x}_k \tag{28}$$

When the ith batch is divided into the same number of phases with different clustering scales, η, the SQE index is calculated for each of the resulting phase partitions. The partition with the smallest SQE, together with its η, is the optimal phase-partitioning result under l′ phases. After the optimal clustering scale, η, of each historical batch is obtained, the transition phases are obtained by UMSP. In this way, the phase-partitioning results of all historical batches can be obtained.

4.3. Offline Batch-Process Modeling Using SVDD. As stated in Section 2, SVDD is a type of one-class classifier. If the SVDD model is built on the basis of the historical normal data of a batch process, it can determine whether the online data are normal or abnormal; if the SVDD model is built on the basis of data from one phase of the batch process, it can determine whether or not the online data belong to that phase. As shown in Figure 4, a historical batch data set of the same phase, $\{X_i^c(K_i^c \times J)\}_{i=1}^{I}$, is rearranged to $\hat{X}_c(N_c' \times J)$ along the variable direction, and a historical batch data set containing phase-sensitive variables of the same phase, $\{X_i^c(K_i^c \times J_s)\}_{i=1}^{I}$, is rearranged to $\overset{\frown}{X}_c(N_c' \times J_s)$ along the variable direction. Here, $N_c' = \sum_{i=1}^{I} K_i^c$, $X_i^c(K_i^c \times J)$ denotes the ith normalized batch data set of the cth phase, and $X_i^c(K_i^c \times J_s)$ denotes the ith normalized batch data set of the cth phase containing only phase-sensitive variables.

Figure 4. Illustration of offline batch-process modeling using SVDD.

The cth SVDD monitoring model is built with data matrix $\hat{X}_c = \{\hat{x}_k(1 \times J)\}_{k=1}^{N_c'}$, and the center, $\hat{a}_c$, and the hypersphere radius, $\hat{R}_k^c$, of each sample $\hat{x}_k$ are calculated as follows on the basis of the modeling procedure of SVDD given in Section 2.1:

$$\hat{a}_c = \sum_{k=1}^{N_c'} \hat{\alpha}_k^c \Phi(\hat{x}_k), \qquad \hat{R}_k^c = \mathrm{knorm}(\hat{x}_k, \hat{a}_c) \tag{29}$$

$\hat{R}^c = [\hat{R}_k^c]_{k=1}^{N_c'}$, denoting the hypersphere-radius data set, is sorted in descending order to $\tilde{R}^c = [\tilde{R}_k]_{k=1}^{N_c'}$. The monitoring limit, $\hat{R}^c_{\mathrm{limit}}$, is calculated as shown in eq 30.

$$\hat{R}^c_{\mathrm{limit}} = \frac{\sum_{k=1}^{\lfloor (1-\omega) N_c' \rfloor} \tilde{R}_k}{\lfloor (1-\omega) N_c' \rfloor} \tag{30}$$
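The kernel distance of eq 2 and the limit of eq 30 can be sketched in a few lines of NumPy. This is a hedged illustration rather than the authors' code: the Gaussian kernel and its width `sigma` are assumptions (the text does not fix a kernel here), the multipliers `alpha` are assumed to have been obtained already by maximizing eq 6, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and B (assumed kernel choice)."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))

def knorm(x, X_train, alpha, sigma=1.0):
    """Kernel distance from x to the center a = sum_k alpha_k Phi(x_k), as in eq 2."""
    alpha = np.asarray(alpha, dtype=float)
    k_xx = gaussian_kernel(x, x, sigma)[0, 0]
    k_xt = gaussian_kernel(x, X_train, sigma)[0]
    K_tt = gaussian_kernel(X_train, X_train, sigma)
    d2 = k_xx - 2.0 * alpha @ k_xt + alpha @ K_tt @ alpha
    return float(np.sqrt(max(d2, 0.0)))          # clip tiny negative round-off

def radius_limit(radii, omega):
    """Mean of the n0 = floor((1 - omega) * N) largest radii, as in eq 30."""
    r = np.sort(np.asarray(radii, dtype=float))[::-1]      # descending order
    # round() guards against floating-point error such as (1 - 0.8) * 10 = 1.999...
    n0 = int(np.floor(round((1.0 - omega) * len(r), 9)))
    if n0 < 1:
        raise ValueError("omega leaves no radii to average; use more samples or a smaller omega")
    return float(r[:n0].mean())
```

With these helpers, the per-sample radii of eq 29 would be `[knorm(x, X_train, alpha) for x in X_train]`, and `radius_limit` applied to them gives the monitoring limit of one phase.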

Figure 5. Procedure of offline modeling and online monitoring.

The cth SVDD phase-partition model is built with data matrix $\overset{\frown}{X}_c = \{\overset{\frown}{x}_k(1 \times J_s)\}_{k=1}^{N_c'}$, and the center, $\overset{\frown}{a}_c$, and the hypersphere radius, $\overset{\frown}{R}{}_k^c$, of each sample $\overset{\frown}{x}_k$ are calculated as follows on the basis of the modeling procedure of SVDD given in Section 2.1:

$$\overset{\frown}{a}_c = \sum_{k=1}^{N_c'} \overset{\frown}{\alpha}{}_k^c \Phi(\overset{\frown}{x}_k), \qquad \overset{\frown}{R}{}_k^c = \mathrm{knorm}(\overset{\frown}{x}_k, \overset{\frown}{a}_c) \tag{31}$$

$\overset{\frown}{R}{}^c = [\overset{\frown}{R}{}_k^c]_{k=1}^{N_c'}$, which denotes the hypersphere-radius data set, is sorted in descending order to $\bar{R}^c = [\bar{R}_k]_{k=1}^{N_c'}$. The phase-partition limit, $\overset{\frown}{R}{}^c_{\mathrm{limit}}$, is calculated as shown in eq 32.

$$\overset{\frown}{R}{}^c_{\mathrm{limit}} = \frac{\sum_{k=1}^{\lfloor (1-\omega) N_c' \rfloor} \bar{R}_k}{\lfloor (1-\omega) N_c' \rfloor} \tag{32}$$

4.4. Online Phase Identification and Process Monitoring Using a Hybrid-Discriminant-Analysis Strategy. To monitor online data sample xnew, xnew is first normalized to x̅new. The distances from x̑new to the centers of the hyperspheres in the SVDD phase-partition models (c = 1, 2, ···, l′) are calculated on the basis of eq 7, where x̑new = x̅new(1 × Js) denotes the sample with only phase-sensitive variables.

$$\mathrm{Dist}_c^{p}(\overset{\frown}{x}_{\mathrm{new}}) = \mathrm{knorm}(\overset{\frown}{x}_{\mathrm{new}}, \overset{\frown}{a}_c) \tag{33}$$

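Figure 6. A 2D flower graphic. (a) Original graphic, (b) discrete graphic, and (c) optimal clustering results.

Figure 7. Clustering results for the 2D flower graphic using (a) KM, (b) FCM, (c) WKM, (d) SCFCM, and (e) UMSP methods.

The distances from x̅new to the centers of the hyperspheres in the SVDD monitoring models (c = 1, 2, ···, l′) are calculated on the basis of eq 7.

$$\mathrm{Dist}_c^{m}(\bar{x}_{\mathrm{new}}) = \mathrm{knorm}(\bar{x}_{\mathrm{new}}, \hat{a}_c) \tag{34}$$

The online data sample xnew belongs to the hth phase if

$$\mathrm{Dist}_h^{p}(\overset{\frown}{x}_{\mathrm{new}}) \le \overset{\frown}{R}{}^{h}_{\mathrm{limit}}, \qquad \mathrm{Dist}_c^{p}(\overset{\frown}{x}_{\mathrm{new}}) > \overset{\frown}{R}{}^{c}_{\mathrm{limit}}, \quad c = 1, 2, \cdots, l',\; c \ne h \tag{35}$$

When the online data sample xnew belonging to the hth phase is normal, it satisfies

$$\mathrm{Dist}_h^{m}(\bar{x}_{\mathrm{new}}) \le \hat{R}^{h}_{\mathrm{limit}} \tag{36}$$

In this paper, a hybrid-discriminant-analysis strategy for online phase identification and process monitoring is proposed as follows:

$$\begin{aligned}
&\left.\begin{array}{l}
\mathrm{Dist}_h^{p}(\overset{\frown}{x}_{\mathrm{new}}) \le \overset{\frown}{R}{}^{h}_{\mathrm{limit}},\;\; \mathrm{Dist}_c^{p}(\overset{\frown}{x}_{\mathrm{new}}) > \overset{\frown}{R}{}^{c}_{\mathrm{limit}} \\
\mathrm{Dist}_h^{m}(\bar{x}_{\mathrm{new}}) \le \hat{R}^{h}_{\mathrm{limit}} \\
c = 1, 2, \cdots, l',\; c \ne h
\end{array}\right\} \rightarrow x_{\mathrm{new}} \text{ is normal} \\[2mm]
&\left.\begin{array}{l}
\mathrm{Dist}_h^{p}(\overset{\frown}{x}_{\mathrm{new}}) \le \overset{\frown}{R}{}^{h}_{\mathrm{limit}},\;\; \mathrm{Dist}_c^{p}(\overset{\frown}{x}_{\mathrm{new}}) > \overset{\frown}{R}{}^{c}_{\mathrm{limit}} \\
\mathrm{Dist}_h^{m}(\bar{x}_{\mathrm{new}}) > \hat{R}^{h}_{\mathrm{limit}} \\
c = 1, 2, \cdots, l',\; c \ne h
\end{array}\right\} \rightarrow x_{\mathrm{new}} \text{ is faulty} \\[2mm]
&\left.\begin{array}{l}
\mathrm{Dist}_h^{p}(\overset{\frown}{x}_{\mathrm{new}}) \le \overset{\frown}{R}{}^{h}_{\mathrm{limit}},\;\; \mathrm{Dist}_{h+1}^{p}(\overset{\frown}{x}_{\mathrm{new}}) \le \overset{\frown}{R}{}^{h+1}_{\mathrm{limit}},\;\; \mathrm{Dist}_c^{p}(\overset{\frown}{x}_{\mathrm{new}}) > \overset{\frown}{R}{}^{c}_{\mathrm{limit}} \\
\mathrm{Dist}_h^{m}(\bar{x}_{\mathrm{new}}) > \hat{R}^{h}_{\mathrm{limit}},\;\; \mathrm{Dist}_{h+1}^{m}(\bar{x}_{\mathrm{new}}) \le \hat{R}^{h+1}_{\mathrm{limit}} \\
c = 1, 2, \cdots, l',\; c \ne h, h+1
\end{array}\right\} \rightarrow \text{phase change}
\end{aligned} \tag{37}$$

As shown in eq 37, for an online data sample, xnew, the phase-partition distances and the monitoring distances to all l′ phase models are calculated for online phase identification and process monitoring. First, if the phase-partition distance to the hth model is within its limit and the distances to all other phase-partition models exceed their limits (eq 35), xnew belongs to the hth phase. Then, the distance from x̅new to the center of the hypersphere in the hth SVDD monitoring model is compared with the monitoring limit: if eq 36 holds, xnew is normal; otherwise, xnew is faulty. Finally, consider the case in which both of the following occur:

(1) $\mathrm{Dist}_h^{p}(\overset{\frown}{x}_{\mathrm{new}}) \le \overset{\frown}{R}{}^{h}_{\mathrm{limit}}$, $\mathrm{Dist}_{h+1}^{p}(\overset{\frown}{x}_{\mathrm{new}}) \le \overset{\frown}{R}{}^{h+1}_{\mathrm{limit}}$, and $\mathrm{Dist}_c^{p}(\overset{\frown}{x}_{\mathrm{new}}) > \overset{\frown}{R}{}^{c}_{\mathrm{limit}}$ for all other c;
(2) $\mathrm{Dist}_h^{m}(\bar{x}_{\mathrm{new}}) > \hat{R}^{h}_{\mathrm{limit}}$ and $\mathrm{Dist}_{h+1}^{m}(\bar{x}_{\mathrm{new}}) \le \hat{R}^{h+1}_{\mathrm{limit}}$.

This case indicates that xnew belongs to the transition phase between phase h and phase h + 1. Through the hybrid-discriminant-analysis strategy, the phase identification and process monitoring of xnew can be realized. The procedure of MSVDD-based batch-process monitoring is illustrated in Figure 5.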
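The three cases of eq 37 can be written as a small decision function. The sketch below is illustrative only: it assumes the distances and limits of all l′ phase models have already been computed, it uses 0-based phase indices, and the function name and the extra "undetermined" outcome (for combinations that eq 37 does not cover) are assumptions added here, not part of the method as described.

```python
def hybrid_decision(dist_p, lim_p, dist_m, lim_m):
    """Apply the hybrid-discriminant rule of eq 37 to one online sample.

    dist_p[c], lim_p[c] : phase-partition distance and limit of phase c
    dist_m[c], lim_m[c] : monitoring distance and limit of phase c
    Returns (phase index h or None, status string).
    """
    # Phases whose phase-partition hypersphere contains the sample
    inside = [c for c in range(len(dist_p)) if dist_p[c] <= lim_p[c]]

    if len(inside) == 1:
        h = inside[0]
        status = "normal" if dist_m[h] <= lim_m[h] else "faulty"
        return h, status

    if len(inside) == 2 and inside[1] == inside[0] + 1:
        h = inside[0]
        # Transition: outside monitoring model h but inside monitoring model h + 1
        if dist_m[h] > lim_m[h] and dist_m[h + 1] <= lim_m[h + 1]:
            return h, "phase change"

    # Not covered by eq 37; reported explicitly (assumption of this sketch)
    return None, "undetermined"
```

Returning an explicit "undetermined" status keeps the sketch from silently forcing a sample into one of the three documented cases; how such samples should be handled is not specified in the text.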

5. ILLUSTRATIVE EXAMPLES AND DISCUSSION

In this section, a 2D, handwritten example and a fed-batch penicillin-fermentation process32 are used to demonstrate the superiority of the proposed algorithm.

5.1. 2D, Handwritten Example. The 2D flower graphic with a sequential path shown in Figure 6a is digitized to Figure 6b. Figure 6c is plotted on the basis of the optimal clustering results. For comparison, KM, FCM, WKM, and SCFCM methods are used to achieve the clustering results of the flower sample. The clustering results of the 2D flower graphic obtained using the KM, FCM, WKM, SCFCM, and UMSP methods are shown in Figure 7. On the basis of the optimal clustering results of the 2D flower graphic shown in Figure 6c, the clustering accuracies of the various methods are listed in Table 1.

Table 1. Clustering Accuracies of Different Methods

graphic part        KM     FCM    WKM    SCFCM  UMSP
petal 1             0.70   0.85   0.75   1.00   1.00
petal 2             0.82   0.95   0.82   0.82   1.00
petal 3             0.68   0.65   0.65   0.61   0.81
scape               0.59   0.59   1.00   0.65   0.94
root                1.00   1.00   0.00   1.00   0.92
average accuracy    0.76   0.81   0.64   0.82   0.93

Table 2. Fault Description of the Penicillin Process

fault-batch no.   variable name         type   magnitude      fault duration (h)
1                 aeration rate         ramp   +0.0002 L/h    1−50
2                 aeration rate         ramp   +0.05 L/h      46−55
3                 aeration rate         ramp   +0.05 L/h      81−90
4                 aeration rate         ramp   −0.0002 L/h    10−110
5                 aeration rate         ramp   +0.0002 L/h    100−150
6                 aeration rate         ramp   +0.05 L/h      161−170
7                 aeration rate         ramp   −0.005 L/h     201−300
8                 aeration rate         ramp   −0.01 L/h      201−300
9                 aeration rate         step   −0.5%          1−50
10                aeration rate         step   +0.3%          20−70
11                aeration rate         step   −0.3%          20−70
12                aeration rate         step   +0.5%          51−150
13                agitator power        step   −0.2%          1−50
14                agitator power        step   +0.5%          1−50
15                agitator power        step   −0.5%          21−100
16                agitator power        step   +0.5%          61−150
17                agitator power        ramp   +0.002 W/h     21−100
18                agitator power        ramp   +0.005 W/h     101−200
19                substrate feed rate   ramp   +0.001 L/h     201−400

Table 3. Variables Used for Process Monitoring

no.   process variable
1     aeration rate (L/h)
2     agitator power (W)
3     substrate feed rate (L/h)
4     dissolved-oxygen concentration (%)
5     culture volume (L)
6     carbon dioxide concentration (mmol/L)
7     pH
8     substrate feed temperature (K)
9     generated heat (kcal/h)
10    cold-water flow rate (L/h)

5.3. Global-Phase-Partition Results. As described in Section 4.2.1, optimal global phase partitioning of data set A is used to achieve optimal steady-phase numbers of the batch process. A set of phase-clustering scales, η, ranging from 0.000 01 to 0.01 and a set of steady partition factors, τsteady, ranging from 0.000 01 to 0.01 are tested with the fuzzifier m = 2, where a step of η is 0.000 01, and a step of τsteady is 0.000 01. The optimal SQE values are plotted in Figure 8a, and the PPCIs of different phase numbers are plotted in Figure 8b. The smallest PPCI indicates the optimal phase number is 3. 5.4. Local-Phase-Partition Results. After the optimal phase number is determined as described in Section 4.2.2, local phase partitioning of data set A is used to achieve optimal phase-partitioning results for 20 training batches that included steady phases and transition phases. To select an optimal phase-partitioning result, a set of phase-clustering scales, η, ranging from 0.0001 to 0.05 and a set of steady partition factors, τsteady, ranging from 0.000 01 to 0.001 are tested with fuzzifier m = 2, where a step of η is 0.0001, a step of τsteady is 0.000 01, and τtransition = 0.1 × τsteady. As shown in Table 4, the optimal phase-partitioning results of the 20 batches differ because the 20 batches in data set A are generated under different conditions. It shows that the batch process has a preculture mode and a fed-batch mode. The boundary between the two modes occurs at approximately 45 h, but the time at which the boundary occurs differs for each batch. The possible physical interpretations of the steady and transition phases are given in Table 5.

As shown in Figure 7a,b, the KM and FCM clustering algorithms do not address sequential information; therefore, the sample points are ill-defined, leading to inconsistent clustering. Although the clustering results obtained using WKM and SCFCM are consistent, they do not directly cluster the data by analyzing the sequentiality of the data. Therefore, WKM and SCFCM produce unsatisfying clustering results, as shown in Figure 7c,d. However, the proposed UMSP directly copes with the sequentiality of data and achieves a good clustering result. 5.2. Fed-Batch Penicillin-Fermentation Process. The modular simulator (PenSim v2.0) of fed-batch penicillin fermentation is adopted in this paper to demonstrate the superiority of the UMSP-SVDD monitoring method. Three data sets are generated by PenSim: Data set A, containing 20 batches, is generated under normal operating conditions; this data set is used for global phase partitioning, local phase partitioning, and offline modeling. Data set B is a testing data set containing five batches that is generated under normal operating conditions; this data set is used for online phase identification and online monitoring. Data set C is a testing data set containing 19 batches that is generated by introducing faults with different conditions; this data set is used for online phase identification and online monitoring. Detailed information on data set C is provided in Table 2. The duration of each batch is 400 h, and the sampling time is 1 h. The process variables used for monitoring are listed in Table 3; six of the variables (Nos. 3−7 and 10) are phasesensitive process variables used for phase partitioning. I
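The parameter grids searched in Sections 5.3 and 5.4 can be enumerated with a short Python sketch, which makes the size of each search explicit. Only the grid construction is shown; the UMSP partitioning itself and the SQE and PPCI criteria of Section 4.2 are not reproduced here. The helper name `scale_grid` is hypothetical.

```python
import numpy as np


def scale_grid(start: float, stop: float, step: float) -> np.ndarray:
    """Return start, start + step, ..., stop (inclusive) without float drift."""
    n = int(round((stop - start) / step)) + 1
    return start + step * np.arange(n)


# Global phase partitioning (Section 5.3): both grids run from 0.00001 to 0.01.
eta_global = scale_grid(1e-5, 1e-2, 1e-5)
tau_steady_global = scale_grid(1e-5, 1e-2, 1e-5)

# Local phase partitioning (Section 5.4).
eta_local = scale_grid(1e-4, 5e-2, 1e-4)
tau_steady_local = scale_grid(1e-5, 1e-3, 1e-5)
tau_transition_local = 0.1 * tau_steady_local  # tau_transition = 0.1 x tau_steady

n_global = eta_global.size * tau_steady_global.size
n_local = eta_local.size * tau_steady_local.size
print(n_global, n_local)
```

Counting the grid points this way shows that the global search is much larger than the local one, which is worth keeping in mind when the search is repeated for every training batch.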


Figure 8. (a) SQE and (b) PPCI values corresponding to different phase numbers of the global-partition results.

Table 4. Phase-Partitioning Results for 20 Batches in Data Set A (τsteady = 0.0001 and τtransition = 0.001 for all batches)

| batch no. | η | phase 1 | phase 2 | phase 3 | phase 4 | phase 5 | phase 6 |
|---|---|---|---|---|---|---|---|
| 1 | 0.0098 | 1–17 | 18–30 | 31–32 | 33–43 | 44–80 | 81–400 |
| 2 | 0.0097 | 1–17 | 18–31 | 32–35 | 36–46 | 47–89 | 90–400 |
| 3 | 0.0093 | 1–17 | 18–34 | 35–39 | 40–49 | 50–86 | 87–400 |
| 4 | 0.0103 | 1–17 | 18–30 | 31–32 | 33–43 | 44–89 | 90–400 |
| 5 | 0.0102 | 1–17 | 18–31 | 32–35 | 36–43 | 44–80 | 81–400 |
| 6 | 0.0106 | 1–17 | 18–32 | 33–37 | 38–47 | 48–115 | 116–400 |
| 7 | 0.0126 | 1–17 | 18–30 | 31–32 | 33–43 | 44–115 | 116–400 |
| 8 | 0.0099 | 1–17 | 18–30 | 31–32 | 33–43 | 44–80 | 81–400 |
| 9 | 0.0101 | 1–17 | 18–30 | 31–32 | 33–43 | 44–81 | 82–400 |
| 10 | 0.0124 | 1–17 | 18–27 | 28–31 | 32–39 | 40–89 | 90–400 |
| 11 | 0.0104 | 1–21 | 22–49 | 50–53 | 54–64 | 65–144 | 145–400 |
| 12 | 0.0115 | 1–21 | 22–32 | 33–37 | 38–47 | 48–93 | 94–400 |
| 13 | 0.0125 | 1–21 | 22–38 | 39–43 | 44–56 | 57–179 | 180–400 |
| 14 | 0.0093 | 1–17 | 18–31 | 32–35 | 36–45 | 46–78 | 79–400 |
| 15 | 0.0097 | 1–9 | 10–28 | 29–32 | 33–40 | 41–89 | 90–400 |
| 16 | 0.0124 | 1–17 | 18–31 | 32–35 | 36–44 | 45–131 | 132–400 |
| 17 | 0.0099 | 1–17 | 18–31 | 32–35 | 36–46 | 47–111 | 112–400 |
| 18 | 0.0105 | 1–17 | 18–31 | 32–35 | 36–44 | 45–89 | 90–400 |
| 19 | 0.0109 | 1–17 | 18–31 | 32–35 | 36–47 | 48–109 | 110–400 |
| 20 | 0.0101 | 1–21 | 21–36 | 37–41 | 42–53 | 54–133 | 134–400 |
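The statement that the boundary between the preculture mode and the fed-batch mode occurs at approximately 45 h can be checked against Table 4 with a minimal Python sketch. It assumes, following Table 5, that the boundary of each batch is the first hour of phase 5; the variable names are illustrative.

```python
from statistics import mean, median

# First hour of phase 5 for batches 1-20, copied from Table 4.
phase5_start = [44, 47, 50, 44, 44, 48, 44, 44, 44, 40,
                65, 48, 57, 46, 41, 45, 47, 45, 48, 54]

boundary_median = median(phase5_start)  # robust to the late batches 11 and 13
boundary_mean = mean(phase5_start)
boundary_range = (min(phase5_start), max(phase5_start))
print(boundary_median, boundary_mean, boundary_range)
```

For these 20 batches, the median start of phase 5 is 45.5 h and the mean is 47.25 h, with individual batches ranging from 40 to 65 h, which is consistent with a boundary near 45 h that differs from batch to batch.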

Table 5. Physical Interpretation of Fed-Batch Penicillin-Fermentation Process

| operation mode | operation phase | physical interpretation | process feature |
|---|---|---|---|
| preculture mode | 1 | slow biomass growth | cells begin to grow, consuming substrate and oxygen |
| preculture mode | 2 | early exponential biomass growth | biomass is rapidly forming; substrate and oxygen concentrations decrease rapidly |
| preculture mode | 3 | stable exponential biomass growth | rate of biomass generation and rate of substrate and oxygen consumption are stable |
| preculture mode | 4 | later exponential biomass growth | rate of biomass generation and rate of substrate and oxygen consumption decrease; very little penicillin production |
| fed-batch mode | 5 | early exponential penicillin production | rapid production of penicillin; small increase in biomass; oxygen concentration and substrate concentration are stable |
| fed-batch mode | 6 | later exponential penicillin production | penicillin-production rate begins to decrease, slowly coming to a stop; small increase in biomass; oxygen and glucose concentrations are stable |

As shown in Figure 9, the boundary between the preculture mode and the fed-batch mode of training batch no. 3 occurs at approximately 49 h. As shown in Table 6, although the KM and WKM methods identify the boundary between the preculture mode and the fed-batch mode, they are unable to separate the transition phases from the steady phases. As shown in Figure 10, FCM, SCFCM, and UMSP separate the transition phases from the steady phases, but they produce different phase-partition results: (1) the phase-partition results of FCM are difficult to interpret physically; (2) SCFCM achieves phase-partition results that are based on the similarity of adjacent data rather than on the sequential changes in the process data, and SCFCM cannot distinguish the phases in preculture mode; (3) UMSP directly achieves phase-partition results according to the sequential changes in the process data; as a result, the results have possible physical interpretations, as shown in Table 5.

Figure 9. Steady-phase-partitioning results for batch no. 3 obtained using (a) KM, (b) WKM, (c) FCM, (d) SCFCM, and (e) UMSP.

Table 6. Phase-Partitioning Results Obtained for Batch No. 3 Using Five Methods

| phase-division method | separates the transition phases from the steady phases | phase-partition results |
|---|---|---|
| KM (ref 21) | no | 1: 1–34, 2: 35–49, 3: 50–400 |
| WKM (ref 23) | no | 1: 1–34, 2: 35–49, 3: 50–400 |
| FCM (ref 25) | no | 1: 1–37, 2: 38–82, 3: 83–400 |
| FCM (ref 25) | yes | 1: 1–30, 2: 31–53, 3: 54–59, 4: 60–109, 5: 110–400 |
| SCFCM (ref 27) | no | 1: 1–40, 2: 41–103, 3: 104–400 |
| SCFCM (ref 27) | yes | 1: 1–32, 2: 33–49, 3: 50–104, 4: 105–122, 5: 123–400 |
| UMSP | no | 1: 1–34, 2: 35–49, 3: 50–400 |
| UMSP | yes | 1: 1–17, 2: 18–34, 3: 35–39, 4: 40–49, 5: 50–86, 6: 87–400 |

Figure 10. Steady-phase- and transition-phase-partitioning results obtained for batch no. 3 using (a) FCM, (b) SCFCM, and (c) UMSP.

5.5. Online-Phase-Identification and Online-Monitoring Results. In this subsection, the phase-partitioning and monitoring models are built on the basis of SVDD as described in Section 4.3, the parameters of the models are discussed, and the optimal parameters are searched from the parameter grids according to the best partitioning and monitoring results.

5.5.1. Explanation of the Parameters Used in SVDD. When training the SVDD model, the kernel function $K(x_i, x_j)$ and C should be tuned. The widely used radial-basis function (RBF) is chosen as the kernel function of SVDD; it is defined as shown in eq 38.

$$
K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right) \tag{38}
$$

Parameter ω in eq 8 is set to 0.01. According to the cross-validation strategy, the optimal σ and D are set from the parameter grids according to the best phase-partitioning and monitoring results.

5.5.2. Evaluation Indicators. The geometric-mean accuracy, g, is used to evaluate the accuracy of phase partitioning and process monitoring, as shown in eqs 39–41.

$$
g = \sqrt{a^{+} \times a^{-}} \tag{39}
$$

$$
a^{+} = \frac{\text{target samples correctly classified}}{\text{target samples}} \times 100\% = 1 - \text{missed-fault rate} = \text{fault-detection rate (FDR)} \tag{40}
$$

$$
a^{-} = \frac{\text{nontarget samples correctly classified}}{\text{nontarget samples}} \times 100\% = 1 - \text{false-alarm rate (FAR)} \tag{41}
$$

5.5.3. Online Phase Identification. UMSP is used to obtain phase-partition results for the batches in data set B; these are listed in Table 7. The phase-partition results are used to evaluate the accuracy of the SVDD phase-partition models. The SVDD online phase-identification models are built on the basis of the local phase-partition results for data set A, and the parameters σ and D of the SVDD models are set from the grids {δ/64, δ/32, δ/16, δ/8, δ/4, δ/2, δ} and {0.01, 0.05, 0.1, 0.2, 0.5}, respectively, where δ is the average standard deviation of all variables. All the online phase-identification models are used with different parameters to achieve phase-identification results for each batch in data set B. The average accuracies of the phase-identification models under different parameters are shown in Figure 11a. When parameter σ is set as δ/8, and D is set as 0.05, the best phase-identification result displays an accuracy of 99.62%. The best phase-identification results are listed in Table 7; the results for batch no. 5 are shown in Figure 12.

Table 7. Phase-Partitioning Results for Five Batches in Data Set B

(a) Phase-partitioning results obtained using UMSP

| batch no. | phase 1 | phase 2 | phase 3 | phase 4 | phase 5 | phase 6 |
|---|---|---|---|---|---|---|
| 1 | 1–17 | 18–30 | 31–32 | 33–43 | 44–80 | 81–400 |
| 2 | 1–17 | 18–32 | 33–37 | 38–47 | 48–115 | 116–400 |
| 3 | 1–17 | 18–30 | 31–32 | 33–43 | 44–115 | 116–400 |
| 4 | 1–21 | 22–38 | 39–43 | 44–56 | 57–179 | 180–400 |
| 5 | 1–9 | 10–28 | 29–32 | 33–40 | 41–89 | 90–400 |

(b) Phase-partitioning results obtained using SVDD models

| batch no. | phase 1 | phase 2 | phase 3 | phase 4 | phase 5 | phase 6 |
|---|---|---|---|---|---|---|
| 1 | 1–17 | 18–30 | 31–32 | 33–43 | 44–81 | 82–400 |
| 2 | 1–17 | 18–32 | 33–37 | 38–47 | 48–115 | 116–400 |
| 3 | 1–17 | 18–30 | 31–32 | 33–43 | 44–115 | 116–400 |
| 4 | 1–21 | 22–38 | 39–43 | 44–56 | 57–182 | 183–400 |
| 5 | 1–9 | 10–28 | 29–32 | 33–40 | 41–89 | 90–400 |

Figure 11. Accuracy of (a) phase partitioning and (b) process monitoring with different parameters.

Figure 12. Phase-partitioning results for batch no. 5 in data set B obtained using SVDD models.
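The evaluation indicators of eqs 39–41 can be written as a minimal Python sketch. Here a "target" sample is taken to be a faulty sample and a "nontarget" sample a normal one; the function name `geometric_mean_accuracy` and the Boolean label encoding are assumptions made for illustration, and the rates are returned as fractions rather than percentages.

```python
import math
from typing import Sequence, Tuple


def geometric_mean_accuracy(y_true: Sequence[bool],
                            y_pred: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (a_plus, a_minus, g) following eqs 39-41.

    True marks a target (faulty) sample, False a nontarget (normal) sample.
    """
    n_target = sum(1 for t in y_true if t)
    n_nontarget = len(y_true) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("both target and nontarget samples are required")
    hit_target = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    hit_nontarget = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    a_plus = hit_target / n_target            # fault-detection rate (FDR)
    a_minus = hit_nontarget / n_nontarget     # 1 - false-alarm rate (FAR)
    return a_plus, a_minus, math.sqrt(a_plus * a_minus)


# Hypothetical labels: 4 faulty samples followed by 6 normal samples.
y_true = [True] * 4 + [False] * 6
y_pred = [True, True, False, True] + [False] * 5 + [True]
a_plus, a_minus, g = geometric_mean_accuracy(y_true, y_pred)
```

Because g is a product under a square root, a method that detects few faults obtains a low g even when it produces no false alarms, which is why g is a stricter summary than either rate alone.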


Table 8. Process-Monitoring Results for 19 Batches in Data Set C (monitoring method: SVDD; column groups give the phase-partition method)

| batch no. | KM a+ | KM a− | KM g | FCM a+ | FCM a− | FCM g | WKM a+ | WKM a− | WKM g | SCFCM a+ | SCFCM a− | SCFCM g | UMSP a+ | UMSP a− | UMSP g |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.43 | 1.00 | 0.65 | 0.31 | 1.00 | 0.55 | 0.29 | 0.96 | 0.52 | 0.43 | 0.79 | 0.58 | 0.53 | 1.00 | 0.73 |
| 2 | 1.00 | 0.99 | 0.99 | 1.00 | 0.99 | 0.99 | 1.00 | 0.94 | 0.97 | 1.00 | 0.82 | 0.90 | 1.00 | 0.97 | 0.99 |
| 3 | 1.00 | 0.99 | 0.99 | 1.00 | 0.99 | 0.99 | 1.00 | 0.94 | 0.97 | 1.00 | 0.82 | 0.90 | 1.00 | 0.98 | 0.99 |
| 4 | 0.66 | 1.00 | 0.81 | 0.64 | 1.00 | 0.80 | 0.13 | 0.95 | 0.35 | 0.79 | 0.91 | 0.85 | 0.80 | 1.00 | 0.89 |
| 5 | 0.06 | 0.99 | 0.24 | 0.30 | 1.00 | 0.55 | 0.32 | 0.98 | 0.56 | 0.40 | 0.81 | 0.57 | 0.24 | 0.97 | 0.48 |
| 6 | 1.00 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 1.00 | 0.94 | 0.97 | 1.00 | 0.80 | 0.89 | 0.99 | 0.97 | 0.98 |
| 7 | 0.98 | 0.99 | 0.98 | 1.00 | 0.99 | 0.99 | 0.99 | 0.93 | 0.96 | 0.99 | 0.73 | 0.85 | 0.99 | 0.97 | 0.98 |
| 8 | 0.99 | 0.99 | 0.99 | 1.00 | 0.99 | 0.99 | 0.99 | 0.92 | 0.96 | 1.00 | 0.73 | 0.85 | 1.00 | 0.97 | 0.99 |
| 9 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.96 | 0.98 | 1.00 | 0.79 | 0.89 | 1.00 | 1.00 | 1.00 |
| 10 | 1.00 | 1.00 | 1.00 | 0.98 | 1.00 | 0.99 | 1.00 | 0.96 | 0.98 | 1.00 | 0.84 | 0.92 | 1.00 | 1.00 | 1.00 |
| 11 | 1.00 | 1.00 | 1.00 | 0.84 | 1.00 | 0.92 | 0.90 | 0.96 | 0.93 | 1.00 | 0.84 | 0.92 | 1.00 | 1.00 | 1.00 |
| 12 | 1.00 | 0.99 | 0.99 | 1.00 | 0.99 | 0.99 | 1.00 | 0.97 | 0.99 | 1.00 | 0.93 | 0.97 | 1.00 | 0.97 | 0.99 |
| 13 | 0.42 | 1.00 | 0.65 | 0.28 | 1.00 | 0.53 | 0.26 | 0.96 | 0.50 | 0.40 | 0.79 | 0.56 | 0.66 | 1.00 | 0.82 |
| 14 | 0.94 | 1.00 | 0.97 | 0.80 | 1.00 | 0.89 | 0.66 | 0.96 | 0.80 | 0.92 | 0.79 | 0.85 | 1.00 | 1.00 | 1.00 |
| 15 | 0.04 | 1.00 | 0.19 | 0.05 | 1.00 | 0.22 | 0.10 | 0.96 | 0.31 | 0.70 | 0.91 | 0.80 | 0.20 | 1.00 | 0.45 |
| 16 | 0.03 | 0.99 | 0.18 | 0.22 | 0.99 | 0.47 | 0.17 | 0.97 | 0.40 | 0.74 | 0.90 | 0.82 | 0.26 | 0.97 | 0.50 |
| 17 | 0.36 | 1.00 | 0.60 | 0.38 | 1.00 | 0.61 | 0.16 | 0.96 | 0.39 | 0.71 | 0.92 | 0.81 | 0.66 | 1.00 | 0.81 |
| 18 | 0.63 | 0.99 | 0.79 | 0.89 | 0.99 | 0.94 | 0.10 | 0.96 | 0.31 | 0.89 | 0.82 | 0.85 | 0.85 | 0.97 | 0.91 |
| 19 | 0.24 | 0.99 | 0.49 | 0.73 | 0.99 | 0.85 | 0.69 | 0.89 | 0.78 | 0.72 | 0.60 | 0.65 | 0.60 | 0.96 | 0.75 |
| average | 0.67 | 0.99 | 0.76 | 0.71 | 0.99 | 0.80 | 0.62 | 0.95 | 0.72 | 0.83 | 0.82 | 0.81 | 0.78 | 0.98 | 0.86 |

Figure 13. Process-monitoring results for batch no. 1 based on (a) KM, (b) FCM, (c) WKM, (d) SCFCM, and (e) UMSP.
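The RBF kernel of eq 38 and the grids from which σ and D are selected in Sections 5.5.3 and 5.5.4 can be sketched in Python as follows. The SVDD training itself is not reproduced; the names `rbf_kernel` and `parameter_grid` are hypothetical, the data matrix is random placeholder data, and computing δ as the mean of the per-variable population standard deviations is an assumption about how "the average standard deviation of all variables" is obtained.

```python
import itertools
from typing import List, Tuple

import numpy as np


def rbf_kernel(xi, xj, sigma: float) -> float:
    """Eq 38: K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2))."""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))


def parameter_grid(X: np.ndarray) -> Tuple[float, List[Tuple[float, float]]]:
    """Return delta and every (sigma, D) pair of the two grids."""
    # Assumption: delta is the mean of the per-variable standard deviations.
    delta = float(np.mean(np.std(X, axis=0)))
    sigmas = [delta / k for k in (64, 32, 16, 8, 4, 2, 1)]
    d_values = [0.01, 0.05, 0.1, 0.2, 0.5]
    return delta, list(itertools.product(sigmas, d_values))


# Placeholder training data: 50 samples of 10 process variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
delta, grid = parameter_grid(X)
k_same = rbf_kernel(X[0], X[0], sigma=delta / 8)
k_pair = rbf_kernel(X[0], X[1], sigma=delta / 8)
```

Each of the 35 (σ, D) pairs would be used to train one set of SVDD models, and the pair with the highest geometric-mean accuracy would then be retained.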

5.5.4. Online-Process-Monitoring Results. The online phase-identification models are used to achieve phase-identification results for each batch in data set C. The SVDD monitoring models are built on the basis of the optimal local phase-partition results of data set A, and parameters σ and D of the SVDD monitoring models are set from the grids {δ/64, δ/32, δ/16, δ/8, δ/4, δ/2, δ} and {0.01, 0.05, 0.1, 0.2, 0.5}. On the basis of the hybrid-discriminant-analysis strategy described in Section 4.4, all the online process-monitoring models are used with different parameters to achieve process-monitoring results for each batch in data set C; the average accuracy of process monitoring with different parameters is shown in Figure 11b. When parameter σ is set as δ/8, and D is set as 0.01, the process-monitoring result has the highest accuracy (90.8%). The best process-monitoring results are shown in Table 8, and the process-monitoring results for batch no. 1 are shown in Figure 13.

5.5.5. Accuracy Comparison. In this subsection, the phase-partitioning results obtained using the four methods (KM, WKM, FCM, and SCFCM) are integrated with SVDD to achieve process monitoring for comparison. The phase-partitioning procedures used in each method are summarized as follows:

(1) KM is used to achieve global phase-partition results. The online data sample is assigned to different phases according to the offline partition results.
(2) FCM is used to achieve global phase-partition results. The online data sample is assigned to different phases according to the offline partition results.
(3) WKM is used to achieve local phase-partition results. The online data sample is assigned to different phases according to the online identification method.
(4) SCFCM is used to achieve global phase-partition results. The online data sample is assigned to different phases according to the offline partition results.

According to the above phase-partitioning results, SVDD monitoring models are built and used to achieve online process monitoring. The parameters of the SVDD monitoring models are set from the parameter grid, and the optimal parameters are identified on the basis of the highest monitoring accuracy. All tests are run five times; the mean values are listed in Table 8, and the process-monitoring results for batch no. 1 are shown in Figure 13.

Table 8 compares the values of a+, a−, and g from the five methods for 19 batches in data set C. UMSP-SVDD has the highest mean value of g. Although SCFCM-SVDD has the highest mean value of FDR, it also has the highest mean value of FAR. The KM-SVDD, FCM-SVDD, and UMSP-SVDD methods produce almost no false alarms, and WKM-SVDD produces a few false alarms. For most of the fault batches, the methods that consider the transition phases, such as FCM-SVDD, SCFCM-SVDD, and UMSP-SVDD, have higher FDRs than the methods that consider only the main phases.

KM and FCM are based on global phase partitioning, and the online data sample is simply assigned to different phases on the basis of the offline partition results; however, FCM can separate the transition phases from the main phases, whereas KM cannot. Neither KM nor FCM considers the sequential characteristics of the batch process; thus, the poor phase-partitioning results may compromise the monitoring results. When monitoring the faults in batches 5, 16, 18, and 19, FCM-SVDD has higher monitoring accuracy than KM-SVDD because the faults of these batches are mostly introduced in both the transition and main phases. However, when monitoring fault batches 1, 13, and 14, KM-SVDD has higher monitoring accuracy than FCM-SVDD because the faults of these batches are mostly introduced in the first two main phases.

WKM and SCFCM, which are improved methods of KM and FCM, respectively, consider the sequential characteristics of the batch process. However, WKM-SVDD yields a higher FAR than KM-SVDD, and WKM-SVDD yields a higher FDR than KM-SVDD only when monitoring fault batches 5, 15, 16, and 19. In addition, SCFCM-SVDD has a higher FDR than FCM-SVDD when monitoring most of the fault batches, but it has a higher FAR than FCM-SVDD. Although WKM and SCFCM both add a hard sequentiality constraint for batch data sequentiality, the phase-partition results rely on the performance of KM and FCM.

UMSP deals directly with the sequentiality of batch data and achieves sequential phase-partitioning results. Because it achieves phase-partitioning results according to the sequential changes in the data, the phase partitioning produced by UMSP is reasonable. Because the faults of fault batches 1, 4, 13, 14, and 16–18 are mainly introduced in the preculture mode, UMSP-SVDD has higher monitoring accuracy than KM-SVDD, FCM-SVDD, WKM-SVDD, and SCFCM-SVDD when monitoring these fault batches. In addition, UMSP-SVDD has higher monitoring accuracy than KM-SVDD, FCM-SVDD, WKM-SVDD, and SCFCM-SVDD when monitoring other fault batches.

To further demonstrate the advantages of the proposed UMSP method, the process-monitoring results for batch no. 1 obtained using KM-SVDD, FCM-SVDD, WKM-SVDD, SCFCM-SVDD, and UMSP-SVDD are shown in Figure 13. KM-SVDD, FCM-SVDD, and UMSP-SVDD produce no false alarms; WKM-SVDD produces a few false alarms when phase 2 changes to phase 3; and SCFCM-SVDD produces many false alarms. The false alarms occur mainly because of improper phase-partitioning results. Both KM-SVDD and FCM-SVDD can detect faults in the first phase, but they cannot detect faults when the phase changes. Because UMSP fully captures the sequential characteristics of the batches, UMSP achieves proper phase partitioning, and UMSP-SVDD has the highest monitoring accuracy. Therefore, the use of the UMSP phase-partition method can efficiently improve the monitoring accuracy of batch processes.

6. CONCLUSIONS

In this article, a UMSP that can be used to achieve offline phase partitioning of a batch process was proposed. Further, the offline phase-partitioning results were used for batch-process monitoring and online phase identification using SVDD. The UMSP phase-partitioning method has four outstanding advantages. First, UMSP can deal directly with the sequentiality of batch data. Second, it can achieve phase-partitioning results without any prior knowledge of phase centers or numbers. Third, UMSP can achieve different phase-partitioning results by using different clustering scales; the optimal phase-partitioning result is achieved according to the SQE and the PPCI index. Fourth, UMSP can separate transition phases from main phases. On the basis of optimal phase-partitioning results, phase-partitioning models and process-monitoring models are built using SVDD. In addition, a hybrid-discriminant-analysis strategy can be used for online phase identification and process monitoring.

The proposed UMSP method has been tested with a 2D, handwritten example and a fed-batch penicillin-fermentation process. In both test examples, the advantages of the proposed method are fully demonstrated. In addition, the results obtained using the UMSP-SVDD approach proposed in this paper were compared with those obtained using the KM-SVDD, FCM-SVDD, WKM-SVDD, and SCFCM-SVDD approaches. The UMSP phase-partition method successfully identifies the main phases and separates the transition phases from the corresponding main phase. The online phase-identification models can assign all online samples to specific phases with high accuracy, and the online process-monitoring models display high monitoring accuracy in most of the test batches. The reason that UMSP achieves reasonable phase-partitioning results is that it fully captures the sequential characteristics of batches and achieves its phase-partitioning results without requiring prior knowledge that could introduce bias. It is envisaged that the proposed approach will have an impact in the process industry.

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Tel.: 0086-010-64433803.

ORCID

Jianlin Wang: 0000-0003-3398-7967

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

This work is supported in part by the Beijing Natural Science Foundation (4152041) and the National Natural Science Foundation of China (61240047).

REFERENCES

(1) Bakshi, B. R.; Locher, G.; Stephanopoulos, G.; Stephanopoulos, G. Analysis of Operating Data for Evaluation, Diagnosis and Control of Batch Operations. J. Process Control 1994, 4 (4), 179–194.
(2) Lennox, B.; Montague, G. A.; Hiden, H. G.; Kornfeld, G.; Goulding, P. R. Process Monitoring of an Industrial Fed-Batch Fermentation. Biotechnol. Bioeng. 2001, 74 (2), 125.
(3) MacGregor, J.; Cinar, A. Monitoring, Fault Diagnosis, Fault-Tolerant Control and Optimization: Data Driven Methods. Comput. Chem. Eng. 2012, 47, 111–120.
(4) Ge, Z.; Song, Z.; Gao, F. Review of Recent Research on Data-Based Process Monitoring. Ind. Eng. Chem. Res. 2013, 52 (10), 3543–3562.
(5) Luo, L.; Bao, S.; Gao, Z.; Yuan, J. Tensor Global-Local Preserving Projections for Batch Process Monitoring. Ind. Eng. Chem. Res. 2014, 53 (24), 10166–10176.
(6) Westerhuis, J. A.; Kourti, T.; Macgregor, J. F. Comparing Alternative Approaches for Multivariate Statistical Analysis of Batch Process Data. J. Chemom. 1999, 13 (3–4), 397–413.


(7) Smilde, A. K. Comments on Three-way Analyses Used for Batch Process Data. J. Chemom. 2001, 15 (1), 19–27.
(8) Lee, J. M.; Yoo, C. K.; Lee, I. B. Enhanced Process Monitoring of Fed-Batch Penicillin Cultivation Using Time-Varying and Multivariate Statistical Analysis. J. Biotechnol. 2004, 110 (2), 119.
(9) Zhao, C.; Wang, F.; Jia, M. Dissimilarity Analysis Based Batch Process Monitoring Using Moving Windows. AIChE J. 2007, 53 (5), 1267–1277.
(10) Kourti, T.; Macgregor, J. F. Process Analysis, Monitoring and Diagnosis, Using Multivariate Projection Methods. Chemom. Intell. Lab. Syst. 1995, 28 (1), 3–21.
(11) Kosanovich, K. A.; Dahl, K. S.; Piovoso, M. J. Improved Process Understanding Using Multiway Principal Component Analysis. Ind. Eng. Chem. Res. 1996, 35 (1), 138–146.
(12) Yoo, C. K.; Lee, J. M.; Vanrolleghem, P. A.; Lee, I. B. On-Line Monitoring of Batch Processes Using Multiway Independent Component Analysis. Chemom. Intell. Lab. Syst. 2004, 71 (2), 151–163.
(13) Zhao, C.; Gao, F. Statistical Modeling and Online Fault Detection for Multiphase Batch Processes with Analysis of Between-Phase Relative Changes. Chemom. Intell. Lab. Syst. 2014, 130, 58–67.
(14) Ge, Z.; Zhao, L.; Yao, Y.; Song, Z.; Gao, F. Utilizing Transition Information in Online Quality Prediction of Multiphase Batch Processes. J. Process Control 2012, 22 (3), 599–611.
(15) Ge, Z.; Song, Z. Online Monitoring and Quality Prediction of Multiphase Batch Processes with Uneven Length Problem. Ind. Eng. Chem. Res. 2014, 53 (2), 800–811.
(16) Luo, L.; Bao, S.; Gao, Z.; Yuan, J. Batch Process Monitoring with GTucker2 Model. Ind. Eng. Chem. Res. 2014, 53 (39), 15101–15110.
(17) Sun, W.; Meng, Y.; Palazoglu, A.; Zhao, J.; Zhang, H.; Zhang, J. A Method for Multiphase Batch Process Monitoring Based on Auto Phase Identification. J. Process Control 2011, 21 (4), 627–638.
(18) Wang, S.; Chang, Y.-Q.; Zhao, Z.; Wang, F.-L. Multi-Phase MPCA Modeling and Application Based on an Improved Phase Separation Method. Int. J. Control Autom. Syst. 2012, 10 (6), 1136–1145.
(19) Zhao, C.; Sun, Y. Step-Wise Sequential Phase Partition (SSPP) Algorithm Based Statistical Modeling and Online Process Monitoring. Chemom. Intell. Lab. Syst. 2013, 125, 109–120.
(20) Qin, Y.; Zhao, C.; Gao, F. An Iterative Two-Step Sequential Phase Partition (ITSPP) Method for Batch Process Modeling and Online Monitoring. AIChE J. 2016, 62 (7), 2358–2373.
(21) Zhao, C.; Mo, S.; Gao, F.; Lu, N.; Yao, Y. Statistical Analysis and Online Monitoring for Handling Multiphase Batch Processes with Varying Durations. J. Process Control 2011, 21 (6), 817–829.
(22) Li, W.; Zhao, C.; Gao, F. Sequential Time Slice Alignment Based Unequal-Length Phase Identification and Modeling for Fault Detection of Irregular Batches. Ind. Eng. Chem. Res. 2015, 54 (41), 10020–10030.
(23) Zhang, S.; Zhao, C.; Gao, F. Two-Directional Concurrent Strategy of Mode Identification and Sequential Phase Division for Multimode and Multiphase Batch Process Monitoring with Uneven Lengths. Chem. Eng. Sci. 2018, 178, 104–117.
(24) Lu, N.; Gao, F.; Wang, F. Sub-PCA Modeling and On-line Monitoring Strategy for Batch Processes. AIChE J. 2004, 50 (1), 255–259.
(25) Seng, N. Y.; Srinivasan, R. An Adjoined Multi-DPCA Approach for Online Monitoring of Fed-Batch Processes. IFAC Proc. Vol. 2006, 39 (2), 279–284.
(26) Luo, L.; Bao, S.; Mao, J.; Tang, D. Phase Partition and Phase-Based Process Monitoring Methods for Multiphase Batch Processes with Uneven Durations. Ind. Eng. Chem. Res. 2016, 55 (7), 2035–2048.
(27) Zhao, C.; Wang, F.; Lu, N.; Jia, M. Stage-Based Soft-Transition Multiple PCA Modeling and on-Line Monitoring Strategy for Batch Processes. J. Process Control 2007, 17 (9), 728–741.
(28) Ng, Y. S.; Srinivasan, R. An Adjoined Multi-Model Approach for Monitoring Batch and Transient Operations. Comput. Chem. Eng. 2009, 33 (4), 887–902.
(29) Wang, J.; Wei, H.; Cao, L.; Jin, Q. Soft-Transition Sub-PCA Fault Monitoring of Batch Processes. Ind. Eng. Chem. Res. 2013, 52 (29), 9879–9888.
(30) Luo, L.; Bao, S.; Mao, J.; Tang, D.; Gao, Z. Fuzzy Phase Partition and Hybrid Modeling Based Quality Prediction and Process Monitoring Methods for Multiphase Batch Processes. Ind. Eng. Chem. Res. 2016, 55 (14), 4045–4058.
(31) Tax, D. M. J.; Duin, R. P. W. Support Vector Domain Description. Pattern Recognit. Lett. 1999, 20 (11), 1191–1199.
(32) Birol, G.; Undey, C.; Cinar, A. A Modular Simulation Package for Fed-Batch Fermentation: Penicillin Production. Comput. Chem. Eng. 2002, 26 (11), 1553–1565.
