Nonlinear and Non-Gaussian Dynamic Batch Process Monitoring

Article pubs.acs.org/IECR

Nonlinear and Non-Gaussian Dynamic Batch Process Monitoring Using a New Multiway Kernel Independent Component Analysis and Multidimensional Mutual Information Based Dissimilarity Approach

Mudassir M. Rashid and Jie Yu*
Department of Chemical Engineering, McMaster University, Hamilton, Ontario, Canada L8S 4L7

ABSTRACT: Batch or semibatch process monitoring is a challenging task because of various factors such as strong nonlinearity, inherent time-varying dynamics, batch-to-batch variations, and multiple operating phases. In this article, a novel nonlinear and non-Gaussian dissimilarity method based on multiway kernel independent component analysis (MKICA) and multidimensional mutual information (MMI) is developed and applied to batch process monitoring and abnormal event detection. MKICA models are first built on the normal benchmark and monitored batches to characterize the nonlinear and non-Gaussian variable relationship of batch processes. Then, the kernel independent component (IC) subspaces are extracted from the benchmark and monitored batches. Further, a multidimensional mutual information based dissimilarity index is defined to quantitatively evaluate the statistical dependence between the benchmark and monitored subspaces through the moving-window strategy. With the corresponding control limit estimated from the kernel density function, the integrated MKICA−MMI index can be used to detect the abnormal events in dynamic batch processes. The effectiveness of the proposed batch process monitoring approach is demonstrated using the fed-batch penicillin fermentation process, and its performance is compared to that of the MKICA method. The computational results show that the presented dissimilarity approach is faster and more accurate in detecting different types of process faults.

Industrial & Engineering Chemistry Research
Received: April 17, 2012. Revised: July 5, 2012. Accepted: July 9, 2012. Published: July 9, 2012.
dx.doi.org/10.1021/ie301002h | Ind. Eng. Chem. Res. 2012, 51, 10910−10920
© 2012 American Chemical Society

1. INTRODUCTION
Batch and semibatch processes are used extensively in the production of high-quality, low-volume products including specialty chemicals, materials, food, drugs, and semiconductor devices. The successful monitoring and control of batch processes are essential to maintaining high end-product quality, ensuring safe and stable operation, optimizing production yield, and minimizing energy consumption. Key characteristics of batch processes such as the complex mechanism, finite duration, inherent process nonlinearity, underlying time-varying dynamics, batch-to-batch variations, and multiplicity of operating phases pose significant challenges to effective monitoring and cause the failure of traditional continuous process monitoring methods. Meanwhile, batch processes such as cell fermentation often encounter slow recovery from faulty conditions back to normal operation.1,2 Therefore, the early detection of process faults is crucially important in batch process monitoring so as to avoid the waste of an entire batch and potential damage to expensive process equipment. Once the faulty operation is signaled by the monitoring system, corrective actions can be immediately taken to prevent serious incidents, mitigate the deterioration of product quality, and reduce the number of rejected batches.3,4

Multivariate statistical process monitoring techniques such as multiway principal component analysis (MPCA) and multiway partial least-squares (MPLS) have been widely used in batch process monitoring.5−11 The conventional multivariate statistical process monitoring methods rely on the assumption that the batch process data come from a single operating phase and, thus, the batchwise unfolded data approximately follow a multivariate Gaussian distribution in order for the confidence limits of the monitoring statistics to be valid. However, an inherent feature of batch processes is a multiplicity of operating phases that can result in significantly diverse variable correlation structures across different phases. Such a characteristic of batch processes can make the traditional MPCA and MPLS methods ill-suited. Moreover, MPCA and MPLS approaches cannot efficiently deal with the nonlinearity in batch processes. To address this issue, multiway kernel PCA (MKPCA) and multiway kernel PLS (MKPLS) methods were proposed to map process data from the original measurement space onto a high-dimensional feature space for batch process monitoring.12,13 Nevertheless, the multiphase feature and time-varying dynamics of batch processes are still not properly handled by the MKPCA and MKPLS methods. A stage-based sub-PCA approach was developed to deal with the multiplicity of operation stages of batch processes using the k-means clustering technique, which partitions the PCA loading matrices into a number of clusters.14 However, the number of operational stages needs to be specified by the user, and a biased estimation can significantly affect the monitoring accuracy.

As an alternative solution, the independent component analysis (ICA) technique has attracted growing attention in the process monitoring field. This technique utilizes higher-order statistics to extract non-Gaussian components with statistical independence and then conducts process monitoring within the independent component subspaces.15,16 For batch processes, the multiway kernel ICA (MKICA) technique has been applied to handle process nonlinearity and non-Gaussianity. However, it does not adequately solve the issues of process dynamics and shifting operational phases.

Meanwhile, machine learning techniques have been explored for batch process monitoring. The support vector machine (SVM) method provides a supervised monitoring strategy that utilizes both normal and faulty data to train the nonlinear classification model for fault detection.17,18 However, SVM requires known class labels on training data that are usually difficult to obtain in practice. Moreover, the SVM method itself does not have an inherent capability to deal with process dynamics and switching phases. As a powerful multiphase batch monitoring method, the multiway Gaussian mixture model (MGMM) was developed to automatically identify different operational phases from unlabeled data and then monitor batch processes within localized models corresponding to various phases through a Bayesian inference strategy.19,20 Another recently developed method is based on a hidden semi-Markov model (HSMM), which classifies the different phases as discrete states. Then, localized PCA models are built for the different phases.21 However, the local PCA models cannot handle any process non-Gaussianity and nonlinearity within each operating phase.22,23

In this study, a novel dissimilarity method that integrates multiway kernel independent component analysis and multidimensional mutual information (MMI) is developed for dynamic batch process monitoring. The proposed dissimilarity approach models the normal benchmark batches and the monitored batch using MKICA, which can effectively characterize the nonlinear and non-Gaussian features of batch processes. In addition, the localized MKICA models are built on a moving-window basis so that the process dynamics and multiple phases can be effectively handled. Then, the multidimensional mutual information based dissimilarity index is computed between each pair of kernel IC subspaces from the benchmark and monitored batches. Such an MKICA−MMI dissimilarity index represents the statistical dependence of the monitored data on the benchmark data within each localized segment. By extracting the nonlinear and non-Gaussian features, the proposed dissimilarity method can effectively capture different types of process faults during batch operation.

The remainder of the article is organized as follows: Section 2 reviews the multiway kernel independent component analysis based batch process monitoring method. Then, the new MKICA−MMI based dissimilarity method for monitoring nonlinear and non-Gaussian dynamic batch processes is developed in section 3. Section 4 presents the application of the proposed MKICA−MMI dissimilarity method to the fed-batch penicillin fermentation process. The conclusions of this work are summarized in section 5.

Figure 1. Illustration of the moving-window strategy for the MKICA−MMI dissimilarity method.

Figure 2. Schematic diagram of the proposed MKICA−MMI dissimilarity method for batch process monitoring.

Figure 3. Process flow diagram of the fed-batch penicillin fermentation process.

Table 1. Monitored Variables of the Fed-Batch Penicillin Fermentation Process
variable no. | variable description
1 | aeration rate (L/h)
2 | agitator power (W)
3 | substrate feed rate (L/h)
4 | substrate feed temperature (K)
5 | substrate concentration (g/L)
6 | pH
7 | dissolved oxygen concentration (g/L)
8 | carbon dioxide concentration (g/L)
9 | biomass concentration (g/L)
10 | penicillin concentration (g/L)
11 | fermenter temperature (K)
12 | cooling water flow rate (L/h)
13 | generated heat (kcal)
14 | acid flow rate (L/h)
15 | base flow rate (L/h)
16 | culture volume (L)

2. REVIEW OF KERNEL INDEPENDENT COMPONENT ANALYSIS
Kernel independent component analysis can be used to extract statistically independent and non-Gaussian components that also capture the process nonlinearity from the measurement data. The data set X is first mapped through a nonlinear kernel function from the original measurement space onto a high-dimensional feature space where the projected samples show a linear relationship. Then, a modified ICA algorithm is implemented in the feature space to extract kernel independent components that are statistically independent of each other.24 The objective is to determine the linear operator in the feature space F to extract independent components from the mapped data Φ(X) ∈ F. Then, the Gram kernel matrix κ ∈ ℝ^{N×N} can be defined as

$$[\kappa_{i,j}] = \langle \Phi(x_i), \Phi(x_j) \rangle = k(x_i, x_j) \tag{1}$$

where N is the number of measurement samples, ⟨·⟩ represents the inner product, and k(xi, xj) = ⟨Φ(xi), Φ(xj)⟩ denotes the kernel function value of any two sample points xi and xj with 1 ≤ i, j ≤ N. After the Gram kernel matrix κ has been computed using the predefined kernel function, the kernel matrix is further mean-centered and scaled to yield the transformed kernel matrix κ̃. Then, the eigenvalue decomposition is performed on κ̃ as follows

$$\tilde{\kappa} V = V \Lambda \tag{2}$$

where Λ = diag{λ1, λ2, ..., λd} corresponds to the d largest eigenvalues and V = [v1 v2 ··· vd] are the orthonormal eigenvectors. The whitened data matrix Z can thus be obtained as

$$Z = \sqrt{N}\, \Lambda^{-1} V^{T} \tilde{\kappa} \tag{3}$$

The whitening procedure removes all cross-correlations among the variables, which is essential for extracting independent components. The kernel independent components can be computed from the whitened data as

$$S = \Lambda^{1/2} A^{T} Z \tag{4}$$

where A denotes the demixing matrix, with the columns being initialized and updated iteratively such that the corresponding normalized independent components S′ = AᵀZ have the maximum non-Gaussianity.25,26 The monitoring statistics T² and SPE are then calculated as

$$T^{2} = S_t^{T} \Lambda^{-1} S_t \tag{5}$$

and

$$\mathrm{SPE} = Z_t^{T} (I - A A^{T}) Z_t \tag{6}$$

where St and Zt are the kernel independent components and the whitened data matrix, respectively, from the test samples. The kernel ICA based monitoring statistics T² and SPE along with the kernel density estimation27 based control limits can be used for process monitoring and fault detection.
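As a rough illustration of the kernel whitening stage reviewed above (eqs 1−3), the following Python sketch builds a Gram matrix, centers and scales it, and projects onto the d leading eigenvectors. It is a minimal sketch under assumptions rather than the authors' implementation: the function names, the kernel width c, and the use of NumPy's eigh routine are illustrative choices, the Gaussian kernel and trace scaling follow the choices made in section 3, and the iterative estimation of the demixing matrix A (eq 4) is omitted.

```python
import numpy as np

def gaussian_gram(X, c):
    """Gram matrix with k(xi, xj) = exp(-||xi - xj||^2 / c); X has shape (N, J)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-np.maximum(d2, 0.0) / c)  # clip tiny negative round-off

def center_and_scale(K):
    """Mean-center the Gram matrix in feature space, then scale by tr(K')/N."""
    N = K.shape[0]
    one_n = np.full((N, N), 1.0 / N)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    return Kc / (np.trace(Kc) / N)

def kernel_whiten(K_tilde, d):
    """Return Z = sqrt(N) * Lambda^{-1} V^T K_tilde using the d largest eigenpairs."""
    N = K_tilde.shape[0]
    lam, V = np.linalg.eigh(K_tilde)        # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:d]       # indices of the d largest eigenvalues
    lam, V = lam[order], V[:, order]
    Z = np.sqrt(N) * (V / lam).T @ K_tilde  # whitened scores, shape (d, N)
    return Z, lam, V

if __name__ == "__main__":
    rng = np.random.default_rng(0)          # synthetic data, for illustration only
    X = rng.normal(size=(40, 5))
    K_tilde = center_and_scale(gaussian_gram(X, c=10.0))
    Z, lam, V = kernel_whiten(K_tilde, d=3)
    print(Z.shape)
```

With Z obtained this way, Z Zᵀ/N equals the identity matrix up to rounding, which is the whitening property that the subsequent ICA step relies on.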


Figure 4. Normal profiles of the monitored variables in the fed-batch penicillin fermentation process.

Table 2. Test Cases of the Fed-Batch Penicillin Fermentation Process
case no. | test scenario | duration
1 | normal operation | 0−180 h
1 | drift error in aeration rate | 180−400 h
2 | normal operation | 0−240 h
2 | step fault in substrate feed rate | 240−400 h
3 | normal operation | 0−60 h
3 | step fault in agitator power | 60−150 h
3 | normal operation | 150−240 h
3 | drift error in substrate feed rate | 240−400 h
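The fault intervals in Table 2 can be turned into sample-wise ground truth for scoring a monitoring index. The Python sketch below assumes the conventional definitions, which are not spelled out in the text: the fault detection rate is the percentage of faulty samples that raise an alarm, and the false alarm rate is the percentage of normal samples that raise an alarm. The function names and the half-open interval convention are hypothetical; the 0.5 h sampling time and 400 h batch duration are taken from section 4.1.

```python
import numpy as np

def fault_labels(n_samples, dt, fault_intervals):
    """Boolean ground truth: True where the sample time lies in a fault interval.

    fault_intervals is a list of (start_h, end_h) pairs, treated as [start, end).
    """
    t = np.arange(n_samples) * dt
    faulty = np.zeros(n_samples, dtype=bool)
    for start, end in fault_intervals:
        faulty |= (t >= start) & (t < end)
    return faulty

def detection_rates(alarms, faulty):
    """Return (fault detection rate, false alarm rate) in percent."""
    alarms = np.asarray(alarms, dtype=bool)
    faulty = np.asarray(faulty, dtype=bool)
    fdr = 100.0 * alarms[faulty].mean()    # share of faulty samples flagged
    far = 100.0 * alarms[~faulty].mean()   # share of normal samples flagged
    return fdr, far

if __name__ == "__main__":
    # Third test case of Table 2: faults during 60-150 h and 240-400 h.
    faulty = fault_labels(n_samples=800, dt=0.5, fault_intervals=[(60, 150), (240, 400)])
    alarms = faulty.copy()                 # a hypothetical perfect detector
    print(detection_rates(alarms, faulty))
```

In practice the alarms vector would come from comparing a monitoring statistic against its control limit at every sampling instant.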

Table 3. Comparison of Monitoring Results for MKICA and MKICA−MMI Dissimilarity Methods
case no. | monitoring method | monitoring statistic | fault detection rate (%) | false alarm rate (%)
1 | MKICA | T² | 75.0 | 7.2
1 | MKICA | SPE | 90.2 | 14.2
1 | MKICA−MMI dissimilarity | D_KICA−MMI | 94.5 | 3.6
2 | MKICA | T² | 60.9 | 3.1
2 | MKICA | SPE | 95.3 | 21.9
2 | MKICA−MMI dissimilarity | D_KICA−MMI | 97.8 | 6.3
3 | MKICA | T² | 69.8 | 18.0
3 | MKICA | SPE | 73.6 | 29.0
3 | MKICA−MMI dissimilarity | D_KICA−MMI | 95.6 | 6.3

Figure 5. Process monitoring results of the MKICA method in the first test case of the fed-batch penicillin fermentation process.

3. MULTIWAY NONLINEAR KERNEL INDEPENDENT COMPONENT ANALYSIS AND MULTIDIMENSIONAL MUTUAL INFORMATION BASED DISSIMILARITY METHOD
Multiway PCA/PLS and multiway kernel PCA/PLS do not perform well in batch process monitoring because these methods employ only second-order statistics that cannot extract the non-Gaussian hidden features for fault detection. Moreover, multiway kernel independent component analysis has limited capability in batch process monitoring because it does not take into account the inherent process dynamics and multiplicity of operating phases. In this study, a new MKICA and multidimensional mutual information based dissimilarity index is proposed for batch process monitoring through a moving-window strategy. The presented MKICA−MMI dissimilarity method can handle the nonlinearity, non-Gaussianity, time-varying dynamics, and phase shifts of batch processes.

Batch process data are typically expressed in the format of a three-dimensional matrix as X̅ ∈ ℝ^{I×J×K}, where I represents the number of batches, J denotes the number of measurement variables, and K is the number of sampling instants within each batch. For a data set with uneven durations across different batches, the dynamic time warping technique can be employed for batch alignment and synchronization.28 Then, the three-dimensional data matrix is unfolded into a two-dimensional matrix through multiway analysis. First, the batchwise unfolding is conducted so as to stack the multiple block matrices from different sampling instants along the row direction. The formed two-dimensional matrix X̃ ∈ ℝ^{I×JK} is expressed as

$$\tilde{X} = \begin{bmatrix} x^{(1)}_{1,1} & \cdots & x^{(1)}_{1,J} & x^{(2)}_{1,1} & \cdots & x^{(2)}_{1,J} & \cdots & x^{(K)}_{1,1} & \cdots & x^{(K)}_{1,J} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & & \vdots & \ddots & \vdots \\ x^{(1)}_{I,1} & \cdots & x^{(1)}_{I,J} & x^{(2)}_{I,1} & \cdots & x^{(2)}_{I,J} & \cdots & x^{(K)}_{I,1} & \cdots & x^{(K)}_{I,J} \end{bmatrix} \tag{7}$$

where all of the column vectors of the unfolded matrix X̃ are mean-centered and scaled to unit variance. Then, the data is block-transposed to form the new two-dimensional matrix X ∈ ℝ^{IK×J} expressed as

$$X = \begin{bmatrix} x^{(1)}_{1,1} & \cdots & x^{(1)}_{1,J} \\ \vdots & \ddots & \vdots \\ x^{(1)}_{I,1} & \cdots & x^{(1)}_{I,J} \\ x^{(2)}_{1,1} & \cdots & x^{(2)}_{1,J} \\ \vdots & \ddots & \vdots \\ x^{(2)}_{I,1} & \cdots & x^{(2)}_{I,J} \\ \vdots & & \vdots \\ x^{(K)}_{1,1} & \cdots & x^{(K)}_{1,J} \\ \vdots & \ddots & \vdots \\ x^{(K)}_{I,1} & \cdots & x^{(K)}_{I,J} \end{bmatrix} \tag{8}$$

After data preprocessing and unfolding, the batch process data are mapped from the original measurement space onto a high-dimensional feature space using the Gaussian kernel function.
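The batchwise unfolding of eq 7, the column autoscaling, and the block transposition of eq 8 reduce to array reshapes. A minimal NumPy sketch follows, assuming the batch data are stored as an array of shape (I, J, K); the function names are hypothetical, and the use of the sample standard deviation (ddof=1) is an assumption.

```python
import numpy as np

def batchwise_unfold(X3):
    """(I, J, K) -> (I, J*K): one block of J variables per sampling instant (eq 7)."""
    I, J, K = X3.shape
    return X3.transpose(0, 2, 1).reshape(I, K * J)

def autoscale(Xu, mean=None, std=None):
    """Center and scale columns; pass benchmark mean/std to scale a monitored batch."""
    mean = Xu.mean(axis=0) if mean is None else mean
    std = Xu.std(axis=0, ddof=1) if std is None else std
    std = np.where(std == 0, 1.0, std)     # guard against constant columns
    return (Xu - mean) / std, mean, std

def block_transpose(Xu, J):
    """(I, J*K) -> (I*K, J): stack the K time blocks on top of each other (eq 8)."""
    I = Xu.shape[0]
    K = Xu.shape[1] // J
    return Xu.reshape(I, K, J).transpose(1, 0, 2).reshape(K * I, J)

if __name__ == "__main__":
    rng = np.random.default_rng(0)         # synthetic data, for illustration only
    X3 = rng.normal(size=(50, 16, 800))    # 50 batches, 16 variables, 800 samples
    Xu, mu, sd = autoscale(batchwise_unfold(X3))
    Xb = block_transpose(Xu, J=16)
    print(Xu.shape, Xb.shape)
```

Returning the benchmark mean and standard deviation from autoscale makes it straightforward to scale a monitored batch with the benchmark statistics, as the procedure later requires.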

Figure 6. Process monitoring results of the MKICA−MMI dissimilarity method in the first test case of the fed-batch penicillin fermentation process.

Figure 7. Process monitoring results of the MKICA method in the second test case of the fed-batch penicillin fermentation process.

The Gaussian kernel function is defined as

$$k(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^{2}}{c}\right) \tag{9}$$

where c is the kernel width. After the Gram kernel matrix κ has been computed from eq 1, the kernel matrix is further mean-centered as

$$\kappa' = \kappa - 1_N \kappa - \kappa 1_N + 1_N \kappa 1_N \tag{10}$$

with the matrix 1_N ∈ ℝ^{N×N} defined as

$$1_N = \frac{1}{N}\begin{bmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{bmatrix} \tag{11}$$

and subsequently scaled as

$$\tilde{\kappa} = \frac{\kappa'}{\operatorname{tr}(\kappa')/N} \tag{12}$$

where tr(κ′) represents the trace of the matrix κ′.

In this section, the normal training batches are set as a benchmark, and the new monitored batches are compared to the benchmark through continuously rolling windows. The MKICA models can be built from the benchmark batches and the monitored batch. Then, the nonlinear kernel IC subspaces are extracted from both MKICA models and further compared using multidimensional mutual information to evaluate the dissimilarity between the benchmark and the monitored time series. It should be noted that the number of independent components is selected according to the cumulative percentage of Euclidean norms.

For a random variable X with n possible outcomes {xi: i = 1, 2, ..., n}, the entropy can be used to measure the stochastic uncertainty

$$H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i) \tag{13}$$

where p(xi) is the probability of the outcome xi. The joint entropy of two random variables X and Y with the corresponding possible outcomes xi and yj is then defined as

$$H(X, Y) = -\sum_{i} \sum_{j} p(x_i, y_j) \log p(x_i, y_j) \tag{14}$$

where p(xi,yj) is the joint probability of the outcomes xi and yj. The mutual information between X and Y, denoted by I(X,Y), can be obtained from the entropies as follows

$$I(X, Y) = H(X) + H(Y) - H(X, Y) = \sum_{i} \sum_{j} p(x_i, y_j) \log \frac{p(x_i, y_j)}{p(x_i)\, p(y_j)} \tag{15}$$

For continuous variables, the summations can be replaced by integrals. However, directly calculating integrals is not a practical way of computing mutual information for multivariable systems. Instead, the numerical method of utilizing the k-nearest-neighbor (KNN) strategy can be employed to compute the mutual information within high-dimensional spaces as follows

$$I(X, Y) = \psi(k) - \frac{1}{k} - \left[\left[ \psi(n_X) + \psi(n_Y) \right]\right] + \psi(N) \tag{16}$$

where N denotes the size of the data set, nX and nY are the numbers of samples in proximity to the nearest neighbors, k is the number of nearest neighbors, and [[·]] represents the average value over all realizations of the measurement variables.29,30 In addition, ψ(·) denotes the digamma function and is defined as the logarithmic derivative of the gamma function Γ(·) as follows

$$\psi(x) = \Gamma(x)^{-1} \frac{d\Gamma(x)}{dx} \tag{17}$$

A dissimilarity index, D_KICA−MMI, based on the multiway kernel independent component analysis and multidimensional mutual information is further proposed to quantify the statistical independence between the nonlinear kernel independent component subspaces of the benchmark batches Sb and the monitored batch Sm as

$$D_{\mathrm{KICA-MMI}} = \frac{1}{I(S_b, S_m)} \left( \frac{T_m^{2}}{T_b^{2}} \right) \tag{18}$$

where Tb² and Tm² represent the MKICA based T² indices for the benchmark and monitored batches, respectively, and are computed as

$$T_b^{2} = S_b^{T} \Lambda_b^{-1} S_b \tag{19}$$

and

$$T_m^{2} = S_m^{T} \Lambda_m^{-1} S_m \tag{20}$$

with Λb and Λm denoting the diagonal eigenvalue matrices of the MKICA models. Meanwhile, I(Sb,Sm) is the multidimensional mutual information between the nonlinear kernel IC subspaces of the benchmark and monitored batches. An illustration of the moving-window strategy for the MKICA−MMI dissimilarity method is shown in Figure 1. The control limit of the D_KICA−MMI index can then be computed from the Gaussian kernel function based kernel density estimator for batch process monitoring.27 The proposed MKICA and MMI based dissimilarity approach for batch process monitoring is illustrated in Figure 2, and its detailed step-by-step procedure is as follows:
(i) Perform batch alignment and synchronization using dynamic time warping.
(ii) Unfold the benchmark batches X̅b ∈ ℝ^{I×J×K} to X̃b ∈ ℝ^{I×JK} with the column vectors mean-centered and scaled.

dx.doi.org/10.1021/ie301002h | Ind. Eng. Chem. Res. 2012, 51, 10910−10920

Industrial & Engineering Chemistry Research

Article

Figure 8. Process monitoring results of the MKICA−MMI dissimilarity method in the second test case of the fed-batch penicillin fermentation process.

process was used to evaluate the effectiveness of the proposed multiway kernel independent component analysis and multidimensional mutual information based dissimilarity approach. In the process simulation, the microorganisms are grown in batch operation during the initial 40 h to reach a high cell density, and then, the fermenter is switched to the fed-batch mode with a continuous feed of glucose as the substrate to maintain a high biomass growth rate and promote penicillin synthesis. The target product, penicillin, is also the secondary metabolite in the process and mainly secreted during the fed-batch stage. This process is characterized by nonlinear dynamics, multiple operating phases, and system uncertainty. The main process variables to be monitored include biomass concentration, penicillin concentration, glucose concentration, dissolved oxygen, and pH in the fermenter. The process flow diagram of the fed-batch penicillin fermentation process is given in Figure 3. The substrate feed rate during the fed-batch stage is under open-loop operation, whereas the pH and temperature in the fermenter are controlled using two cascade loops that manipulate the acid/base equilibrium and the hot/cold water flow ratio. Furthermore, the air is continuously fed into the fermenter to maintain the desirable oxygen level for cell growth.31 In this study, the observation data of 16 measurement variables given in Table 1 were collected for batch process monitoring. The sampling time used was 0.5 h, and the overall duration of each batch was 400 h including both the batch and fed-batch stages. The normal profiles of the measurement variables in the

(iii) Block-transpose the batchwise unfolded data matrix X̃ b ∈ I × JK to obtain Xb ∈ IK × J . (iv) Unfold the monitored batch X̅ m ∈ 1 × J × K to X̃ m ∈ 1 × JK with column vectors mean-centered and scaled using the mean and standard deviation of benchmark data. (v) Block-transpose the monitored data X̃ m ∈ 1 × JK to obtain Xm ∈ K × J . (vi) Specify the size of moving window w. (vii) Map the benchmark and monitored data sets onto a highdimensional feature space using the Gaussian kernel function, resulting in the Gram kernel matrices κb and κm. (viii) Move the window over the benchmark and monitored data and build the corresponding MKICA models. (ix) Extract the kernel independent component subspaces Sb and Sm corresponding to the benchmark and monitored data, respectively. (x) Compute the multidimensional mutual information I(Sb,Sm) between the benchmark and monitored subspaces Sb and Sm. (xi) Compute the dissimilarity index DKICA−MMI and the kernel density estimation based control limit for batch operation monitoring.

4. APPLICATION EXAMPLE 4.1. Fed-Batch Penicillin Fermentation Process Description. The simulated fed-batch penicillin fermentation 10917

dx.doi.org/10.1021/ie301002h | Ind. Eng. Chem. Res. 2012, 51, 10910−10920

Industrial & Engineering Chemistry Research

Article

Figure 9. Process monitoring results of the MKICA method in the third test case of the fed-batch penicillin fermentation process.

4.2. Batch Process Monitoring Results and Comparison. The proposed dissimilarity approach was compared to the regular MKICA method in the penicillin fermentation process, and the monitoring results in terms of fault detection rate and false alarm rate are listed in Table 3. The time series plots of the monitoring statistics of the regular MKICA and the MKICA− MMI dissimilarity methods in the first test case are shown in Figures 5 and 6, respectively. It can be readily observed that the MKICA−MMI dissimilarity method yields a much higher fault detection rate of 94.5%, whereas the regular MKICA approach leads to the fault detection rates of only 75.0% and 90.2% for the T2 and SPE statistics, respectively. Meanwhile, the false alarm rate of the MKICA−MMI dissimilarity method (3.6%) is significantly lower than those of the T2 and SPE indices of the MKICA method (7.2% and 14.2%, respectively). It is obvious from the trend plots of the monitoring statistics that there is a significant delay in signaling the drift error, especially for the T2 statistic. The MKICA method does not specifically capture the time-varying dynamic behavior and inherently shifting operational phases. Moreover, the T2 index itself is based on a quadratic form of the distance metric and could cause degraded performance in handling very strongly non-Gaussian process data. Therefore, the T2 statistic becomes insensitive to the process faults as observed in this application example. In contrast, the proposed MKICA−MMI dissimilarity index can

fermentation process are shown in Figure 4. In our proposed approach, only the completed batches of benchmark data are needed, and the monitored data are examined on a movingwindow basis for online fault detection. In other words, as the new measurement samples are collected along the trajectory of each batch, the moving segments of test data are formed, and then the dissimilarity index values are continuously computed for fault detection in an online fashion. The fault detection capability of the proposed approach was evaluated using three test cases that involve different types of process faults such as step and drift errors, as shown in Table 2. The first test case starts at normal operation that lasts the first 60 h and is then followed by a drift error in the aeration rate remaining until the end of the batch. The second test case also begins with normal operation for the initial 240 h. Then, a step error in the substrate feed rate occurs and lasts throughout the rest of the batch. For the third case, the process encounters a sudden drop in agitator power starting from the 60th hour until the 150th hour. Then, the process is restored to the normal operation for the next 90 h. Afterward, a drift error in the substrate feed rate takes place and lasts until the completion of the batch. In addition to the three test cases, a training data set with 50 normal operation batches was used as the benchmark for process monitoring. In this work, the window width was set to 25 h, and the confidence level for the control limit computation was fixed at 95%. 10918

dx.doi.org/10.1021/ie301002h | Ind. Eng. Chem. Res. 2012, 51, 10910−10920

Industrial & Engineering Chemistry Research

Article

Figure 10. Process monitoring results of the MKICA−MMI dissimilarity method in the third test case of the fed-batch penicillin fermentation process.

detection rate is as high as 97.8%, and its false alarm rate is only 6.3%. The inability of the conventional MKICA method to either capture the faulty samples or mitigate the false alarms indicates that it is not well suited for monitoring batch processes with inherent dynamics, multiple phases, and operating scenario changes. However, the MKICA−MMI dissimilarity method can accurately distinguish between normal operation and faulty events because of its non-Gaussianity, dynamics, and local phase handling features. Finally, a more complicated test case with two types of faults, a step error in the agitator power and a drift fault in the substrate feed rate, was used to further verify the strong monitoring capability of the proposed dissimilarity method. As shown in Figure 9, both the T2 and SPE indices miss significant percentages of faulty samples, especially for the drift error caused by the delay of fault alarm. Furthermore, the initial normal period contains a substantial portion of sample points with active false alarms by both the T2 and SPE indices. The fault detection rates of the T2 and SPE indices are only 69.8% and 73.6%, respectively. Meanwhile, the corresponding false alarm rates are as high as 18.0% and 29.0%. In comparison, the MKICA−MMI dissimilarity index is plotted in Figure 10. As in the previous test cases, the new dissimilarity method is able to extract the non-Gaussian dynamic features from the localized kernel IC subspaces through statistical independence, resulting in a high fault detection rate of 95.6% and a relatively low false alarm rate of 6.3%. Therefore, the presented dissimilarity method can accurately quantify the

signal the fault occurrence with minimum delay. Moreover, the MKICA based T2 and SPE indices both trigger some false alarms around the transition period when the process operation is switched from batch to fed-batch stage. Nevertheless, the dissimilarity index does not generate any false alarms in the same transition period. The poor performance of the regular MKICA approach is attributed to its inefficiency of handling the process dynamics, local nonlinearity and non-Gaussianity, and the operating phase shifts. The proposed MKICA−MMI dissimilarity method, however, is based on moving windows and non-Gaussian mutual information between localized kernel IC subspaces. Thus, it has much stronger capability for monitoring non-Gaussian dynamic batch processes with multiplicity of operating phases, as demonstrated by the excellent fault detection results. The second test case consists of a step error in the substrate feed rate. The trend plots of the monitoring statistics are shown in Figures 7 and 8. It is easily seen that most of the faulty samples can be detected by the MKICA based SPE index, and thus, its fault detection rate of 95.3% is definitely acceptable. However, false alarms are triggered quite a number of times during the normal operation period, so that its false alarm rate is as high as 21.9%. On the contrary, the T2 index results in much fewer false alarms, whereas its fault detection shows a longer delay and more undetected faulty samples. In comparison, the MKICA−MMI dissimilarity index leads to satisfactory results in terms of both detecting faulty samples and minimizing false alarms. Its fault 10919

dx.doi.org/10.1021/ie301002h | Ind. Eng. Chem. Res. 2012, 51, 10910−10920

Industrial & Engineering Chemistry Research

Article


difference between the normal benchmark and monitored batches for effective process monitoring and reliable fault detection.

5. CONCLUSIONS

In this article, a novel dissimilarity method is developed to monitor nonlinear and non-Gaussian dynamic batch processes by combining multiway kernel independent component analysis and multidimensional mutual information. First, a hybrid unfolding scheme is used to unfold and rearrange the batch process data. Multiway kernel independent component analysis is then conducted through a moving-window strategy to model the non-steady-state process data of both benchmark and monitored batches. The multidimensional mutual information is further computed between the kernel IC subspaces of the benchmark and monitored data to evaluate their statistical dependence, which characterizes the localized difference between the two data sets. The derived dissimilarity index can be used to monitor batch process operation and detect abnormal events. The application of the presented approach to the fed-batch penicillin fermentation process demonstrates that it outperforms the regular MKICA method, detecting various types of process faults with significantly lower false alarm rates and shorter fault detection delays. Therefore, the proposed MKICA−MMI dissimilarity method provides an effective tool for monitoring dynamic batch processes with inherent nonlinearity, non-Gaussianity, and multiple operating phases. Future research will extend the new dissimilarity method to fault identification and diagnosis of batch processes.
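As an illustration of the window-wise mutual information computation summarized above, the following minimal sketch estimates the mutual information between two multidimensional sample sets with a k-nearest-neighbor estimator of the Kraskov type and evaluates it over moving windows. It is a sketch under stated assumptions rather than the implementation used in this work: the inputs are assumed to be time-aligned score matrices that have already been extracted (the MKICA step is not shown), the function names `digamma`, `ksg_mutual_information`, and `moving_window_mi` are hypothetical, the window width, step, and k are arbitrary, and the demonstration data are synthetic. The exact form of the dissimilarity index and its control limit are not reproduced here.

```python
import math
import random


def digamma(x):
    """Digamma function for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def chebyshev(a, b):
    """Maximum-norm distance between two points given as sequences."""
    return max(abs(p - q) for p, q in zip(a, b))


def ksg_mutual_information(xs, ys, k=5):
    """k-NN mutual information estimate (in nats) between paired samples.

    xs and ys are lists of equal length; row i of xs is paired with row i of ys.
    """
    n = len(xs)
    if len(ys) != n or not 1 <= k <= n - 1:
        raise ValueError("need paired samples and 1 <= k <= n - 1")
    total = 0.0
    for i in range(n):
        dx = [chebyshev(xs[i], xs[j]) for j in range(n)]
        dy = [chebyshev(ys[i], ys[j]) for j in range(n)]
        # Distance to the k-th nearest neighbor in the joint space (max norm).
        joint = sorted(max(dx[j], dy[j]) for j in range(n) if j != i)
        eps = joint[k - 1]
        # Count neighbors strictly closer than eps in each marginal space.
        nx = sum(1 for j in range(n) if j != i and dx[j] < eps)
        ny = sum(1 for j in range(n) if j != i and dy[j] < eps)
        total += digamma(nx + 1) + digamma(ny + 1)
    return digamma(k) + digamma(n) - total / n


def moving_window_mi(benchmark, monitored, width, step=1, k=5):
    """Mutual information between time-aligned windows of two score matrices."""
    n = min(len(benchmark), len(monitored))
    return [
        ksg_mutual_information(benchmark[s:s + width], monitored[s:s + width], k)
        for s in range(0, n - width + 1, step)
    ]


# Synthetic demonstration (not process data): one monitored set is strongly
# coupled to the benchmark, the other is statistically independent of it.
rng = random.Random(0)
bench = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(120)]
coupled = [[a + 0.1 * rng.gauss(0, 1), b + 0.1 * rng.gauss(0, 1)] for a, b in bench]
unrelated = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(120)]
print("coupled windows:  ", moving_window_mi(bench, coupled, width=60, step=30))
print("unrelated windows:", moving_window_mi(bench, unrelated, width=60, step=30))
```

In this sketch, windows in which the monitored scores remain statistically dependent on the benchmark scores yield large estimates, whereas independent windows yield estimates near zero. The brute-force neighbor search scales quadratically with the window length, so a tree-based search would be preferable for long windows.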



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

Notes

The authors declare no competing financial interest.



