Multivariate Trajectory-Based Local Monitoring Method for Multiphase

Article pubs.acs.org/IECR

Multivariate Trajectory-Based Local Monitoring Method for Multiphase Batch Processes

Feifan Shen, Zhiqiang Ge,* and Zhihuan Song

State Key Laboratory of Industrial Control Technology, Institute of Industrial Process Control, Department of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, People’s Republic of China

ABSTRACT: This paper proposes a new method combining the multivariate trajectory analysis and the principal component analysis (PCA) for multiphase batch process monitoring. To handle the uneven length problem, the trajectories of process variables are calculated instead of the original samples. For online monitoring, similar trajectories are extracted by just-in-time learning (JITL) with historical trajectories and the PCA model is constructed, which can deal with the missing data problem as well. Furthermore, to acquire a more reliable monitoring performance, a new distance-based measurement is proposed to show the location of samples. For performance evaluation, case studies of a numerical example and a simulated penicillin fermentation process are provided, with detailed comparisons to traditional methods.

1. INTRODUCTION

Batch processes play an important role in modern industry, producing high-value products such as food, plastics, pharmaceuticals, biologicals, and semiconductors.1−4 Since these products require high quality, data-based monitoring methods for batch processes are necessary to ensure safety and reliability. In past years, multivariate statistical process monitoring (MSPM) methods such as multiway principal component analysis (MPCA), multiway partial least squares (MPLS), and multiway Gaussian mixture model (MGMM) have been applied in batch process monitoring.5−7 Various research works have been reported on the multivariate statistical monitoring of batch processes.8−14 However, most of these methods assume that each batch has the same duration and that the sampling intervals are equal, while in practical operation the length and the intervals may vary from batch to batch. It is crucial to handle this uneven length problem when monitoring methods are applied in practical batch processes.

The uneven length problem generally consists of two aspects. One is the unequal length of each batch run, and the other is the diversity of local time. To solve this problem, different types of methods have been proposed in recent years. The simplest method is to truncate every batch to the minimum batch length, or to construct a model with the long batches and treat absent samples as missing data.15 However, this may lead to information loss and false alarms, which are serious for practical batch process monitoring. An improved method uses an indicator variable instead of a time index.16 This variable is selected to have similar starting and ending values during each batch. The original variables are then processed and calculated to match the indicator variable, so that each batch finally has the same length and index.
This method proves useful in simple chemical processes where the operation condition is not complicated. However, resampling data against a different index is not reliable, since some direct information may be lost and the original correlation of variables may be changed in complex situations. Another type of method is the “warping technique”, such as dynamic time warping (DTW) and correlation optimization warping (COW).17,18 These techniques were used previously in speech recognition and were then applied to the monitoring of uneven batch processes, where they correct process samples in different trajectories to synchronize uneven batches. Generally, this kind of monitoring method is complicated to implement and the result is not accurate, because samples are corrected and transformed. Besides, other methods have also been proposed in recent years to solve the monitoring problem of uneven batch processes, such as a statistical analysis based method, a local batch time approach, etc.19,20

In multiphase batch processes, normal batches usually have similar trajectories during the same time period, while fault batches present entirely different trajectories at the time the fault occurs.21,22 Thus, capturing the information on batch trajectories will help identify fault conditions by trajectory matching. In positioning systems, trajectory clustering has been widely used, which draws on the information on trajectory direction, speed, and sample location.23,24 Multivariate process trajectories have drawn much attention in recent years.25 However, attention mainly focuses on the individual variable trajectory, and no significant approach is available for multivariate trajectory monitoring that combines all the variable trajectories in a statistical way. Therefore, we propose our method to make the trajectory-based idea available for the statistical monitoring of uneven batch processes.

In fact, industrial process variables have similar trajectories, which indicate the trend of samples with information on speed, angle, and location. Therefore, the trajectory-based method can be applied for the monitoring of uneven batch processes. However, the original method is complicated when the dimension of samples is high, and the weighting parameter is difficult to choose without prior experience. In the present paper, a novel trajectory-based monitoring approach for uneven batch processes is proposed to overcome this disadvantage of the traditional method. The information on trajectory speed and angle is calculated from the difference between samples a fixed step apart, which is normalized afterward. Then, the trajectory vectors are used to construct monitoring models instead of the original samples. The trajectories of the samples in the same phase are similar to each other, so trajectory matching could be more effective for monitoring than the original sample clustering method. When a new sample comes, the trajectory vector is calculated with the fixed step and mean-scaled to prepare for monitoring. Then the just-in-time learning (JITL) method is introduced to extract the most similar trajectories and update the local PCA model,26−28 based on which two statistics are constructed to implement the monitoring scheme. Another statistic measuring the location is also calculated from the original new sample, which complements the monitoring results of the trajectory speed and angle. Compared to other methods for uneven batch processes, this method shows more sensitive and reliable fault detection results even without adequate sample information. With uneven sampling intervals and limited samples, sample trajectories are calculated to capture the properties of the process, which leads to a trajectory-based monitoring scheme.

The remainder of this paper is organized as follows. Section 2 introduces preliminaries of the proposed method including the trajectory calculation and JITL method, followed in section 3 by the detailed methodology on the monitoring steps for uneven batch processes with missing data. In section 4, a numerical example and the penicillin fermentation process are introduced for simulation and comparison. Finally, the conclusion is made in section 5.

Received: October 3, 2014. Revised: January 11, 2015. Accepted: January 12, 2015. Published: January 12, 2015.
© 2015 American Chemical Society. DOI: 10.1021/ie503921t. Industrial & Engineering Chemistry Research (Ind. Eng. Chem. Res.) 2015, 54, 1313−1325

Figure 1. Illustration of the uneven batch length problem with various time indexes.

2. PRELIMINARIES

2.1. Just-in-Time Learning (JITL). The main idea of JITL is to match data online by comparing the similarity between the current sample and historical samples, and to update the data set used for modeling. Given a new data sample, it is compared with historical data in distance and angle. The similarity value is calculated as follows:27

$$
\begin{aligned}
s_i &= \lambda \exp(-d_i) + (1-\lambda)\cos(\theta_i) \\
d_i &= \lVert x - x_i \rVert_2 \\
\cos(\theta_i) &= \frac{\langle x, x_i \rangle}{\lVert x \rVert_2 \, \lVert x_i \rVert_2}
\end{aligned}
\tag{1}
$$

where di and θi measure the similarity between the new sample x and the historical sample xi in distance and angle, respectively. However, when the value of cos(θi) is negative, this result will not be used for modeling. λ is the weighting factor between distance and angle, which varies from 0 to 1. Thus, historical samples with larger similarity values will be extracted for model updating.

2.2. Principal Component Analysis (PCA). PCA is a traditional monitoring method that assumes the data are Gaussian and linear, and it extracts the directions with maximum variance after data transformation. The method is described as follows:29

$$
\begin{aligned}
X &= T P^{T} + E \\
T^{2} &= t^{T} \Lambda^{-1} t \sim \frac{l(n^{2}-1)}{n(n-l)} F(l,\, n-l) \\
\mathrm{SPE}(x) &= \lVert \tilde{x} \rVert^{2}
\end{aligned}
\tag{2}
$$

where X is the original data, T is the score matrix of principal components, P is the loading matrix, E is the residual matrix, and T² and SPE are two monitoring statistics in principal subspace and residual subspace, respectively, with principal variance matrix Λ, sample number n, principal component number l, and residual x̃.

3. METHODOLOGY

3.1. Uneven Length Problem in Batch Process. In multiphase batch processes, it is difficult to handle the uneven length problem of batches. The total sampling time may vary from one batch to another, and the sampling interval could change as well. This results in nonuniform batch data, in which the same position in the sample sequence may correspond to different time indexes (time index k = 1, 2, ..., K). A unified measurement should therefore be implemented to solve the uneven length problem. The uneven length problem is illustrated in Figure 1, which presents different batch lengths and time indexes. As mentioned previously, trajectory vectors can be calculated to replace the original sample vectors; they describe the trajectory of variables over a fixed time step. Unlike the original samples with various indexes, trajectory vectors in close time periods are similar when the operation condition is stable, regardless of the index and sequence of the vectors.

3.2. Trajectory Vectors and Data Preprocessing. Instead of the original sample vectors, trajectory vectors are used for feature extraction and model construction, because of their simplicity and reliability in handling the uneven length problem. Trajectory vectors transform the original vectors and describe the trajectory of samples, which contains the information on trajectory direction, speed, and angle. The main idea of trajectory vectors is to describe the change of variables over a fixed step length and to normalize the resulting vectors. To acquire the trajectory vectors, the calculation is described as follows:

$$
\begin{aligned}
t_{i,j,k} &= x_{i,j,k} - x_{i,j,k-s} \\
\bar{x}_{i,k} &= [\,t_{i,1,k},\ t_{i,2,k},\ \ldots,\ t_{i,J,k}\,]^{T} \\
\bar{\bar{x}}_{i,k} &= \bar{x}_{i,k} / \lVert \bar{x}_{i,k} \rVert \\
i &= 1, 2, \ldots, I;\quad j = 1, 2, \ldots, J;\quad k = s+1, s+2, \ldots, K
\end{aligned}
\tag{3}
$$

where xi,j,k is the jth variable of the sample at the ith batch and kth time index, s is the step length, x̄i,k includes every variable trajectory value at the ith batch and kth time, and x̿i,k is the normalized form of the trajectory vector. Note that K is the length of the time index in the longest batch; in other batches k may not reach the value of K.

Generally, the batch process data is three-dimensional (batch index i, time index k, variable index j), and it is difficult to construct monitoring models directly. Most of the existing methods unfold the three-dimensional data into two dimensions as data preprocessing.30,31 As shown in Figure 2, the original data set can be unfolded batch-wise or variable-wise. However, both unfolding methods show shortcomings in practical monitoring because of the uneven length problem and data correlations. For example, when data is unfolded batch-wise, the sample amounts in each batch are not equal, which means that the data set cannot be used immediately. Further data processing needs to be executed before modeling, and there is no perfect approach to overcome this deficiency. After unfolding, the two-dimensional data should be normalized and mean-scaled for further modeling. This is a necessary step but may cause information loss when the values of variables change frequently or rapidly.

Compared to the traditional method, the trajectory-based approach has its advantages. First, every vector is independent and data unfolding is not necessary, because the trajectory vectors are calculated directly from the original data set. All trajectory vectors are treated as a new data set for further modeling. Second, trajectory vectors are normalized during calculation, which means that no further data preprocessing is required before modeling. Besides, when the time label of the original data set is not available, it is difficult to implement local monitoring methods, because no accurate sampling-time information can support a moving window or similar strategies. With the use of trajectory vectors, modeling can be achieved even without the information on sampling time. This is because the trajectory of samples during a stable operation condition is similar regardless of the accurate sampling time, so the next step is simply to extract the most similar trajectory samples for modeling.

Figure 2. Data unfolding approach of the traditional methods.

3.3. Trajectory-Based Modeling and Monitoring. When a new sample comes, it is classified into one phase as mentioned previously. Then a local trajectory model needs to be constructed from historical trajectory vectors for monitoring purposes. To extract the most similar vectors, the JITL method is introduced because of its advantage in similarity comparison. Then the local PCA model is constructed with the extracted similar trajectory vectors.

3.3.1. Similar Vectors Extraction. The new trajectory vector is calculated first with the fixed step, and then the phase recognition is achieved. In the corresponding phase, all the trajectory vectors are considered as historical data. Unlike the traditional local method, this JITL-based method requires no local index information. Vectors with the same phase type are considered as potential modeling data. The JITL method is used as follows:

$$
\begin{aligned}
d_{i,k} &= \lVert x_c - x_{i,k} \rVert_2 \\
\cos(\theta_{i,k}) &= \frac{\langle x_c, x_{i,k} \rangle}{\lVert x_c \rVert_2 \, \lVert x_{i,k} \rVert_2} \\
s_{i,k} &= \lambda \exp(-d_{i,k}) + (1-\lambda)\cos(\theta_{i,k}) \\
X &= [\,\hat{x}_1\ \ldots\ \hat{x}_i\ \ldots\ \hat{x}_n\,]
\end{aligned}
\tag{4}
$$

where xc is the current trajectory sample and xi,k is the historical trajectory sample in the ith batch and kth time slice. di,k and θi,k are the distance and angle between xc and xi,k, respectively. si,k is the similarity value between the current trajectory sample and the historical trajectory sample in the ith batch and the kth time slice, and λ is the weighting parameter of distance and angle. Then X is obtained for modeling with the most similar trajectories x̂i, which have larger values of si,k. As described above, the n trajectories with the highest similarity are extracted by JITL for modeling purposes. The value of n depends on the scale of the historical data set. Thus, the historical data whose trajectories are most similar to the current trajectory vector are identified to support the further monitoring strategy.

3.3.2. Local Monitoring Approach. Based on the extracted trajectory, the nonlinearity of the original samples could be

reduced. The sample difference is obtained immediately by the trajectory calculation, which shows a more linearly dependent trend of variables instead of the original nonlinear relation between variables. Generally, the differential calculation reduces the degree of the variables, and variables with lower degrees tend to have a less nonlinear relation with each other. Figure 3 shows an example where the trajectory calculation can obviously reduce the data nonlinearity. As shown in Figure 3, the original samples with three variables show obvious nonlinearity when the degree of the variables is high. After the trajectory calculation, data nonlinearity has been significantly reduced. To illustrate the effect more clearly, the degree of nonlinearity can be quantified by the correlation coefficients shown in Table 1. When the trajectory calculation is executed, the degree of nonlinearity is reduced according to the correlation coefficient.

Figure 3. Variables distribution of original samples with a one-degree variable x1 and a higher degree variable x3 compared to the trajectory variables distribution: (a) five-degree variable; (b) six-degree variable; (c) seven-degree variable; (d) eight-degree variable.

Table 1. Correlation Coefficients of Variables

degrees             five     six      seven    eight
original samples    0.8882   0.8483   0.8110   0.7784
trajectories        0.9996   0.9995   0.9990   0.9980

To make the computation simple and effective, the PCA model is introduced for the modeling and monitoring purposes when the data nonlinearity is within an acceptable degree. X is the trajectory matrix with the most similar trajectories extracted by the JITL method and is used as the data matrix in the PCA algorithm; further calculation is based on this trajectory matrix. Different from traditional PCA, the principal subspace and the residual subspace have additional significance according to the property of the trajectory vectors. As mentioned in section 2, the trajectory calculation implements a step difference, so low-degree variables change little in trajectory vectors, while high-degree variables remain fluctuating after the trajectory calculation. The PCA algorithm extracts the information on samples with the most changes as the principal subspace. In this case, the information on the higher-degree variables is extracted in the principal subspace, while variables with lower degrees are extracted in the residual subspace.

3.3.3. Monitoring Statistics. After the trajectory-based PCA modeling, monitoring statistics need to be constructed to achieve the process monitoring and fault detection. In the traditional PCA method, the T² and SPE statistics are defined as the monitoring statistics for the principal subspace and the residual subspace, respectively. Therefore, in the trajectory-based PCA method, these two statistics can still be used as monitoring statistics in the corresponding subspace. Unlike the traditional PCA, the statistics of the proposed method can detect a fault immediately because of the property of trajectory vectors. To illustrate the efficiency, Figure 4 shows the detailed information. As shown in Figure 4, during the normal condition, the trajectories of the variable remain similar until the step change occurs. With the influence of the step bias, several obvious abnormal trajectories are detected, the amount of which

corresponds to the trajectory step length. After the detection, the trajectories recover to normal condition. Due to the property of the trajectory-based method, fault conditions can be easily detected and false alarms are largely avoided, which contributes to a sensitive and reliable fault detection strategy. However, as described above, the statistics will return to normal condition after fault detection. Thus, an additional reference needs to be introduced to help judge whether the process has recovered from the fault condition.

Figure 4. Trajectories of the variable with a step change.

In the current work, a distance-based method is introduced to assist the two traditional statistics. While several similar trajectory vectors are extracted in the JITL method for modeling, the original samples before trajectory calculation are obtained as well. Because of this, the distance between the current sample and the trajectory-similar samples can be easily acquired as follows:

$$
d_c = \left\lVert \frac{1}{n}\sum_{i=1}^{n} x_i - x_c \right\rVert
\tag{5}
$$

where xc is the current original sample and xi are the original samples which are extracted for trajectory calculation and modeling. If the distance between the current sample and the normal samples is obviously large, the fault still exists even if the monitoring statistic returns to the normal condition. Meanwhile, when the distance is similar to that of the normal condition and the monitoring statistic is normal, the fault may have disappeared. As shown in Figure 5, under the monitoring of the trajectory-based T² and SPE statistics as well as the distance measurement, the fault detection of uneven batch processes is finally achieved.

Figure 5. Flow diagram of modeling and monitoring for the uneven batch problem: (a) data preprocessing; (b) online monitoring procedure of normal condition; (c) online monitoring procedure of fault condition.

3.4. Advantages of the Trajectory-Based Method. Different from the traditional sample-based method, the proposed method transforms samples into trajectories, which takes the uneven and missing data problems into consideration. The diverse number of samples in different batches will not affect the monitoring results of the proposed method, and several missing samples are acceptable because of the mean-scaled trajectory calculation.

Another advantage concerns phase division. Most multiphase methods require phase division and recognition steps: sample clustering and phase division with historical data, for example by k-means or a Gaussian mixture model, are necessary before modeling and monitoring. However, these methods assign every sample to a definite category, while some samples actually have unique features and are forced to belong to one cluster. Thus, mistakes may take place and false alarms may occur. In the proposed method, the most similar trajectories, which share the same feature, are extracted, so that phase division and recognition are not necessary steps. Therefore, the model construction is more accurate with similar trajectories, and errors in modeling and monitoring could be reduced.

Besides, the proposed method gives a clearer view of the monitoring results: when a fault occurs, the run of abnormal statistics is at least as long as the trajectory calculation step. Meanwhile, after the fault disappears and the process returns to its normal condition, another abnormal monitoring result will indicate this change. Therefore, false alarms can be avoided and the monitoring results are more reliable and sensitive.

Figure 6. Process monitoring results of the normal condition: (a) PCA; (b) KPCA; (c) trajectory-based PCA.
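The calculations in eqs 3−5, together with the T² and SPE statistics of eq 2, can be sketched in a few NumPy functions. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names, the default λ = 0.5, the guard against zero-length difference vectors, and the mean-centering inside the local PCA are all choices made here for the example.

```python
import numpy as np

def trajectory_vectors(batch, s):
    """Eq 3: step-s differences of one batch (K x J array), each row scaled to unit norm."""
    diff = batch[s:] - batch[:-s]                     # t_{j,k} = x_{j,k} - x_{j,k-s}
    norms = np.linalg.norm(diff, axis=1, keepdims=True)
    return diff / np.where(norms == 0, 1.0, norms)    # zero rows are left as zeros (assumption)

def jitl_select(x_c, hist, n, lam=0.5):
    """Eq 4: indices of up to n historical trajectories most similar to x_c."""
    d = np.linalg.norm(hist - x_c, axis=1)
    cos = hist @ x_c / (np.linalg.norm(hist, axis=1) * np.linalg.norm(x_c))
    sim = lam * np.exp(-d) + (1 - lam) * cos
    keep = np.flatnonzero(cos >= 0)                   # negative cosines are not used for modeling
    return keep[np.argsort(sim[keep])[::-1]][:n]      # most similar first

def pca_statistics(X, x_c, l):
    """Local PCA on X (n x J) with l components; returns (T^2, SPE) of x_c."""
    mean = X.mean(axis=0)                             # mean-centering is an assumption of this sketch
    Xc = X - mean
    eigval, eigvec = np.linalg.eigh(Xc.T @ Xc / (len(X) - 1))
    top = np.argsort(eigval)[::-1][:l]
    P, var = eigvec[:, top], eigval[top]
    xc = x_c - mean
    t = P.T @ xc                                      # scores in the principal subspace
    resid = xc - P @ t                                # part left in the residual subspace
    return float(t @ (t / var)), float(resid @ resid)

def distance_statistic(x_c, X_orig):
    """Eq 5: distance between the current original sample and the mean of the extracted samples."""
    return float(np.linalg.norm(X_orig.mean(axis=0) - x_c))
```

In such a sketch, row r of the output of `trajectory_vectors` corresponds to original sample r + s of the same batch, so the indices returned by `jitl_select` can be mapped back to the original samples needed by `distance_statistic`.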

4. ILLUSTRATIONS AND RESULTS

In this section, two case studies are introduced to demonstrate the efficiency of the proposed method. One is a numerical example, which illustrates the effectiveness of the basic algorithm. The other one is the penicillin benchmark process, which is widely used for algorithm validation and performance evaluation in batch process monitoring.

4.1. Numerical Simulation. Given a nonlinear system with three variables as follows:

$$
\begin{aligned}
x_1 &= 5t + e_1 \\
x_2 &= 4t^{2} - 3t + e_2 \\
x_3 &= -3t^{3} + 3t^{2} - t + e_3
\end{aligned}
\tag{6}
$$

where x1, x2, and x3 are process variables, t represents latent variables, and e1, e2, and e3 are independent random noises with the distribution of N(0,0.01). To demonstrate the performance of the proposed method, 200 samples are generated and three faults are introduced to test the effectiveness of the algorithm. The first fault is a time delay for variable 2 introduced at the 151st sample as described by

$$
\begin{aligned}
x_1 &= 5t + e_1 \\
x_2 &= 4(t-T)^{2} - 3(t-T) + e_2 \\
x_3 &= -3t^{3} + 3t^{2} - t + e_3
\end{aligned}
\tag{7}
$$

The second fault is a step change for variable 3 introduced at the 71st sample as described by

$$
\begin{aligned}
x_1 &= 5t + e_1 \\
x_2 &= 4t^{2} - 3t + e_2 \\
x_3 &= -3t^{3} + 3t^{2} - t + e_3 + S
\end{aligned}
\tag{8}
$$

The following equations represent the third fault as a gradual change introduced at the 101st sample. Besides, this fault recovers at the 151st sample.

$$
\begin{aligned}
x_1 &= 5t + e_1 \\
x_2 &= 4t^{2} - 3t + e_2 + M(t-100) \\
x_3 &= -3t^{3} + 3t^{2} - t + e_3
\end{aligned}
\tag{9}
$$

where T represents the time delay, S is the step change, and M is the magnification of the gradual fault. Five normal processes with different noises are used as five independent batches. To demonstrate the advantage of the proposed method, 145 random historical samples in each batch run are extracted for trajectory calculation and modeling with a trajectory step length of 10, while the traditional PCA and the kernel PCA (KPCA) methods are introduced with all 200 samples in each batch for comparison. Test batches under different conditions, with 200 samples each, are introduced for performance evaluation. For the JITL method, the 20 most similar trajectories are extracted for modeling. The monitoring results of the normal condition are presented in Figure 6.

Figure 7. Process monitoring results of fault 1 (T = 10): (a) PCA; (b) KPCA; (c) trajectory-based PCA.
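The numerical example of eqs 6−9 can be reproduced approximately with a short generator. The sketch below rests on several assumptions that the text does not fix: the latent variable t is taken as an evenly spaced grid on [0.01, 2], N(0,0.01) is read as a variance of 0.01 (standard deviation 0.1), the delay T is applied as a shift of T samples in the latent variable, and the term M(t − 100) is evaluated on the sample index. The function name `simulate_batch` is hypothetical.

```python
import numpy as np

def simulate_batch(fault=None, n=200, T=10, S=1.5, M=0.05, seed=0):
    """Generate one batch (n x 3) of the numerical example; fault is None, 1, 2, or 3."""
    rng = np.random.default_rng(seed)
    k = np.arange(1, n + 1)                 # sample index 1..n
    t = np.linspace(0.01, 2.0, n)           # hypothetical latent trajectory
    e = rng.normal(0.0, 0.1, size=(n, 3))   # noise with variance 0.01 (assumed reading)

    t2 = t.copy()                           # latent values seen by variable 2
    if fault == 1:                          # eq 7: time delay from the 151st sample on
        late = np.flatnonzero(k >= 151)
        t2[late] = t[late - T]

    x1 = 5 * t + e[:, 0]
    x2 = 4 * t2**2 - 3 * t2 + e[:, 1]
    x3 = -3 * t**3 + 3 * t**2 - t + e[:, 2]

    if fault == 2:                          # eq 8: step change from the 71st sample on
        x3[k >= 71] += S
    if fault == 3:                          # eq 9: gradual change on samples 101..150
        active = (k >= 101) & (k <= 150)
        x2[active] += M * (k[active] - 100)

    return np.column_stack([x1, x2, x3])
```

Calling `simulate_batch(seed=i)` for five different seeds gives five normal batches with different noise, and `simulate_batch(fault=2)` gives a test batch for the step fault with S = 1.5.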

The confidence levels for all methods are set as 95%, and the methods show different performances even in the normal condition. Due to the nonlinearity of samples, the PCA presents unexpected abnormal monitoring results, while KPCA has a better performance because it adapts to the nonlinear data. Meanwhile, the trajectory-based PCA shows a flexible control limit according to the algorithm, and the monitoring result is excellent with nonlinear samples. At the same time, the distance measurement in the trajectory-based PCA stays stable under the normal condition, which indicates that the two statistics above have a good monitoring performance.

The monitoring results of fault 1 are shown in Figure 7, and the delay is set as 10. Figure 7 indicates that the traditional PCA can hardly detect this fault, since the time delay is not long enough and the relationships are nonlinear. KPCA can detect this fault from the 170th sample, which is a delay of 20 samples, while the trajectory-based method detects the fault instantly. Besides, the distance reflects that the fault stays to the end of the process.

The monitoring results of fault 2 are presented in Figure 8, and the step change is set as 1.5. Similar to the results of fault 1, the traditional PCA can hardly detect this fault, while the KPCA method has a better monitoring performance. However, as circled in Figure 8b, the KPCA method fails to provide a sustained fault detection result. Compared with these two methods, trajectory-based PCA has an excellent monitoring performance, with a 100% detection rate during a step-length time period (the step length of the trajectory calculation is set as 10). Besides, unlike KPCA, whose monitoring statistic falls below the control limit at the 190th sample, the distance measurement of the trajectory-based PCA reveals that the process never returns to normal condition.

Figure 8. Process monitoring results of fault 2 (S = 1.5): (a) PCA; (b) KPCA; (c) trajectory-based PCA.

Figure 9 shows the monitoring results of fault 3, which recovers after 50 samples and whose magnification is set as 0.05. Different from fault 1 and fault 2, the traditional PCA presents satisfactory monitoring results. The KPCA method detects the fault immediately as well, though its performance is poor when the process returns to normal, as circled in Figure 9b. When the process recovers from the fault condition, a period of abnormal statistics is still shown with the KPCA method, which can be considered as false alarms. As shown in Figure 9c, when the fault occurs and then disappears, the trajectory-based PCA method detects both conditions with two step-length runs of abnormal statistics, reflecting that two severe changes have happened during these two time periods. Meanwhile, the distance indicates that the fault lasts until the 150th sample, and the process returns to normal condition after the distance becomes normal.

Figure 9. Process monitoring results of fault 3 (M = 0.05): (a) PCA; (b) KPCA; (c) trajectory-based PCA.

From the results with numerical examples, it can be concluded that the proposed trajectory-based method performs better than the PCA and KPCA methods even without all historical data, because of the property of the algorithm. Besides, it reveals the occurrence and the recovery of a fault clearly, and the distance measurement is a reliable aid to fault detection.

4.2. Penicillin Fermentation Process. In this subsection, the well-known fed-batch penicillin benchmark process is used to evaluate the monitoring performance. The simulation software PenSim v2.0 was developed by the Illinois Institute of Technology. The flow diagram of the penicillin process is presented in Figure 10. The whole process can be divided into two phases, which consist of a batch growth phase before 40 h and a fed-batch production phase. In the first phase, glucose is consumed and biomass is accumulated in preparation for penicillin production. In the second phase, penicillin is produced until the end of the batch.

Figure 10. Penicillin fermentation process.

In this paper, five batch runs are introduced for performance evaluation. The duration of each batch is 400 h, and the sampling intervals differ because the focus is on the uneven batch length problem. The sampling intervals are set as 0.8, 0.9, 1, 1.1, and 1.2 h, and the batch runs have 500, 444, 400, 363, and 334 samples, respectively. With these five uneven batches, 12 variables are chosen for process monitoring as listed in Table 2.

problem. To address the uneven problem, the sampling interval of the testing batch is 0.95 h, which has 421 samples during one batch. The step of trajectory calculation is set as 10, and for JITL method, the 20 most similar trajectories are extracted for modeling. The first fault is a step change of the agitator power introduced in 200 h and disappeared in 300 h, while the second fault is a ramp change of the substrate feed rate introduced in 200 h and disappeared in 300 h. In this work, the multiway principal component analysis (MPCA) and the multiway kernel principal component analysis (MKPCA) are introduced for comparisons. According to the uneven and missing data problem, these two methods are difficult for batch-wise unfolding since the number of samples in each batch is different. Thus, the data preprocessing used in these two methods is variable-wise unfolding. All samples are extracted for the modeling of the MPCA and MKPCA, while 200 random samples are selected for the trajectory calculation and modeling of the trajectory-based method. The confidence level is selected as 95%. The monitoring results of the normal condition are shown in Figure 11. It can be inferred from Figure 11 that MPCA presents abnormal monitoring results even under normal process condition. There are several reasons resulting in the poor performance, including the multiphase characteristic of the process, the data nonlinearity, and probably the uneven batch length which leads to the unavailability of the batch-wise unfolding for data preprocessing. In contrast, MKPCA can handle the problem of data nonlinearity, and has a better performance than the MPCA method. However, it shows abnormal statistics during the first period of monitoring results. Unlike these two traditional methods, the proposed method has a tighter control limit in the monitoring results and most

Table 2. Variables Selected for the Monitoring of Penicillin Process

no.   variable
1     aeration rate (L/h)
2     agitator power (W)
3     substrate feed rate (L/h)
4     substrate feed temp (K)
5     pH
6     generated heat (kcal)
7     substrate concn (g/L)
8     dissolved oxygen concn (g/L)
9     biomass concn (g/L)
10    penicillin concn (g/L)
11    culture volume (L)
12    carbon dioxide concn (g/L)

In addition, two abnormal batches are generated to test the effectiveness of the proposed method for the uneven batch

Figure 11. Process monitoring results of the normal condition: (a) MPCA; (b) MKPCA; (c) trajectory-based PCA.

DOI: 10.1021/ie503921t Ind. Eng. Chem. Res. 2015, 54, 1313−1325


Figure 12. Process monitoring results of fault 1: (a) MPCA; (b) MKPCA; (c) trajectory-based PCA.

Figure 13. Process monitoring results of fault 2: (a) MPCA; (b) MKPCA; (c) trajectory-based PCA.

statistics are under the control limit. The distance changes

The monitoring results of fault 1, a 10% step decrease in the agitator power, are presented in Figure 12. MPCA fails to detect this fault between 200 and 300 h and presents abnormal statistics before the change. MKPCA gives satisfactory monitoring results during the abnormal

frequently during the first 100 trajectory samples, which indicates that the process is not in a stable condition during this period.
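The modeling setup of this case study (variable-wise unfolding of uneven batches, selection of the 20 most similar historical samples, and a local PCA model with a 95% limit) can be illustrated with a minimal sketch. Several parts are assumptions made only for illustration: the rows stand in for trajectory samples, whose calculation is defined earlier in the paper and is not reproduced here; similarity is taken as Euclidean distance; the control limit is an empirical 95th percentile of the training statistics rather than a distribution-based limit; and all function names and the random data are hypothetical.

```python
import numpy as np

def variable_wise_unfold(batches):
    """Stack batches of shape (K_i, J) along time; K_i may differ per batch."""
    return np.vstack(batches)

def select_similar(query, history, k=20):
    """Indices of the k rows of history closest to query (Euclidean, an assumption)."""
    dist = np.linalg.norm(history - query, axis=1)
    return np.argsort(dist)[:k]

def fit_pca(X, n_components):
    """Fit a PCA model on autoscaled data via SVD."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # guard against constant columns
    Z = (X - mean) / std
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                  # loadings, shape (J, n_components)
    lam = s[:n_components] ** 2 / (len(X) - 1)  # score variances
    return mean, std, P, lam

def t2_spe(x, model):
    """Hotelling T^2 and squared prediction error of one sample."""
    mean, std, P, lam = model
    z = (x - mean) / std
    t = z @ P
    resid = z - t @ P.T
    return float(np.sum(t ** 2 / lam)), float(resid @ resid)

def empirical_limit(values, confidence=0.95):
    """Empirical percentile used as a stand-in control limit (an assumption)."""
    return float(np.percentile(values, 100 * confidence))

# Hypothetical data: five batches with the sample counts of the case study, 12 variables.
rng = np.random.default_rng(0)
batches = [rng.normal(size=(n, 12)) for n in (500, 444, 400, 363, 334)]
history = variable_wise_unfold(batches)       # shape (2041, 12)

query = rng.normal(size=12)                   # one new online sample
idx = select_similar(query, history, k=20)
model = fit_pca(history[idx], n_components=3)
train_t2 = [t2_spe(row, model)[0] for row in history[idx]]
t2, spe = t2_spe(query, model)
alarm = t2 > empirical_limit(train_t2)
```

Because the local model is rebuilt for every query, the cost of `select_similar` grows with the size of the historical data set, which is the computing-time concern raised in the conclusion.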


Notes

period. However, it still raises false alarms when no fault is present, as circled in Figure 12b; these occur during the transition between phase 1 and phase 2 of the penicillin process. The dynamic and nonlinear features of the between-phase transition may cause this behavior. In contrast, the trajectory-based method indicates the exact beginning time of the fault with 10 abnormal statistics, which equals the step length of the trajectory calculation. After the recovery of the process, another 10 abnormal statistics indicate this critical change. The abnormal distance between 200 and 300 h shows that the fault influences the process during this time period. Therefore, with the help of the distance measurement, the change of the process condition is presented more clearly. The monitoring results of fault 2, a ramp change of the substrate feed rate with a slope of −0.5, are presented in Figure 13. MPCA has detection delays both when the fault occurs and when it disappears, and still raises false alarms before 200 h. MKPCA also shows a delay at 200 h when the fault occurs, and it shows detection delays when the process returns to the normal condition as well. These monitoring results indicate that, when a fault influences the process gradually, the traditional methods may present detection delays, which may degrade the online monitoring performance. The trajectory-based method, however, gives an earlier warning of the fault condition with fewer detection delays. When the process returns to the normal condition, the proposed method detects this change instantly, with a run of abnormal statistics that is longer than the trajectory calculation step. The results of the distance measurement then reveal that this ramp fault begins at 200 h and stops influencing the process at approximately 300 h.
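The comparison above is phrased in terms of false alarms and detection delays. A minimal sketch of how these two quantities could be computed from a monitoring statistic and its control limit is given below. The function name is hypothetical, and counting every alarm outside the fault window as a false alarm is an assumption; in particular, it would also count the alarms that the trajectory-based method raises at the recovery of the process, which the text interprets as an indication of the change rather than as false alarms.

```python
import numpy as np

def alarm_summary(stat, limit, time, fault_start, fault_end):
    """Return (false alarm rate outside the fault window, detection delay in time units).

    The delay is None when no alarm is raised inside the fault window.
    """
    stat = np.asarray(stat, dtype=float)
    time = np.asarray(time, dtype=float)
    alarms = stat > limit
    in_fault = (time >= fault_start) & (time < fault_end)
    false_alarm_rate = float(alarms[~in_fault].mean())
    hits = np.flatnonzero(alarms & in_fault)
    delay = float(time[hits[0]] - fault_start) if hits.size else None
    return false_alarm_rate, delay

# Toy statistic: one alarm before the fault, first in-fault alarm one step late.
time = np.arange(10.0)
stat = [0, 0, 2, 0, 0, 0, 2, 2, 0, 0]
far, delay = alarm_summary(stat, limit=1.0, time=time, fault_start=5.0, fault_end=8.0)
```

For the case study, `fault_start` and `fault_end` would be 200 h and 300 h, and `time` would hold the sampling instants of the testing batch.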

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

This work was supported in part by the National Natural Science Foundation of China (NSFC) (61273167) and Project National 973 (2012CB720500).



(1) Lu, N.; Gao, F.; Wang, F. A sub-PCA modeling and online monitoring strategy for batch processes. AIChE J. 2004, 50 (1), 255−259.
(2) Undey, C.; Cinar, A. Statistical monitoring of multistage, multiphase batch processes. IEEE Control Syst. Mag. 2002, 22 (5), 40−52.
(3) Yu, J. A Bayesian inference based two-stage support vector regression framework for soft sensor development in batch bioprocesses. Comput. Chem. Eng. 2012, 41, 134−144.
(4) Ge, Z.; Gao, F.; Song, Z. Batch process monitoring based on support vector data description method. J. Process Control 2011, 21 (6), 949−959.
(5) Nomikos, P.; MacGregor, J. F. Monitoring batch processes using multiway principal component analysis. AIChE J. 1994, 40 (8), 1361−1375.
(6) Nomikos, P.; MacGregor, J. F. Multi-way partial least squares in monitoring batch processes. Chemom. Intell. Lab. Syst. 1995, 30 (1), 97−108.
(7) Yu, J.; Joe Qin, S. Multiway Gaussian mixture model based multiphase batch process monitoring. Ind. Eng. Chem. Res. 2009, 48 (18), 8585−8594.
(8) Ge, Z.; Song, Z.; Gao, F. Review of recent research on data-based process monitoring. Ind. Eng. Chem. Res. 2013, 52 (10), 3543−3562.
(9) Ge, Z. Q.; Song, Z. H.; Gao, F. R. Nonlinear quality prediction for multiphase batch processes. AIChE J. 2012, 58, 1778−1787.
(10) Camacho, J.; Pico, J.; Ferrer, A. Multi-phase analysis framework for handling batch process data. J. Chemom. 2008, 22 (11−12), 632−643.
(11) Lu, N.; Gao, F. Stage-based process analysis and quality prediction for batch processes. Ind. Eng. Chem. Res. 2005, 44 (10), 3547−3555.
(12) Yao, Y.; Gao, F. Phase and transition based batch process modeling and online monitoring. J. Process Control 2009, 19 (5), 816−826.
(13) Wang, D. Robust data-driven modeling approach for real-time final product quality prediction in batch process operation. IEEE Trans. Ind. Inf. 2011, 7 (2), 371−377.
(14) Ge, Z.; Song, Z.; Gao, F. Incorporating setting information for maintenance-free quality modeling of batch processes. AIChE J. 2013, 59 (3), 772−779.
(15) Kourti, T. Multivariate dynamic data modeling for analysis and statistical process control of batch processes, start-ups and grade transitions. J. Chemom. 2003, 17 (1), 93−109.
(16) Rothwell, S. G.; Martin, E. B.; Morris, A. J. Comparison of methods for dealing with uneven length batches. In Computer Applications in Biotechnology, 1998, the 7th IFAC International Conference; International Federation of Automatic Control: Laxenburg, Austria, 1998; pp 387−392.
(17) Kassidas, A.; MacGregor, J. F.; Taylor, P. A. Synchronization of batch trajectories using dynamic time warping. AIChE J. 1998, 44 (4), 864−875.
(18) Fransson, M.; Folestad, S. Real-time alignment of batch process data using COW for on-line process monitoring. Chemom. Intell. Lab. Syst. 2006, 84 (1−2), 56−61.
(19) Ge, Z.; Song, Z. Online monitoring and quality prediction of multiphase batch processes with uneven length problem. Ind. Eng. Chem. Res. 2014, 53 (2), 800−811.

5. CONCLUSION

In this paper, a trajectory-based method is proposed to handle the uneven length problem of batch process monitoring with missing data. Unlike the traditional methods, the proposed method uses data trajectories instead of the original samples, which can reduce data nonlinearity and simplify the data preprocessing step, since batch-wise unfolding is difficult to apply when batch lengths vary. For online modeling and monitoring, a JITL-based method is applied to the trajectory samples; it places less demand on phase division and recognition while still giving accurate models. In addition, with the help of a distance measurement, the trajectory-based PCA is introduced for monitoring, and in the case studies it outperforms the traditional methods. However, some issues remain for further work. For example, when the historical data set is much larger, the computing time of the JITL method becomes longer, which may affect the online monitoring performance. Moreover, the trajectories of process variables and quality variables are not distinguished, so quality prediction is not implemented. How to achieve quality-relevant batch process monitoring with uneven length and missing data also needs to be considered in future research.



REFERENCES

AUTHOR INFORMATION

Corresponding Author

*Tel.: +86-87951442. E-mail: [email protected].


(20) Zhao, L.; Zhao, C.; Gao, F. Inner-phase analysis based statistical modeling and online monitoring for uneven multiphase batch processes. Ind. Eng. Chem. Res. 2013, 52 (12), 4586−4596.
(21) Ge, Z.; Zhao, L.; Yao, Y.; Song, Z.; Gao, F. Utilizing transition information in online quality prediction of multiphase batch processes. J. Process Control 2012, 22 (3), 599−611.
(22) Duchesne, C.; MacGregor, J. F. Multivariate analysis and optimization of process variable trajectories for batch processes. Chemom. Intell. Lab. Syst. 2000, 51 (1), 125−137.
(23) Gaffney, S.; Smith, P. Trajectory clustering with mixtures of regression models. In Knowledge Discovery and Data Mining, 1999, the 5th ACM SIGKDD International Conference; Association for Computing Machinery: New York, 1999; pp 63−72.
(24) Scheepens, R.; Willems, N.; Wetering, H.; Wijk, J. J. Interactive visualization of multivariate trajectory data with density maps. In The 4th IEEE Pacific Visualization Symposium; Institute of Electrical and Electronics Engineers: New York, 2011; pp 147−154.
(25) Bogomolov, A. Multivariate process trajectories: capture, resolution and analysis. Chemom. Intell. Lab. Syst. 2011, 108 (1), 49−63.
(26) Hu, Y.; Ma, H.; Shi, H. Enhanced batch process monitoring using just-in-time-learning based kernel partial least squares. Chemom. Intell. Lab. Syst. 2013, 123 (15), 15−27.
(27) Liu, Y.; Gao, Z.; Li, P.; Wang, H. Just-in-Time Kernel Learning with Adaptive Parameter Selection for Soft Sensor Modeling of Batch Processes. Ind. Eng. Chem. Res. 2012, 51 (11), 4313−4327.
(28) Cheng, C.; Chiu, M. S.; Shi, H. Nonlinear process monitoring using JITL-PCA. Chemom. Intell. Lab. Syst. 2005, 76 (1), 1−13.
(29) Yao, Y.; Gao, F. A survey on multistage/multiphase statistical modeling methods for batch processes. Annu. Rev. Control 2009, 33 (2), 172−183.
(30) Choi, S. W.; Morris, J.; Lee, I. B. Dynamic model-based batch process monitoring. Chem. Eng. Sci. 2008, 63 (3), 622−636.
(31) Ge, Z.; Song, Z. Bagging support vector data description model for batch process monitoring. J. Process Control 2013, 23 (8), 1090−1096.
