Batch Process Monitoring by Wavelet Transform Based Fractal

Forecasting the abnormal operations in batch units and limiting their incidence have been the emphasis on the study of the on-line monitoring batch op...
1 downloads 0 Views 1MB Size
3864

Ind. Eng. Chem. Res. 2006, 45, 3864-3879

Batch Process Monitoring by Wavelet Transform Based Fractal Encoding Hsuan Chang Department of Chemical and Materials Engineering, Tamkang UniVersity, 151 Ying-Chuan Road, Tamsui, Taipei, Taiwan 25137, Republic of China

Junghui Chen* and Yun-Peng Ho R&D Center for Membrane Technology, Department of Chemical Engineering, Chung-Yuan Christian UniVersity, Chung-Li, Taiwan 320, Republic of China

Forecasting the abnormal operations in batch units and limiting their incidence have been the emphasis on the study of the on-line monitoring batch operation. The abundant real-time data gathered in the automatic system contain the complexity and uncertainty of the process behavior, so mining and fusing a series of universal applicable quantities from the historical data to monitor the operation status is essential. In this paper, a combination of wavelet analysis (WT) and the fractal encoding (FE) technique is proposed to resolve the problem by exploiting their virtues. Using WT with a local-time frequency analysis, this strategy takes advantage of the multiresolution representations on the measured profiles. Adapted FE is used to segment regions within the multiresolution representations. By extracting fractal models, the proposed method, like the philosophy of traditional statistical process control, can generate simple monitoring charts, track the progress in each batch run, and monitor the occurrence of observable upsets. Due to the local property of FE, the on-line batch monitoring based on the proposed method of filling the missing values only at some local regions is good enough for detecting the current status. Additionally, the advantages of the proposed method are demonstrated through two sets of benchmark data, a DuPont industrial batch polymerization reactor and fed-batch penicillin production, which are characterized by some fault sources. 1. Introduction Interest in research and development of batch process monitoring based on the multivariable statistical control process has increased steadily since the original work of multiway principal component analysis (MPCA) of Nomikos and MacGregor.1 MPCA based on this approach has proven useful in prioritizing the activities of process engineers by monitoring a batch operation which could potentially benefit from removing the abnormal condition.2-5 Comprehensive reviews of this area have appeared in the literature.6-7 Most existing multivariate statistical process control (MSPC) methods are used to identify assignable or special causes of variability, but they are good for detecting changes on a single scale. This means that they cannot extract the events occurred at different scales in the chemical operation processes,8 such as sharp changes with different frequency characteristics at the same time intervals. The wavelet representation provides a multiresoultion and multifrequency expression of signals with localization in both the time and frequency domains. Such a property is desirable for nonstationary measurements because multiresoultion subbands become more stationary and can be coded separately according to local statistical properties.9 As the wavelet transform decomposes the measurement data, notable applications in chemical engineering have been found to analyze process trends in process monitoring.8,10-12 The localization of the signal information in the time-scale (or wavelet) domain with different resolutions could accurately pin down the transitions between the distinct process behaviors. A review paper on multiscale process monitoring algorithms can be found in the literature.13 Though multiscale-based PCA (MSPCA) has been successful in statistical process monitoring, the applications are restricted * To whom all correspondence should be addressed. E-mail: jason@ wavenet.cycu.edu.tw.

because interscale dependencies of wavelet coefficients existed. The signal distributions may locate and cluster in some timefrequency frames.14-16 In this paper, fractal encoding (FE) is used to extract the features from the time-frequency representation. The concept of FE has been proposed in the early twentieth century, but the automatic blockwise fractal image compression algorithm was first presented17,18 during the era of the 1980s. The study of FE for signal and image compression has been an interesting research area since the early 1990s due to its potential high compression ratio and fast decoding performance.19-22 The basic idea of fractals is that individual regions of an image may exhibit similarity to each other at different regions. The arbitrarily sub-image block of an image can be approximated by a bigger sub-image block in the same image which is transformed by the contractive map. However, the spatial domain based on the FE scheme suffers from an annoying blocking effect especially at some local blocks which contain the information contents. This will hinder its practical applications. In this paper, a novel approach that combines wavelet transforms (WT) with fractal encoding (FE) as feature extraction method is presented to detect on-line operating batch processes. The wavelet transform (WT) decomposes measured data into a combined time-frequency representation. To have proper regions for FE, a recursive binary partitioning method for classification and regression trees (CART)23 is applied to breaking the time-scale representation into a set of nonoverlapping range blocks with a hierarchical tree structure. Then, by FE, a small set of good local features is extracted from the original timescale information. As for the interactive multivariable batch process, the measured profiles preprocessed by MPCA are decomposed into the score variables in the lower dimensional space. The principal component scores then are modeled by WT and FE. For on-line batch process monitoring, the control limits calculated from the statistics are also developed. The paper is structured as follows. The next section prepares the groundwork for the proposed method by presenting a brief

10.1021/ie050856i CCC: $33.50 © 2006 American Chemical Society Published on Web 05/02/2006

Ind. Eng. Chem. Res., Vol. 45, No. 11, 2006 3865

Figure 3. SWT of a measured profile (a) from the fine level to the coarse level (b-e).

Figure 1. Wavelet decomposition (a) with DWT and (b) with SWT.

Figure 4. Time-scale resolution of SWT of the measured profile.

of the proposed method are developed in section 3. Its monitoring statistics are also established to build up the confidence limit at each sampling point in order to monitor the on-line operating profiles. In section 4, the effectiveness of the proposed method is demonstrated through two sets of benchmark data, including a DuPont industrial batch polymerization reactor and fed-batch penicillin production. The examples investigate the performance of the proposed method and make a comparison with conventional algorithms. Finally, concluding remarks are made.

Figure 2. DWT of a measured profile (a) from the fine level to the coarse level (b-e).

overview: (1) of wavelet as a tool for multiresolution analysis, (2) of CART for classifying the clustering and persistence of the wavelet coefficient distribution in the wavelet domain, (3) of FE as a compression tool for extracting the similarity from the cluster regions, and (4) of MPCA for decomposing the process variables. The approaches for monitoring batch units

2. Background Techniques Wavelet Transform (WT). The basic idea of WT is to decompose a time series as a weighted sum of shifted and scaled versions of the wavelets that are suited for capturing the local behavior of nonstationary series such as sharp changes with different frequency characteristics at the same time intervals. The wavelet coefficients are obtained by computing the correlation between the scaled and time shifted version of the wavelets and the analyzed part of the series. The vector of coefficients at scale V is represented as

xVa ) H0xV-1 a xVd ) H1xV-1 a

3866

Ind. Eng. Chem. Res., Vol. 45, No. 11, 2006

SWT, the main drawbacks of DTW are its stationarity24,25 and lack of accuracy at higher scales where fewer wavelet coefficients are available. Figures 2 and 3 illustrate the effect of the discretized wavelet of the performance of DWT and SWT. The same measured profile with the measured noise (Figures 2a and 3a) is decomposed by DWT and SWT. The features from the fine scale to the coarse scale extracted by DWT and SWT are shown in Figures 2b-e and 3b-e, respectively. The SWT model has better quality to capture the change of the process behavior at the coarse scale (V ) 4) when compared with DWT. At the coarse scale, DWT postpones the change to be found after the change takes places. Besides, the significant fractuations of signal captured DWT occurr at the lower scales. On the contrary, in SWT multiresolution plots, the sketch change is captured by large wavelet coefficients at the higher-scaled resolution, and the sharp change, at the lower scaled one. This is also particularly important for automatic control systems since DWT is not susceptible to faults that cause an unacceptable deterioration of the performance or even lead to dangerous situations. In this research paper, SWT is selected to perform the timefrequency analysis. After L decompositions, the transformation of x(K × 1)can be represented by

w ) WAx ) [H11 H21 ‚‚‚ HL1 HL0 ]Tx ) [(x1d)T (x2d)T ‚‚‚ (xLd )T (xLa )T ]T

(2)

where WA is the transformation matrix of SWT containing the filter coefficients at different scales. Hi1 and Hi0 are expressed in terms of successive filters of SWT, H0 and H1. Thus, w is the decomposition of x by applying the wavelet transform. Similarly, the reconstructed signal xˆ (K × 1) is xˆ ) WSw ) )

Figure 5. CART procedure whose parent node is divided into two child nodes at the split point ξ.

where x0a ) x and x is the collections of the measured signals at K equally spaced time points. H0 and H1 are the orthonormal wavelet transform matrices for low-pass and high-pass filters, respectively. They are gotten from a sequence of linear filtering operations. The terms xVd and xVa are the projections on the highpass and the low-pass components, respectively, at scale V. The decomposition process is iterated by downsampling such that the approximations are successively decomposed. This analysis is referred to as discrete wavelet transform (DTW) (Figure 1a). The size of each rectangle in the figure is dictated by the particular scale (V) and the fixed interval of the translation (k). Going down, the figures would correspond to higher scales. Stationary (or redundant) wavelet transform (SWT) is derived in a similar fashion as DWT is. The high- and low-pass filters are applied to the data at each scale to produce two sequences at the next scale. Instead of downsampling, an upsampling procedure is carried out before performing filter convolution at each scale (Figure 1b). Therefore, it leads to a redundant signal decomposition that has a better characterization of the signal evolution in the time domain for statistical analysis, especially for recognizing noises.24 It is capable of accurate tracking of the temporal evolution of the signal with the same precision at all the scales, because each subband of SWT contains the same quantity of the coefficients as the input signal. Compared with

[G11 G21 ‚‚‚ GL1 GL0 ]T [(x1d)T (x2d)T ‚‚‚ (xLd )T (xLa )T]T G11x1d + G21x2d + ‚‚‚ + GL1 xLd + GL0 xLa

where WS is the transformation matrix for reconstruction containing the synthesis filter coefficients at different scales. Gi1 (i ) 1,2, ..., L) and GL0 are expressed in terms of successive reconstruction filters of SWT. Classification and Regression Tree (CART). Basically, wavelet decomposition only transfers the data from the time domain into the time-frequency domain. All the information contents in the data remain unchanged, but there are clustering and persistence properties of the wavelet transform.15 This means that the adjacent coefficients (shown in Figure 4) are also very likely to be large/small when a particular wavelet coefficient is large/small.26,27 To get the proper size of ranges, the local dependency in the time-frequency domain can be explored if the split-selection method is applied. The classification and regression tree (CART) is used here. The tree is constructed by a divide-and-conquer strategy whose attribute space (called a parent node) is partitioned into a set of nonoverlapping regions (called two child nodes). The process can be repeated by treating each child node as a parent. Here, we describe how CART is used to solve the classification of the regions in the SWT domain. To classify the regions of the wavelet information in the time-scale domain, the range of w ) {w(k,ν)} which is computed by eq 2 is first divided into several level intervals. For the simple representation, the region of each level (r(c)) is identified by

{r(c)|min{w} + (c - 1)cd e r(c) e min{w} + ccd, c ) 1, 2, ..., C} (4)

Ind. Eng. Chem. Res., Vol. 45, No. 11, 2006 3867

where the equal width of intervals (cd) are set, cd ) {[max{w} - min{w}]}/{C}, and C is the number of levels. For any region r extracted from the parent region P (Figure 5), the energy probability estimates (p(r)) for block r are defined as

p(r) )



w2(k,V)



w2(k,V)

(k,V)∈r

(5)

(k,V)∈P

w2(k,V) represents the squared wavelet coefficients over the covered region. The energy probability estimates (p(c|r)) for block r assigned to the level c is defined as



p(c|r) )

w (k,V) c ) 1, 2, ..., C

(6)

2

w (k,V)

(k,V)∈r

A cost function that measures the block (r) impurity is

Q(r) ) Q(p(1|r), p(2|r), ..., p(C|r))

(7)

The cost function should have a maximum value when all the levels are mixed together and have a minimum value when the block contains a pure level. The best-known impurity entropy function (Q) is used, C

Q(r) ) -

[ ] [ ][ ] [ ]

an bn 0 k pn k An V ) c n dn 0 V + qn wk,V on 0 0 sn wk,V

2

(k,V)∈r(c)



Figure 6 illustrates how to encode the wavelet domain by applying the FE procedure. From the range blocks obtained by the classification tree, the SWT plane is partitioned into a set of nonoverlapping range blocks rn, n ) 1, 2, ..., NR, shown at the bottom of Figure 6. Each block rn is the size Mn1 × Mn2, a portion of the SWT plane (D) as rn ∈ BR(D). Each domain block, dl ∈ BD(D), l ) 1, 2, ..., ND (at the top of Figure 6), is the size Ml1 × Ml2, such that the result of applying affine transformation An over dl is the most similar one to the range block, rn. The affine transformation between a range block (rn) and a domain block (dl) is defined as

∑p(c|λ) log p(c|r) c)1

(8)

where sn and on are the contract scaling and the offset, respectively. (pn,qn) is the position of the domain block. The terms an, bn, cn, and dn constitute the isometry transform. The domain block (dl) should be slightly adjusted through geometric and intensity transformation to match the size of the range block (rn). The contractive domain block (dcl ) (shown in Figure 7) is defined as

wdlc ) Rwdl

(11)

where

wd l ) [wdl(1,1) ‚‚‚ wdl(Ml1,1) ‚‚‚ wdl(1,Ml2) ‚‚‚ wdl(Ml1,Ml2) ]T wdlc )

Then, the block r is split into two daughter blocks rL and rR at a split point ξ (Figure 5). The decrease in impurity caused by the split is

∆Q(ξ,r) ) Q(r) - Q(rL)p(rL) - Q(rR)p(rR)

(10)

(9)

where p(rL) is the probability of the case in rL that goes to the left and p(rR) is the probability of the case in rR that goes to the right as shown in Figure 5. The split point may be located at the time axis or the frequency axis, but it is selected to maximize the decrease in the node impurity. By repeating this split process, the local regions governed by each the corresponding blocks are shrunk and the tree is grown. To avoid overfitting and overspecializing problems, cross-validation28 is performed to get a reliably high degree of accuracy and the optimum-sized tree. Fractal Encoding (FE). Fractal is a branch of mathematics. It extends traditional mathematical geometry to describe the nature of geometry. One important feature of fractal is that, by applying a simple rule over and over again, visually complexed data can be produced. Fractal image encoding, which is the inverse problem, is to find out the simple rules that produce the visually complexed image. The basic procedures do this as follows: (i) Partition the image into range blocks (rn, n ) 1,2, ..., NR). (ii) Define domain pools (dl, l ) 1,2, ..., ND), including size and shift amount. (iii) For each range (rn), out of all possible domain blocks (dl, l ) 1,2, ..., ND), for each possible rotation of dl, (a) Shrink dl to the same size as rn and (b) Find out the best set of the fractal parameters for dl that looks like the range block rn by the affine transformation. where rn and dl are range and domain blocks, respectively. More detailed information about fractal image compression can be found in the work of Jacquin.18

[wdlc(1,1) ‚‚‚ wdlc(Mn1,1) ‚‚‚ wdlc(1,Mn2) ‚‚‚ wdlc(Mn1,Mn2) ]T (12) R is a reconfiguration matrix with Mn1Mn2 × Ml1Ml2,

Rij is a weighting factor which is based on the overlap ratios of points in dl mapped from dl to dcl ,

{

1 0.5 0.5 R) 0.5 0.5 0.25

(i,j) ) (1,1), (1,Ml2), (Ml1,1), (Ml1,Ml2) i ) 1, j ) 2, ..., Ml2 - 1 i ) 2, ..., Ml2 - 1, j ) 1 i ) 2, ..., Ml1 - 1, j ) Ml2 i ) Ml1, j ) 2, ..., Ml2 - 1 i * Ml1, j * 1, Ml2

(14)

Because the block is a rectangle, 90° of four rotations of the domain blocks (shown in Figure 6) is applied to map one block to another. To select the most similar domain block for each range block rn, all the overlapped domain blocks dl ∈ BD(D), l ) 1, 2, ..., ND are searched to find a domain block (dl) which minimizes the measure of block similarity,

3868

Ind. Eng. Chem. Res., Vol. 45, No. 11, 2006

Figure 6. Representation of the fractal encoding procedure. A domain block is transformed via four types of spatial transformations and is compared to a matching range block.

Figure 7. Contracting mapping procedure.

Ind. Eng. Chem. Res., Vol. 45, No. 11, 2006 3869



δn(rn,An(dl)) )

(k,V)∈rn,dcl

(wrn(k,V) - w ˆ dlc(k,V))2

(15)

where wrn(k,V) denotes the wavelet coefficients in the block, rn, and wˆ dlc(k,V), the transformation of wdlc(k,V) in the block dcl through eq 10. With the choice of dl and the corresponding rotation, the parameters sn and on of An can be directly computed by the least-squares method,

[∑

arg min δn )arg min sn,on sn,on

((snwdlc(k,V) + on)-wrn(k,V))2

(k,V)

]

(16)

where

sn ) Nrn(

∑k ∑V wr (k,V)wd (k,V)-(∑k ∑V wr (k,V))(∑k ∑V wd (k,V))) c

n

l

Nrn(

on )

l

time. The MPCA model becomes

∑k ∑V wd (k,V)2) - (∑k ∑V wd (k,V))2

1

c

c

l

[ ∑∑

Nrn

k

V

Figure 8. Two unfolding structures for MPCA.

c

n

l

wrn(k,V) - sn

∑k ∑V wd (k,V) c

l

R

]

(17)

X)

tapa + E ) TPT + E ∑ a)1

(19)

(18)

Nrn is the number of the coefficients in rn. The metric that determines the distance between the range and the domain blocks is used as a measure of block similarity. The above procedure is repeated for all range blocks. Then, the origNR w ˆ rn by approxinal w is a fixed point for the map w ˆ ) ∪n)1 imation. This means that the time-scale plane is decomposed into several blocks and the entire plane can be recovered by the union of the set of range blocks. This FE procedure represents “things” by breaking them down into many interrelated block pieces, similar to the pieces of a jigsaw puzzle. The union procedure puts the pieces back together to retrieve the original plane. After the appropriate affine transform is conducted for all rn, n ) 1, 2, ..., NR, in the range pool, the eight parameters of An (an, bn, cn, dn, pn, qn, on, and sn) n ) 1, 2, ..., NR are stored, which are the compression codes. Thus, the compression ratio of the time-scale domain with L × K cells is {8NR}/{(L × K)}. Like the above single variable procedure, although it can be directly applied to the multivariable batch profiles, it is not always an easy task due to its complex and interactive nature among the profile variables. MPCA will be used to reduce the measured variables and to extract the independent principal components which could capture most of the characteristics of the system pattern in multivariable process behavior. Before developing the proposed WTFE-based batch monitoring, the relevant notations of MPCA are briefly introduced first. Multiway Principal Component Analysis (MPCA). The experimental data matrix X with I × J × K can be constructed in the form of the three-dimensional array as shown in Figure 8, where j ) 1, 2, ..., J variables are measured at times k ) 1, 2, ..., K intervals throughout one batch. Similar data will be run at several numbers of batches i ) 1, 2, ..., I. In MPCA, this three-way matrix must be unfolded to obtain a two-way matrix on which PCA (principal component analysis) can be performed. Figure 8 shows two approaches to unfold the three-way matrix. In this paper, the variablewise unfolding method is adopted (structure II of Figure 8), because it can be used to obtain information on the variability among the batch variables, and the computed score vectors t contain the process feature at each

where R is the number of principal components, E is a residual matrix, T ) [t1 t2 ‚‚‚ tR] expresses the score matrix with KI × R, and P ) [p1 p2 ‚‚‚ pR] is a loading matrix with J × R. Several examples have been used to analyze batch data and detail the procedures for monitoring batch processes.1,29-32 3. MPCA-WTFE Formulation First, the data matrix X (I × J × K) is unfolded to the twodimensional matrix (I × JK). The unfolded data matrix is mean centered and scaled to attenuate the major nonlinear behavior of the process. Then, the variablewise unfolding method is adopted, where each row of the data matrix of structure I for one batch run is extracted to form a two-dimensional matrix with time × variables,

[

i xi1,1 xi1,2 ‚‚‚ x1,J i i i x x ‚‚‚ x2,J Xi ) 2,1 2,2 l i i i xK,2 xK,1 ‚‚‚ xK,J

]

(20)

Each block matrix K × J is put from top to the bottom, starting with the block corresponding to the first batch run (Figure 9):

[]

X1 2 X) X l XI

(21)

To eliminate the interactions between process variables in the latent space, PCA is applied to decomposing array X. The score matrix is

[]

T1 2 T) T l TI and

(22)

3870

Ind. Eng. Chem. Res., Vol. 45, No. 11, 2006

[ ]

Figure 9. New approach of MPCA-WTFE for on-line batch monitoring. i ti1,1 ti1,2 ‚‚‚ t1,R i ti ti ‚‚‚ t2,R Ti )[ti1 ti2 ‚‚‚ tiR ]) 2,1 2,2 , i ) 1, 2, ..., I (23) l i i i tK,1 tK,2 ‚‚‚ tK,R

The above procedure enables us to easily model WTFE for each component variable. CART-Based Classification in WT Space. By applying WT to decomposing the score matrix Ta (eq 2), the wavelet coefficients for each batch are represented as

Next, as shown in Figure 9, each component in the batch matrix Ti is extracted over all batches to form a component matrix Ta(K × I),

wia ) WAtia, i ) 1, 2, ..., I

Ta ) t1a t2a ‚‚‚ tIa,

a ) 1, 2, ..., R

(24)

(25)

Figure 9 shows that I different time-frequency planes are generated for each score matrix. Due to the presence of unmeasured noise, the buffered wavelet coefficients will be located in a wide range of frequency. Instead of WTFE based on one single plane

Ind. Eng. Chem. Res., Vol. 45, No. 11, 2006 3871

Figure 10. CART procedure whose parent node of each batch is divided into two child nodes at the common split point ξa.

only, the CART procedure using tying across all simultaneous time-frequency plane is applied (Figure 10). This will avoid the WTFE model over-fitting and achieve robust training results. This means, at the component a, the block ria for each batch (i) i and ria,R at a same split is split into two daughter blocks ra,L point ξa. The average decrease in impurity caused by the split at block (ria) is defined as

∆Q(ξa,ra) )

1

the common feature by tying across all simultaneous batch runs is extracted using FE. Hence, for each range block ra,n, n ) 1, 2, ..., Na,R at the component a, one must find a transformed domain block da,l and the rotation matrix (aa,n, ba,n, ca,n, and da,n) that has the smallest average minimum δia,n at the batch i, i ) min δa,n

sia,n,oia,n

i wdi ∑k ∑V ((sa,n

c a,l

i (k,V) + oa,n ) - wri a,n(k,V))2

(27)

I

i i i i )p(ra,L )-Q(ra,R )p(ra,R )] ∑[Q(ria) -Q(ra,L I i)1

(26)

With this method, all possible splits in the time and scale axis for all batches are examined to find the split producing the largest improvement in the impurity. The terminal node is reached if there is no significant decrease in impurity when the validated data is applied. Now, a set of nonoverlapping range blocks (ra,n, n ) 1, 2, ..., Na,R) for the component a is obtained. WTFE-Based Control Charts. (i) Off-Line WTFE-Based MPCA Model. After partitioning the regions for each direction,

The FE algorithm is repeatedly applied onto each domain block until a domain block has found its best-matched range for each component. The sets of the contract scaling and the offset are defined as i S ) {S1 S2 ‚‚‚ SI }, Si ) {sa,n }(n)1,2,..,Na,R;a)1,2,..,R) i O ) {O1 O2 ‚‚‚ OI }, Oi ) {oa,n }(n)1,2,..,Na,R;a)1,2,..,R) (28)

Now the union of the set of range blocks which covers the entire region of batch i in the direction a is

3872

Ind. Eng. Chem. Res., Vol. 45, No. 11, 2006

wia ) w ˆ ia + eia

(29)

where eia is the residual vector. w ˆ ia ) {sia,nwdi a,lc(k,V) + oia,n} is the predicated value from WTFE. Now collect w ˆ ia and eia i ) 1, 2, ..., R for all components, the measured data matrix of batch i in the time domain can be represented

X ˆi ) T ˆ i,WTFEPT is the extracted information from WTFE to represent the system variation, and Vi, the unexplained one. For simple expression of extracting system information, only the explained part of the system is shown in Figure 9. (ii) On-Line WTFE-Based MPCA Model. To implement the on-line WTFE-based MPCA model, the current observations have to be decomposed by WTFE and MPCA on-line. At each sampling point k, the wavelet transformed matrix wa(k) of the projections of the current observations x(k) at the direction a is given by

wa(k) ) WAta(k)

(31)

where

Figure 11. Local WTFE range blocks covered at any sampling point for on-line monitoring.

associated with the latest on-line score at time interval k from the process. The residuals is R

v(k) ) x(k) On the basis of the range blocks defined in CART, at the current time (k) the local blocks (ra,z, z ) 1, 2, ..., Na,Z) covered in the time-scale plan are found (Figure 11). Thus, at the current time, only the incomplete data is required to fill in the covered local blocks (shown in the dashed region of Figure 11). Here, the average values of the past operation are used to fill the incomplete data in the covered local blocks. Then, FE is repeatedly applied at each block (ra,z, z ) 1, 2, ..., Na,Z) of the component a to find a transformed domain block da,l and the rotation matrix (aa,rz, ba,rz, ca,rz, da,rz) that has the smallest average minimum δa,rz,

δa,ra,z ) i mini sa,n,oa,n

∑k ∑V ((sa,r

wda,l(k,V) + oa,ra,z) - wa,ra,z(k,V))2

a,z

(33) Na,Z ∪z)1

Compose the set of the blocks w ˆa ) w ˆ a,ra,z, where w ˆ a(k) ) T T [w ˆ Ta,1 ‚‚‚ w ˆ a,V ‚‚‚ w ˆ a,L+1 ]T and w ˆ a,V ) [w ˆ a(1,ν), ..., w ˆ a(k,ν) 0‚‚‚ 0]T (where the number of zeros is K - k). The term w ˆ a,V indicates the reconstructed w ˆ a from time 1 to k at scale V. Then, the projection of the union of the local blocks is back (w ˆ a(k)) to the time space

ˆtWTFE ) WSw ˆ a(k) ) a (1) ‚‚‚ ˆt WTFE (k) ‚‚‚ ˆt WTFE (K) ]T (34) [tˆWTFE a a a To check if the current operating process is normal or in-control, the statistics used for the monitoring procedure should be properly established. The statistics Is2(k), at sample k for a measure of the systematic part of the process variation to the model, is the sum of the squared independent scores, R

I2(k) )

(tˆWTFE (k))2 ∑ a a)1

(35)

ˆt WTFE (k)pa ∑ a a)1

(36)

A similar expression can be obtained for the squared prediction error at time interval k,

Q(k) ) v(k) v(k)T

(37)

In contrast to MPCA monitoring, the latent variables and the residual errors in the proposed method do not follow a Gaussian distribution; hence, the upper control limits and statistics cannot be determined directly from a particular approximation distribution. The kernel density estimation34 is used to determine the confidence limits. To sum up, the proposed batch monitoring consists of modeling and on-line two- step monitoring, as listed below. Modeling Procedures. (1) Collect the historical batch data sets X(I × J × K) indicative of normal operations. The data should cover the range of the batch operating patterns and the conditions that yield desired product quality. (2) Unfold the data in the variablewise X(IK × J). (3) Extract the score matrix T(IK × R) (eq 22) and the loading matrix P(J × R) by performing PCA on X(IK × J) to eliminate the interactions among the variables. (4) Separate the score matrix T(IK × R) into R component score matrices with Ta(K × I), a ) 1, 2, ..., R (eq 24). The number of the retained components can be determined using cross-validation33 to weed out relatively unimportant directions and reduce the system dimensions. (5) Decompose each column of Ta, a ) 1, 2, ..., R with the discrete wavelet transform to form the wavelet coefficient matrix {w1a w2a ‚‚‚ wIa} (eq 25). The scales should be selected to provide maximum separation between the stochastic and the deterministic components of a signal. If the scales are too small, then the last scaled signal will have a significant amount of noise, but if the scales are too large, the coefficient matrix with

Ind. Eng. Chem. Res., Vol. 45, No. 11, 2006 3873

many redundant coefficients will waste the computation cost. The scales are determined by cross-validation here. (6) Train WTFE models by applying the tying structure (eq 26) to extract the range blocks for each component. (7) Set k ) 1. (8) Build up batch data until time k. Calculate the score matrix (tia, a ) 1, ..., R, i ) 1, ..., I) by projecting x(k) onto the loading matrix (eq 32). Decompose tia by the discrete wavelet transform (eq 31) and apply its wavelet coefficients to the trained WTFE models (eq 33). , a ) 1, ..., R, i ) 1, ..., I by (9) Restore the predicted ˆti,WTFE a reconstructing wˆ ia (eq 34). (10) Compute the score (I2) and the residual (Q) at time point (k) (eqs 35 and 37) to set up the control limits of I2(k) and Q(k). (11) Keep moving the next time point, k ) k + 1 and go to step 8 until the end time point of the batch run. On-Line Monitoring Procedures. (1) Set k ) 1. (2) Record the operating data until the sampling time k of a new batch run. (3) Calculate the score ta(k), a ) 1, ..., R by projecting the current measurements x(k) onto the loading matrix, P (eq 32). (4) Decompose ta(k), a ) 1, ..., R by the discrete wavelet transform into the wavelet coefficient vector, wa(k) (eq 31). (5) Compute fractal parameters (sa,ra,z and oa,ra,z) using eq 33 based on the pretrained block (ra,z, z ) 1, 2, ..., Na,Z) covered at the current time point. (6) Predict w ˆ a(k) using the WTFE models and restore the (k) predicted wavelet coefficients to the predicted scores, ˆtWTFE a (eq 34). (7) Compute I2(k) and Q(k) (eqs 35 and 37). (8) Monitor whether I2(k) or Q(k) is above its control limit obtained from the modeling procedure. If an abnormality is detected, further analyze what has caused the abnormal situation. Otherwise, keep monitoring the next new time point (k ) k + 1) and go to step 2. Note that the modeling time is not only from MPCA but also from the fractal encoding. The fractal encoding is really timeconsuming during its encoding process. However, the time required by the computation model is for off-line only. When on-line monitoring is conducted, its computation is straightforward without significantly increasing the computation time. 4. Illustration Examples The proposed algorithm is applied to two benchmark data sets: a DuPont industrial batch polymerization reactor and fedbatch penicillin fermentation. Each example will be discussed in the subsections as follows. Example 1: DuPont Benchmark Data. The benchmark data for an industrial batch polymerization is used.1 The batch reactor consists of two stages. Approximately 2 h are needed to finish one batch run. The critical property measurements are usually taken 12 h or more after finishing each batch run. It is important for this type of problem to develop the on-line batch monitoring. For more information of the system, readers can get a detailed coverage from the above reference. Training data comprising 36 successful batches of data sets are used for constructing a monitoring model. Each batch run has duration of 100 time intervals. Ten measurement variables, including temperatures, pressures, and flowrate, are collected. To compare the performance of the different monitoring methods, the various methods are applied to two additional batches that are not selected from the training data sets. One

Figure 12. Control charts for on-line monitoring in Example 1: (a) MPCA with structure I, (b) WT-MPCA, and (c) WTFE. Each chart contains 95% (dash line) and 99% (solid line) control limits. The solid line with positive signs represents the abnormal batch; the solid circles represent the normal one.

with normal conditions and the other with a product that is out of the specification are tested. The abnormal batch is caused by drift operation starting halfway through the batch operation. On the basis of the proposed WTFE model, the data matrix (with structure II) is decomposed with MPCA and via crossvalidation. It is found that four principal components are needed

3874

Ind. Eng. Chem. Res., Vol. 45, No. 11, 2006

Figure 13. Trajectories of the measured variables 4, 6, 9, and 10 from a typical normal batch run (blue line) and an abnormal one (red line). These variables at each time have been scaled with the mean and variance obtained from the normal batches.

Figure 14. Control charts for on-line monitoring in example 2 of the normal batch: (a) MPCA with structure I, (b) MPCA with structure II, (c) WT-MPCA, and (d) WTFE. Each chart contains 95% (dash line) and 99% (solid line) control limits.

Ind. Eng. Chem. Res., Vol. 45, No. 11, 2006 3875

Figure 15. Control charts for on-line monitoring in example 2 of the abnormal batch: (a) MPCA with structure I, (b) MPCA with structure II, (c) WTMPCA, and (d) WTFE. Each chart contains 95% (dash line) and 99% (solid line) control limits.

to describe the data set. The SPE and Is2 charts at every time point obtained from WTFE monitoring of the normal and the abnormal batches are shown in Figure 12c. The control limits representing 95% and 99% confidence limits are also shown in Figure 12. It is evident from these charts that WTFE can capture the dominant behavior without getting wrong conclusions. In this abnormal batch, it immediately drifts away from the normal operating region at the 57th time point. This indicates that a special variation occurs from the 57th time point of the polymerization. Again, applying MPCA and WT-MPCA to the same test batches gives the results presented in Figure 12a and b, where MPCA with structure II (shown in Figure 8) is applied. WT-MPCA is a similar version of MSPCA,10 but it is modified and applied to batch process monitoring because the unmeasured future observations should be filled. In WT-MPCA, the batch data is first decomposed into the details at different scales by DWT. Then, the decomposed blocks are reconstructed. MPCA is applied to each block and the corresponding control charts are built up for each scale. SPE and T2 can also be computed based on each scale contribution. It is obvious that both MPCA and WT-MPCA have the same capability to catch the relationship between the process variables and to distinguish the normal from the abnormal condition. In this case, the performances of the three methods are all good. This is because some of the measurement variables in this abnormal batch significantly drift

away from the normal one around the 57th-65th time point (Figure 13), particularly measurement variables 4, 6, 9, and 10. Example 2: Fed-Batch Penicillin Fermentation. Data from fed-batch penicillin fermentation35 are used here to demonstrate the application of the proposed method. A total of 50 batches based on normal operation is used as the basis analysis. The duration of each batch is 400 h, comprising a preculture stage of about 45 h and a fed-batch stage of about 355 h. The sampling interval is 0.78 h. In the normal operation condition, small variations are added to the simulation input data to mimic process variation under the normal operating condition. Also, measurement noise is added to each of the monitoring variables. Two additional batches, comprising one normal batch and one abnormal batch, are generated for testing. In the abnormal batch, assume that the substrate feed rate is linearly decreased from 0.042 to 0.032 and from the 100th hour to the end of batch. The simulation condition and relevant parameters are the same as that of Birol et al.35 Four different models, MPCA with structure I and II, WTMPCA, and WTFE, are used for monitoring of the process with the normal batch to make a comparison. The number of components selected via cross-validation for MPCA with structure I, MPCA with structure II, WT-MPCA, and WTFE are 4, 4, 2, and 4, respectively. The control charts of the four models for the normal batch are shown in Figure 14. WTFE

3876

Ind. Eng. Chem. Res., Vol. 45, No. 11, 2006

Figure 16. Filtered scores of the normal (solid line) and the abnormal (dashed line) batches obtained from (a) MPCA with structure II and (b) WTFE.

has no apparent outliners during the whole batch run (Figure 14d). However, there are some false detections in WT-MPCA (Figure 14c). For the abnormal batch, the monitoring outcomes of four models are shown in Figure 15. As shown in Figure 15a, the SPE or T2 monitoring charts of MPCA with the first structure do not detect this fault during the whole batch run. Due to the future unmeasurements to be filled up with the predicted data from the current point until the end of the batch, the anticipated observations might distort the model information and cause the false detection. Although MPCA with structure II has no assumption about the future measurements, the fault condition still cannot be detected early until the end of batch run (Figure 15b). However, the detection ability of MPCA has been significantly improved after applying WTFE to move the noise variation of the batch profile (Figure 15d). On the basis of the SPE and Is2 plots shown in Figure 15d, Is2 has increased

remarkably and fallen outside of the 95% confidence limit after the 200th hour, because of the local characteristics of FE with partial filling of the unknown measurements. Figure 16 shows the score plots of the normal and abnormal cases obtained by MPCA with structure II and WTFE. After filtering the noise signal, the filtered outputs by the WTFE denoising operation can significantly distinguish the essential change of the process behavior of the normal and abnormal batches. It indicates that WTFE outperforms over the MPCA based on the time domain only. Compared with WTFE, WT-MPCA (Figure 15c) has an ability to clearly indicate the occurred fault event. However, from the time point of the fault detection, it indicates that WTMPCA is not better than WTFE because WT-MPCA is not good enough to eliminate the false variables under the noise environment. Although WT-MPCA is based on the time and frequency

Ind. Eng. Chem. Res., Vol. 45, No. 11, 2006 3877

Figure 17. Multiscale detection of WT-MPCA in example 2 of the abnormal batch.

domain, the scale contribution to SPE and T2 statistics for each scale can be diagnosed (Figure 17). Figure 17 indicates that the scores still have high fluctuations because the interscale dependencies of wavelet coefficients are not considered. At the highest approximation level, the fault condition can clearly be detected, but at other detail levels, several fault alarms occur. Furthermore, due to the local characteristics of FE, the score variables obtained from WTFE in each range block can be constructed. Figure 18 illustrates that t1 and t4 scores of the normal and the abnormal cases are represented in each range block. In Figure 18a, the abnormal fault can be detected at V ) 9, 10 and k g 130. After k g 130, the abnormal features are also obvious at the lower scales (the higher frequencies). The decomposed scores have a good ability to pin down the fault sources at the time and the scale level. In Figure 16b, the scores t4 of the normal and the abnormal batches cannot easily be distinguished due to the component obtained from the information which may contain the higher frequency disturbance, but in Figure 18b, the abnormal fault can be screened from the normal one at some blocks after k g 250. Thus, the proposed method can detect the abnormal condition of this system earlier than the MPCA and WT-MPCA methods. It has a good ability to capture the actual process relationship between the variables under the noise environment, to eliminate the false variables, and to clearly indicate the occurred event. 5. Conclusion The batch unit operation, as with any in the manufacturing industry today, is very concerned with quality. It is desirable to have an automated monitoring and detection system to enhance the consistent and high product quality. It also provides the improved productivity to meet customer demands and to reduce the costs associated with off-quality. Although MPCA has been widely applied, the only time domain based detection and diagnosis cannot capture the local behavior with different frequencies. In the past, the wavelet representation was used to provide a multiresoultion and multiscale expression of nonstationary measurements with localization in the time and fre-

quency domain. However, even if the features existed in the hybrid time and spectral information, the presence of server noise caused buffered wavelet coefficients in a wide range of frequency, particularly at the higher scales. This makes MSPCA diagnostic evaluation difficult. Due to the hierarchical wavelet tree structure, any coefficients composed by the wavelet subtree at different resolutions and orientations may exhibit similarity to the coefficients of another wavelet subtree at a different resolution. FE is developed in this paper to extract the features of the complex wavelet representation and to reduce redundant information with noise content. The combined WT and FE approach to capture local correlation within the variable is developed in this article to improve MPCA-based on-line operating batch processes. First, the measured variables of the batch runs are divided into independent components in the loading space. Then, the WTFE approach is applied to the score variables. The approach consists of three major stages. (1) Enriching information: to transform the time series score data in the time domain into the multiscale data in the time-frequency domain using SWT. (2) Separating regions: to partition the information in the time-scale domain into a set of nonoverlapping regions using CART. (3) Extracting features: to encode the time-frequency contents using FE. At the stage of enriching information, the score variables are transformed into the redundant information contents at different scales by SWT. SWT offers additional precision at higher scales, and it is capable of tracking the temporal evolution of the operating profile which is not present in DWT. At the extracting feature stage, the FE scheme is used to exploit the correlation between the wavelet coefficients within adjacent range blocks as well as among other nonadjacent range blocks. To select the proper region blocks without conducting a blind search in FE, at the separating region stage, the CART method is applied. As it can cut the block size and reduce a set of nonoverlapping range blocks significantly based on the similarity of the adjacent blocks, it requires less computation in FE. The encoding of the time-frequency representation is straightforward and simple while the process features are separated and encoded. Also, tying

3878

Ind. Eng. Chem. Res., Vol. 45, No. 11, 2006

Figure 18. Score representations of the normal (solid line) and the abnormal (dashed line) batches in multiscale and multitime blocks (a) t1 and (b) t4.

among the batch runs is applied to extracting disturbance variation and creating a robust statistical model. To build up the control limits, while only requiring nominal (no-fault) conditions for model training, statistics at each time point are calculated based on the normal operating data. Anomalous detection can be achieved by extracting the source of variability with respect to the normal operation at different scales. This allows evaluating the current process. The results also show that the proposed method has better performance than the conventional MPCA and WT-MPCA methods which are used in the time domain and in the wavelet domain, respectively. The application of the proposed method can improve the

accuracy of model expression and, therefore, easily identify the fault event for monitoring the operating process. Acknowledgment This work is partly sponsored by National Science Council, R.O.C., and by the Ministry of Economic Affairs, R.O.C. We are indebted to Dr. Andy Y. Tsen from Advanced Control & Systems Inc. for giving us several valuable comments. Literature Cited (1) Nomikos, P.; MacGregor, J. F. Multivariate SPC charts for Monitoring Batch Processes Technometrics 1995, 37, 41.

Ind. Eng. Chem. Res., Vol. 45, No. 11, 2006 3879 (2) Chen, J.; Liu, K.-C. On-line Batch Process Monitoring Using Dynamic PCA and Dynamic PLS Models. Chem. Eng. Sci. 2002, 57, 63. (3) Lennox, B.; Montague, G. A.; Hiden, H. G.; Kornfeld, G.; Goulding, P. R. Process Monitoring of an Industrial Fed-batch Fermentation Biotechnol. Bioeng. 2001, 74, 125. (4) Louwerse, D. J.; Smilde, A. K. Multivariate Statistical Process Control of Batch Processes Based on Three-way Models. Chem. Eng. Sci. 2000, 55, 1225. (5) Tates, A. A.; Louwerse, D. J.; Smilde, A. K.; Koot, G. L. M.; Berndt, H. Monitoring a PVC Batch Process with Multivariate Statistical Process Control Charts. Ind. Eng. Chem. Res. 1999, 38, 4769. (6) Qin, S. J. Statistical process monitoring: basics and beyond. J. Chemom. 2003, 17, 480. (7) Cinar, A.; Parulekar, S. J.; Undey C.; Birol, G. Batch Fermentation: Modeling, Monitoring and Control; Marcel Dekker: New York, 2003. (8) Bakshi, B. R. Multiscale PCA with Application to Multivariate Statistical Process Monitoring. AIChE J. 1998, 44, 1596. (9) Abntonini, N.; Barlaud, M. P.; Daubechies, M.; Daubechies, I. Image Coding using Wavelet Transform. IEEE Trans. Image Process. 1992, 1, 205. (10) Yoon, S.; MacGregor, J. F. Principal-component Analysis of Multiscale Data for Process Monitoring and Fault Diagnosis. AIChE J. 2004, 50, 2891. (11) Aradhye, H. B.; Bakshi, B. R.; Strauss, R. A.; Davis, J. F. Multiscale SPC Using WaveletssTheoretical Analysis and Properties. AIChE J. 2003, 49, 939. (12) Misra, M.; Yue, H.; Qin, S. J.; Ling, C. Multivariate Process Monitoring and Fault Diagnosis by Multi-scale PCA. Comput. Chem. Eng. 2002, 26, 1281. (13) Ganesan, R.; Das, T. K.; Venkataraman, V. Wavelet-based Multiscale Statistical Process Monitoring: A Literature Review. IIE Trans. 2004, 36, 787. (14) Luo, J.; Chen, C. W. Modeling of Subband Coefficients for Clustering-based Adaptive Quantization with Spatial Constraints. J. Visual Commun. Image Represent. 2003, 14, 205. (15) Crouse, M. S.; Nowak, R. D.; Baraniuk, R. G. Wavelet-Based Statistical Signal Processing Using Hidden Markov Models. IEEE Trans. Signal Process. 1998, 46, 886. (16) Mallat, S.; Hwang, W. L. Singularity Detection and Processing with Wavelet. IEEE Trans. Inf. Theory 1992, 38, 617. (17) Jacquin, A. E. Fractal Image Coding: A Review. Proc. IEEE 1993, 81, 1451. (18) Jacquin, A. E. Image Coding Based on a Fractal Theory of Iterated Contractive Image Transformations. IEEE Trans. Image Process. 1992, 1, 18. (19) Espinal, F.; Huntsberger, T.; Jawerth, B. D.; Kubota, T. WavleteBased Fractal Signature Analysis for Automatic Target Recognition. Opt. Eng. 1998, 37, 166.

(20) Perfect, E. Fractal Models for The Fragmentation of Rocks and Soils: A Review. Eng. Geology 1997, 48, 185. (21) Fisher Y. Fractal Compression: Theory and Application to Digital Images. Spring Verlag: New York, 1994. (22) Lazar, M. S.; Bruton, L. T. Fractal Block Coding of Digital Video. IEEE Trans. Circuits Syst. Video Technol. 1994, 4, 297. (23) Quinlan, J. R. Decision Trees and Decision-making. IEEE Trans. SMC 1990, 20, 339. (24) Wang, X. H.; Istepanian, R. S.; Song, Y. H. Microarray Image Enhancement by Denoising Using Stationary Wavelet Transform. IEEE Trans. Nanobiosci. 2003, 2, 184. (25) Ngan, S.-C.; LaConte, S. M.; Hu, X. Temporal Filtering of EventRelated fMRI Data using Cross-Validation. NeuroImage 2000, 11, 797. (26) Xiong, Z.; Ramachandran, K.; Orchard, M. T. Space-Frequency Quantization for Wavelet Image Coding. IEEE Trans. Image Process. 1997, 6, 677. (27) Mallat, S.; Zhong, S. Characterization of Signals from Multiscale Edges. IEEE Trans. Pattern Anal. Machine Intell. 1992, 14, 710. (28) Breiman, L.; Friedman, J. H.; Olshen, R. A.; Stone, C. J. Classification and Regression Trees; Wadsworth, Inc.: Belmount, California, 1984. (29) Nomikos, P.; MacGregor, J. F. Monitoring Batch Processes Using Multiway Principal Component Analysis. AIChE J. 1994, 37, 41. (30) Wold, S.; Geladi, P.; Esbensen, K.; Ohman, J. Multi-way Principal Components and PLS Analysis. J. Chemom. 1987, 1, 41. (31) Lennox, B.; Kipling, K.; Glassey, J.; Montague, G.; Willis, M.; Hiden, H. Automated Production Support for the Bioprocess Industry. Biotechnol. Prog. 2002, 18, 269. (32) van Sprang, E. N. M.; Ramaker, H.-J.; Westerhuis, J. A.; Gurden, S. P.; Smilde, A. K. Critical Evaluation of Approaches for On-line Batch Process Monitoring. Chem. Eng. Sci. 2002, 57, 3979. (33) Wold, S. Cross Validatory Estimation of the Number of Components in Factor and Principal Components Models. Technometrics 1978, 20, 397. (34) Bishop, C. M. Neural Networks for Pattern Recognition; Clarendon Press: New York, 1995. (35) Birol, G.; U ¨ ndey, C.; Cinar, A. A Modular Simulation Package for Fed-batch Fermentation: Penicillin Production. Chem. Eng. Sci. 2002, 26, 1553.

ReceiVed for reView July 22, 2005 ReVised manuscript receiVed November 3, 2005 Accepted February 8, 2006 IE050856I