Ind. Eng. Chem. Res. 2006, 45, 223-235


Fault Detection and Isolation Using Correspondence Analysis†

K. P. Detroja,‡ R. D. Gudi,*,§ S. C. Patwardhan,§ and K. Roy|

Systems and Control Engineering and Department of Chemical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai - 400076, India, and Bhabha Atomic Research Centre, Trombay, Mumbai - 400085, India

In this paper, a new approach to fault detection and diagnosis that is based on correspondence analysis (CA) is proposed. CA is a powerful multivariate technique based on the generalized singular value decomposition. The merits of using CA lie in its ability to depict rows as well as columns as points in the dual lower dimensional vector space. CA has been shown to capture the association between various features and events quite effectively. The key strengths of CA for fault detection and diagnosis are validated on simulated data as well as experimental data obtained from a laboratory-scale setup.

* Author for correspondence. E-mail: [email protected]. Tel.: +91-22-2576-7204. Fax: +91-22-2572-6895.
† A shorter version of this manuscript was presented at the 16th IFAC World Congress, Prague (2005).4
‡ Systems and Control Engineering, Indian Institute of Technology Bombay.
§ Department of Chemical Engineering, Indian Institute of Technology Bombay.
| Bhabha Atomic Research Centre.

1. Introduction

Online process monitoring for fault detection and diagnosis (FDD) is very important for ensuring plant safety and maintaining product quality. The area of FDD has been very active in recent years, and both model-based and process-history-based methods have been proposed1-3 with a fair amount of success. In a typical process plant, hundreds of variables are measured every few seconds. These measurements carry useful signatures about the status of the plant operation. While model-based methods can be used to detect and isolate signals indicating abnormal operation, such quantitative cause-effect models may be difficult to develop from first principles. Historical-data-based methods attempt to extract maximum information from the archived data and require minimal physical knowledge of the plant. Because of the high dimensionality of the data and the correlation among variables, multivariate statistical tools, which take this correlation into account, are better suited for this task.

Principal components analysis (PCA) is one of the most popular multivariate statistical methods used for process monitoring and data compression. PCA computes new orthogonal principal directions, which are linear combinations of the actual variables. This is done via singular value decomposition (SVD) of a suitably scaled (mean-centered and variance-scaled) data matrix X̄ and by retaining those principal components that have significant singular values. PCA has been used for fault detection using statistical control limits on the Q (squared prediction error) and/or T² statistics.5,6 Once a fault is detected using either the Q or the T² statistic, contribution plots7 have been used to help fault isolation.

The main limitation of PCA is that it assumes normality and independence of the samples.8,9 In most process plants, because of the dynamic nature of the process, these conditions are usually not met. The presence of dynamics in the plant data results in serial and cross-correlation between the variables, which violates the assumption in traditional PCA about the statistical independence of the samples.

To overcome these shortcomings of PCA, the dynamic PCA (DPCA) algorithm10 has been proposed; however, this algorithm requires an expansion of the column space to model the dynamics and can result in large matrix sizes even for moderately sized plants. Other variants of the PCA algorithm, which address differing time scales and varying frequency components in the data, have also been proposed, such as multiscale PCA11 (MSPCA) and spectral PCA.12 Thus, PCA with the above variants has been used quite successfully for the task of FDD.

As indicated before, PCA achieves dimensionality reduction and subsequent classification of the faults by projection onto a lower dimensional subspace. Although these subspaces are derived in an optimal fashion, the objective function for PCA is primarily related to capturing the variation in the data set in a lower dimensional subspace. Thus, although there have been successful applications, this objective function is not strictly oriented towards the task of FDD via discrimination and classification; rather, it is oriented towards representation. Chiang et al.2 have shown that, while there are commonalities, other discriminative formulations of the objective function, such as those used in Fisher discriminant analysis (FDA), are more suitably oriented towards the task of FDD; the data, when projected onto the lower dimensional subspaces so derived, can be shown to be distinguished more sharply than via projection onto the PCA subspaces. However, even for the FDA algorithm, the presence of dynamics in the data requires expansion of the variable space as well as an increase in matrix sizes (many more samples) to obtain the same level of statistical significance2 as for the static versions. FDA also requires a priori specification of the number of groups/classes and a priori assignment of the data to the classes. Although a recent approach13 addresses some of these problems via PCA in a preceding step, these tasks are relatively difficult to perform for real-time data, especially when there are overlaps among the classes in the measurement space.

Thus, the key requirements of any data-based method for use in FDD tasks are the ability (i) to compress the data and reveal the lower dimensional behavior, (ii) to generate a discriminative subspace for classification of various fault scenarios with minimal or no a priori knowledge of the fault classes, and (iii) to accommodate dynamic dependencies in the data without increasing the dimension of the data matrix. Depending on the nature of the system dynamics, the row and column spaces could have certain dependencies that can aid the classification task.


The requirement to capture this association between rows (samples, i.e., observed data) and columns (variables) can arise in several applications. First, data collected during transients show dependencies of this nature; for example, values of the controlled variable depend on the manipulated variable values at the previous sampling instants. In an FDD context, different fault scenarios such as sensor faults, actuator faults, and parametric changes generate their own (hopefully distinct) signatures (row-column relationships) through the profiles of the manipulated variables. Thus, the relationship between the variables under each of these scenarios is expected to be different. For diagnosis purposes, these relationships need to be discriminated in a suitable space to the best extent possible, so that subsequent classification of online data can be done with minimal misclassification rates. Thus, other than associations between variables (i.e., column space associations), row-column relationships can also bring in useful information towards discrimination and, hence, these relationships need to be captured and suitably represented.

In this paper, we address the above requirements through the proposition of correspondence analysis (CA) for the task of fault detection and isolation (FDI). CA14-16 is a powerful multivariate statistical tool which, like PCA and FDA, is also based on the generalized singular value decomposition. However, here we show that it is superior to both in terms of its classification and discriminative abilities. CA is a dual analysis, as it simultaneously analyzes dependencies in the columns, the rows, and the joint row-column space in a dual lower dimensional space. Thus, dynamic correlations can be represented relatively easily without having to expand and deal with larger data sizes. Scaling aspects and the choice of the similarity matrix play a very important role in the aggregation and discriminative abilities of the algorithm; in this paper, we highlight these issues in relation to the PCA algorithm. Furthermore, in terms of the relative objective functions of CA and PCA, we also show that a simple nonlinear scaling of the data, which inherently exists in CA, is able to classify regions better even in the presence of nonlinear relationships. The spaces thus generated by solving the CA objective function can be shown to cluster and aggregate the data more effectively. Because of its capability to compress as well as classify the data set, CA has been used extensively in ecological problems, the study of vegetation habitats of species, social networks, etc.17-20

In this paper, we discuss how CA can be used to classify process data for fault detection as well as isolation simultaneously. The approach is explained through several numerical case studies to emphasize the important properties of CA. For validating the ability of the proposed CA-based fault detection and isolation scheme to detect and isolate faults online, we present simulations involving a continuously stirred tank reactor (CSTR) for solution copolymerization of methyl methacrylate (MMA) and vinyl acetate (VA), developed by Congalidis et al.21 Finally, we also present experimental validation studies performed on a quadruple-tank process22 operating under closed loop, which is a benchmark setup reported in the control literature.13,23

This paper is organized as follows: In the following section, some basic terms used in CA are explained. Section 3 presents a brief review of existing methods like PCA and FDA.
The next section presents the CA objective function formulation and interpretations in detail with a focus on the FDD task. Section 5 presents the proposed approach of using CA for the FDD task. Later we go on to validate the proposed approach via simulation case studies and online implementation on a laboratory-scale quadruple-tank setup.

2. Brief Preliminaries

2.1. Weighted Euclidean Space. In general, given any two vectors x and y, the multidimensional weighted Euclidean space is characterized by the scalar product

⟨x, y⟩D = x^T D y    (1)

where D is any positive definite matrix. The squared distance between two points x and y in this space is, thus, the weighted sum of squared differences in coordinates

d²(x, y) = (x − y)^T D (x − y)    (2)

This type of distance function is often referred to as a diagonal metric. One of the most common examples of a weighted Euclidean distance is a suitably formulated chi-squared (χ²) statistic.14

2.2. Centroid and Inertia. The centroid of points x1, x2, ..., xI with different masses w1, w2, ..., wI is defined as the weighted average point

x̄ = Σ_i wi xi / Σ_i wi    (3)

Hence, x̄ also tends towards the direction of the points with higher mass. On the other hand, the inertia of this set of points x1, x2, ..., xI can be expressed as the weighted average

IN(·) = Σ_i wi di²    (4)

where di² is the squared weighted distance between xi and x̄.

2.3. Row Profiles and Row Cloud. For any matrix X with all positive elements, the correspondence matrix is defined as [(1/g)X], where g is the sum of all the elements of matrix X. The row sums of the correspondence matrix are given by

r = [(1/g)X] 1    (5)

where 1 is a vector of all 1's of appropriate dimension. The matrix Dr is then defined as Dr = diag(r). The row profiles of matrix X can be defined as the vectors of rows of the correspondence matrix divided by their respective sums. The set of all row profiles is called the row cloud. The matrix of the row cloud can be written as

R = Dr^-1 [(1/g)X]    (6)

2.4. Column Profiles and Column Cloud. The column sums of the correspondence matrix of X are given by

c = [(1/g)X]^T 1    (7)

The matrix Dc is then defined as Dc = diag(c). The column profiles of matrix X can be defined as the vectors of columns of the correspondence matrix divided by their respective sums. The set of all column profiles is called the column cloud. The matrix of the column cloud can be written as

C = Dc^-1 [(1/g)X]^T    (8)
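To make these definitions concrete, the short sketch below (Python with NumPy; the function name and the toy matrix are illustrative choices, not part of the original text) computes the correspondence matrix, the row and column masses, and the row and column clouds of eqs 5-8.

import numpy as np

def correspondence_profiles(X):
    # Correspondence matrix and row/column profiles (eqs 5-8).
    X = np.asarray(X, dtype=float)
    g = X.sum()                       # grand total of all elements
    P = X / g                         # correspondence matrix, (1/g)X
    r = P.sum(axis=1)                 # row sums, eq 5
    c = P.sum(axis=0)                 # column sums, eq 7
    R = P / r[:, None]                # row cloud, R = Dr^-1 (1/g)X, eq 6
    C = (P / c[None, :]).T            # column cloud, C = Dc^-1 [(1/g)X]^T, eq 8
    return r, c, R, C

# toy usage with an arbitrary positive matrix
r, c, R, C = correspondence_profiles([[4.0, 1.0, 3.0], [2.0, 6.0, 2.0]])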

2.5. Similarity Metrics. When it is required to organize data into groups or clusters, the analysis may be carried out either on the row or the column cloud. In such cases, it is recommended to use appropriate (row- or column-based) similarity metrics that are oriented to the classification task. In general, the similarity metric is defined in terms of the covariance or correlation matrices. For different applications, other similarity metrics24 may be defined as needed; for example, the similarity between two data points may be defined to be inversely proportional to the distance between those points.

2.6. Scaling Issues. Conventionally, the data in the matrix X is scaled to unit variance, for example in PCA. A classification analysis of the data resulting from this linear scaling is, however, not effective when the inherent structure or association between the variables is nonlinear. Ding et al.25 have shown that nonlinear scaling of the data leads to a space having desirable properties such as self-aggregation. CA also employs a similar nonlinear scaling of the data and, hence, brings in the associated advantages. We will discuss this aspect in greater detail in subsequent sections.

3. Review of Existing Methods

Any matrix X of dimension m × n, consisting of m observations of n variables collected from an operating plant, has a wealth of information regarding the health of the plant. This information can be mined to extract useful relationships for further analysis. Towards this end, the matrix can be decomposed or factorized as a product of two matrices that span simpler, lower dimensional, and relationship-revealing spaces; certain constraints are usually imposed on the factorization to make it unique among several other possible factorizations of the matrix X. The most prevalent technique in the FDD of process systems has been principal components analysis (PCA), which is briefly reviewed next.

3.1. PCA. Before applying PCA, the data matrix X is autoscaled by subtracting from each variable its sample mean and then dividing each variable by its standard deviation. X̄ is the scaled data matrix, and x̄ is a vector containing the mean of each variable.

x̄ = (1/m) X^T 1    (9)

X̄ = (X − 1 x̄^T) D^-1    (10)

where D is a diagonal matrix containing the standard deviation of each variable. Under the assumption of statistical independence of the rows (samples), PCA factorizes the appropriately scaled (mean-centered and variance-scaled) matrix to reveal hidden relationships in the column space.26 The objective function for PCA involves (canonical) maximization of the variance explained in X̄ by each direction; for example, the first direction is obtained as the solution of the optimization problem in the space of the first linear combination t1 = X̄ p1 as

max_{p1} (t1^T t1) = p1^T X̄^T X̄ p1    (11)

such that p1^T p1 = 1. It can be shown that the solution to the above optimization problem can be posed as an eigenvalue-eigenvector problem as

X̄^T X̄ p1 = λ p1    (12)

In fact, for a canonical decomposition of the variance in X̄, it has been shown that the orthogonal directions provided by the SVD of X̄ are the solution to the above optimization problem. Thus, PCA decomposes the variance in X̄ into a scores matrix T and a loadings matrix P as

X̄ = T P^T    (13)

Clearly, the above decomposition is unique under the constraint of orthogonality of the loading vectors in P. In PCA, the loading vectors are computed such that they are orthonormal to each other. This gives

P^T P = In    (14)

where In is the n × n identity matrix. As will be shown later, if the constraints are chosen in a different way, a different set of vectors, oriented towards achieving other objectives related to classification, is obtained. In PCA, if the columns of the matrix X̄ are correlated, i.e., the matrix X̄ is not full rank, then it can be approximated by a much lower dimensional matrix (the first k columns of P, k << n), giving

X̄ = t1 p1^T + ... + tk pk^T + E    (15)
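A minimal sketch of this decomposition, assuming NumPy and the autoscaling of eqs 9 and 10, is given below; the function and variable names are ours, and the choice of k is left to the user.

import numpy as np

def pca_decomposition(X, k):
    # Autoscale X (eqs 9-10) and return the first k scores/loadings (eqs 13-15).
    X = np.asarray(X, dtype=float)
    Xbar = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # mean-centered, unit variance
    U, s, Vt = np.linalg.svd(Xbar, full_matrices=False)   # SVD of the scaled matrix
    P = Vt[:k].T                                          # loading vectors p1..pk
    T = Xbar @ P                                          # score vectors t1..tk
    E = Xbar - T @ P.T                                    # residual matrix of eq 15
    return T, P, E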

Thus, process monitoring can be done in a (substantially smaller) lower dimensional space than that of the original data matrix X̄. The matrix E contains the components of the variance of matrix X̄, such as noise, that cannot be explained by the first k principal components and is also known as the residual matrix. Conventionally, the squared prediction error and Hotelling's T² statistics are used to detect the occurrence of any abnormal event.1,6 The scores plot also reveals similarities between samples and has been used to differentiate between normal and abnormal operational spaces; it has also been used to generate normal operation spaces in the presence of multiple regimes of operation, for example, in the production of different grades of polymers.27

The underlying assumption for PCA, however, is that the samples are uncorrelated, which can often be violated. Several variants of and alternatives to the PCA algorithm have, therefore, been proposed in the literature. For example, for dynamic dependencies in the samples, dynamic PCA10 has been proposed to account for the correlation among samples by expanding the column space of matrix X̄. This causes a substantial increase in the number of variables, although the underlying information is already available in the matrix X̄. Furthermore, either in a dynamic scenario or otherwise from an FDD perspective, it may be desirable to reveal latent cross-relationships between the samples (rows) and the variables (columns); this task cannot be performed by PCA.

The PCA algorithm does have some discriminative ability; for example, the scores plot clusters similarly behaving samples together. However, the objective function for PCA, viz. finding directions of maximum variance canonically, does not explicitly bring out the possibility of discrimination. Towards this end, FDA2,28 is better suited for this discrimination task. We briefly review the FDA algorithm in the next section, for the task of enhanced discrimination between different classes of data.

3.2. FDA. The data matrix X is autoscaled before applying FDA as well. Given that there are h classes in the data (this information needs to be known a priori), FDA29 determines a maximum of h − 1 discriminating directions oriented towards maximizing the scatter between different classes of data while minimizing the scatter within each class.


For the data matrix X̄ having m samples, these scatter matrices are defined below.

X̄ = [x1^T; x2^T; ...; xm^T]    (16)

A subset of the samples (in X̄) that belong to class j is denoted as X⟨j⟩. The total mean vector x̄ and the mean vector for class j, x̄⟨j⟩, are given by eqs 17 and 18, respectively.

x̄ = (1/m) Σ_{i=1}^{m} xi    (17)

x̄⟨j⟩ = (1/m⟨j⟩) Σ_{xi ∈ X⟨j⟩} xi    (18)

where m⟨j⟩ is the number of samples in class j. The total scatter matrix St; the within-class scatter matrix for class j, S⟨j⟩; the within-class scatter matrix Sw; and the between-class scatter matrix Sb are defined as

St = Σ_{i=1}^{m} (xi − x̄)(xi − x̄)^T    (19)

S⟨j⟩ = Σ_{xi ∈ X⟨j⟩} (xi − x̄⟨j⟩)(xi − x̄⟨j⟩)^T    (20)

Sw = Σ_{j=1}^{h} S⟨j⟩    (21)

Sb = Σ_{j=1}^{h} m⟨j⟩ (x̄⟨j⟩ − x̄)(x̄⟨j⟩ − x̄)^T    (22)

The objective function of FDA explicitly incorporates discrimination between the classes and can be posed as follows: It attempts to find orthogonal directions that maximize the between-class scatter while minimizing the within-class scatter. This is achieved by the following objective function:

J(v) = max_{v ≠ 0} (v^T Sb v) / (v^T Sw v)    (23)

It can be shown that a vector v that maximizes J(·) must satisfy

Sb v = λ Sw v    (24)

Equation 24 is a generalized eigenvalue problem. If Sw is nonsingular, we can obtain a conventional eigenvalue problem as

Sw^-1 Sb v = λ v    (25)

In the above eq 23, if Sw = I (dependent on the data orientation), then the objective function reduces to that of PCA. Therefore, the FDA objective function explicitly incorporates discrimination between the samples and, hence, the directions (loading vectors) would generally be different than those of PCA. The FDA algorithm, however, requires an a priori specification of the number of distinct clusters/groups in the data; it also requires assigning the samples to each of these known classes during the model-building step. Both these tasks require a priori knowledge, which may not necessarily be available exactly. Furthermore, association between the time samples and the measured variables (for example, in the case of dynamic data or serial correlation) is not detected automatically. As in PCA, it is recommended2 that a suitable expansion of the column space of the original matrix X̄ be done to bring out these dependencies. As mentioned before, the resulting implication is that more samples are needed to achieve the same statistical significance level during the model-building step.

As seen above, PCA has a rather limited discriminative ability, while FDA explicitly incorporates the discrimination requirement in the objective function. However, to handle serial (cross) correlation (row-column associations), both of the above algorithms have to work with an expanded column space of the matrix X̄. In the next section, we show that CA achieves both of the above tasks, viz. discrimination and detection of row-column associations, in a relatively simple way. It also has an additional merit over FDA in terms of not having to know a priori the number of data groups. Towards discriminative ability, it has been shown25 that the inherent nonlinear scaling of the data matrix X̄ results in sharper discrimination between groups and, hence, would aid in the task of fault resolution.
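For illustration, the scatter matrices of eqs 19-22 and the generalized eigenvalue problem of eq 24 could be assembled as in the following sketch (Python with NumPy/SciPy). The class labels must be supplied a priori, as discussed above; the small ridge added to Sw is a practical safeguard of ours and is not part of the FDA formulation.

import numpy as np
from scipy.linalg import eigh

def fda_directions(Xbar, labels, n_dirs=None):
    # Scatter matrices (eqs 19-22) and FDA directions from Sb v = lambda Sw v (eq 24).
    Xbar = np.asarray(Xbar, dtype=float)
    labels = np.asarray(labels)
    n = Xbar.shape[1]
    xbar = Xbar.mean(axis=0)
    Sw = np.zeros((n, n))
    Sb = np.zeros((n, n))
    for j in np.unique(labels):
        Xj = Xbar[labels == j]
        mj = Xj.mean(axis=0)
        Sw += (Xj - mj).T @ (Xj - mj)                     # within-class scatter, eqs 20-21
        Sb += len(Xj) * np.outer(mj - xbar, mj - xbar)    # between-class scatter, eq 22
    # generalized symmetric eigenvalue problem; the ridge keeps Sw invertible
    evals, evecs = eigh(Sb, Sw + 1e-8 * np.eye(n))
    order = np.argsort(evals)[::-1]
    k = n_dirs if n_dirs is not None else len(np.unique(labels)) - 1
    return evecs[:, order[:k]]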

4. Correspondence Analysis

The aim of CA is to develop simple indices that highlight associations between the rows and the columns. Unlike PCA, which canonically decomposes the total variance in the matrix X, CA decomposes a measure of the row-column association, typically formulated as the total χ² value, to capture the dependencies. Starting from a matrix X with positive entries, the two key steps in CA that differ from those in PCA are (i) estimation of the expected value of the entries in X and (ii) estimation of the deviation of these expected values from the observed ones. These steps can be explained as follows: Starting from the matrix X (m × n), the expected value Eij of each element xij is first estimated under the assumption of statistical independence of rows and columns as

Eij = r̃i c̃j / g    (26)

where r̃i is the row sum of the ith row, c̃j is the column sum of the jth column, and g is the sum of all the elements of X. Next, the corresponding observed values xij are compared with these estimated values using the statistic t defined as

t = Σ_{i=1}^{m} Σ_{j=1}^{n} (xij − Eij)² / Eij    (27)

It can be shown that, under the hypothesis of independence, the variable t has a χ2 distribution14 and, therefore, any significant departure from the assumption of independence (i.e., due to the presence of row-column associations) can be systematically analyzed. This is typically done by constructing


a matrix C of the deviation values for each element xij, which is written as

Cij = (xij − Eij) / Eij^{1/2}    (28)

The elements Cij may be viewed as measuring the weighted departure between the observed xij and the predicted values Eij under the assumption of statistical independence. The elements Cij thus measure the association between the rows and columns of X. They also represent a value of the inertia scaled by the grand sum g. To assess the associations represented by the scaled inertia values in C, the matrix C is subjected to singular value decomposition to yield

C = L Dµ M^T    (29)

where L and M contain the eigenvectors of C C^T and C^T C, respectively, and Dµ is a diagonal matrix given by

Dµ = diag(µ)    (30)

where µ is a vector containing the singular values of the matrix C. If k is the rank of C, then it can be shown using matrix theory that

trace(C C^T) = Σ_{i=1}^{k} µi² = Σ_{i=1}^{m} Σ_{j=1}^{n} Cij² = t    (31)

which justifies the choice of the statistic t and also shows that the SVD of this measure of deviation can be interpreted as decomposing the total χ² value, rather than the variance.

An alternate interpretation of CA can also be presented in terms of the weighted Euclidean space as follows. In general, through an optimization procedure, we seek a lower dimensional (say k-dimensional) approximation of the matrix X in an appropriate space S. In terms of the row and column points (see the section on preliminaries), each row of X can be represented as a point xi (i = 1, 2, ..., m) in an n-dimensional space. When one seeks to estimate the lower dimensional space (approximation) S that is closest to this cloud of row points, one could solve optimization problems that are formulated in several possible ways. We discuss two such formulations below in view of our focus on PCA and CA. Let us consider that the projection of the row point xi onto the space S results in x̂i. The optimization problem to determine the space S could then minimize a weighted Euclidean distance defined as

d² = (x − x̂)^T D (x − x̂)    (32)

Alternately, the choice of the weight matrix D could be the identity, which would result in an optimization problem that minimizes the Euclidean distance as

d² = (x − x̂)^T (x − x̂)    (33)

The solution to the minimization of the distance defined by eq 33 can be quite easily determined through SVD and is indeed the well-established PCA solution. However, when the weight matrix D is not an identity matrix, it can be shown14 that the solution to the problem of minimizing the weighted distances in eq 32 is given by a decomposition of the inertia of the row (or column) cloud, i.e., by the generalized SVD of the matrix [(1/g)X − rc^T]. The inertia of the row cloud and the column cloud can be shown to be the same14 and is given by the χ² value divided by g. The weight matrix D is chosen as the diagonal matrix of the row sums or the column sums. The generalized SVD of this matrix is defined as

[(1/g)X − rc^T] = A Dµ B^T    (34)

where the matrices A and B have to satisfy the following constraints

A^T Dr^-1 A = Im,   B^T Dc^-1 B = In    (35)

where Im and In are the identity matrices of dimension m × m and n × n, respectively. The matrices Dr and Dc are diagonal matrices of the row and column sums of the matrix [(1/g)X], respectively (as defined in Sections 2.3 and 2.4). It is important to contrast this set of constraints, defined by eqs 34 and 35, with that of PCA (eq 14). The generalized SVD results of eq 34 can also be realized via the singular value decomposition of an appropriately scaled X matrix, as explained below. We define the matrix P as

P = Dr^{-1/2} [(1/g)X − rc^T] Dc^{-1/2}    (36)

where r and c are vectors of the row sums and the column sums of [(1/g)X], respectively. Then, the regular SVD of the matrix P gives the same singular vectors as in eq 29. From eqs 28, 29, and 34, the relationship between the matrices A, B, L, and M can also be derived (see Appendix A) as

A = Dr^{1/2} L,   B = Dc^{1/2} M    (37)

This establishes the equivalence of the two approaches for CA discussed above. The vectors in B thus give the principal axes (loading vectors) for the row cloud. As described earlier, we may consider the matrix X as either a set of row points or a set of column points, i.e., as a row cloud or a column cloud. When we need to find the principal axes for the column cloud, the above discussion holds with X replaced by X^T. However, it also turns out that the problems of finding the principal axes for the row cloud and the column cloud are dual to each other, and A and B define the principal axes for the column cloud and the row cloud, respectively. Some of the important interpretations and metrics for CA are explained briefly below.

4.1. Interpretation from the CA Model. Once the new principal axes (A and B) are computed, the next step is to interpret and analyze the results generated from CA. Some metrics and their interpretations are described as follows.

4.1.a. Singular Values and Inertia. The sum of the squared singular values gives the total inertia of the cloud. The inertia explained by each principal axis can then be computed by

IN(ith axis) = µi² / Σ_j µj²    (38)

where j = 1, ..., n. Similarly, the cumulative inertia explained up to the ith principal axis is the sum of the inertias explained up to that principal axis. This gives a measure of the accuracy (or percentage of the χ² value explained) of the lower dimensional approximation.

4.1.b. Number of Principal Axes to be Retained. There is no generally fixed criterion for determining how many principal axes should be retained. Although mathematical criteria do exist for selecting the number of principal axes, for better graphical interpretation of the results, it is generally limited to two or, at the most, three. In most practical applications, it has been found that the first two principal axes explain more than 80% of the total inertia.14 However, for better resolution/classification of the row or column points, one could look at including more than two principal axes. For the case when the number of principal axes retained is three, the graphical interpretation would be done on a three-dimensional plot (see case study 6.2 below).

4.1.c. Row and Column Coordinates. The row and column scores are the new coordinates of each row and column profile point on the retained principal axes. The new coordinates can be computed by projection on A and B, respectively (eqs 39 and 40).

F = Dr^-1 A Dµ    (39)

G = Dc^-1 B Dµ    (40)

4.1.d. Absolute Contribution by Points. This metric indicates the points that have contributed most in the ordination of each of the principal axes. The absolute contribution by the ith row point in X to the jth principal axis can be calculated from eq 41.

Abs contribution(row(i), j) = ri fij² / IN(j)    (41)

Similarly, the absolute contribution of the column points can be computed by eq 42.

Abs contribution(column(i), j) = ci gij² / IN(j)    (42)

The points with a high absolute contribution value play a major role in determining the final orientation of the principal axis.

4.1.e. Relative Contribution of Each Axis. This metric measures how well a particular row or column profile point is represented by a particular principal coordinate. It is given by the squared cosine of the angle between the actual profile point and the principal axis. The relative contribution for the ith row (or column) profile point to the jth principal axis can be calculated by eq 43 (or eq 44).

Rel contribution(row(i), j) = (fij / di)²    (43)

Rel contribution(column(i), j) = (gij / di)²    (44)

where di is the distance of the ith row (or column) profile point from the origin. This metric is also a measure of the squared cosine of the angle between the row (or column) profile and the principal axis. A higher value of this metric indicates that the inertia of the point is explained very well by the axis and the error in representing the point in the lower dimensional space is very small.
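The quantities defined in this section can be obtained from a standard SVD of the matrix P of eq 36. The sketch below (Python with NumPy) is a minimal illustration written for this purpose; it assumes a strictly positive data matrix X, and in eqs 41 and 42 it takes IN(j) as µj².

import numpy as np

def correspondence_analysis(X, k=2):
    # CA of a positive matrix X via the scaled matrix P of eq 36.
    X = np.asarray(X, dtype=float)
    g = X.sum()
    Pc = X / g                                            # correspondence matrix
    r, c = Pc.sum(axis=1), Pc.sum(axis=0)                 # row and column masses
    S = (Pc - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # matrix P of eq 36
    L, mu, Mt = np.linalg.svd(S, full_matrices=False)     # regular SVD, as in eq 29
    A = np.sqrt(r)[:, None] * L                           # A = Dr^(1/2) L, eq 37
    B = np.sqrt(c)[:, None] * Mt.T                        # B = Dc^(1/2) M, eq 37
    F = (A / r[:, None]) * mu                             # row coordinates, eq 39
    G = (B / c[:, None]) * mu                             # column coordinates, eq 40
    inertia = mu**2 / (mu**2).sum()                       # inertia fraction per axis, eq 38
    abs_row = r[:, None] * F**2 / mu**2                   # absolute row contributions, eq 41
    abs_col = c[:, None] * G**2 / mu**2                   # absolute column contributions, eq 42
    return F[:, :k], G[:, :k], mu, inertia, abs_row[:, :k], abs_col[:, :k]

The first two columns of F and G are what would be plotted against each other on the bi-plot discussed next.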

4.2. Interpretation from Graphical Results.14 The most important aspect of CA is the graphical representation of the row and column profile points in the dual lower dimensional space. In most cases, the first two principal coordinates are plotted against each other, and sometimes one could consider up to the first three principal coordinates. The two-dimensional plane thus created (for the case when the number of principal axes retained is two), typically called a bi-plot,30 can be used to plot the row profile points and/or the column profile points on a single figure. This plot can be used for analysis and interpretation.

In this figure, the origin is the centroid of the row cloud as well as the column cloud. Hence, a particular point (row or column) projected close to the origin indicates a profile point similar to the centroid. If points from the row (or column) cloud are close to each other, they can be considered to have similar profiles. Row or column points that lie on a straight line are indicative of linear dependencies among them. Further, closeness of a row point to a particular column point is indicative of an association between them. This analysis needs to be done carefully because it holds only if the points are away from the centroid. Hence, it is better to use the angle between the points as a measure, and not the distance, while analyzing the association between points in the row and column clouds. If the angle between a point in the row cloud and a point in the column cloud is acute, then the two are positively correlated. If the angle is obtuse, the two are negatively correlated, and if the angle is a right angle, the points do not interact, i.e., there is no association between the row and column points. Thus, when the coordinates derived from CA are projected on the bi-plot, they reveal useful information about the dependencies in the row, the column, and the joint row-column space.

Remark. The term bi-plot implies plotting the profiles of two clouds (the row and column clouds) in the lower dimensional subspace. For the case when the number of principal axes retained is three, one would project the row and column points on a three-dimensional figure and generate a three-dimensional bi-plot.

5. Application of Correspondence Analysis to Fault Detection

CA can be used for statistical process monitoring as well as fault detection and isolation. In multivariate statistical process monitoring, the correlation among variables is taken into account and statistical limits are drawn for the normal operating region. If the correlation structure is broken, these statistical limits are violated, which can be indicative of a fault. Once the fault is detected, the next step is fault isolation. Fault isolation is basically a classification problem, and depending on the input-output data, one would be able to classify and identify the root cause. CA is carried out in a dual space, i.e., on the row as well as the column profiles. This can be used to advantage for performing both fault detection and isolation simultaneously.

For process monitoring using CA, the data matrix X is formed such that each column constitutes an input/output variable and each row constitutes a time sample of these variables. Each row profile can then be indicative of the operating condition of the plant. Column profiles, on the other hand, give information about how the variables are related to a particular operating point. Unlike DPCA, this algorithm does not require an expansion along the column space to detect the presence of dynamic auto- and cross-correlations.

5.1. Data Collection and Model Building. In all statistical process monitoring techniques, data collection is the most important aspect and determines the ability to resolve faults.


However, it is important to contrast the methods by which the traditional techniques flag aberrant or faulty operation. In PCA, the data collection for model building involves picking normal operating data to establish the (true and normal) underlying correlation structure. This is in keeping with the objective function of PCA, which is "representation-oriented" rather than "classification-oriented". When the PCA model is deployed online, any deviant operation can then be detected in relation to this normal statistical model and, subsequently, tools such as contribution plots are used to isolate the faulty causal variables. On the other hand, the FDA algorithm requires collecting together and jointly analyzing data for the normal as well as the fault modes. As mentioned before, FDA requires an a priori classification of the data into clusters of normal and fault modes and then establishing the loading vectors and the scores. When the classes or groups in the data overlap, this a priori classification is difficult to perform.

In CA as well, the data matrix X consists of samples from both normal and aberrant operation. However, the key difference when compared with FDA is that a priori knowledge of the number of groups/classes and the classification of the data into groups is not necessary. The information extracted from any analysis depends on how good or rich the data is in terms of significant events of interest. The proposed CA method assumes that the data set carries information about normal operation as well as representative fault scenarios, which can be used to build the statistical model. Subsequently, this model, when deployed online, classifies real-time data as belonging to the classes represented in the statistical model. Hence, both the fault detection and isolation tasks are achieved. Another important point in favor of CA is the issue of unbalanced clusters/groups. Typically, in an FDD analysis, data corresponding to normal operation are available in abundance, whereas the samples corresponding to the various fault modes are usually fewer. Even in such an unbalanced scenario (in terms of the samples), it has been shown25 that CA performs in a consistent manner and the analysis is not affected by the statistical imbalance.

For the applications considered here, we retain the first two or three principal axes for better understanding and insight through graphical display on a two- or three-dimensional plot. Analyzing the results without the help of a graphical display is also possible using the quantitative metrics discussed earlier.

5.2. Online Implementation. For online process monitoring, the new measurement sample (row) has to be projected onto the lower dimensional space obtained via CA. Since the number of variables (columns) being measured does not change during plant operation, the column cloud remains unaltered. The new measurement can then be scaled by its sum to give the new row profile point xnew. Its lower dimensional approximation can then be obtained from the following equation (see Appendix B for a derivation of the concerned equations)

Fnew = xnew G Dµ^-1    (45)
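A minimal sketch of this online projection (Python with NumPy; the function name and arguments are ours) is shown below. G and µ are the column coordinates and singular values of the CA model built from the historical data, and the new measurement vector is assumed to be nonnegative so that its row profile is well defined.

import numpy as np

def project_new_sample(x_new, G, mu, k=2):
    # Project a new measurement row onto the retained CA axes (eq 45).
    x_new = np.asarray(x_new, dtype=float)
    profile = x_new / x_new.sum()            # scale by the row sum to get the row profile
    return profile @ G[:, :k] / mu[:k]       # Fnew = xnew G Dmu^-1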

6. Case Studies

For validation of the above methodology and to emphasize the important properties of CA, we present the following case studies. The next subsection presents results involving simple numerical examples and is followed by a subsection that involves numerical simulations of representative plant behavior. The proposed approach has also been validated on a laboratory-scale quadruple-tank setup that is housed in the Process Automation Lab at the Department of Chemical Engineering, IIT Bombay. We present validation results for various fault scenarios involving the four-tank experimental setup in the last subsection.

6.1. Numerical Examples. Example I. Classification of Fisher's Iris Data.28 We first show some illustrative examples that highlight the discriminatory abilities of the algorithm. To compare the discriminative abilities of the three methods discussed in the earlier sections, we apply them to a case study involving Fisher's Iris data. The data consist of three different clusters of 50 samples of 4 variables. Of these three clusters or groups, one is quite distinct from the other two, which overlap. The PCA, FDA, and CA methods were applied to these data to evaluate their discriminative ability in the lower dimensional subspace.

Figure 1a shows the scores plot for PCA when the first three PCs are retained, corresponding to a total explained variation of 99%. As can be seen in the figure, the points fall into two distinct clusters, and it is very difficult to differentiate between the two groups of data that show significant overlap. Figure 1b shows the results obtained via the use of FDA. As can be seen in Figure 1b, the data in each of the groups are resolved to a better extent because of the explicit form of the objective function for FDA (eq 23). Thus, FDA gives better discrimination between the groups than PCA for the overlapping clusters. However, the FDA algorithm needs an a priori specification of the number of groups and requires (manual) classification of the data into those groups. This could be cumbersome for typical process operating data and could require iteration on the number of groups and also on the initial a priori classification of the points in the groups. In CA, this a priori specification is not necessary. Figure 1c shows the results obtained via CA, where two principal axes, corresponding to a representation of 98% of the total inertia, were retained. It can be seen that the CA algorithm yields a cluster separation comparable to that of FDA and a substantially better separation than PCA; at the same time, no information on the number of clusters and the classification of the points needs to be provided, as in FDA. It must be mentioned that part a of Figure 1 is presented here with an appropriate orientation so as to give a realistic picture of the resolution between the clusters.

Example II. Self-Aggregation via Correspondence Analysis. In this numerical example, we compare the discriminative abilities of PCA and CA for classes/groups that involve nonlinear relationships. Both PCA and CA achieve cluster separation by finding discriminatory directions. However, CA also has the additional property of self-aggregation,25 which results in better cluster separation through the generation of more compact clusters. The following example was constructed to demonstrate the capability of CA to achieve self-aggregation of the data as well as to compare its discriminative ability with that of PCA. We consider a 500-sample matrix X of 5 variables, where X = [x1 x2 x3 x4 x5]. Of these variables, x1, x2, and x3 were assumed to be random with a Gaussian distribution, and x4 and x5 were assumed to be nonlinearly dependent on x1-x3 as follows:

x4 = x1² x2,   x5 = x1 x3²    (46)

To simulate a fault-like scenario, biases of magnitude 3.5 and 4.5 were introduced into variables x3 and x4, respectively, after 400 samples. The matrix X of 500 samples was then analyzed using CA and PCA. Figure 2a shows the results obtained by PCA. Even though the first three PCs, corresponding to 92% of the variation explained, were retained, it is seen that the two clusters have an overlap and cannot be separated distinctly. CA, on the other hand, gives two distinct clusters when only two principal axes were used (Figure 2b), corresponding to 80% of the total inertia explained. As can be seen in Figure 2b, both clusters have been almost reduced to single points. This behavior is essentially due to the self-aggregation that results from the nonlinear scaling of the data, as has been theoretically shown by Ding et al.25 In fault detection and diagnosis, this ability of CA to self-aggregate points in the lower dimensional subspace, which otherwise show overlap in the Euclidean space, would be expected to be quite useful for sharper resolution of the various clusters depicting normal and fault operating modes. Thus, misclassification rates would be smaller, resulting in better fault classification and diagnosis.

Figure 1. (a) PCA result for Fisher's Iris data. (b) FDA results for Fisher's Iris data. (c) CA results for Fisher's Iris data.

Figure 2. (a) PCA results for the example in case study II; the two clusters cannot be discriminated. (b) CA results for the example in case study II.

Example III. Application to Dynamic Data: Detection of Row-Column Associations. As indicated earlier, the advantage of CA is that it can highlight the correlation among the columns, the correlation among the rows, and the cross-correlation between the rows and the columns. To highlight this ability, we carried out a numerical simulation involving variables x1-x4. In the simulations to generate the matrix X = [x1 x2 x3 x4], variables x1 and x4 were assumed to be random with a Gaussian distribution. Variable x3 was assumed to be a static linear combination of x1 and x4, while variable x2 was assumed to have serial cross-correlations, i.e., it had dependencies across variables, as seen in eq 47.

x3 = α1 x1 + α2 x4
x2(i) = β1 x1(i − 1) + β2 x4(i − 1)    (47)
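A data set with the structure of eq 47 could be generated as in the sketch below (Python with NumPy). The coefficient values and the random seed are arbitrary illustrative choices and are not the values used in this work; the resulting matrix X would then be analyzed with CA as described above.

import numpy as np

rng = np.random.default_rng(0)               # arbitrary seed for reproducibility
N = 500
x1 = rng.normal(size=N)                      # Gaussian random variables
x4 = rng.normal(size=N)
x3 = 0.7 * x1 + 0.3 * x4                     # static linear combination (alpha1, alpha2 assumed)
x2 = np.zeros(N)
x2[1:] = 0.6 * x1[:-1] + 0.4 * x4[:-1]       # one-sample-lag cross-correlation (beta1, beta2 assumed)
X = np.column_stack([x1, x2, x3, x4])        # data matrix with columns x1..x4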

The results obtained from CA of the matrix X are shown in Figure 3. As explained in the earlier section, the row and column points are plotted on the same bi-plot, seen in Figure 3a. The proximity of the row points, marked as "•", is indicative of a similar profile and a single cluster. There are four column points for variables x1-x4. The fact that x3 is a linear combination of x1 and x4 is also highlighted by the column points x1, x3, and x4, which lie on a straight line. When x3 is corrupted with noise, the column points x1, x3, and x4 still lie on a best-fit straight line. As can be seen, the row profile points are all clustered around the column point x2, indicating that x2 has dependencies along the rows. In another simulation, involving the system shown in eq 48 below, x2 is assumed to have a dynamic dependency on itself (autocorrelation) and on x1 and x4. Similarly, x1 is assumed to be an autocorrelated sequence. When CA is applied to the data generated via these equations, we still get the column points x1, x3, and x4 falling on a line (see Figure 3b). At the same time, it can be seen that the row points are clustered around x2, x3, and x1, indicating that x2 as well as x1 and x3 have dependencies along the rows.

x3 = α1 x1 + α2 x4
x2(i) = β1 x1(i − 1) + β2 x4(i − 1) + β3 x2(i − 1)    (48)
x1(i) = γ1 x1(i − 1)

Figure 3. (a) CA results for the system of eq 47 in case study III. (b) CA results for the system of eq 48 in case study III.

6.2. Copolymerization Reactor Simulation. A 4 × 5 transfer function matrix (eq 49) model for a CSTR used for the solution copolymerization of methyl methacrylate (MMA) and vinyl acetate (VA)21 was simulated under closed-loop conditions. Based on relative gain array (RGA) analysis, the pairings of the controllers were chosen and U1 was kept constant, resulting in, effectively, a 4 × 4 system.

G(s) = [a 4 × 5 matrix of transfer functions; row-wise, the numerators followed by the corresponding denominators are: 6.46(0.9s + 1), 0.50(0.50s + 1), 0.34, 0.21, 0 over 0.85s + 1, 0.42s + 1, 0.12s² + 0.40s + 1, 0.07s² + 0.30s + 1; −3.72, 0.66, −0.3, −0.41, 0 over 2.41s + 1, 1.51s + 1, 2.71s + 1, 0.80s + 1; −4.71, 0.49, −0.71, −0.20, 0.30 over 2.54s + 1, 1.54s + 1, 1.35s + 1, 2.71s + 1, 0.08s² + 0.41s + 1; 1.03(0.23s + 1), 0, 0, 0, 0 over 0.07s² + 0.31s + 1]    (49)

The control structure used for the closed-loop simulation is shown in Figure 4. The data were collected from simulations of (i) the normal operating condition, (ii) a bias in each of the 4 sensors, and (iii) disturbances introduced via the input U1. These data were subjected to CA for classification.

Figure 4. Control structure for CSTR simulation case study.

The data matrix formed contained four inputs and four outputs as column profile points. From the simulations, representative points for normal operating conditions and the different fault scenarios were taken as row profile points. Figure 5a shows the results obtained from CA. It was found that the first principal axis explained 71% of the total inertia, whereas the second and third axes cumulatively explained 97% and 99% of the total inertia, respectively. The column points for the outputs (Y1-Y4) are close to each other, indicating a similar profile for those points. This is because, when operated under closed loop, the outputs remain around their respective set-point values and the variability due to any fault or disturbance is transformed into variability in the inputs. Hence, the column profiles for the inputs were found to be entirely different for each variable. The row profiles show how the normal operating point is placed relative to the points depicting the sensor biases and the disturbances. As the magnitude of a bias increases, the row points move away (shown as an outward migration of the "+" points) from the normal operating point. Also, for a disturbance in U1, the row point is away from the normal operating point.

As can be seen from Figure 5a, which corresponds to the case when only two principal axes are retained, the points corresponding to the bias in Y1 and the bias in Y3 are very close to each other. This is because both biases resulted in a major variation of the chain transfer agent (U4). When only two principal axes were retained, the discrimination based on the patterns of the other inputs was relatively insignificant. Hence, resolution and isolation of these faults would be expected to be difficult under dynamic conditions using the two-dimensional bi-plot. However, as seen in the three-dimensional plot of Figure 5c, the biases in Y1 and Y3 are easily resolved when the number of principal axes retained is three. This indicates that, when the objective of classification is FDD, the percent inertia explained may not be a sufficient criterion to determine the number of principal axes retained. One would have to choose the number of principal axes that results in better classification and resolution of the groups in the data.

For online implementation of the proposed method, the new row points (measurements) are plotted on this lower dimensional space. Figure 5b shows the results obtained from online implementation of the proposed method. When a bias in sensor 2 was introduced, it can be seen that the row profile points (shown as dots) start drifting towards the cluster of "bias in Y2" points. Thus, aberrant operation can be detected and classified based on the statistical model generated using CA.

Figure 5. (a) Results obtained from CA for the CSTR simulation. (b) Results obtained from online implementation of the proposed method for the CSTR simulation. (c) 3D bi-plot results obtained from online implementation of the proposed method for the CSTR simulation.

Figure 6. Schematic diagram of quadruple-tank setup.

6.3. Experimental Validation on the Quadruple-Tank Setup. The quadruple-tank process is a multivariable laboratory process with an adjustable zero.22 A schematic diagram of the setup is shown in Figure 6. To validate the proposed method, we carried out real-time analysis of the proposed FDD strategy

on the quadruple-tank setup housed in the Process Automation Lab, Department of Chemical Engineering, IIT Bombay. The levels in tank 1 and tank 2 were controlled using a state space formulation of the dynamic matrix control (DMC) algorithm, which updated the values of the manipulated variables (in this case, the total flow in each of the lines). The data were collected for normal operating conditions as well as for biases in each of the sensors. The data matrix formed contained two inputs and two outputs as column profile points and was subjected to CA. It was found that the first principal axis explained 98% of the inertia in the data and the next dimension cumulatively explained 99%. Since this also gave adequate resolution of the groups in the data, it was decided to restrict the number of principal axes to two. In this section, we also present comparative results with PCA to highlight the key aspects of discrimination that correspondence analysis offers.

Figure 7a shows the bi-plot results obtained from the CA method. As can be seen, the column points for the outputs are quite close to each other, whereas the column profile points for the inputs, namely U1 and U2, are far away from each other as well as from Y1 and Y2. This indicates that the column profiles for the inputs are quite dissimilar, whereas the column profiles for the outputs are almost similar. The row profile points (shown as "+") indicate how the process deviates in different directions for faults in the different sensors. As can be seen, the fault conditions, viz. sensor and actuator biases, appear as distinct clusters. Figure 7b shows the online implementation of the proposed method. During normal operation, the new row profile points lie in the vicinity of the normal operating point. When a bias in actuator 2 was introduced, the row profile points can be seen to migrate from the normal operating point towards the "bias in U2" cloud, as expected.

Figure 7. (a) Results obtained from CA. (b) Results obtained from online implementation of the proposed method.

When PCA was applied to the same data matrix that was used for FDI using correspondence analysis, it was found that PCA was not able to discriminate between the different fault clusters (Figure 8a). Three principal components, which explained 96.8% of the variance in the data, were retained for the PCA analysis, and yet the discrimination between the different fault clusters was not evident. From Figure 8b, it is evident that correspondence analysis yields a clearer discrimination of the fault clusters. This discriminative ability can be attributed to the additional information on row-column associations that correspondence analysis exploits to generate directions that introduce self-aggregation as well as compression. These results clearly demonstrate the superiority of CA over PCA for the overall task of diagnosis and isolation of faults.

Figure 8. (a) PCA applied on quadruple-tank experimental data; discrimination between different fault clusters is difficult. (b) Discrimination of different fault signatures (for quadruple-tank process) using correspondence analysis.

7. Conclusion

A new method based on CA is proposed for the FDI of process systems. The ability of CA to depict the row profile points and column profile points in the dual lower dimensional space can be readily used for FDI. Additionally, the method is shown to have a similarity to the traditional PCA algorithm but differs in the way the data are scaled, so as to effectively reveal nonlinear relationships. The algorithm has also been shown to have better discriminative ability and can yield sharper resolution between various classes of operating data, with minimal or no a priori knowledge of the classes, when compared with existing methods. The proposed methodology has been evaluated through several numerical simulations and has also been validated experimentally on a pilot-scale process.

The scaling used in the proposed method indicates good potential for discrimination of the classes. Apart from the graphical results and analysis presented in this paper, the different metrics (absolute and relative contributions) can yield additional information, which may be useful for the purpose of FDI. Further theoretical work is necessary to exactly understand and modify the nature of the scaling so that it can be effective in the presence of a variety of nonlinear mappings. Furthermore, when the classes cannot be crisply separated, methods that accommodate overlaps in the classes and yield monitoring and classification algorithms with smaller misclassification rates need to be developed. These are areas of further research.

Acknowledgment

We gratefully acknowledge research funding from the Board of Research in Nuclear Sciences (Project 2002/34/9-BRNS), Government of India.

Appendix A

Equivalence between the Two Approaches for Correspondence Analysis. While decomposing the χ² value of the matrix X, we first calculate the matrix C, which is defined as

Cij = (xij − Eij) / Eij^{1/2}    (50)

where Eij is given by

Eij = r̃i c̃j / g    (51)

Substituting Eij in eq 50, we get

Cij = (xij − r̃i c̃j / g) / (r̃i c̃j / g)^{1/2}    (52)

The vectors r and c are the row sums and column sums of the matrix [(1/g)X]. This gives

r̃i = g ri,   c̃j = g cj    (53)

Substituting eq 53 in eq 52, we get

Cij = (xij − (g ri)(g cj)/g) / ((g ri)(g cj)/g)^{1/2}    (54)

Rearranging eq 54, we get

Cij = (xij − g ri cj) / (g ri cj)^{1/2}    (55)

i.e.,

Cij = g^{1/2} {ri^{-1/2} [(1/g) xij − ri cj] cj^{-1/2}}    (56)

Hence, for decomposition of the inertia, the matrix P̃ with elements

p̃ij = ri^{-1/2} [(1/g) xij − ri cj] cj^{-1/2}    (57)

is subjected to SVD. The matrix P in eq 36 is

P = Dr^{-1/2} [(1/g)X − rc^T] Dc^{-1/2}    (58)

Hence, P = P̃, and the principal axes obtained either way yield the same results.

Appendix B

Projection of New Data Points on the Principal Axes Obtained via CA. The principal coordinates for the data matrix X are obtained from

F = Dr^-1 A Dµ    (59)

G = Dc^-1 B Dµ    (60)

The matrices F and G are related to each other by the following formulas:14

Dc^-1 [(1/g)X]^T F = G Dµ    (61)

Dr^-1 [(1/g)X] G = F Dµ    (62)

For cross-validation/monitoring, we show in the following that it is adequate to scale a new feature sample Xnew by its row sum. Consider that we obtain Xnew from an operating plant and would like to assess its normality. We generate the coordinates of Xnew as given in eq 45,

Fnew = D^-1 Xnew G Dµ^-1    (63)

where D = diag(rnew) with rnew = Xnew 1. We now show that this scaling procedure is identical to the scaling shown by eq 34 as follows. From eq 5, r = [(1/g)X] 1 and Dr = diag(r). Hence, for the case of Xnew = X, we get

rnew = X 1    (64)

Substituting in eq 5,

r = (1/g) X 1 = (1/g) rnew    (65)

Thus, the new scaling matrix D is related to Dr as

D = diag(rnew) = diag(g r) = g Dr    (66)

Substituting this equation into eq 63, we get

Fnew = Dr^-1 [(1/g)X] G Dµ^-1    (67)

which would, by eq 62, yield

Fnew = F Dµ Dµ^-1    (68)

i.e.,

Fnew = F    (69)

Literature Cited

(1) Kresta, J. V.; MacGregor, J. F.; Marlin, T. E. Multivariate statistical monitoring of process operating performance. Can. J. Chem. Eng. 1991, 69, 35-47.
(2) Chiang, L. H.; Russel, E. L.; Braatz, R. D. Fault diagnosis in chemical processes using Fisher discriminant analysis, discriminant partial least squares, and principal component analysis. Chemom. Intell. Lab. Syst. 2000, 50, 243-252.
(3) Venkatasubramanian, V.; Rengaswamy, R.; Yin, K.; Kavuri, S. N. A review of process fault detection and diagnosis. Part I: Quantitative model-based methods. Comput. Chem. Eng. 2003, 27 (3), 293-311.
(4) Detroja, K. P.; Patwardhan, S. C.; Gudi, R. D. Fault Detection and Isolation using Correspondence Analysis. Presented at the 16th IFAC World Congress, Prague, 2005; Elsevier Press: 2005.
(5) Nomikos, P.; MacGregor, J. F. Multivariate SPC charts for monitoring batch processes. Technometrics 1995, 37 (1), 44-59.
(6) Goulding, P. R.; Lennox, B.; Sandoz, D. J.; Smith, K.; Marianovik, O. Fault detection in continuous processes using multivariate statistical methods. Int. J. Syst. Sci. 2000, 31 (11), 1459-1471.
(7) Miller, P.; Swanson, R.; Heckler, C. Contribution Plots: A Missing Link in Multivariate Quality Control. Appl. Math. Comput. Sci. 1998, 8 (4), 775-792.
(8) Luo, R.; Misra, M.; Himmelblau, D. M. Sensor Fault Detection via Multiscale Analysis and Dynamic PCA. Ind. Eng. Chem. Res. 1999, 38, 1489-1495.
(9) Vijaysai, P.; Gudi, R. D.; Lakshminarayanan, S. Errors-in-Variables-Based Modeling Using Augmented Principal Components. Ind. Eng. Chem. Res. 2005, 44 (2), 368-380.
(10) Ku, W.; Storer, R. H.; Georgakis, C. Disturbance Detection and Isolation by Dynamic Principal Components Analysis. Chemom. Intell. Lab. Syst. 1995, 30, 179-196.
(11) Bakshi, B. R. Multiscale PCA with Application to Multivariate Statistical Process Monitoring. AIChE J. 1998, 44 (7), 1596-1610.
(12) Thornhill, N. F.; Shah, S. L.; Huang, B. Detection of distributed oscillations and root-cause diagnosis. Presented at ChemFas 4, Korea, 2001; Elsevier Press: 2001.
(13) He, Peter Q.; Wang, J.; Qin, J. S. A new fault diagnosis method using fault directions in Fisher discriminant analysis. AIChE J. 2005, 51 (2), 555-571.
(14) Greenacre, M. J. Theory and Application of Correspondence Analysis; Academic Press Inc.: London, 1984.
(15) Greenacre, M. J. Correspondence Analysis in Practice; Academic Press: London, 1993.
(16) Hardle, W.; Simar, L. Applied Multivariate Statistical Analysis; MD Tech: Berlin, 2003; Chapter 13.
(17) Nooy, W. D. Fields and Networks: Correspondence analysis and social network analysis in the framework of field theory. Poetics 2003, 31 (5-6), 305-327.
(18) Tsuyuzaki, S. Vegetation development patterns on skislopes in lowland Hokkaido, northern Japan. Biol. Conserv. 2002, 108 (2), 239-246.
(19) Chen, J. S. A case study of out-bound travellers' destination images by using correspondence analysis. Tourism Management 2001, 22, 345-350.
(20) Bell, R.; Marshall, D. Meal construction: Exploring the relationship between eating occasion and location. Food Qual. Preferences 2003, 14, 53-64.
(21) Congalidis, J. P.; Richards, J. R.; Ray, W. H. Modeling and control of copolymerization reactor. Proc. Am. Control Conf. 1986, 3, 1779-1793.
(22) Johansson, K. H. A Quadruple-tank process: A multivariable laboratory process with an adjustable zero. IEEE Trans. Control Syst. Technol. 2000, 8 (3), 456-465.
(23) Gatzke, E. P.; Meadows, E. S.; Wang, C.; Doyle, F. J. Model based control of a four-tank system. Comput. Chem. Eng. 2000, 24, 1503-1509.
(24) Elmore, K. L.; Richman, M. B. Euclidean Distance as a Similarity Metric for Principal Component Analysis. Mon. Weather Rev. 2001, 129, 540-549.
(25) Ding, C.; He, X.; Zha, H.; Simon, H. Unsupervised Learning: Self-aggregation in Scaled Principal Component Space. 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2002); Springer: Berlin, 2002; pp 112-124.
(26) Jolliffe, I. T. Principal Components Analysis; Springer-Verlag: New York, 1986.
(27) Sebzalli, Y. M.; Wang, X. Z. Knowledge discovery from process operational data using PCA and fuzzy clustering. Eng. Appl. Artif. Intell. 2001, 14 (5), 607-616.
(28) Fisher, R. A. The Use of Multiple Measurements in Taxonomic Problems. Ann. Eugenics 1936, 7, 179-188.
(29) Duda, R. O.; Hart, P. E.; Stork, D. G. Pattern Classification; Wiley-InterScience: New York, 2003.
(30) Aldrich, C.; Gardner, S.; Le Roux, N. J. Monitoring of metallurgical process plants by using Biplots. AIChE J. 2004, 50 (9), 2167-2186.

Received for review April 11, 2005. Revised manuscript received October 5, 2005. Accepted October 24, 2005. IE058033G