
Ind. Eng. Chem. Res. 2007, 46, 3610-3622

Cluster Analysis for Autocorrelated and Cyclic Chemical Process Data

Scott Beaver and Ahmet Palazoglu*
Department of Chemical Engineering and Materials Science, University of California-Davis, One Shields Avenue, Davis, California 95616

José A. Romagnoli
Department of Chemical Engineering, Louisiana State University, Baton Rouge, Louisiana 70803

A clustering algorithm based on principal components analysis (PCA) is proposed for clustering autocorrelated and cyclic data sets typical of continuous chemical processes. A moving window approach is used to adjust the temporal properties of the solution; different parametrizations of the moving window can be used to isolate the high- and low-frequency content of the time series measurements. A framework is proposed to combine separate cluster analyses performed at different time scales to identify all process states and accurately detect the transition points between the states in the face of a periodic disturbance affecting a process. The method is tested on experimental data from a continuously operated pilot-scale process, and is shown to be superior to traditional k-means clustering. PCA variable contribution analysis is used to diagnose the nature of the various operating regimes and faults identified by the cluster analysis.

1. Introduction

Industrial chemical processes can exhibit a multitude of system states, some of which may be unknown to the operators. For continuous processes, these states include the (possibly multiple) desired operating point(s), transitions such as startup, shutdown, or a change in operating point, and a variety of faults and disturbances which degrade the performance of the process. Intuitive knowledge of all visited states is invaluable for safe plant operation.1 Also, process data obtained during typical states are necessary for implementation of statistical process monitoring as part of an effective control strategy.2 In practice, perfect knowledge of the process history is often unavailable to label the time periods during which known system states were realized, and in addition, potentially important, unknown states may exist. Statistical identification of the represented states and their times of occurrence from historical process data is not a trivial task, and herein a method is presented to address this problem.
In this paper, cluster analysis3 is used as a framework to simultaneously identify system states and label their times of occurrence using historical time series data. The method is intended for chemical process data exhibiting two important features. First is the inherently autocorrelated nature of the measurements, which may represent events occurring at multiple time scales. Time scales can range from brief disturbances affecting only a single sample to steady states which may persist for long periods of time. Second is a cyclic component present for at least one of the process variables. Such a confounding cycle in the process variables is caused by the convolution of process dynamics with a periodic disturbance or process input. The cycle is known to affect the process, and represents undesired variability which should not be reflected in the ultimate labeling of observations among the set of identified process states. Because of their convolved nature, the cyclic variables are not periodic and the effects of the confounding cycle cannot be removed by linear filtering methods.

* To whom correspondence should be addressed. Tel.: (530) 752-8774. Fax: (530) 752-1031. E-mail: [email protected].

By properly accounting for these autocorrelated and cyclic properties, a clustering algorithm can be tailored for pattern recognition in chemical process data. Generally, cluster analysis is used to find homogeneous groups (or clusters) of observations in a multivariate data set, such that each group is sufficiently distinct from the others. A clustering algorithm makes a sequence of statistical decisions which rearrange existing clusters of observations into new clusters, starting from some initial configuration of the observations among a fixed number of clusters. The decisions are based on some statistical model that is assumed to account for meaningful variability in the data set. The nature of any identified patterns, then, is largely determined by the form of this statistical model. Traditional clustering algorithms, such as the widely used k-means algorithm,4 are intended for independent observations, an assumption clearly violated for time series data sets. Such algorithms typically group observations based on distances in the measurement space, yielding clusters that are distinguished by the levels of their means. Dynamic events by definition do not have a constant mean, and thus cannot be properly detected by traditional clustering algorithms. Additionally, cyclic variables do not have a truly constant mean, but rather oscillate between peaks and valleys. As will be demonstrated by example in this paper, traditional clustering algorithms tend to produce periodically biased results when applied to cyclic time series measurements. Such periodically biased cluster labels reflect the phase of the confounding cycle, tracking the peaks and valleys of the cyclic variables, but do not indicate physically meaningful modes of process variability with which the periodic input is convolved. This problem becomes severe for identifying steady states, in which the cycle represents the dominant source of process variability. 
Thus, because of their inappropriate statistical models, traditional clustering algorithms are not useful for clustering autocorrelated or cyclic data. Principal components analysis5 (PCA) is a statistical model that has been widely used for modeling chemical process data.6 Here, PCA is used as the basis for a clustering algorithm intended specifically for autocorrelated and cyclic measurements. Though PCA is technically not appropriate for time series data, it is frequently applied for chemical process analysis with success.

10.1021/ie060544v CCC: $37.00 © 2007 American Chemical Society Published on Web 04/27/2007

Figure 1. Diagram for moving window using window density LR⁻¹ = 3. Pairs of brackets and parentheses connected with dotted lines depict the window intervals, with the brackets and parentheses indicating endpoints which are included or excluded, respectively, from each interval. For example, window 4 contains all observations in the interval t ≥ 3R and t < 6R; the sample at t = 6R is contained only in windows 5, 6, and 7.

PCA is a linear model based on the correlation structure of the observed data and is also capable of modeling linear trajectories of the variables in time. Thus, PCA is capable of modeling simple dynamics such as increases or decreases in magnitude from some fixed operating point, corresponding to disturbances or transitions from some steady state. In the event that the process dynamics are too complicated to be adequately modeled by PCA, an extension known as dynamic PCA7 can be implemented by simply concatenating lagged variables onto the observed data matrix. PCA is typically applied to model a set of homogeneous samples; however, it can also be used to identify heterogeneities among observations on the same set of variables. Krzanowski introduced the PCA similarity factor,8 providing a quantitative estimate of the similarity between two blocks of measurements on the same set of variables. This method has been extended to pattern matching for chemical process data, developing an automated scheme by which occurrences of events similar to a specified target event are isolated.9 These authors implement a moving window, defined by window length L and spacing R (see Figure 1), to generate N blocks of data from the raw time series which are then compared to the target using the PCA similarity factor. They show how the definition of the moving window can control the time scale of the detected patterns and the temporal resolution at which their times of occurrence are isolated. Krzanowski's method is further extended to an informal cluster analysis for chemical process data,10 but an automated clustering algorithm to delineate process regimes has not been developed. The above studies utilizing the PCA similarity factor in combination with a temporal windowing scheme evidence the utility of the PCA model for time series cluster analysis.
In theory, PCA similarity factor calculations between all possible pairs of the N windows could be used to form an N × N symmetric distance matrix which could be input to any standard hierarchical clustering algorithm. These distance matrix calculations would be based on N individual PCA models, one estimated for each window of small sample size L. Thus, such a distance matrix does not tend to contain an accurate statistical representation of the data set because the PCA models themselves are not estimated with high accuracy due to the small sample sizes. Due to their heuristic nature, forming N - 1 sequential and irreversible decisions considering elements of the distance matrix, hierarchical algorithms tend to fail if the distance matrix is not properly defined; this problem becomes exacerbated for large N as the computational complexity increases. In this paper, a nonhierarchical clustering algorithm is proposed using the PCA statistical model; the method is termed the "k-PCA models" algorithm by analogy to the traditional k-means algorithm in which the clusters are represented by their

means. Using a nonhierarchical framework, the data are partitioned into k groups of average sample size k⁻¹LN from which PCA models are estimated. Thus, as compared to hierarchical clustering based on the PCA similarity factor, the k-PCA models algorithm requires far fewer statistical decisions based on considerably larger sample sizes. As a result, the data are better partitioned into meaningful groups and the algorithm scales better for large N. Nonhierarchical algorithms are more difficult both to implement and interpret than their hierarchical counterparts; however, recent advances provide a method by which an accurate, graphical solution can be generated using a nonhierarchical algorithm.11

2. Overview of Proposed Method

To implement the cluster analysis, a moving window is first applied to divide the original time series data set into N subsets of equal length L samples. Then, the nonhierarchical k-PCA models algorithm is used to optimize the partitioning of the N windows of data among k clusters. On each iteration, a prototype PCA model is estimated from the observations assigned to each cluster, and then the observations are reassigned to the cluster with the best fitting PCA model. As with traditional nonhierarchical algorithms, the iterations are noted to rapidly converge to a solution, usually corresponding to a local minimum for the objective function. Aside from the moving window aspect, the k-PCA models framework is similar to a mixture of probabilistic PCA models.12 Here, however, the partitioning is performed using a simple iterative scheme analogous to k-means clustering instead of rigorous expectation maximization.13 A consistent estimate for the global optimum of the combinatorial problem can be obtained by aggregating the results of a large number of individual runs of the k-PCA models algorithm.11 The moving window length L determines the temporal properties of the cluster solution, affecting the time scales for any detected patterns.
By setting L equal to the period of the confounding cycle, any periodic biases can be averaged out from the cluster labels. Unfortunately, this window length will also average out events of shorter duration than the confounding cycle. Thus, this particular parametrization of the moving window can reveal the low-frequency content of the time series without any periodic biases; a separate analysis using smaller L can reveal the high-frequency content, but periodic biases will contaminate many of the cluster labels. A scheme is presented to combine the separate high- and low-frequency cluster analyses to arrive at a final solution describing process states at all desired time scales while preventing the confounding cycle from introducing periodic biases into the labels. By corroboration of the low- and high-frequency solutions, the true process states can readily be distinguished from any periodically biased artifacts of the clustering algorithm. This scheme to isolate time scales of interest is markedly different from the multiscale PCA14 framework in which no a priori knowledge of the process time scales is required. While multiscale PCA uses the wavelet transform15 to search for patterns existing over all possible time scales using a dyadic approach, here only a simple windowing scheme will suffice because a priori knowledge of the process time scales is incorporated into the cluster analysis to focus on certain frequency bands of interest. The proposed clustering framework is applied to a pilot plant exhibiting multiple steady states, transitions, and disturbances. The historical observations exhibit a confounding cycle introduced by a periodic disturbance, and the data have physically relevant events existing at both high and low frequencies. Low-frequency content includes multiple steady states at which the


process operates, as well as a transition induced by a change in operating point. The high-frequency events include several brief faults which affect the process, as well as any abrupt transitions between low-frequency states. Thus, the goal of the cluster analysis is to determine the times of occurrence for each of the process states in the face of a periodic disturbance which persists throughout the experiment. The proposed algorithm requires certain decisions by the user: this paper is intended as a guideline for application of the proposed methodology, but it does not contain a rigid formulation of an analysis that is overly specific to the data set studied. Results from k-means analysis are also given to show how such traditional clustering algorithms fail for cyclic time series.

3. Theory

3.1. Principal Components Analysis. Principal components analysis (PCA) is a common multivariate model for correlated variables.5 The PCA model for a matrix of observations X of rank p can be estimated using singular value decomposition (SVD).

X = USVᵀ    (1)

Here, S is a diagonal matrix of singular values sj whose corresponding right singular vectors vj appear as the columns of V. The ordered, non-negative singular values sj are proportional to the variability along each of the orthogonal singular vector directions vj. The q < p singular vectors corresponding to the q largest singular values are retained as the principal components (PCs) and are stacked into the columns of the PCA loading matrix P. The model should contain enough PCs to capture sufficient variability in the training data X (here taken as approximately 95%).

% variance captured = ( Σ_{j=1}^q sj / Σ_{j=1}^p sj ) × 100    (2)

Any matrix X(new) containing L observations on the same p variables as X can be projected into the PCA model. The model residuals can be calculated directly once the PCA loading matrix P is estimated from data set X.

E = X(new)(I − PPᵀ)    (3)
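The model estimation and projection steps of eqs 1-3 can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code; the function names `fit_pca` and `residuals` are ours, and X is assumed to be scaled (and, if desired, centered) before the model is fit.

```python
import numpy as np

def fit_pca(X, q):
    """Estimate the PCA loading matrix P (eq 1) and the % variability
    captured by the q retained PCs (eq 2, singular-value ratio as printed)."""
    # Economy-size SVD: X = U S V^T, singular values in descending order
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:q].T                                 # p x q loading matrix
    pct_captured = 100.0 * s[:q].sum() / s.sum()
    return P, pct_captured

def residuals(X_new, P):
    """Residuals of new observations projected into the model (eq 3)."""
    p = P.shape[0]
    return X_new @ (np.eye(p) - P @ P.T)
```

Retaining all p directions reproduces the data exactly (zero residual), while a truncated model with q < p leaves a nonzero residual used by the Q-statistic below.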

The Q-statistic,2 Qi, is a scalar that quantifies how well the PCA model fits a single observation i. Here, the overall Q-statistic, Qtot, is used to gauge the degree of model fit between P and the entire block of L measurements contained in X(new).

Qtot = Σ_{i=1}^L Qi = Σ_{i=1}^L Σ_{j=1}^p Eij²    (4)

The contribution of variable j to Qi, denoted Cij, indicates the degree of model fit for variable j at sample i.

Cij = Eij²    (5)
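Equations 4 and 5 are straightforward to compute from the residual matrix; a self-contained sketch (the function name `q_statistics` is ours):

```python
import numpy as np

def q_statistics(X_new, P):
    """Per-sample Q_i, overall Q_tot (eq 4), and variable contributions
    C_ij = E_ij^2 (eq 5) for a data block X_new under loadings P."""
    p = P.shape[0]
    E = X_new @ (np.eye(p) - P @ P.T)   # residual matrix, eq 3
    C = E ** 2                          # contributions, eq 5
    Q = C.sum(axis=1)                   # Q_i: row sums of squared residuals
    return Q, float(Q.sum()), C         # Q_tot = sum over i and j of E_ij^2
```

For example, with p = 2 and a one-PC model retaining the first coordinate direction, the residual (and hence every Q_i and C_ij) is carried entirely by the second variable.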

A relatively large variable contribution indicates a sample time at which that variable is not well represented by PCA model P. Note that the Hotelling T²-statistic,16 commonly used to account for variability within a given PCA model, is not considered

because we are concerned only with finding the best fitting PCA model for a given set of observations. Intuitively, the PCA model is based on the covariance structure of the data set. Therefore, it is critical that the individual variables be scaled to similar levels of variability prior to performing PCA, or else variables with larger variances will be over-represented in the PCA model. Typically, the variables are rescaled to unit variance using the standard deviations estimated from the raw data. In addition to scaling, a data matrix is usually centered to zero mean by removing the sample mean before performing PCA. This ensures that the first PC aligns with the dominant direction of variability for the training data. For uncentered data, the first PC will simply point from the coordinate system origin toward the mean of the modeled data. Though typically undesirable, here this property of PCA will be exploited to detect mean shifts in time series data sets.

3.2. k-PCA Models Algorithm. Prior to clustering time series data, a moving window9 defined by its length L samples and spacing between adjacent windows R samples produces a set of N windows labeled serially from 1 to N (see Figure 1). Each window in time w is represented by the L × p matrix of time series X(w). The k-PCA models algorithm will then partition these N windows into k disjoint clusters by optimizing an objective function J. Each window of data X(w) is considered as an indivisible unit by the clustering algorithm, and reassignment of any window to a new cluster is performed en bloc such that its L observations always remain intact within a single cluster. Therefore, L determines the time scale of the detected patterns; as no cluster can contain a contiguous set of observations of less than L samples, events of duration significantly shorter than L are "averaged out" of the cluster solution when estimating the PCA models.
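The windowing step described above can be sketched as follows; `moving_windows` is an illustrative helper name, not from the paper:

```python
import numpy as np

def moving_windows(X, L, R):
    """Divide a time series X (n samples x p variables) into windows of
    length L spaced R samples apart (Figure 1). An integer window
    density L/R gives every interior sample equal representation."""
    if L % R != 0:
        raise ValueError("window density L/R must be an integer")
    n = X.shape[0]
    return [X[s:s + L] for s in range(0, n - L + 1, R)]
```

With L = 4 and R = 2, for instance, each interior sample lands in L/R = 2 overlapping windows, which is what later permits fractional cluster membership probabilities.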
The window spacing R determines the temporal resolution of the cluster solution: smaller R allows better estimation of the cluster transition points, but also increases N and therefore the complexity of the combinatorial optimization problem posed to the clustering algorithm. Evenly spaced windows, obtained using an integer window density LR⁻¹, ensure equal representation of all observations in the analysis and are required for the definition of the ultimate cluster membership probabilities discussed in Section 3.4. Parametrization of the moving window to identify events occurring at a time scale of interest will be deferred until Section 3.5. After the windowing is performed, the clustering algorithm can be implemented. The algorithm is initialized by choosing k and k nonoverlapping windows of data to seed the clusters. After initialization, the configuration of the N windows among the k clusters is optimized by changing the cluster assignments in an iterative fashion. For each iteration, a prototype PCA model P(r) is first calculated for each cluster r. The PCA prototype for cluster r is estimated using all L observations from each window of data X(w) currently assigned to cluster r. Overlapping windows cause some observations to be assigned multiple times to the same cluster, and those observations are taken with that same multiplicity when estimating the prototype PCA models. After estimating the prototype PCA models, the model loss Qtot(wr) upon projecting X(w) into each PCA model P(r) is quantified using eqs 3 and 4. The iteration is completed when each window w is reassigned to the cluster r satisfying arg min_r Qtot(wr), and the value of the objective function J can be calculated.

J = Σ_{w=1}^N min_r Qtot(wr)    (6)
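One reading of the fit-and-reassign iteration, with random rather than chosen-window initialization, can be sketched as follows. This is our interpretation, not the authors' implementation; the empty-cluster reseeding is an assumption added to keep the sketch robust.

```python
import numpy as np

def _loadings(X, q):
    # Prototype PCA loadings via SVD (eq 1); X assumed pre-scaled
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:q].T

def _q_tot(Xw, P):
    # Overall Q-statistic of window Xw under loadings P (eqs 3-4)
    E = Xw - Xw @ P @ P.T
    return float((E ** 2).sum())

def k_pca_models(windows, k, q, max_iter=100, seed=0):
    """Alternate between fitting one prototype PCA model per cluster and
    reassigning each window to the model with the smallest Qtot(wr);
    returns the window labels and the objective J (eq 6)."""
    rng = np.random.default_rng(seed)
    N = len(windows)
    labels = rng.integers(0, k, size=N)          # random initialization
    for _ in range(max_iter):
        protos = []
        for r in range(k):
            members = [w for w, l in zip(windows, labels) if l == r]
            if not members:                      # empty cluster: reseed (assumption)
                members = [windows[rng.integers(N)]]
            protos.append(_loadings(np.vstack(members), q))
        losses = np.array([[_q_tot(w, P) for P in protos] for w in windows])
        new_labels = losses.argmin(axis=1)
        if np.array_equal(new_labels, labels):   # converged: no reassignments
            break
        labels = new_labels
    return labels, float(losses.min(axis=1).sum())
```

On synthetic windows drawn from two regimes with orthogonal dominant directions, the iteration typically snaps to the correct two-cluster partition within a few passes.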


The iterations terminate when no further cluster reassignments are indicated and the algorithm converges. The k-PCA models algorithm, like traditional nonhierarchical routines, deterministically produces a distinct output for each distinct input and generally becomes trapped in a local minimum of the solution space. As described in Section 3.3, an ensemble averaging technique can be applied to obtain an accurate clustering by aggregating the results of a large number of randomly initialized runs of the nonhierarchical clustering algorithm. Scaling of the data is a critical aspect of PCA modeling. Usually, the observations used to generate a PCA model would be scaled immediately before the model is estimated. For the k-PCA models clustering algorithm, however, the set of member observations for each cluster is constantly changing. Rescaling the observation matrix for each cluster at each iteration of the algorithm generates instability, and therefore it is necessary to scale the data only once, before the algorithm is implemented. For chemical process data exhibiting mean shifts (as evidenced by visual inspection of the time series), raw data from the entire observation period can be autoscaled to normalize the variables to zero mean and unit variance. This global scaling will preserve the mean shifts between the windows while approximately weighting each variable equally in the estimation of the PCA model.

3.3. Hierarchical Aggregation of Nonhierarchical Solutions. As presented in a previous study,11 an ensemble of nonhierarchical partitionings can be aggregated in a hierarchical fashion to produce a single, reproducible, graphical cluster solution. An ensemble of randomly initialized nonhierarchical solutions is generated containing nruns solutions for each k ranging from 2 to kmax.
The parameter kmax indicates the largest k for which the nonhierarchical algorithm produces meaningful solutions: using larger k forces the presence of exceedingly small clusters, degrading the performance of the statistical prototype model to capture relevant patterns among the set of observations. The solution from a single run m of a nonhierarchical solution is stored in the N × k binary matrix B(m), where element Bwr(m) is 1 if window w is assigned to cluster r in run m and 0 otherwise. An ensemble of n solutions can then be represented in a single matrix B by concatenating the matrices B(m) for the n = (kmax − 1)nruns individual solutions.

B = [B(1), B(2), ..., B(n)]    (7)

The N × N aggregated distance matrix DAvg is then calculated, where 1 is a matrix of ones.

DAvg = 1 − (1/n) BBᵀ    (8)
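Equations 7 and 8 amount to one-hot encoding each run's labels and counting co-assignments; a small sketch (the helper name `aggregate_distance` is ours):

```python
import numpy as np

def aggregate_distance(label_runs):
    """Build the binary membership matrix B (eq 7) from an ensemble of
    n label vectors and return DAvg = 1 - (1/n) B B^T (eq 8)."""
    n = len(label_runs)
    N = len(label_runs[0])
    blocks = []
    for lab in label_runs:
        lab = np.asarray(lab)
        B = np.zeros((N, lab.max() + 1))
        B[np.arange(N), lab] = 1.0        # one-hot: B_wr = 1 iff window w in cluster r
        blocks.append(B)
    B = np.hstack(blocks)                 # eq 7: concatenated solutions
    return 1.0 - (B @ B.T) / n            # eq 8
```

Entry (w, v) is then the fraction of runs in which windows w and v land in different clusters: 0 for windows always grouped together, 1 for windows never grouped together.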

The second term of eq 8 gives the fraction of the n runs for which pairs of windows are assigned to the same cluster. Thus, the element Dwv^Avg indicates the relative dissimilarity between the pair of windows w and v, and DAvg can be input into any standard agglomerative hierarchical clustering algorithm3 to produce a single dendrogram describing the ensemble of nonhierarchical solutions. The average linkage is preferred over the Ward linkage,17 as the latter tends to force the branches of the dendrogram to contain clusters of equal size.18 The parameter kmax is determined systematically by comparing results obtained for different trial values k′max. For incrementally larger trial values k′max starting from 2, nruns randomly initialized runs of the nonhierarchical algorithm are performed with k = k′max. An aggregated distance matrix DAvg(k′max) can

be calculated using eqs 7 and 8 to aggregate the nruns runs for each k from 2 to k′max. The discrepancy between distance matrices generated using incrementally larger k′max can be assayed using the mean squared error over their N(N − 1)/2 unique elements, denoted ∆SSE(k′max).

∆SSE(k′max) = Σ_{w<v} {Dwv^Avg(k′max + 1) − Dwv^Avg(k′max)}²    (9)

3.4. Cluster Membership Probabilities. Using L > 1 and R < L will produce overlapping windows, resulting in multiple cluster assignments for the individual observations and ambiguity of the cluster transition points. The window density LR⁻¹ (which must be selected as an integer) indicates the number of windows to which each observation is assigned (see Figure 1). The cluster membership probability Pir can be defined simply as the fractional representation of observation i in cluster r.

Pir = (number of times sample i assigned to cluster r) / LR⁻¹    (10)
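Equation 10 can be computed directly from the window labels; a minimal sketch (the function name `membership_probabilities` is ours, and window w is assumed to cover samples wR through wR + L − 1):

```python
import numpy as np

def membership_probabilities(window_labels, n_samples, L, R, k):
    """Cluster membership probability P_ir (eq 10): the number of times
    sample i is assigned to cluster r, divided by the window density L/R."""
    counts = np.zeros((n_samples, k))
    for w, lab in enumerate(window_labels):
        counts[w * R:w * R + L, lab] += 1
    return counts / (L // R)
```

Samples near the edges of the record fall in fewer than L/R windows, so their probabilities sum to less than 1, exactly as the text notes below.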

Note that this definition implies reduced cluster membership probabilities for observations near the edge of the observation period because such sample times are present in fewer than LR⁻¹ windows. Plotting the time series of membership probabilities for each cluster provides a visual tool to estimate the cluster transition points. Membership probabilities of 1 confidently indicate the sample times during which a system state is realized, whereas membership probabilities of zero confidently indicate the absence of that state. Observations labeled with membership probability unity can be taken as the best available information for further characterization of the identified states. Due to the temporal windowing scheme, some samples near the cluster transition points retain intermediate membership probabilities; such samples are assigned to multiple clusters at lower levels of confidence. The cluster membership probabilities can also be used to validate the performance of the time series clustering algorithm. While cluster membership probabilities intermediate between zero and unity are expected near the cluster transition points, sporadic occurrences of intermediate membership probabilities indicate samples for which no clear system state is identifiable. A large proportion of such samples labeled with low confidence indicates failure of the clustering algorithm to produce statistically significant results. This poor performance could be caused by the physical lack of any identifiable phenomena, or it could be indicative of a poor parametrization of the clustering algorithm (i.e., bad choice of q or L) which does not properly partition the time series into a sequence of physically meaningful events.


Figure 2. Process diagram for pilot plant.

Figure 3. Time series for nine pilot-plant variables. Dashed vertical lines separate four known regimes of operation.

3.5. Clustering with Multiple Time Scales. This study considers chemical process data which exhibits phenomena occurring at multiple time scales. The set of time scales includes the confounding cycle of period T in addition to the time scales for the true process states, which are dichotomized relative to T. For the purposes of this study, process states with duration less than T samples are considered the high-frequency content of the signal, while states of longer duration than T are taken as the low-frequency content. The window length L determines the time scale for the detected patterns, and this parameter can be adjusted to emphasize either the low- or high-frequency portions of the historical data. The low-frequency process states, by definition, persist for at least T samples in duration and therefore contain cyclic variability. Different portions of the low-frequency state realizations will be associated with different phases of the confounding cycle; therefore, the cyclic variability must be statistically

separated from the low-frequency behavior to reveal the true low-frequency states. To prevent periodic biases from contaminating the low-frequency cluster solution, the window length L should be set to an integer multiple of the cycle period T. This ensures that each window of data, and in turn each cluster, will contain equal numbers of observations from each phase of the cycle, effectively "averaging out" any periodic biases which would be introduced into the cluster labels by the confounding cycle. This large window length may also average out the high-frequency patterns because they are short in duration and represent only a limited portion of the variability in any window. Thus, setting L = T simultaneously eliminates periodic biases related to the confounding cycle and prevents the high-frequency states from appearing in the solution: such cluster analysis readily reveals the low-frequency states. Any high-frequency events, including transitions between identified low-frequency states, will remain either undetected

Table 1. Pilot-Plant Monitored Variables

variable                        units
condensate temperature          °C
reactor level                   % full
reactor output temperature      °C
reactor input temperature       °C
steam inlet temperature         °C
coolant inlet temperature       °C
reactor input flow rate         L/min
reactor output flow rate        L/min

Table 2. Times of Occurrence for Known Process States

process state                   samples
normal operation                1-100
feed flow rate increase         101-200
single fault                    201-300
double fault                    301-400
coolant spike faults            92, 100, 292, 300, 392, 400

Table 3. Level of Variability Captured by Each PCA Model for Each Cluster of Observations in the Five-Cluster Low-Frequency Solutiona

data source   prototype 1   prototype 2   prototype 3   prototype 4   prototype 5
cluster 1     92.6          80.0          82.8          86.5          86.4
cluster 2     79.9          92.5          55.5          57.2          55.9
cluster 3     71.7          61.0          96.8          81.3          78.4
cluster 4     94.1          82.6          94.3          97.2          97.1
cluster 5     93.6          81.6          93.4          96.7          96.8

a The largest value in a given row indicates the PCA model that best fits that data (these values are boldfaced); the columns of the table have no meaning.

or poorly resolved in time after performing the above low-frequency cluster analysis. To reveal such high-frequency content, a separate cluster analysis must be performed using a shorter window length. Practically, only values 2 ≤ L ≤ 2⁻¹T need to be considered. (L = 1 does not window the data, while the Nyquist theorem establishes the upper bound for L as 2⁻¹T because L = T has already been explored in the low-frequency analysis.) The optimal value of L will of course depend on the actual time scales for any high-frequency states existing in the data. Systematic variation of L is performed to determine the value producing the cleanest set of cluster membership probabilities: the high-frequency events should be indicated with membership probability 1, and the transition times between the various states should be sharply indicated. In addition to the true high-frequency states, however, the high-frequency cluster analysis will also indicate many of the previously identified low-frequency patterns (especially any steady states) with a periodic bias tracking the cycle phase. The true high-frequency events can be determined upon comparison of the low-frequency solution with the high-frequency solution to ensure that the low-frequency patterns are not counted twice. The periodically biased cluster labels of the high-frequency solution are ignored, and the remainder of the high-frequency solution indicates the true high-frequency events. A precise algorithm for combining the low- and high-frequency results is not given; rather, the intuitive procedure will be illustrated by example in Section 5.2.

3.6. k-PCA Models Parameter Estimation. The above theory develops a method for cluster analysis of data containing events at multiple time scales. Depending on whether a high- or low-frequency analysis is desired, the user is able to specify proper values of the moving window parameters L and R with little systematic variation.
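The role of the window length relative to the cycle period can be illustrated numerically. This toy example assumes an idealized sinusoidal cycle of period T (the real process cycle is convolved with the dynamics and is not purely sinusoidal):

```python
import numpy as np

# A window spanning one full period has zero-mean cyclic content, so
# L = T "averages out" the cycle; a half-period window L = T/2 retains
# a phase-dependent mean that can bias cluster labels.
T = 24
t = np.arange(10 * T)
cycle = np.sin(2 * np.pi * t / T)

full_means = np.array([cycle[s:s + T].mean() for s in range(len(t) - T + 1)])
half_means = np.array([cycle[s:s + T // 2].mean() for s in range(len(t) - T // 2 + 1)])

print(np.abs(full_means).max())   # essentially zero for L = T
print(np.abs(half_means).max())   # order one for L = T/2, varies with phase
```

The full-period window means vanish regardless of where the window starts, which is the mechanism behind the L = T choice in the low-frequency analysis.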
The method also requires specification of two other parameters, q and kmax, which cannot be specified a priori but must be estimated from the data set. The number

of retained PCs, q, must be determined by trial and error to produce a valid statistical model; however, kmax is trivial to estimate using the method presented in Section 3.3 once L, R, and q are specified. A systematic procedure is suggested to minimize the computational burden of estimating a value for q while performing the high- and low-frequency cluster analyses. Because the large window length used in the low-frequency analysis inherently limits the temporal resolution for the cluster transition points, use of a small window spacing R will only increase the computational demand of the low-frequency cluster analysis without increasing the temporal resolution. Therefore, the low-frequency analysis should be performed using low window density (LR⁻¹ should be set to a small integer). This parametrization produces the smallest number of windows possible for a given L, speeding the execution of the clustering algorithm, and allowing the user to determine an appropriate value for q as rapidly as possible. A trial value for q, denoted q′, is first assumed. One method for choosing an initial q′ is forming a global PCA model for the entire data set. For a given q′, the amount of variability captured by this global PCA model can be taken as a lower bound for the variability captured by the eventual cluster prototype PCA models, as they will necessarily be estimated from smaller, more homogeneous sets of observations than the global model. The low-frequency analysis is then performed using L = T, small integer window density LR⁻¹, and q′ PCs in the model. An ensemble of k-PCA models runs is generated, requiring estimation of the value of kmax, which may vary with q, using the procedure of Section 3.3. Then, a converged low-frequency dendrogram is obtained, a meaningful set of clusters are selected, and the cluster prototypes are scrutinized to ensure the PCA models capture a sufficient level of variability within their own clusters (here taken as approximately 95%).
If the prototype PCA models capture the desired level of variability, the parameter q is taken as q′, and the low-frequency cluster analysis is complete. If insufficient variability is captured, the value of q′ is incremented and the procedure is repeated until an acceptable low-frequency solution is generated. Once a value for q is established, the low-frequency cluster solution should be checked to ensure that low-frequency events at the desired time scale are captured. The window length can be increased to larger multiples of the cycle period T (e.g., L = 2T) if events of larger time scale are suspected to exist in the historical data but are not revealed by parametrizing the moving window with L = T. Because the same set of observations is modeled in both the low- and high-frequency cluster analyses, the value of q should remain unchanged for the high-frequency analysis. Thus, unnecessary runs of the more computationally expensive high-frequency portion of the analysis need not be performed to estimate q. Using maximum window density (setting R = 1) and taking q as determined from the low-frequency analysis, the lone parameter 2 ≤ L ≤ 2⁻¹T can be varied systematically to determine a window length that best captures the high-frequency content.

4. Case Study

The proposed clustering strategy is applied to historical data obtained from a continuous chemical process.19 This pilot plant, whose flowsheet is shown in Figure 2, contains two continuously stirred tank reactors (CSTRs), a mixer, and a heat exchanger. Material from the feed tank is heated before entering the first reactor. The effluent from the first reactor is mixed with


Figure 4. ∆SSE for k-means clustering indicates aggregated solution has converged for k′max = 10.

additional feed in the mixer, then fed to a second reactor for further processing. This study considers only the operation of the first CSTR, for which nine variables are monitored, including various temperatures, flow rates, and the liquid level (see Table 1). The variables are sampled at a 5-min rate for a total of 400 observations. Time series for the historical data are shown in Figure 3. As can be seen from Figure 3, several of the variables exhibit cyclic activity. This confounding cycle is caused by a periodic disturbance in the steam supply utility. Periodic fluctuations in the steam supply pressure produce the cycle observed in the steam inlet temperature, ultimately affecting several variables including the reactor output temperature. The historical data represent four distinct regimes of operation, arranged sequentially in blocks of 100 samples. The process operates first at the normal steady-state operating point; then, at sample 101, a step change in the feed flow rate is performed, producing a dynamic response in several other variables. At sample 201 the feed flow rate is reset and a single fault in the feed temperature is introduced. The final operating regime contains two simultaneous faults: the increase in feed temperature from the previous regime persists, and a sharp increase in the condensate temperature occurs at sample 301. Additional shifts in both of these faulty variables occur at sample 351 and remain through the end of the experiment. There are also several very short disturbances in the data, lasting only one sample and appearing most notably as downward spikes in the coolant outlet temperature. A summary of the known process states is given in Table 2.

5. Results

5.1. k-Means Clustering Results. The standard k-means clustering algorithm, used in conjunction with the hierarchical aggregation scheme of Section 3.3, is applied to the experimental data set.
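The ensemble-generation step — many randomly initialized k-means runs per k, later aggregated hierarchically — can be sketched as follows. This is a hypothetical illustration using Lloyd's algorithm on toy two-regime data; the aggregation itself is omitted:

```python
import numpy as np

def kmeans(X, k, rng, iters=100):
    """Plain k-means (Lloyd's algorithm) from random initial centers."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each observation to the nearest center
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# ensemble of randomly initialized runs for each k, ready for aggregation
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(3.0, 0.1, (50, 2))])
ensemble = [kmeans(X, k, rng) for k in range(2, 5) for _ in range(10)]
```

Each entry of `ensemble` is one labeling of the observations; the aggregation scheme of Section 3.3 would merge these labelings into a single dendrogram.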
For each k ranging from 2 to 10, 50 randomly initialized k-means solutions are generated. The quantity ∆SSE is calculated using trial values of k′max ranging from 2 to 10. Figure 4 indicates that the solution has converged for kmax = 10. The aggregated dendrogram describing the 50 runs for each k from 2 to 10 (450 randomly initialized runs in total) is shown in Figure 5. Five clusters are selected from the dendrogram based on the large merging distances between them. As no windowing is used for this analysis, each observation time is assigned to exactly one cluster with membership probability 1. The time series of cluster membership is shown in Figure 6. Cluster 1 captures the double fault occurring on t = [351, 400] due to

the large mean shifts in the condensate and input temperatures. However, cluster 1 is also indicated briefly for t = [329, 330]; these labels appear inconsistent with the known process behavior. Clusters 4 and 5 indicate the large mean shifts in several variables associated with the step change in the feed flow rate, but fail to isolate the event as a single dynamic transition because the process mean is evolving with time. As this example shows, the k-means algorithm tends to oversegment dynamic trends in the time series. Clusters 2 and 3 exhibit a periodic bias and are aligned with the cycle phase for the remaining samples, for which no large mean shifts occur: cluster 3 represents the crests, while cluster 2 contains the troughs. The k-means algorithm fails to distinguish between normal operation and the feed temperature fault in the face of the oscillations present in these steady-state signals. It would be possible to select a larger number of clusters from the dendrogram of Figure 5. Viewing the solution at higher resolution, however, would only exacerbate the oversegmentation of the dynamic regimes and the periodic biasing of the steady states. Nor does a windowing approach improve the k-means solution. The periodicity present in the data is difficult to remove by simple time averaging, and the periodic bias remains in any k-means solution for cyclic data. Employing a windowing approach actually degrades the performance of the k-means algorithm: the periodic biases remain, but the windowing reduces the temporal resolution at which the cluster transitions can be detected.

5.2. k-PCA Models Clustering Results.

5.2.1. Low-Frequency Cluster Analysis. Prior to the windowing and clustering phases of the analysis, the entire time series of 400 observations is autoscaled. No further rescaling of the data is performed at any point in the analysis.
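Autoscaling here means z-scoring each variable against its own mean and standard deviation; a brief sketch, assuming a numpy observation-by-variable matrix:

```python
import numpy as np

def autoscale(X):
    """Subtract each variable's mean and divide by its standard deviation,
    so all variables enter the PCA models on a comparable scale."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

rng = np.random.default_rng(0)
raw = rng.normal(loc=[5.0, -2.0], scale=[3.0, 0.5], size=(400, 2))
Z = autoscale(raw)   # each column now has zero mean and unit variance
```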
A global PCA model of the entire autoscaled time series is used as a first estimate of the number of PCs required by the k-PCA models algorithm. Four PCs capture approximately 90% of the variability in this global PCA model of the entire set of 400 observations. Using q′ = 4, the cluster prototype models should be expected to capture even higher levels of variability. The low-frequency analysis is then performed by parametrizing the moving window with L = T = 21 (identified by visual inspection of the time series). Choosing R = 7 provides the low window density LR⁻¹ = 3. Because 400 modulo 7 equals 1, a single edge observation must be excluded from the cluster analysis. The first sample is eliminated, and the windowing begins at sample 2 such that the last window ends on sample 400, generating N = 55 overlapping windows. Assuming q′ = 4 PCs, a randomly initialized ensemble of k-PCA models solutions is generated containing 50 runs for each k from 2 to 12. Figure 7 indicates that kmax = 12 is appropriate. The aggregation procedure of Section 3.3 is applied to the ensemble of solutions to produce the aggregated dendrogram shown in Figure 8. As indicated by the numbering and dashed vertical lines in Figure 8, five main clusters are selected from this low-frequency dendrogram. Collecting the observations for the windows assigned to each cluster, a PCA model is estimated for each cluster. The fraction of variability captured by each PCA model for each cluster of observations is given in Table 3. Clearly, each PCA model fits the data from its own cluster best. Moreover, each model captures on average 95% of the variability among its own member observations, and q is taken as 4. The membership probabilities for the five-cluster low-frequency solution are calculated and shown in Figure 9.
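The moving-window bookkeeping described above (L = 21, R = 7, first sample dropped so the windows span samples 2-400) can be reproduced with a short sketch; the helper `windows` is hypothetical:

```python
import numpy as np

def windows(n, L, R, start=0):
    """Index arrays for overlapping moving windows of length L spaced R apart."""
    return [np.arange(s, s + L) for s in range(start, n - L + 1, R)]

# case-study parametrization: L = 21, R = 7, first sample (index 0) dropped
W = windows(400, L=21, R=7, start=1)   # 55 windows; last ends on sample 400
```

With 0-based indices, the first window starts at index 1 (sample 2) and the last ends at index 399 (sample 400), giving N = 55 windows as in the text.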


Figure 5. Dendrogram depicting hierarchical aggregation of k-means solutions for k ranging from 2 to 10.

Figure 6. Time series for cluster membership for aggregated k-means solution. Vertical position of each marker indicates assigned cluster label.

Figure 7. ∆SSE for low-frequency k-PCA models clustering with L = 21, R = 7, q = 4.

Clusters 4, 3, 5, and 1 largely indicate the four sequential operating regimes, respectively. The periods [86, 106] and [289, 302] are assigned to cluster 2 with membership probability 1. It is unclear at this point whether state 2 captures the spike faults or transitions between the operating regimes. Due to the large window length used in the low-frequency analysis, there is poor temporal resolution for these high-frequency events. The large merging distances between the five selected branches of the dendrogram in Figure 8 indicate well-defined clusters; however, the solution can be viewed at higher resolution by bisecting each original cluster into two or three subclusters, as labeled in the dendrogram with a lettering scheme (Figure 8). The PCA prototypes based on this 12-cluster solution necessarily better capture the variability in the data set because these new subclusters are smaller and more compact than their parent clusters: the 12 PCA models capture 94-99% of their own cluster’s variability. The sequence of cluster labels for the 55 windows in this 12-cluster solution is shown in Figure 10. Membership probabilities for the 400 individual observation

times are not shown because the cluster transition points are ambiguous due to the large degree of temporal overlap between these small clusters. Still, it is evident that the cluster labels represent the dynamic evolution of the system through a sequence of states. Viewed at this higher level of resolution (i.e., more, smaller clusters), the four identified operating regimes are segmented into a sequence of substates, while clusters 2A and 2B capture the individual high-frequency events. The fact that no state is ever revisited after it is left indicates that the system is constantly evolving; the "steady states" in fact have very slow dynamics. Most importantly, the low-frequency cluster solution does not contain any periodic biases.

5.2.2. High-Frequency Cluster Analysis. The low-frequency solution identifies the four low-frequency process regimes and senses the presence of two high-frequency events as well; however, the transition points are poorly resolved in time. To better estimate the cluster transition times and any additional high-frequency events, the k-PCA models cluster analysis is repeated using 2 ≤ L ≤ 2⁻¹T. The number of PCs can be assumed unchanged, and the analysis is this time performed using the highest window density (R = 1) to provide the maximum possible temporal resolution. Window lengths L of 10, 4, and 2 are tested: for each case, an aggregated dendrogram is generated after checking ∆SSE for convergence, a small number of clusters are selected, and the PCA models are checked to ensure they properly represent the modeled data. Each of these high-frequency solutions appears valid; however, the L = 4 solution produces the cleanest set of membership probabilities: it identifies the high-frequency event realizations with probability 1 while minimizing the ambiguity in the cluster transition points. The high-frequency dendrogram for the 397 windows generated using L = 4 and R = 1 is shown in Figure 11.
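With overlapping windows (R = 1), each observation is covered by several windows, and its membership probability in a cluster appears to be the fraction of covering windows assigned to that cluster. A sketch under that reading, with hypothetical names and toy window labels:

```python
import numpy as np

def membership_probs(win_labels, win_index, n, kmax):
    """Membership probability of each observation in each cluster: the
    fraction of the windows covering that observation assigned there."""
    counts = np.zeros((n, kmax))
    for lab, idx in zip(win_labels, win_index):
        counts[idx, lab - 1] += 1.0        # labels are 1-based
    covered = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, covered, out=np.zeros_like(counts),
                     where=covered > 0)

# toy example: 6 observations, 4 windows of length 3 at unit spacing (R = 1)
wins = [np.arange(s, s + 3) for s in range(4)]
labels = [1, 1, 2, 2]                      # cluster label per window
P = membership_probs(labels, wins, n=6, kmax=2)
```

Observations covered only by windows of one cluster get probability 1; observations under windows straddling a transition (here the middle samples) receive fractional probabilities, which is why transition points blur as window overlap grows.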
A five-cluster solution is selected from the dendrogram of Figure 11, and the membership probabilities for the observations are shown in Figure 12. Clusters 2, 5, 3, and 1 capture the


Figure 8. Low-frequency dendrogram for k-PCA models clustering with L = 21, R = 7, q = 4. Five main clusters are indicated by vertical dashed lines. These clusters can also be dissected into a 12-cluster solution, indicated by the lettering.

Figure 9. Membership probabilities for five-cluster low-frequency k-PCA models solution with L = 21, R = 7, q = 4.

Figure 10. Sequence of cluster labels for 55 windows from 12-cluster low-frequency k-PCA models solution using L = 21, R = 7, q = 4. Labels next to the asterisks refer to the dendrogram of Figure 8.

dominant process operating regimes and are consistent with the previous low-frequency solution (using L = 21). Cluster 4 captures the first two sets of spike faults exactly: membership probability is 1 for t = 92, 100, 292, and 300 only. Using the smaller window length, it is clear that each fault is actually a pair of spikes in the data, with the intermediate data points [93, 99] and [293, 299] being partially assigned to the steady-state operating regime during which the spike faults occur. The spikes at samples 392 and 400 are not well detected because of the large mean shifts in other variables during this time. Again, to determine the consistency of the solution, the five-cluster solution is dissected into a 12-cluster solution as indicated by the lettering of Figure 11. The sequence of cluster labels for the windows of data is shown in Figure 13. Clearly, the previously identified low-frequency patterns are now labeled

with periodic biases. Clusters 2A and 2B of the high-frequency solution (Figure 13) correspond to cluster 4 of the low-frequency solution (Figure 9), 5B and 5C to 3, 3A and 3B to 5, and 1C and 1D to 1. These eight periodically biased clusters in the high-frequency solution can be associated with the previously identified low-frequency states and thus are artifacts of the clustering procedure, not real process states. Once the times of occurrence of the low-frequency events are identified, the high-frequency process states can be readily identified as the remainder of the high-frequency cluster solution: clusters 4, 5A, 5D, 1B, and 1A of Figure 13. Clusters 5A and 5D occur temporally on the edges of a dynamic state captured by 5B and 5C, and are also the closest clusters to 5B and 5C on the dendrogram of Figure 11. Clusters 5A and 5D may be brief transition states into and out of the process regime


Figure 11. High-frequency dendrogram for k-PCA models clustering with L = 4, R = 1, q = 4. Five main clusters are indicated by vertical dashed lines. These clusters can also be dissected into a 12-cluster solution, indicated by the lettering.

Figure 12. Membership probabilities for five-cluster high-frequency k-PCA models solution with L = 4, R = 1, q = 4.

Figure 13. Sequence of cluster labels for 397 windows from 13-cluster high-frequency k-PCA models solution using L = 4, R = 1, q = 4. Labels refer to the dendrogram of Figure 11.

realized on [101, 200]; however, they are likely mathematical artifacts resulting from windows spanning the true transition points and containing observations from both operating regimes. A similar argument can be made for cluster 1B; however, 1A may indicate a fault near the end of the historical data. Cluster 1A is not easily interpreted, as it indicates two simultaneous low-frequency disturbances, contains spike faults, and occurs near the edge of the time series where the membership probabilities are not well defined. Cluster 4 perfectly isolates the spike faults at 92, 100, 292, and 300. Thus, the cluster membership probabilities of Figure 12 essentially constitute the final solution, although it must be emphasized that the low-frequency patterns indicated by this high-frequency solution cannot be considered true process states without corroboration by the low-frequency solution.

6. Cluster Interpretation

On the basis of knowledge of the process history, cluster 2 of Figure 12 is identified as the desired operating regime. Supervised comparisons of other periods of operation with this normal regime can be made to elucidate the physical nature of the process states. Observations on t = [5, 70] are used as training data to represent the period of normal operation. A PCA model with 4 PCs is calculated for these 66 observations to serve as a representation of the desired plant operation. The full set of 400 historical observations is projected into the model of desired plant operation, and variable contribution analysis (Section 3.1) is used to determine the nature of the differences between the process regimes. Time series for the variable contributions to the Q-statistic are shown in Figure 14. The contributions for all variables are small and nearly constant


Figure 14. Variable contributions to Q-statistic upon projecting data set into PCA model estimated on t = [5, 70] (normal operating regime).

for the training data. Aside from the spike faults at t = 92 and 100, the unmodeled data from the normal operation period ([1, 4; 71, 91; 93, 99]) exhibit small variable contributions, validating that the PCA model accurately represents normal plant operation. Larger contributions are noted for the other regimes of operation, indicating variables that behave differently than during the normal operation period. Most notably, the spike faults are detected with excellent accuracy. Contributing variables include not only the coolant outlet temperature (which is obvious in the raw data time series of Figure 3) but also the reactor output temperature, steam inlet temperature, and coolant inlet temperature. This combination of affected process variables indicates that a true fault has occurred in the process, affecting multiple variables, and not merely a failed or noisy sensor for the coolant outlet temperature. As both the coolant outlet and inlet temperatures are affected, the fault likely lies in the utility cooling system and not the process itself. Ignoring the spike faults, the three other, faulty operating regimes are easily characterized. A large jump in the variable contribution for the reactor input flow rate at t = 101 indicates a sudden change in this variable. Gradually increasing contributions for the reactor output flow rate, reactor input temperature, reactor level, and condensate temperature indicate a dynamic response in these variables due to the changing reactor input flow rate. The raw data show a significant increase in the reactor level and output flow rate, indicating that the increased feed rate has upset the process. The effect of this disturbance suddenly disappears at t = 201, when the reactor feed flow rate is reset to its nominal value. For t = [201, 300], the only variable with somewhat elevated Q contributions is the reactor input temperature. Examination of the raw data quickly reveals the step change in this variable.
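The variable-contribution computation — projecting observations into the normal-operation PCA model and attributing the squared residual (Q) to each variable — can be sketched as follows. The function name, the toy rank-4 "normal" data, and the injected mean shift are assumptions for illustration only:

```python
import numpy as np

def q_contributions(X_train, X, q):
    """Per-variable contributions to the Q-statistic: squared residuals of
    observations X projected into a q-PC model fit on X_train (autoscaled
    against the training statistics)."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0, ddof=1)
    Zt = (X_train - mu) / sd
    P = np.linalg.svd(Zt, full_matrices=False)[2][:q].T   # q loading vectors
    Z = (X - mu) / sd
    E = Z - Z @ P @ P.T              # residual not explained by the model
    return E ** 2                    # each variable's share of Q

rng = np.random.default_rng(0)
train = rng.normal(size=(66, 4)) @ rng.normal(size=(4, 9))  # rank-4 "normal" data
faulty = train.copy()
faulty[:, 3] += 5.0                  # mean shift breaking the correlation
C = q_contributions(train, faulty, q=4)
```

A shift that breaks the training-set correlation structure produces large contributions concentrated on the affected variable, which is the diagnostic signature used in the text.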
As the disturbance in this input variable does not affect any other monitored variables, this regime may indicate a failed sensor rather than a true process state. Note that the low-frequency states occurring on [1, 100] and [201, 300] appear as immediately connected branches in the hierarchies of both the low- and high-frequency dendrograms (clusters 4 and 5 of Figure 8

and clusters 2 and 3 of Figure 11, respectively), reflecting the similarity between this faulty regime and normal operation. At t = 301, variable contributions for the condensate temperature suddenly increase. Contributions from the input temperature remain above their nominal value, indicating that the disturbance introduced in this variable at t = 201 still persists. The magnitude of the input temperature contributions is reduced relative to the previous operating regime, however, likely reflecting the larger disturbances in other variables. At t = 351, contributions for the condensate temperature again sharply increase, appearing to cause a lagging response in the reactor level, input temperature, input flow rate, and output flow rate. Drifts in these variables can be detected in the time series as well. As multiple process variables are affected, it is probable that this cluster indicates a true fault in the process.

7. Discussion

The k-means analysis demonstrates the inability of traditional, distance-based clustering algorithms to determine temporal patterns from autocorrelated and cyclic data. Clustering algorithms based on magnitude will never be able to isolate regimes characterized by a moving mean: such dynamic states will be identified as a progression of states. For cyclic data, the k-means algorithm cannot properly label the steady states either; these states are not characterized by truly constant variables, and the k-means cluster labels become associated with the phase of the confounding cycle (as evidenced by Figure 6). A windowing scheme is not practical for k-means clustering, as the prototype mean models for each steady state would be time averages of the cyclic variables and would be estimated with large confidence bounds.
Unless the distances between the means of different operating points are considerably greater than the amplitude of the confounding cycle, the k-means algorithm will fail to distinguish them. The utility of the proposed k-PCA models algorithm, when combined with the hierarchical aggregation scheme, is demonstrated in the case study. PCA is commonly used for modeling


chemical process data, and the superiority of a clustering algorithm based on PCA models over traditional algorithms using Euclidean distances is readily apparent. Individual cluster analyses are implemented to detect patterns at different time scales by properly defining the moving window, and these solutions are combined in a logical fashion to provide a synopsis of all states visited by the process. Only one window length was tested for the low-frequency analysis: the smallest possible L, equal to one confounding cycle period, or 21 samples. For data exhibiting low-frequency events persisting for many confounding cycle periods, it may be necessary to increase the averaging length for the prototype models by taking L as a larger integer multiple of T. Here, however, using L = T was found to indicate patterns consistent with low-frequency behavior, and thus longer window lengths were not required to isolate the low-frequency content. Because the detected events are of duration significantly longer than the window length, it can be concluded that extending the window length to twice the confounding cycle period would identify no new events but would only degrade the temporal resolution at which the cluster transition points are detected. In fact, to obtain a different set of low-frequency cluster labels, the window length would have to be extended to over 100 samples (the duration of each operating regime) such that the detected events would contain multiple operating regimes. Clearly, such behavior is not desired, as the low-frequency events identified above have physical significance. For the low-frequency analysis, the window length L is 21, and the window spacing R is set to 7 to yield the small integer window density LR⁻¹ = 3.
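Requiring an integer window density L/R amounts to choosing R from the divisors of L; a tiny hypothetical helper makes the constraint explicit:

```python
def window_spacings(L):
    """Window spacings R for which the density L/R is an integer, giving
    evenly spaced windows that cover every observation equally often."""
    return [R for R in range(1, L + 1) if L % R == 0]

divisors_21 = window_spacings(21)   # L = 21 admits R = 7 (density 3)
divisors_23 = window_spacings(23)   # a prime cycle length admits only 1 and L
```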
Setting LR⁻¹ to an integer value ensures evenly spaced windows, so that each observation is equally represented in the set of windows and the cluster membership probabilities of eq 10 are properly defined. Not in all cases will the confounding cycle, and therefore the window length, be identified such that an integer divisor exists. For example, if the confounding cycle period were identified as 23 samples, no value of R would exist for which an integer window density could be produced. In this case, modifying the window length by a few sample times (say, L = 24 even though the confounding cycle period is 23, allowing LR⁻¹ of 2, 3, or 4) should not introduce large periodic biases into the cluster labels. Another strategy, if possible, is to resample the raw measurements at a higher rate such that the confounding cycle period has an integer divisor. A final remedy would be to take L = 2T, guaranteeing LR⁻¹ = 2 by using R = T; a disadvantage of this method, however, is that only low-frequency events lasting at least two cycles in duration would be detected in the low-frequency analysis. The high-frequency cluster analysis was tested for various window lengths, and L = 4 best revealed the high-frequency patterns and cluster transition points. The five-cluster high-frequency solution (Figure 12) is noted to produce the appropriate result, as it compares favorably to the five-cluster low-frequency solution (Figure 9). Without having first produced the five-cluster low-frequency solution, however, it would not be possible to accept the high-frequency cluster membership probabilities of Figure 12, because upon further decomposition these patterns are realized with a cyclic bias (Figure 13). Thus, the transition points for the low-frequency patterns identified from the high-frequency analysis cannot be accepted without the corroboration of the low-frequency solution.
Once the correspondence between the low- and high-frequency clusters is established, the high-frequency cluster membership probabilities can be taken as the best available estimate of the cluster transition points.

8. Conclusions

This paper presents a clustering algorithm based on PCA for use with autocorrelated and cyclic data, along with a novel scheme that allows identification of patterns at different time scales. The results of the low-frequency and high-frequency analyses can be combined in a logical and systematic fashion to provide a synopsis of all historically realized process states. This paper does not intend to provide a quantitative methodology for combining the results of the high- and low-frequency cluster analyses; rather, a general time series clustering framework is provided which can guide the user through a series of decisions necessary to arrive at a meaningful cluster solution. The utility of the proposed clustering scheme is demonstrated on a relatively simple data set containing five process regimes occurring at two different time scales and also having a confounding cycle. The method is easily extendable to other cyclic time series data sets and can incorporate any multivariate model (aside from PCA) for which a meaningful scalar measure of model error can be obtained. It is also possible that the method could be extended to noncyclic time series; however, a method to determine which window lengths to explore would be required. The proposed theory would also benefit from an automated method to assimilate the cluster solutions obtained on different frequency bands; nevertheless, the intuitive, graphical method presented herein demonstrates the feasibility of the proposed clustering scheme for mining large chemical process data sets.

Literature Cited

(1) Papazoglou, I. A.; Nivolianitou, Z.; Aneziris, O.; Christou, M. Probabilistic Safety Analysis in Chemical Installations. J. Loss Prev. Process Ind. 1992, 5 (3), 181-191.
(2) Kourti, T.; MacGregor, J. F. Statistical Process Control of Multivariate Processes. Control Eng. Pract. 1995, 3 (3), 403-414.
(3) Everitt, B. S. Cluster Analysis; Heinemann Education: London, 1993.
(4) MacQueen, J. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, 1967.
(5) Jackson, J. E. A User's Guide to Principal Components; Wiley: New York, 1991.
(6) Kaspar, M. H.; Ray, W. H. Chemometric Methods for Process Monitoring and High-Performance Controller Design. AIChE J. 1992, 38, 1593-1608.
(7) Ku, W.; Storer, R. H.; Georgakis, C. Disturbance Detection and Isolation by Dynamic Principal Component Analysis. Chemom. Intell. Lab. Syst. 1995, 30, 179-196.
(8) Krzanowski, W. J. Between-Groups Comparison of Principal Components. J. Am. Stat. Assoc. 1979, 74, 703-707.
(9) Johannesmeyer, M. C.; Singhal, A.; Seborg, D. E. Pattern Matching in Historical Data. AIChE J. 2002, 48, 2022-2038.
(10) Srinivasan, R.; Wang, C.; Ho, W. K.; Lim, K. W. Dynamic Principal Component Analysis Based Methodology for Clustering Process States in Agile Chemical Plants. Ind. Eng. Chem. Res. 2004, 43, 2123-2139.
(11) Beaver, S.; Palazoglu, A. A Cluster Aggregation Scheme for Ozone Episode Selection in the San Francisco, CA Bay Area. Atmos. Environ. 2006, 40, 713-725.
(12) Tipping, M. E.; Bishop, C. M. Probabilistic Principal Component Analysis. J. R. Stat. Soc. B 1999, 61, 611-622.
(13) Dempster, A. P.; Laird, N. M.; Rubin, D. B. Maximum Likelihood from Incomplete Data via the EM Algorithm. J. R. Stat. Soc. B 1977, 39, 1-38.
(14) Bakshi, B. R. Multiscale PCA with Application to Multivariate Statistical Process Monitoring. AIChE J. 1998, 44, 1596-1610.
(15) Daubechies, I. Ten Lectures on Wavelets; Society for Industrial and Applied Mathematics: Philadelphia, 1992.


(16) Hotelling, H. Analysis of a Complex of Statistical Variables into Principal Components. J. Educ. Psych. 1933, 24, 417-441.
(17) Ward, J. H. Hierarchical Grouping to Optimise an Objective Function. J. Am. Stat. Assoc. 1963, 58, 236-244.
(18) Kalkstein, L. S.; Tan, G.; Skindlov, J. A. An Evaluation of Three Clustering Procedures for Use in Synoptic Climatological Classification. J. Clim. Appl. Meteorol. 1987, 26, 717-730.
(19) Joe, Y. Y.; Wang, D.; Romagnoli, J. A.; Tay, A. Robust and Efficient Joint Data Reconciliation-Parameter Estimation Using a Generalized Objective Function. In Proceedings of the 7th International Symposium on Dynamics and Control of Process Systems, Cambridge, MA, 2004.

Received for review April 28, 2006
Revised manuscript received March 5, 2007
Accepted March 23, 2007

IE060544V