Ind. Eng. Chem. Res. 2003, 42, 6145-6154
Multiple-Fault Diagnosis Based on System Decomposition and Dynamic PLS Gibaek Lee* Department of Industrial and Engineering Chemistry, ChungJu National University, Chungju, Chungbuk 380-702, Korea
Sang-Oak Song and En Sup Yoon School of Chemical Engineering, Seoul National University, Seoul 151-742, Korea
This study suggests a hybrid method of signed digraph and partial least squares for the fault diagnosis of chemical processes. Using the local qualitative relationships of each variable in a signed digraph, a process is decomposed into subprocesses to allow for the reliable diagnosis of multiple faults. A partial least-squares model is built for the estimation of each measured variable in each decomposed subprocess. To handle the process dynamics accurately, the input of the partial least-squares model uses both the current values and the past values of the process variables. In each case, the measured and estimated values are compared to diagnose the fault. A case study of a continuously stirred tank reactor illustrates the conclusion that the proposed method greatly improves the accuracy, resolution, and reliability of the diagnosis.
Introduction
Because of the many serious accidents that have occurred in the history of the chemical process industry, safety has become a top priority for companies associated with this industry. Such accidents result in an economic loss of over $16 billion/year for the petrochemical industry in the U.S.1 The impact of such accidents on the economy includes costs of several billion dollars due to personal injuries, loss of production, and reduction in market share for the companies involved. Therefore, the safe and reliable operation of their chemical processes has become one of the primary concerns of chemical companies attempting to survive in the highly competitive international market.
For the above reasons, automatic fault diagnosis systems are very much in demand, to help operators with decision making and to keep these operations running continuously, while simultaneously being both efficient and safe. Such systems are used to analyze process data on-line, monitor process trends, and diagnose faults when abnormal situations arise.
Considering the characteristics of chemical processes, the requirements of any fault diagnosis methodology include such factors as speed, accuracy, resolution, robustness, portability, and reliability.2 Among these factors, reliability means that diagnosis should be performed for all predictable faults, including both novel faults and multiple faults. In most fault diagnosis studies, multiple faults have not been considered, because they rarely occur. However, disturbances such as changes in demand and product specifications often occur during the operation of a chemical process. Such disturbances cause the process state to change, and subsequent faults might occur. Consequently, a reliable fault diagnostic method should be able to handle multiple faults. A multiple fault is defined as two or more faults that happen simultaneously or sequentially, and multiple faults can be classified into four categories: induced faults, independent multiple faults, dependent multiple faults, and masked multiple faults.3 Among these categories, a masked multiple fault is a series of faults for which all symptoms can be explained on the basis of the others; such faults cannot be recognized using the qualitative method only.
* To whom correspondence should be addressed. Tel.: +82-43-841-5230. Fax: +82-43-841-5220. E-mail: glee@chungju.ac.kr.
Figure 1. Classification of diagnostic methods.
Diagnostic methods for chemical processes are broadly classified as those that use a process model and those that rely on process history data, and these methods can be subclassified as qualitative or quantitative (Figure 1).4 Using the signed digraph (SDG) approach, which is classified as a qualitative model-based method, this study suggests a system decomposition method that can be used to facilitate the diagnosis of masked multiple faults. Moreover, the partial least-squares approach (or projection to latent structures, PLS5), classified as a quantitative process-history-data-based method, is applied to the decomposed subsystem. Although the application of PLS to highly nonlinear
10.1021/ie030084v CCC: $25.00 © 2003 American Chemical Society Published on Web 10/25/2003
6146 Ind. Eng. Chem. Res., Vol. 42, No. 24, 2003
Figure 2. Process flow diagram of the CSTR process.6
problems such as high-purity, nonideal distillation columns can sometimes be inadequate since the method is linear, the suggested methodology has the advantages of improving diagnosis accuracy and resolution and allowing the reliable diagnosis of multiple faults. The proposed method will be illustrated using a case study of a heat exchanger and a CSTR wherein the irreversible first-order reaction A → B takes place (Figure 2). In this study, the diagnosis results will be compared with the results obtained using the qualitative method suggested in our previous study.3
Previous Studies
SDG. Signed digraphs or SDGs offer a simple and graphical representation of the cause and effect relationships between process variables and have been widely used for fault diagnosis.7 An SDG consists of nodes that represent process variables and arcs that show the causal relationship between nodes. The direction of the causal relationship is indicated by the sign of the arc. A positive (+) sign is used when the source node and target node move in the same direction, and a negative (-) sign is used when they move in opposite directions. The SDG of the CSTR process is shown in Figure 3. An SDG can be built on the basis of basic principles, experience, and process data, and it has good portability, so that the SDG of a target process can be automatically constructed from the SDG of each piece of equipment used in the process. In addition, with this method, it is easy to analyze and understand fault propagation paths. However, SDG-based methods are as difficult to use for the diagnosis of multiple faults as are other qualitative methods.3
PCA, DPCA, and PLS.
Figure 3. Signed digraph for the CSTR process.6
Principal component analysis (PCA), a preferred method for dimensionality reduction, can be used for process modeling and information extraction.5,8,9 PCA produces linear combinations of the original variables to generate a new set of transformed variables called principal components (PCs) that describe the major variations in a data set. PCA decomposes the data matrix X into the product of the score matrix T and the transposed loading matrix P
$$X = \hat{X} + E = \sum_{i=1}^{k} t_i p_i^T + E = TP^T + E \tag{1}$$
In the above equation, the residual matrix E is the difference between X and $\hat{X}$, and k is the number of PCs. The $p_i$ vectors refer to the relations between the original variables and the PCs, as the coefficients of the linear combinations. The $t_i$ vectors are defined as the transformed values of the PCs evaluated for the original data ($t_i = Xp_i$). Combining PCA with the ARMAX (autoregressive moving average with exogenous inputs) time series model led to the development of dynamic PCA (DPCA),
which incorporates both static and dynamic process characteristics and allows a much more accurate model of autocorrelated data to be obtained.8 The data matrix, $X_d$, of DPCA includes both the current observation data matrix X and the previous l observations, so that those current values that depend on past values in a dynamic system can be identified
$$X_d = [X(t) \;\; X(t-1) \;\; \cdots \;\; X(t-l)] \tag{2}$$
By maximizing the covariance between the input data matrix X and the output matrix Y, PLS can predict the values of Y from the values of X.5,9-13 The PLS model consists of outer relations (X and Y blocks individually) and an inner relation (linking the two blocks). The scaled and mean-centered X and Y matrices are decomposed into score matrices, T and U, and loading matrices, P and Q, respectively. These decompositions have the same form as the one used in PCA. The inner relationship between the X and Y blocks is represented as a linear algebraic relation (U = TB) between their scores
$$X = TP^T + E \tag{3}$$
$$Y = UQ^T + F \tag{4}$$
where E and F are the residual matrices for X and Y, respectively. In PLS, the loading and score vectors are determined in such a way as to maximize the prediction accuracy of Y while describing a large amount of the variation in X. The most common method used to calculate the PLS model parameters is NIPALS (nonlinear iterative partial least squares).5 The prediction of Y from X is done according to the following regression model.12
$$\hat{Y} = XB_{PLS} = XW(P^TW)^{-1}BQ^T \tag{5}$$
In the above equation, $B_{PLS}$ is the coefficient matrix of the PLS regression model, and the weight W is defined in the NIPALS algorithm by
$$T = XW \tag{6}$$
Figure 4. Partial SDG focusing on TR.
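The PLS relations of eqs 3-6 can be made concrete with a small sketch. The code below is not the implementation used in this study (the models here were built with a statistical package); it is a minimal single-output (PLS1) variant of NIPALS written with plain Python lists, and the function names fit_pls1 and predict_pls1 are hypothetical. Prediction is done by deflating a new sample component by component, which gives the same result as the closed-form coefficient of eq 5.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_pls1(X, y, n_components):
    """Fit a single-output PLS model by NIPALS-style deflation (illustrative)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # X residual
    f = [v - y_mean for v in y]                                # y residual
    W, P, b = [], [], []
    for _ in range(n_components):
        # weight vector: direction in X with maximal covariance with y
        w = [_dot([E[i][j] for i in range(n)], f) for j in range(m)]
        norm = math.sqrt(_dot(w, w))
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]                        # score vector
        tt = _dot(t, t)
        p = [_dot([E[i][j] for i in range(n)], t) / tt for j in range(m)]  # loading
        coef = _dot(f, t) / tt                                 # inner relation
        # deflate both blocks before extracting the next component
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - coef * t[i] for i in range(n)]
        W.append(w)
        P.append(p)
        b.append(coef)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "b": b}

def predict_pls1(model, x):
    """Estimate y for one sample x by repeating the deflation steps."""
    e = [v - mu for v, mu in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, coef in zip(model["W"], model["P"], model["b"]):
        t = _dot(e, w)
        y_hat += coef * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat

# Toy data (made up): y = 2*x1 - x2 + 3 without noise
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [2 * a - c + 3 for a, c in X]
model = fit_pls1(X, y, n_components=2)
print(predict_pls1(model, [6, 2]))
```

With as many components as there are independent input variables, the sketch reproduces an ordinary least-squares fit; using fewer components is what gives PLS its robustness to collinear inputs.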
Diagnosis Strategy
System Decomposition. Two methods of system decomposition for fault diagnosis have been developed. One is a two-step diagnostic strategy that narrows the diagnostic focus to a particular decomposed subsystem and performs diagnosis on this subsystem.14-16 In the other method, a diagnosis knowledge base is developed separately for each subsystem, and diagnostic results are collected from all of the subsystems before the final conclusion is drawn.17,18 System decomposition has the advantages of providing flexible diagnosis throughout operating condition changes, reducing the size of the knowledge base, and simplifying the understanding of complex process-to-process interactions.
The system decomposition method used in this study is based on the SDG. In an SDG, each arc represents the instantaneous effect produced by the source node on the target node. All source nodes connected to a particular target node by means of the arcs have a direct influence on that target node. That is, only the source nodes connected to a target node can affect the particular target node. Consider Figure 4, which shows the partial SDG focusing on TR (recycle temperature) in the
CSTR process. The nodes having a direct effect on TR are FR, T, FW, TWI, and U; none of the faults shown in the figure connects directly to TR. FR represents the direct effects resulting from VR-BH, VR-BL, RP-BK, and PUMP-EF. Also, TWI and U are directly affected by CW-TCH and CW-TCL and by HX-PL, respectively. However, TWI and U cannot demonstrate the direct effects produced by CW-TCH, CW-TCL, and HX-PL, because these nodes are not measured nodes. For this reason, unmeasured nodes are not used in this study, and the digraph used is a reduced digraph, consisting of the original SDG with the unmeasured nodes removed. The SDG of the CSTR process is reduced to the digraph shown in Figure 5. After the unmeasured nodes are removed, the nodes having a direct effect on TR are the measured nodes FR, T, and FW and the faults CW-TCH, CW-TCL, and HX-PL. If these faults do not occur, only the source nodes FR, T, and FW affect TR.
Figure 5. Reduced digraph of the CSTR process.
This study proposes a system decomposition method centering on measured variables in an SDG. Each decomposed subprocess includes a central measured variable (referred to herein as the target variable), as well as measured variables (referred to herein as source variables) and faults connected to the target variable. Local fault diagnosis can be performed for each decomposed subprocess. Because fault diagnosis is locally executed for each measured variable, the fault diagnosis method, which is based on the suggested system decomposition, can be used to diagnose all types of multiple faults except for those multiple faults that affect the same measured nodes. For instance, in this example, the process is decomposed into 14 subprocesses centering on 14 measured variables (refer to Figure 5). The reduced digraph of the decomposed subprocess for TR is shown in Figure 6.
Figure 6. Reduced digraph of the decomposed subprocess for TR.
This study defines physically feasible faults for each piece of equipment and adds them to the measured nodes, to handle only physically meaningful faults.3 The faults defined for each piece of equipment of the CSTR are listed in Table 1. The defined faults, except for 28 independent sensor faults, are added to each measured node as shown in Table 2. The name of a fault combines the name of the piece of equipment with the type of fault. For example, VL-BH means bias high for the level control valve (VL).
Table 1. Fault Classification According to Equipment in the CSTR
equipment               fault                       abbreviation
sensor                  failure/stuck high          FH
sensor                  failure/stuck low           FL
sensor                  bias/stuck high             BH
sensor                  bias/stuck low              BL
valve                   bias/failure/stuck high     BH
valve                   bias/failure/stuck low      BL
pipe                    blockage                    BK
reactor                 leak                        LK
pump                    equipment failure           EF
heat exchanger          fouling                     FL
external disturbance    flow rate change high       FCH
external disturbance    flow rate change low        FCL
external disturbance    temperature change high     TCH
external disturbance    temperature change low      TCL
external disturbance    composition change high     CCH
external disturbance    composition change low      CCL
operator disturbance    set point change high       SVCH
operator disturbance    set point change low        SVCL
Table 2. Faults Added to the Measured Nodes in the CSTR
measured variable    sign of arc: positive (+)    sign of arc: negative (-)
CA0                  FEED-CCH                     FEED-CCL
F0                   FEED-FCH                     FEED-FCL, FP-BK
T0                   FEED-TCH                     FEED-TCL
CL                   LC-SVCL                      LC-SVCH
CR                   FC-SVCH                      FC-SVCL
CT                   TC-SVCL                      TC-SVCH
FP                   VL-BH                        VL-BL, PP-BK, PUMP-EF
FR                   VR-BH, FS-BH                 VR-BL, FS-BL, PUMP-EF, RP-BK
FW                   VT-BH                        VT-BL, WP-BK
L                    LS-BH, PUMP-EF               LS-BL, RX-LK
T                    TS-BH                        TS-BL
TR                   CW-TCH, HX-PL                CW-TCL
The simple diagnosis method available using the decomposition method is to estimate the value of each target node using the measured values of the source
nodes connected to the target node. A substantial difference between the estimated and measured values implies the occurrence of one or more faults. Sensor faults occurring in the sensor corresponding to the source variables used for the estimation result in errors in the estimated values. Also, the occurrence of the faults added to the target node gives rise to errors in the measured values. For instance, an estimation of TR using the measured nodes FR, T, and FW is available. When the estimated value of TR differs from the measured value, there are two types of faults that might be the cause of the problem. First, when a fault occurs in one of the sensors for FR, T, or FW, which are used in the estimation of TR, the estimated value of TR contains an error. Second, the faults CW-TCH, CW-TCL, or HX-PL give rise to errors in the measured value of TR.
Off-line Analysis. (1) Composition of the DPLS Model. The suggested fault diagnosis method is based on the PLS model built for each decomposed subprocess. The value of each target variable can be estimated by using a PLS model, with the input X containing the source variables connected to the target variable. The output Y of the model is the estimated value of the target variable. To handle the process dynamics accurately, PLS is integrated with ARMAX, with the resultant method being referred to as dynamic PLS (DPLS). In addition to the past values of the source variables, the input of DPLS for a target variable includes the past values of the target variable. In the example of the CSTR process, the DPLS model based on the measured variables FR, FW, and T can be used to estimate the value of TR. If the current value and one previous value are used as input data, the input matrix X for the estimation of TR is formed as
$$X = \begin{bmatrix} \mathrm{FR}(1) & \mathrm{FW}(1) & \mathrm{T}(1) & \mathrm{TR}(0) & \mathrm{FR}(0) & \mathrm{FW}(0) & \mathrm{T}(0) \\ \mathrm{FR}(2) & \mathrm{FW}(2) & \mathrm{T}(2) & \mathrm{TR}(1) & \mathrm{FR}(1) & \mathrm{FW}(1) & \mathrm{T}(1) \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ \mathrm{FR}(t) & \mathrm{FW}(t) & \mathrm{T}(t) & \mathrm{TR}(t-1) & \mathrm{FR}(t-1) & \mathrm{FW}(t-1) & \mathrm{T}(t-1) \end{bmatrix} \tag{7}$$
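As an illustration of eq 7, the lagged input matrix can be assembled from sampled time series as sketched below. The helper name build_dpls_input and the sample values are hypothetical; the column order follows eq 7 for one time lag, and the order used for additional lags (past target value first, then past source values) is an assumption extrapolated from that equation.

```python
def build_dpls_input(sources, target, n_lags=1):
    """Build the lagged DPLS input matrix X and the output vector y.

    sources: dict mapping a source variable name to its list of samples
    target:  list of samples of the target variable
    Each row holds the current source values followed, for every lag,
    by the past target value and the past source values (cf. eq 7).
    """
    names = list(sources)
    X, y = [], []
    for t in range(n_lags, len(target)):
        row = [sources[name][t] for name in names]
        for lag in range(1, n_lags + 1):
            row.append(target[t - lag])
            row.extend(sources[name][t - lag] for name in names)
        X.append(row)
        y.append(target[t])
    return X, y

# Made-up samples at t = 0, 1, 2 for the TR subprocess
sources = {"FR": [10, 11, 12], "FW": [20, 21, 22], "T": [30, 31, 32]}
TR = [40, 41, 42]
X, y = build_dpls_input(sources, TR, n_lags=1)
for row in X:
    print(row)
```

The first row printed corresponds to the first row of eq 7, that is, FR(1), FW(1), T(1), TR(0), FR(0), FW(0), T(0).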
(2) Learning Data. Operational data are needed to build the DPLS model. Methods that require information corresponding to a faulty situation cannot be used to diagnose faults that have not previously occurred. This can represent a major disadvantage for their application to real processes. However, in the method proposed in this study, each DPLS can be built from the data set representing the local relations between the input variables and the output variable of the DPLS models. Therefore, the required data set for each DPLS model can be easily obtained. The available data sets can be obtained in the presence of set-point changes or external disturbances that occur frequently. Therefore, the proposed method does not need a faulty case data set, which would otherwise be difficult to obtain. In the preparation of the learning data set, one point in particular should be considered. Assume that a disturbance is used to obtain a learning data set. When this data set is used for the model of the target variable directly connected to the disturbance, the resulting model includes the direct effect produced by this disturbance, and therefore, it cannot detect the occurrence of this disturbance. Thus, the data set obtained in the presence of a particular fault should not be used to
produce the model for a measured variable that is directly affected by the same fault. For example, the data sets obtained in the case of the disturbances CW-TCH and CW-TCL cannot be used for the DPLS model of TR (Figure 6).
(3) Determination of Model Parameters. The necessary number of past values (time lags) l and the number of principal components (PCs) are determined from the learning data. The number l is usually 1 or 2, which indicates the order of the dynamic system. The design method is analogous to the method used in the study of dynamic PCA8 and uses the cross-correlation plots of the scores to determine the number of PCs. The input matrix X (refer to eq 7) is prepared with the determined number of time lags, and the DPLS model is developed by means of the multivariate statistical package PlantAnalyst.
Using process history data, the input (source) variables of a DPLS model that have little effect on the output (target) variable of the same model can be identified. That is, a mistake made during the preparation of the cause and effect relationships in an SDG can be statistically corrected through the analysis of the DPLS models. This procedure can increase the resolution of the diagnosis. This study uses the scaled regression coefficients to obtain the magnitude of the effects produced by the input variables on the output variable. In the example of the CSTR process, the relationship between the input and output variables having the smallest scaled regression coefficient is that of the input variable FR used for the estimation of output variable L. The regression coefficient is 0.012% with a value of l = 2, and FR is removed from the input of the model to estimate L. This can be explained by the fact that the arc from FR to L in the original SDG has both + and - signs. However, diagnosis will fail if any meaningful relationships are erased.
Therefore, the correction of the SDG based on the statistical data should be carefully determined considering the resolution as well as the accuracy of the diagnosis.
(4) Preparation of the Fault Set. Fault detection is performed by the observation of the residual, which is the difference between the measured value and the estimated value determined by DPLS
$$r_i = y_i - \hat{y}_i \tag{8}$$
where $r_i$ is the residual of variable i, and $y_i$ and $\hat{y}_i$ are the measured and estimated values of variable i, respectively. The equation for the estimation has the form of eq 5. For example, the regression equation for TR of eq 7 is as follows
$$\hat{y}_{\mathrm{TR}}(1) = b_1\mathrm{FR}(1) + b_2\mathrm{FW}(1) + b_3\mathrm{T}(1) + b_4\mathrm{TR}(0) + b_5\mathrm{FR}(0) + b_6\mathrm{FW}(0) + b_7\mathrm{T}(0) \tag{9}$$
A qualitative state that corresponds to a range of possible values for the residual becomes an attribute of the residual. We will consider methods that use three ranges: low, to which the qualitative state - is assigned; normal, assigned 0; and high, assigned +. If a fault occurs, the qualitative state for the residual can be + or -. The abnormal qualitative state for the residual becomes a symptom, which is expressed as the pair of the target variable and the qualitative state of the residual. Faults inducing the abnormality of each
Table 3. Fault Sets of the CSTR Process
symptom    fault set
CA0(+)     FEED-CCH
CA0(-)     FEED-CCL
F0(+)      FEED-FCH
F0(-)      FEED-FCL, FP-BK
T0(+)      FEED-TCH
T0(-)      FEED-TCL
CL(+)      LC-SVCL
CL(-)      LC-SVCH
CR(+)      FC-SVCH
CR(-)      FC-SVCL
CT(+)      TC-SVCL
CT(-)      TC-SVCH
FP(+)      VL-BH, FS-BH
FP(-)      VL-BL, PP-BK, PUMP-EF, FS-BL
FR(+)      VR-BH, FS-BH
FR(-)      VR-BL, FS-BL, PUMP-EF, RP-BK
FW(+)      VT-BH
FW(-)      VT-BL, WP-BK
L(+)       LS-BH, PUMP-EF
L(-)       LS-BL, RX-LK
T(+)       TS-BH, FS-BH, LS-BL
T(-)       TS-BL, FS-BL, LS-BH
TR(+)      CW-TCH, HX-PL, FS-BL, TS-BL
TR(-)      CW-TCL, FS-BH, TS-BH
CA(+)      LS-BH, TS-BH
CA(-)      LS-BL, TS-BL
CB(+)      LS-BL, TS-BL
CB(-)      LS-BH, TS-BH
residual are classified along with their symptoms, and the classified faults are stored in a set (called a fault set). Faults can be classified into two types. One type is a fault added to the target variable. The sign of the symptom produced by such a fault is the same as that of the arc connecting the fault to the output variable. For instance, CW-TCH added to TR is stored in the fault set for TR(+), because the sign of the arc from CW-TCH to TR is +. The other type is a sensor fault occurring in the sensor corresponding to a source variable in the DPLS model. The sign of the symptom produced by such a fault is the inverse of the product of the sign of the sensor fault and the sign of the arc connecting the source variable to the target variable. For example, the sign of the arc connecting T to TR is +, and the sign of TS-BL is -. Therefore, TS-BL is stored in the fault set for TR(+), the inverse of the product of + and -. The fault sets for the example process are shown in Table 3.
On-line Diagnosis. (1) Detection Method. The proposed method has to monitor the residuals to detect their qualitative changes of state. The most common detection methods are based on statistical methods, such as the Shewhart chart, the moving average method, and CUSUM. This study uses CUSUM, which has a recurrent computation form suitable for real-time analysis and does not need filtering. CUSUM uses two parameters referred to as the minimal jump size and the threshold size. It is known that a false detection can occur if a smaller threshold than necessary is used, and that the diagnosis fails if the threshold being used is too large. This study uses 6σ of the residual distribution as the minimal jump size and 3σ of the CUSUM distribution as the threshold.3
(2) Diagnosis Using Fault Set. The basic diagnostic strategy is to obtain the minimum set of faults that can explain all of the detected symptoms. That is, if one fault cannot explain all of the detected symptoms, a multiple fault will be sought. The diagnosis procedure is as follows:
(a) Search of the Initial Fault Candidates. The first diagnosis step is to obtain the set of detected symptoms. The detected residual of a variable becomes an element in the set. The next step is to search all possible single-fault candidates that are the initial fault candidates. They are obtained from the list of the fault sets that correspond to the detected symptoms (refer to the example of diagnosis). The union of the fault sets in this list becomes the set of initial fault candidates. Among the elements of the initial fault candidates set, each sensor fault is preexamined to determine whether the residual of the measured variable affected directly by the sensor fault has the same sign as the sensor fault. If so, it is retained; otherwise, this sensor fault is removed from the initial fault candidates set. This is because the measured value of a sensor is affected immediately when that sensor is faulty. If the removed sensor faults belong to the fault sets in the list of fault sets, those faults are removed from the fault sets.
(b) Search of the Single-Fault Candidates. After the set of initial fault candidates is obtained, it is examined to determine whether a single fault can explain all of the detected symptoms. To determine how many symptoms each fault candidate, F, can explain, the number of fault sets contained in the list of fault sets that involve F is calculated. The resulting number of fault sets is defined as nES (number of explained symptoms). Those fault candidates having the largest nES values are included in the set of first fault candidates. If the nES value of the first fault candidates is the same as the number of detected symptoms, this implies that the first fault candidates can explain all of the detected symptoms. In this case, the set of first fault candidates becomes the set of final fault candidates. If all of the detected symptoms cannot be explained by single faults of the first fault candidates, multiple faults will be sought in the next step.
(c) Search of the Multiple-Fault Candidates. The multiple-fault search is repeated until all of the detected symptoms can be explained. The search starts with the fault candidate explaining the maximum number of symptoms. The first element in the set of first fault candidates is selected. Once the fault sets including the selected first fault candidate are removed from the list of fault sets, the resulting reduced list of fault sets is assigned to the selected first fault candidate. The union of the fault sets in the reduced list of fault sets becomes the set of multiple-fault candidates for the selected first fault candidate. The nES of each fault in the obtained multiple-fault candidates set is calculated. Those fault candidates having the largest nES value are included in the second fault candidates set for the selected first fault candidate. For example, assume that F1 is a first fault candidate and that the second fault candidates for F1 are F4 and F5. In this case, the first fault candidate is linked to the second fault candidates as shown in Figure 7. When the nES value of the second fault candidate is the same as the number of fault sets in the list of fault sets assigned to the first fault candidate, double faults of the first and second fault candidates can explain all symptoms. Therefore, the combination of the first and second fault candidates becomes the final fault candidates. In the example of Figure 7, the double faults {F1, F4} and {F1, F5} become elements of the final fault candidate set. This step should also be repeated for the remaining first fault candidates. If all of the detected symptoms cannot be explained in this way, triple faults are sought by repeating step c.
Figure 7. Example of the fault diagnosis procedure.
Table 4. Number of Principal Components (PCs) and Time Delays of the CSTR Process
measured variable    number of PCs    number of time delays
FP                   2                1
FR                   2                1
FW                   1                1
L                    4                2
T                    7                2
TR                   3                1
CA                   5                2
CB                   5                2
Example and Discussion
Process and Diagnosis Description. The example process is simple but displays the most common characteristics of industrial processes. This process was originally used by Kramer and is simulated by the model of Sorsa.6,19 The sampling interval is 5 s, and the total diagnosis time is 2000 s. The first or single fault occurs at 100 s, and the second fault occurs at 100 or 200 s. To compare the diagnosis results of the method proposed in this study with those of the fault-effect tree model suggested in our previous study, the same fault situations are selected.3,20 These are 15 single faults and 37 double faults that are generated randomly. In total, eight DPLS models for the eight measured variables are obtained. As the learning data for each PLS model, this example uses seven data sets obtained in the presence of set-point changes or external disturbances. These are situations that occur frequently in normal operation. The number of time lags and the number of PCs for the DPLS models of the CSTR process are shown in Table 4. The same data are used to determine the CUSUM parameters of minimal jump size and threshold size. Because F0, T0, and CA0 are not affected by other variables in the process, DPLS models for the estimation of these variables are not built.
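The residual monitoring described under Detection Method can be sketched as a two-sided CUSUM test. The recursion below is a common textbook form and is only assumed to resemble the detector used in this study; the function name cusum_states, the parameter values, and the choice not to reset the sums after an alarm are illustrative assumptions. In the study itself, the minimal jump size and the threshold are derived from the learning data.

```python
def cusum_states(residuals, min_jump, threshold, mean=0.0):
    """Map each residual sample to a qualitative state: '+', '-', or '0'.

    min_jump:  smallest shift of the residual mean that should be detected
    threshold: alarm level for the cumulative sums
    """
    g_pos = g_neg = 0.0
    states = []
    for r in residuals:
        # accumulate deviations beyond half the minimal jump, floored at zero
        g_pos = max(0.0, g_pos + (r - mean) - min_jump / 2.0)
        g_neg = max(0.0, g_neg - (r - mean) - min_jump / 2.0)
        if g_pos > threshold:
            states.append("+")
        elif g_neg > threshold:
            states.append("-")
        else:
            states.append("0")
    return states

# Made-up residual sequence with a positive step at the fourth sample
residuals = [0.0, 0.1, -0.1, 1.0, 1.0, 1.0]
print(cusum_states(residuals, min_jump=0.6, threshold=1.0))
```

A larger threshold delays the alarm by more samples, and a smaller one reacts faster but is more easily triggered by noise, which is the trade-off noted for the two CUSUM parameters.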
CUSUM is used to interpret the values of the variables into their qualitative states as the SDG-based methods. Traditionally, the controller output does not need to be monitored to detect set-point changes, because the set point can be input to the diagnostic system. However, because the set points of three of the controllers used in this process are not measured, the corresponding controller outputs need to be monitored. For this reason, three control outputs (CR, CL, and CT) are estimated by means of PLS models using the error and integration error of the controller as the input data for the models. The constraint variable suggested in our previous study of the fault-effect tree model is used to decrease the number of fault candidates in the diagnosis results, which leads to enhanced resolution.3 It represents the quantitative governing equation, such as the balance equation and valve relation as variables. The previous study used a mass balance and two control valve
Ind. Eng. Chem. Res., Vol. 42, No. 24, 2003 6151 Table 5. Detection Sequence of Symptoms in Case Study time(s)
detected symptoms
105 110 120 145 175 200
L(-) L(-), FR(-), FP(-) L(-), FR(-), FP(-), TR(+) L(-), FR(-), TR(+) L(-), FR(-), TR(+), CA(-) L(-), FR(-), TR(+), CA(-), CB(+)
nES(LS-BL) ) 3 nES(RX-LK) ) 1 nES(VR-BL) ) 1 nES(FS-BL) ) 2
equations for CSTR. Because the control valve equations are expressed by means of DPLS models, this study uses only the mass balance equation. Reactor leakage (RXLK) is a root cause of positive deviations of the constraint variable, DF
DF ) FO - FP
becomes the first fault candidate. As the number of detected symptoms is 5, double faults should be sought.
(10)
Example: Level Sensor Bias Low and Flow Rate Sensor Bias Low. In this double-fault case, while the level sensor is biased to low at 100 s, the recycle flow rate sensor is biased to low at the same time. The diagnosis procedure will be explained using the data obtained at 200 s. The detection sequence of the symptoms is shown in Table 5. Figure 8 shows the measured/estimated values and residuals of four detected variables. The thick lines in parts a, c, e, and g of Figure 8 represents the measured value, and the thin lines represents the estimated value. The bounds of in parts b, d, f, and h of Figure 8 are 3σ of the residual. When L(-) is detected at 105 s, the final fault candidates are the two single faults of LS-BL and RX-LK. At 110 s, FR(-) and FP(-) are detected, and four double faults, including the true solution, are suggested as the final fault candidates. The symptom of FP(-) is missed from 145 s. The symptom of TR(+) is detected at 120 s, CA(-) at 175 s, and CB(+) at 200 s. The detected symptoms are L(-), FR(-), TR(+), CA(-), and CB(+). The fault sets for the detected symptoms are listed as follows
{LS-BL, RX-LK} {VR-BL, FS-BL, PUMP-EF, RP-BK} {CW-TCH, HX-PL, FS-BL, TS-BL} {LS-BL, TS-BL} {LS-BL, TS-BL} The set of initial fault candidates is obtained from the union of the fault sets in the list, that is
{LS-BL, RX-LK, VR-BL, FS-BL, PUMP-EF, RP-BK, CW-TCH, HX-PL, TS-BL} Among the initial fault candidates, TS-BL is removed because the residual of T(-) is not detected. TS-BL is also removed from the third, fourth, and fifth fault sets, leaving the following list of fault sets:
{LS-BL, RX-LK} {VR-BL, FS-BL, PUMP-EF, RP-BK} {CW-TCH, HX-PL, FS-BL} {LS-BL} {LS-BL} The nES of each initial fault candidate is obtained as follows. LS-BL, which has the largest nES value of 3,
nES(PUMP-EF) ) 1 nES(RP-BK) ) 1 nES(CW-TCH) ) 1 nES(HX-PL) ) 1 Among the first fault candidates, LS-BL is selected. Once the fault sets that include LS-BL are removed from the list of fault sets, the following reduced list of fault sets is assigned to LS-BL
{VR-BL, FS-BL, PUMP-EF, RP-BK} {CW-TCH, HX-PL, FS-BL} From the union of the fault sets present in the list of fault sets assigned to LS-BL, the set of multiple-fault candidates for LS-BL is {VR-BL, FS-BL, PUMP-EF, RPBK, CW-TCH, HX-PL}. The nES values of the faults in the obtained multiple-fault candidates set are calculated as follows.
nES(VR-BL) = 1 nES(FS-BL) = 2 nES(PUMP-EF) = 1 nES(RP-BK) = 1 nES(CW-TCH) = 1 nES(HX-PL) = 1
FS-BL, which has the largest nES value of 2, becomes the second fault candidate for LS-BL. Because the nES value of FS-BL is equal to the number of fault sets in the list assigned to LS-BL, the double fault of LS-BL and FS-BL becomes a final fault candidate. Although the diagnosis results based on the fault-effect tree model were accurate, they gave a tier of 2 and a resolution of 10 in the worst-case diagnosis.3 The proposed method shows a much better diagnosis resolution, having a value of 1 for most of the diagnosis time.
Results of Single-Fault Cases. To measure the diagnostic performance, the four parameters of accuracy, resolution, wrong detection, and detection delay are used. The accuracy is 1 if the diagnosis is accurate, that is, if the true fault is included in the final fault candidates set. Otherwise, the accuracy is 0. In a single-fault case, the diagnosis is also considered accurate if it suggests a multiple fault that includes the true single fault. The resolution denotes the number of final fault candidates. Even though the diagnosis might be accurate, a diagnostic result giving many fault candidates implies poor diagnostic performance. Both accuracy and resolution will be used to compare the diagnostic performance of the proposed method with that of the fault-effect tree model.3 Wrong detection refers to the number of falsely
detected symptoms independent of the true solution. A wrong detection value of more than 1 means that multiple faults involving both true faults and other faults are included in the top tier as the solution. Detection delay refers to the time from fault occurrence to fault diagnosis; a short detection delay means rapid detection and diagnosis. This parameter must be traded off against wrong detection, because a quick response might cause false detection and diagnosis during normal operation.
Figure 8. Measured/estimated values and residuals of the detected variables in the case study.
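As an illustration only, the candidate-selection steps of the double-fault example (fault sets, nES counts, first and second fault candidates) and the accuracy and resolution measures defined above can be sketched in Python. The function names, the recursion that keeps every fault tied for the largest nES, and the handling of fault sets left empty are assumptions made for this sketch and are not taken from the authors' implementation. nES is read here as the number of fault sets that contain a given fault, which agrees with the values quoted in the example.

```python
from collections import Counter


def final_fault_candidates(fault_sets, excluded=()):
    """Return candidate fault combinations as a set of frozensets.

    fault_sets: one collection of fault names per detected symptom.
    excluded:   faults ruled out beforehand (TS-BL in the example).
    """
    excluded = frozenset(excluded)
    remaining = [frozenset(s) - excluded for s in fault_sets]
    remaining = [s for s in remaining if s]  # assumption: ignore emptied sets
    if not remaining:
        return set()
    return _cover(remaining, frozenset())


def _cover(remaining, chosen):
    # Every fault set is explained, so 'chosen' is a final candidate.
    if not remaining:
        return {chosen}
    # nES: number of remaining fault sets that contain each fault.
    nes = Counter(fault for s in remaining for fault in s)
    best = max(nes.values())
    candidates = set()
    for fault, count in nes.items():
        if count == best:  # keep every fault tied for the largest nES
            reduced = [s for s in remaining if fault not in s]
            candidates |= _cover(reduced, chosen | {fault})
    return candidates


def accuracy(candidates, true_faults):
    """1 if some final candidate includes every true fault, else 0."""
    true_faults = frozenset(true_faults)
    return int(any(true_faults <= c for c in candidates))


def resolution(candidates):
    """Number of final fault candidates."""
    return len(candidates)


# Fault sets of the symptoms detected at 200 s, as listed in the example.
fault_sets_200s = [
    {"LS-BL", "RX-LK"},
    {"VR-BL", "FS-BL", "PUMP-EF", "RP-BK"},
    {"CW-TCH", "HX-PL", "FS-BL", "TS-BL"},
    {"LS-BL", "TS-BL"},
    {"LS-BL", "TS-BL"},
]
candidates = final_fault_candidates(fault_sets_200s, excluded={"TS-BL"})
print(sorted(sorted(c) for c in candidates))
print(accuracy(candidates, {"LS-BL", "FS-BL"}), resolution(candidates))
```

Under these assumptions the sketch returns the single double fault of LS-BL and FS-BL for the example, that is, an accuracy of 1 and a resolution of 1. When two disjoint fault sets of sizes 2 and 4 are supplied, it returns eight double faults.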
Table 6. Diagnosis Results for Selected Single Faults
| fault | description | detection delay (s) | resolution, suggested method | resolution, fault-effect tree |
| --- | --- | --- | --- | --- |
| LC-SVCH | level controller set point change high | 5 | 1 (1) | 1 (1) |
| FEED-FCH | feed flow rate change high | 15 | 1 (1) | 1 (1) |
| FEED-CCL | feed composition change low | 5 | 1 (1) | 1 (1) |
| FEED-TCL | feed temperature change low | 10 | 1 (1) | 1 (1) |
| RX-LK | reactor leaking | 5 | 1.16 (2) | 1 (1) |
| WP-BK | cooling water pipe blockage | 20 | 2 (2) | 3 (3) |
| FC-SVCH | flow rate controller set point change high | 10 | 1 (1) | 1 (1) |
| VR-BL | recycle flow rate C/V bias low | 10 | 4 (4) | 4 (4) |
| VL-BH | level C/V bias high | 10 | 1 (1) | 2.04 (4) |
| TC-SVCH | temperature controller set point change high | 5 | 1 (1) | 1 (1) |
| VT-BH | temperature C/V bias high | 10 | 1 (1) | 2 (2) |
| CW-TCH | cooling water temperature high | 20 | 2 (2) | 2 (2) |
| LS-BL | level sensor bias low | 5 | 1.03 (2) | 1 (1) |
| TS-BH | temperature sensor bias high | 5 | 1 (1) | 1 (1) |
| FS-BL | recycle flow rate sensor bias low | 5 | 1.03 (4) | 4 (4) |

In all selected single-fault cases, the accuracy is 1. Because there are no falsely detected symptoms, wrong detection is 0, and the true single fault is in the top tier. Table 6 compares the resolution of the diagnosis obtained using the proposed method with that of the fault-effect tree model. The resolution shown in this table is the average of the resolution measured every 5 s from the initial detection time to the last diagnosis time. The value in parentheses refers to the worst result obtained during the entire diagnostic period. For the
case of VL-BH, our previous study based on the fault-effect tree resulted in a solution involving a double fault for 45 s,3 but the proposed method shows more robust results during all diagnostic periods. Although the resolution for the two cases of RX-LK and LS-BL is worse than that provided by our previous method, the resolution obtained with the proposed method is better for the four cases of WP-BK, VL-BH, VT-BH, and FS-BL. The detection delays for these faults are also listed in Table 6. The results show that the proposed method can detect faults early while also reducing wrong detections.
Results of Double-Fault Cases. Our previous qualitative method failed for nine cases (5, 8, 9, 13, 21, 24, 27, 29, 33), because these cases are masked multiple faults in which one fault can explain the symptoms produced by the other faults. However, the proposed method successfully diagnosed 36 of 37 multiple-fault cases during all diagnostic periods. The diagnosis failed for only one case (5), RX-LK and PP-BK, because the process variations resulting from PP-BK (blockage of the product pipe) were too weak for the symptoms of PP-BK to be detected. Although 33 of the 36 cases showed a wrong detection of 0, the diagnosis for the remaining three cases (1, 15, and 28) included the detection of a wrong symptom not related to the true solution. For example, consider case 28 of FEED-FCH and LS-BL. In this case, the sequential detection of L(-), F0(+), CA(-), T(+), and CB(+) gave the true solution. However, DF(+) was detected (from 225 to 735 s) because of the combined effect of the two faults. The final fault candidate obtained for this 510-s period is a triple fault of FEED-FCH, LS-BL, and RX-LK. These wrong detections were due to the use of CUSUM parameters that were smaller than necessary. To prevent wrong detection, the CUSUM parameters were doubled for the diagnosis. With the new parameters, the wrong detections in these three cases did not occur.
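The effect of doubling the CUSUM parameters can be illustrated with a minimal two-sided CUSUM applied to a residual sequence. The tabular CUSUM form, the parameter names slack and threshold, the parameter values, and the synthetic residuals are all assumptions of this sketch; the CUSUM formulation and settings actually used in the study are not reproduced here.

```python
def cusum_first_alarm(residuals, slack, threshold):
    """Return the index of the first CUSUM alarm, or None if there is none.

    Two one-sided sums are kept: 'high' accumulates positive residuals
    and 'low' accumulates negative residuals, each reduced by 'slack'.
    """
    high = low = 0.0
    for i, r in enumerate(residuals):
        high = max(0.0, high + r - slack)
        low = max(0.0, low - r - slack)
        if high > threshold or low > threshold:
            return i
    return None


# Hypothetical residuals: a persistent low bias starting at sample 20.
biased = [0.0] * 20 + [-1.0] * 40
# Hypothetical residuals: a three-sample transient during normal operation.
transient = [0.0] * 10 + [1.0] * 3 + [0.0] * 10

small = {"slack": 0.25, "threshold": 2.0}
doubled = {"slack": 0.5, "threshold": 4.0}

print(cusum_first_alarm(biased, **small), cusum_first_alarm(biased, **doubled))
print(cusum_first_alarm(transient, **small), cusum_first_alarm(transient, **doubled))
```

With these synthetic data, the smaller parameters flag the persistent bias earlier (sample 22 rather than 28) but also raise an alarm on the transient, whereas the doubled parameters ignore the transient at the cost of a later detection.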
Although fault detection for most cases was delayed by 5-30 s, the diagnosis accuracy and resolution for the 33 cases were not affected, being the same as those obtained with the previous parameters. Therefore, despite the results for single-fault cases, these cases show that the detection parameters should be determined by considering the tradeoff between wrong detection and time delay. Among the 33 cases showing a wrong detection of 0, 24 cases showed a resolution of less than 2 in the worst-case diagnosis. A resolution value of 1 or 2 means that
the result showed the highest performance; hence, no comparison is required. Excluding these 24 cases, Table 7 shows the diagnosis results of nine double-fault cases. It compares the resolution of the diagnosis with that obtained with the fault-effect tree model. The table shows that the resolution obtained by the proposed method is better than that obtained by the fault-effect tree model. Case 20 of FV-BL and WP-BK showed the worst resolution value of 8. In this case, the detected symptoms are FW(-) and FR(-) during all diagnostic periods. The fault sets of {VT-BL, WP-BK} and {VR-BL, FS-BL, PUMP-EF, RP-BK} are obtained from FW(-) and FR(-), respectively. Thus, eight (2 × 4) double faults became the final fault candidates.

Table 7. Diagnosis Results for Selected Double Faults
| no. | detection delay (s), first fault | detection delay (s), second fault | resolution, suggested method | resolution, fault-effect tree |
| --- | --- | --- | --- | --- |
| 6 | 15 | 10 | 2.23 (4) | 5.38 (6) |
| 7 | 5 | 10 | 1.09 (4) | 4.87 (10) |
| 10 | 35 | 10 | 1.01 (4) | 8.42 (10) |
| 13 | 10 | 10 | 3.84 (4) | failed |
| 14 | 5 | 10 | 4.0 (4) | 5 (5) |
| 17 | 5 | 10 | 3.82 (4) | 3.86 (4) |
| 19 | 10 | 15 | 1.17 (4) | 6.75 (7) |
| 20 | 10 | 10 | 8 (8) | 10 (10) |
| 31 | 10 | 10 | 4 (4) | 4 (4) |

Conclusion
This study addresses fault diagnosis with a hybrid method of signed digraphs and partial least squares. The target system is decomposed on the basis of the local causal relationships of each variable in the signed digraph, and the local diagnosis is performed using a DPLS model built for each decomposed subprocess. The diagnostic performance of the system was successfully illustrated by means of a diagnosis example involving a continuously stirred tank reactor. The proposed method has the following advantages:
(1) System decomposition based on an SDG enables local fault diagnosis, which improves the ability to diagnose multiple faults. It can also improve diagnosis portability, thus rendering system development easier.
(2) The proposed diagnostic method improves the diagnosis resolution and accuracy compared to the previous qualitative methods. Moreover, it enhances the reliability of the diagnosis for all predictable faults, including multiple faults.
(3) The proposed method allows the diagnosis model to be built from easily obtainable data sets and does not require faulty case data sets, which are rarely available.
Through case studies of multiple faults, it was found that false detection can decrease diagnosis performance. Although increasing the CUSUM parameters can prevent the problem of false detection, it can also cause the diagnosis to be delayed. Therefore, the parameters should be determined according to the relative importance of wrong detection and time delay. In the future, the proposed diagnostic method needs to be extended to highly nonlinear processes,12,21-23 and the method used for determining the CUSUM parameters needs to be improved.
Acknowledgment
This work was supported by Grant No. (R05-2002000-00057-0) from the Basic Research Program of the Korea Science & Engineering Foundation.
Notation
B_PLS = matrix of PLS regression coefficients
b_i = PLS regression coefficient
E = residual matrix for X
F = residual matrix for Y
k = number of principal components
l = number of time delays
P = loading matrix for X
Q = loading matrix for Y
r_i = residual of variable i
T = score matrix for X
U = score matrix for Y
W = weight matrix for X
X = input matrix
Y = output matrix
Ŷ = predicted output matrix
y_i = measured value of variable i
ŷ_i = estimated value of variable i
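Using the notation above, the construction of time-lagged inputs (current and past values, with l time delays) and the comparison of the residual r_i, taken here as y_i minus ŷ_i, against a 3σ bound can be sketched as follows. The row ordering of the lagged inputs, the function names, and the use of the sample standard deviation of normal-operation residuals for σ are assumptions of this sketch. The PLS regression that produces ŷ_i is not implemented here; the estimates are taken as given.

```python
from statistics import stdev


def lagged_inputs(samples, n_lags):
    """Build rows [x(t), x(t-1), ..., x(t-n_lags)] for each valid t.

    samples: list of per-time-step lists of process variable values.
    n_lags:  number of time delays (l in the notation).
    """
    rows = []
    for t in range(n_lags, len(samples)):
        row = []
        for delay in range(n_lags + 1):
            row.extend(samples[t - delay])  # current value first, then past
        rows.append(row)
    return rows


def residuals(measured, estimated):
    """r_i = y_i - y_hat_i for paired measured and estimated values."""
    return [y - y_hat for y, y_hat in zip(measured, estimated)]


def outside_three_sigma(resid, normal_resid):
    """Flag residuals whose magnitude exceeds 3 sigma of normal residuals."""
    sigma = stdev(normal_resid)
    return [abs(r) > 3 * sigma for r in resid]


# Hypothetical data: two variables sampled at four time steps, l = 2.
samples = [[1, 10], [2, 20], [3, 30], [4, 40]]
print(lagged_inputs(samples, 2))
print(outside_three_sigma([2.9, -3.5], normal_resid=[-1.0, 0.0, 1.0]))
```

Each row of the lagged input matrix would be fed to the PLS model of one subprocess; only the residuals flagged by the bound check would then be passed on as candidate symptoms.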
Literature Cited
(1) Nimmo, I. Adequately Address Abnormal Operations. Chem. Eng. Prog. 1995, 91, 36-45.
(2) Finch, F. E. Automated Fault Diagnosis of Chemical Process Plants Using Model-based Reasoning. Sc.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, 1989.
(3) Lee, G.; Lee, B.; Yoon, E. S.; Han, C. Multiple Fault Diagnosis under Uncertain Conditions by the Quantification of Qualitative Relation. Ind. Eng. Chem. Res. 1999, 38, 988-998.
(4) Venkatasubramanian, V. Managing Abnormal Situations in Process Plants: the Next Control Frontier. In Proceedings of PSE Asia 2000; Kyoto, Japan, 2000; pp 597-616.
(5) Geladi, P.; Kowalski, B. R. Partial Least-Squares Regression: A Tutorial. Anal. Chim. Acta 1986, 195, 1-17.
(6) Kramer, M. A.; Palowitch, B. L. A Rule-Based Approach to Fault Diagnosis Using the Signed Directed Graph. AIChE J. 1987, 33, 1067-1078.
(7) Iri, M.; Aoki, K.; O’Shima, E. An Algorithm for Diagnosis of System Failures in the Chemical Process. Comput. Chem. Eng. 1979, 3, 489-493.
(8) Ku, W.; Storer, R. H.; Georgakis, C. Disturbance Detection and Isolation by Dynamic Principal Component Analysis. Chemom. Intell. Lab. Syst. 1995, 30, 179-196.
(9) MacGregor, J. F.; Kourti, T. Statistical Process Control of Multivariate Processes. Control Eng. Pract. 1995, 3, 403-414.
(10) Kourti, T.; Lee, J.; MacGregor, J. F. Experiences with Industrial Applications of Projection Methods for Multivariate Statistical Process Control. Comput. Chem. Eng. 1996, 20, S745-S750.
(11) Shi, R.; MacGregor, J. F. Modeling of Dynamic Systems Using Latent Variable and Subspace Methods. J. Chemom. 2000, 14, 423-439.
(12) Baffi, G.; Martin, E. B.; Morris, A. J. Nonlinear Dynamic Projection to Latent Structures Modelling. Chemom. Intell. Lab. Syst. 2000, 52, 5-22.
(13) Yoon, S.; MacGregor, J. F. Fault Diagnosis with Multivariate Statistical Models Part I: Using Steady-State Fault Signatures. J. Process Control 2001, 11, 387-400.
(14) Becraft, W. R.; Guo, D. Z.; Lee, P. L.; Newel, R. B. Fault Diagnosis Strategies for Chemical Plants: A Review of Competing Technologies. In Proceedings of PSE ’91; Montebello, Canada, 1991; Vol. 2, pp 12.1-12.15.
(15) McDowell, M.; Davis, J. F. Managing Qualitative Simulation in Knowledge-Based Chemical Diagnosis. AIChE J. 1991, 37, 569-580.
(16) Finch, F. E.; Kramer, M. A. Narrowing Diagnostic Focus Using Functional Decomposition. AIChE J. 1988, 34, 25-36.
(17) Nam, D. S. Fault Diagnosis Using the Extended Symptom-Fault Association Model in Continuous Chemical Processes. Ph.D. Dissertation, Seoul National University, Seoul, South Korea, 1995.
(18) Lee, G.; Yoon, E. S. A Process Decomposition Strategy for Qualitative Fault Diagnosis of Large-Scale Processes. Ind. Eng. Chem. Res. 2001, 40, 2474-2484.
(19) Sorsa, T.; Koivo, H. N. Neural Networks in Process Fault Diagnosis. IEEE Trans. Syst., Man, Cybernetics 1991, 21, 815-825.
(20) Lee, G. A Study on Process Fault Diagnostic Systems Using the Fault-Effect Tree Model. Ph.D. Dissertation, Seoul National University, Seoul, South Korea, 1997.
(21) Baffi, G.; Martin, E. B.; Morris, A. J. Nonlinear Projection to Latent Structures Revisited: The Quadratic PLS Algorithm. Comput. Chem. Eng. 1999, 23, 395-411.
(22) Liu, J.; Min, K.; Han, C.; Chang, K. S. Robust Nonlinear PLS Based on Neural Networks and Application to Composition Estimator for High-Purity Distillation Columns. Korean J. Chem. Eng. 2000, 17, 184-192.
(23) Malthouse, E. C.; Tamhane, A. C.; Mah, R. S. H. Nonlinear Partial Least Squares. Comput. Chem. Eng. 1997, 21, 875-890.
Received for review January 29, 2003 Revised manuscript received September 10, 2003 Accepted September 22, 2003 IE030084V