402
Ind. Eng. Chem. Process Des. Dev. 1903, 22, 482-487
Literature Cited AIRCO, Inc., 575 Mountain Ave., Murray Hill, NJ 07974. 1982 pice. Baker, E, L,; bndrigan, p. J.; F ~ ~ M p. ,J.; hsteyns, B. j,;wozzi, p, E,; Skinner, H. G. Arch. Envlon. Health 1978, 33, 89-94. BenefieU, L. D.; I“Jl, C. W.; King, P. H. J . Water Pollut. Control Fed. 1977, 49, 269-279. Berkman, S.; Morreii, J. C.; Egioff, G. “Catalysis: Inorganic and Organic”; Reinhdd: New York, 1940; pp 906, 948-949. 956. 981-982, 1074. Beychok, M. R. Abstracts of Papers 188th National Meeting of the American Chemlcai Society, Atlantic City, NJ. Sept 1974; American Chemical Society: Washington. DC, 1974; FUEL 33. Brignac, P. J.; Yu, Tien-hua Anal. Lett. 1975, 8 , 315-322. Cavanaugh, E. C.; Corbett, W. E.; Page, G. C. “Environentai Assessment Data Base for Low/Medium-Btu GesificaUon Technology: Volume 11. AppendiCeS AP‘, EPA-60017-77-1256. US EPA, Office of Research and Deveiopment: Washington, DC, Nov 1977; pp D-18 to D-35. Chaitykyan, 0. A. “Copper-Catalytic Reactions”: Consultants Bureau: New York, 1966. Chemetals Corporatlon. Empire Towers, 7300 Ritchk Highway, Glen Burnie, MD 21081. 1982 price based on a truckload. Cotton, F. A.; Wllkinson. G. “Advanced Inorganic Chemistry, A Comprhensive Text”. 3rd revlsed ed.;Wiley: New York. 1972 p 799. Danner, D. J.; Brignac. P. J. Jr.; Arceneaux, D.; Patei, V. Arch. Blochem. BiophvS. 1973, 158, 759-763. Eisenhauer, H. R. J. Water PONut. Control Fed. 1988, 40, 1887-1899. Eisenhauer, H. R. J. Wafer PolM. ControlFed. 1964, 36, 1116-1128. Ficek, K. J.; Boll, J. E. Aqua (London) 1980, No. 7, 153-156. GeISsrt”, T. A. “Principles of Organic Chemistry”, 4th ed.,W. H. Freeman and Co.; San Francisco, 1977; pp 678-877. Hay, A. S. J. Powmn. Sci. 1982, 58, 581-591. Hay, A. S.; Banchard. H. S.; Endres, G. F.; Eustance, J. W. J. Am. Chem. SOc. 1959. 87, 6335-8336.
Hoiladay, D. W.; Hancher, C. W.; Scott, C. D.; Chilcote. D. D. J. Wafer Po#ut. Confrol Fed. 1978, 5 0 , 2573-2589. Joschek, H.-I.; Miller, S. I.J . Am. Chem. SOC. 1988, 88, 3273-3281. Katzer, J. R.; Ficke. H. H.; Sadana. A. J . Water Polluf. ConfrolFed. 1976, 48, 920-933. Keating, E, J,; Brown, R, Cxeenberg, E, s, In a,Proceedings of the 33rd Industrial Waste Conference, Purdue University, Lafayette, Indiina, May 1978”, Bell, J. M., Ed.; Ann Arbor Science Publishers, Inc.: Ann Arbor. MI, 1979; pp 464-470. Luthy’ ” G’; Talbnv J’ T’ WaferRes. 1980* 7 4 p1289-1282. Mellow, J. W. “A Comprehensive Treatise on Inorganic and Theoretical Chemistry”; Longman. (;reen & Co.: London, 1923; Voi. 111. pp 157-158. Ohta, H.; Goto, S.; Teshima, H. Ind. Eng. Chem. Fundam. 1980, 79, 180-185. prog‘ 1872’ 68’ 72-77’ Pra*’ L’ A‘ Price, C. C.; Nakaoka, K. Macromolecules 1971, 4 , 363-369. Remy, H. “Treatlse on Inorganic Chemistry”; Elsevier: New York, 1956; Voi. 11, pp 379-383. Rogic, M. M.; Demmin, T. R. J. Am. Chem. Soc., 1978, 100, 5472-5487. Schouten, A. J.; Chaila. G. J. Mol. Cafal. 1980, 9 , 41-50. Throop, W. M. J. HazardMafer. 1977, 7 , 319-329. Vela, G. R.; Ralston, J. R. Can. J . Micr&lol. 1978, 24, 1366-1370. Weast, R. C. “CRC Handbook of Chemistry and Physics”, 56th ed.;CRC Press: Cleveland, OH, 1975; p D-151. Wells, A. F. “Structural Inorganic Chemistry”, 3rd ed.; Oxford University Press: London, 1962; p 337. WesterfeM, W. W.; Lowe, C. J. 6/01.Chem. 1942, 745, 463-470.
Received for reuiew May 10, 1982 Reuised manuscript received October 20, 1982 Accepted January 17, 1983
Fault Detection and Diagnosis via Parameter Estimation in Lumped Dynamic Systems Sunwon Park’ and David M. Himmelblau’ Department of Chemical Enginewing, The Universky of Texas at Austin, Austin, Texas 78772
Fault detection and diagnosis are Wustrated by application of on-line parameter estimation techniques to a nonlinear chemical reactor. An extended Kalman filtering algorithm was successfully used to estimate both the states and time varylng parameters corresponding to faults in a lumped parameter system. Trajectories of parameter estimates were compared with predetermined confidence limits established under normal operating conditions. The results clearly indicate the feasibility of on-line fa& detection as well as diagnosis, that is, discrimination among two faults occurring approximately at the same time.
Introduction Malfunctions of plant equipment and instrumentation are relevant from many viewpoints. They increase the operating costs of any plant. Even more serious are the potential consequences of a gross accident, such as an explosion, because of faulty design or operation (Farmer, 1971). Among the objectives of early detection of malfunctions are the prevention of sudden failure of equipment, the collection of higher-grade information to guide operations, the improved planning of maintenance, and the achievement of more highly automated plants. The terms fault, failure, and malfunction have many connotations in the literature as well as in general usage. We will use the words fault and malfunction in relation to equipment as synonyms to designate the departure from an acceptable range of an observed variable or calculated parameter associated with the equipment. As examples, the response of a vessel might differ in some measurable way from a normal response, or the system transfer function might not be the same as that specified by the Celanese Chemical Co., Bay City, TX. 0196-4305/83/1122-0482$01.50/0
designer. Figure 1 illustrates different circumstances corresponding to different ranges of acceptable performance. Clearly, the prescription of the boundaries to delineate a fault is a subjective task, and even after the boundaries are established, the classification of faulty vs. nonfaulty is by no means clear-cut if the stochastic aspects of classification are taken into account. Thus, the definition of a fault depends on the characteristics of the process chosen for measurement (or calculation), their acceptable range(s), and the accuracy of the statistic used for classification of a potential fault (Willsky, 1976). A fault implies a certain degradation of performance. Failure, on the other hand, will refer to complete inoperability of equipment or the process. That is, the equipment or an instrument will lack the capability of carrying out its specified function. As an example, a sensor may be faulty and become completely nonoperational (“hard-over”failure). Or,it may simply suffer degradation in performance leading to a bias or increased inaccuracies, which may be modeled as an increase in the sensor standard deviation. In the latter case, estimates of the bias or the increase in standard deviation, may allow you to continue to use the sensor, although in a degraded mode, 0 1983 American Chemical Society
Ind. Eng. Chem. Process Des. Dev., Vol. 22, No. 3, 1983 483
1
DANGEROUS O P E R A T I O N
~
FAULTY A -
- - -
!I---ACCEPTABLE
:I
REGION OF ACCEPTANCE
REGION
" I LIMIT FOR SHUTDOWN
L TIME
~
DANGEROUS O P E R A T I O N
L C O E F F I C I E N T A1
Figure 1. Acceptable performance range and definition of fault.
so that we would categorize the sensor as faulty but not failing. Most chemical processes are sufficiently flexible and well organized that as soon as a fault shows up in any subsystem, the system compensates for the fault so as to continue operation. Thus, a fault will not necessarily be a failure (Bellingham and Lees, 1977). For those instances in which more than one cause of a fault can occur, we would like to carry out a f a u l t diagnosis, that is to determine (after detection of the occurrence of a fault) the aspect of the equipment, or portion thereof, that is causing the fault(s). Diagnosis is the determination of which of the subsystems, or the environment, is violating its given sufficient conditions for satisfying the process performance specifications. Because of the interaction between process components, it can be very difficult to isolate causes that occur in complex systems. An engineer wants to obtain the maximum possible degree of discrimination using the available test data, with the least amount of computation. However, if the parameters used to classify the equipment among the possible faulty states are not unique, then there is no hope of determining which element among several is the source of the malfunction. See Bellman and Astrom (1970). In addition to the problem of uniqueness, a satisfactory scheme for fault diagnosis has to be able to cope with the presence of noise in the measurements, as well as with the phenomenon of parameter drift. Thus, fault diagnosis is a statistical decision problem much like signal detection. Diagnostic resolution (distinguishability, detectability) refers to the precision with which a single fault can be identified for the case in which multiple faults occur. Applications of fault detection via parameter estimation are more common in systems in which the parameters are treated as time-invariant. Willsky (1976) gives a general survey of modern techniques. However, in many instances one finds that certain parameters change with time as the system is operating because of the occurrence of one or more factors that lead to degradation of performance. Consequently, continuous monitoring and rapid detection and diagnosis of the cause of the fault would be desirable. We pose the problem of fault detection and diagnosis in nonlinear lumped parameter (well-mixed) systems as an extension of real time estimation of time varying parameters in the system equations. Although detecting faulty operation alone may, or may not, be detected from examination of the measured process outputs, diagnosis of the cause of the degradation in performance is rarely possible from direct inspection of such measurements if more than one fault occurs at a time. A continuous stirred tank reactor with a heat exchanger is used to illustrate the procedure and the application. The extended Kalman filtering algorithm (Kalman, 1960; Kalman and Bucy, 1961),a summary of which appears in the Appendix, is used to estimate both the states (the dependent variables) and the time varying parameters. By
extended filtering one means processing the measurements to achieve estimates of the dependent variables and the parameters in the process model. Parameter estimates are tracked and compared with predetermined confidence limits for the parameters established under normal operating conditions. Because all of the assumed fault modes can be tied directly with changes in the associated model parameter values, fault detection and diagnosis can be achieved simultaneously via parameter estimation. Parameter Estimation Procedure A lumped nonlinear system can be described by the following general set of stochastic differential equations X ( t ) = f(x(t), t ) w(t) (1) where x is the n-dimensional state vector, f is a nonlinear function of the state x ( t ) ,and w ( t )represents additive Gaussian noise that has an expected value of zero and a covariance matrix (matrix of variances and covariances) Q ( t ) . Q ( t ) is a measure of the uncertainty in the model itself. We want to estimate x ( t ) from sampled nonlinear measurements of the form Zk = hk(X(tk))+ Vk ( k 1, 2, ...) (2) where z k is an n-dimensional vector of observations, hk is a vector that may change from sampling time tk to another sampling time, and v k represents random Gaussian variables that have an expected value of zero and an associated covariance matrix Rk. Rk is a measure of the uncertainty in the measurements. Equations 1 and 2 constitute a class of estimation problems for nonlinear systems having continuous dynamics and discrete-time measurements. The Appendix lists the equations forming the extended Kalman filter to estimate the state vector. Estimation of the unknown time varying parameters can be made part of the overall state estimation procedure as follows. The system equations are of the form X d t ) = f ( x s ( t ) ,4 t h t ) (3) where x s ( t ) is a vector of n state variables and a ( t )is a vector of p parameters. Here the subscript "S" is used to distinguish the state vector from the augmented state vector which is described next. Both B ( t )and x s ( t ) are to be estimated in real-time from measurements z k that include noise and are given by the expression (k = 1, 2, ...) (4) Zk = Hsxs(t) Vk where z k is a vector of m measurements and Hs is a (nxm) matrix that gives the deterministic relations between x and z , and vk is a vector whose elements are random Gaussian noise with zero mean and associated covariance matrix R. The exact nature of the time variation of the parameters is usually unknown in fault detection and diagnosis problems. One useful method of accounting for the time variation of the parameters is to assume that the time variation in a parameter value is random in nature &(t) = W I ( t ) (5) where wl(t) is Gaussian noise with zero mean and covariance matrix Q1. If a ( t )and x s ( t ) are combined into an augmented state vector x ( t ) with the dimensions ( n + p ) , the combined equations for the system become
+
+
484
Ind. Eng. Chem. Process Des. Dev., Vol. 22, No. 3, 1983
C,V(dT/dt) = CpFCTi - T ) - AHVKOCA~exp(-E,/RT) - AoU(T- T,) (10)
Figure 2. A continuous stirred tank reactor with a heat exchanger.
The equations of the system dynamics (6) and observations (7) have the same form as the general nonlinear system equations (1)and (2); hence the extended Kalman filtering algorithm or any other effective filtering technique can be applied to achieve fault detection if Q1 is suitably determined. A rough criterion that may be used to select the elements in the covariance matrix of the process noise is that the size of the matrix elements of the noise should correspond roughly to the possible range of parameter variation. For example, if it is known that the ith element of a is likely to change by an amount of Aai over the interval of interest, At, then from Gelb (1974), the ith diagonal element of Q1 N Aa?/At. The elements of the covariance matrix for parameter estimation with time varying parameters should be seleded considering the following factors: (1)the mode of parameter variation (i.e., whether it changes rapidly or slowly); (2) the output sensitivity to the change in the parameter values; and (3) the tradeoff between the increased fluctuations in the response vs. fast convergence of the response to the change in the parameter value. High values of the diagonal elements of Q yield as fast convergence, and vice versa. Because of the complicated nature of these factors, in practice the elements of the Q matrix are usually selected by carrying out many trial runs with different values of the elements of Q to determine the optimum values of the elementa of Q matrix for the particular process model. Application: Real Time Parameter Estimation for Fault Diagnosis of a Nonlinear Well-Stirred Tank Reactor Process Treated. A continuous stirred (well-mixed tank) reactor with a heat exchanger provides a good example application of the procedure. Figure 2 shows a schematic diagram. Constant volume and ideal mixing are assumed. A second order elementary reaction occurs in the reactor: 2A B. I t is assumed that the cooling medium is steam condensate at a constant temperature, T,, of 400 K, that the density and heat capacity of the reactor inlet stream are equal to those of the outlet stream, and that the reaction rate constant is assumed to follow the Arrhenius form
-
K = KOexp(-E,/RT) (8) where KO= frequency factor (8.43 X lo8 m3/kg-mol/s), E, = activation energy of the reaction (1.040 X 108J/kg-mol), R = gas constant (2.568 X lo3 J/kg-mol K), and T = reaction temperature (K). With these assumptions, a mass balance on the reactant is VdCA/dt = FCAi - FCA = ~ V K O Cexp(-EJRT) A~ (9) where V = reactor volume (8.495 m3), CAi = inlet concentration of reactant (9.61 kg-mol/m3), CA = outlet concentration of reactant (kg-mol/m3), and F = inlet, outlet volume flow rate (0.00850 m3/s). The following differential equation represents the energy balance for this process
where pC, = volumetric heat capacity (398.1 J/m3), AH = heat of reaction (-9.30 X lo7 J/kg-mol), A. = heat exchanger surface area (46.5 m9, Ti = inlet temperature (400 K), T, = coolant temperature (400 K), and U = overall heat transfer coefficient (926 J/(m2 s K)). Possible Malfunctions Considered. Two types of fault modes were considered: (1)a decrease in the value of the heat transfer coefficient, U , caused by fouling; (2) a change in the flow rate, F , caused by partial pipe plugging. We assume the only available measurements are the outlet temperature and concentration. Estimation Problem Formulation. Then the augmented state vector is
x = (CA, T, F , U T
(11)
and the system equations are
dT _ dt
(13) dF/dt = WF(t)
(14)
dU/dt = wu(t)
(15)
where W F and wu each are uncorrelated Gaussian noise having expected values of zero and variances of uF2and uu2, respectively. We will let the initial conditions be CA = C h , T = T,, F = F,, and U = Us,where the subscript “s” means normal steady state operating value. In this example only the outlet temperature and concentration are assumed to be measured, and the measurements are assumed to contain additive Gaussian random noise, vk. Thus, the measurement equation is zk = HXk vk (16)
+
where v k is a vector whose elements have zero expected value and an associated covariance matrix R. It is convenient to write eq 12-16 in terms of dimensionless variables and parameters using normalized deviation variables, that is, in terms of deviations from the normal operating point divided by the normal value. Let the normalized deviation variables be defined as CA-CA,
’
T - T , F - F , U - U, T, ’ F, ’ U, [x1*,
xz*,
x3*,
%*IT
where CAS= 3.84 kg-mol/m3, T, = 429.5 K, F, = 0.00850 m3/s, and U, = 926 J/(m3 s K). Then the dimensionless equations are dxi*/dt = C1(~3*+ 1)- C , ( X ~ + * 1) ( X I * + 1) C3(x1* + 1)2exp(-C6/(x2* + 1)) (17) d~,*/dt=
+ 1)- C 2 ( ~ 3 *+ 1) ( ~ 2 * + 1)Cs(Xl* + 1)2exp(-C,/(x,* + 1)) (18)
C4(~3*
Ind. Eng. Chem. Process Des. Dev., Vol. 22, No. 3, 1983 485 0.1
COVARIANCE MATRIX O F MEASUREMENT ERROR
2
P < 5 0.0 0 Y
EXACT VALUE MEASURED VALUE ESTIMATED VALUE
B -0.4
I
I
I
I
0
I
1
1
I
1
1
I
I
I
1 do0
5 0
I
I
I
I
I
I
2000
1500
TIME. S E C
Figure 3. Exact, estimated, and measured concentration changes all essentially fall on the same line.
Simulation of Faults. A ramp decrease in flow rate and simultaneous ramp change in the heat transfer coefficient (see the straight line segments in Figure 4) were introduced as simulated faults. dx,*/dt = 0.2 (-u(t - 300) + ~ (- t320)) (23) dxd*/dt = 0.001 (-u(t - 200)
+~
(- t500)) (24)
where u means unit step function. Simulation of Experimental Measurements. Measurement noises u1 and u2 (having variances R1and R2 on the main diagonal of R and zero elements off the main diagonal) were generated and added to xl* and x2* as shown in eq 22a and 22b, respectively, to get the simulated measurement data zl* and z2*. We examine three cases for different R
T
x;
0.0
5 Y
n
-0.1
W
: 2
-0.2
U
n
9
-0.3
< 2 I -0.4 Y I
-0.5
0
'
'
I
1
5 0
I
'
I
I
I
1ob0
I
I
I
I 1600
'
I
I
'
2000
TIME, S E C
,
r4 x 1 0 - 6
Figure 4. Estimated parameter deviations compared with the true ones.
1
0
Fault Detection Results. An extended Kalman filter code was used to estimate the states CA and T and the parameters F and U. A trade-off exists between the sensitivity of the filter to the changes in the parameter values and the degradation of the filter performance at normal operating conditions. If small values for the elements of the Q matrix were used, the filter was insensitive to new measurements hence there was a long delay before the estimate responded to changes in the true parameters, but the amplitude of the fluctuations of the estimates around the true parameter value was small. On the other hand, when the elements of the Q matrix were large, the response of the filter was fast but sensitive to the measurement noise; hence the amplitude of the fluctuations of the estimates responded rapidly to changes in the trajectories of the true parameter values. After many trial runs with different seta of values of elements of Q, the elements of Q were fixed as follows 0
0
0 0
0
0 4x100
o 4
1
x 10-5
to give a reasonably fast response for fault detection. Because the process was assumed to be initially at its normal steady-state operating conditions, the variance of
each state estimate at the normal steady-state operating condition was used as the diagonal element of the covariance matrix of the initial conditions. When R1= R2 = 4 X lo4, the elements in R correspond to a standard deviation of 0.002 in (CA- cA,)/chand (T - T,)/Ts(in absolute terms roughly 0.0077 kg-mol/m3 in C and 0.86 A°C the corresponding standard in T). When R1= R2 = deviation is 0.01 in (CA- C A s ) / C A s and (T - Ta)/T,,Le., a factor of 5 greater. As expected, the trajectories of the two state variables (of concentration and temperature) were predicted quite closely in all cases; examine Figure 3 for concentration. We omit comparisons with the exact values here inasmuch as we are primarily interested in the trajectories of the (dimensionless) flow rate and heat transfer coefficients. Figures 4,5, and 6 compare the true values and the estimates (denoted by superscript caret) of both (U - U s ) / U sand ( F - F,)/F,when both F and U were simultaneously changed. The dispersion of the estimates increases as the level of measurement noise increases, and decreases as the level of measurement noise decreases, as expected. It is clear that simultaneous changes of two parameters can be discriminated by estimation if the proper criterion for discrimination is selected. Confidence Limits for the Parameters under Normal Operating Conditions. For the CSTR example it was assumed first that U and F were constants at the normal operating conditions in the steady state. Variances of z3*and f 4 * at the normal conditions were determined by processing the simulated measurement data during the steady state on the extended Kalman filter to get
486
Ind. Eng. Chem. Process Des. Dev., Vol. 22, No. 3, 1983
z
Var{f,*}= 1.3 x Var{P,*}= 5.0 x Var{$,*j= 2.4 x Var{P,*}= 9 . 4 x
when R =
}
when
R=
4x
:o-d
1
2 5 Y
L
a
; -0.1
\ll
Y
z
< a
If the distribution of a sample statistic is not known, as is the case for g3* and f4* here, a confidence interval for any random variable X can be specified by using the Chebyshev inequality (Himmelblau, 1970), namely the concept that the probability is at least 1 - (1/h2) of obtaining a standardized variable of value equal to or less than a number h, that is
?
0.0
I-
-0.2
Y 0
N < -0.3
0 -0.4 70100
TIME, SEC
Figure 5. Estimated parameter deviations compared with the true ones when F and U were simultaneously decreased 20% and 30'70, respectively.
For example, with h = 3, at least 1 - (1/3)2 = 89% of the occurrence of the random variable X should lie within p, f 3a, no matter what the distribution of X is. The 0.89 confidence limits for normal operation calculated by Chebyshev's rule are listed in Table I. Fault Detection and Diagnosis. Parameter estimates can be updated recursively as each set of data becomes available by the procedure shown above and compared with predetermined confidence limits established under normal operating conditions. Whenever one of the estimates of the parameters corresponding to the presumed faults exceeds the confidence limits for the parameter based on normal operating conditions, we may say that a fault is detected. In this example, because a one t o one correspondence exists between the parameters and the different fault modes, fault detection and diagnosis can be achieved simultaneously. If a one to one correspondence does not exist, the diagnosis of the specific cause of the fault is more complex. Figures 7 and 8 show the trajectories of the dimensionless estimated flow rate and heat transfer coefficients on which have been superimposed the 89% confidence limits. When R1 = R2 = 4 X lo4, for these confidence limits we can discriminate between normal and abnormal values of F (which began to change a t 300 s) at 360 s and for U (which began to change at 200 s) at 520 s. If we had used a lower probability of 0.75 corresponding to f 2 standard deviations we can diagnose the fault in F at 340 s and U at 500 s. However, for large (0.038 kg-mol/m3 in C and 4.3 A°C in T ) measurement error when R1 = R2 = with the 0.89 probability, a change in F cannot be detected until about 750 s and a change in U until 1150 s. In this simulation study we compared the results of the simulations with only a single set of measurements for each level of measurement noise. To get a better indication of the times at which faults would be detected with respect to different levels of measurement noise, one should average a large number of simulation results. Once we know the uncertainty associated with the model and the process noise, we can determine the necessary precision of measurements for satisfactory fault diagnosis for a particular application by computer simulation using different levels of measurement precision. There exists a trade-off between the benefits of early fault detection and diagnosis and the cost of instrumentation that must accompany different levels of precision.
-1 . E . I -
T
in z
P5
0.0
Y
a
: w
-0.1
<