Ind. Eng. Chem. Res. 1993, 32, 1651-1657


PROCESS DESIGN AND CONTROL

Application of Neural Networks for Gross Error Detection

Gagan Gupta and Shankar Narasimhan*

Department of Chemical Engineering, I.I.T. Kanpur, U.P., India 208016

The problem of detecting gross errors in measurements arising from faulty sensors is an important one in operating chemical plants. The problem has applications in modeling, control, optimization, and maintenance. Traditionally, statistical methods have been used for this purpose. In this study, we explore the use of artificial neural networks (ANNs) for solving this problem. Using Monte Carlo simulation, we address the following issues in applying ANNs for gross error detection: (a) the type of input/output patterns used for training and their preprocessing; (b) the parameters that affect performance; (c) the strategy used for detecting multiple gross errors. We compare the performance of the ANN with that of statistical methods on a practical example and show that ANNs offer a competing alternative method for gross error detection.

Introduction

The problem of obtaining reliable, accurate, and consistent estimates of process variables from measured data (data reconciliation) and the associated problem of detecting gross errors in measurements arising due to sensor biases (gross error detection) have grown in importance during the past 30 years (Mah, 1990). Currently several packages are available for these purposes and are being applied widely in chemical plants, especially refineries. Typically, constrained nonlinear optimization techniques combined with statistical methods are used to solve these problems. They have the disadvantage of being computationally intensive and therefore cannot be used in applications where results are required within minutes, such as on-line applications. Furthermore, the theory on which gross error detection methods are based is valid only for linear processes (involving only mass flow constraints) and may not give good results for nonlinear processes (involving energy and multicomponent constraints). As an alternative method, we explore the use of artificial neural networks (ANNs) for gross error detection. During the past few years neural networks have been used to solve a wide variety of problems in science and engineering. In chemical engineering, ANNs have been used for fault diagnosis (Venkatasubramanian et al., 1990; Hoskins and Himmelblau, 1988) and process control (Bhat and McAvoy, 1990; Ungar, 1990). Although gross error detection is similar to fault diagnosis, there has not yet been an attempt to solve it using ANNs. A further feature of the studies on the use of ANNs for fault diagnosis is that the focus has been more on training the ANN and studying different multilayer perceptron structures. Relatively less effort has been devoted to a comprehensive study of the predictive (or generalization) capabilities of the ANN. In this work, our primary objective is to demonstrate how an ANN can be used for gross error detection and how its performance can be controlled. We also conduct a detailed study of the predictive capabilities of the trained ANN when presented with new data, both when gross errors are absent and when one or more gross errors are present.

* To whom correspondence should be addressed.

For each test case 1000 different data sets containing random noise are used to assess the performance of the ANN. The issues which we have investigated are (a) the type of data to be used as input, (b) the type of training and the magnitude of the gross error to be used in training, (c) the factors that can be used to control the performance, and (d) the strategies used for detecting multiple gross errors. On the basis of the simulation studies, the above issues are resolved. Finally, we compare the results of this study with those obtained by earlier statistical techniques to demonstrate that ANNs are a competing alternative technique for gross error detection. The scope of this study is limited to the use of a single hidden layer perceptron model of the ANN for detecting gross errors in linearly constrained processes.

Gross Error Detection Problem

In general, measurements of variables made in a plant are always corrupted by random errors. These random errors are assumed to be normally distributed with mean zero and covariance matrix Q. In addition to random errors, measurements may also contain gross errors (biases) due to faulty sensors. The measurement model can thus be described by

y = x + ε + δ  (1)

where y is the n vector of measurements, x is the vector of true values of variables, ε is the vector of random errors, and δ is the vector of biases. If the measurements do not contain any gross error, then δ is identically equal to zero. In a steady-state process, the true values of variables are expected to satisfy mass and energy balance constraints. If we consider only the overall material balance constraints of a process, then the constraints are linear and can be written as

Ax = 0  (2)

where A is the m × n constraint matrix. However, because of the presence of random errors and/or gross errors, the measurements do not satisfy the constraints and give rise to imbalances, or constraint residuals, which are defined by

r = Ay  (3)

The objective of gross error detection is to identify the measurements that contain gross errors so that they can be removed. Many gross error detection schemes based on statistical tests have been developed (Mah, 1990). All of them exploit the constraint residuals in some form or the other for this purpose.
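To make eqs 1-3 concrete, the following sketch simulates one measurement set and its constraint residuals. It assumes Python with NumPy; the two-node network, flows, noise level, and bias magnitude are our own illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-node, 4-stream network (the paper's case study uses 11 nodes
# and 28 streams). Rows are node balances: +1 for inflow, -1 for outflow.
A = np.array([[1.0, -1.0,  0.0, -1.0],   # node 1: s1 in; s2, s4 out
              [0.0,  1.0, -1.0,  0.0]])  # node 2: s2 in; s3 out
x = np.array([10.0, 6.0, 6.0, 4.0])      # true flows, chosen so that A @ x = 0
sigma = 0.025 * x                        # 2.5% standard deviations, as in the case study

eps = rng.normal(0.0, sigma)             # random errors
delta = np.zeros_like(x)
delta[1] = 2.0                           # a gross error (bias) in measurement 2
y = x + eps + delta                      # measurement model, eq 1

r = A @ y                                # constraint residuals, eq 3
print(r)                                 # nonzero due to noise and the bias; A @ x itself is 0
```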

[Figure 1. Multilayer perceptron ANN: input layer, one or more hidden layers, and output layer.]

[Figure 3. Preprocessing of the input data: constraint residuals are nondimensionalised using the standard deviations of the residuals and then scaled before being fed to the ANN as inputs.]


Basic Features of ANN

We briefly describe the essential features of a multilayer perceptron ANN. A more detailed description is available in Venkatasubramanian et al. (1990). A multilayer perceptron consists of layers of interconnected processing elements (nodes or neurons), as shown in Figure 1. Typically, there is one input and one output layer, with one or more hidden layers between them. The nodes in the input layer simply transmit the inputs fed to them. The input I_j^k to node j in the hidden and output layers is given by

I_j^k = Σ_{i=1}^{N^{k-1}} w_{ij}^{k-1} O_i^{k-1}  (4)

where w_{ij}^{k-1} is the weight (or strength) of the interconnection between node i in layer (k - 1) and node j in layer k, O_i^{k-1} is the output of node i in layer (k - 1), and N^{k-1} is the number of nodes in layer (k - 1). The corresponding output from each of these nodes is given by

O_j^k = 1/(1 + exp(-β I_j^k))  (5)

where β is a constant.

[Figure 2. Representation of a node j in layer k.]

Figure 2 shows a schematic representation of a typical neuron. The number of nodes in the input layer is equal to the number of input variables, while the number of nodes in the output layer is fixed by the type of predictions required of the ANN. The number of hidden layers depends on whether the regions in the input space corresponding to different decisions are linearly separable, convex, or nonconvex (Lippman, 1987), while the number of nodes in the hidden layer depends on the number of linear hyperplanes required to separate the decision regions as well as the size and distribution of the training patterns.

To train the ANN, a set of input patterns (with known or desired outputs) is fed in succession, and the weights are adjusted iteratively such that the outputs predicted by the ANN are sufficiently close to the desired outputs. The best-known algorithm used for training is the backpropagation algorithm (Rumelhart et al., 1986).
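As a minimal sketch of eqs 4 and 5, the forward pass below propagates inputs through a fully connected perceptron with no bias terms (eq 4 contains none). The layer sizes match the case-study structure reported later (11 inputs, 28 hidden, 28 outputs), but the random weights stand in for an untrained network and are purely illustrative.

```python
import numpy as np

def sigmoid(z, beta=1.0):
    # Eq 5: node output; beta is the constant in the sigmoid
    return 1.0 / (1.0 + np.exp(-beta * z))

def forward(inputs, weights, beta=1.0):
    """One forward pass through a multilayer perceptron (eqs 4 and 5).

    `weights` is a list of N^{k-1} x N^k arrays, one per layer transition.
    """
    out = np.asarray(inputs, dtype=float)  # input-layer nodes simply transmit their inputs
    for W in weights:
        I = out @ W                        # eq 4: weighted sum over the previous layer
        out = sigmoid(I, beta)             # eq 5
    return out

rng = np.random.default_rng(1)
weights = [rng.normal(0.0, 0.1, (11, 28)),   # input -> hidden
           rng.normal(0.0, 0.1, (28, 28))]   # hidden -> output
outputs = forward(rng.random(11), weights)   # 28 output-neuron values in (0, 1)
```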

Application of ANN to Gross Error Detection

Input Data. Our objective is to develop a neural network which can detect and identify the measurements that may contain gross errors, when the input data contain zero, one, or more gross errors. Generally, in fault diagnosis and most other applications, measurements of the variables are used as inputs to the ANN (Venkatasubramanian et al., 1990; Hoskins and Himmelblau, 1988). However, in addition to the measurements, the constraints of the process provide additional information which can be effectively used in gross error detection. To utilize this information, we use constraint residuals as input data. The following advantages are obtained by using constraint residuals as inputs: (i) Constraint residuals inherently contain information about the relationships between the different variables. (ii) Typically, the number of constraints for a process is much smaller than the number of variables, so there is a reduction in the number of input variables to be handled by the ANN. (iii) While the measurements depend on the true steady-state values of the variables, the constraint residuals are independent of the true steady-state values. From eqs 1-3 we easily see that

r = A(x + ε + δ) = Ax + Aε + Aδ = Aε + Aδ  (6)

Equation 6 shows that the constraint residuals do not depend on the true steady-state values.

Preprocessing. The input data fed to the ANN have to be nondimensionalized because different types of constraints might be present in a process (for instance, mass and energy balances). Moreover, these inputs should lie between ±5.0, since the nature of the sigmoid function (eq 5) is such that the output of a node is insensitive to inputs greater than 5.0 or less than -5.0. We nondimensionalize the inputs by using the standard deviations of the constraint residuals. The standard deviation σ_i of the ith constraint residual is given by

σ_i = (Σ_j a_ij² σ_j²)^{1/2}  (7)

where σ_j is the standard deviation of the measurement error ε_j and a_ij is the (i, j) element of the constraint matrix A.

The magnitudes of the nondimensionalized constraint residuals may still be greater than 5.0. Therefore, they are all premultiplied by an appropriate scaling factor, γ, such that their magnitudes fall below 5.0. Figure 3 shows the preprocessing of the input data schematically.
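The preprocessing chain reduces to a few lines; this is our reading of eqs 3 and 7 under the assumption of independent measurement errors (diagonal Q), with absolute values of the residuals taken as adopted later in the paper.

```python
import numpy as np

def preprocess(r, A, sigma_meas, gamma):
    """Nondimensionalise and scale constraint residuals for the ANN.

    r: raw residuals (eq 3); sigma_meas: standard deviations of the
    measurement errors; gamma: scaling factor chosen to keep inputs below 5.
    """
    # eq 7 (assuming independent errors): sigma_i = sqrt(sum_j a_ij^2 sigma_j^2)
    sigma_r = np.sqrt((A ** 2) @ (sigma_meas ** 2))
    return gamma * np.abs(r) / sigma_r
```

For example, `inputs = preprocess(A @ y, A, sigma, gamma=0.16)` uses the scaling factor reported later for the homogeneously trained ANN in Table IV.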


Training Patterns. In deciding the number and type of training patterns to be used, the following questions are addressed.

(1) Should input/output patterns for every combination of gross errors present in the measurements be used in training? If training patterns corresponding to every possible combination of gross error locations are used, the number of training patterns will be 2^n, where n is the number of measurements. It would then not be possible to train the ANN in any reasonable length of time. Therefore, in our work, the training set consists of the pattern corresponding to the no-gross-error case and those corresponding to a gross error in each of the variables. It is hoped that even though the ANN is trained to recognize only single gross errors, it may still be able to correctly predict the existence of multiple gross errors. It should be noted that a similar philosophy is used in statistical methods for gross error detection, where the theory is developed for single gross errors but applied to multiple gross error prediction.

(2) Should both negative and positive magnitudes of gross errors be used in training? The ANN should be capable of recognizing both positive and negative gross errors. However, since we are interested in predicting the existence and location of an error and not its sign, we can use absolute values of the constraint residuals as inputs to the ANN. Since the absolute values of the constraint residuals are the same regardless of the sign of the gross error, only training patterns corresponding to positive magnitudes of gross errors need be used. The disadvantage of using absolute values of constraint residuals is that it may adversely affect the performance of the ANN, because the signs of the residuals also provide valuable information concerning the location of a gross error. In our work, we have used absolute values of constraint residuals as inputs because this reduces the number of training patterns and enables faster training. We present results later to show that the decrease in the performance of the ANN due to this choice is offset by the advantage gained in training the ANN quickly.

(3) What should be the magnitudes of gross errors used in training? A choice has to be made regarding the magnitude of gross error used in the training patterns. One possibility is to use a fixed magnitude of gross error for all measurements (homogeneous training). Alternatively, the gross error magnitude can be chosen as a percentage of the measurement value, which implies that the gross error magnitude used in training is different for different measurements (heterogeneous training). Since it is difficult to predict a priori which type of training will yield better results, both methods are used in our work and the results compared. Ideally, the ANN should be capable of detecting gross errors in measurements which exceed a certain magnitude. In previous work on fault diagnosis (Venkatasubramanian et al., 1990), the ANN was trained for two different magnitudes of each fault, and it was shown that any intermediate level of fault could be easily detected by the ANN. To reduce the number of training patterns, we use only one magnitude for each gross error. However, the generalization capability of the ANN for other gross error magnitudes is also studied.
On the basis of all the above considerations, it follows that the number of training patterns is equal to n + 1.

Output Values. The number of neurons in the output layer is set equal to the number of measurements.

Each output neuron corresponds to a particular measurement, and a gross error in a certain measurement is indicated by the corresponding output neuron having a value of 1. Conversely, an output value of 0 indicates that the corresponding measurement does not have any gross error. Since the output from a neuron can never be exactly 0 or 1, a convergence limit of 0.2 is used during training; that is, any value less than 0.2 is assumed to represent 0 and any value greater than 0.8 to represent 1. In summary, the above choice of input/output data fixes the number of input nodes to be equal to m and the number of output nodes to be equal to n. The number of nodes in the hidden layer is also taken to be equal to n, since this has been found to be satisfactory for rapid training of the ANN in our application.

Training the ANN. The basic data required for training (and for testing) the ANN are the measured values of all the variables. We have obtained these through simulation. To simulate these data, the true values of the variables, the standard deviations of the random errors in the measurements, and the magnitude and location of the gross error (if any) have to be specified. The simulation procedure for generating the data is described clearly in Iordache et al. (1985). For the training of the ANN, deterministic data were used, that is, data without random noise. In practical applications, if actual operating data corresponding to the different patterns are available, then they should be used for training. In the absence of such information, simulated training patterns as described here may be used.
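A sketch of the training-set construction just described: one no-gross-error pattern plus one pattern per measurement, each built from deterministic (noise-free) residuals with a homogeneous gross error magnitude. The function interface is our own; the paper reports only the procedure.

```python
import numpy as np

def training_patterns(A, sigma_meas, magnitude, gamma):
    """Build the n + 1 deterministic training patterns (homogeneous training)."""
    m, n = A.shape
    sigma_r = np.sqrt((A ** 2) @ (sigma_meas ** 2))   # eq 7
    X = [np.zeros(m)]                                 # no-gross-error pattern: zero residuals
    T = [np.zeros(n)]                                 # ... with all output targets 0
    for i in range(n):
        r = magnitude * A[:, i]      # noise-free residuals: r = A(x + delta*e_i) = delta * A e_i
        X.append(gamma * np.abs(r) / sigma_r)
        T.append(np.eye(n)[i])       # output neuron i should approach 1
    return np.array(X), np.array(T)
```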

Prediction Mode

Once the ANN has been trained, its generalization or predictive capabilities are evaluated by using test data. The test data are also obtained through simulation as described earlier. The features of the test data are as follows: (1) Random noise is added to the measurements according to the specified standard deviations of the measurement errors. (2) A specified number of gross errors are generated in each set of measurements, with the positions of the gross errors randomly chosen. (3) The gross error magnitudes are randomly chosen between 4 and 40 times the standard deviation of the measurement errors. For each chosen set of conditions, 1000 measurement sets were generated and used to evaluate the performance of the ANN.

Before we use the ANN as a predictive tool, two more issues have to be resolved. First, an unambiguous decision criterion has to be chosen, which interprets the values of the output neurons appropriately and makes a prediction. For this purpose a threshold value, k, was chosen, and a gross error in a measurement is predicted if and only if the corresponding output neuron value exceeds this threshold. Second, since the measured data may contain more than one gross error, a strategy for the prediction of multiple gross errors has to be evolved. Two different strategies are proposed below for this purpose.

Simultaneous Strategy. A straightforward strategy for detecting multiple gross errors is to predict gross errors in all those measurements whose corresponding output neurons have values greater than the threshold value. Such a criterion requires only one pass through the ANN and predicts gross errors in the measurements all at one time.

Serial Strategy. A serial strategy predicts gross errors in measurements one by one. Two variants of a serial strategy are used in our work, as described below (a code sketch of the serial compensation loop follows eq 10).


The significant feature of these strategies is that the structure of the ANN remains intact and the same trained ANN can be used over and over again.

Serial Compensation Strategy. In the serial compensation strategy proposed by Narasimhan and Mah (1987), if a gross error is predicted in some measurement, then an estimate of the gross error is computed and subtracted from that measurement. The compensated measurements are used for further detection of gross errors. The estimate of a gross error in measurement i is given by (Narasimhan and Mah, 1987)

b = [(Ae_i)^T (AQA^T)^{-1} r] / [(Ae_i)^T (AQA^T)^{-1} (Ae_i)]  (8)

The algorithm for using serial compensation in combination with the ANN is as follows:

Step 1: If all output values are less than the threshold value, then no more gross errors are predicted. Otherwise, the highest output value is chosen and a gross error is predicted in the corresponding measurement.

Step 2: The estimate of the gross error is computed using eq 8, and the compensated measurements are given by

y* = y - b e_i  (9)

Step 3: The updated constraint residuals (compensated residuals) are given by

r* = r - b A e_i  (10)

The compensated residuals are nondimensionalized and scaled as before and fed to the ANN again.

Step 4: The whole procedure is repeated until no more gross errors are predicted.

In the implementation, it is also ensured that once a gross error in some measurement is predicted, it is flagged so that gross errors in the same measurement are not predicted repeatedly for the same data set.

Serial Replacement Strategy. We note from eq 8 that the serial compensation strategy makes use of the process model in order to estimate the magnitude of a gross error. Errors in the estimated magnitude or in the process model itself can adversely affect the further detection of gross errors. We propose a slightly different variation of the serial strategy which obviates the need for estimating the magnitudes of gross errors. In this strategy, we assume that estimates of the true values of the measured variables have been provided a priori by the user. For example, these estimates could be the reconciled values obtained by applying data reconciliation (Mah, 1990) to measurements of a previous time period. Gross errors are detected serially as described in the serial compensation technique, except that, instead of compensating the measurement in which a gross error has been identified, the measurement itself is replaced by the estimate provided by the user.
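The serial compensation loop (steps 1-4, using eqs 8 and 10) can be sketched as follows. Here `predict` stands in for the trained ANN applied to preprocessed residuals; its interface, like the iteration cap, is our own assumption. The serial replacement variant would instead recompute the residuals after replacing the flagged measurement with the user-supplied estimate.

```python
import numpy as np

def serial_compensation(r, A, Q, predict, threshold=0.6, max_iter=10):
    """Return indices of measurements predicted to contain gross errors."""
    S = np.linalg.inv(A @ Q @ A.T)            # (A Q A^T)^-1, reused for every estimate
    flagged = []
    for _ in range(max_iter):
        out = np.array(predict(r), dtype=float)
        out[flagged] = 0.0                    # a flagged measurement is never re-predicted
        j = int(np.argmax(out))
        if out[j] < threshold:                # step 1: stop when no output exceeds the threshold
            break
        flagged.append(j)
        a_j = A[:, j]                         # A e_j
        b = (a_j @ S @ r) / (a_j @ S @ a_j)   # step 2, eq 8: estimate of the gross error
        r = r - b * a_j                       # step 3, eq 10: compensated residuals
    return flagged
```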

Results and Discussion

We now describe, through an example, how an ANN can be adapted for gross error detection, and we highlight the key issues that influence its training and performance. We choose the process network shown in Figure 4, which has been well studied using statistical methods (Serth and Heenan, 1986). Table I shows the true flow rates of the various streams, and Table II gives the constraint matrix A for this process. The standard deviations of the measurement errors were chosen to be 2.5% of the true flow rates.

Table I. Flow Rates of Streams in Steam-Metering Network

stream  flow rate (tons/h)   stream  flow rate (tons/h)
1       0.86                 15      60.00
2       1.00                 16      23.64
3       111.82               17      32.73
4       109.95               18      16.23
5       53.27                19      7.95
6       112.27               20      10.50
7       2.32                 21      87.27
8       164.05               22      5.45
9       0.86                 23      2.59
10      52.41                24      46.64
11      14.86                25      85.45
12      67.27                26      81.32
13      111.27               27      70.77
14      91.86                28      72.23

Table I11 summarizes the structure of the ANN and the number of training patterns for the process used in our case study. Performance Measures. The performance of the ANN is evaluated using measures that have been proposed earlier (Rosenberg et al., 1987). When the input data does not contain any gross error, the performance of the ANN can be characterized by the average number of type I errors (AVTI),which represent the average number of mispredictions made by the ANN for every data set. When the input data contain one or more gross errors, then the two measures which adequately characterize the performancearepower andselectivity of the ANN. Power represents the proportion of gross errors in the data that are correctly detected and identified, and selectivity is the ratio of the number of gross errors correctly identified to the total number of gross errors predicted by the ANN. Controllingthe Performance of the ANN. Generally a chemical plant operates without gross errors for most periods. Therefore, it becomes imperative that the ANN makes as few gross error mispredictions as possible during these periods (a low value of AVTI). However, the ANN should also have the ability to correctly predict gross errors when they are present (a high value of power). These two issues are interrelated. For instance, if AVTI is lowered by appropriate manipulation of a parameter, then it will also lower the power of the ANN. Hence some performance

Table II. Constraint Matrix for Steam-Metering Network. [An 11 × 28 incidence matrix: each row is one node balance over the 28 streams, with entries of +1 for streams entering the node, -1 for streams leaving it, and 0 otherwise.]

[Figure 4. Steam-metering process network (Serth and Heenan, 1986).]

Table 111. Structure of A N N Used in Caw Study no. of input nodes 11 no. of output nodes 28 no. of hidden layers 1 28 no. nodes in hidden layer no. training patterns 29 Table IV. Effect of Training on ANN Performance no. of type of training scaling gross training magnitude factor errors power selectivity heterogeneous 10% 0.09 1 0.16 0.48 3 0.06 0.36 homogeneous 10T/h 0.16 1 0.61 0.83 3 0.49 0.72 homogeneous 4T/h 0.07 1 0.62 0.69 3 0.53 0.63 homogeneous 20T/h 0.35 1 0.60 0.83 3 0.44 0.76

Controlling the Performance of the ANN. Generally, a chemical plant operates without gross errors for most periods. Therefore, it becomes imperative that the ANN make as few gross error mispredictions as possible during these periods (a low value of AVTI). However, the ANN should also have the ability to correctly predict gross errors when they are present (a high value of power). These two issues are interrelated: for instance, if AVTI is lowered by appropriate manipulation of a parameter, the power of the ANN is lowered as well. Hence some performance level has to be chosen which represents a suitable tradeoff. We have chosen an AVTI value of 0.1 as a suitable performance level; that is, if the ANN is fed 10 data sets which do not contain gross errors, then on average it mispredicts a gross error in one of the measurements. This also serves as a convenient basis for comparing the performance of the ANN under different conditions. All comparisons are made after ensuring that the AVTI value is 0.1.

In statistical hypothesis testing, the level of significance is used to control the probability of a type I error. For an ANN we require similar parameters in order to control the AVTI value. Two such convenient parameters are the threshold value and the scaling factor, γ, used in preprocessing the input data. Figure 5 shows the variation of AVTI with the threshold value and scaling factor for an ANN trained homogeneously using a gross error magnitude of 10 tons/h, with the serial compensation strategy used for multiple gross error detection. As the threshold value increases, AVTI decreases, which is to be expected. The value of AVTI also decreases as the scaling factor decreases. This is easily explained when we consider that lowering the value of γ attenuates all the inputs (constraint residuals). Since a zero value of all constraint residuals corresponds to the absence of gross errors, a very low value of γ implies that fewer mispredictions will be made by the ANN.

Either of the above two parameters can be used for controlling the AVTI value. For all our subsequent results, we have kept the threshold fixed at 0.6 and varied γ until we obtain an AVTI of 0.1 (a calibration loop of this kind is sketched below). Other combinations of the threshold value and scaling factor which also give an AVTI value of 0.1 may be used.
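A bisection on γ, with the threshold held at 0.6, might look as follows. Here `run_avti` is a hypothetical driver that evaluates the trained ANN on gross-error-free simulated data sets and returns the AVTI; the search relies on AVTI increasing monotonically with γ, as discussed above.

```python
def calibrate_gamma(run_avti, target=0.1, lo=0.01, hi=1.0, tol=0.005, max_iter=30):
    """Bisect on the scaling factor gamma until AVTI is close to `target`."""
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        avti = run_avti(mid)
        if abs(avti - target) < tol:
            return mid
        if avti > target:
            hi = mid      # too many mispredictions: attenuate the inputs more
        else:
            lo = mid
    return 0.5 * (lo + hi)
```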

[Figure 5. Variation of AVTI with scaling factor and threshold value.]
Our experience in applying an ANN to this problem shows that the performance of the ANN is not significantly affected by which particular combination of values is used.

Effect of Training on Performance. The type of training, homogeneous or heterogeneous, significantly affects the performance of an ANN. The first two rows of Table IV show the performance of two ANNs, one heterogeneously trained using a gross error magnitude of 10% of the flow rate, and the other homogeneously trained using a gross error magnitude of 10 tons/h for every measurement. All the results in this table were obtained by using the serial compensation strategy for the ANNs; thus, the variation in performance is solely due to the training procedure. It is evident that homogeneous training results in much better power and selectivity than heterogeneous training. On closer examination, it was found that a heterogeneously trained ANN makes a considerable number of gross error mispredictions in measurements with low flow rates, such as 1, 2, 7, 9, and 23. This is because a heterogeneously trained ANN is taught to recognize gross errors of relatively smaller magnitudes in smaller flow rate measurements; random noise in any of the larger flow rate measurements then confounds its predictions. On the other hand, a homogeneously trained ANN is not sensitive to gross errors of small magnitudes in low flow rate measurements and makes fewer mispredictions. In fact, the results showed that the homogeneously trained ANN does not detect gross errors in measurements 1, 2, 7, 9, and 23. Basically, the homogeneously trained ANN is able to give a better overall performance by sacrificing its ability to detect gross errors in low flow rate measurements.

Table V. Effect of Multiple Gross Error Detection Strategy on ANN Performance

                              scaling  no. of
type of strategy              factor   gross errors  power  selectivity
serial compensation strategy  0.16     1             0.61   0.83
                                       3             0.49   0.72
                                       5             0.37   0.66
serial replacement, case 1    0.16     1             0.61   0.80
                                       3             0.54   0.78
                                       5             0.40   0.74
serial replacement, case 2    0.16     1             0.61   0.81
                                       3             0.54   0.78
                                       5             0.40   0.74
serial replacement, case 3    0.15     1             0.61   0.80
                                       3             0.50   0.75
                                       5             0.37   0.71
serial replacement, case 4    0.14     1             0.61   0.69
                                       3             0.46   0.67
                                       5             0.33   0.66
simultaneous strategy         0.15     1             0.49   0.72
                                       3             0.35   0.53
                                       5             0.21   0.56

It was also observed that while homogeneous training required about 1400 iterations, heterogeneous training required about 6000 iterations. This is because the input data for all nodes in homogeneous training are of the same order of magnitude, while those for heterogeneous training span about three orders of magnitude (since the flow rates also span three orders of magnitude, as observed from Table I). On the basis of the above results, we rule out heterogeneous training from further consideration.

The training magnitude used in homogeneous training does not have a major impact on the performance. Rows 2-4 of Table IV present the results of ANNs trained using gross error magnitudes of 4, 10, and 20 tons/h, respectively. A lower training magnitude gives a higher power, but this gain is nullified by a decrease in selectivity. The implication of this result is that we can choose any convenient gross error magnitude for training. All of our subsequent comparisons are made for ANNs trained homogeneously using a gross error magnitude of 10 tons/h.

Effect of Multiple Gross Error Detection Strategy. The type of multiple gross error detection strategy employed affects the performance of an ANN significantly. Table V presents the results for several identically trained ANNs (homogeneously trained, using a magnitude of 10 tons/h) which use different multiple gross error detection strategies. In Table V, the first row presents the results when the serial compensation technique is used. The next four rows give the results obtained by using a serial replacement strategy. The estimates of the variables used for replacing measurements in which gross errors are identified are as follows:

Case 1: The true values of the variables are used as estimates. Although true values will, in general, not be available, the results for this case give the theoretical performance limit of the ANN when a serial replacement strategy is used.

Cases 2-4: In all these cases, reconciled values of the variables are used as estimates. The reconciled values for case 2 are obtained by reconciling measurements that contain 2.5% noise. For cases 3 and 4, the reconciled values are obtained, respectively, from measurements that contain, in addition to 2.5% noise, 3 and 7 gross errors of magnitudes between 4 and 40 times the standard deviation of the measurement errors.

Table VI. Effect of Using Signs of Constraint Residuals on ANN Performance

type of      training   scaling  no. of
training     magnitude  factor   gross errors  power  selectivity
homogeneous  10 tons/h  0.16     1             0.61   0.84
                                 3             0.56   0.86
                                 5             0.50   0.84

It can be observed from the results of Table V that the serial strategies give better power and selectivity than the simultaneous one, and this difference is amplified as the number of gross errors in the measurements increases. Comparing the serial compensation technique with the best serial replacement strategy (case 1), we observe that, due to inaccurate estimates of the gross error magnitudes, serial compensation gives about 5% less power and 6-10% less selectivity. As expected, we can observe from rows 2-5 that the performance of serial replacement decreases as the inaccuracy of the estimates used for replacement increases. The results of serial replacement for case 3 are comparable to, and those of case 4 are worse than, those of serial compensation. This indicates that serial replacement is better than serial compensation as long as the estimates used in serial replacement are obtained from "good" measurements which contain few gross errors (less than four gross errors in our case).

Effect of Using Absolute Values of Constraint Residuals. In all of the above simulations, absolute values of the constraint residuals have been used as inputs to the ANN. As discussed in the section on the generation of training patterns, this was adopted primarily to reduce the number of training patterns and, hence, enable fast training. Since the signs of the constraint residuals can also provide additional information on the location of gross errors, the use of absolute values may have reduced the performance of the ANN. To evaluate this loss in performance, we also attempted to train an ANN homogeneously using the normalized, scaled residuals as such. In this case, training patterns corresponding to both positive and negative magnitudes of gross errors in each of the measurements have to be used. Despite using a two-hidden-layer ANN and training for 50 000 iterations, we could not achieve convergence in training. On the other hand, when training patterns corresponding to only positive magnitudes of gross errors were used, we could train a single-hidden-layer ANN as before in 2200 iterations. This ANN was tested with 1000 simulated patterns as before, containing gross errors of positive magnitudes only, between 4 and 40 times the standard deviation of the measurement errors. The serial compensation strategy was used for multiple gross error detection. Table VI shows the results of this simulation. Comparing these results with those of a homogeneously trained ANN using absolute values of constraint residuals (row 1 of Table V), we find an improvement of about 10% in power and about 15% in selectivity. Although this improvement is significant and indicates that the signs of the constraint residuals are useful in training, it will be of value only if it is possible to train the ANN for patterns containing both positive and negative magnitudes of gross errors in an acceptable length of time.

Comparison of ANN Performance with Statistical Methods. Statistical methods for gross error detection have been in use for quite some time. Of the various techniques available, probably the best ones are those based on the measurement test (Mah and Tamhane, 1982) which use serial strategies. Two such methods are the

Table VII. Performance of GLR Method

no. of gross errors  power  selectivity
1                    0.70   0.85
3                    0.61   0.76
5                    0.53   0.68

iterative measurement test (Serth and Heenan, 1986) and the generalized likelihood ratio (GLR) method (Narasimhan and Mah, 1987). The performance of the GLR method, which uses a serial compensation strategy, for the same case study is given in Table VII. It was ensured that the AVTI of the GLR method was close to 0.1. Comparing the results with those of the ANN which also uses a serial compensation strategy (row 1 of Table V), it is observed that the power of the GLR method is 10-15% higher than that of the ANN, although the selectivities are comparable. If we compare with the ANN which is trained using the signs of constraint residuals as well (Table VI), we observe that the power of the GLR method is still higher by about 5-10%, but its selectivity is lower by about 15%. Interestingly, it was observed that even the GLR method (like the homogeneously trained ANN) is not able to detect gross errors in streams 1, 2, 7, 9, and 23.

The main results from the above study can be summarized as follows: (1) Normalized constraint residuals should be used as inputs to the ANN for gross error detection. For faster training, the absolute values of these residuals can be used as inputs. However, it is preferable to utilize the signs of the constraint residuals in training, provided the ANN can be trained for both positive and negative magnitudes of gross errors. (2) The ANN should be trained homogeneously. Training needs to be done for only one conveniently chosen magnitude of the gross error. (3) If good estimates of the variables are available, then a serial replacement strategy can be used for multiple gross error detection. If no such estimates are available, then a serial compensation strategy is recommended.

Conclusion

The basic issues involved in the use of ANNs for gross error detection have been suitably addressed, albeit for linearly constrained processes. The extension to nonlinear processes can be carried out on similar lines and is currently under investigation. Since ANNs are inherently nonlinear, it is hoped that they will perform better than statistical methods, which rely on linearization of nonlinear processes.

Nomenclature

A: constraint matrix
b: estimated magnitude of gross error
e_i: unit vector i
I_j^k: input to node j in layer k
n: number of measurements
N^k: number of nodes in layer k
O_j^k: output of node j in layer k
Q: covariance matrix of measurement errors
r: constraint residuals
w_ij^k: weight of interconnection between node i in layer k and node j in layer k + 1
x: true values of variables
y: measurements

Greek Symbols

β: factor in sigmoid function (eq 5)
γ: scaling factor
δ: gross error magnitudes
ε: random measurement errors
σ_i: standard deviation of constraint residual i
σ_j: standard deviation of measurement error j
Acknowledgment

The authors thank Dr. S. Pushpavanam of the Department of Chemical Engineering at IIT Kanpur for his valuable suggestions.

Literature Cited

Bhat, N.; McAvoy, T. J. Use of Neural Nets for Dynamic Modeling and Control of Chemical Process Systems. Comput. Chem. Eng. 1990, 14, 573.

Hoskins, J. C.; Himmelblau, D. M. Fault Detection and Diagnosis Using Artificial Neural Networks. In Artificial Intelligence in Process Engineering; Mavrovouniotis, M. L., Ed.; Academic Press: San Diego, 1988; pp 123-159.

Iordache, C.; Mah, R. S. H.; Tamhane, A. C. Performance Studies of the Measurement Test for Detection of Gross Errors in Process Data. AIChE J. 1985, 31, 1187.

Lippman, R. P. An Introduction to Computing with Neural Nets. IEEE ASSP Mag. 1987, 4, 4.

Mah, R. S. H. Chemical Process Structures and Information Flows; Butterworth: Stoneham, MA, 1990.

Mah, R. S. H.; Tamhane, A. C. Detection of Gross Errors in Process Data. AIChE J. 1982, 28, 828.

Narasimhan, S.; Mah, R. S. H. Generalized Likelihood Ratio Method for Gross Error Identification. AIChE J. 1987, 33, 1514.

Rosenberg, J.; Mah, R. S. H.; Iordache, C. Evaluation of Schemes for Detecting and Identifying Gross Errors in Process Data. Ind. Eng. Chem. Process Des. Dev. 1987, 26, 555.

Rumelhart, D. E.; Hinton, G. E.; Williams, R. J. Learning Representations by Back-Propagating Errors. Nature 1986, 323, 533.

Serth, R. W.; Heenan, W. A. Gross Error Detection and Data Reconciliation in Steam-Metering Systems. AIChE J. 1986, 32, 733.

Ungar, L. H.; Powell, B. A.; Kamens, S. N. Adaptive Networks for Fault Diagnosis and Process Control. Comput. Chem. Eng. 1990, 14, 561.

Venkatasubramanian, V.; Vaidyanathan, R.; Yamamoto, Y. Process Fault Detection and Diagnosis Using Artificial Neural Networks-I. Steady State Processes. Comput. Chem. Eng. 1990, 14, 699.

Received for review November 10, 1992. Revised manuscript received April 29, 1993. Accepted May 13, 1993.