Ind. Eng. Chem. Res. 1993, 32, 3020-3028
PROCESS DESIGN AND CONTROL

Data Rectification and Gross Error Detection in a Steady-State Process via Artificial Neural Networks

Patricia A. Terry and David M. Himmelblau*
Department of Chemical Engineering, The University of Texas, Austin, Texas 78712
One of the many problems engineers face is that of identifying and eliminating gross errors from measured data, and rectifying collected data to satisfy process constraints such as the mass and energy balances that describe a process. While it is possible to use statistical methods coupled with error reduction techniques to rectify data, the strategy must be carried out iteratively in many steps. Artificial neural networks (ANN), being composed of basis functions, yield excellent models and can be trained to rectify data. We demonstrate the application of an ANN to rectify simulated measurements obtained from a steady-state heat exchanger. Both random and gross errors added to the simulated measurements were successfully rectified. A comparison was made of the application of ANN with rectification by constrained least squares via nonlinear programming, and the ANN treatment proved to be superior. We conclude that the use of ANN appears to be a promising tool for data rectification.
Introduction
When flawed data are used for cost accounting, production planning, design, process control, optimization, and fault detection and diagnosis, the result can be suboptimal performance and even unsafe operation. Data rectification can to some extent ameliorate the level of data corruption. Data rectification can be viewed as a broader concept than data reconciliation. The latter usually refers to the adjustment of the values of process measurements to compensate solely for random errors in the measurements; the assumption is that the "true values" involved in the material and energy balances are corrupted only by measurement noise. Data rectification, on the other hand, involves adjustment not only for measurement noise but also for gross errors, missing measurements, trends from the steady state, and possibly biases (long-run deviations from normality). Gross errors, as we have used the term here, are deemed to be randomly occurring, relatively large deviations (relative to the random errors about the usual expected value). Gross errors occur for one or a very few consecutive observations, but do not occur continuously. Given measurements from a process (and a belief in the validity of the model), the data rectification problem can be posed as a nonlinear programming problem: minimize an error criterion subject to a set of constraints. Among the constraints is the process model, usually a model based on first principles.

In this article we discuss one aspect of rectification that can be treated successfully via artificial neural networks, namely gross error detection and removal from data. It turns out that not only can gross errors be detected, but they can be removed during rectification without sequentially removing them one at a time and reprocessing the remaining data. We first briefly review the state of data rectification, then describe the process (a heat exchanger) and the data involved in our example, explain the type of neural network used, and finally show the results of processing the data in an artificial neural network for various conditions.
Methods of Data Rectification

A substantial literature of hundreds of articles exists, dating back at least 30 years, treating the subject of data reconciliation and rectification, mostly for steady-state processes. The usual procedure is to first remove gross errors and then reconcile the data that presumably now contain only random components. The existence of gross errors in the measurements usually has been detected via an assortment of statistical tests which are based on assumptions frequently violated by real plant data. Examples of such tests include those of Ripps (1965), Hogg and Tanis (1977), Knepper and Gorman (1980), Stanley and Mah (1981), Mah and Tamhane (1982), Voller and Planitz (1983), Iordache, Mah, and Tamhane (1985), Tamhane and Mah (1985), Crowe (1986, 1992), Serth and Heenan (1986), Rosenberg et al. (1987), Narasimhan and Mah (1989), Kang (1990), Kao, Tamhane, and Mah (1990), Rollins and Davis (1990), Darouach and Zasadzinski (1991), Kao et al. (1992), Liebman et al. (1992), and Rollins and Davis (1992). Tjoa and Biegler (1991) maximized the sum of two likelihood functions, one for random error and one for gross error.

The key to valid data rectification is the employment of a good model of the process. If the model does not faithfully represent the process, then the rectified data will be distorted by model mismatch. While the material and energy balances usually used for process models do devolve from first principles, often so many simplifying assumptions have been made in their development that they are no better than strictly empirical models in representing a process. Thus, nonparametric model building to get a good model can be very competitive with the use of approximate models based on first principles. Table I lists a number of techniques of nonparametric model building. Artificial neural networks (ANN) are an additional method that has proved to be advantageous in many cases of process analysis, control, and fault detection. As mentioned in the Introduction, the mathematical statement of the rectification problem is
Table I. Proposed Techniques for Nonparametric Model Building

    method                                       reference
    recursive partitioning (CART)                Breiman et al. (1984)
    Π method                                     Breiman (1991)
    wavelets                                     Daubechies (1988); Mallat (1989)
    Parzen windows and nearest-neighbor rules    Duda and Hart (1973); Härdle (1990)
    multiadaptive regression splines (MARS)      Friedman (1991)
    projection pursuit                           Friedman and Stuetzle (1981); Huber (1985)
    method of sieves                             Grenander (1981)
    additive regression                          Stone (1985)
    regularization methods                       Wahba (1982)

Figure 1. The methanol/water heat exchanger: methanol enters at 5500 kg/h and 60 °C; cooling water enters at 8200 kg/h and 7 °C.
    minimize    F = (y − ŷ)ᵀ W (y − ŷ)        (1a)
    subject to  y = h(x) + e
                f(x) = 0                      (1b)
                g(x) ≥ 0                      (1c)

where y = vector of measurements (both the inputs and the outputs of the process), ŷ = vector of predicted measurements (both inputs and outputs), W = weighting matrix (possibly the identity matrix) for the measurements, x = vector of "true" values of the input and output variables, f(x) = process model, h(x) = relation between the "true" values and the measured values of the variables, e = vector of random and other types of errors, and g(x) = possible inequality constraints and bounds on x.

Other objective functions might well be used which would weight the measurements differently than exhibited in relation 1a. For example, the L1 norm would weight outliers less strongly than the L2 norm. Also, a penalty term could be added to the function in (1a) to smooth the rectification, such as the norm of the first or second derivatives of f(x); again, this step would affect the relative weighting of the measurements. Equation 1b (and 1c) represents the model of the process. With an exact model and "true" values of the process variables, eq 1b should be satisfied exactly. With only estimates for x, namely x̂, and with f(x) only known approximately, say f̂(x), one can satisfy ‖f̂(x̂)‖ < α, where α is some tolerance measuring accuracy. However, the result may not be valid if f̂(x) is not close to f(x) and x̂ is not a valid estimate of x.

The problem with data rectification is that f(x) or x is never known for real process data. Thus, what an analyst can do is to start with a true model, add errors to the deterministic inputs and outputs, estimate x̂, and compare x̂ with the known x. The analyst also will compute the reduction in variance in the residuals resulting from the minimization compared with the initial variance in the data before rectification. If such a procedure proves to yield good results for a large number of cases in which f(x) is known, then one might expect to get valid results for cases in which f(x) is not known, as long as one can formulate a reasonable approximation of f(x). The problem with this procedure is that in simulations f(x) is almost always used in the minimization of (1), not some f̂(x) as would have to be the case for a real process. Thus, more favorable results are obtained than warranted. To improve on the general concept as outlined, we are led to the possible use of artificial neural networks to represent f(x) in rectification, because ANN form excellent models. Data rectification using artificial neural networks constitutes the subject of the remainder of this article.
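For concreteness, problem (1) can be prototyped directly. The sketch below is ours, not part of the original study: it poses the weighted least-squares reconciliation with the two energy balances of the next section as f(x) = 0, and uses SciPy's SLSQP solver as a stand-in for the NPSOL code the authors employed. The heat capacity values and the treatment of h(x) as the identity are illustrative assumptions.

```python
# A minimal sketch of problem (1), assuming h(x) = identity (y = x + e)
# and SLSQP in place of NPSOL. x = [MeOH flow, H2O flow, MeOH Tin,
# H2O Tin, MeOH Tout, H2O Tout]; units must simply be consistent.
import numpy as np
from scipy.optimize import minimize

CP_MEOH, CP_H2O = 2.84, 4.19   # kJ/(kg K); illustrative heat capacities
U, A = 7500.0, 16.7            # exchanger coefficient and area from the paper

def balances(x):
    """f(x) = 0: the two energy balances (2a) and (2b)."""
    m_me, m_w, t_me_in, t_w_in, t_me_out, t_w_out = x
    q_me = m_me * CP_MEOH * (t_me_in - t_me_out)   # duty released by methanol
    q_w = m_w * CP_H2O * (t_w_out - t_w_in)        # duty absorbed by water
    dt1, dt2 = t_me_in - t_w_out, t_me_out - t_w_in
    lmtd = (dt1 - dt2) / np.log(dt1 / dt2)         # log mean temperature difference
    return np.array([q_me - q_w, q_me - U * A * lmtd])

def rectify(y, w):
    """Minimize (y - x)' W (y - x) subject to f(x) = 0, starting from y."""
    res = minimize(lambda x: np.dot(w * (y - x), y - x), y,
                   method="SLSQP", constraints={"type": "eq", "fun": balances})
    return res.x
```

Each measurement vector is reconciled independently; the weights w would normally be taken as the reciprocals of the measurement error variances.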
The Example Model and Simulation of Data

To ascertain whether or not an ANN could satisfactorily rectify steady-state process data, we needed to know the "true" values of the measured variables. Hence, we used the Simulation Sciences code Hextran for a methanol/water heat exchanger operating under steady-state conditions. Methanol at a rate of approximately 5500 kg/h and 60 °C was cooled by water entering at approximately 8200 kg/h and 7 °C. Both the overall heat-transfer coefficient and the exchanger area were maintained constant at 7500 kJ/(m² h °C) and 16.7 m², respectively. The two energy balance equations describing the system shown in Figure 1 are
    (m C_p ΔT)_MeOH − (m C_p ΔT)_H2O = 0        (2a)
    (m C_p ΔT)_MeOH − U A ΔT_lm = 0             (2b)

where m = mass flow rate, C_p = heat capacity, ΔT = temperature change of the stream, ΔT_lm = log mean temperature difference, A = exchanger area, and U = overall heat transfer coefficient. Although there were eight variables associated with the exchanger, only six were deemed to be "measured" variables, namely the four temperatures and the two input flow rates. The values of the six variables were selected from a case study in the Hextran code to be the deterministic base case (denoted by the symbol x_i). Normal random noise with a coefficient of variation of 0.005 was then added to the four base case input variables to represent noise in the inputs to the exchanger. In some similar studies we used a coefficient of variation of 0.04 for the random noise, but the rectification results were similar. In an ANN, noise acts as a smoothing term added to the objective function, so that the bias of the estimates of the true values of the variables increases somewhat but the overall mean square error is reduced. These values (without gross errors) became inputs to the Hextran code to yield values for the two output temperatures. Normal random noise was then added to the output temperatures to represent measurement noise. As a result, we generated 189 different pairs of input-output vectors that comprised the training set (compared with 102 weights in the network) and a set of 27 additional values to be used for the test set.

To represent gross errors in the data (in addition to the random errors), a second training set of 189 values and a test set of 27 values were created with additive outliers included in about 10-15% of the simulated methanol input flow rates:

    y_i = x_i + e_i* + K_i

where e_i* represents the additive random error and K_i represents the gross error. All the gross errors were assumed to be due to instrument error. The amplitude of the gross errors ranged randomly between 7 and 15% of the methanol flow rate.
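A sketch of this measurement simulation (ours, not part of the original study; the exact fraction of corrupted cases and the random sign of the gross errors are assumptions) might look as follows. The two output temperatures would then come from the Hextran simulation before their own noise is added.

```python
# Sketch of y_i = x_i + e_i* + K_i for the four input variables: normal
# noise with coefficient of variation 0.005 on every variable, plus a
# gross error of 7-15% of the MeOH flow in roughly 10-15% of the cases.
import numpy as np

rng = np.random.default_rng(0)
x_base = np.array([12060.0, 18000.0, 140.0, 45.0])  # MeOH F, H2O F, MeOH Tin, H2O Tin

def simulate_inputs(n_cases, gross_fraction=0.12):
    y = x_base + rng.normal(0.0, 0.005 * x_base, size=(n_cases, 4))  # e_i*
    hit = rng.random(n_cases) < gross_fraction          # cases that receive K_i
    k = rng.uniform(0.07, 0.15, n_cases) * x_base[0]    # 7-15% of the MeOH flow
    y[:, 0] += np.where(hit, k * rng.choice([-1.0, 1.0], n_cases), 0.0)
    return y

train_inputs = simulate_inputs(189)  # the paper's training set size
test_inputs = simulate_inputs(27)    # the paper's test set size
```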
Artificial Neural Networks

Although artificial neural networks (ANN) have been inspired by research on the brain and living organisms, the ANN we currently use are far simpler structures. They are really networks of basis functions. Feedforward networks theoretically can represent functions to any desired degree of accuracy if enough nodes are used, but with current computer capacity, networks have to be truncated so that only approximate representations of processes can be achieved. The questions examined here are (1) are the approximations good enough and (2) are they better than approximate first-principles models for data rectification? The procedure used is quite general for steady-state processes because a process can be modeled by an ANN using input-output data. Adding the output variables to the input of an ANN and adding input variables to the output of an ANN will not disturb the fundamental modeling that takes place.

Most artificial neural networks used by chemical engineers follow a similar framework of design that includes the following major features: (1) a set of processing units called nodes or transfer functions that execute an input-output mapping; (2) a pattern of connectivity between the nodes; (3) weights associated with each connection; (4) a rule for combining inputs to a unit with the unit's state to produce a new output; (5) a rule for sending patterns of signals through the network; (6) a learning rule (to adjust the values of the weights); (7) data pairs (input-output pairs used to adjust the weights).
With respect to the "learning" or "training" phase of the ANN, these words simply mean adjusting the weights associated with each connection between the nodes in the net to meet some performance criterion. Presented in these words, one can see that all that is involved in obtaining the weights is an unconstrained optimization of a complex nonlinear function with the weights as the independent variables. In our work, the criterion (objective function) was relation 1a. Although we used back propagation, a form of steepest descent, to determine the best values of the weights, any nonlinear programming algorithm or structured random search such as simulated annealing could be used to carry out the minimization. In the back propagation algorithm, the updating formula for the weights is obtained by differentiating the least squares objective function with respect to the weights; after some algebra and application of the chain rule, the following is obtained:

    Δw_ji^(n+1) = η(−∂E/∂w_ji) + α Δw_ji^(n)
where Δw_ji^(n+1) is the weight correction from iteration n to (n+1); η is the "learning rate," a proportionality coefficient (not necessarily constant); ∂E/∂w_ji is the derivative of the objective function with respect to weight w_ji; and α is a proportionality coefficient that multiplies the previous weight change in the second term (called the "momentum"), a term that helps damp out oscillations in the weight change Δw_ji^(n).

A variety of feedforward artificial neural network architectures were tested empirically to get the network architecture that predicted best. We used six input and six output nodes corresponding to the six measurements in the heat exchanger. To get the number of nodes in the hidden layer, we tested single-hidden-layer networks with 4-12 hidden nodes and networks with 2 hidden layers. We scaled all the input data to range from -0.9 to +0.9. Most of the scaling of the data was done linearly, but one group of networks was trained and tested with nonlinearly scaled data. Both the logistic function and the Gaussian function were used as transfer functions for the nodes during the training. Overall mean square errors achieved during the learning phase with logistic functions were higher by a factor of about 1.5. We finally settled on a net with six input nodes corresponding to the measurements, six output nodes corresponding to the rectified measurements, and one hidden layer of eight nodes, because the overall mean square errors obtained by using networks with two hidden layers were no better than those obtained from a single hidden layer for the test data. This type of architecture is not the only one that might be used. For example, Kramer (1992) joined two feedforward nets with a "bottleneck" layer, and Karjala and Himmelblau (1992) used an Elman net.

The HNC Corp. neurosoftware simulator Netset was used to train and test the neural networks. Training progress was measured by the overall mean square error of the scaled data, and training was considered to be complete when the overall mean square error became asymptotic for at least 200 passes of the training set through the network. Both the learning rate and the momentum used to train the network were very low, and were adjusted throughout the learning phase. Typically, learning rates for the hidden and output layers were initially set at 0.15 and 0.10, respectively. By the end of training they would have been lowered to 0.05 and 0.03, respectively. Momentum values were generally maintained at approximately twice the learning rates. Supervised learning for feedforward ANN requires targets, and in real life the targets for the data are, of course, not prespecified.
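The update rule and the 6-8-6 architecture can be written out in a few lines. The following sketch is ours, not the paper's Netset implementation: it uses logistic nodes throughout, the initial learning rates and momentum values quoted above, and assumes targets already scaled to (0, 1); bias terms are omitted for brevity.

```python
# Sketch of pattern-by-pattern back propagation with momentum for a
# 6-8-6 single-hidden-layer net with logistic transfer functions.
import numpy as np

rng = np.random.default_rng(1)
w_h = rng.uniform(-0.1, 0.1, (6, 8))   # input-to-hidden weights
w_o = rng.uniform(-0.1, 0.1, (8, 6))   # hidden-to-output weights
dw_h_prev = np.zeros_like(w_h)
dw_o_prev = np.zeros_like(w_o)
eta_h, eta_o = 0.15, 0.10              # initial learning rates from the paper
alpha_h, alpha_o = 0.30, 0.20          # momentum ~ twice the learning rates

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, target):
    """One update of both weight layers for a single training pair."""
    global dw_h_prev, dw_o_prev
    h = logistic(x @ w_h)                      # hidden activations
    y = logistic(h @ w_o)                      # network outputs
    # Chain rule for E = 0.5 * sum((y - target)**2) through logistic nodes
    delta_o = (y - target) * y * (1.0 - y)
    delta_h = (delta_o @ w_o.T) * h * (1.0 - h)
    # The update rule above: dw(n+1) = eta * (-dE/dw) + alpha * dw(n)
    dw_o = -eta_o * np.outer(h, delta_o) + alpha_o * dw_o_prev
    dw_h = -eta_h * np.outer(x, delta_h) + alpha_h * dw_h_prev
    w_o += dw_o
    w_h += dw_h
    dw_o_prev, dw_h_prev = dw_o, dw_h
```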
Table II. Comparison of Two Targets in Training: (1) True Value Used as the Target Value and (2) Iterative Estimate of True Value Used as the Target

                             true value used as target    iterative method used to get target
                base case    est of          std          est of          std
    variable    (determ.)    true value      dev          true value      dev
    MeOH F_in   12060.00     12077.34        87.462       12080.46        93.114
    H2O F_in    18000.00     18017.55        105.720      18015.67        100.917
    MeOH T_in   140.00       139.87          0.783        140.15          0.801
    H2O T_in    45.00        45.06           0.338        45.10           0.343
    MeOH T_out  54.54        54.56           0.217        54.53           0.206
    H2O T_out   87.60        87.57           0.385        87.60           0.401
If simulated data are used, the "true" values of the process variables are known and might be used as targets, but such a procedure is not feasible with real data. We decided that in practice one might develop targets iteratively by using the rectified values of the variables from the first stage of calculations as the targets for a second stage of rectification, and so on successively until the error level in F in relation 1a and the tolerances for satisfying eqs 1b and 1c were met. What actually governs termination of such a procedure is not the results of the training phase but those of the testing phase of network development (which we executed after each stage of training).

Two different training modes were compared: one in which the target values were the true value of each variable, and the other an iterative procedure in which the rectified values were used as target values stage by stage. In the iterative method, on the first stage of training, the targets were just the average values of the measurements for the entire training set. After the training was complete, the training set was processed through the network to obtain rectified values of the six variables. The average values of these "rectified" variables were then used to train the network a second time, with the measurements still being the inputs. The saved weights for each training stage were used as the initial weights for the next training stage. This procedure was continued iteratively until the mean square error of the rectified scaled variables remained unchanged; the scheme is sketched in code below.

Table II lists the outcome of rectification by the net for data with gross errors in the methanol flow rates plus random errors in all of the variables. No difference exists in the results as far as the estimated means and standard deviations of the variables are concerned. Because it took six to eight iterations to reach a stable value for the error in F in eq 1a, to save time the results shown here all involved the use of the true value of a variable for the target value. However, we checked occasionally to ascertain that our conclusion about target selection was not violated. We also checked the energy balances 2a and 2b to make sure that they were not violated (if they had been, we would have had to reduce the allowable error in F and train more extensively).
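In outline, the iterative-target scheme reduces to the following sketch, where train and predict are assumed stand-ins for one Netset training stage (weights carrying over between stages) and a forward pass, respectively; the tolerance and stage limit are illustrative.

```python
# Sketch of the iterative-target training procedure described above.
import numpy as np

def iterative_target_training(train, predict, measurements, tol=1e-8, max_stages=10):
    """train(inputs, targets) updates the net in place; predict(inputs)
    returns the net's rectified outputs. First-stage targets are the
    column means of the measurements."""
    targets = np.tile(measurements.mean(axis=0), (len(measurements), 1))
    prev = np.inf
    for _ in range(max_stages):              # six to eight stages sufficed in the paper
        train(measurements, targets)
        rectified = predict(measurements)
        mse = np.mean((rectified - measurements) ** 2)
        if abs(prev - mse) < tol:            # stop when the MSE stabilizes
            break
        prev = mse
        targets = np.tile(rectified.mean(axis=0), (len(measurements), 1))
    return predict(measurements)
```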
Results and Evaluation

Two identical networks were trained, one on the data set with only random error in the data and one on the data set that also contained gross errors in the methanol stream flow rate. Once the training of the nets was completed, the test sets (one with only random error and one with both random and gross error) were each processed by both of the networks to obtain rectified values for each of the variables. To check the performance of each network we calculated for the 27 test points (1) the average for each of the six measured variables, ȳ_i, and (2) the average for the predicted value of each of the six variables, ŷ_i. Although the true values, x_i, of the four inputs were known prior to adding random error, the values of the two output variables differed in each of the simulations involving the training set and the test set. Consequently, for the two output temperatures we averaged the values of the heat exchanger outputs prior to adding random noise to get ȳ_i*, representing the "true" values of the two output temperatures.

Table III. Comparison of the Means and Standard Deviations of the Unrectified Data with the Rectified Data When Only Random Error Was Present

                             network input (unrectified)    network output (rectified)
    variable    base case    ȳ_i         s_i(1)             ŷ_i         s_i(2)
    MeOH F_in   12060.00     12054.95    61.608             12064.89    60.472
    H2O F_in    18000.00     18008.62    79.911             18001.48    68.653
    MeOH T_in   140.00       139.84      0.815              139.90      0.764
    H2O T_in    45.00        45.06       0.338              45.08       0.392
    MeOH T_out  54.54        54.51       0.286              54.49       0.369
    H2O T_out   87.60        87.56       0.379              87.50       0.471

In addition, standard deviations were calculated for each of the unrectified and rectified variables as follows:

unrectified data:
    s_i(1) = [Σ_{j=1}^{27} (y_ij − x_i)² / 26]^{1/2}     (input variables)
    s_i(1) = [Σ_{j=1}^{27} (y_ij − ȳ_i*)² / 26]^{1/2}    (output variables)

rectified data:
    s_i(2) = [Σ_{j=1}^{27} (ŷ_ij − x_i)² / 26]^{1/2}, with ȳ_i* in place of x_i for the output variables
To provide a single scalar value by which the performance of the ANN could be evaluated for all six variables simultaneously, we used the mean square error (MSE) for the test set. The values produced by the output nodes of the net were scaled from 0 to 1 so that the weighting in the objective function for each of the six variables was about the same. The mean square errors were calculated for each test point j as follows:

    MSE_j(1) = (1/6) Σ_{i=1}^{6} (y_ij − x_i)²    (unrectified data)
    MSE_j(2) = (1/6) Σ_{i=1}^{6} (ŷ_ij − x_i)²    (rectified data)

with all values scaled and with ȳ_i* in place of x_i for the two output temperatures.
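As a small sketch (the scaling bounds are illustrative assumptions, not the paper's), the per-test-point summary reduces to a few lines:

```python
# One MSE per test point: scale every variable to [0, 1], then average
# the squared deviations from the "true" values over the six variables.
import numpy as np

def mse_per_point(values, true_values, lo, hi):
    scaled = (values - lo) / (hi - lo)
    scaled_true = (true_values - lo) / (hi - lo)
    return np.mean((scaled - scaled_true) ** 2, axis=1)
```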
We sought answers to certain questions in this work.

1. Can a network trained with data containing normal random error successfully reconcile new (but similar) data? Table III compares the means and standard deviations of the six measured variables for three cases: (1) base case values, x_i or ȳ_i*; (2) measured (unrectified) values, ȳ_i; and (3) rectified values, ŷ_i, when the data contained only random error and the network was tested with a set similar to the training set. Figure 2 compares the mean square errors of the unscaled reconciled and unreconciled variables for each of the 27 test points. While the overall error was not significantly reduced by the rectification, the mean values obtained by rectification from the network satisfied the two process constraints (2a) and (2b), whereas the simulated measured data typically did not. On the basis of the fact that the values of the reconciled variables were normally distributed, a two-sided t-test with a confidence coefficient of 0.95 indicated that the hypothesis of no difference between the true value and the reconciled value was accepted for each of the six variables.
Figure 2. Mean square error of all of the variables for both unreconciled and network reconciled variables with only random error added to the true values.
2. Can a network trained with data containing random errors plus gross errors successfully rectify new but similar data? Several experiments showed that a network trained with data containing only random errors could not satisfactorily rectify data that also contained gross errors. However, a network trained with data containing both random and gross errors generalized so that test sets containing either random errors or both gross and random errors could be successfully rectified. This outcome is quite desirable since it implies that an ANN can be trained on historical data that contain some garbage and will filter out the garbage in new data. Figure 3 compares the overall mean square error of the unscaled test set data and the corresponding rectified values for the test set involving gross errors when the network was trained with data that included gross errors. Table IV gives the means and standard deviations for the test set data and the corresponding rectified values processed through two different networks. Note that the overall mean square error, especially for the case with gross errors in the data, was significantly reduced. Furthermore, networks trained with only random errors added to the data did not reduce the variance as much as did nets trained with both random and gross errors in the data.

Figure 3. Mean square error of all of the variables for both unreconciled and network reconciled variables with gross and random error added to the true values.

The reason why an ANN trained solely with random errors will not rectify data as well as an ANN trained with both random and gross errors in the data is that the ANN interpolates in carrying out data rectification. The noise in the data effectively acts as a smoothing function added to the least squares objective function in training the net. Gross errors (not biases, which represent the continuation of a gross error) are very similar to random noise except that (a) their amplitude is greater (we used magnitudes of 10-15% of the mean value), so they act as outliers, and (b) their frequency of occurrence is lower. Without gross errors present in training the net, the smoothing that occurs is different, the value of the objective function is different, and the predictions of the net will be less satisfactory. From a practical viewpoint this means that it is best to use historical plant data in training a net, with the understanding that the net may not be satisfactory if some substantial change takes place in the character of the noise or the gross errors, as, for example, a change from an expected value of zero to all-positive errors.

3. Can a network reconcile data as well as traditional methods if the data contain only random errors? To compare the effectiveness of data reconciliation using artificial neural networks with that achieved by more traditional methods, specifically nonlinear programming (NLP), the test set containing random errors only was reconciled by minimizing the sum of the squares of the differences between ŷ_i and the measurements y_ij (scaled by dividing by s_i(2))

    minimize F_C = Σ_j Σ_i (y_ij − ŷ_i)²/s_i(2)        (3)

subject to the two process constraints (2a) and (2b), via the successive quadratic programming code NPSOL. Figure 4 compares the mean square error for the application of NPSOL with that of the network output. Table V presents the means and standard deviations for the test set. Examination of the overall mean square error and the standard deviations indicates that the two methods reconcile the data about equally well, but keep in mind that the exact model was used in NPSOL whereas the ANN developed an approximate model. The null hypothesis of no difference between the true values and the predicted values of the variables was accepted for all the network outputs, whereas not all of the values of the variables from the reconciliation using NPSOL passed the t-test for a confidence coefficient of 0.95. The reconciled means for the methanol flow rate and the entering methanol temperature obtained via NPSOL exceeded the confidence limits slightly.

4. Can a network rectify data as well as traditional methods if the data contain both random and gross errors? Figure 5 compares the overall mean square errors resulting from rectification by the ANN, trained on data containing both gross and random errors, with the results of using NLP. Unlike the weighted least squares method via NLP, the network successfully eliminated the gross errors, and the rectified values of the variables still satisfied the two process constraints. The standard deviations of the variables rectified by the network were significantly smaller than those obtained by applying the NPSOL code. In addition, the rectified values from the network passed the t-test for no change in the ensemble mean, whereas the rectified values from the NLP procedure for the methanol flow rate, the exit methanol temperature, and the exit water temperature were statistically different from the true values.
Table IV. Comparison of Means and Standard Deviations for the Unrectified Data and the Rectified Data with Both Gross and Random Error Included in the Data for MeOH

                 measurements with random     net trained with random    net trained with both random
                 and gross error added        error only present         and gross error present
    variable     x_i or ȳ_i*    s_i(1)        ŷ_i         s_i(2)         ŷ_i         s_i(2)
    MeOH F_in    12059.54       1079.186      11992.15    350.402        12077.34    87.462
    H2O F_in     18008.62       79.911        18011.52    148.64         18017.55    105.720
    MeOH T_in    139.84         0.815         140.01      1.320          139.87      0.783
    H2O T_in     45.06          0.338         45.09       0.462          45.06       0.338
    MeOH T_out   54.51          0.286         54.42       0.396          54.56       0.217
    H2O T_out    87.56          0.379         87.45       0.451          87.57       0.385

Figure 4. Mean square error of all of the variables for two rectification methods with random error added to the true values: (1) ANN trained with random error and (2) NLP minimization.

Table V. Comparison of Means and Standard Deviations for the ANN and NLP Techniques of Reconciliation When Only Random Error Was Included in the Data

                              NLP                        network
    variable     base case    ŷ_i         s_i(2)         ŷ_i         s_i(2)
    MeOH F_in    12060.00     12032.78    64.389         12076.07    66.472
    H2O F_in     18000.00     17981.39    104.341        18016.36    86.653
    MeOH T_in    140.00       139.80      0.878          139.88      0.764
    H2O T_in     45.00        45.06       0.397          45.06       0.396
    MeOH T_out   54.54        54.51       0.376          54.56       0.375
    H2O T_out    87.60        87.49       0.513          87.57       0.501

Figure 5. Comparison of mean square error of all of the variables for network and NLP rectification with random and gross error added to the true values.
The artificial neural network definitely did a better job of rectification. We suspect that the reason an ANN is able to handle gross errors without preprocessing (as is required for the nonlinear programming approach) is based on two characteristics of an ANN:

1. The hidden layer in effect carries out a nonlinear principal component analysis on the process data. Thus any outlier (gross error) will not be associated with a large variance and consequently will not contribute significantly to the model developed by the ANN, and hence will not greatly influence the prediction by the ANN. A similar concept occurs in regression on principal components: when some principal components are deleted from the regression equation, biased estimates of the coefficients may result, but large variances caused by multicollinearities are reduced.

2. Smoothing takes place, as discussed in connection with question 2, so that the effect of gross errors is wiped out.

5. What is the effect of preprocessing of the measurements before rectification? Can preprocessing of the measurements improve the rectification of the data? Traditional methods remove the gross errors (usually one at a time) before carrying out a reconciliation phase on the adjusted data set. We carried out experiments in which the data containing both gross and random errors were preprocessed to remove the gross errors before being rectified by the network. Preprocessing for the NLP treatment was carried out by applying the measurement test (based on the residuals) to the simulated data so that outliers could be detected and removed prior to rectifying the data. The measurement test for outliers was applied in an iterative procedure. First, the exchanger data were run through the NPSOL minimization routine. Then the measurement test was applied to check for outliers. Once one or more outliers were located and removed, the data were again processed by NPSOL. This procedure was repeated until no gross errors were detected. Application of the measurement test resulted in the correct identification of all of the gross errors in the methanol flows. A similar procedure was applied to the predictions from the network, and outliers were successively removed. We could have made a single rectification of the data as described under question 4 above and identified all the gross errors directly, but did not do so in order that the test treatment of the two data sets was equivalent. In practice, the measurement test could be eliminated as a way of isolating gross errors if an ANN is used. Figure 6 compares mean square errors for the network when the data were preprocessed and when they were not preprocessed. Table VI compares the means and standard deviations for the ANN. Although the improvement is not dramatic, preprocessing does improve the ability of the network to rectify the data, and it certainly is essential if NLP is to be used. Figure 7 compares the overall mean square error for two cases involving preprocessed data, one using the artificial neural network and the other using NLP. Table VII lists the mean values and standard deviations of the rectified measurements.
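The iterative measurement-test preprocessing just described can be sketched as follows. This simplified version is ours: it standardizes residuals by assumed error standard deviations rather than by the full residual covariance, and reconcile stands for one pass of the NLP reconciliation, such as the SLSQP sketch given earlier; the critical value is an illustrative choice.

```python
# Sketch of iterative gross-error removal via a measurement test:
# reconcile, flag cases whose standardized residuals exceed a critical
# value, drop them, and repeat until nothing is flagged.
import numpy as np

def remove_gross_errors(data, reconcile, sigma, z_crit=1.96, max_rounds=10):
    """data: one measurement vector per row; reconcile(y) returns the
    reconciled estimate of y; sigma: assumed error standard deviations."""
    kept = data.copy()
    for _ in range(max_rounds):
        residuals = np.array([y - reconcile(y) for y in kept])
        flagged = np.any(np.abs(residuals) / sigma > z_crit, axis=1)
        if not flagged.any():
            break                        # no outliers detected; done
        kept = kept[~flagged]            # drop flagged cases and reconcile again
    return kept
```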
Figure 6. Mean square errors for all of the rectified variables compared for the preprocessed and unpreprocessed data with both gross and random error included.

Table VI. Comparison of Means and Standard Deviations for Network Rectification with Preprocessed and Nonpreprocessed Data When Both Random and Gross Error Were Included in the Data for MeOH

                              nonpreprocessed data       preprocessed data
    variable     base case    ŷ_i         s_i(2)         ŷ_i         s_i(2)
    MeOH F_in    12060.00     12077.34    87.462         12062.08    57.619
    H2O F_in     18000.00     18017.55    105.720        18015.97    100.382
    MeOH T_in    140.00       139.87      0.783          139.87      0.643
    H2O T_in     45.00        45.06       0.338          45.06       0.266
    MeOH T_out   54.54        54.56       0.217          54.56       0.296
    H2O T_out    87.60        87.57       0.385          87.57       0.434

Table VII. Comparison of Means and Standard Deviations of the Rectified Data by the Network and the NLP Procedure When the Data Were Preprocessed

                              NPSOL/measurement test     network
    variable     base case    ŷ_i         s_i(2)         ŷ_i         s_i(2)
    MeOH F_in    12060.00     12087.20    63.937         12062.08    57.619
    H2O F_in     18000.00     17983.73    95.605         18015.97    100.382
    MeOH T_in    140.00       140.26      1.522          139.87      0.643
    H2O T_in     45.00        45.13       1.558          45.06       0.266
    MeOH T_out   54.54        54.75       1.458          54.56       0.296
    H2O T_out    87.60        87.87       0.794          87.57       0.434

Figure 7. Mean square error for all of the rectified variables for two cases: (1) network with preprocessed data and (2) the NLP plus the measurement test procedure with both random and gross error added to the data.
While the NPSOL/measurement test processing resulted in a slightly lower standard deviation for the water flow rate, the network yielded smaller standard deviations for the other five variables. In addition, the null hypothesis of no difference between the true value and the reconciled value was accepted for all of the rectified measurements from the network; the same was not true in the NPSOL/measurement test method for the methanol flow rate. Thus, the network could be said to do a slightly better job of rectification. Again, keep in mind that the NLP method involves an exact model.

6. Can a network successfully rectify autocorrelated data? When the process data are autocorrelated, some of the underlying assumptions involved in traditional methods of rectification are violated, and the performance of the methods degrades.
Figure 8. Mean square error of all the variables for both network rectified and unrectified variables with gross error included in both the MeOH flow and the water flow.
In some of our simulations we assumed serial correlation occurred in the additive error. To both of the training sets and the two test sets we added serially correlated errors to the methanol stream. The neural network was trained on the correlated data, and test data containing similar errors were easily rectified. Serial correlation in the methanol stream was reduced from 0.50 in the original test data to 0.18 in the rectified data. Test data containing both gross and serially correlated errors were also easily rectified, as was the case with just random error in the data, and the serial correlation in the methanol flow was reduced considerably. When NLP was applied to the same data set, the serial correlation was reduced from 0.50 to 0.22. The difference between the two correlation coefficients is not significant.

7. Can a network successfully rectify data with simultaneous gross errors in more than one variable? To determine whether data with gross errors in two of the variables could be rectified, a further training set of 189 groups of measurements was created plus a testing set of 27 groups of measurements. Each set contained gross errors in both the input methanol and the water flow rates plus added normal random error with a coefficient of variation of 0.025 in all six variables. For either flow rate, the magnitude of the gross errors ranged from 7 to 15% of the true values. Once the net was trained, it was evaluated by rectifying the test set. The test data were preprocessed as described previously, using the measurement test to detect and remove the outliers. Figure 8 compares the mean square errors of the rectified test set with the corresponding values of the unrectified test set. Table VIII lists the standard deviations and means for each variable.
Table VIII. Standard Deviations and Means for the Unrectified Data and the Data Rectified by the Network When Both Random and Gross Error in Two Streams Were Included in the Data

                              measurements with random    rectified data
                              and gross error included    by network
    variable     base case    ȳ_i         s_i(1)          ŷ_i         s_i(2)
    MeOH F_in    12060.00     12059.54    1079.185        12056.89    87.747
    H2O F_in     18000.00     18007.17    1454.552        18011.85    78.063
    MeOH T_in    140.00       139.84      0.815           139.78      0.622
    H2O T_in     45.00        45.06       0.338           45.07       0.246
    MeOH T_out   54.54        54.51       0.286           54.53       0.223
    H2O T_out    87.60        87.56       0.379           87.40       0.301
The standard deviations listed in Table VIII confirm that the network significantly reduced the error in the flow rate measurements while still reducing the error in the temperatures. When compared with the use of NLP as described previously, the results of rectification by the ANN were slightly better.
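The serial-correlation figures quoted under question 6 (0.50 before rectification, 0.18 after the network, 0.22 after NLP) correspond to the lag-1 autocorrelation coefficient, which can be computed as in this small sketch (ours, for illustration):

```python
# Lag-1 autocorrelation: correlation of a series with a one-step-shifted
# copy of itself, used to quantify serial correlation before and after
# rectification.
import numpy as np

def lag1_autocorrelation(series):
    s = np.asarray(series, dtype=float)
    s = s - s.mean()
    return np.dot(s[:-1], s[1:]) / np.dot(s, s)
```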
Conclusion

On the basis of simulated measurements from a steady-state heat exchanger, we have demonstrated that artificial neural networks can rectify data containing only random errors. Furthermore, they can rectify data containing gross errors in one or more streams as well as autocorrelated data. When compared to the direct application of nonlinear programming for data rectification, the ANN was distinctly superior. When the measurements were preprocessed to remove outliers via the measurement test of residuals, the ANN results were slightly improved and still better than the application of least squares via nonlinear programming. Thus, while there is still much to be done to build up trust that an ANN can validly rectify measurements in practice, we conclude that ANN appear to be a promising tool for solving the long-standing problem of data rectification because they provide good models and filter noise.
Acknowledgment

The authors wish to thank the Shell Development Co. for support of this research and HNC Corp. for loan of the neural network hardware and software used in the work.

Nomenclature

A = exchanger area
C_p = heat capacity
e = vector of errors (of various kinds)
e* = vector of normal random errors
e_i* = normal random error for the ith variable
f(x) = true process model
g(x) = inequality constraints
h(x) = relation between the "true" values and the measured values of the variables
K_i = gross error in variable i
m = mass flow rate
MSE^(j) = mean square error
s_i^(j) = estimate of the standard deviation for variable i
ΔT = temperature change of a stream passing through the exchanger
ΔT_lm = log mean temperature difference
U = overall heat transfer coefficient
W = weighting matrix
x = vector of "true" values
x_i = true value of variable i
y = vector of measured variables
y_i = simulated measured value of variable i
y_ij = simulated measured value of variable i in test set j
ŷ = vector of rectified values of y
ŷ_i = rectified estimate of the value of variable i
y* = vector of deterministic values of the output temperatures
ȳ_i = average value of rectified variable i
ȳ_i* = average value of measured variable i for the two output temperatures of the exchanger

Greek Characters

α = tolerance coefficient

Overlays

‾ (overbar) = average
ˆ (caret) = rectified value
Literature Cited

Breiman, L. The Π Method for Estimating Multivariate Functions from Noisy Data. Technometrics 1991, 33, 125-160.
Breiman, L.; Friedman, J.; Olshen, R.; Stone, C. Classification and Regression Trees; Wadsworth: Belmont, CA, 1984.
Crowe, C. M. Reconciliation of Process Flow Rates by Matrix Projection, Part II. AIChE J. 1986, 32, 616-623.
Crowe, C. M. The Maximum Power Test for Gross Errors in the Original Constraints in Data Reconciliation. Can. J. Chem. Eng. 1992, 70, 1030-1036.
Darouach, M.; Zasadzinski, M. Data Reconciliation in Generalized Linear Dynamic Systems. AIChE J. 1991, 37, 193-201.
Duda, R. O.; Hart, P. E. Pattern Classification and Scene Analysis; Wiley: New York, 1973.
Friedman, J. H. Multivariate Adaptive Regression Splines. Ann. Stat. 1991, 19, 1-141.
Friedman, J. H.; Stuetzle, W. Projection Pursuit Regression. J. Am. Stat. Assoc. 1981, 76, 817-823.
Grenander, U. Abstract Inference; Wiley: New York, 1981.
Härdle, W. Smoothing Techniques with Implementation; Springer-Verlag: New York, 1990.
Hogg, R. V.; Tanis, E. A. Probability and Statistical Inference; Macmillan Publishing Co.: New York, 1977.
Huber, P. J. Projection Pursuit. Ann. Stat. 1985, 13, 435-475.
Iordache, C.; Mah, R. S. H.; Tamhane, A. C. Performance Studies of the Measurement Test for Detection of Gross Errors in Process Data. AIChE J. 1985, 31, 1187-1201.
Kang, C. W. Diagnostic Methods Using Recursive Analysis in Nonlinear Analysis. Ph.D. Dissertation, University of Minnesota, June 1990.
Kao, C.-S.; Tamhane, A. C.; Mah, R. S. H. Gross Error Detection in Serially Correlated Process Data. Ind. Eng. Chem. Res. 1990, 29, 1004-1012.
Kao, C.-S.; Tamhane, A. C.; Mah, R. S. H. Gross Error Detection in Serially Correlated Process Data (II): Dynamic Systems. 4th Intl. Symp. Process Syst. Eng. 1991, 11.13.1-11.13.19.
Karjala, T. W.; Himmelblau, D. M. Data Rectification Using Recurrent (Elman) Neural Networks. Proc. IJCNN92 1992.
Knepper, J. C.; Gorman, J. W. Statistical Analysis of Constrained Data Sets. AIChE J. 1980, 26, 260-264.
Kramer, M. A. Autoassociative Neural Networks. Comput. Chem. Eng. 1992, 16, 313-328.
Liebman, M. J.; Edgar, T. F.; Lasdon, L. S. Efficient Data Reconciliation and Estimation for Dynamic Processes Using Nonlinear Programming. Comput. Chem. Eng. 1992, 16, 963-986.
Mah, R. S. H.; Tamhane, A. C. Detection of Gross Errors in Process Data. AIChE J. 1982, 28, 828-830.
Narasimhan, S.; Mah, R. S. H. Treatment of General Steady State Process Models in Gross Error Identification. Comput. Chem. Eng. 1989, 13, 851.
Ripps, D. L. Adjustment of Experimental Data. Chem. Eng. Prog. Symp. Ser. 1965, 61, 8-13.
Rollins, D. K.; Davis, J. F. Gross Error Detection in On-Line Operations: α-Level Tests, Power Functions and Confidence Intervals. Presented at the AIChE Meeting, Chicago, IL, Nov 11-16, 1990.
Rollins, D. K.; Davis, J. F. Unbiased Estimation of Gross Errors in Process Measurements. AIChE J. 1992, 38, 563-572.
Rosenberg, J.; Mah, R. S. H.; Iordache, C. Evaluation of Schemes for Detecting and Identifying Gross Errors in Process Data. Ind. Eng. Chem. Res. 1987, 26, 555.
Serth, R. W.; Heenan, W. A. Gross Error Detection and Data Reconciliation in Steam-Metering Systems. AIChE J. 1986, 32, 733.
Stanley, G. M.; Mah, R. S. H. Observability and Redundancy Classification in Process Networks: Theorems and Algorithms. Chem. Eng. Sci. 1981, 36, 1941-1954.
Stone, C. Additive Regression and Other Nonparametric Models. Ann. Stat. 1985, 13, 689-705.
Tamhane, A. C.; Mah, R. S. H. Data Reconciliation and Gross Error Detection in Chemical Process Networks. Technometrics 1985, 27, 409.
Tjoa, I. B.; Biegler, L. T. Simultaneous Strategies for Data Reconciliation and Gross Error Detection of Nonlinear Systems. Comput. Chem. Eng. 1991, 15, 679-690.
Voller, V. R.; Planitz, M. A Comparison of the Algorithms for Automated Data Adjustment and Material Balance Around Mineral Processing Equipment. In Mathematical Modelling in Science and Industry; Avula, X. J. R., et al., Eds.; Pergamon Press: New York, 1983; p 260.
Wahba, G. Constrained Regularization for Ill-posed Linear Operator Equations with Applications in Meteorology and Medicine. In Statistical Decision Theory and Related Topics, III; Gupta, S. S., Berger, J. O., Eds.; Academic Press: New York, 1982; Vol. 2, pp 383-418.

Received for review March 29, 1993
Revised manuscript received August 23, 1993
Accepted August 30, 1993*

* Abstract published in Advance ACS Abstracts, October 15, 1993.