Ind. Eng. Chem. Res. 2003, 42, 3075-3084


A Framework for Robust Data Reconciliation Based on a Generalized Objective Function

D. Wang and J. A. Romagnoli*

Process Systems Engineering Laboratory, Department of Chemical Engineering, The University of Sydney, Sydney, NSW 2006, Australia

* To whom correspondence should be addressed. Tel.: +61 2 9351 4794. Fax: +61 2 9351 2854. E-mail: jose@chem.eng.usyd.edu.au.

In this paper, a unified view of robust data reconciliation based on generalized objective functions is provided within a probabilistic framework. Conditions for robustness as well as efficiency are investigated for a set of objective functions and their associated influence functions. A partially adaptive estimator based on the generalized T distribution and a fully adaptive estimator based on density estimation are also proposed and discussed. Both retain an efficiency interpretation in the sense of maximum likelihood estimation. The performance of the proposed methods and their comparison with other existing approaches are illustrated through a chemical engineering example.

1. Introduction

A wide variety of process measurements are taken in chemical plants for purposes such as process monitoring, control performance evaluation, process control, and statistical quality control. Measurements can be contaminated with errors (random and/or gross) arising from sources such as measurement irreproducibility, instrument degradation and malfunction, human error, process-related errors, and other unmeasured errors.1,2 Because of these errors, no set of measurements can be expected to obey the conservation laws exactly. Rational use of the large volume of data generated by chemical plants therefore requires suitable techniques to improve their accuracy.

Data reconciliation (DR) is a procedure for optimally adjusting measured data so that the adjusted values obey the conservation laws and other constraints. The conventional method for data reconciliation is the weighted least-squares formulation, in which adjustments to the data are weighted by the inverse of the measurement noise covariance matrix so that the model constraints are satisfied. The optimality and validity of this approach rest on the assumption that the errors follow a normal (Gaussian) distribution. When this assumption holds, the conventional approach provides unbiased estimates of the plant states and a closed-form solution can be obtained, which is suitable for plantwide use. However, normally distributed errors are rarely encountered in chemical engineering practice, as normality is hard to ensure even for high-quality measurements. In addition, the frequent presence of gross errors and outliers violates the assumptions of the conventional approach and invalidates its results.3,4

Even though the error characteristics are usually unknown in advance, most past researchers have adhered to the normal distribution to characterize the errors, for convenience of discussion and computation. Following this convention, several schemes have been suggested to deal with departures from the conventional normal assumption. The usually adopted approach is

gross error and outlier detection, in which the majority of the data are taken to be normally distributed and the small fraction of the data that fails a statistical test is considered abnormal and is removed. Some typical gross error detection schemes have been developed and used in practice, such as the global test, nodal test, and measurement test.5,6 The global and nodal tests are based on model constraint residuals, whereas the measurement test is based on Neyman-Pearson hypothesis testing, using the residuals between measurements and estimates.

The gross error/outlier detection approach seeks to retain the prevalence and convenience of the normal assumption, but it has limitations. First, the selected tests are themselves based on the normality assumption, which is not necessarily true; the residuals from regression may be heavily biased by the presence of outliers or by other error characteristics. Second, these gross error detection approaches are formulated for linear models. This is a restriction for almost any process model other than mass balances. It is then possible that the validity of the tests is affected by the additional errors introduced by the linearization of nonlinear constraints when the measurements are forced to conform to the linearized (approximate) model.

Considering this, an alternative approach is to account for the nonideality of the data distribution in the problem formulation. This is usually accomplished by combining nonlinear programming and maximum likelihood principles, after the error distribution has been suitably characterized.7,8 In this approach, the data reconciliation problem is posed as an optimization problem and a probabilistic framework is employed to describe the error distribution. Typically, the error characteristic is composed of two parts: a narrow Gaussian representing normal measurement noise and a broad Gaussian representing the gross errors. The two Gaussian distributions are weighted by the probabilities of a gross error being present or absent in a sensor, respectively, and an objective function is constructed accordingly. In this way, conventional data reconciliation and gross error detection can be condensed into a single optimization problem and performed simultaneously. Because of its nonlinear programming formulation, there are no restrictions on



the constraints, so it applies not only to linear and nonlinear systems but also to dynamic systems and systems with bound constraints. The limitation of this approach is that the contamination degree (i.e., the probability of having gross errors) has to be specified a priori for each measurement. In addition, the optimality and efficiency of assuming Gaussian characteristics for both the normal and the gross errors remain open to question.

The DR problem can be posed in a probabilistic framework, using maximum likelihood estimation to determine the most likely rectified states. Data reconciliation adjusts the data subject to a priori constraints (usually DAE system models), exploiting spatial redundancy. However, making the data conform to these model constraints will not necessarily make the data right. Generally, incorporating more a priori knowledge of the process variables (e.g., historical data) may improve the estimation. Considering this, a method based on Bayesian inference provides a unified approach to the estimation of process variables and contains other approaches as special cases. This approach introduces temporal redundancy in a probabilistic framework and strikes a balance between historical data and the new measurements. Several criteria can be used for inference, such as (1) maximum a posteriori (MAP), the posterior mode; (2) minimum variance (MV), the posterior mean; and (3) minimax (MM), the posterior median. In the case of MAP,8 for example, an objective function composed of both the a priori distribution of the variables and the distribution of the measurement errors is maximized subject to the constraints. The first distribution is estimated from historical data, and the second, which characterizes the new measurement errors, is assumed to be the contaminated Gaussian (following Tjoa and Biegler7). One advantage of this unified approach is that additional insight about the nature of the variables, such as expected values, ranges of variation, and bounds, is brought into the reconciliation procedure, and the estimation may be improved with this information even when no model constraints are available (the model constraints having already been used in the estimation of the first distribution). Another advantage is that it provides a broader view of the data reconciliation problem and nests previous approaches as special cases. Even though the contaminated Gaussian assumption is used to eliminate the effects of gross errors, the real error distribution need not have that form, and the estimation may not be optimal in a maximum likelihood estimation (MLE) sense if the assumption is violated. In addition, the a priori distribution represented by the first term is not necessarily Gaussian, just as the second distribution does not always reflect the real error distribution. In this work, we provide a generalized robust framework to rectify this disadvantage and improve the estimation.

The analogy between data reconciliation and parameter regression makes it possible to apply results from robust identification to data reconciliation. In robust parameter estimation, one uses an objective function in the minimization that, by its mathematical nature, is insensitive to deviations from the ideal assumptions on the errors, especially to outliers. These estimators tend to look at the majority of the data and ignore atypical values. In this fashion, an accurate regression can be performed even if nothing is known about the outliers or even the error structure of the data. The robust approach concentrates on finding estimators (i.e., objective functions in the optimization) whose influence functions are at least bounded and/or descending. A number of objective functions can be chosen as candidates in terms of this requirement.9 Even though such an estimator is insensitive to outliers, the optimality/efficiency of the estimation still depends on the distributional structure of the measurement errors according to maximum likelihood principles. If the errors of the greater part of the data do not follow the predefined distribution, the estimation, even if robust, will not be optimal.

In this paper, a generalized robust data reconciliation approach and several of its special cases are discussed. Specifically, the focus is on a proposed partially adaptive estimator, which is based on the generalized T (GT) distribution, and a fully adaptive estimator, which is based on nonparametric density estimation. Both approaches make the estimation optimal/efficient in the MLE sense as well as robust. Here, the problem is posed as minimizing an objective function that reflects the real (or quasi-real) distributional structure of the measurement errors subject to the process model. The objective function can be adjusted to the shape of the error distribution, through appropriate choice of distributional parameters in the GT distribution or by nonparametric probability density function (PDF) estimation, in such a way that it retains the advantages of both robustness and maximum likelihood estimation.

The paper is organized as follows. In the next section, the probabilistic framework of data reconciliation is introduced and a generalized robust approach is described after several approaches for the elimination of gross errors/outliers are compared. A partially adaptive robust approach based on the GT distribution is discussed in section 3, where the specification of the GT density family and the associated DR procedure are also provided. In section 4, considering the improvement of the robust approach along with the Bayesian approach, a fully adaptive robust DR approach based on a generalized objective function is introduced and the estimation steps are described. An application example (a heat-exchange network) of steady-state DR with nonlinear constraints is provided in section 5 to show the comparative advantages of the approaches, followed by the conclusions of this work.

2. Data Reconciliation and Its Robustness

The essence of data reconciliation is that, given the process measurements y from the plant, we want to estimate the process state x, which satisfies the models derived from first principles. In a probabilistic framework, this problem can be approached with the laws of probability and the maximum likelihood principle by maximizing the probability density of the difference between measurement y and estimate x, subject to the constraints

x̂ = arg max_x f(u)

s.t. DAE model and inequality constraints     (1)

where u = y − x and f(u) is the probability density function of the error. For the sake of computational convenience, the negative logarithm is usually taken of the objective


function so that the problem can be expressed as

min_x F(u)

s.t. DAE model and inequality constraints     (2)

where F(u) = −log f(u). In the following, our discussion will be focused on the choice of the objective function F(u). If the sensor errors are assumed to follow normal distributions, the objective function F(u) is quadratic and the conventional weighted least-squares formulation is obtained

min_x (y − x)ᵀ W⁻¹ (y − x)

s.t. DAE model     (3)

where W = diag(α1², α2², ..., αn²) and αi is the measurement noise standard deviation. Furthermore, for linear equality constraints Ax = 0 and in the absence of inequality constraints, this problem has a closed-form solution via Lagrange multipliers:

x̂ = y − WAᵀ(AWAᵀ)⁻¹Ay     (4)
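As an illustration of eqs 3 and 4 (our own sketch, not part of the original paper), the following Python fragment evaluates the closed-form weighted least-squares reconciliation for a hypothetical single-node mass balance x1 + x2 − x3 = 0; the variable names and numbers are invented for illustration:

```python
import numpy as np

def wls_reconcile(y, A, stdev):
    """Closed-form WLS reconciliation (eq 4) for linear constraints A x = 0."""
    W = np.diag(np.asarray(stdev, dtype=float) ** 2)   # W = diag(alpha_1^2, ..., alpha_n^2)
    lam = np.linalg.solve(A @ W @ A.T, A @ y)          # Lagrange-multiplier step: (A W A^T)^{-1} A y
    return y - W @ A.T @ lam                           # x_hat = y - W A^T (A W A^T)^{-1} A y

# Hypothetical example: one node with two inlets and one outlet.
A = np.array([[1.0, 1.0, -1.0]])          # balance: x1 + x2 - x3 = 0
y = np.array([10.2, 5.1, 14.8])           # raw flow measurements
x_hat = wls_reconcile(y, A, stdev=[0.1, 0.1, 0.2])
print(x_hat, A @ x_hat)                   # reconciled values satisfy the balance (A x_hat ~ 0)
```

Because the constraints here are linear and no inequalities are present, no iterative solver is needed; the robust formulations that follow replace the quadratic objective and generally require nonlinear programming.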

Obviously, the efficiency of the solutions of eqs 3 and 4 depends on the normal distribution assumption for the errors, according to the MLE. In practical applications, the real distribution of the error is usually unknown. Because gross errors or outliers often corrupt the plant data, the real distribution usually has thicker tails than the normal one. This problem can be addressed by designing a robust estimator that is insensitive to deviations from the assumed distribution for the majority of the data. The design of such an estimator usually reduces to choosing the objective function F(u) (not necessarily the quadratic one of the conventional approach) in the optimization problem of eq 2. To be robust, this objective function must give less weight to a large value of u than its quadratic form u² does, so that the estimator down-weights or ignores the contribution of the large errors in the data reconciliation.

In problem 2, a number of candidate functions F(u) can be chosen as the objective function, so that different robust estimators can be obtained. These estimators correspond to the M-estimators of robust statistics. In robust estimation, the ψ function or influence function, defined by ψ(u) = ∂F(u)/∂u, is the usual tool for comparing alternative M-estimators for their robustness. The ψ function measures the "influence" that a residual will have on the estimation process. Some suggested criteria for the ψ function are that it be (a) bounded, (b) continuous, and (c) identically zero outside an appropriate region. The motivation for these properties is that (a) a single "anomalous" observation should have limited influence on the estimator, (b) grouping or rounding of the data should have minimal impact on the estimator, and (c) ridiculously large observations should have no impact on the estimator.

According to the robustness requirements, different estimators can be established by using a mixed objective function, which is based on the η-contaminated normal error assumption7

min_x F(x) = −∑i=1..n ln[ (1−η)·(1/(√(2π)αi))·exp(−ui²/(2αi²)) + η·(1/(√(2π)αi b))·exp(−ui²/(2αi²b²)) ]

s.t. g(x′,x) = 0,  h(x) ≤ 0     (5)

where η is the probability of a gross error occurring and b is the ratio of the standard deviation of the gross error distribution to the standard deviation of the normal noise for sensor i. Different values of b and η can be assumed for each sensor. For the ith sensor, the η-contaminated normal distribution takes the form

f(ui) = (1−η)·(1/(√(2π)αi))·exp(−ui²/(2αi²)) + η·(1/(√(2π)αi b))·exp(−ui²/(2αi²b²))     (6)

The influence function of this estimator would be

ψcont(u) = u · [ (1−η)·exp(−u²/(2α²)) + (η/b³)·exp(−u²/(2b²α²)) ] / [ (1−η)·exp(−u²/(2α²)) + (η/b)·exp(−u²/(2b²α²)) ] = w(u)·u     (7)

For very large values of u, which means the measurement is far from its normal value, the weighting function w(u) can be approximated by w(u) ≈ 1/b², whereas for small values of u, w(u) ≈ 1. Therefore, a good approximation to the influence function of the contaminated estimator is

ψcont(u) ≈ u          for u → 0
ψcont(u) ≈ u/b²       for u → ∞     (8)
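The contaminated-normal objective and weight function can be written down directly; the sketch below (ours, with arbitrarily chosen η and b and the standardized case α = 1) evaluates the mixture density of eq 6 and the weight w(u) of eq 7:

```python
import numpy as np

def contaminated_neg_log_f(u, alpha=1.0, eta=0.05, b=10.0):
    """-ln f(u) for the eta-contaminated normal of eq 6 (additive constants dropped)."""
    narrow = (1.0 - eta) / alpha * np.exp(-u**2 / (2.0 * alpha**2))
    broad = eta / (alpha * b) * np.exp(-u**2 / (2.0 * alpha**2 * b**2))
    return -np.log(narrow + broad)

def contaminated_weight(u, alpha=1.0, eta=0.05, b=10.0):
    """w(u) with psi_cont(u) = w(u) u (eq 7): close to 1 for small u, close to 1/b^2 for large u."""
    e_narrow = np.exp(-u**2 / (2.0 * alpha**2))
    e_broad = np.exp(-u**2 / (2.0 * alpha**2 * b**2))
    num = (1.0 - eta) * e_narrow + (eta / b**3) * e_broad
    den = (1.0 - eta) * e_narrow + (eta / b) * e_broad
    return num / den

print(contaminated_weight(np.array([0.1, 50.0])))   # roughly [1.0, 1/b**2]
```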

This approximation will down-weight the effect of the large errors. Even though its value is unbounded as u → ∞, it can in theory be made bounded over a region by adjusting the parameter b. The contaminated normal distribution has thicker tails than the pure normal distribution, making it less sensitive to outliers. Following this idea, a thicker-tailed distribution can also be obtained by representing each sensor and its operating modes explicitly.10

If one instead combines the normal and Laplacian distributions (the latter chosen for its longer tails), with the normal distribution representing the majority random errors and the Laplacian the minority gross errors, another mixed objective function is obtained:

F(u) = u²/2            for |u| ≤ k
F(u) = k|u| − k²/2     for |u| > k     (9)

The influence function is in this case

ψ(u) = −k    for u < −k
ψ(u) = u     for −k ≤ u ≤ k
ψ(u) = k     for u > k     (10)

where k is the parameter characterizing the degree of contamination, used as a tuning parameter for the estimator's performance.9 This influence function is bounded for any value of the error, so the effect of anomalous data can be eliminated in the data reconciliation. One can also choose the objective function as the absolute value of the residuals (least absolute deviation), that is

min_x ∑i=1..n αi⁻¹ |ui|     (11)

This choice of the objective function corresponds to assuming that the overall error distribution is Laplacian, which has longer tails than the normal distribution. In this case, all data are treated equally with a constant weight, even the gross errors, and the influence function of the estimate is bounded, with a constant magnitude. For robustness, one can also simply design an objective function, without reference to a corresponding error distribution structure, such that its influence function is bounded. The fair function, for example, can be used in the optimization11

F(u) = c²·[ |u|/c − log(1 + |u|/c) ]     (12)

The influence function is in this case

ψ(u) = u / (1 + |u|/c)     (13)

which is bounded as the argument u approaches infinity. Here c acts as a tuning parameter. The influence curves of the above estimators are summarized in Figure 1.

Figure 1. Influence functions of estimators: (a) least squares; (b) contaminated normal, k = 0.85; (c) least absolute deviation; (d) fair function, c = 1.3.

Even though the above approaches are less sensitive to gross errors and outliers, the optimality of the estimation, in the MLE sense, still depends on the suitability of the chosen function with respect to the actual distribution of the data. It is generally hard to characterize the distribution of the errors correctly without a posteriori estimation. If the real errors do not follow the specified distribution, the performance of the estimator may deteriorate and the estimation could be biased. Considering these disadvantages, a more flexible probability distribution function is discussed next to describe the error distribution. It is designed to allow for a variety of tail thicknesses, to capture the shape of the distribution, and to accommodate other distributions as special cases as far as possible. The corresponding estimator will then be robust, by virtue of its ψ function, and efficient, by estimating its distributional parameters from the data in the MLE sense.
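For comparison, a small sketch (ours, not from the paper; the tuning constants are common textbook defaults rather than values used by the authors) evaluating the influence functions of eqs 10, 11, and 13:

```python
import numpy as np

def psi_huber(u, k=1.345):
    """Influence function of the normal/Laplacian mixed objective (eq 10): clipped at +/- k."""
    return np.clip(u, -k, k)

def psi_lad(u):
    """Least-absolute-deviation influence (cf. eq 11, unit weight): constant magnitude."""
    return np.sign(u)

def psi_fair(u, c=1.3):
    """Fair-function influence (eq 13): smooth and bounded by c."""
    return u / (1.0 + np.abs(u) / c)

u = np.linspace(-6.0, 6.0, 7)
for name, psi in [("huber", psi_huber), ("lad", psi_lad), ("fair", psi_fair)]:
    print(name, np.round(psi(u), 3))
```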

3. Data Reconciliation Approach Based on the Generalized T Distribution

3.1. Generalized T Density and Its Properties. In this work the proposed density specification for describing the measurement distribution, which has the flexibility to accommodate various distributional shapes, is the generalized T (GT) distribution12

fGT(u;σ,p,q) = p / [ 2σ q^(1/p) B(1/p,q) (1 + |u|^p/(qσ^p))^(q+1/p) ],   −∞ < u < ∞     (14)

where σ, p, and q are distributional parameters: σ corresponds to the standard deviation, and p and q govern the shape of the distribution. This density is symmetric about zero, unimodal, and suitable for describing the error characteristics in most cases. In data reconciliation, following convention, σ is given as the measurement standard deviation, acting as a weighting penalty according to measurement accuracy. Through the choice of different values of p and q, the GT distribution can accommodate the real shape of the error distribution: the larger the value of p or q, the "thinner" the tail of the density; smaller values of p and q are associated with "thicker" tails. The tail behavior and other characteristics of the distribution, given the measurement standard deviation σ, depend on these two distributional parameters, which will be estimated from the data as discussed later. In addition, the GT distribution defines a very general family of density functions and combines two general forms, which include most of the stochastic specifications one meets in practice as special or limiting cases. For example, the power exponential or Box-Tiao (BT) distribution is defined as

fBT(u;σ,p) = p·exp[−(|u|/σ)^p] / [2σΓ(1/p)]     (15)


Figure 2. Distribution tree of GT distribution family.

It is known that the BT family includes the Laplacian and normal distributions as special cases. The BT distribution is related to the GT distribution of eq 14 by the following limit:

fBT(u;σ,p) = lim q→∞ fGT(u;σ,p,q)     (16)
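A quick numerical check (our own, using SciPy's beta and gamma functions) of the GT density of eq 14 and the limit of eq 16: for a large but finite q, the GT density is already very close to the Box-Tiao density with the same σ and p:

```python
import numpy as np
from scipy.special import beta, gamma

def f_gt(u, sigma, p, q):
    """Generalized T density of eq 14."""
    norm = p / (2.0 * sigma * q**(1.0 / p) * beta(1.0 / p, q))
    return norm * (1.0 + np.abs(u)**p / (q * sigma**p)) ** (-(q + 1.0 / p))

def f_bt(u, sigma, p):
    """Power exponential (Box-Tiao) density of eq 15."""
    return p * np.exp(-(np.abs(u) / sigma)**p) / (2.0 * sigma * gamma(1.0 / p))

u = np.linspace(-3.0, 3.0, 5)
print(f_gt(u, sigma=1.0, p=2.0, q=1e4))   # large q: close to ...
print(f_bt(u, sigma=1.0, p=2.0))          # ... the BT density (Gaussian-shaped for p = 2)
```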

The other distributional form is the t distribution with 2q degrees of freedom, which is the generalized T distribution (eq 14) with p = 2 and σ = √2·α. Furthermore, the Cauchy distribution is obtained if q = 1/2. The relationships among these distributions (GT, BT, normal, Cauchy, Laplacian, and t) are compactly represented in Figure 2. The robustness of the estimator based on a GT distribution can be discussed by investigating its ψ function. This ψ function, corresponding to the objective function F(u,σ,p,q) = −log fGT(u;σ,p,q) from eq 14, is given by

ψGT(u;σ,p,q) = (pq + 1) sign(u) |u|^(p−1) / (qσ^p + |u|^p)     (17)

For finite q, this influence function is bounded; it reaches a maximum for positive u at u* = [(p − 1)qσ^p]^(1/p), with the maximum value

ψ(u*;σ,p,q) = (q + 1/p)·(p − 1)^((p−1)/p) / (σ q^(1/p))     (18)

Furthermore, lim u→∞ ψ(u;σ,p,q) = 0, so this influence function exhibits a descending pattern. Consequently, "large" deviations will not have an impact on this estimator when q is finite. Also, for a given finite q, p controls the behavior of ψ(u;σ,p,q) near the origin. For example, if p > 2, this influence function is less steeply sloped near the origin than the influence function for the t distribution with 2q degrees of freedom. Figure 3 illustrates these patterns for several parameter combinations (σ,p,q), and we can see that the robustness requirements are always satisfied if the GT-based estimator is used.

Figure 3. Influence functions of the estimator based on the GT distribution for several parameter combinations (σ, p, q).
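The descending behavior and the peak location can be checked numerically; the short sketch below (ours) evaluates the ψ function of eq 17 for one parameter combination and compares the analytical maximizer u* and the peak value of eq 18 against a grid search:

```python
import numpy as np

def psi_gt(u, sigma, p, q):
    """Influence function of the GT-based estimator (eq 17)."""
    return (p * q + 1.0) * np.sign(u) * np.abs(u)**(p - 1.0) / (q * sigma**p + np.abs(u)**p)

sigma, p, q = 1.0, 4.0, 10.0
u_star = ((p - 1.0) * q * sigma**p) ** (1.0 / p)                             # analytical maximizer
peak = (q + 1.0 / p) * (p - 1.0)**((p - 1.0) / p) / (sigma * q**(1.0 / p))   # eq 18

u = np.linspace(0.01, 20.0, 200001)
i_max = np.argmax(psi_gt(u, sigma, p, q))
print(u[i_max], u_star)                    # grid maximizer vs analytical u*
print(psi_gt(u_star, sigma, p, q), peak)   # peak value agrees with eq 18
print(psi_gt(1e6, sigma, p, q))            # descends toward zero for very large residuals
```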

3.2. Data Reconciliation Based on GT Distribution. After it is assumed that the measurement errors belong to the GT distribution family, the data reconciliation can be carried out by constructing a partially adaptive estimator through the following optimization:

min_x ∑i=1..n −ln fGTi(yi − xi; φi)

s.t. constraints     (19)

where fGTi(ui;φi) is the GT distribution discussed in section 3.1 and φi = (σ,p,q)′ is the vector of distributional parameters; σ is the measurement standard deviation (assumed to be known), and only p and q are estimated from the data to fit the real shape of the error distribution. The optimality of the process state estimation will depend on the estimate of the distributional parameter vector φi. Indeed, the motivation for this partially adaptive estimator is to allow the M-estimation problem to depend on the shape of the distribution of the measurement error, and here this dependence is allowed for by the choice of φi. In considering estimates of the distributional parameters, attention is first restricted to the values of p and q, whereas σ is given as the measurement standard deviation. The reason for this choice of σ is that in conventional data reconciliation the standard deviation is used as a weighting parameter in the optimization. In the GT distribution family, σ plays the same role as the standard deviation α of the normal distribution, with the proportional scaling σ = √2·α (Figure 2). Therefore, it is reasonable to take σ as the given standard deviation in the optimization. For p and q, we first restrict their values to 1 ≤ p < Kp and 1/2 ≤ q < Kq, where Kp and Kq are bounded constants. Although this restriction excludes some special cases, such as the normal distribution, we think that this exclusion is not important. We expect that in cases where the disturbance is actually generated by one of the excluded special cases (e.g., the normal distribution), constraining the parameters to be slightly different from the special-case values will have little effect on the asymptotic efficiency of the estimator. On the other hand, constraining q to be bounded will make the


corresponding data reconciliation procedure less sensitive to very thick-tailed distributions (by restricting the corresponding influence function to be bounded). Otherwise, if q is unbounded (or very large), the estimator will be sensitive to large errors because the influence function is unbounded; this can be illustrated by plotting the influence function for a very large value, e.g., q = 1000, which approaches the influence function of the normal-distribution case.

Given sufficient historical data, the distributional parameters φi can be estimated by likelihood methods. (Note that using historical data to estimate the distribution is a Bayesian-based approach.) There are several different approaches one could take to estimate the distributional parameters. We will consider estimators based on the residuals (ui = yi − x̂i) obtained from a preliminary estimate of the process state x̂. The motivation for this approach is that the properties of the residuals should be descriptive of the properties of the error distribution. More precisely, it is known13 that the large-sample distribution of the process state depends only on the limit of the distributional parameters and not on the particular way in which they are estimated, so that there is no loss of asymptotic efficiency in using preliminary estimates of the distributional parameters in the final estimation of the process state. Furthermore, joint estimation of the process state and the distributional parameters is computationally more complex than estimation of the distributional parameters from the residuals. Because there is no gain in asymptotic efficiency from joint estimation and because there may be computational savings, there seems little to be lost by restricting attention to estimates based on residuals. Of course, it is also possible to construct joint estimators of the process states and the distributional parameters. These estimators will have large-sample properties similar to those of estimators based on residuals, although for brevity we do not explicitly consider this approach here.

One way to use the residuals to estimate the distributional parameters is to maximize the GT likelihood function evaluated at the initial estimates of the process states. The motivation for such a procedure is that it should result in consistent estimates of the true values of the distributional parameters when the distribution is a member of the GT family. We will denote this estimate of the distributional parameters as φ̂i. Formally, φ̂i is taken to be the solution to

max φ∈R ∑i=1..n log fGT(ûi; φ)     (20)

where R is the set of feasible values for the distributional parameters, R = {(σ,p,q) | 1 ≤ p < Kp, 1/2 ≤ q < Kq, σ = σ0}, and ûi = yi − x̂i is the residual from the preliminary estimate of the process state xi.
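A sketch of this estimation step under our own assumptions (synthetic Student-t residuals standing in for the preliminary DR residuals, SciPy's bounded L-BFGS-B optimizer, and arbitrary bounds Kp = Kq = 30); σ is held at the known measurement standard deviation while p and q are fitted by maximizing the GT log-likelihood of eq 20:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def gt_neg_loglik(shape, residuals, sigma):
    """Negative GT log-likelihood of the residuals (the objective of eq 20); sigma is fixed."""
    p, q = shape
    log_norm = (np.log(p) - np.log(2.0 * sigma) - np.log(q) / p
                - (gammaln(1.0 / p) + gammaln(q) - gammaln(1.0 / p + q)))   # log of the eq 14 constant
    log_kernel = -(q + 1.0 / p) * np.log1p(np.abs(residuals)**p / (q * sigma**p))
    return -(residuals.size * log_norm + np.sum(log_kernel))

rng = np.random.default_rng(0)
u_hat = rng.standard_t(df=4, size=500)     # stand-in for preliminary residuals y_i - x_hat_i
sigma0 = 1.0                               # known measurement standard deviation

fit = minimize(gt_neg_loglik, x0=[2.0, 2.0], args=(u_hat, sigma0),
               method="L-BFGS-B", bounds=[(1.0, 30.0), (0.5, 30.0)])
p_hat, q_hat = fit.x
print(p_hat, q_hat)                        # fitted shape parameters for the GT objective of eq 19
```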

4. Data Reconciliation Based on a Generalized Objective Function

When the error distribution is not within the GT family, the GT-based estimator can still be used, but it is quite possible that its asymptotic efficiency will be affected, because in such cases there is no direct link between the GT, along with its corresponding distributional parameters, and the asymptotic efficiency of the resulting M-estimator. In such cases, a fully adaptive estimator for data reconciliation can be constructed by estimating the actual distribution of the measurement error from the data via nonparametric methods. This approach is preferred to other robust or partially adaptive procedures according to the asymptotic efficiency (MLE) criterion. However, such an estimator inevitably leads to increased complexity and computational load.

4.1. Fully Adaptive Data Reconciliation Based on a Generalized Objective Function. Recalling the data reconciliation problem in eq 2, a generalized objective function is used in the optimization for data reconciliation

min_x ∑i=1..n F(ui)     (21)

where F(u) = −log f(u), u = y − x̂ is the error, and f is the unknown density. In comparison with previous approaches, where the error probability density function f is constructed in advance by positing an ideal distribution together with a deviation from ideality, here f is estimated from the real residual data. Thus, the objective function will accommodate the real error distribution (symmetric or asymmetric, long-tailed or short-tailed) and the estimator will be optimal in the MLE sense. In general, this approach results in an iterative estimation procedure with an increased computational load. To speed up convergence and improve performance, one of the previous approaches can be employed in a preliminary step of the DR procedure. The basic steps of the proposed robust data reconciliation procedure are outlined as follows:

Step 1. Find a reasonable robust preliminary estimator that is not greatly influenced by the outlying observations and take the estimation results x̂i as the initial values.

Step 2. Calculate the residuals ui = y − x̂i from this preliminary fit. On the basis of these residuals and using a PDF estimation technique, determine the objective function that is optimal for these residuals.

Step 3. Using the objective function constructed in step 2, with the preliminary estimates from step 1 as starting values, perform a one-step iteration toward the M-estimate of the process state x̂i+1.

Step 4. If the residuals ui+1 = x̂i − x̂i+1 converge or satisfy the prespecified tolerance, the procedure is terminated and x̂i+1 is taken as the result. Otherwise, take ui+1 as ui and go to step 2.

Figure 4 gives the flowchart of the proposed robust data reconciliation based on a generalized objective function.

Figure 4. Flowchart of iterative robust DR based on a generalized objective function.

Figure 5. Heat-exchange network.

In step 1 several robust estimators can be chosen as candidates to ensure the robustness of the approach (such as the GT approach). Step 1 allows any abnormal values to be eliminated first, to some extent, and a set of initial rectified values to be obtained. The rectified values are then used as pseudo measurements and are improved further. The remaining steps aim to increase efficiency by recursively estimating the PDF until an appropriate fit is obtained. Once the preliminary residuals are calculated, a probability density estimation technique is needed to construct an objective function that is optimal according to the maximum likelihood principle (step 2 serves this purpose). Nonparametric methods such as kernel, wavelet, and elliptical basis function (EBF) based estimation can be employed.10,14,15 In this work, the kernel estimator is used14 and is briefly described in the next subsection. Given the starting values (pseudo measurements from step 1), step 3 follows, based on the objective function constructed in step 2, and a new estimate is obtained. In step 4, these new values are compared with the values from the last estimation to check convergence; a tolerance criterion (e.g., 10⁻³) is set up to determine convergence.
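An illustrative skeleton of steps 1-4 under our own assumptions (SciPy's SLSQP solver, SciPy's Gaussian kernel density estimate playing the role of the PDF estimation in step 2, and a hypothetical three-stream balance as the only constraint); this is a sketch of the procedure, not the authors' implementation:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gaussian_kde

def robust_dr(y, eq_constraint, x_init, max_iter=20, tol=1e-3):
    """Iterative robust DR with a generalized, density-based objective (steps 1-4)."""
    x_hat = np.asarray(x_init, dtype=float)            # step 1: preliminary robust estimate
    for _ in range(max_iter):
        residuals = y - x_hat                          # step 2: residuals of the current fit
        f_hat = gaussian_kde(residuals)                #         nonparametric PDF estimate

        def objective(x):
            # generalized objective F(u) = -log f(u), with a small guard against log(0)
            return -np.sum(np.log(f_hat(y - x) + 1e-12))

        result = minimize(objective, x_hat, method="SLSQP",           # step 3: one M-estimation pass
                          constraints=[{"type": "eq", "fun": eq_constraint}])
        x_new = result.x
        if np.max(np.abs(x_new - x_hat)) < tol:        # step 4: convergence check
            return x_new
        x_hat = x_new
    return x_hat

# Hypothetical example: three flows obeying x1 + x2 - x3 = 0; the third measurement carries a gross error.
balance = lambda x: np.array([x[0] + x[1] - x[2]])
y = np.array([10.2, 5.1, 19.0])
print(robust_dr(y, balance, x_init=[10.0, 5.0, 15.0]))
```

In practice the residual sample used for the density estimate would come from many historical measurements rather than the handful shown here.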

4.2. Kernel Density Estimation. The density function of any random quantity u gives a natural description of its distribution and allows probabilities associated with u to be found from14

P{a < u < b} = ∫_a^b f(u) du,   for all a < b
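For completeness, a minimal Gaussian-kernel estimator in the spirit of this subsection (our own illustration; the bandwidth rule and the synthetic residual sample are assumptions, not taken from the paper):

```python
import numpy as np

def kernel_density(u_grid, samples, h=None):
    """Gaussian kernel density estimate f_hat(u) = (1 / (n h)) * sum_i K((u - u_i) / h)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if h is None:
        h = 1.06 * samples.std(ddof=1) * n ** (-0.2)   # common rule-of-thumb bandwidth
    z = (np.asarray(u_grid)[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(1)
residuals = np.concatenate([rng.normal(0.0, 1.0, 400),   # bulk of the residuals
                            rng.normal(0.0, 5.0, 40)])   # a heavier-tailed minority
print(kernel_density(np.linspace(-10.0, 10.0, 5), residuals))
```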