Ind. Eng. Chem. Res. 2009, 48, 10903–10911


Support Vector Regression Approach for Simultaneous Data Reconciliation and Gross Error or Outlier Detection

Yu Miao,† Hongye Su,*,† Ouguan Xu,‡ and Jian Chu†

†National Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control, Department of Control Science and Engineering, Yuquan Campus, Zhejiang University, Hangzhou, 310027, P. R. China
‡Zhijiang College, Zhejiang University of Technology, Hangzhou, 310024, P. R. China

*To whom correspondence should be addressed. Tel.: +86 0571 8795 1200. E-mail: [email protected]. †Zhejiang University. ‡Zhejiang University of Technology.

Process data measurements are important for model fitting, process monitoring, control, optimization, and management decision making, and they are usually used together with parameter estimation. This paper applies the support vector (SV) regression approach as a framework for simultaneous data reconciliation and gross error or outlier detection in processes. SV regression minimizes the regularized risk instead of the maximum likelihood, which is a compromise between empirical risk and model complexity. For data reconciliation, it is robust to random and gross errors or outliers, because loss functions are used as objective functions instead of least squares. Furthermore, because of the robust nature of SV regression, the coefficients of the SV regression itself have less effect on the reconciled results. Finally, simulation results for a number of literature and process systems demonstrate the features of the SV regression approach for data reconciliation proposed in this paper.

1. Introduction

Process data measurements are important for model fitting, process monitoring, control, optimization, and management decision making. Unfortunately, process data measurements usually contain two types of error, random and gross, which make the data violate the process constraints defined by the mass and energy balances or other model constraints. Therefore, data reconciliation can be applied to obtain estimates of the process variables based on the measurement data. Data reconciliation techniques explicitly use the process model as the constraint and build an estimator to obtain the true values of the process data. However, data reconciliation results are heavily corrupted by the presence of gross errors or outliers in the measurements. Gross errors can be divided into two categories, measurement related (such as malfunctioning sensors) and process related (such as process leaks); a special type of gross error is referred to as systematic bias. An outlier is, by definition, a measurement whose error does not follow the statistical distribution of the bulk of the data. Normally, outliers are a small fraction of the whole data set and have little or no relation to the main structure. Therefore, an adequate data reconciliation approach should be able to reduce the effects of both gross errors and outliers.

Many methods have been developed to perform data reconciliation and gross error or outlier detection, and one major type of method is based on statistical testing. The most widely used methods based on statistical tests are the global test (GT), the measurement test (MT), the nodal test (NT), the generalized likelihood ratio (GLR), and the principal component test (PCT).1-4 Several strategies based on statistical testing have been developed for identifying multiple gross errors, such as serial compensation, serial elimination, simultaneous or collective compensation, and the simultaneous estimation of gross errors (SEGE) method.3,5-7

Statistical test methods for data reconciliation and gross error detection usually use a least-squares objective function and give the same weight to all the measurements, including gross errors, so the reconciliation needs to be applied again after each round of gross error removal, which makes the statistical test methods an iterative procedure. Newer methods have been applied to industrial processes, ranging from statistical test methods to robust statistics methods and from sequential or combinatorial methods to simultaneous data reconciliation and gross error or outlier detection methods. Robust estimators have been introduced as the objective function of simultaneous data reconciliation problems, based on robust statistics and their ability to reject outliers.8,9 Albuquerque and Biegler used a robust M-estimator, the Fair function, to reduce the effects of outliers, and they used boxplots to identify outliers from the residuals of the regression.10 Wang and Romagnoli proposed a framework for robust data reconciliation based on a generalized objective function to eliminate outliers from process data, which takes advantage of the generalized T (GT) distribution because of its flexibility to accommodate various distributional shapes, and they also designed an approach to tune the parameters of the GT distribution using sufficient process data.11

Because parameter estimation is usually the step after data reconciliation, in which the reconciled values of the process variables are used to set values for the model parameters,12 simultaneous data reconciliation and parameter estimation (DRPE)13 has been developed. In this case, a set of measurements is obtained, and gross errors can be treated as parameters of the process that can be estimated from the set of measurements. Although robust approaches focus specifically on dealing with the presence of outliers in the measurement sets, Arora and Biegler used redescending estimators as objective functions to reduce the effects of gross errors and designed a search procedure based on the Akaike information criterion (AIC) to determine the coefficients of the redescending estimators in DRPE problems.13,14 Yamamura et al. first introduced the AIC into the problem of data reconciliation and gross error detection; they applied the AIC to identify biased measurements in a least-squares framework for gross error detection.15 There also exists related work, such as a mixed integer linear program (MILP) approach described by Soderstrom et al., which is similar in form to the AIC.16


Arora and Biegler argued that a mixed integer nonlinear program (MINLP) approach should be a direct minimizer of the AIC, and they considered the AIC as a common statistical framework in which both statistical tests and robust methods can be interpreted. However, in the case of large problems, the penalty in the AIC is not adequate to compensate for the bias in the maximum likelihood estimates of the parameters of a model, and it is not robust either. Although the MINLP approach is the direct minimizer of the AIC, it is subject to overfitting for the data reconciliation and gross error detection problem; consequently, more gross errors are detected by mistake. Meanwhile, the quadratic objective function of the MINLP approach has an unbounded influence function, so outliers have a larger impact on it. Therefore, both the MINLP approach and the robust estimators based on the AIC are relatively conservative, and a more effective criterion is needed.

The support vector (SV) algorithm is a nonlinear generalization of the generalized portrait algorithm developed in Russia in the 1960s. As such, it is firmly grounded in the framework of statistical learning theory, or VC theory, which has been developed over the last three decades by Vapnik, Chervonenkis, and others. According to statistical learning theory, minimizing the empirical risk, which leads to overfitting and thus poor generalization properties, is replaced by minimizing the regularized risk, in which a capacity control term is added to the objective function.17 A basic SV regression algorithm named ε-SVM has been developed, which does not penalize errors below some ε (>0) chosen a priori. The parameter ε controls the sparseness of the solution. However, this a priori information is usually unknown, so an improved algorithm named ν-SVM was introduced, which determines ε automatically once the degree of sparseness is specified by ν.18 Since simultaneous data reconciliation and gross error or outlier detection can be addressed as a model identification and parameter estimation problem, SV regression can be introduced. Therefore, simultaneous data reconciliation and gross error or outlier detection can be included in the framework of statistical learning theory, or VC theory, which is considered to be a common framework for the simultaneous data reconciliation and gross error or outlier detection problem.

In this paper, a novel data reconciliation model is formulated according to SV regression, which is similar in form to ν-SVM. This approach can be applied both when there are several sets of measurements and when there is only one set of measurements. To comprehensively study the gross error or outlier detection performance of this approach in these two cases, three systems are used for simulations and comparison. The simulation results show that the SV regression approach is effective, accurate, and robust for detecting gross errors and eliminating outliers.

2. Data Reconciliation with the AIC

2.1. Mixed Integer Program Approaches. The purpose of data reconciliation is to find estimates of the process variables and parameters based on the measurement data under the process constraints defined by the mass and energy balances or other model constraints. Data reconciliation can be addressed as an optimization problem. Assuming that the unmeasured variables are observable, the general form of data reconciliation can be described as follows:

\[
\begin{aligned}
\min_{x,\,u,\,p} \quad & F(x^{M}, x) \\
\text{s.t.} \quad & h(x, u, p) = 0 \\
& g(x, u, p) \le 0 \\
& x^{L} \le x \le x^{U}, \quad u^{L} \le u \le u^{U}, \quad p^{L} \le p \le p^{U}
\end{aligned}
\qquad (1)
\]

where F is the objective function, x^M is the vector of measurement data of the corresponding variables x, u is the vector of unmeasured variables, p is the set of parameters, h is the set of equality or model constraints, g is the set of inequality constraints, x^L, u^L, and p^L are the lower bounds of x, u, and p, respectively, and x^U, u^U, and p^U are the upper bounds of x, u, and p, respectively. If there exist several sets of measurements and all measurements change within each data set, the problem can be considered as a DRPE problem, which can be written as

\[
\begin{aligned}
\min_{x_{i},\,u_{i},\,p} \quad & \sum_{i=1}^{n} F(x_{i}^{M}, x_{i}) \\
\text{s.t.} \quad & h(x_{i}, u_{i}, p) = 0 \\
& g(x_{i}, u_{i}, p) \le 0 \\
& x^{L} \le x_{i} \le x^{U}, \quad u^{L} \le u_{i} \le u^{U}, \quad p^{L} \le p \le p^{U}
\end{aligned}
\qquad (2)
\]

where the subscript i refers to the ith measurement vector and the rest of the symbols have the same meaning as in formulation 1.
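To make formulation 1 concrete, the following is a minimal sketch (not taken from the paper) of its most common special case: a weighted least-squares objective subject to linear balance constraints Ax = 0, for which the reconciled estimate has the closed form x̂ = x^M − VA^T(AVA^T)^{-1}Ax^M, where V is the measurement error covariance. The flow network, measurements, and standard deviations below are illustrative.

```python
import numpy as np

# Special case of formulation 1: minimize (x - x_M)^T V^{-1} (x - x_M) subject to A x = 0,
# where V is the (diagonal) covariance matrix of the measurement errors.
# Closed-form solution: x_hat = x_M - V A^T (A V A^T)^{-1} A x_M.

A = np.array([[1.0, -1.0, -1.0, 0.0],        # node 1: stream 1 splits into streams 2 and 3
              [0.0,  1.0,  1.0, -1.0]])      # node 2: streams 2 and 3 merge into stream 4
x_M = np.array([101.9, 64.5, 34.9, 98.8])    # measured flows (illustrative values)
sigma = np.array([1.0, 0.8, 0.6, 1.0])       # measurement standard deviations
V = np.diag(sigma ** 2)

residual = A @ x_M                            # how badly the raw measurements violate the balances
x_hat = x_M - V @ A.T @ np.linalg.solve(A @ V @ A.T, residual)

print("balance residuals before:", residual)
print("reconciled flows:        ", np.round(x_hat, 2))
print("balance residuals after: ", np.round(A @ x_hat, 10))   # ~0 up to round-off
```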

For the formulations above, data reconciliation and gross error or outlier detection can be addressed as a model identification and parameter estimation problem. In formulation 1, either gross errors or outliers can be treated as part of the estimated parameters, whereas in formulation 2, only gross errors can be considered as parameters to be eliminated. If more than one model can be fitted to the measurements, it is necessary to identify the best model and its parameters by a suitable model evaluation criterion. The AIC has been used for this purpose; it takes the form of a penalized likelihood, and the model with the minimum AIC value is chosen as the best model. On the basis of the assumption that the random errors possess a normal distribution after the gross errors are removed, Yamamura et al. first introduced the AIC into data reconciliation and gross error detection problems for a linear system. By dividing the set of measurement values into sets with and without gross errors, they proposed a branch-and-bound strategy to solve the problem. Arora and Biegler translated this branch-and-bound procedure into a MINLP with binary variables identifying the variables with or without gross errors, which they stated as follows:

\[
\begin{aligned}
\min_{x_{i},\,\mu_{i},\,y_{i},\,z_{i}} \quad & \sum_{i=1}^{n} \left[ \frac{x_{i}^{M} - x_{i}}{\sigma_{i}} - \frac{\mu_{i}}{\sigma_{i}} \right]^{2} + 2 \sum_{i=1}^{n} y_{i} \\
\text{s.t.} \quad & Ax = 0 \\
& \mu_{i} \le U_{i} y_{i}, \quad -\mu_{i} \le U_{i} y_{i} \\
& \mu_{i} - z_{i} U_{i} - z_{i} L_{i} + L_{i} y_{i} \le 0 \\
& -\mu_{i} + z_{i} U_{i} + z_{i} L_{i} + L_{i} y_{i} \le L_{i} + U_{i} \\
& z_{i} \le y_{i}, \quad 0 \le x_{i} \le X_{i}, \quad y_{i}, z_{i} \in \{0, 1\}
\end{aligned}
\qquad (3)
\]

where n is the number of measured variables, x_i^M is the measurement of the ith variable, σ_i is the standard deviation of the ith variable, μ_i is the magnitude of the bias in the ith variable, y_i is a binary variable denoting the existence of a bias in the ith variable, z_i is a binary variable for the sign of the bias value μ_i for the ith variable, A is the matrix for the constraints, L_i and U_i are the lower and upper bounds on the bias in the ith variable, and X_i is the upper bound for the ith variable.
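Solving the MINLP above requires an integer-programming solver. As a hedged illustration of the underlying idea, the sketch below (our own simplification, not the authors' procedure) enumerates small sets of bias candidates, solves the reduced weighted least-squares reconciliation for each candidate set (a measurement with a free bias contributes nothing to the objective at the optimum), and selects the set with the smallest AIC-type score, i.e., the weighted sum of squared residuals plus 2 times the number of biases. The system and data are illustrative.

```python
import itertools

import numpy as np
from scipy.linalg import null_space

# Illustrative linear system: two balance nodes, four measured streams.
A = np.array([[1.0, -1.0, -1.0, 0.0],
              [0.0,  1.0,  1.0, -1.0]])
x_M = np.array([101.9, 64.5, 34.9, 120.3])   # stream 4 carries a simulated gross error
sigma = np.array([1.0, 0.8, 0.6, 1.0])
n = len(x_M)

N = null_space(A)                            # every feasible x can be written as x = N @ t

def reconcile(bias_set):
    """Weighted least-squares reconciliation with the measurements in bias_set treated as
    biased; a biased measurement gets zero weight because its free bias absorbs its residual."""
    w = np.array([0.0 if i in bias_set else 1.0 / sigma[i] for i in range(n)])
    t, *_ = np.linalg.lstsq(w[:, None] * N, w * x_M, rcond=None)
    x_hat = N @ t
    sse = sum(((x_M[i] - x_hat[i]) / sigma[i]) ** 2 for i in range(n) if i not in bias_set)
    return x_hat, sse + 2 * len(bias_set)    # AIC-type score: fit term plus penalty per bias

best = min(
    (reconcile(set(s)) + (s,)
     for k in range(3)                        # allow at most two biased measurements
     for s in itertools.combinations(range(n), k)),
    key=lambda r: r[1],
)
x_hat, score, bias_set = best
print("suspected gross errors in measurements:", bias_set)
print("reconciled flows:", np.round(x_hat, 2))
```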

Before that, Soderstrom et al. had introduced a MILP approach to minimize an objective function similar to the AIC. Although the MILP objective function is similar to the AIC, it does not minimize the AIC directly. Later in this paper, it will be seen that the MILP approach is more robust and that it is a special form of the proposed SV regression approach.

2.2. Robust Estimators. Measurement data are usually assumed to follow a normal distribution, so a least-squares estimator is typically used as the objective function, without taking into account gross errors or outliers that may be present. However, the distribution of data corrupted with gross errors and outliers is often difficult to determine. The influence of these gross errors or outliers on the least-squares estimator can be minimized by defining robust estimators. Robust estimators are insensitive to deviations from the normal distribution, and they produce unbiased results in the presence of data derived from the normal distribution. By giving less weight to outlying measurements, robust estimators prevent good measurements from being corrupted by gross errors or outliers. The influence function, defined by ψ(u) = ∂F(u)/∂u, is the tool used for designing robust estimators; it measures the "influence" that a residual has on the estimation process. Some suggested criteria for the influence function of a robust estimator are that it be (a) bounded, (b) continuous, and (c) identically zero outside an appropriate region. Several robust approaches have been proposed, and the most widely used robust estimators are M-estimators, which include the Fair function, the Huber estimator, and the redescending estimator proposed by Hampel.13,19,20 Recently, Wang and Romagnoli proposed a framework for robust data reconciliation to eliminate outliers in formulation 1 based on the GT distribution,11 which has the flexibility to accommodate various distributional shapes. The GT distribution is

\[
F(u) = \frac{p}{2 \sigma q^{1/2} B(1/p,\, q) \left( 1 + \dfrac{|u|^{p}}{q \sigma^{p}} \right)^{q + 1/p}}
\qquad (4)
\]
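As a small illustration (not from the paper), the GT density written above can be evaluated directly; varying p and q changes the tail behavior, which is the flexibility referred to here. The parameter values below are arbitrary.

```python
import numpy as np
from scipy.special import beta

def gt_density(u, sigma=1.0, p=2.0, q=5.0):
    """GT density as written in eq 4; sigma, p, and q are illustrative values."""
    return p / (2.0 * sigma * np.sqrt(q) * beta(1.0 / p, q)
                * (1.0 + np.abs(u) ** p / (q * sigma ** p)) ** (q + 1.0 / p))

u = np.linspace(-4.0, 4.0, 9)
print(np.round(gt_density(u, p=2.0, q=2.0), 4))    # heavy tails: large residuals stay plausible
print(np.round(gt_density(u, p=2.0, q=50.0), 4))   # large q: close to a Gaussian-like shape
```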

The redescending estimator proposed by Hampel is

\[
F(u_{i}) =
\begin{cases}
\dfrac{1}{2} u_{i}^{2}, & 0 \le |u_{i}| \le a \\[6pt]
a |u_{i}| - \dfrac{a^{2}}{2}, & a < |u_{i}| \le b \\[6pt]
ab - \dfrac{a^{2}}{2} + \dfrac{a}{2}(c - b) \left[ 1 - \left( \dfrac{c - |u_{i}|}{c - b} \right)^{2} \right], & b < |u_{i}| \le c \\[6pt]
ab - \dfrac{a^{2}}{2} + \dfrac{a}{2}(c - b), & |u_{i}| > c
\end{cases}
\qquad (5)
\]

where u_i is the ith residual of the regression (x_i − x_i^M), and a, b, and c are the tuning constants for the redescending estimator. The redescending estimator should be tuned to the system under investigation by setting the best possible values for a, b, and c, so a two-step procedure was proposed to identify the best estimator, based on a golden section search to find the parameters that generate the minimal AIC.
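The following is a minimal sketch (not the authors' implementation) of using the redescending estimator of eq 5 as a robust reconciliation objective for a linear balance, solved with a general-purpose NLP solver; the tuning constants a, b, and c, the network, and the data are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def hampel_rho(u, a=1.0, b=2.0, c=4.0):
    """Redescending estimator of eq 5; the tuning constants a, b, c are illustrative."""
    u = np.abs(u)
    return np.where(u <= a, 0.5 * u ** 2,
           np.where(u <= b, a * u - 0.5 * a ** 2,
           np.where(u <= c, a * b - 0.5 * a ** 2
                            + 0.5 * a * (c - b) * (1.0 - ((c - u) / (c - b)) ** 2),
                    a * b - 0.5 * a ** 2 + 0.5 * a * (c - b))))

A = np.array([[1.0, -1.0, -1.0, 0.0],
              [0.0,  1.0,  1.0, -1.0]])
x_M = np.array([101.9, 64.5, 34.9, 120.3])   # stream 4 contains a simulated outlier
sigma = np.array([1.0, 0.8, 0.6, 1.0])

def objective(x):
    # Large residuals fall on the flat part of rho, so the outlier barely pulls the solution.
    return np.sum(hampel_rho((x_M - x) / sigma))

res = minimize(objective, x0=x_M, method="SLSQP",
               constraints={"type": "eq", "fun": lambda x: A @ x})
print("robust reconciled flows:", np.round(res.x, 2))
print("balance residuals:", np.round(A @ res.x, 6))
```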

3. SV Regression Approach for Data Reconciliation

From the previous forms, it can be seen that the AIC is considered a common statistical framework in which both the MINLP approaches and the robust methods can be interpreted. However, in the case of large-scale problems, especially the DRPE problem in formulation 2, the penalty in the AIC is not adequate to compensate for the bias in the maximum likelihood estimates of the model; it is subject to overfitting for data reconciliation, and it is not robust to outliers. A more effective criterion is therefore needed, and we consider statistical learning theory, or VC theory, as the framework of data reconciliation by means of SV regression.

3.1. ε-SVM Regression and ν-SVM Regression. Support vector machines represent a breakthrough in learning algorithms, and they are supported by results of statistical learning theory. SV regression estimation seeks to estimate functions of the form

\[
f(x) = (w \cdot x) + b, \qquad w, x \in \mathbb{R}^{N}, \quad b \in \mathbb{R}
\qquad (6)
\]

based on data

\[
(x_{1}, y_{1}), \ldots, (x_{l}, y_{l}) \in \mathbb{R}^{N} \times \mathbb{R}
\qquad (7)
\]

By minimizing the regularized risk functional
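As a hedged illustration of the ε-SVR/ν-SVR trade-off discussed in section 3.1, the sketch below fits scikit-learn's SVR (ε fixed a priori) and NuSVR (ε determined automatically once ν is specified) to synthetic data containing a few injected outliers; the data and parameter values are illustrative and are not taken from the paper.

```python
import numpy as np
from sklearn.svm import SVR, NuSVR

rng = np.random.default_rng(0)
X = np.linspace(0.0, 10.0, 200).reshape(-1, 1)
y = 0.5 * X.ravel() + 1.0 + rng.normal(0.0, 0.1, X.shape[0])
y[::25] += 3.0                                   # inject a few gross errors / outliers

eps_svr = SVR(kernel="linear", C=10.0, epsilon=0.1).fit(X, y)   # epsilon chosen a priori
nu_svr = NuSVR(kernel="linear", C=10.0, nu=0.2).fit(X, y)       # nu sets the sparseness, epsilon follows

print("eps-SVR slope/intercept:", eps_svr.coef_.ravel(), eps_svr.intercept_)
print("nu-SVR  slope/intercept:", nu_svr.coef_.ravel(), nu_svr.intercept_)
print("support vectors:", len(eps_svr.support_), "vs", len(nu_svr.support_))
```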
