
An Iterative Two-Level Optimization Method for the Modeling of Wiener Structure Nonlinear Dynamic Soft Sensors

Xinqing Gao,†,‡ Fan Yang,†,‡ Dexian Huang,†,‡ and Yanjun Ding*,§

† Department of Automation, Tsinghua University, Beijing 100084, China
‡ Tsinghua National Laboratory for Information Science and Technology, Beijing 100084, China
§ Department of Thermal Engineering, Tsinghua University, Beijing 100084, China

Ind. Eng. Chem. Res. 2014, 53, 1172−1178. dx.doi.org/10.1021/ie4020793
Received: July 2, 2013. Revised: December 2, 2013. Accepted: January 2, 2014. Published: January 2, 2014.

ABSTRACT: Most data-driven soft sensors assume that the processes operate in a steady state, which may be improper because of the inherent dynamics of the process industries. Because quality samples are usually sparse and irregular, establishing dynamic soft sensor models is a difficult task. To cope with this problem, a nonlinear dynamic soft sensor model with a Wiener structure is proposed in this paper. The structure consists of two parts: (i) finite impulse responses of first-order transfer functions with dead time are introduced to approximate the dynamic properties, and (ii) a nonlinear network is utilized to describe the nonlinearity. An iterative two-level optimization method is applied to establish this dynamic soft sensor model; the computational cost is reduced, and convergence can be guaranteed. The proposed dynamic soft sensor approach is validated through simulation and industrial case studies.

1. INTRODUCTION

Some critical quality variables, such as the concentration and density of products, must be monitored in real time to improve product quality and control efficiency in the process industries. However, these quality variables are usually difficult to measure online for economic or technical reasons; instead, they are usually obtained via laboratory analysis.1 As a solution to this problem, soft sensors can estimate these quality variables from other, easily measurable auxiliary variables. In general, soft sensors can be classified into model-driven soft sensors and data-driven soft sensors.1 Model-driven soft sensors require the mechanisms of the processes as a priori knowledge, which can be arduous to obtain because the complexity of the target processes makes the mechanisms hard to understand fully in some cases. Compared with model-driven soft sensors, data-driven soft sensors are less concerned with the mechanisms and thus have drawn much attention from researchers in recent decades. Multivariate statistical analysis and neural networks are widely applied in this field. The most common data-driven techniques, including principal component analysis (PCA),2−4 partial least-squares (PLS),5−7 artificial neural networks (ANN),8,9 and support vector machines (SVM),10−12 have gained popularity in recent years.

A practical issue in the process industries is that processes usually exhibit evident dynamics for several reasons, such as material disturbances, switching of operating points, and time-delay properties. In most studies, static soft sensors ignore the dynamics and assume that all data are sampled in steady states, which means the current output (quality variable) is related only to the current inputs (process variables). Only a minor part of the process variables is utilized, while relevant past data, which may contain a great deal of dynamic information, are neglected, as shown in Figure 1.

Figure 1. Irregular quality samples in the process industries.

However, this approach may be improper in some cases. Take model predictive control (MPC), for example. Because of the unpredictable laboratory analysis time, the sampling intervals of the quality variables to be controlled are usually large and irregular compared with those of the online measured process variables;13 thus, it is hard to control these variables directly. Sometimes estimates of these quality variables, obtained via soft sensors, are instead selected as controlled variables (CV). For static soft sensors, changes in the relevant input variables, which may also be manipulated variables (MV) in MPC, would affect the outputs immediately. This contradicts actual processes: it usually takes a long time to recover to a steady state after changes of the relevant MV, and the sparse quality samples are actually related to the relevant past process data because of the dynamics. Therefore, static soft sensors may not be suitable for MPC. Unfortunately, exact dynamic models, such as transfer function models or state space models, are difficult to obtain via system identification methods because of the extremely sparse and irregular quality samples.14


Other model-based methods, such as output calibration techniques based on state estimators,15 can reflect the dynamic properties of the processes and improve the performance of soft sensors, yet the mechanisms of the target models must be learned as a priori knowledge, which is impractical in some cases.

The main feature of dynamic processes is that the quality variables are influenced by relevant past data; therefore, better performance may be achieved if these useful past data are utilized. To date, several methods of incorporating past data have been developed to establish dynamic soft sensors. A straightforward approach is to adopt relevant past process and quality variables directly as the inputs of dynamic soft sensors.16 It should be pointed out that selecting the relevant past data is difficult, and improperly selected data deteriorate the performance of soft sensors; furthermore, overfitting may arise in some cases. Dynamic artificial neural networks adopt weighted process variables as inputs, and an enhanced particle swarm optimization algorithm is applied to train the dynamic network.15 Although this method is effective in some cases, it does not take input delay into account; in addition, the computational burden is usually high, which will be discussed later in the case studies. To avoid overly complex models, finite impulse responses (FIR) have been applied to establish dynamic soft sensors; Lu et al. established dynamic soft sensors from linear combinations of impulse response templates, with a modified differential evolution algorithm (MDE) used to estimate the related parameters.17 Nonetheless, this is a linear approach and is not suitable for highly nonlinear processes.

To cope with the modeling of nonlinear dynamic systems, a Wiener nonlinear dynamic model is proposed in this paper. The whole model consists of two parts. In the first part, finite impulse responses (FIRs) of first-order plus dead time (FOPDT) transfer functions are applied to approximate the dynamic property. Because of the extremely sparse and irregular quality samples, it is not easy to identify exact dynamic models such as transfer function models or state space models; FIR is utilized as an alternative because it is suitable for capturing dynamic properties in such cases. The dynamic mechanism usually remains unknown, and the FOPDT model is a reasonable approximation; this approach is rather common in industrial applications. For these reasons, the FIRs of FOPDT models are used as the first part. In the second part, nonlinear networks are used to reflect the nonlinear properties. Nonlinearity is rather common in the process industries, and linear modeling methods may not work well for highly nonlinear processes, especially when the operating points change substantially. Therefore, nonlinear networks are utilized to deal with the nonlinearity of the modeled processes; in this paper, SVM is adopted because it is well suited to small-sample cases. The modeling of this dynamic soft sensor can be regarded as an optimization problem. Because of the nonlinear part, solving this problem is difficult; therefore, an iterative two-level optimization method is applied to reduce the computational burden.

The remainder of this article is organized as follows. In Section 2, the structure of the Wiener nonlinear dynamic model is introduced. In Section 3, an iterative two-level optimization method is proposed to estimate the relevant parameters. In Sections 4 and 5, simulated and industrial cases, respectively, are studied to validate the proposed soft sensors, followed by concluding remarks in Section 6.

2. WIENER NONLINEAR DYNAMIC SOFT SENSOR MODEL STRUCTURE

2.1. Structure of the Wiener Nonlinear Dynamic Soft Sensor Model. To improve on the performance of static soft sensors, a Wiener structure nonlinear dynamic soft sensor model is considered, as shown in Figure 2.

Figure 2. Wiener structure nonlinear dynamic soft sensor model.

The Wiener structure is a universal approach to the modeling of dynamic nonlinear systems. FIRs of FOPDT models, which are suitable for sparse and irregular quality samples, are utilized to capture the dynamic property of the modeled processes, and nonlinear networks are applied to reflect the nonlinear properties. Our method has the following advantages. First, it has a relatively simple structure compared with other dynamic soft sensors. The dynamic auxiliary variables x(i) (i = 1, ..., l) are employed to establish the nonlinear network instead of the process variables u(qi); therefore, the complexity of the nonlinear network is not increased compared with static models. Furthermore, compared with ANN-based dynamic soft sensors, SVM is more suitable for small-sample cases. Second, our method can reasonably reflect the actual dynamic properties: even though the actual dynamics may be complex, they can be well approximated by FOPDT models in many cases, a treatment that is quite common in industrial applications. Third, FOPDT models have low-pass characteristics, which means our method can suppress high-frequency noise in the input channels.

2.2. Linear Dynamic Part of the Dynamic Soft Sensor Model. In this study, only multi-input single-output (MISO) soft sensors are considered. Static soft sensors use only a minor part of the process variables and ignore a large amount of past data; the Wiener structure dynamic soft sensors instead utilize impulse responses of FOPDT models to incorporate relevant past data, and therefore better performance can be achieved. FOPDT models have the form G(s) = [K/(Ts + 1)] e^(−τs). When the sampling interval is Ts, the impulse response (IR) g(i) is

g(i) = 0,                      0 < i ≤ D
g(i) = (K/T) α^(i−D−1),        i > D        (1)

where D = τ/Ts and α = exp(−Ts/T). For simplicity, we assume that D is an integer. When an input signal u(i) (i = 1, 2, ...) passes through G(s), the outputs o(i) are obtained via convolution between the IR and the corresponding input signals:

o(i) = Σ_{j=1}^{i} u(j) g(i + 1 − j) Ts
     = 0,                                                0 < i ≤ D
     = (K Ts / T) Σ_{j=0}^{i−D−1} u(i − D − j) α^j,      i > D        (2)
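For illustration, the short sketch below (Python/NumPy, not part of the original paper) evaluates eqs 1 and 2 numerically: it builds the sampled impulse response of an FOPDT model and convolves it with an input sequence. The function names and the example parameter values are assumptions.

```python
# Illustrative sketch of eqs 1 and 2 (not from the paper): sampled FOPDT
# impulse response and the convolution output o(i). Parameter values are
# arbitrary examples.
import numpy as np

def fopdt_impulse_response(K, T, tau, Ts, n):
    """g(i), i = 1..n, per eq 1: zero for i <= D, (K/T)*alpha**(i-D-1) after."""
    D = int(round(tau / Ts))          # integer dead time in samples
    alpha = np.exp(-Ts / T)
    i = np.arange(1, n + 1)
    return np.where(i <= D, 0.0, (K / T) * alpha ** np.clip(i - D - 1, 0, None))

def fopdt_output(u, K, T, tau, Ts):
    """o(i) = sum_{j=1..i} u(j) g(i+1-j) Ts, per eq 2 (discrete convolution)."""
    n = len(u)
    g = fopdt_impulse_response(K, T, tau, Ts, n)
    return np.convolve(u, g)[:n] * Ts

# Example: unit step through G(s) = 2 e^(-90 s) / (400 s + 1), Ts = 30 s
u = np.ones(100)
o = fopdt_output(u, K=2.0, T=400.0, tau=90.0, Ts=30.0)
```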


The ith quality variable is sampled at time qiTs, and the dynamic auxiliary variable, defined as x(i) = o(qi), is exactly what we are concerned with, because the variables x(i) are used to establish the nonlinear network, as shown in Figure 2. These dynamic auxiliary variables can be formulated as

x(i) = o(qi) = 0,                                                  0 < qi ≤ D
x(i) = o(qi) = (K Ts / T) Σ_{j=0}^{qi−D−1} u(qi − D − j) α^j,      qi > D        (3)

It is noted that if qi is too large, the computational cost of eq 3 becomes huge. However, the settling time Tl of the transfer function G(s) is limited, and only the first L terms of the IR need to be considered, where L = round(Tl/Ts) ≈ round(4T/Ts). To reduce the computational cost, finite impulse responses (FIRs) are applied as an alternative. With this simplification, we have

x(i) = (K Ts / T) gh(α)^T U(qi, D)        (4)

where gh(α) = [1, α, α^2, ..., α^(L−1)]^T and U(qi, D) = [u(qi − D), u(qi − D − 1), ..., u(qi − D − L + 1)]^T.

For the purpose of further reducing the computational burden, the gain of the FIR is ignored, so that x(i) = gh(α)^T U(qi, D). This treatment does not affect the accuracy of the soft sensor, because the gain of the FIR can be absorbed by the SVM, while the computational burden is lower because the number of optimization parameters is reduced. In particular, when both the time constant and the delay of the FOPDT model are 0, i.e., α = 0 and D = 0, the dynamic model degenerates into a static model, since x(i) = gh(0)^T U(qi, 0) = u(qi).
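Continuing the illustration, the following sketch implements the truncated form of eqs 3 and 4 with the FIR gain dropped, x(i) = gh(α)^T U(qi, D). The helper names (gh, dynamic_auxiliary) are hypothetical, and the final assertion checks the degenerate static case α = 0, D = 0 discussed above.

```python
# Sketch of eqs 3 and 4 with the FIR gain dropped, x(i) = gh(alpha)^T U(qi, D).
# Helper names (gh, dynamic_auxiliary) are hypothetical; L = round(4T/Ts).
import numpy as np

def gh(alpha, L):
    """gh(alpha) = [1, alpha, alpha^2, ..., alpha^(L-1)]^T."""
    return alpha ** np.arange(L)

def dynamic_auxiliary(u, qi, alpha, D, L):
    """x(i) = gh(alpha)^T U(qi, D), with U(qi, D) = [u(qi-D), u(qi-D-1), ...]^T.

    u is one uniformly sampled input channel (index 1 corresponds to u(1));
    qi is the input-sample index of the ith quality sample."""
    idx = qi - D - np.arange(L)              # qi-D, qi-D-1, ..., qi-D-L+1
    past = np.where(idx >= 1, u[np.clip(idx, 1, None) - 1], 0.0)
    return float(gh(alpha, L) @ past)

# Degenerate case alpha = 0, D = 0 recovers the static model: x(i) = u(qi).
u = np.random.default_rng(0).normal(size=500)
assert np.isclose(dynamic_auxiliary(u, qi=300, alpha=0.0, D=0, L=20), u[299])
```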

2.3. Nonlinear Part of the Dynamic Soft Sensor Model. SVM is applied to reflect the nonlinear properties and serves as the second part of the dynamic soft sensor model. In this study, ε-support vector regression (ε-SVR) is utilized.18 A brief review of ε-SVR is as follows. The data set {(x1, y1), (x2, y2), ..., (xl, yl)} is used to establish the SVM model, where x and y denote the inputs and the output, respectively. The original input space is mapped to a feature space via the nonlinear transformation φi = Φ(xi). The inner product of two vectors φi and φj in the feature space is defined as φi^T φj = k(xi, xj), where k(xi, xj) is the Gaussian kernel function k(xi, xj) = exp(−||xi − xj||2^2 / σ^2) with a predetermined parameter σ. Via this nonlinear transformation, the original nonlinear regression problem is converted into a linear regression problem in the feature space. The regression function is expressed as ω^T Φ(x(i)) + b, where ω and b denote the regression coefficients and the bias, respectively. The structure of the ε-SVR is determined by solving the constrained minimization problem below:

min_{ω, b, ξi, ξi*}   (1/2) ω^T ω + C Σ_{i=1}^{l} ξi + C Σ_{i=1}^{l} ξi*
s.t.  ω^T Φ(xi) + b − yi ≤ ε + ξi
      yi − ω^T Φ(xi) − b ≤ ε + ξi*
      ξi ≥ 0, ξi* ≥ 0, i = 1, 2, ..., l        (5)

where ε is the precision threshold, ξi and ξi* are slack variables, and C is the penalty factor.

3. ITERATIVE TWO-LEVEL OPTIMIZATION METHOD FOR NONLINEAR DYNAMIC SOFT SENSOR MODELING

The Wiener structure soft sensor model can be obtained via the following optimization:

min_{ω, b, ξi, ξi*, αj, Dj}   (1/2) ω^T ω + C Σ_{i=1}^{l} ξi + C Σ_{i=1}^{l} ξi*
s.t.  xj(i) = gh(αj)^T Uj(qi, Dj)
      x(i) = [x1(i), x2(i), ..., xm(i)]^T
      ω^T Φ(x(i)) + b − y(i) ≤ ε + ξi
      y(i) − ω^T Φ(x(i)) − b ≤ ε + ξi*
      ξi ≥ 0, ξi* ≥ 0, i = 1, 2, ..., l
      0 < αj ≤ 1, j = 1, 2, ..., m
      Dj ≥ 0        (6)

Here, Uj(qi, Dj) is defined analogously to U(qi, D) in eq 4, using the jth input uj and the delay Dj. In this optimization problem, m is the number of input variables, l the number of quality samples, L the predetermined length of the FIRs, qi the sampling time of the ith quality sample, αj the dynamic parameter of the jth FIR, and Dj the delay of the jth FIR. Compared with the original support vector regression problem, nonlinear equality constraints have been added, which destroys the convexity; a local minimum is therefore not necessarily the global minimum. In addition, the numbers of optimization variables and constraints are large. For these reasons, commonly used numerical methods are not applicable, and intelligent optimization algorithms are relatively better choices. Note, however, that the support vector regression would have to be trained many times, so the computational burden is high if intelligent optimization algorithms are applied directly. The optimization variables can be classified into two groups, the parameters of the support vector regression (ω, b, ξi, ξi*) and those of the FIR models (αj, Dj), which implies that the original optimization problem can be divided into two suboptimization problems. Therefore, an iterative two-level optimization method is applied.
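As a concrete stand-in for the support vector regression part of eqs 5 and 6, the sketch below fits an off-the-shelf ε-SVR (scikit-learn's SVR with an RBF kernel; its gamma corresponds to 1/σ^2 in the kernel above) to precomputed auxiliary variables. The data and hyperparameter values are placeholders, not values from the paper.

```python
# Sketch of the nonlinear part (eq 5) using scikit-learn's epsilon-SVR with an
# RBF kernel. X holds the dynamic auxiliary variables x(i) row-wise and y the
# quality samples; the hyperparameter values are placeholders.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def fit_svr(X, y, C=10.0, epsilon=0.01, gamma="scale"):
    """Solve the epsilon-SVR problem of eq 5 (C = penalty, epsilon = threshold)."""
    model = make_pipeline(StandardScaler(),
                          SVR(kernel="rbf", C=C, epsilon=epsilon, gamma=gamma))
    model.fit(X, y)
    return model

# Example with placeholder data of the right shape (l samples, m auxiliary inputs)
rng = np.random.default_rng(1)
X, y = rng.normal(size=(100, 3)), rng.normal(size=100)
svr = fit_svr(X, y)
y_hat = svr.predict(X)
```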


When the parameters of the FIR models have been determined, the original optimization problem degenerates into support vector regression, because the dynamic auxiliary variables x can be calculated directly. The first suboptimization problem is formulated below:

min_{ω, b, ξi, ξi*}   (1/2) ω^T ω + C Σ_{i=1}^{l} ξi + C Σ_{i=1}^{l} ξi*
s.t.  ω^T Φ(x(i)) + b − y(i) ≤ ε + ξi
      y(i) − ω^T Φ(x(i)) − b ≤ ε + ξi*
      x(i) = [x1(i), x2(i), ..., xm(i)]^T
      ξi ≥ 0, ξi* ≥ 0, i = 1, 2, ..., l        (8)

This optimization problem is actually an SVM training problem. Existing methods can be applied to solve it, and the structure of the support vector regression can be determined easily. After the structure of the support vector regression has been determined, only the parameters of the FIR models need to be adjusted to improve the regression accuracy. For this purpose, the second suboptimization problem is formulated below:

min_{αj, Dj}   Σ_{i=1}^{l} [ω^T Φ(x(i)) + b − y(i)]^2
s.t.  xj(i) = gh(αj)^T Uj(qi, Dj)
      x(i) = [x1(i), x2(i), ..., xm(i)]^T
      0 < αj ≤ 1, j = 1, 2, ..., m
      Dj ≥ 0        (9)

The optimization variables are now related only to the parameters of the FIR models. Intelligent optimization algorithms, such as the genetic algorithm (GA) or particle swarm optimization (PSO), can be applied to solve this problem. Because the parameters of the support vector regression are fixed, the computational time is reduced significantly: this suboptimization problem does not involve training of the SVM. The procedure of the iterative two-level optimization method is as follows (an illustrative sketch of the loop is given after this list):
Step 1: Initialization. Select the initial parameters of the FIR models via prior knowledge and train the SVM.
Step 2: Keep the parameters of the support vector regression unchanged and optimize the parameters of the FIR models.
Step 3: Update the parameters of the SVM.
Step 4: Test the accuracy of the dynamic soft sensor. Go back to Step 2 if the accuracy is insufficient.
Step 5: Complete the modeling of the nonlinear dynamic soft sensor.
The computational cost is much lower because the training time of the support vector regression is reduced significantly. After each suboptimization, the solution is improved, which guarantees convergence. Usually, the solution converges to a satisfactory value after dozens of cycles.
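To make Steps 1-5 concrete, the following sketch outlines one possible implementation of the iterative loop. It reuses the hypothetical dynamic_auxiliary and fit_svr helpers from the earlier sketches and uses SciPy's differential evolution in place of the GA/PSO solvers mentioned above; the FIR length L, the input matrix U, the quality samples y, and their sample indices q_idx are assumed to be given. This is an illustrative sketch, not the authors' implementation.

```python
# Sketch of the iterative two-level procedure (Steps 1-5), reusing the
# hypothetical helpers dynamic_auxiliary (eq 4 sketch) and fit_svr (eq 5
# sketch). Differential evolution stands in for the GA/PSO solvers.
import numpy as np
from scipy.optimize import differential_evolution

def build_features(U, q_idx, alphas, delays, L):
    """Stack x(i) = [gh(alpha_j)^T U_j(q_i, D_j)]_{j=1..m} for every sample i."""
    return np.array([[dynamic_auxiliary(U[:, j], qi, alphas[j],
                                        int(round(delays[j])), L)
                      for j in range(U.shape[1])] for qi in q_idx])

def fir_loss(theta, svr, U, q_idx, y, L):
    """Objective of eq 9: squared prediction error with the SVR fixed."""
    m = U.shape[1]
    X = build_features(U, q_idx, theta[:m], theta[m:], L)
    return float(np.sum((svr.predict(X) - y) ** 2))

def two_level_fit(U, q_idx, y, L, D_max, n_cycles=10, tol=1e-4):
    m = U.shape[1]
    alphas, delays = np.full(m, 0.5), np.zeros(m)        # Step 1: initial FIRs
    svr = fit_svr(build_features(U, q_idx, alphas, delays, L), y)
    prev = np.inf
    for _ in range(n_cycles):
        bounds = [(1e-3, 1.0)] * m + [(0.0, D_max)] * m
        res = differential_evolution(fir_loss, bounds,               # Step 2
                                     args=(svr, U, q_idx, y, L), seed=0)
        alphas, delays = res.x[:m], res.x[m:]
        X = build_features(U, q_idx, alphas, delays, L)
        svr = fit_svr(X, y)                                          # Step 3
        err = float(np.mean((svr.predict(X) - y) ** 2))              # Step 4 (training
        if prev - err < tol:                                         # error as a proxy)
            break
        prev = err
    return svr, alphas, delays                                       # Step 5
```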

4. SIMULATION CASE STUDY

The following nonlinear system, with three inputs passing through dynamic elements, is simulated:

x1(s) = [1/(400s + 1)] e^(−90s) u1(s)
x2(s) = [1/(200s + 1)] u2(s)
x3(s) = [1/((300s + 1)(100s + 1))] u3(s)
y(t) = log(x1(t) + x2(t) + x3(t) + 3)

The sampling interval of the input signals is 30 s, and a total of 12,000 continuous input samples are generated. To simulate sudden changes in actual processes, the inputs are drawn from three independent normal distributions with common mean and variance, but different mean values are used in three periods: the initial mean of the three distributions is 0; at the 3000th and 6000th input sampling times, it changes to 1 and 0.5, respectively. In contrast with the uniformly sampled inputs, the outputs are sampled at irregular intervals ranging from 1200 to 2400 s, and the number of output samples is 200. White noise is added to the output signals. The first half of the outputs is used as the training set and the rest as the test set, as shown in Figure 3.

Figure 3. Sampled output variables.
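For readers who wish to reproduce a comparable data set, the sketch below generates inputs, dynamic responses, and irregularly sampled noisy outputs along the lines described above. The noise level, the random seed, and the handling of the 90 s dead time as a 3-sample input shift are assumptions rather than details taken from the paper.

```python
# Rough sketch of the Section 4 simulation setup; noise level, seed, and the
# handling of the 90 s dead time (a 3-sample input shift) are assumptions.
import numpy as np
from scipy.signal import lti, lsim

rng = np.random.default_rng(0)
Ts, N = 30.0, 12000
t = np.arange(N) * Ts

# Inputs: common variance, mean switching 0 -> 1 -> 0.5 at samples 3000 and 6000
means = np.repeat([0.0, 1.0, 0.5], [3000, 3000, 6000])
U = rng.normal(loc=means[:, None], scale=1.0, size=(N, 3))

# Linear dynamic channels (the 90 s delay on u1 realized as a 3-sample shift)
u1_delayed = np.concatenate([np.zeros(3), U[:-3, 0]])
_, x1, _ = lsim(lti([1.0], [400.0, 1.0]), u1_delayed, t)
_, x2, _ = lsim(lti([1.0], [200.0, 1.0]), U[:, 1], t)
_, x3, _ = lsim(lti([1.0], [30000.0, 400.0, 1.0]), U[:, 2], t)

# Static nonlinearity plus white measurement noise
y_full = np.log(x1 + x2 + x3 + 3.0) + rng.normal(scale=0.02, size=N)

# Roughly 200 quality samples at irregular intervals of 1200-2400 s (40-80 steps)
q_idx = np.cumsum(rng.integers(40, 81, size=200))
q_idx = q_idx[q_idx < N]
y = y_full[q_idx]
```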

The following methods are studied for comparison: (1) our method; (2) static SVM; (3) the impulse response template method (IRT) proposed by Lu et al.;17 (4) the dynamic ANN with directly included past inputs (DANN) proposed by Du et al.;16 and (5) the dynamic ANN with weighted past inputs (WANN) proposed by Wu et al.15 Using the root-mean-square error (RMSE) as the error criterion, the results are listed in Table 1.

Table 1. RMSE of Different Methods

RMSE             our method   static SVM   IRT     DANN     WANN
training error   0.0321       0.271        0.139   0.0396   0.0359
test error       0.0352       0.284        0.180   0.1229   0.0457


It is noted that the dynamic property and the time delay are quite obvious in this simulation case; thus, the static model does not work well, and the dynamic soft sensors generally outperform the static SVM. Compared with the static SVM, the dynamic soft sensors incorporate dynamic information from relevant past data, so the dynamic property is reflected. Among all the dynamic soft sensors, our method performs best. IRT is a linear approach and is not very suitable in this case. For DANN, the 10 most recent past inputs of each channel are adopted, so the ANN has 30 inputs; however, the generalization ability is unsatisfactory. As discussed in Section 1, the selection of relevant past data has a great influence on the performance, and improperly selected past data deteriorate it. Furthermore, the structure of DANN is usually rather complicated because more inputs are adopted; DANN can minimize the training error, yet the model may not work well on the test set, leading to unsatisfactory generalization. WANN does not take time delay into account, which is the main reason its performance is slightly worse than that of our method; furthermore, its training time is rather long, as discussed below.

Applying the finite impulse responses of FOPDT models is the key to our method, and the estimates of the relevant dynamic parameters have a great influence on the performance. In the experiment, the dynamic property of the system is simulated by three transfer functions. Because of the noise, the estimates for the first two transfer functions deviate slightly from the actual systems but can still be regarded as reasonable approximations, as shown in Figure 4 and Table 2.

Figure 4. Estimated and actual impulse responses of the first two transfer functions.

Table 2. Estimated and Actual Values of Relevant Dynamic Parameters

time constant/delay   estimated value   actual value
input 1               429/90            400/90
input 2               219/0             200/0
input 3               460/30            (second order)

The third transfer function has a second-order dynamic characteristic and can be simplified as a FOPDT model; Figure 5 shows that this first-order approximation is satisfactory.

Figure 5. First-order approximation of the third transfer function.

In the simulation case, the Matlab Optimization Toolbox is applied to solve the second subproblem, and GA is selected as the solver. The computational cost is acceptable: the number of iterations is usually no more than 10. A typical training curve is shown in Figure 6.

Figure 6. Training error of each iteration.

The simulation experiment was conducted on a PC with an Intel Core Duo CPU (2.66 GHz) and 2 GB of RAM using Matlab 2012a. The training times of the different methods are listed in Table 3.

Table 3. Training Time of Different Methods

        our method   static SVM   IRT   DANN   WANN
time    100−150 s    …            …     …      …