Soft Sensor Development for the Estimation of Benzene Content in

Jan 25, 2012 - A typical soft sensor design procedure is presented as follows:(3) ..... such as ARMA (autoregressive moving average) models, linear an...
1 downloads 0 Views 808KB Size
Article pubs.acs.org/IECR

Soft Sensor Development for the Estimation of Benzene Content in Catalytic Reformate Ž eljka Ujević Andrijić,*,† Tomislav Rolich,‡ and Nenad Bolf† †

Department of Measurements and Process Control, Faculty of Chemical Engineering and Technology, University of Zagreb, Savska c. 16/5A, 10000 Zagreb, Croatia ‡ Department of Fundamental Natural and Engineering Sciences, Faculty of Textile Technology, University of Zagreb, Prilaz baruna Filipovića 28a, 10000 Zagreb, Croatia ABSTRACT: As vehicle emission standards become more stringent, there is an increasing need for continual monitoring of benzene content in gasoline. Since on-line analyzers are often unavailable, and laboratory analyses are infrequently obtained, soft sensors for the estimation of benzene content of light reformate are developed. Soft sensors are developed using linear and nonlinear identification methods. Experimental data are acquired from the refinery distributed control system (DCS) and include continuously measured variables and analyzer assays available on-line. In the present work, the development of a finite impulse response (FIR) model, an output error (OE) model, and a Hammerstein−Wiener (HW) model is presented. To overcome the problem of selecting the best model parameters by trial and error, genetic algorithms and pattern (direct) search were used. On the basis of developed soft sensors, it is possible to entirely replace on-line analyzers with soft sensors by embedding the model in a DCS on-site.

1. INTRODUCTION The process industry is faced with ever-increasing product quality and environmental requirements. Hence, there is a need for continuous monitoring of large numbers of process variables and properties. In various process plants, there are variables which are either difficult to measure on-line (viscosity, odor, boiling point, color, etc.) or are only obtainable through infrequent laboratory measurements. Those properties are often of great importance for the process industry, because they may greatly influence the final product quality. On-line analyzers are often out of service or under maintenance. Laboratory analyses are infrequent and can be time-consuming. This problem can be solved by the application of soft sensors. The soft sensing technique, developed in recent years, has become a widely used solution. It utilizes variables measurable on-line (secondary variables) to predict the product quality variable (primary variable) through certain modeling approaches, such as mechanism modeling, statistical modeling, and artificial intelligence modeling. Due to the uncertainty and complexity of industrial processes, mechanical models are often unavailable. Therefore, data-driven empirical models are viable alternatives.1 As distributed control systems (DCSs) are installed in most chemical plants, many process variables can be measured and stored in real time.2 A database containing historical data enables engineers to build soft sensors with the goal of producing reliable, real-time estimates of unmeasured data. Some typical applications of soft sensors are backup procedures of measuring devices (during regular maintenance) or a permanent substitution of a hardware device, when one is out of service for a long time. A typical soft sensor design procedure is presented as follows:3 1. selection of historical data from the plant database © 2012 American Chemical Society

Figure 1. Block diagram of the FIR model.

Figure 2. Block diagram of the OE model.

Figure 3. Block diagram of the Hammerstein−Wiener model.

2. 3. 4. 5.

outlier detection, data filtering model structure and regressor selection model estimation model validation

Received: Revised: Accepted: Published: 3007

October 14, 2011 January 13, 2012 January 25, 2012 January 25, 2012 dx.doi.org/10.1021/ie202362d | Ind. Eng. Chem. Res. 2012, 51, 3007−3014

Industrial & Engineering Chemistry Research

Article

Figure 4. Fractionation reformate plant.

The application of soft sensors developed in this case study is a temporary substitution of measuring equipment, either during maintenance or during other periods of unavailability. In this work, the finite impulse response (FIR), output error (OE) and Hammerstein−Wiener (HW) models were considered. Those types of models do not require past samples of the measured output (variable inferred by the soft sensor) when using validation data.

Particular emphasis in this work is given to steps 3 and 4, as choosing the optimal model structure and regressors is crucial for soft sensor performance.4 Differently structured models can be used to model real systems. There is no unified theoretical approach for proper model selection. One possible approach is to start with a simple model structure and then gradually increase its complexity. In the field of industrial applications, the focus of attention is on parametric (polynomial) structures in both linear and nonlinear versions. To estimate polynomial models, the model order must be predetermined. Model order can be defined as a set of integers that represent the number of coefficients for each polynomial included in the selected model structure. Dead time―given by the number of samples before the output corresponds to the input―must also be specified. Depending on the model structure complexity, model order, dead time and an additional set of parameters may need to be adjusted in order to obtain the best performance of each model. Some of the classical model order selection criteria for linear dynamic polynomial models are Akaike’s information criterion (AIC)5,6 and the minimum description length (MDL) criterion.7 For nonlinear dynamic models the Lipschitz number6,8 criterion is mostly used for model order selection. In a case of multiple input models, predetermining the set of parameters can become a very complex task. A cumbersome trial-and-error procedure is therefore commonly applied. To overcome the problem of selecting the best model order, as well as the delays of each input and other configurable parameters, genetic algorithms (GAs) and pattern (direct (DS)) search were used. Also, GA and DS were used to determine the appropriate number of nonlinear units of the HW model.

2. SOFT SENSOR MODEL IDENTIFICATION Linear dynamic models are in many cases sufficient for real-life applications.9 One of the most-used linear parametric models is the FIR model: nu

y ̂( t ) =

∑ Bi(q) ui(t − nk i) (1)

i=1

y ̂(t ) = B1(q) u1(t − nk1) + B2(q) u2(t − nk2) + ... + Bnu(q) u nu(t − nk nu)

(2)

where q is the time-shift operator and B1(q) = b1,1 + b1,2q−1 + ... + b1,nb1q−nb1+ 1

(3)

B2(q) = b2,1 + b2,2q−1 + ... + b2,nb2q−nb2 + 1 ...

(4)

Bnu (q) = bnu,1 + bnu,2q−1 + ... + bnu,nbnuq−nbnu + 1 (5)

nu is the number of system inputs, ŷ(t) is the predictor (predicted output) at time t, ui(t) is the ith input at time t, nki is 3008

dx.doi.org/10.1021/ie202362d | Ind. Eng. Chem. Res. 2012, 51, 3007−3014

Industrial & Engineering Chemistry Research

Article

• x(t) = (B/F)w(t) is a linear transfer function, where B and F are polynomials of the linear output error model. • ŷ(t) = h(x(t)) is a nonlinear function transforming output data x(t) from a linear block. The nonlinear function in the HW model can be presented as a summed series of nonlinear units, such as tree-partition networks, wavelet networks, a multilayer neural network, and piecewise linear or sigmoid functions.9 After an appropriate model structure has been chosen, model orders of the FIR, OE, and HW models are determined by GA and DS. GA and DS are optimizing techniques with a high potential for finding the global optimum of the fitness (objective) function. GAs are implemented in a computer simulation in which a population of chromosomes of candidate solutions to an optimization problem evolves toward better solutions. The evolution starts from a population of randomly generated individuals. In each generation, the fitness of every individual is evaluated; multiple individuals are stochastically selected from the current population based on their fitness and modified (recombined and mutated) to form a new population. The new population is then used in the next iteration of the algorithm. The algorithm commonly terminates either when a maximum number of generations has been produced or when a satisfactory fitness level has been reached.13 In direct search, the algorithm creates at each step a set of points, called a mesh, around the current point―the point computed at the algorithm’s previous step. The mesh is formed by adding a scalar multiple of a set of vectors called a “pattern” to the current point. If the pattern search algorithm finds a point in the mesh that improves the objective function at the current point, the new point becomes the current point at the next step of the algorithm.

Table 1. Configurable Parameters of FIR, OE, and HW Models parameter

parameter description

nb

no. of past input samples (there are five nb’s in each model according to five inputs) input delay (there are five nk’s in this model according to five inputs) no. of past prediction outputs (for OE and HW model) no. of piecewise units (for HW model); there are six np’s according to five inputs and one output

nk nf np

min value

max value

1

8

0

15

1

5

1

13

Table 2. Genetic Algorithm Parameters parameter

value/property

population size no. of generations function evaluation selection crossover mutation mutation probability rate fitness scaling no. of elite individuals crossover fraction

50 for FIR and OE; 30 for HW 60 for FIR, OE and HW 3000 for FIR and OE; 1800 for HW stochastic uniform scattered uniform 0.1 rank 1 0.7

Table 3. Direct Search Parameters parameter

value/property

complete poll max function evaluation mesh contraction mesh expansion poll method search method

on 3000 for FIR and OE; 1800 for HW 0.5 2.0 GSSPositiveBasis2N GSSPositiveBasis2N

3. PROCESS DESCRIPTION Benzene is present in crude oil and is also formed in a number of catalytic and thermal processes within the refinery. From 70 to 85% of the benzene is contributed by reformate from the catalytic reforming process. The reformate plant catalytic reformate is fractioned into light and heavy reformates and the benzene-rich fraction in reformate splitter columns. Although benzene has a high octane number and high calorific value, the benzene content in light reformate needs to be reduced to 1%. This is due to the fact that benzene is a precursor for the formation of cyclohexane in the process of isomerization, and is thus an undesirable component of gasoline (low octane number). Also, European emission standards (such as EURO IV and EURO V) for vehicle exhaust emission and MSAT (Mobile Source Air Toxics) regulations limit the amount of benzene in gasoline, due to the hazardous effect of benzene on health and a negative environmental impact. Hence, the continuous measurement of benzene content in reformate is necessary, but the problem lies in the fact that the benzene on-line chromatographic analyzer is frequently under maintenance and is also very expensive. An overview of a fractionation reformate plant with the variables used for soft sensor development is given in Figure 4. Fractionation of light reformate (top product) takes place in column C1. Fractionation of the benzene-rich fraction (top product) from heavy reformate (bottom product) takes place in column C2.

the time delay for the ith input, and nbi is the maximum number of past input values. Polynomial coefficients of Bi(q) can be determined using the least-squares optimization method. The block diagram shown in Figure 1 represents the FIR model structure. A somewhat extended variant of the FIR model is the output error (OE) model. The OE model predictor is given by y (̂ t ) = B(q) u(t − nk) + [I − F(q)]y (̂ t )

(6)

F(q) = I + F1q−1 + F2q−2 + ... + Fnf q−nf

(7)

where nf is the maximum number of past predicted (simulated) outputs. The block diagram shown in Figure 2 represents the structure of the OE model. If the output of a system depends nonlinearly on its inputs, the Hammerstein−Wiener (HW) model can be applied. The HW model is presented with a serial connection of static nonlinear blocks with a dynamic linear block. Related studies10−12 have shown that nonlinear industrial processes, such as distillation columns or heat exchangers, can be satisfactorily modeled with an HW model structure. Figure 3 shows a block diagram of the HW model structure, where • w(t) = f(u(t)) is a nonlinear function transforming input data u(t). 3009

dx.doi.org/10.1021/ie202362d | Ind. Eng. Chem. Res. 2012, 51, 3007−3014

Industrial & Engineering Chemistry Research

Article

Table 4. FIR Model Properties, Using GA FIT FPE RMS eMAE fitness function, y

run 1

run 2

run 3

run 4

run 5

run 6

78.853 0.0063 0.0518 0.0394 32.675

78.895 0.0064 0.0517 0.0391 32.654

78.892 0.0064 0.0517 0.0388 32.664

78.873 0.0064 0.0518 0.0392 32.669

78.896 0.0064 0.0517 0.0388 32.661

78.895 0.0064 0.0517 0.0391 32.654

Table 5. FIR Model Properties, Using DS init values of nb and nk FIT FPE RMS eMAE fitness function, y

run 1

run 2

run 3

run 4

run 5

run 6

all 1 78.617 0.0064 0.0524 0.0395 33.062

all 2 78.922 0.0064 0.0517 0.0387 32.627

all 3 78.896 0.0064 0.0517 0.0388 32.661

all 4 78.734 0.0064 0.0521 0.0394 32.883

all 5 78.896 0.0064 0.0517 0.0388 32.661

all 6 78.789 0.0064 0.0520 0.0392 32.778

The following continuously measured variables have been chosen as key input variables of the soft sensor for the estimation of benzene content in light reformate: C1 inlet stream temperature, TC-001; C1 column bottom temperature (outlet from H-1), TC-018; C1 column temperature, TC-003; C1 column pressure, PI-009; and pump-around flow rate, FC-002.

4. SOFT SENSOR MODEL DEVELOPMENT Off-line process data were obtained from the plant database over a continuous period of 3 weeks, with a sampling time of 5 min according to the process dynamics. Data preprocessing included detecting missing data and outlier detection and removal (3 sigma rule). Output quantities (benzene content, v/v %) measured by the on-line analyzer were sampled every 20 min. The number of each input data (sampled every 5 min) must correspond to the number of output data, thus requiring additional output data. This was generated by the multivariate adaptive regression splines algorithm. Data were also detrended and filtered. The modeling data set for the estimation and the independent validation data sets included 4500 samples and 1500 samples, respectively. After data preprocessing, the MATLAB System Identification Toolbox (SIDT) and Global Optimization toolbox routines were used for the development of soft sensor models. 4.1. Optimizing Model Parameters by Genetic Algorithm and the Direct Search Method. After the model structure is chosen, optimal model parameters need to be determined. Configurable parameters (of FIR, OE, and HW structure) and their ranges are shown in Table 1. Parameter ranges have been chosen based on process operators’ experience of observed process dynamics, as well as rational model complexity. There are 10, 15, and 21 configurable parameters of interest in the FIR, OE, and HW models respectively, all positive integers. Important GA and DS parameters, presented in Tables 2 and 3, have been chosen on the basis of preliminary experiments and experience, as well as rational calculating time. In the GA evolution of configurable parameters, each individual chromosome in the population represents a set of possible model orders. The initial values of parameters are selected randomly. Crossover type was chosen between scattered, single-point, and two-point crossovers. The type of mutation

Figure 5. Plot of the best values of fitness function at each iteration for the best FIR model using the DS algorithm.

Figure 6. Comparison between measured and FIR model output data.

was chosen between uniform and Gaussian mutations. The mutation probability rate was chosen between 0.1 and 0.6. The rank fitness scaling had performed better than proportional scaling and shift linear scaling. The number of elite individuals has been chosen between 1 and 2, and the crossover fraction has been chosen between 0.5 and 0.7. In the proposed GA for FIR and OE models, 34 individuals have been created in each generation by using a crossover procedure, 15 by using a mutation procedure, and 1 is an elite individual (i.e., with the lowest fitness function value from the previous generation). The algorithm terminated when 60 generations had been produced. The HW model contains 30 individuals due to the model’s complexity and high processing time, while the number of 3010

dx.doi.org/10.1021/ie202362d | Ind. Eng. Chem. Res. 2012, 51, 3007−3014

Industrial & Engineering Chemistry Research

Article

FPE, and RMS values. The fitness function (FIT) of model is calculated as follows:

generations is 60. The search space of configurable parameters for the FIR model, as the simplest of the proposed models, is equal to 85·165 = 3.4360 × 1011. For the OE model it is equal to 85·165·55 = 1.0737 × 1014, and for the HW model it is equal to 85·165·55·136 = 5.1827 × 1020. GA and DS procedures found combinations of the parameters, among the billions of possibilities, which best (or nearly so) described the process dynamics. Direct search parameters are presented in Table 3. In the preliminary investigation of DS parameters the GSSPositiveBasis2N search method (with a set of 2N vectors) gave better results than GSSPositiveBasisNp1 (set of N + 1 vectors), where N is the number of estimated parameters. 4.2. Model Evaluation Criteria and Fitness Function Assignment. Models are primarily evaluated based on FIT,

⎛ ⎜ FIT = ⎜1 − ⎜ ⎝

⎞ ∑in= 1 (yî − yi )2 ⎟ ⎟ ·100 ∑in= 1 (yi − ym )2 ⎟⎠

(8)

y is the measured output, ŷ is the simulated or predicted model output, and ym is the mean of y; 100% corresponds to a perfect FIT. Akaike’s final prediction error (FPE) criterion is also used for evaluating models. According to Akaike’s theory, the most accurate model has the smallest FPE.9,14 This criterion represents the compromise between model accuracy, expressed by the accuracy of estimated parameters, and model complexity, expressed by the number of parameters used to parametrize the model. FPE is defined by the following equation: ⎛ 2d ⎞ ⎟ FPE = V ⎜1 + ⎝ n ⎠

(9)

where V is the loss function, d is the number of estimated parameters, and n is the number of values in the estimation data set. The loss function V is defined by the following equation: ⎛ n ⎞ 1 T ⎜ V = det⎜ ∑ ε(t , θn) (ε(t , θn)) ⎟⎟ ⎝n ⎠

(10)

1

where θn represents the estimated parameters and ε is output error. Root mean square error (RMS) as a frequently used criterion for model evaluation is also presented in the results:

Figure 7. Distribution of FIR model residuals.

Table 6. Frequency of FIR Model Residuals category −0.20 < −0.15 < −0.10 < −0.05 < 0.00 < e 0.05 < e 0.10 < e 0.15 < e

e ≤ −0.15 e ≤ −0.10 e ≤ −0.05 e ≤ 0.00 ≤ 0.05 ≤ 0.10 ≤.15 ≤ 0.20

count

percent

9 56 219 379 700 100 32 6

0.60 3.73 14.59 25.25 46.64 6.66 2.13 0.40

RMS =

n

1 n

∑ (y î − yi )2 (11)

i=1

FIT should be as high as possible, and FPE and RMS should be as low as possible. Fitness Function. The definition of the chromosome fitness function is an important element of a GA.15 Since preliminary investigation showed that FIT and FPE are not always correlated, but both are frequently used criteria for model

Table 7. OE Model Properties, Using GA FIT FPE RMS eMAE fitness function, y

run 1

run 2

run 3

run 4

run 5

run 6

89.214 0.0035 0.0264 0.0180 16.892

90.423 0.0028 0.0235 0.0175 14.733

90.267 0.0023 0.0239 0.0171 14.396

89.812 0.0025 0.0250 0.0190 15.190

88.910 0.0023 0.0272 0.0221 16.081

89.981 0.0022 0.0246 0.0188 14.652

Table 8. OE Model Properties, Using DS init values of nb, nk, nf FIT FPE RMS eMAE

run 1

run 2

run 3

run 4

run 5

run 6

all 1 84.638 0.0037 0.0454 0.0261

all 2 81.481 0.0022 0.0454 0.0354

all 3 −3.361 × 1010 0.0027 8.239 × 1010 2.285 × 1010

all 4, except nf all 3 80.130 0.0024 0.0487 0.0359

all 4, except nf all 1 85.528 0.0020 0.0374 0.0263

all 6, except nf all 1 88.565 0.0023 0.0280 0.0218

3011

dx.doi.org/10.1021/ie202362d | Ind. Eng. Chem. Res. 2012, 51, 3007−3014

Industrial & Engineering Chemistry Research

Article

Figure 8. Plot of the best values of the fitness function at each generation, for the best OE model using GA. Figure 10. Distribution of OE model residuals.

Table 9. Frequency of OE Model Residuals category −0.20 < −0.15 < −0.10 < −0.05 < 0.00 < e 0.05 < e 0.10 < e 0.15 < e

e ≤ −0.15 e ≤ −0.10 e ≤ −0.05 e ≤ 0.00 ≤ 0.05 ≤ 0.10 ≤ 0.15 ≤ 0.20

count

percent

0 1 44 810 606 40 0 0

0 0.07 2.93 53.97 40.37 2.66 0 0

Figure 9. Comparison between measured and OE model output data.

number defined here, it gives good results in all run simulations. As can be seen from Table 5, the DS method needs defining initial parameter values and also gives satisfactory results. Simulation from the second run of the direct search procedure is proposed as the best among the developed FIR models. Since the FIR model has the simplest parametric structure among the proposed models, it needs a higher model order. FIR model parameters of the best model achieved are presented in the following vector form:

evaluation, they are integrated in the fitness function, as follows: y = (100 − FIT) + 1000·FPE + 100 ·RMS

(12)

As can be seen, in the present study, the weighted sum method for multiobjective criteria is used for setting the fitness function. Each criterion is assigned a weighting value (estimated from the previous investigation), and the fitness function is a linear combination of all weighted criteria.16,17 Results also include the mean absolute error, eMAE, which is given by

nb = [7 8 8 8 8] nk = [6 15 15 4 15]

Figure 5 shows a plot of the best value of the fitness function at each generation for the chosen FIR model. Figure 6 shows the comparison between simulated and measured outputs for the validation data set. It can be seen that the model output is a very good match for the validation data. Figure 7 represents the histogram of FIR model residuals, e (differences between predicted and measured outputs), which is bell-shaped and zero-centered. From Table 6, it can be perceived that 71.9% of all residuals lie in the range of ±0.05 values. 5.2. Results of the OE Model. OE model properties for the validation data set using GA are presented in Table 7, and model properties for the validation data set using DS are presented in Table 8. The function evaluation number for the OE model is 3000 for both optimization methods. From Table 8, run 3, it can be seen that, when the initial value of nf is 3, the results are very poor and the model is unstable. As a consequence, high initial values of nf are avoided in other runs. The OE model from simulation run 3 with parameters estimated by GA was chosen as the best among the presented OE models. As opposed to FIR results, GA in all simulations gives

n

e MAE =

1 ∑ |yî − yi | n i=1

(13)

5. RESULTS AND DISCUSSION All results were calculated for a new set of validation data. The parameters nb, nf, nk, and np were estimated by minimizing the objective function (eq 12) using the GA and DS procedures, as was discussed earlier. Coefficients of polynomials Bi(q) and F(q) were estimated using the optimization methods integrated with the MATLAB System Identification Toolbox. 5.1. Results of FIR Model. As can be seen from Tables 2 and 3, the function evaluation number for the FIR model is 3000 for both optimization approaches. FIR model properties using GA as an optimization technique for parameter estimation are shown in Table 4. FIR model properties using DS as an optimization technique for parameter estimation are shown in Table 5. GA is a stochastic procedure, and its convergence to a global minimum can be slow. However, for the function evaluation 3012

dx.doi.org/10.1021/ie202362d | Ind. Eng. Chem. Res. 2012, 51, 3007−3014

Industrial & Engineering Chemistry Research

Article

Table 10. HW Model Properties, Using GA FIT FPE RMS eMAE fitness function, y

run 1

run 2

run 3

run 4

run 5

run 6

86.335 0.0013 0.0335 0.0237 18.346

88.630 0.0015 0.0279 0.0209 15.680

86.652 0.0013 0.0327 0.0254 17.900

87.483 0.0015 0.0307 0.0227 17.117

86.429 0.0018 0.0333 0.0238 18.718

86.228 0.0011 0.0338 0.0234 18.245

Table 11. HW Model Properties Using DS init values of nb, nk, nf, np FIT FPE RMS eMAE fitness function, y

run 1

run 2

run 3

run 4

run 5

run 6

all 1 85.874 0.0014 0.0346 0.0264 19.028

all 2 −127.061 0.0031 0.5566 0.3780 285.787

all 3 51.124 0.0015 0.1198 0.0970 62.397

all 4, except nf all 1 82.579 0.0014 0.0427 0.0353 21.353

all 5, except nf all 1 84.789 0.0024 0.0373 0.0271 21.320

all 6, except nf all 1 81.732 0.0009 0.0448 0.0345 23.730

Figure 11. Plot of the best values of the fitness function at each generation, for the best HW model, using GA.

Figure 12. Comparison between measured and HW model output data.

in this case better results than the DS procedure. The cause of inferior DS results may be found in the larger search space of the OE model, as well as in insufficient diversity and early convergence obtained by the DS procedure. OE model structure parameters of the best model achieved are presented in the following vector form:

5.3. Results of the HW Model. Due to the complexity and nonlinearity of the fractionation process, the HW model was developed. The function evaluation number for the HW model is 1800 for both optimization methods. HW model properties using GA as an optimization method for parameter estimation are shown in Table 10, and those using DS in are shown in Table 11. HW model parameters of the achieved model best (run 2 of GA procedure) are presented in the following vector form:

nb = [6 1 2 3 3] nf = [1 1 2 2 1] nk = [1 8 7 9 13]

nb = [1 6 5 7 5]

Figure 8 shows a plot of the best values of the fitness function at each generation for the best OE model. Compared to the FIR model, the OE model gives better results, as was expected, since the past predicted output values are taken. It can be seen that a lower model order (lower nb) is sufficient for satisfactory results. Figure 9 shows the comparison between the OE model output and the measured output for validation data. A graphical comparison and the corresponding FIT, FPE, and RMS values show that experimental and model data are in accurate agreement. The average absolute deviation of the OE model is quite satisfactory for the on-line implementation. A residual plot of validation data for the OE model is given in Figure 10. As can be seen, residuals have a normal distribution, with a narrower bell shape than the FIR model. Table 9 shows that the most frequent residuals (94.3%) lie in the range of ±0.05 values.

nf = [1 2 1 2 2] nk = [2 6 10 8 11] np = [13 1 3 1 2 1]

Figure 11 shows a plot of the best values of the fitness function at each generation for the best HW model. Figure 12 shows the comparison between the HW model output data and the measured output data. As can be seen from Tables 10 and 11 and Figure 12, the HW model performed somewhat poorer compared to the OE model. From a graphical comparison and the corresponding FIT, FPE, and RMS values and residual analyses (Figure 13), it can be observed that the HW model data are an accurate match for experimental data. From Table 12, it can be seen that the most frequent residuals (93.1%) lie in the range of ±0.05 values. 3013

dx.doi.org/10.1021/ie202362d | Ind. Eng. Chem. Res. 2012, 51, 3007−3014

Industrial & Engineering Chemistry Research

Article

OE model, which can thus be successfully employed as the soft sensor for the on-line prediction of benzene content.



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS This paper is a result of the scientific project ZP-125-19631964, soft sensors and analyzers for process monitoring and control, supported by the Croatian Ministry of Science, Education and Sports. The authors thank Mr. Saša Horvat and Mr. Boris Ž eželj from INA-Refinery Rijeka, Croatia, who contributed to the work in an advisory capacity.



Figure 13. Distribution of HW model residuals.

Table 12. Frequency of HW Model Residuals category −0.20 < −0.15 < −0.10 < −0.05 < 0.00 < e 0.05 < e 0.10 < e 0.15 < e

e ≤ −0.15 e ≤ −0.10 e ≤ −0.05 e ≤ 0.00 ≤ 0.05 ≤ 0.10 ≤ 0.15 ≤ 0.20

count

percent

2 3 63 594 803 33 1 2

0.13 0.20 4.20 39.57 53.50 2.20 0.07 0.13

REFERENCES

(1) Luo, J. X.; Shao, H. H. Developing soft sensors using hybrid soft computing methodology: A neurofuzzy system based on rough set theory and genetic algorithms. Soft Comput. 2006, 10, 54. (2) Ma, M.; Ko, J.; Wang, S. S.; Wu, M.; Jang, S.; Shieh, S.; Wong, D. S. Development of adaptive soft sensor based on statistical identification of key variables. Control Eng. Pract. 2009, 17, 1026. (3) Fortuna, L.; Graziani, S.; Rizzo, A.; Xibilia, M.G. Soft Sensors for Monitoring and Control of Industrial Processes; Advances in Industrial Control; Springer: London, 2007. (4) Kadlec, B.; Gabrys, B.; Strandt, S. Data-driven Soft Sensors in the process industry. Comput. Chem. Eng. 2009, 33, 795. (5) Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716. (6) Rahim, H. A.; Ibrahim, F.; Taib, M. N.; Rahim, R. A.; Sam, Y. M. Model order selection criterion for monitoring haemoglobin status in dengue patients using ARMAX model. Int. J. Smart Sens. Intell. Syst. 2008, 1 (2), 403. (7) Liang, G.; Wilkes, D. M.; Cadzow, J. A. Arma model order estimation based on the eigenvalues of the covariance matrix. IEEE Trans. Signal Process. 1993, 41 (10), 3003. (8) He, X.; Asada, H. A new method for identifying orders of inputoutput models for nonlinear dynamic systems. Proc. Am. Control 1993, 2520. (9) Ljung, L. System Identification: Theory for the User; Prentice Hall: Upper Saddle River, NJ, 1999. (10) Yusoff, Z. M.; Muhammad, Z.; Rahiman, M. H. F.; Tajuddin, M.; Adnan, R.; Taib, M. N. Modeling of Steam Distillation System Using Hammerstein-Wiener model. IEEE Signal Process. 2011, 7, 435. (11) Norquay, S. J.; Palazoglu, A.; Romagnoli, J. A. Application of Wiener model predictive control (WMPC) to an industrial C2-splitter. J. Process Control 1999, 9, 461. (12) Eskinat, E.; Johnson, S. H.; Luyben, W. Use of Hammerstein models in identification of nonlinear systems. AIChE J. 1991, 37 (2), 255−268. (13) Affenzeller, M.; Winkler, S.; Wagner, S.; Beham, A. Genetic Algorithms and Genetic Programming�Modern Concepts and Practical Applications; Taylor & Francis Group: Boca Raton, FL, 2009. (14) Verhaegen, M.; Verduit, V. Filtering and System Identification; Cambridge University Press: New York, 2007. (15) Dam, M.; Saraf, D. N. Design of neural networks using genetic algorithm for on-line property estimation of crude fractionator products. Comput. Chem. Eng. 2006, 30, 722. (16) Venkataraman, P. Applied Optimization with MATLAB Programming; John Wiley and Sons: New York, 2009. (17) Deb, K. Multi-objective Optimization Using Evolutionary Algorithms; John Wiley and Sons: New York, 2009.

If we keep in mind that the optimal model is the simplest model that describes the process dynamics most accurately, then the OE model is the right choice to embed on-site. Developed code is quite general and can be easily adapted, with some modifications, for the development of other parametric identification models, such as ARMA (autoregressive moving average) models, linear and nonlinear ARX (autoregressive with exogenous inputs) models, or BJ (Box Jenkins) models. In the presented work, FIT, FPE, and RMS values are included in the fitness function as they are commonly used measurements for model quality evaluation and comparison. However, the fitness function can be some other measurement which can then be used for model evaluation. The results showed that GA can be successfully used for adjusting FIR, OE, and HW model parameters. DS can also be a confident method for the estimation of FIR model order, but for the models with larger search spaces (such as OE and HW), the proper initial model order needs to be defined.

6. CONCLUSIONS Linear and nonlinear dynamic models for the estimation of benzene content of light reformate were developed. To avoid a trial-and-error procedure, GA and DS methods were proposed for proper model order selection, which makes the development of soft sensors more systematic. Chosen models show a satisfactory match with experimental data, thus proving their usefulness as soft sensors for the on-line estimation of benzene content in light reformate. The GA-based method appears to be more suitable than the DS method for larger search spaces. Using the described procedure, it was shown that GA can be satisfactorily applied for optimizing configurable parameters of input−output polynomial models. The best performer was the 3014

dx.doi.org/10.1021/ie202362d | Ind. Eng. Chem. Res. 2012, 51, 3007−3014