Ind. Eng. Chem. Res. 2006, 45, 292-298
Dealing with Collinearity in Finite Impulse Response Models Using Bayesian Shrinkage

Mohamed N. Nounou*

Department of Chemical and Petroleum Engineering, The United Arab Emirates University, P.O. Box 17555, Al-Ain, UAE
* To whom correspondence should be addressed. Tel.: +971-3-7133-549. Fax: +971-3-762-4262. E-mail: [email protected].

Finite impulse response (FIR) models are widely used in the chemical industry because of their simplicity and easily defined model structures. However, they usually require a large number of coefficients to capture the process dynamics. This large number of coefficients introduces collinearity or redundancy into the models, which usually increases the uncertainty of their estimation. This paper presents a Bayesian approach, called empirical Bayesian FIR (EBFIR) modeling, to deal with this collinearity problem. Unlike ridge regression (RR), which shrinks the FIR model parameters toward zero in order to reduce their variations, EBFIR makes better use of the data by estimating a prior density function for the FIR coefficients, which is then used within a Bayesian framework to shrink the FIR coefficients in a more rigorous fashion. The developed Bayesian approach is computationally inexpensive, as a closed form solution of the Bayesian estimate of the FIR coefficients is derived. The EBFIR approach is also more general than some of the existing methods, such as ordinary least squares regression (OLS) and RR, which are shown to be special cases of EBFIR. Finally, the developed EBFIR approach has been shown to outperform some of the commonly used existing FIR estimation methods over a wide range of noise levels.

1. Introduction

Many process operations, such as process control, optimization, and data filtering, rely on the availability of accurate process models that can predict and characterize the behavior of the process. Process models are categorized into fundamental, empirical, and semiempirical models.1 Fundamental models are usually differential and/or algebraic equations, which are derived from the basic conservation laws, such as material and energy balances. When dealing with complex processes, fundamental models can be difficult to obtain, and in such cases, empirical models, which are derived from measurements of the process variables, are usually relied upon. Constructing empirical models, however, can have very challenging aspects, such as defining the model structure, accounting for the presence of measurement errors, and extracting the maximum amount of useful information from the data to enhance the model accuracy.

Linear process models are very common in the chemical industry because of their simplicity and relative accuracy in the operating regions of interest. The most commonly used linear model structures include the autoregressive with exogenous variable (ARX) model2,3 and the finite impulse response (FIR) model.4,5 FIR models are usually less parsimonious than ARX models but are widely used because of their simple structures, which use the lagged process inputs as the model independent variables, so that the model parameters are the impulse response of the system. Also, when the model inputs are assumed to be noise-free, as in most control applications, FIR models can be more accurate than ARX models because they satisfy the assumption made in ordinary least squares regression (OLS) that the input variables are noise-free. ARX models, on the other hand, use the lagged process outputs (which are not necessarily noise-free) as model inputs, so they do not satisfy this noise-free-input assumption and can therefore give more erroneous predictions. Moreover, defining the FIR model structure is usually much easier than defining an ARX structure, as long as a large enough number of FIR coefficients is used. However, a major drawback of FIR models is the uncertainty associated with their parameter estimation, especially for a large number of FIR coefficients. This is because of the collinearity or redundancy in the input or information matrix, which consists of the lagged inputs. Collinearity makes the input matrix (if not singular) nearly singular, increases the variance of the estimated parameters, and thus degrades their accuracy of estimation.

Many modeling techniques have been developed to deal with this collinearity problem, including principal component regression (PCR), partial least squares (PLS) regression,4,6-8 and ridge regression (RR).8-10 In PCR and PLS regression, the singularity of the input matrix is reduced by decreasing the mathematical rank of the input matrix using singular value decomposition (SVD), while in RR the rank is increased to match the number of the model inputs. The idea behind RR is that it not only minimizes the output prediction error, as in OLS, but also penalizes the magnitude of the estimated parameters. This penalty effectively shrinks the estimated model parameters to reduce their large variations. Shrinkage has been shown to improve the accuracy of estimation by trading bias for risk, which means that shrinkage-based estimators bias the estimate in order to decrease its estimation error.11,12 In fact, RR has been shown to be a special case of a shrinkage estimator, the James-Stein (JS) estimator.12 However, RR shrinks all FIR coefficients toward zero, which is not very rigorous because the impulse response of a process is known not to be zero. Wavelet-based multiscale representation has also been used to shrink the estimated impulse response of collinear FIR models.13,14 Nikolaou et al.13 represented the OLS-estimated FIR coefficients at multiple scales using wavelets and then used a recursive approach to select the set of wavelet coefficients (and shrink the unnecessary ones) that minimizes a cross-validation mean squared error. Robertson et al.,14 on the other hand, used multiscale representation to reduce the effect of input noise on the estimated models.

The main idea behind this paper is that the accuracy of FIR model estimation can be improved if the estimated FIR coefficients are shrunk toward a more realistic impulse response than
zero. This, of course, requires knowledge about the impulse response of the process, which is the quantity we are trying to estimate. However, such knowledge, and much more information about the process behavior, can be obtained from the continuously measured process variables. One method that can make good use of data, by extracting their embedded information, is Bayesian estimation.15,16 Bayesian estimation has many attractive properties. It satisfies the likelihood principle, which means that it uses all the data information in the estimation problem, and it can account for different levels of noise in the data. Also, it provides a general framework through which many existing modeling techniques can be better understood. These advantages of Bayesian estimation have previously been exploited to develop Bayesian PCR and Bayesian latent variable regression methods.17,18

In this paper, an empirical Bayesian approach, called empirical Bayesian FIR (EBFIR) modeling, is developed to deal with the collinearity problem of FIR models. The developed approach improves the accuracy of estimated FIR models by shrinking their parameters toward an empirically estimated process impulse response. The EBFIR approach first estimates a prior density, using the available data, to quantify the distribution of the process impulse response (the FIR model parameters) and then uses this prior density function within a Bayesian context to give a Bayesian estimate of the FIR model coefficients.

The rest of this paper is organized as follows: In section 2, the representation of FIR models and some of their existing estimation techniques are described, followed by an introduction to Bayesian estimation in section 3. Then, in section 4, the formulation of Bayesian FIR modeling is described, followed by the empirical Bayesian FIR algorithm in section 5 and an illustrative example in section 6 to compare the performance of the developed EBFIR algorithm with existing methods. Finally, the paper is concluded with a few remarks in section 7.

2. Identification of FIR Models

Given data of the process input, $u_k \in \{u_1, u_2, \ldots, u_n\}$, which are assumed to be noise-free, and measurements of the process output, $y_k \in \{y_1, y_2, \ldots, y_n\}$, which are assumed to be contaminated with additive noise, i.e.,
$$y_k = \tilde{y}_k + e_k \qquad (1)$$
where $\tilde{y}_k$ and $e_k$ are the noise-free output and the measurement noise at time step $k$, and if it is assumed that the linear FIR model has the form

$$y_k = \sum_{i=1}^{m} \tilde{h}_i u_{k-i} + e_k \qquad (2)$$
it is desired to estimate the noise-free process impulse response, or FIR model coefficients, $\{\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_m\}$. The FIR model shown in eq 2 can be written in matrix notation as follows:
$$Y = U\tilde{h} + e \qquad (3)$$

where

$$Y = \begin{bmatrix} y_n \\ y_{n-1} \\ \vdots \\ y_2 \\ y_1 \end{bmatrix}, \quad U = \begin{bmatrix} u_n^T \\ u_{n-1}^T \\ \vdots \\ u_2^T \\ u_1^T \end{bmatrix}, \quad e = \begin{bmatrix} e_n \\ e_{n-1} \\ \vdots \\ e_2 \\ e_1 \end{bmatrix}, \quad \text{and} \quad u_k^T = [u_{k-1}\ u_{k-2}\ \cdots\ u_{k-m}]$$
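For concreteness, the following is a minimal numpy sketch of this construction; the helper name `build_fir_matrices` is hypothetical and not from the paper.

```python
import numpy as np

def build_fir_matrices(u, y, m):
    """Form Y and U of eq 3 from input/output sequences, for m FIR lags.

    Row k of U holds the lagged inputs [u_{k-1}, ..., u_{k-m}], so only
    outputs with a full lag window are kept. The paper stacks rows from
    time n down to 1; stacking forward in time, as done here, defines the
    same least squares problem.
    """
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    # Column i (i = 1..m) holds the input shifted by i samples.
    U = np.column_stack([u[m - i: len(u) - i] for i in range(1, m + 1)])
    Y = y[m:]
    return Y, U
```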
The remainder of this section presents some of the common techniques used to estimate the FIR model coefficients, such as OLS3 and RR.8-10

2.1. OLS Modeling. OLS is one of the most widely used modeling techniques.3 It assumes noise-free inputs and estimates the model parameters by maximizing the likelihood, or probability, that the predicted model fits the measured data. Statistically, this is equivalent to maximizing the likelihood function, which is the conditional density function of the measured output variable given the noise-free model, i.e.,
$$\{\hat{h}\}_{OLS} = \arg\max_{\tilde{h}}\ P(Y \mid U, \tilde{h}) \qquad (4)$$
subject to the model constraint shown in eq 3. If the measurement errors are assumed to be independent, zero-mean Gaussian with variance $\sigma^2$, the estimation problem reduces to
$$\{\hat{h}\}_{OLS} = \arg\min_{\hat{h}}\ (Y - U\hat{h})^T (Y - U\hat{h}) \qquad (5)$$
which has the following well-known closed form solution:
$$\{\hat{h}\}_{OLS} = (U^T U)^{-1} U^T Y \qquad (6)$$
The above closed form solution shows that when the input covariance matrix $U^T U$ is close to singular (which is typical in large FIR models), the covariance of the estimated parameters becomes very large, resulting in a large uncertainty about the estimated model parameters. This is a major drawback of using OLS to estimate FIR model coefficients.

2.2. Ridge Regression (RR) Modeling. Ridge regression was developed to stabilize the model parameters estimated using OLS in the case of collinear model inputs.8-10 RR reduces the large variations in the model parameters by imposing a penalty on the magnitude of the estimated parameters as follows:
$$\{\hat{h}\}_{RR} = \arg\min_{\hat{h}}\ (Y - U\hat{h})^T (Y - U\hat{h}) + \lambda \hat{h}^T \hat{h} \qquad (7)$$
where λ is a non-negative scalar. The above RR estimation problem has the following closed form solution:
$$\{\hat{h}\}_{RR} = (U^T U + \lambda I)^{-1} U^T Y \qquad (8)$$
which shows that RR introduces a bias to the input covariance matrix to artificially make it nonsingular. Statistically, the above RR formulation is equivalent to the following maximization problem:
$$\{\hat{h}\}_{RR} = \arg\max_{\hat{h}}\ N(U\hat{h}, \sigma^2 I_n)\, N\!\left(0, \frac{\sigma^2}{\lambda} I_m\right) \qquad (9)$$
where $I_n$ and $I_m$ are identity matrices of sizes $n$ and $m$, respectively. Equation 9 shows that the RR estimator is a Bayesian estimator that uses a zero-mean multivariate normal distribution as the prior density function of the model parameters and a zero-one loss function. This means that RR shrinks the estimated parameters toward zero in order to damp their large variations. This might be acceptable if we have absolutely no information about the process or its true impulse response.
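To make the two closed form solutions concrete, here is a minimal numpy sketch of eqs 6 and 8 (function names hypothetical). Increasing $\lambda$ pulls every coefficient toward zero, which is exactly the shrinkage discussed above.

```python
import numpy as np

def fir_ols(U, Y):
    """OLS estimate, eq 6: (U^T U)^{-1} U^T Y."""
    return np.linalg.solve(U.T @ U, U.T @ Y)

def fir_ridge(U, Y, lam):
    """Ridge estimate, eq 8: (U^T U + lam*I)^{-1} U^T Y."""
    m = U.shape[1]
    return np.linalg.solve(U.T @ U + lam * np.eye(m), U.T @ Y)
```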
In practice, however, a reasonable idea about the process impulse response can be extracted from the available process data. Utilizing such information in the estimation of the FIR model can improve the accuracy of its estimation, which can be done using Bayesian estimation.

3. Bayesian Estimation

3.1. The Bayesian Perspective. The main difference between Bayesian estimators and other estimation techniques is their ability to incorporate external or prior knowledge about the quantities to be estimated through a density function called a prior. This ability of Bayesian estimators is due to the fact that they consider all quantities, measured and unmeasured, as random quantities that can be represented by a joint density function.16,19 Unlike Bayesian estimators, non-Bayesian estimators rely only on the likelihood function, which, according to the likelihood principle (LP), brings the data information into the estimation problem. Therefore, non-Bayesian methods do not provide the means for incorporating any external knowledge. Bayesian estimators, on the other hand, rely on the posterior density function, which is the conditional density function of the quantities of interest given the measurements. According to the Bayes rule, the posterior is related to the prior and the likelihood functions as follows:
$$\text{posterior} \propto \text{likelihood} \times \text{prior} \qquad (10)$$
which shows that Bayesian estimators blend the data information brought in by the likelihood function with the external information brought in by the prior to form the posterior, which represents the distribution of the quantities of interest after collecting the measurements. As an example, assume that it is desired to estimate a quantity $\theta$ from measurements of a quantity $y$. Then, the posterior is the conditional density function of $\theta$ given the measurements $y$, i.e., $P(\tilde{\theta} \mid y)$. Using the Bayes rule, the posterior can be written as
$$P(\tilde{\theta} \mid y) \propto P(y \mid \tilde{\theta})\, P(\tilde{\theta}) \qquad (11)$$
where $P(y \mid \tilde{\theta})$ is the likelihood function and $P(\tilde{\theta})$ is the prior. Note that the posterior is a density function and not an estimate. Therefore, a sample from the posterior, based on the choice of a loss function, is selected as the Bayesian estimate.

3.2. Loss Function. The loss function, $L(\tilde{\theta}; \hat{\theta})$, decides which sample from the posterior should be selected as the Bayesian estimate. Here, $\hat{\theta}$ and $\tilde{\theta}$ denote the Bayesian estimate and the true value of the quantity $\theta$, respectively. Many loss functions have been suggested, such as quadratic and zero-one loss functions.15 A zero-one loss function imposes a penalty of zero when the selected sample is the true one and a penalty of unity otherwise, i.e.,
$$L(\hat{\theta}; \tilde{\theta}) = \begin{cases} 0 & \text{when } \{\hat{\theta}\}_{Bayesian} = \tilde{\theta} \\ 1 & \text{otherwise} \end{cases} \qquad (12)$$
The use of a zero-one loss function corresponds to choosing the posterior mode or maximum as the Bayesian estimate, which is usually referred to as the maximum a posteriori (MAP) estimate,15 i.e.,
$$\{\hat{\theta}\}_{MAP} = \arg\max_{\tilde{\theta}}\ P(y \mid \tilde{\theta})\, P(\tilde{\theta}) \qquad (13)$$
Equation 13 shows that a zero-one loss function transforms the Bayesian estimation problem into an optimization problem, which is very helpful, as will be shown later in the formulation of EBFIR modeling.

3.3. Empirical Bayesian Estimation. Traditional Bayesian estimators assume the availability of a fully predefined prior density function. In practice, however, the prior might be fully or partly unspecified. In such cases, the prior is estimated empirically from the available data, which is called empirical Bayesian (EB) estimation.16,20 Generally, there are two main approaches for estimating the prior: a parametric approach and a nonparametric approach. The parametric approach assumes a structure for the prior and then estimates its hyperparameters. The nonparametric approach, on the other hand, estimates the entire distribution from the data.

4. Bayesian FIR (BFIR) Modeling

In FIR modeling, the objective is to estimate the noise-free model parameters, $\tilde{h}$, given the input and output data. From a Bayesian point of view, the posterior for this estimation problem can be defined as the conditional density function of the noise-free model parameters given the input and output data, i.e.,
$$\text{posterior} = P(\tilde{h} \mid u_1, u_2, \ldots, u_n, y_1, y_2, \ldots, y_n) \qquad (14)$$
Using the Bayes rule, the posterior can be written as
$$\text{posterior} \propto P(y_1, y_2, \ldots, y_n \mid \tilde{h}, u_1, u_2, \ldots, u_n)\, P(\tilde{h} \mid u_1, u_2, \ldots, u_n) \qquad (15)$$

where

$$\text{likelihood} = P(y_1, y_2, \ldots, y_n \mid \tilde{h}, u_1, u_2, \ldots, u_n) \qquad (16)$$

and

$$\text{prior} = P(\tilde{h} \mid u_1, u_2, \ldots, u_n) \qquad (17)$$
Therefore, to compute the posterior, the likelihood and prior densities need to be defined.

4.1. Likelihood Density Function. The likelihood, shown in eq 16, is the conditional density function of the output measurements given the noise-free model. Assuming that the output measurements are contaminated with zero-mean, independent and identically distributed Gaussian noise, i.e.,
$$e_k \sim N(0, \sigma^2) \qquad (18)$$

and

$$E[e_i e_j] = 0 \quad \forall\, i \neq j \qquad (19)$$

the likelihood function simplifies to
$$\text{likelihood} = \prod_{k=1}^{n} N(\tilde{y}_k, \sigma^2) \qquad (20)$$
Also, since $\tilde{y}_k = u_k^T \tilde{h}$, the likelihood function becomes

$$\text{likelihood} = \prod_{k=1}^{n} N(u_k^T \tilde{h}, \sigma^2) \qquad (21)$$
4.2. Prior Density Function. The prior, shown in eq 17, is the conditional density function of the noise-free model parameters given the input data. Since the inputs are assumed to be noise-free, the prior simplifies to

$$\text{prior} = P(\tilde{h}) \qquad (22)$$
Since this prior quantifies the distribution of the model parameters, assumptions about the nature of the FIR coefficients are needed in order to completely define its structure. In this work, it is assumed that the noise-free FIR coefficients follow a multivariate normal distribution with a mean vector, µ, and a covariance matrix, ηI, which is a scalar multiple of the identity matrix, i.e.,
$$\text{prior} = P(\tilde{h}) \sim N(\mu, \eta I) \qquad (23)$$
where $\eta$ is a positive number. The rationale behind this choice of prior distribution is that it transforms the Bayesian FIR estimation problem into a quadratic optimization problem, which has a closed form solution and is therefore computationally very efficient. In principle, other forms of prior density functions can be used, such as a uniform distribution; however, the objective function resulting from such a choice would not be quadratic and would therefore have to be solved numerically. Under these assumptions, the posterior becomes

$$\text{posterior} \propto \prod_{k=1}^{n} N(u_k^T \tilde{h}, \sigma^2)\, N(\mu, \eta I) \qquad (24)$$
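For concreteness, a minimal sketch (function name hypothetical) of the negative logarithm of the posterior in eq 24, up to constant factors; its minimizer over $h$ is the MAP estimate derived in the next subsection.

```python
import numpy as np

def neg_log_posterior(h, U, Y, sigma2, mu, eta):
    """Negative log of the posterior in eq 24 (additive constants and the
    common factor 1/2 dropped): the Gaussian likelihood term of eq 21 plus
    the N(mu, eta*I) prior of eq 23."""
    resid = Y - U @ h
    return (resid @ resid) / sigma2 + ((h - mu) @ (h - mu)) / eta
```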
4.3. Zero-One Loss Function. The posterior, shown in eq 24, describes the behavior of the model parameters after collecting the measurements, based on the assumptions made about the nature of the noise and the FIR model parameters. However, it does not provide an estimate of the model parameters. To obtain a Bayesian estimate of the FIR model coefficients, a sample from the posterior is selected, based on the choice of a loss function. In this work, a zero-one loss function is assumed, for multiple reasons. First, as described in section 3.2, a zero-one loss function selects the posterior maximum, or mode, as the model parameter estimate, which is often referred to as the maximum a posteriori (MAP) estimate. This is because this type of loss function imposes a penalty of zero when the selected estimate is the true one and a penalty of one otherwise, which corresponds to choosing the posterior maximum as the Bayesian estimate.15 Thus, a zero-one loss function transforms the Bayesian estimation problem into an optimization problem. As will be shown later, this makes the problem computationally inexpensive, because the MAP estimate of the FIR model parameters has a closed form solution. Second, it provides a direct comparison between the Bayesian solution of the FIR model parameters and those obtained using OLS and RR. On the basis of this selection of a zero-one loss function, the Bayesian estimation problem for the FIR model parameters becomes
$$\{\hat{h}\}_{MAP} = \arg\max_{\tilde{h}}\ \prod_{k=1}^{n} N(u_k^T \tilde{h}, \sigma^2)\, N(\mu, \eta I) \qquad (25)$$
4.4. MAP Solution of the FIR Coefficients. The optimization problem shown in eq 25 is equivalent to the following minimization problem:

$$\{\hat{h}\}_{MAP} = \arg\min_{\hat{h}}\ \frac{1}{\sigma^2} \sum_{k=1}^{n} (y_k - u_k^T \hat{h})^2 + \frac{1}{\eta} (\hat{h} - \mu)^T (\hat{h} - \mu) \qquad (26)$$
This is because the posterior is the product of two Gaussian distributions (the likelihood and the prior), which can be maximized by minimizing the sum of their negative exponents. Equation 26 can be rearranged in matrix notation as follows:

$$\{\hat{h}\}_{MAP} = \arg\min_{\hat{h}}\ \frac{1}{\sigma^2} (Y - U\hat{h})^T (Y - U\hat{h}) + \frac{1}{\eta} (\hat{h} - \mu)^T (\hat{h} - \mu) \qquad (27)$$

Multiplying the objective function by $\sigma^2$, it becomes

$$\{\hat{h}\}_{MAP} = \arg\min_{\hat{h}}\ (Y - U\hat{h})^T (Y - U\hat{h}) + \frac{\sigma^2}{\eta} (\hat{h} - \mu)^T (\hat{h} - \mu) \qquad (28)$$

Defining $\gamma = \sigma^2/\eta$, eq 28 becomes

$$\{\hat{h}\}_{MAP} = \arg\min_{\hat{h}}\ (Y - U\hat{h})^T (Y - U\hat{h}) + \gamma (\hat{h} - \mu)^T (\hat{h} - \mu) \qquad (29)$$

which, assuming that the quantities $\mu$ and $\gamma$ are known, has the following closed form solution (see the Appendix for the proof):

$$\{\hat{h}\}_{MAP} = (U^T U + \gamma I)^{-1} (U^T Y + \gamma \mu) \qquad (30)$$
Note the significant similarity between the objective function used in this Bayesian (MAP) approach, shown in eq 29, and the RR objective function shown in eq 7. However, there is a major difference: the RR objective function shrinks the FIR coefficients toward zero, whereas eq 29 shrinks the FIR coefficients toward the multivariate mean of the prior density. That is why this MAP approach is expected to improve the accuracy of estimated FIR models. Also, comparing eqs 6, 8, and 30, it can easily be seen that the Bayesian (MAP) approach is more general than OLS and RR. For example, if the mean vector, $\mu$, is assumed to be the zero vector, the MAP solution reduces to RR, while if the parameter $\gamma$ is assumed to be zero, it reduces to OLS.

5. Empirical Bayesian FIR (EBFIR) Modeling

As discussed in section 3.3, when all or part of the prior is not known, it can be estimated from the data, which is called empirical Bayesian estimation. In section 4, based on full knowledge of the prior density function, a closed form solution for the MAP estimate of the FIR model coefficients was derived. However, this closed form solution, shown in eq 30, assumes the availability of the prior hyperparameters, $\mu$ and $\gamma$. In practice, prior knowledge about these quantities is usually not available. Therefore, the quantities $\mu$ and $\gamma$ need to be estimated from the data. This will be referred to as empirical Bayesian FIR (EBFIR) modeling, in which the estimation problem becomes
$$\{\hat{h}, \mu, \gamma\}_{MAP} = \arg\min_{\tilde{h}, \mu, \gamma}\ (Y - U\tilde{h})^T (Y - U\tilde{h}) + \gamma (\tilde{h} - \mu)^T (\tilde{h} - \mu) \qquad (31)$$

which simultaneously estimates the FIR coefficients and the prior hyperparameters from the available data. To solve this empirical estimation problem, first the prior parameters are estimated from the data, and then the FIR model parameters are computed using the MAP closed form solution shown in eq 30. This procedure is summarized in the algorithm described below, and a code sketch follows the algorithm.

5.1. EBFIR Modeling Algorithm.

(A) Step 1. Divide the available input and output data (U and Y) into two sets, one for training and the other for testing or validation of the estimated model.

(B) Step 2. Estimate the mean of the prior density function as follows:

(i) Solve for the RR FIR model parameters, $\{\hat{h}\}_{RR}$, using cross validation (CV), by minimizing the prediction mean squared error of the testing data.

(ii) Approximate the variance of the model parameters by computing the variance of the RR parameter estimates, i.e., $\hat{\eta} = \mathrm{var}[\{\hat{h}\}_{RR}]$.

(iii) Approximate $\sigma^2$ by computing the variance of the output prediction error under the RR model parameters, i.e., $\hat{\sigma}^2 = \mathrm{var}[Y - U\hat{h}_{RR}]$.

(iv) Compute the parameter $\gamma$ as $\hat{\gamma} = \hat{\sigma}^2 / \hat{\eta}$.

(v) Use the estimated $\hat{\gamma}$ to compute the mean of the model parameters as follows:

$$\hat{\mu} = (U^T U + \hat{\gamma} I)^{-1} U^T Y \qquad (32)$$

(C) Step 3. Compute the MAP solution and, using cross validation, calculate $\{\hat{h}\}_{MAP}$ and $\hat{\gamma}$ as follows:

(i) For multiple $\gamma$ values in the range $[0, \gamma_{max}]$, estimate the model parameter vector using the MAP closed form solution, eq 30.

(ii) In each case, use the estimated model parameters to predict the output of the testing data, and compute the mean squared error (MSE).

(iii) Select the values of $\gamma$ and the vector $\hat{h}$ that minimize the output prediction MSE of the testing data.
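The following is a minimal numpy sketch of the algorithm above. The helper names, the 50/50 data split, and the logarithmic search grids for $\lambda$ and $\gamma$ are illustrative assumptions, not choices prescribed by the paper.

```python
import numpy as np

def fir_map(U, Y, gamma, mu):
    """MAP closed form of eq 30: (U^T U + gamma*I)^{-1} (U^T Y + gamma*mu)."""
    m = U.shape[1]
    return np.linalg.solve(U.T @ U + gamma * np.eye(m), U.T @ Y + gamma * mu)

def ebfir(U, Y, gammas=np.logspace(-2, 4, 50), train_frac=0.5):
    """Sketch of the EBFIR algorithm (steps 1-3 above)."""
    # Step 1: split the data into training and testing sets.
    split = int(train_frac * len(Y))
    Utr, Ytr, Ute, Yte = U[:split], Y[:split], U[split:], Y[split:]
    m = U.shape[1]

    # Step 2(i): RR solution with lambda chosen by cross validation.
    def ridge(lam):
        return np.linalg.solve(Utr.T @ Utr + lam * np.eye(m), Utr.T @ Ytr)
    h_rr = min((ridge(lam) for lam in np.logspace(-2, 4, 50)),
               key=lambda h: np.mean((Yte - Ute @ h) ** 2))

    eta_hat = np.var(h_rr)               # step 2(ii): variance of RR estimates
    sigma2_hat = np.var(Y - U @ h_rr)    # step 2(iii): output-error variance
    gamma0 = sigma2_hat / eta_hat        # step 2(iv): gamma = sigma^2 / eta
    mu_hat = fir_map(Utr, Ytr, gamma0, np.zeros(m))  # step 2(v): eq 32

    # Step 3: cross-validate gamma in the MAP closed form (eq 30).
    gamma_best, h_map = min(
        ((g, fir_map(Utr, Ytr, g, mu_hat)) for g in gammas),
        key=lambda gh: np.mean((Yte - Ute @ gh[1]) ** 2))
    return h_map, gamma_best, mu_hat
```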
6. Illustrative Example

6.1. Model Description. In this section, the performance of the EBFIR method is illustrated and compared with those of existing methods, such as OLS, RR, and PCR, using the following second-order plus dead time (SOPDT) process:21

$$\frac{Y(s)}{U(s)} = \frac{10 e^{-1.5s}}{2s^2 + 5s + 1}$$

which is discretized using a sampling interval of 0.3 min, giving an impulse response with a settling time of around 90 samples, as shown in Figure 1.

Figure 1. Impulse response of the SOPDT process using a sampling interval of 0.3 min.

The discretized model is used to generate data for modeling purposes using a 1000-sample PRBS input signal switching between -1 and 1. The output is then contaminated with zero-mean Gaussian noise. The noise standard deviation is varied from 0.1 to 1 to illustrate the performance of the EBFIR algorithm under different levels of noise. Figure 2 shows a portion of the input and output data for a noise standard deviation of 0.5.

6.2. Comparison Criteria. The performances of the different modeling techniques are compared using two criteria: the mean squared error of the estimated parameters with respect to the noise-free ones, i.e.,
$$\mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} (\hat{h}_i - \tilde{h}_i)^2 \qquad (33)$$
and the estimated model gain, i.e.,

$$G = \sum_{i=1}^{m} \hat{h}_i \qquad (34)$$
Figure 2. Portion of the input and output data used in the SOPDT example. The output is contaminated by zero-mean, 0.5-standard-deviation Gaussian noise.
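To make the data-generation step concrete, here is a sketch assuming scipy is available; the ±1 random-switching input is a crude stand-in for the PRBS generator, which the paper does not specify.

```python
import numpy as np
from scipy import signal

# SOPDT process: G(s) = 10 e^{-1.5 s} / (2 s^2 + 5 s + 1), sampled at 0.3 min.
Ts = 0.3
Gd = signal.cont2discrete(([10.0], [2.0, 5.0, 1.0]), Ts, method='zoh')
delay = int(round(1.5 / Ts))  # 1.5 min dead time = 5 samples

rng = np.random.default_rng(0)
u = rng.choice([-1.0, 1.0], size=1000)          # binary input in {-1, 1}
_, y_clean = signal.dlsim(Gd, u)
y_clean = np.roll(y_clean.ravel(), delay)       # apply the dead time
y_clean[:delay] = 0.0
y = y_clean + rng.normal(0.0, 0.5, size=1000)   # additive noise, sigma = 0.5
```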
Table 1. Comparison of the MSE of the Estimated FIR Coefficients for the Various Modeling Methods (×10^4)

modeling method    σ = 0.1    σ = 0.3    σ = 0.5    σ = 0.7    σ = 1.0
OLS                5.6        49         143        250        547
RR                 3.5        10         18         22         32
PCR                2.1        5.7        9.2        13         18
EBFIR              1.3        4.2        6.9        9.1        13

Table 2. Comparison of the Gains of the Estimated FIR Coefficients for the Various Modeling Methods^a

modeling method    σ = 0.1    σ = 0.3    σ = 0.5    σ = 0.7    σ = 1.0
OLS                9.969      9.966      9.972      9.971      9.969
RR                 9.959      9.936      9.927      9.909      9.887
PCR                9.964      9.955      9.957      9.951      9.945
EBFIR              9.964      9.959      9.967      9.968      9.957

^a The true gain = 9.959.
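The following sketches the kind of Monte Carlo loop behind Tables 1 and 2; `make_data` and the estimator callables (for example, the `fir_ols`, `fir_ridge`, and `ebfir` sketches above) are hypothetical stand-ins.

```python
import numpy as np

def mc_compare(estimators, make_data, h_true, n_runs=100, sigma=0.5):
    """Average parameter MSE (eq 33) and gain (eq 34) over noise realizations.

    estimators: dict mapping a method name to a callable (U, Y) -> h_hat.
    make_data:  callable sigma -> (U, Y) drawing a fresh noise realization.
    """
    mse = {name: [] for name in estimators}
    gain = {name: [] for name in estimators}
    for _ in range(n_runs):
        U, Y = make_data(sigma)
        for name, fit in estimators.items():
            h_hat = fit(U, Y)
            mse[name].append(np.mean((h_hat - h_true) ** 2))   # eq 33
            gain[name].append(np.sum(h_hat))                   # eq 34
    return ({k: np.mean(v) for k, v in mse.items()},
            {k: np.mean(v) for k, v in gain.items()})
```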
Figure 3. Optimizing the number of principal components in PCR using cross validation.

Figure 4. Optimizing the RR parameter λ using cross validation.

Figure 5. Optimizing the EBFIR parameter γ using cross validation.

Figure 6. Comparing the EBFIR estimate of the FIR coefficients with those of existing methods.
Note that these comparisons are possible because the noise-free model parameters are known for this simulated example.

6.3. FIR Model Estimation. Except for OLS, CV is used to optimize the estimated FIR models by dividing the data into
two sets (training and testing). Then, the number of principal components (in PCR), the parameter $\lambda$ (in RR), and the parameter $\gamma$ (in EBFIR) are optimized by minimizing the output prediction mean squared error of the testing data. As an example, in the case where the noise standard deviation is 0.5, the optimum values of the number of principal components, $\lambda$, and $\gamma$ are 34, 55, and 150, respectively, as shown in Figures 3-5. Note the difference between the optimum $\lambda$ and $\gamma$. The optimum $\lambda$ is less than $\gamma$, which makes sense: RR assumes a zero mean for the FIR parameters, so its implied estimate of the variance of the FIR coefficients, $\sigma^2/\lambda$ (from eq 9), should be larger than the variance estimated by EBFIR, $\eta$, where a more accurate mean is used. Mathematically, this statement can be written as

$$\frac{\sigma^2}{\lambda} > \eta \qquad (35)$$

Substituting $\gamma = \sigma^2/\eta$ into the above inequality gives

$$\gamma > \lambda \qquad (36)$$
6.4. Comparison Results. To make a fair and statistically valid comparison between the different modeling techniques, a Monte Carlo simulation of 100 realizations is performed, and the results are presented in Tables 1 and 2. Table 1 compares the mean squared errors of the estimated FIR coefficients with respect to the noise-free values and shows that EBFIR always outperforms the existing methods, followed by PCR, RR, and finally OLS. This performance of EBFIR is consistent across all noise levels, which shows the robustness of the developed algorithm. Also, Table 2 shows that the estimated model gain using EBFIR is very comparable to, and sometimes better than, those of the existing methods, as in the case of a high noise level (σ = 1.0). The advantage of EBFIR can be seen visually by comparing the estimated FIR coefficients in Figure 6, which shows that the EBFIR estimate has smaller variations around the true FIR coefficients than all of the other methods. This is because EBFIR does a better job of shrinking the estimated FIR coefficients toward an empirically estimated mean vector.

7. Conclusions

This paper presents an empirical Bayesian approach, called empirical Bayesian FIR (EBFIR) modeling, to deal with the collinearity problem often encountered in FIR modeling. Unlike ridge regression, which shrinks the FIR model parameters toward zero in order to reduce the uncertainty of their estimation, the developed EBFIR approach estimates a prior for the FIR coefficients and then shrinks the model parameters toward the mean of the prior in a Bayesian framework. The EBFIR approach is shown to outperform many existing methods, such as OLS, PCR, and RR, through a simulated example. The Bayesian solution of the FIR model is also computationally efficient, since it is a closed form solution. Finally, the Bayesian formulation of the FIR modeling problem is shown to be theoretically more general than those of OLS and RR, reducing to these methods under certain conditions.

Acknowledgment

Financial support from the United Arab Emirates University is gratefully acknowledged.

Appendix: Derivation of the BFIR (MAP) Solution of the FIR Model Coefficients

The MAP formulation of the FIR modeling problem is
$$\{\hat{h}\}_{MAP} = \arg\min_{\hat{h}}\ (Y - U\hat{h})^T (Y - U\hat{h}) + \gamma (\hat{h} - \mu)^T (\hat{h} - \mu) \qquad (A1)$$

Therefore, the objective function is

$$J = (Y - U\hat{h})^T (Y - U\hat{h}) + \gamma (\hat{h} - \mu)^T (\hat{h} - \mu) \qquad (A2)$$

Taking the derivative of the objective function with respect to the model parameter vector, $\hat{h}$,

$$\frac{dJ}{d\hat{h}} = -2U^T Y + 2U^T U \hat{h} + 2\gamma (\hat{h} - \mu) \qquad (A3)$$

Setting $dJ/d\hat{h} = 0$, we get

$$U^T Y + \gamma \mu = U^T U \hat{h} + \gamma \hat{h} = (U^T U + \gamma I)\hat{h} \qquad (A4)$$

Solving for $\hat{h}$,

$$\{\hat{h}\}_{MAP} = (U^T U + \gamma I)^{-1} (U^T Y + \gamma \mu) \qquad (A5)$$
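As a numerical sanity check of this derivation, the gradient in eq A3 should vanish at the closed form of eq A5; a small randomized test with arbitrary dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, gamma = 200, 10, 5.0
U = rng.normal(size=(n, m))
Y = rng.normal(size=n)
mu = rng.normal(size=m)

# Closed form solution of eq A5
h = np.linalg.solve(U.T @ U + gamma * np.eye(m), U.T @ Y + gamma * mu)

# Gradient of eq A3, which should be (numerically) zero at the solution
grad = -2 * U.T @ Y + 2 * U.T @ U @ h + 2 * gamma * (h - mu)
assert np.allclose(grad, 0.0)
```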
Literature Cited

(1) Luyben, W. L. Process Modeling, Simulation, and Control for Chemical Engineers, 2nd ed.; McGraw-Hill: New York, 1990.
(2) Box, G. E. P.; Jenkins, G. M. Time Series Analysis: Forecasting and Control, revised ed.; Holden-Day: San Francisco, 1976.
(3) Ljung, L. System Identification: Theory for the User; Prentice-Hall: Englewood Cliffs, NJ, 1987.
(4) Wise, B. M.; Ricker, N. L. Identification of Finite Impulse Response Models by Principal Component Regression: Frequency-Response Properties. Process Control Qual. 1992, 4, 77-86.
(5) Dayal, B. S.; MacGregor, J. F. Identification of Finite Impulse Response Models: Methods and Robustness Issues. Ind. Eng. Chem. Res. 1996, 35, 4078-4090.
(6) Ricker, N. L. The Use of Biased Least-Squares Estimators for Parameters in Discrete-Time Pulse-Response Models. Ind. Eng. Chem. Res. 1988, 27 (2), 243.
(7) Wold, S. Nonlinear PLS Modeling II: Spline Inner Relations. Chemom. Intell. Lab. Syst. 1992, 14, 71-84.
(8) Frank, I. E.; Friedman, J. H. A Statistical View of Some Chemometrics Regression Tools. Technometrics 1993, 35 (2), 109-148.
(9) Hoerl, A. E.; Kennard, R. W. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics 1970, 12, 55-67.
(10) MacGregor, J. F.; Kourti, T.; Kresta, J. Multivariate Identification: A Study of Several Methods. In Advanced Control of Chemical Processes: Selected Papers from the IFAC Symposium, Toulouse, France, 1991; IFAC Symposia Series, 1992, No. 8.
(11) Manton, J. H.; Krishnamurthy, V.; Poor, H. V. James-Stein Filtering Algorithms. IEEE Trans. Signal Process. 1998, 46 (9).
(12) Gruber, M. Improving Efficiency by Shrinkage: The James-Stein and Ridge Regression Estimators; Marcel Dekker: New York, 1998.
(13) Nikolaou, M.; Vuthandam, P. FIR Model Identification: Achieving Parsimony through Kernel Compression with Wavelets. AIChE J. 1998, 44 (1), 141-150.
(14) Robertson, A. N.; Park, K. C.; Alvin, K. F. Extraction of Impulse Response Data via Wavelet Transform for Structural System Identification. J. Vib. Acoust. 1998, 120, 252-260.
(15) Robert, C. P. The Bayesian Choice: A Decision-Theoretic Motivation; Springer-Verlag: New York, 1994.
(16) Gelman, A.; Carlin, J.; Stern, H.; Rubin, D. Bayesian Data Analysis; Chapman and Hall: London, 1995.
(17) Nounou, M. N.; Bakshi, B. R.; Goel, P. K.; Shen, X. Bayesian Principal Component Analysis. J. Chemom. 2002, 16 (11), 576-595.
(18) Nounou, M. N.; Bakshi, B. R.; Goel, P. K.; Shen, X. Process Modeling by Bayesian Latent Variable Regression. AIChE J. 2002, 48 (8), 1775-1793.
(19) Kadane, J. Prime Time for Bayes. Controlled Clin. Trials 1995, 16, 313-318.
(20) Maritz, J. S. Empirical Bayes Methods; Methuen & Co.: London, 1970.
(21) Stephanopoulos, G. Chemical Process Control: An Introduction to Theory and Practice; Prentice-Hall: Englewood Cliffs, NJ, 1984.
Received for review November 15, 2004
Revised manuscript received June 6, 2005
Accepted October 17, 2005

IE048897M