Ind. Eng. Chem. Res. 2005, 44, 368-380

Errors-in-Variables-Based Modeling Using Augmented Principal Components† P. Vijaysai,‡ R. D. Gudi,*,‡ and S. Lakshminarayanan§ Department of Chemical Engineering, Indian Institute of Technology, Bombay, Powai, Mumbai 400 076, India, and Department of Chemical and Environmental Engineering, National University of Singapore, Singapore, Singapore

The total least-squares technique has been extensively used for the identification of dynamic systems when both the inputs and outputs are corrupted with noise. However, the major limitation of this technique has been the difficulty of identifying the actual and stable model parameters when collinearity in the causal data block leads to several "small" singular values. This paper proposes a novel multivariate tool, namely, augmented principal components analysis (APCA), to deal with collinearity problems under the errors-in-variables formulation. On the basis of the level of noise in each measured variable, the proposed technique is divided further into simple and generalized augmented principal components. Some of the errors-in-variables- and least-squares-based methods available in the literature are shown to be special cases of this APCA-based technique. The efficacy of the new technique over other conventional methods is illustrated through representative case studies taken from the literature.

1. Introduction

Success of automation in the process industries is directly linked to the accuracy of the model that is deployed in closed-loop control schemes. The control-related applications of input-output models, and in particular linear models, date back over the past 3 decades. Though the recent literature records several variants of traditional input-output models having relatively complex structures, linear models are still indispensable because they not only are easy to deploy but also are fairly precise in the operating regions of interest. Nevertheless, linear modeling strategies have continuously evolved over time in order to cope with the difficulties encountered in modeling chemical plant data. Conventional modeling methodology was centered on the principles of ordinary least squares (OLS). The main assumptions and requirements of OLS are as follows:
(i) All input variables are independent of each other.
(ii) All input variables are clean, and only the output variables are associated with noise.
However, it is a well-known fact that process variables are often highly correlated (nearly collinear or cocurved). Second, all of the variables (inputs and outputs) of the process data are noisy because of measurement errors. Consequently, least-squares-based methods are not well suited to such data. Some of the multivariate statistical tools, namely, principal component regression (PCR), latent root regression (LRR), and projection to latent structures (PLS), have proven to be expedient in resolving problems arising from collinearity. A plethora of work has been reported in the literature regarding the

† A shorter version of this paper appeared in the Proceedings of the American Control Conference, Denver, CO, 2003.
* To whom correspondence should be addressed. Tel.: +91.22.25767204. Fax: +91.22.25726895. E-mail: ravigudi@che.iitb.ac.in.
‡ Indian Institute of Technology.
§ National University of Singapore.

application of these methods. Ricker1 showed the first application of the PCR and PLS techniques to dynamic model identification. These methods have also been extensively used for developing inferential process models.2-4 A dynamic PLS framework that essentially captures the dominant dynamic features of the process under study was proposed by Lakshminarayanan et al.5 The inherent ability of PLS to extract hidden information from correlated measurements has been exploited for recursive parameter estimation under closed loop.6,7 Recently, Flores-Cerrillo and MacGregor8 have used PLS for inferential adaptive control of batch reactors. As an alternative method to PLS, Betrand et al.9 have proposed LRR. The LRR seeks to maximize the covariance between the principal components (PCs) of the regressor block and the response variables. In recent years, there has also been a gradual shift in the formulation of linear models, wherein errors-in-variables (EIV) approaches are being considered more realistic than the least-squares-based methods.10-13 Unlike the least-squares-based methods, the EIV methods assume noise to be associated with both the causal and response variables. The EIV-based methods have been used extensively for parameter estimation of time-series models. Some of the popular tools used for system identification are generalized least squares and prediction error methods (PEM). These techniques use nonlinear numerical optimization, and under the assumption that the inputs are noisy, the minimization can be time-consuming and may even converge to local optima.
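The cost of ignoring input noise is easy to reproduce numerically. The following is a minimal sketch (NumPy, with simulated data and illustrative noise levels of our own choosing, not taken from the paper) of the classical attenuation of an OLS slope estimate when the regressor itself is measured with error:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 20000
xa = rng.normal(size=m)              # noise-free input, unit variance
ya = 2.0 * xa                        # true slope = 2
x = xa + 0.5 * rng.normal(size=m)    # input measured with noise (EIV setting)
y = ya + 0.1 * rng.normal(size=m)    # output noise

slope_ols = (x @ y) / (x @ x)        # OLS fit through the origin
# expected attenuation: 2 * var(xa) / (var(xa) + 0.5**2) = 2 / 1.25 = 1.6
print(round(slope_ols, 2))
```

The estimate is biased toward zero even with unlimited data, which is precisely the deficiency the EIV formulations discussed here aim to remove.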
An alternative and efficient approach that inherently assumes noise in all of the variables is the total least squares (TLS).14 Recently, the last PC analysis (LPCA) was shown to yield a class of estimators when additional constraints were imposed on TLS.15 Soderstrom and Mahata16 showed the comparison between the asymptotic covariance matrix of TLS estimates and the instrumental variable method (IVM). Though the TLS and its family are very powerful EIV techniques, they do not account for the collinearity in causal data.15,17 The model parameters using TLS are obtained by the singular value decomposition (SVD) of

10.1021/ie030671g CCC: $30.25 © 2005 American Chemical Society Published on Web 12/21/2004


the augmented data block that includes both the causal variables and the output variable. The right singular vector corresponding to the smallest, and hence the last, eigenvalue describes the possible linear relationship between the columns under the EIV formulation. However, any collinearity in the causal block may yield two or more eigenvalues having similar (smaller) magnitudes, thus creating an ambiguous situation. This could pose difficulties in determining the last singular vector (last PC). As was already mentioned above, the PCR and PLS have been proven to be statistically efficient methods to model ill-conditioned data. However, a deeper analysis of these tools is necessary to realize the effect of linear transformations [original variables are compressed into a smaller set of orthogonal latent variables (LVs)] and subsequently their impact on noise components associated with the original variables. Many authors have interpreted these transformations in a number of ways, and these are discussed below. Qin18 describes PLS as a more robust technique than OLS by depicting its ability to filter noise in the correlated row space of the input. Nounou et al.17 describe PCR and PLS as EIV-based modeling strategies, which account for collinearity in the causal data block. Huang15 revisits a number of modeling techniques including PCR and PLS and ascribes the ability to treat “all of the variables equally” to only those estimators belonging to the TLS family. Therefore, it is necessary to review the existing methods and understand how they counter collinearity and noise with the original variables. In this paper, we analyze the issues related to collinearity handling by PCR, PLS, and LRR in the presence of noise affecting the independent (noncollinear) variables and qualify these techniques in terms of their ability to solve the EIV problem. 
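The SVD route just described can be sketched in a few lines of NumPy (simulated data; the dimensions and noise level are illustrative assumptions): the parameters come from the last right singular vector of the augmented block [X, y], rescaled so that the coefficient of y is -1.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 500, 3
theta_true = np.array([1.0, -2.0, 0.5])
Xa = rng.normal(size=(m, n))
ya = Xa @ theta_true
X = Xa + 0.05 * rng.normal(size=(m, n))   # noise on the causal block
y = ya + 0.05 * rng.normal(size=m)        # noise on the response

Z = np.column_stack([X, y])               # augmented data block [X, y]
_, _, Vt = np.linalg.svd(Z)
v = Vt[-1]                                # last right singular vector (last PC)
theta_tls = -v[:n] / v[n]                 # rescale so the y-coefficient is -1
```

With a well-conditioned causal block, as here, the smallest singular value is well separated and the last PC is unambiguous.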
Furthermore, as mentioned earlier, it is difficult to identify the last PC using conventional TLS methods when the causal block is collinear. Van Huffel and Vandewalle19 have proposed a subset selection algorithm for TLS when the causal block is collinear. Though this algorithm guarantees better stability and improved predictions, it is difficult to implement for problems with a large number of variables. Fierro et al.20 proposed a truncated TLS (TTLS) method that can efficiently handle problems due to confounded variables. This method is efficient when the variance of the noise is relatively the same in all of the variables and for a single response variable. A more reliable multivariate technique, which accounts for unequal amounts of noise in the measured variables, is therefore necessary. This paper seeks to address the above-mentioned problem through the proposition of the simple and generalized augmented PC (SAPC and GAPC) strategies. We show that, like any other EIV approach, the proposed methods can be regarded as a two-step procedure: the variables of the causal block are filtered in the first step before they are regressed on the response variable. However, the collinearity becomes more prominent when the causal block is cleaned, thus making it rank-deficient. The resulting rank-deficient problem is solved in our approach using 2-norm minimization of the regression coefficient.21 The proposed SAPC and GAPC algorithms are evaluated through simple illustrative examples as well as through simulations involving a representative inferential estimation problem and a nonlinear, multivariable reactor.

This paper is organized as follows: We first review the traditional PLS, PCR, and LRR methods in the context of the EIV formulation. Section 3 presents the formulation of the proposed SAPC algorithm. This algorithm is generalized to the case of nonequal noise variances (GAPC) in section 4. An assessment of the relative merits of the proposed algorithms when compared to the traditional algorithms is presented in section 5. Section 6 presents a theoretical comparison of these methods for estimation and prediction. Validation of the proposed methodology via simulations is presented in section 7, followed by the conclusions and a summary of the proposed methodology.

2. Analysis of Traditional Methods

In this section, we present and analyze the properties of PCR, PLS, and LRR from the viewpoint of EIV-based modeling. Specifically, we analyze cases where the noise variances in the causal and response blocks are equal. Throughout this paper, an underlined character in boldface roman type denotes a matrix, and a column vector is denoted by a character in boldface roman type. The regressor block (causal block) is denoted as X = [x1, x2, ..., xn] ∈ R^(m×n), where each xi, with 1 ≤ i ≤ n, is a measured variable. Similarly, the response block is denoted as Y = [y1, y2, ..., yp] ∈ R^(m×p). It is assumed that the number of samples m ≫ n. The first r columns of X are given by X1:r. Similarly, the sample corresponding to the last (mth) row of xi, with 1 ≤ i ≤ n, is given by xi,m or Xm,n. The first r columns and rows of X are denoted X1:r,1:r. Im×n implies an identity matrix of size (m, n), and the null matrix of the same size is denoted as 0m×n. To begin with the general EIV form, let

X = Xa + EX  and  Y = Ya + EY  (1)

where Xa = [x1a, x2a, ..., xna] ∈ R^(m×n) and Ya = [y1a, y2a, ..., ypa] ∈ R^(m×p) are the noise-free causal and response variables, respectively. EX = [eX1, eX2, ..., eXn] and EY = [eY1, eY2, ..., eYp] are the errors associated with the noise-free variables of the causal and response blocks, respectively. The relationship between the actual causal and response variables is given by

Ya = Xa θ  (2)

It is assumed, for the analysis presented below, that the errors are independently and identically distributed. Therefore,

EXᵀ EX = σ² In×n  (3)

EYᵀ EY = σ² Ip×p  (4)

and

EXᵀ EY = 0n×p  (5)

All of the model parameters derived by treating the causal and response blocks separately are denoted with either an overbar or an overhead dot. Note: The conditions stated in eqs 3-5 hold for a real data set only when the number of observations tends to infinity.

PCR. Let Xa be a rank-deficient matrix (r < n), the deficiency assumed to be due to linear dependency in the


columns. To simplify the analysis, only a single output is considered. The procedure followed for PCR modeling can be divided into two steps:
a. The causal block X is decomposed into n PCs (each of rank 1), of which only the first r are selected. Hence,

X --PCA--> {T̄, P̄}  (6)

such that

X = T̄ P̄ᵀ  (7)

where T̄ = [t̄1, t̄2, ..., t̄n] ∈ R^(m×n) are the scores and P̄ = [p̄1, p̄2, ..., p̄n] ∈ R^(n×n) are the loadings of the X block. The blocks of selected scores and loadings are given by T̄1:r and P̄1:r, respectively.
b. Instead of the original variables, the selected (r) PCs are regressed on the output. The least-squares normal equations are

XᵀX θ̂PCR = Xᵀy  (8)

Using only the first r components, eq 8 can be written as

P̄1:r T̄1:rᵀ T̄1:r P̄1:rᵀ θ̂PCR = P̄1:r T̄1:rᵀ y  (9)

or

θ̂PCR = P̄1:r (T̄1:rᵀ T̄1:r)⁻¹ T̄1:rᵀ y  (10)

If Xa is decomposed as

Xa --PCA--> {T̄a, P̄a}  (11)

the PCR estimates for the noise-free inputs-outputs can be similarly obtained as

θ̂a,PCR = P̄a,1:r (T̄a,1:rᵀ T̄a,1:r)⁻¹ T̄a,1:rᵀ ya  (12)

It can further be shown (see appendix A1) that the following relationships between the PCA decompositions for the noise-free and noisy cases hold true:

P̄ = P̄a  (13a)

and

T̄ᵀ T̄ = T̄aᵀ T̄a + σ² In×n  (13b)

Substituting eq 13 into eq 10 for the first r components and using the conditions shown in eqs 3-5,

θ̂PCR = P̄a,1:r (T̄a,1:rᵀ T̄a,1:r + σ² Ir×r)⁻¹ T̄a,1:rᵀ ya  (14)

From eqs 12 and 14, it is evident that the estimates of PCR are different when the causal variables are noisy. The PCs of the causal block retain the noise associated with the original variables; the discarded components filter out only the noise associated with the highly correlated measurements. Therefore, the PCR estimates derived as in eq 14 will converge to eq 12 only when the measurements of the causal block are noise-free.

Partial Least Squares. The nonlinear iterative partial least squares (NIPALS) algorithm22 and kernel algorithms6 have been extensively used to build PLS models. For a model with a single response variable, the NIPALS algorithm calculates each LV of the PLS in a single step and therefore does not suffer from convergence problems.1,7 However, to understand the noise-filtering characteristics of the PLS, a slight deviation from the traditional algorithm is proposed here and is shown in appendix A2. The algorithm presented in appendix A2 can be summarized as (a) decomposing the given causal block into PC scores and loadings and (b) maximizing the covariance between the PC scores and the response block. The second step can be construed as rotating the PCs of the regressor block so that they are more predictive of the response variable. Therefore, the results of PLS regression on T̄ and y are given by

{T̄, y} --PLS--> {Γ̄, Ξ̄, W̄, B̄}  (15)

such that Γ̄ = [τ̄1, τ̄2, ..., τ̄n] ∈ R^(m×n) and Ξ̄ = [ζ̄1, ζ̄2, ..., ζ̄n] ∈ R^(n×n) are the PLS scores and loadings for the X block, respectively. W̄ = [w̄1, w̄2, ..., w̄n] ∈ R^(n×n) is called the weight matrix. B̄ = [b̄1, b̄2, ..., b̄n]ᵀ ∈ Rⁿ is a vector of inner-relation coefficients, obtained as shown in eq 16:

B̄ = (Γ̄ᵀ Γ̄)⁻¹ Γ̄ᵀ y  (16)

The regression coefficient of the PLS is therefore given as

θ̂PLS = P̄1:r W̄1:r (Γ̄1:rᵀ W̄1:r)⁻¹ B̄1:r  (17)

As seen above, Xa is assumed to be rank-deficient (r < n), and the noises associated with the variables are uncorrelated. Therefore, the last n - r PC scores in eq 6 retain no information regarding the variables in Xa but only the noise associated with the correlated variables. Hence, the last n - r scores are not predictive of y, such that

T̄r+1:nᵀ y = 0n-r  (18)

Therefore, when the above scores are used in the PLS algorithm, only the first r components contribute to the model, leading to B̄r+1:n = 0n-r. Consequently, the PLS effectively uses only the first r PCs. When all r PCs are retained, the PLS algorithm will effectively use the same information content for the prediction as the PCR did. Therefore, the variance of noise retained by the PLS coefficients is essentially the same as that shown in eq 14.

Remark 1: The NIPALS algorithm uses the X block directly instead of the scores. When X and Y are used directly, the resultant PLS model is given by eq 19 (the parameters of this direct decomposition are distinguished here with an asterisk):

{X, Y} --PLS--> {Γ*, Ξ*, W*, B*}  (19)

When eqs 15 and 19 are compared, the following results can be deduced:

Γ* = Γ̄  (20)

B* = B̄  (21)

W* = P̄ W̄  (22)
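The two PCR steps above (eqs 6-10) reduce to a few lines of NumPy. The sketch below uses simulated, deliberately rank-deficient data of our own construction; variable names and noise levels are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 400
x1, x2 = rng.normal(size=m), rng.normal(size=m)
Xa = np.column_stack([x1, x2, x1 + x2])      # third column is dependent: r = 2
ya = Xa @ np.array([0.5, 1.0, 0.25])
X = Xa + 0.01 * rng.normal(size=Xa.shape)
y = ya + 0.01 * rng.normal(size=m)

# step a: decompose X into scores and loadings, X = T P^T (eqs 6 and 7)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
P, T = Vt.T, U * s
r = 2                                         # number of retained PCs
# step b: regress the retained scores on the output (eq 10)
theta_pcr = P[:, :r] @ np.linalg.solve(T[:, :r].T @ T[:, :r], T[:, :r].T @ y)
y_hat = X @ theta_pcr
```

Note that the true coefficient vector is not identifiable here (the collinearity admits infinitely many solutions); PCR returns the minimum-norm one, and its predictions remain accurate.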

LRR. LRR23 is a variant of PCR and shares certain common characteristics with PLS.9 It differs from PLS in its formulation in that it uses the loadings of an augmented matrix Z that includes both the response and the causal block. Like PLS, LRR has been shown to be more efficient than PCR when the smaller singular values corresponding to collinearity in the causal block are more predictive of the response. The augmented matrix Z for LRR is given by

Z = [X, y]  (23)

and is decomposed as

Z --PCA--> {T, P}  (24)

where T are the scores and P = [p1, p2, ..., pn+1] ∈ R^((n+1)×(n+1)) are the loadings of the Z block (the loadings of the noisy causal block satisfy (XaᵀXa + EXᵀEX)p̄i = λ̄ip̄i). The regression coefficient is estimated as

θ̂LRR = κ⁻¹ P1:n,1:r Λ1:r,1:r⁻¹ Pn+1,1:rᵀ  (25)

and

κ = Pn+1,1:r Λ1:r,1:r⁻¹ Pn+1,1:rᵀ

where

Λ = TᵀT  (26)

Because of its similarities with PLS, the LRR estimates can also be shown to be influenced by noise considerations, as seen in eqs 14 and 17. It is possible to conclude from the above illustration that, even though PCR, PLS, and LRR are proven to be very efficient methods for dealing with collinearity-related issues, they inherently assume the independent (noncollinear) variables to be noise-free. Therefore, these methods are not adequate when all of the variables are observed with noise, and hence they are not EIV methods in the absolute sense.

Remark 2: As seen in eq 23, LRR essentially analyzes the dependencies in the augmented block Z. Further, when the causal block X is collinear, the analysis with the augmented matrix leads to multicollinearities. Van Huffel and Vandewalle19 classify the collinearity in the augmented block as predictive and nonpredictive. Both kinds are associated with the smaller singular values of the augmented matrix Z; however, they can be distinguished based on the corresponding loadings of the augmented block. When the causal data have multicollinearity, both the PLS and LRR19 techniques are capable of extracting the predictive collinearity from the data for prediction. Very often it is seen that PLS uses relatively fewer LVs than PCR.27

3. SAPC

In this section, we formulate and develop the SAPC methodology. The starting point for the formulation is the augmented matrix Z, as shown in eq 23. Because the objective of the proposed method is to deal with ill-posed problems under the EIV formulation, it is necessary to characterize the collinearity in the measured variables. Throughout the paper, we assume the collinearity to be essentially nonpredictive.

3.1. Formulation of SAPC for a Single-Output Problem. The formulation of SAPC shown in this section considers only a single dependent variable (y). The mathematical formulation of the TLS has already been well documented in the literature. Conventional methods of solving the problem for the full-matrix case were developed by earlier workers.16,17 Here we present an alternate formulation that helps to interpret TLS in light of the SAPC approach.

Case 1: Xa Is a Full-Rank Matrix (TLS). Let the augmented matrix of the noise-free data be

Za = [Xa, ya]

The rank of Xa is n (r = n), and hence Za and Z defined in eq 23 are of rank n and n + 1, respectively. It is assumed that the variance of the errors is equal in all of the variables and that they follow the conditions shown in eqs 3-5. Because the errors are uncorrelated with the variables in Z, eq 23 can be rewritten as

Z = Za + EZ  (27)

such that

EZ = [EX, eY]  (28)

Let θ̂ be the estimate of θ generated by TLS such that

ŷ = X̂ θ̂  (29)

Rearranging eq 29,

[X̂ θ̂ - ŷ] = 0m  (30)

Defining

Ẑ = [X̂, ŷ]  (31)

and the extended parameter vector l, eq 30 can be written as

θN Ẑ l = 0m  (32a)

or

Ẑ l = 0m  (32b)

where

l = [θ̂ᵀ, -1]ᵀ / θN ;  θN = ‖[θ̂ᵀ, -1]ᵀ‖  (33a)

such that

lᵀ l = 1  (33b)

Let EZ = Z - Ẑ. The TLS seeks to estimate l subject to the condition that the variance of EZ is a minimum. Therefore, the objective function with the constraint (33b) can be posed as

min_l J = {lᵀ EZᵀ EZ l - c(lᵀ l - 1)}  (34)

Differentiating with respect to l,

∂J/∂l = 2 EZᵀ EZ l - 2c l = 0  (35a)

or

EZᵀ EZ l = c l  (35b)

Substituting for EZ and simplifying eq 35b further,

(Z - Ẑ)ᵀ (Z - Ẑ) l = c l  (36)

From eq 32b, it follows that

ZᵀZ l = c l  (37)

Therefore, c and l are an eigenvalue and eigenvector of the covariance of Z. Solving eq 34 using eq 35b,

min_l J = lᵀ c l - c(lᵀ l - 1) = c  (38)

Therefore, if J is to be a minimum, c in eq 37 has to be the last (smallest) eigenvalue, and accordingly the eigenvector l should be chosen against the smallest eigenvalue.

Remark 3: In eq 34, Z is decomposed into T and P. Therefore, l is the last column of P, such that Pn+1 = l. From eqs 32 and 36, it can be deduced that

Z P1:n P1:nᵀ = Ẑ  (39)
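When the causal block is nearly collinear, the trailing singular values of the augmented block become comparable in magnitude, and "the" last PC is no longer well-defined. A small numerical illustration (simulated data; the magnitudes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
m = 300
x1 = rng.normal(size=m)
x2 = x1 + 1e-3 * rng.normal(size=m)           # nearly collinear with x1
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 1.0]) + 1e-3 * rng.normal(size=m)

s = np.linalg.svd(np.column_stack([X, y]), compute_uv=False)
# two trailing singular values of comparable (small) magnitude: the
# singular vector attached to the smallest one is numerically ambiguous
print(s / s[0])
```

Both trailing directions mix the measurement noise with the X-block dependency, which is exactly the ambiguity that SAPC is designed to remove.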

Remark 4: When X is an ill-conditioned matrix, there can be two or more (n - r + 1) small eigenvalues. Therefore, selection of the eigenvector that corresponds to the relationship between X and y becomes difficult.

Case 2: Xa Is a Rank-Deficient Matrix. As discussed by MacGregor et al.,24 the rank-deficient problem can have an infinite number of solutions. In this context, we adhere to the principle of the optimal solution, which is obtained by 2-norm minimization of the regression coefficient. When Xa shown in eq 2 is rank-deficient, θa is divided into

θa = θR + θN  (40)

where θR is the row-space component and θN is the null-space component. The optimal solution seeks the one of shortest length by minimization:28

‖θ̂a‖² = ‖θ̂R‖² + ‖θ̂N‖²  (41)

The pseudo-inverse, which uses the SVD, is a natural choice for estimating the robust solution, and it can be shown to be identical with PCR. Because P̄a is a unitary matrix, the reduced parameter (Ra,θ ∈ Rʳ) and score (T̄a,1:r) in eq 12 can be given by Ra,θ = P̄a,1:rᵀ θ̂a,PCR and T̄a,1:r = Xa P̄a,1:r. In the EIV context, the objective function shown in eq 34 can be reformulated as

min_ℓ J = ℓᵀ (ζ - ζ̂)ᵀ (ζ - ζ̂) ℓ - c(ℓᵀ ℓ - 1)  (42)

where ℓ ∈ R^(r+1) is a reduced parameter vector, such that

ℓ = Pᵀ l  (43a)

subject to

ℓᵀ ℓ = lᵀ P Pᵀ l = 1  (43b)

and ζ ∈ R^(m×(r+1)) is a reduced augmented matrix, given by

ζ = Z P  (44)

Here P ∈ R^((n+1)×(r+1)) is an operator chosen to compress the augmented matrix from a column size of n + 1 to r + 1. The objective of the operator is to reduce the dimensionality of the X block to its basis r, such that the augmented matrix is of rank r + 1. The objective function shown in eq 42 is identical with eq 38, and when all of the steps shown in case 1 are

repeated, the resulting solution will be

ζᵀ ζ ℓ = c ℓ  (45)

such that

l = P ℓ  (46)

Specifically, let the operator P be chosen as

P = [ P̄1:r  0n×1
      01×r   1   ]  (47)

We term the solution that is obtained as a result of the operator chosen in eq 47 as a simple augmented principal component. The operator proposed above performs a linear transformation of the causal block by which the information content in the block is compressed into a set of reduced and orthogonal vectors, namely, scores. Consequently,

ζ = [T̄1:r, y]  (48)

Let the eigenvalues of XᵀX be λ̄i, with 1 ≤ i ≤ n. They can also be determined by

Λ̄1:r,1:r = T̄1:rᵀ T̄1:r = diag(λ̄1, λ̄2, ..., λ̄r)  (49)

Therefore, ℓ (or p′r+1) can also be calculated as the last eigenvector of

[ Λ̄1:r,1:r   T̄1:rᵀ y
  yᵀ T̄1:r    yᵀ y    ] p′r+1 = λ′r+1 p′r+1  (50)

(All of the model parameters of SAPC are denoted with a single prime). When the covariance of the modified augmented block is denoted as Σζ, eq 50 can be rewritten as

Σζ p′r+1 = λ′r+1 p′r+1  (51)

Remark 5: When Xa is a full-rank matrix, eq 46 is still valid, with r = n, for the operator shown in eq 47. Remark 6: The parameters for the model can be estimated directly using eq 49 when the variance in the scores and response is corrected for the error λ′r+1. It is given by

θ̂ = P̄1:r (Λ̄1:r,1:r - λ′r+1 Ir×r)⁻¹ T̄1:rᵀ y  (52)

The proof of the above equation is given in appendix A.3.

3.2. SAPC for Multiple Outputs. The formulation shown above considers only a single response variable. Nevertheless, it can easily be extended to cases where the number of response variables is more than one. For the multiple-output case, ζ is a matrix with r + p columns. Therefore, when the reduced augmented block is decomposed into scores and loadings, we get

ζ --PCA--> {T′, P′}  (53)

where T′ and P′ are the scores and loadings of the reduced augmented block. The reduced extended parameter matrix ℓ ∈ R^((r+p)×p) is formed from the last p loading vectors, ℓ = P′r+1:r+p. As in the previous case, to recover the actual relationship between the original variables and the response, ℓ needs to be modified further.

Table 1. Algorithm for SAPC
1. Construct the causal and response blocks from the given data. For dynamic model identification, the causal block is constructed using lagged values of the process (input/output) variables.
2. Decompose the causal block into scores and loadings as shown in eq 6.
3. Construct the reduced augmented block using the selected scores and the response block.
4. Deduce the extended parameter and then the true coefficient by correcting the reduced parameter for the initial rotation given to the causal block, as in eq 54.

Hence, let ℓ be divided further as ℓ1 = ℓ1:r,1:p and ℓ2 = ℓr+1:r+p,1:p. The coefficient θ̂ ∈ R^(n×p) can be obtained as

θ̂ = -P̄1:r ℓ1 ℓ2⁻¹  (54)
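The SAPC procedure for a single output can be sketched compactly in NumPy (the data, retained rank, and noise level below are illustrative assumptions of our own, not the paper's case studies):

```python
import numpy as np

def sapc(X, y, r):
    """SAPC for a single output: PCA of X, reduced augmented block
    zeta = [T_1:r, y] (eq 48), last eigenvector of zeta^T zeta, and
    rotation back through the operator of eq 47."""
    n = X.shape[1]
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    P1r = Vt[:r].T                        # leading loadings of the causal block
    T1r = (U * s)[:, :r]                  # leading scores
    zeta = np.column_stack([T1r, y])      # reduced augmented block
    w, V = np.linalg.eigh(zeta.T @ zeta)  # eigenvalues in ascending order
    ell = V[:, 0]                         # eigenvector of the smallest eigenvalue
    l = np.concatenate([P1r @ ell[:r], ell[r:]])   # l = P ell (eq 46)
    return -l[:n] / l[n]                  # rescale so the y-coefficient is -1

rng = np.random.default_rng(4)
m = 500
x1, x2 = rng.normal(size=m), rng.normal(size=m)
Xa = np.column_stack([x1, x2, x1 - x2])   # collinear causal block, r = 2
ya = Xa @ np.array([1.0, 0.5, 0.0])
X = Xa + 0.01 * rng.normal(size=Xa.shape)
y = ya + 0.01 * rng.normal(size=m)
theta = sapc(X, y, r=2)
```

The truncation to r scores removes the ambiguous trailing directions before the last-PC step, which is the essential difference from plain TLS on [X, y].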

The above derivation is given in appendix A.4. The algorithm for SAPC is summarized in Table 1.

4. GAPC

The estimates derived from SAPC, presented in the earlier section, are reliable when the noise level (variance) in each variable is fairly the same. However, barring a few special cases, the variance of the noise can differ significantly among the measured variables. The problem of varying noise contributions has been extensively studied and documented in the literature for well-conditioned matrices. Recently, Soderstrom and Mahata16 compared the estimation accuracy of the various techniques used for dynamic model identification. They showed that the estimation accuracy of the Frisch scheme, which they viewed as a more advanced bias-compensating scheme, is comparable with that of the PEM and much superior to that of the IVM technique. In this paper, the Frisch scheme proposed by Beghelli et al.25 is revisited below and is further modified to address collinearity problems; we term the resulting algorithm the GAPC analysis.

For the given causal and response data, the Frisch scheme estimates the maximum possible variance of the noise associated with each variable. Let σmax,Xi², with 1 ≤ i ≤ n, and σmax,y² be the maximum noise variances associated with the variables of the causal and response blocks, respectively. They are determined as

σmax,Xi² = det(ΣZ) / det(Σ|Xi)  (55)

σmax,y² = det(ΣZ) / det(ΣX)  (56)

where ΣZ = ZᵀZ and ΣX = XᵀX. Σ|Xi ∈ R^(n×n) contains every element of ZᵀZ except the row and column corresponding to the variable Xi. The Frisch scheme is shown in Table 2.

Table 2. Algorithm for the Frisch Scheme
1. Determine ΣZ.
2. Set i = 1.
3. Determine σmax,Xi² as shown in eq 55.
4. Evaluate σXi² = ki σmax,Xi², such that 0 ≤ ki ≤ 1.
5. Provide the correction to the variance of Xi in ΣZ as ΣZi,i = ΣZi,i - σXi².
6. Increment i by 1: i = i + 1.
7. Repeat from step 3 until i = n.
8. For i = n + 1, set ki = 1.
9. Determine σy² from eq 56.

Denote ΣE = diag[σX1², σX2², ..., σXn², σy²] ∈ R^((n+1)×(n+1)). In the Frisch scheme, as shown in step 5 of Table 2, the corrected covariance of the augmented block is

Σ̂Z = ΣZ - ΣE  (57)

Σ̂Z is a singular matrix, and the eigenvector of Σ̂Z corresponding to the zero eigenvalue is the extended parameter vector. Therefore, the Frisch scheme can be mathematically posed as

(ΣZ - ΣE) pn+1 = λn+1 pn+1  (58)

where ΣE is chosen such that λn+1 = 0 and l = pn+1. The most important step in this scheme is the choice of ki, with 1 ≤ i ≤ n, shown in step 4 of Table 2. The correction provided to each variable is a fraction of the maximum possible error associated with that variable, decided by the value of ki chosen. Once the values of ki for the n variables in the causal block are determined, the necessary correction for the response is obtained by setting kn+1 = 1.

In the above formulation, Xa is treated as a well-conditioned block: the only linear dependency in Z is the subtle relationship between Xa and ya. Therefore, the proposed GAPC seeks to provide a solution for the ill-conditioned causal block by suitably exploiting knowledge of the process in order to decide the values of ki. Many authors recommend incorporating knowledge of the process during model identification.15,17 In this context, the knowledge of the process is attributed to the variation of the measured values of the inputs and outputs around the mean when the inputs are held constant. Let Σ̃E = diag[σ̃X1², σ̃X2², ..., σ̃Xn², σ̃y²] be the matrix of these variances of the measured variables around their mean values. The objective function can now be formulated as

{ki, ΣE} = arg min over ki of [ ∑(i=1 to n) (σXi² - σ̃Xi²)² + (σy² - σ̃y²)² ], with 1 ≤ i ≤ n  (59)

subject to the constraint

Σ̂Z pn+1 = 0

It must be noted that the Frisch scheme only generates upper bounds on the noise variance of each variable, under the assumption that all other variables are noise-free. These bounds may be used along with a priori knowledge about the noise variances, which may itself be inexact and need further rectification. The optimization problem defined in eq 59 seeks to generate estimates of the noise variances such that information about the variances from the Frisch scheme as well as from a priori knowledge is accommodated. In addition, the optimization procedure includes the constraint above, which requires the noise-free covariance matrix to be rank-deficient. Equation 59 seeks the values of ki that minimize the difference between the estimated and available


Table 3. Algorithm for GAPC
1. Formulate ΣZ.
2. Formulate Σ̃E using the information available on the errors associated with each variable.
3. Determine Σ̂Z using the Frisch scheme.
4. Decompose Σ̂Z into eigenvalues and eigenvectors (loadings).
5. If Σ̂Z has two or more eigenvalues close to zero, estimate the pseudoscores.
6. Reconstruct the pseudo-augmented matrix using the loadings and pseudoscores.
7. Subject the pseudo-augmented matrix to SAPC to obtain the final model parameters.

information on the error with each variable in the model. It can be shown that, when the errors in the measured variables are unequal, PCA fails to recognize the collinearity in the variables. The proposed technique partially uses the knowledge of the errors associated with the variables, and when the variables are corrected for those errors, the hidden collinearity in the causal data surfaces. The collinearity in the causal block can be recognized when Σ̂Z yields two or more eigenvalues close to zero. This situation can therefore again lead to ambiguity in choosing the last eigenvector, as seen in the case of TLS. We therefore formulate the solution to the above problem in two steps.
a. Provide the necessary corrections to all of the variables in the augmented block and reconstruct the cleaned causal and response variables so as to obtain a cleaned augmented block.
b. Derive the relationship between the cleaned causal block and the response block.
However, the first step is not as straightforward as in the case of TLS or SAPC. In the case of SAPC, the operator shown in eq 47 decomposes the augmented block into r scores and the response. In the present discussion, the scores cannot be obtained directly, because the loadings are obtained by decomposing the modified (corrected) covariance of the augmented block. To obtain the scores, it is necessary to obtain the variables of Z corrected for the errors. Mathematically, from eq 58 it follows that Σ̂Z pn+1 = λn+1 pn+1; however, Z pn+1 ≠ tn+1. Therefore, we propose an alternate method to obtain what we term the pseudoscores. The GAPC decomposes the given augmented block into the pseudoscores (Ť) and loadings as shown in eq 60, from which the first r pseudoscores and loadings are used to reconstruct the pseudo-augmented block (Ž ∈ R^(m×(n+1))). (See appendix A.5 for the derivation of this equation.)

Z 98 {T ˇ , P}

(60)

Z ˇ )T ˇ 1:rP1:rT

(61)
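The pseudoscore idea behind eqs 60 and 61 can be checked numerically. The sketch below is a minimal NumPy illustration (not the authors' MATLAB code; the data and error covariances are made up, and Z is assumed to have full column rank). It verifies the key fact from appendix A.5 that Z p_i is an eigenvector of Z Z^T − Z Σ_E Z^+ with the same eigenvalue λ_i as in eq 58, which is what makes the sample-space pseudoscore construction possible.

```python
import numpy as np

# Made-up augmented block Z = [X, y] and assumed error covariance Sigma_E
rng = np.random.default_rng(0)
m, q = 50, 4                                   # q = n + 1 columns of [X, y]
Z = rng.standard_normal((m, q))
Sigma_E = np.diag([0.3, 0.2, 0.1, 0.05])

# Loadings: eigenvectors of the corrected covariance Z^T Z - Sigma_E (eq 58)
lam, P = np.linalg.eigh(Z.T @ Z - Sigma_E)
p, l0 = P[:, 0], lam[0]

# Z p_i is an eigenvector of Z Z^T - Z Sigma_E Z^+ with the same eigenvalue
# (appendix A.5), because Z^+ Z = I when Z has full column rank
M = Z @ Z.T - Z @ Sigma_E @ np.linalg.pinv(Z)
assert np.allclose(M @ (Z @ p), l0 * (Z @ p))
```

The identity holds for every eigenpair of eq 58, so the pseudoscores of all retained directions can be recovered in the sample space and scaled as in eq 5.5.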

The relationship between the causal and response variables using the reconstructed pseudo-augmented block can be obtained by subjecting Ž to the SAPC analysis discussed in the earlier section. The algorithm for GAPC is given in Table 3. Though the formulation of GAPC is shown for a single-output case, it can easily be extended to multiple outputs using a methodology identical to that discussed above.

5. Analogy between Various Modeling Tools

Using the GAPC structure, the analogy between different modeling tools can now be established by posing all of the techniques as an eigenvalue problem.

The analyses presented below are based on whether the technique assumes collinearity in the causal block or not. Accordingly, they are divided into two cases.
A. Xa Is a Full-Rank Matrix. It can be shown that GAPC reduces to the form shown in eq 58 when X_a is a full-rank matrix. With this viewpoint, the analysis can be presented as below.
a. OLS: If

Σ_E = σ_min² [0_{n×n}  0_n ; 0_n^T  1]

for which λ_{n+1} = 0, the GAPC reduces to OLS. It can be shown that σ_y² = σ_min² is the least-squares prediction error.
b. TLS: If Σ_E = 0_{(n+1)×(n+1)}, the formulation of eq 58 reduces to the traditional TLS; λ_{n+1} is the minimum correction offered to Σ_Z such that Σ_Z becomes singular.
c. LPCA: Let ω = Σ_E^{-1/2}, where Σ_E^{1/2} = diag(σ_{X1}, σ_{X2}, ..., σ_{Xn}, σ_y). Therefore, eq 58 can be rearranged as

(ωΣω - I_{(n+1)×(n+1)}) ω^{-1} p_{n+1} = λ_{n+1} ω^{-1} p_{n+1}  (62)

Let p_w = ω^{-1} p_{n+1} and λ_w = λ_{n+1} + 1. Equation 62 can be further expanded as

ω Z^T Z ω p_w = λ_w p_w  (63)

Equation 63 is a special case of the LPCA formulation,15 where ω is called the weight matrix.
B. Xa Is a Rank-Deficient Matrix. a. SAPC: The GAPC formulation directly reduces to the SAPC form when Σ_E = σ²I. In other words, when the noise variances in all variables are the same, GAPC reduces to SAPC.
b. PCR: PCR also assumes the errors in all variables to be of the same variance. Therefore, similar to OLS, PCR can be shown to be a variant of GAPC when X_a is rank-deficient. This derivation is discussed elaborately in the next section.

6. Estimation and Prediction

The proposed SAPC and GAPC are techniques to develop models for the EIV case, particularly when the data are ill-conditioned. In the earlier sections, the emphasis was mainly on parameter estimation under an EIV framework. In this section, the applications of the proposed EIV methods are discussed from the viewpoints of estimation and prediction. The PCR and PLS algorithms have been used extensively for static and dynamic model identification, particularly when the data are ill-conditioned. These techniques reduce the prediction error by selectively extracting the information content in the data, so that the variance in the estimated parameters is reduced. It is therefore necessary to compare the prediction accuracy of the proposed methods with that of PCR/PLS. The analysis presented below assumes an ill-conditioned matrix X and a single response y. Let the number of PCs chosen to predict y be equal to r. The prediction error of the PCR model can be estimated as the difference between y and ŷ_PCR, where ŷ_PCR is the response obtained by regressing the PCs on y. The analysis below compares the predictive ability of PCR with that of SAPC by unveiling the underlying relationship between them. Therefore, the modified augmented block shown in eq 48 is recalled here for further analysis.

Ind. Eng. Chem. Res., Vol. 44, No. 2, 2005 375

Using the theory of matrices,25 the sum of squared prediction errors (SSPE) of the PCR model is obtained as the ratio of the determinant of the modified augmented block's covariance to that of the scores' covariance.

SSPE_PCR = det(Σ_ζ)/det(Λ̄)  (64)

where

det(Λ̄) = ∏_{i=1}^{r} λ̄_i  (65)

One way of obtaining the determinant of Σ_ζ is by diagonalizing eq 51. For this, we suggest the operator shown in eq 66 below, which converts Σ_ζ into diagonal form.

Θ = [I_{r×r}  0_r ; -y^T T̄ Λ̄^{-1}  1]  (66)

such that

Θ Σ_ζ Θ^T = [Λ̄_{r×r}  0_r ; 0_r^T  δ_PCR]  (67)

The determinant of Σ_ζ is the product of all of the diagonal elements shown in eq 67:

det(Σ_ζ) = (∏_{i=1}^{r} λ̄_i) δ_PCR  (68)

Therefore, when eqs 68 and 65 are substituted into eq 64,

SSPE_PCR = δ_PCR  (69)
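The chain from eq 64 to eq 69 can be verified numerically: the determinant ratio equals the residual sum of squares of regressing y on the retained scores (a Schur-complement identity). A minimal sketch with made-up random data (NumPy, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 100, 5, 3
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# Scores on the first r principal components of X
lam, P = np.linalg.eigh(X.T @ X)
idx = np.argsort(lam)[::-1]
T = X @ P[:, idx[:r]]

# eq 64: SSPE_PCR = det(Sigma_zeta) / det(Lambda_bar)
Sigma_zeta = np.block([[T.T @ T, (T.T @ y)[:, None]],
                       [(T.T @ y)[None, :], np.array([[y @ y]])]])
delta_pcr = np.linalg.det(Sigma_zeta) / np.linalg.det(T.T @ T)

# Direct computation: residual sum of squares of regressing y on the scores
beta = np.linalg.solve(T.T @ T, T.T @ y)
sspe = y @ y - (T @ beta) @ y
assert np.isclose(delta_pcr, sspe)
```

The agreement is exact up to floating-point error, which is the content of eqs 66-69.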

Therefore, the term δ_PCR in eqs 67 and 68 is the prediction error estimated from the PCR model, which will be zero if ŷ_PCR^T ŷ_PCR is substituted for y^T y in eq 50 such that

ŷ_PCR^T ŷ_PCR = y^T y - δ_PCR  (70)

When ŷ_PCR^T ŷ_PCR is substituted for y^T y in eq 50, λ′_{r+1} will be zero, and accordingly the PCR parameters can be estimated using l. Therefore, to compare the predictive abilities of PCR and SAPC on an equal footing, we simplify eq 50 (see appendix A.6 for the derivation) as

∑_{i=1}^{r} (t̄_i^T y)²/(λ̄_i - a) = y^T y - b  (71)

where a and b are two constants such that (1) a = 0 and b = δ_PCR for PCR and (2) a = λ′_{r+1} and b = λ′_{r+1} for SAPC. In the case of SAPC, the quantity λ′_{r+1} (0 ≤ λ′_{r+1} < λ̄_i) appears on both sides of eq 71, whereas δ_PCR appears only on the right-hand side. Therefore, for the equality to hold, δ_PCR always has to be greater than λ′_{r+1}. However, λ′_{r+1} by itself does not exemplify the prediction error in the response variable. Because SAPC is an EIV approach, λ′_{r+1} is equal to the sum of the variances of the errors between the original and reconstructed principal components as well as the response variables. Therefore, the prediction error with respect to the response in particular can be identified as

δ_SAPC = λ′_{r+1} l_{r+1}²  (72)

l is the last eigenvector and is in normalized form; therefore, l_{r+1} is always less than 1. Thus, eq 72 implies that the sum of squared prediction errors of SAPC is less than that of PCR or, in other words, that the predictive ability of SAPC is better than that of PCR:

δ_SAPC < δ_PCR  (73)

Equation 73 should not be read as implying that SAPC always has a better estimation capability than PCR. This is not generally true, and some limitations of SAPC are discussed below; these limitations hold for any EIV method. Let X and y given in eq 1 be regarded as the calibration data set. The SAPC model is developed for these data as shown in Table 2. The estimation problem can now be posed as two different cases.
Case 1: Let X_New and y_New be the new causal and response variables from which ŷ_New needs to be estimated using the model parameters derived from the calibration data. Case 1 can be regarded more as a reconciliation problem using the SAPC approach and can be presented as follows. (i) Remove the collinearity in X_New such that T̄_{1:r,New} = X_New P̄_{1:r}. This step ensures the removal of the errors in X_New that mask the collinearity. However, the errors in the other variables are still retained in T̄_{1:r,New} and are removed in the next step. (ii) Augment T̄_{1:r,New} with y_New such that ζ_New = [T̄_{1:r,New}, y_New]. The errors in both the causal and response variables can be removed as ζ̂_New = ζ_New(I - ll^T). (iii) Reconstruct the original variables such that Ẑ_New = [X̂_New, ŷ_New] = ζ̂_New P^T P′_{1:r}^T. The Ẑ_New described in the three steps above can directly be obtained from eq 74 below:

Ẑ_New = [X̂_New, ŷ_New] = Z_New P P′ P′^T P^T  (74)
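Step ii of the reconciliation above is a rank-one projection that removes the component of the augmented block along the last eigenvector l. A small self-contained check (made-up data standing in for [T̄_{1:r,New}, y_New]):

```python
import numpy as np

rng = np.random.default_rng(3)
zeta = rng.standard_normal((20, 4))             # stand-in for [T_1:r,New, y_New]

# l = eigenvector of the smallest eigenvalue of the block's covariance
lam, V = np.linalg.eigh(zeta.T @ zeta)
l = V[:, 0]

# step ii: remove the error direction; (I - l l^T) annihilates the l-component
zeta_hat = zeta @ (np.eye(4) - np.outer(l, l))
assert np.allclose(zeta_hat @ l, 0.0)
```

Because eigh returns an orthonormal eigenvector (||l|| = 1), the cleaned block ζ̂ has exactly no remaining component along l, which is the sense in which the errors are "removed" in this step.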

ŷ_New thus estimated satisfies the condition shown in eq 73. However, it is to be noted that, in order to get ŷ_New, it is necessary to have both X_New and y_New available.
Case 2: Only the new causal variables (X_New) are available, from which ŷ_New needs to be estimated using the model parameters derived from the calibration data. This inferential estimation problem is the one most commonly encountered in process industries. Because y_New is not available, eq 74 is no longer valid. One possible approach could be to use the parameter θ̂_LPCA derived from the calibration data such that

ŷ_New = X_New θ̂_LPCA  (75)

However, in doing so, ŷ_New is directly brought into the column space of X_New, thus treating X_New = X̂_New. In other words, eq 75 treats all of the variables in X_New as cleaned variables, which is not true. We propose a solution to the above problem that uses the error information derived from the calibration model. If E_X and E_Y are the errors in the causal and response variables of the calibration set, then μ_X^T = E(E_X)^T ∈ R^n and μ_y = E(E_Y). If the errors in the new variables possess the same statistical properties as those of the calibration set, then the new inferred response is

y_I,New = X_New θ̂_SAPC  (76)

such that

ŷ_New = y_I,New - μ_y  (77)

and

X̂_New = X_New - μ_X  (78)
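Assuming μ_X and μ_y are available from the calibration residuals, eqs 76-78 amount to the following small routine (the names and the numbers below are illustrative, not from the paper):

```python
import numpy as np

def infer_response(X_new, theta_sapc, mu_X, mu_y):
    """Case 2 inferential correction of eqs 76-78 (hypothetical helper)."""
    y_inferred = X_new @ theta_sapc    # eq 76
    y_hat = y_inferred - mu_y          # eq 77
    X_hat = X_new - mu_X               # eq 78
    return y_hat, X_hat

# Made-up two-variable example
X_new = np.array([[1.0, 2.0], [3.0, 4.0]])
theta = np.array([0.5, 1.0])
y_hat, X_hat = infer_response(X_new, theta,
                              mu_X=np.array([0.1, -0.1]), mu_y=0.2)
# y_inferred = [2.5, 5.5], so y_hat = [2.3, 5.3]
```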

However, eq 73 may not necessarily hold for the estimate obtained through eq 77. Though techniques such as PCR and PLS yield biased coefficients, they may yield better estimates than EIV models, specifically when the response variables are not known and need to be inferred.

7. Case Studies

In this section, the efficacy of the proposed technique is illustrated using a simulated example and two representative applications: (i) an inferential estimation problem and (ii) a nonlinear reactor, both taken from the literature. The technique is compared with various approaches recommended in the literature for ill-conditioned problems.
7.1. Simulated Example. Let us consider a linear model of the form

Y_a = X_a θ

X_a = [x_{1a}, x_{2a}, x_{3a}, x_{4a}] such that

x_{2a} = x_{3a} + x_{4a}

and

θ = [2  -2 ; 2  -1 ; 1  -0.5 ; 1  -0.5]

and

Y = Y_a + E_Y and X = X_a + E_X
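The noise-free part of this example can be generated as follows (a sketch in NumPy in place of the MATLAB used in the paper; the random seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
m = 1000
x1, x3, x4 = rng.standard_normal((3, m))
x2 = x3 + x4                                # exact collinearity in the causal block
Xa = np.column_stack([x1, x2, x3, x4])

theta = np.array([[2.0, -2.0],
                  [2.0, -1.0],
                  [1.0, -0.5],
                  [1.0, -0.5]])
Ya = Xa @ theta                             # two noise-free responses

# the constraint x2 = x3 + x4 makes Xa rank-deficient
assert np.linalg.matrix_rank(Xa) == 3
```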

Case 1: Variance of Error Is Equal in All of the Variables. For the simulation, we use an additive and uncorrelated noise sequence. The variables of X_a are generated in MATLAB as random numbers of zero mean and unit variance with a sample size of 1000. As stated in the problem, the second column is constructed as a linear combination of the third and fourth columns. To generate an uncorrelated noise sequence, we propose the following method. 1. Construct the augmented matrix Z_a = [X_a, Y_a]. 2. Determine the left singular vectors of Z_a by subjecting Z_a to SVD. The purpose of using SVD on the augmented data is to generate uncorrelated noise for the simulation. It is to be noted that the matrix of left singular vectors (LSVs) has 1000 rows and columns and that all of its columns are of unit variance. The first six columns of the LSV are correlated with Z_a, whereas all of the remaining columns (7-1000) are completely uncorrelated with the variables of Z_a and with each other. Therefore, any column

Table 4. Coefficients Obtained Using Different Methods

      PCR             PLS             TLS             SAPC
 1.95  -1.94     1.95  -1.94     2.00  -2.00     2.00  -2.00
 1.95  -0.97     1.95  -0.97      a1     b1      2.00  -1.00
 0.98  -0.49     0.98  -0.49      a2     b2      1.00  -0.50
 0.98  -0.49     0.98  -0.49      a3     b3      1.00  -0.50

of the LSV after the first six can be chosen as a noise vector. For the simulation, the 7th-10th columns of the LSV were chosen as E_X and the 12th and 13th columns as E_Y. In addition to the requirements stated in eqs 3-5, the errors generated above also satisfy the following property:

Z_a^T E_z = 0_{(n+p)×(n+p)}  (79)
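The SVD-based noise construction and the orthogonality property of eq 79 can be reproduced directly. A sketch (column indices follow the text, converted to 0-based indexing; note that, with the exact collinearity, Z_a in fact has rank 3, so any left singular vector beyond the third is already orthogonal to Z_a):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 1000
Xa = rng.standard_normal((m, 4))
Xa[:, 1] = Xa[:, 2] + Xa[:, 3]             # x2 = x3 + x4
Ya = Xa @ np.array([[2, -2], [2, -1], [1, -0.5], [1, -0.5]], float)

Za = np.column_stack([Xa, Ya])             # augmented block
U, s, Vt = np.linalg.svd(Za, full_matrices=True)   # U is m x m

EX = U[:, 6:10]       # "7th-10th columns" as the causal-block noise
EY = U[:, 11:13]      # "12th and 13th columns" as the response noise

# property of eq 79: the noise is exactly uncorrelated with the true data
assert np.allclose(Za.T @ EX, 0.0)
assert np.allclose(Za.T @ EY, 0.0)
```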

The errors thus generated using SVD need not be Gaussian in nature; additional modifications would be necessary to generate Gaussian noise. The coefficients obtained using PCR, PLS, TLS, and SAPC are reported in Table 4. To determine the number of PCs or LVs, the above data were divided into five parts. The number of PCs and LVs needed to obtain the minimum cumulative prediction error sum of squares was found to be 3 for both the PCR and PLS models. As discussed in section 2, the estimates obtained from the PCR and PLS models were found to be identical because of the noise properties discussed above. The simulations also showed that the estimates derived from both the PCR and PLS models tend to drift away from θ as the signal-to-noise ratio (SNR) decreases. These results are in accordance with the conditions derived through eqs 14 and 17. The TLS algorithm lacked repeatability: the estimates differed each time the simulation was run. However, TLS always yielded parameters of the form shown in Table 4, satisfying a1 - a2 = a1 - a3 and b1 - b2 = b1 - b3, i.e., a2 = a3 and b2 = b3, which satisfy the collinearity constraints of the problem. Of all of the methods discussed above, SAPC was found to yield consistent estimates irrespective of the SNR.
Case 2: Variance of Error Is Different for Each Variable. Continuing with the same example, the variance of the error with each variable was assumed to be of the following form:

Σ̃_E = diag(σ², σ²/2, σ²/3, σ²/4, σ²/5, σ²/6)

The first four terms in the above equation describe the variances of the errors in the variables of the causal block, and the last two those for the response variables. The parameters obtained using the various techniques are reported in Table 5. When the errors associated with each variable differ, PCR and PLS give nonidentical estimates, which again depend on the value of σ. The estimates derived from SAPC were much closer to θ than those of the other three methods, and GAPC consistently yielded the correct value of θ for all 100 runs.

Table 5. Average of All Estimates Derived from Different Methods (Based on 100 Simulations)

      PCR             PLS              TLS              SAPC           GAPC
 1.57  -1.57     1.58  -1.57      1.65   -1.64     1.84  -1.82      2   -2
 1.97  -0.98     1.83  -0.90    -16.7    16.41     2.04  -1.04      2   -1
 0.96  -0.48     1.22  -0.67     19.70  -17.94     1.01  -0.51      1   -0.5
 0.97  -0.48     1.23  -0.68     20.04  -18.17     1.01  -0.51      1   -0.5

This example was intended to compare some of the popular methods in the literature that deal specifically with collinearity problems (PCR and PLS) and EIV formulations (TLS). For the ill-conditioned EIV problem shown above, the parameters derived from the PCR and PLS models deviate farther from θ as more noise is added to the variables. On the contrary, even with identical error levels in all of the variables, the collinearity mars the estimation capability of TLS. The proposed SAPC and GAPC can efficiently manage collinearity problems under the EIV formulation with identical and nonidentical noise in the variables, respectively.
Note: The noise generated in this case follows all of the conditions mentioned in eqs 3-5 and 79. However, the efficacy of the proposed techniques over the other methods can also be illustrated using other ways of generating random additive noise.
7.2. Inferential Estimation. In this section, we extend the application of the proposed techniques to an inferential estimation problem for the high-purity distillation column of Mejdell and Skogestad.2 The description of the column is as follows. The column has 41 theoretical stages including a reboiler and a total condenser. The feed enters on the 20th tray of the column as a saturated liquid. The feed is a binary mixture with a relative volatility of 1.5. The maximum temperature difference across the column is 13 °C. The nominal operating conditions of the column are reported in Table 6.

Table 6. Operating Conditions of the Distillation Column under Steady State

feed rate: 1 kmol/min
feed composition: 0.5 (mole fraction)
reflux flow: 2.706 kmol/min
boilup: 3.206 kmol/min
distillate flow rate: 0.5 kmol/min
distillate composition: 0.99 (mole fraction)
bottom product: 0.5 kmol/min
bottom product composition: 0.01 (mole fraction)

We illustrate the efficacy of the proposed methods on the static estimation problem. The predictive ability of the proposed methods is compared with that of the PCR, PLS, and EIV methods. In this study, the composition of the distillate is inferred using only the tray temperatures. Though the temperatures of the top trays are very strongly correlated with the distillate composition, they do not show much variation in their values. Therefore, temperatures recorded from the 6th to 20th stages were used as the inputs to the model.
A total of 200 samples were generated from the simulation by varying the reflux and the boilup rate by a maximum of 3% around the nominal operating values. The steady-state observations were assumed to be corrupted with Gaussian measurement noise to the extent of 5% of the true value. The first 150 samples were used for constructing the model (calibration set) and the remaining 50 for cross-validation. The models were constructed using PCR, PLS, TLS, and SAPC. The results of each model are presented in Table 7 in terms of the explained prediction variance (EPV) given by

EPV = {1 - [(y - ŷ)^T(y - ŷ)]/[(y - y_mean)^T(y - y_mean)]} × 100
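The EPV criterion above translates directly into code. A minimal sketch (the function name and the toy vector are illustrative):

```python
import numpy as np

def epv(y, y_hat, y_mean_calibration):
    """Explained prediction variance, in percent, as defined above."""
    num = (y - y_hat) @ (y - y_hat)
    den = (y - y_mean_calibration) @ (y - y_mean_calibration)
    return (1.0 - num / den) * 100.0

y = np.array([1.0, 2.0, 3.0, 4.0])
assert epv(y, y, y.mean()) == 100.0                      # perfect prediction
assert epv(y, np.full(4, y.mean()), y.mean()) == 0.0     # mean predictor
```

A perfect model explains 100% of the prediction variance, while simply predicting the calibration mean explains 0%, which is the natural baseline for the comparisons in Table 7.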

where y and ŷ are the measured and predicted values of the response data for cross-validation and y_mean is the mean of the calibration data. The causal (temperature) and response (composition) variables were corrected for the mean. The number of components for PCR and PLS was chosen by the standard "leave out one" method. For PCR, the required number of PCs was found to be 5, whereas for PLS, only 3 LVs were found to be adequate. When the same procedure was adopted for SAPC, the number of PCs to be retained was found to be 9. For TLS and SAPC, the measurement noise for all of the temperatures was assumed to be constant, though this may not be strictly valid. Through this assumption, eq 52 can be implemented directly without any a priori knowledge about the errors in the quality measurements. The results presented above show that, although the performances of PCR and SAPC are comparable, PLS exhibits a much better predictive ability than SAPC. As was already discussed, although the model coefficients of PCR and PLS are biased, they always tend to minimize the prediction errors. On the contrary, the results obtained through SAPC could have been better if the temperature data for cross-validation had been noise-free. Figure 1 shows the difference in performance of SAPC with respect to TLS. The poor estimation ability of TLS is essentially due to the collinearity in the causal data.
7.3. Dynamic Model Identification. In this section, we consider the continuous stirred tank reactor (CSTR) of Dayal and MacGregor6 for the irreversible exothermic reaction:

A → B

The model equations of the reactor are as follows:

Reactant mass balance:

V dC_A/dt = F_I C_AI - F_O C_A - V k_O e^{-E_A/RT} C_A

Product mass balance:

V dC_B/dt = -F_O C_B + V k_O e^{-E_A/RT} C_A

Reactor energy balance:

V ρ_S C_PS dT/dt = ρ_S C_PS F_I T_F - ρ_S C_PS F_O T + V(-ΔH_R) k_O e^{-E_A/RT} C_A + U A_H (T_CJ - T)

Cooling jacket energy balance:

V_CJ ρ_W C_PW dT_CJ/dt = ṁ C_PW (T_I - T_CJ) + U A_H (T - T_CJ)
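The four balances can be transcribed as a right-hand-side function using the parameter values of Table 8. This is a sketch, not the authors' simulation code: the units are taken at face value from Table 8, and treating C_AI as the inlet concentration in the reactant balance is an assumption of this transcription.

```python
import numpy as np

# Nominal parameters from Table 8 (units as listed there)
AH, CAI, CPS, CPW = 5.0, 866.0, 1.791, 4.181
EA, FI, FO, dHR = 60000.0, 0.02, 0.02, -140.0
kO, R, TI, TF = 4e8, 8.314, 290.0, 293.0
U, V, VCJ = 30.0, 1.0, 0.2
rhoS, rhoW = 866.0, 998.0

def cstr_rhs(CA, CB, T, TCJ, mdot):
    """Time derivatives of the four balances above; mdot is the coolant flow."""
    rate = kO * np.exp(-EA / (R * T)) * CA        # Arrhenius reaction rate
    dCA = (FI * CAI - FO * CA - V * rate) / V
    dCB = (-FO * CB + V * rate) / V
    dT = (rhoS * CPS * FI * TF - rhoS * CPS * FO * T
          + V * (-dHR) * rate + U * AH * (TCJ - T)) / (V * rhoS * CPS)
    dTCJ = (mdot * CPW * (TI - TCJ) + U * AH * (T - TCJ)) / (VCJ * rhoW * CPW)
    return dCA, dCB, dT, dTCJ
```

As a sanity check: with an empty reactor (C_A = C_B = 0) at the feed temperature and no coolant flow, only the reactant inflow term F_I C_AI is nonzero.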

The specific parameters used for the CSTR are given in Table 8. The conversion y1 = (C_AO - C_A)/C_AO and the reactor temperature (y2) are the controlled variables. The manipulated variables are the feed rate (u1) and the cooling water flow rate (u2). The sampling interval is chosen to be 20 s. In the simulation presented below, we seek to


Figure 1. Steady-state inferential estimation using TLS and SAPC.

Figure 2. SRCs for y1 and u1 calculated using different tools.

Table 7. Analysis of Standard Tools for Steady-State Inferential Estimation

technique | EPV
PCR       | 92.5
PLS       | 93.8
TLS       | 37.0
SAPC      | 92.1

Table 8. Parameters Used for CSTR

symbol | parameter, units | value
A_H   | heat-transfer area, m² | 5
C_AI  | concentration of species A, kg/m³ | 866
C_PS  | specific heat capacity of species A and B, J/kg·K | 1.791
C_PW  | specific heat capacity of water, J/kg·K | 4.181
E_A   | activation energy, J/mol | 60000
F_I   | inlet feed flow rate, m³/s | 0.02
F_O   | outlet flow rate, m³/s | 0.02
ΔH_R  | heat of reaction, J/kg | -140.0
k_O   | Arrhenius rate constant, s⁻¹ | 4 × 10⁸
R     | gas law constant, J/mol·K | 8.314
T_I   | entering cooling water temperature, K | 290
T_F   | temperature of the inlet stream, K | 293
U     | overall heat-transfer coefficient, W/m²·K | 30
V     | volume of CSTR, m³ | 1
V_CJ  | volume of cooling jacket, m³ | 0.2
ρ_S   | density of species, kg/m³ | 866
ρ_W   | density of water, kg/m³ | 998

develop a model between y1 and the inputs u1 and u2. A dead time of five sampling intervals between the output and the two inputs is deliberately introduced to highlight the relative modeling aspects of the above strategies; this delay is assumed to be unknown as far as the identification is concerned. This assumption is also in keeping with a real industrial scenario, where the dead time is seldom known precisely. To identify the dynamic model, the inputs were given simple step changes at a regular interval of 5 min. The maximum step change was set to be 5% of the nominal input values. The total duration of the experiment was set to be 2.5 h. Unlike the procedure adopted by Dayal and MacGregor,6 which assumed equal noise variances, iid measurement noises of differing variances up to 4% and 8% were added to u1 and u2, respectively. Similarly, the output y1 was assumed to be corrupted with 5% iid noise. To determine the "true" SRCs for comparison, the following ARX model was identified from the data generated using the procedure discussed above:

A(z^{-1}) y_k = B_1(z^{-1}) u_{1,k} + B_2(z^{-1}) u_{2,k} + e_k

Figure 3. SRCs for y1 and u2 calculated using different tools.

where

A(z^{-1}) = 1 + a_1 z^{-1} + ... + a_{na} z^{-na}

and

B_i(z^{-1}) = b_{i,1} z^{-1} + b_{i,2} z^{-2} + ... + b_{i,nb} z^{-nb}
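For reference, the SRCs of an ARX model of this form are obtained by simulating the difference equation for a unit step input. A minimal sketch with toy first-order coefficients (not the identified CSTR model):

```python
import numpy as np

def step_response_coeffs(a, b, n_steps):
    """Step-response coefficients of A(z^-1) y_k = B(z^-1) u_k.

    a : [a1, ..., a_na]  coefficients of A(z^-1) = 1 + a1 z^-1 + ...
    b : [b1, ..., b_nb]  coefficients of B(z^-1) = b1 z^-1 + ...
    """
    y = np.zeros(n_steps + 1)
    u = np.ones(n_steps + 1)                  # unit step input
    for k in range(1, n_steps + 1):
        ar = sum(a[j] * y[k - 1 - j] for j in range(len(a)) if k - 1 - j >= 0)
        ma = sum(b[j] * u[k - 1 - j] for j in range(len(b)) if k - 1 - j >= 0)
        y[k] = -ar + ma                       # move the AR terms to the right
    return y[1:]

# first-order example: y_k = 0.5 y_{k-1} + 0.5 u_{k-1}, steady-state gain 1
src = step_response_coeffs([-0.5], [0.5], 50)
```

Comparing such SRC sequences, rather than the raw polynomial coefficients, is what allows models of different orders and delays to be judged against a common "true plant" response.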

The model order was chosen to be sufficiently large (na = 12 and nb = 12) so that the effects due to nonstationary errors and dead time are absorbed in the model. It can be noted here that the total number of variables in the causal block is 36. To determine the above ARX model, the plant was perturbed in a multiple-input single-output fashion by imposing two uncorrelated inverse repeat sequence signals.26 The decision on how many components to retain for each case was made, as in the earlier case studies, using the "leave out one" method. The models obtained using the different identification techniques were compared in terms of their step response coefficients (SRCs) with the SRCs obtained from the above ARX model, assuming that the latter represented the true plant. Figures 2 and 3 show the SRCs calculated using PCR, PLS, and GAPC between y1, u1 and y1, u2, respectively. It can be seen in the figures that none of the models identify the true gain of the system. This is mainly because of the deliberately insufficient excitation of the process (data from a simple step test were used for identification). However, as seen in Figure 2, the PLS and PCR models fail to capture the underlying dynamics between y1 and u1. This is essentially due to the presence of dead time and the different extents of noise in each measured variable. As such, PLS and PCR, being biased estimators, do not yield accurate estimates of the plant parameters. The bias in the parameters is further aggravated by the presence of the delay. An inaccurate


identification of the delay results in a further bias in the parameters using PLS and PCR. On the other hand, the GAPC formulation was able to estimate the noise variances and remove them from the causal and output blocks. The ARX estimates obtained from the GAPC algorithm showed very small coefficients for the terms associated with the delay and also identified the parameters associated with the dynamics fairly accurately, as seen in the figures. It should also be noted that, when the input data were shifted in time to compensate for the delay (i.e., when the data were realigned to remove the effect of dead time), the PCR and PLS models performed relatively better but still gave some bias because of the unequal noise variances. The biased regression methods, as seen in the previous example, can yield better output estimates than the EIV models. However, in the context of dynamic model identification, estimating the "true" parameters always helps in understanding the dynamics of the process better and thus in improving the control performance. As seen in the illustrative example, the accuracy of the model has to be viewed in the context of whether the model is to be used for output estimation or for n-step-ahead prediction. The results from the examples convey that the biased regression methods yield models that can perform better for output estimation. In contrast, bias in the model parameters is oftentimes not acceptable: typical examples are n-step-ahead-prediction-based control and fault detection and diagnosis frameworks, where it may be crucial to decide whether the model parameters have deviated. In such cases, the EIV-based SAPC and GAPC methodologies are more appropriate and would exhibit better performance.

8. Conclusions

A new EIV-based methodology for identification and parameter estimation of process systems was proposed in this paper.
The proposed methodology specifically focused on model building between cause-and-effect data blocks that were ill-conditioned (collinear) and noisy, with equal and unequal noise variances. A two-step methodology to remove the noise and estimate the parameters was proposed. The relative merits of the proposed methodology over the traditional PLS and PCR algorithms were theoretically established. Validation exercises carried out on representative systems taken from the literature demonstrated the utility of the proposed algorithm for modeling in the presence of noise and collinearity. Poor experimental conditions and propagation of noise through feedback are some typical characteristics observed in closed-loop control. Identification under such conditions using the above approach is a plan for further research.

Appendix A.1

The eigenvalues and eigenvectors of the covariance of X are

X^T X p̄_i = λ̄_i p̄_i, 1 ≤ i ≤ n  (1.1)

Equation 1.1 reduces to eq 1.2 when the noise is uncorrelated with the variables:

(X_a^T X_a + E_X^T E_X) p̄_i = λ̄_i p̄_i  (1.2)

Substituting eq 3 into eq 1.2 and rearranging,

X_a^T X_a p̄_i = λ̄_{a,i} p̄_i  (1.3)

where

λ̄_{a,i} = (λ̄_i - σ_i²)  (1.4)

p̄_i in eq 1.3 is also the eigenvector of X_a^T X_a, and therefore

p̄_i = p̄_{a,i}  (1.5)

Because λ̄_{a,i} = t̄_{a,i}^T t̄_{a,i} and λ̄_i = t̄_i^T t̄_i, it follows that

T̄_a^T T̄_a = T̄^T T̄ - σ² I_{n×n}  (1.6)

Therefore, from eqs 1.5 and 1.6,

P̄ = P̄_a and T̄^T T̄ = T̄_a^T T̄_a + σ² I_{n×n}

Appendix A.2: Alternate NIPALS Algorithm for PLS

A2.1. PLS Algorithm.
1. Decompose the given causal block into scores and loadings: X →(PCA) {T̄, P̄}.
2. Initialize E_0 = T̄ and set i = 1.
3. Determine the weight vector as the normalized covariance of E_{i-1} and y: w̄_i = E_{i-1}^T y/|E_{i-1}^T y|.
4. Determine the PLS score by rotating E_{i-1} by w̄_i: τ̄_i = E_{i-1} w̄_i.
5. Calculate the PLS loading by regressing the score on E_{i-1}: ξ̄_i = [τ̄_i^T E_{i-1}/τ̄_i^T τ̄_i]^T.
6. Calculate the residual: E_i = E_{i-1} - τ̄_i ξ̄_i^T. Increment i by 1, and repeat from step 3 until i = n.

Appendix A.3

When l = p′_{r+1} is substituted, eq 49 can be written as

[Λ̄_{1:r,1:r}  T̄_{1:r}^T y ; y^T T̄_{1:r}  y^T y] l = λ′_{r+1} l  (3.1)

(Λ̄_{1:r,1:r} - λ̄_{r+1} I_{r×r}) l_{1:r} + l_{r+1} T̄_{1:r}^T y = 0_{1:r}  (3.2)

l_{1:r} = -(Λ̄_{1:r,1:r} - λ̄_{r+1} I_{r×r})^{-1} T̄_{1:r}^T y l_{r+1}  (3.3)

θ̂ = -P̄ (l_{1:r}/l_{r+1})  (3.4)

Appendix A.4

To simplify the analysis, let us consider X_a and Y_a instead of X and Y. Let X_a be a rank-deficient matrix, so that the reduced augmented block is given by ζ_a = [T′_a, Y_a]. Therefore, eq 3.1 takes the form

[Λ̄_{1:r,1:r}  T̄_{1:r}^T Y_a ; Y_a^T T̄_{1:r}  Y_a^T Y_a] l_i = λ′_i l_i  (4.1)

Because Y_a is in the range of X_a, the last p eigenvalues are zero. Hence, for r ≤ i ≤ r + p,

(Λ̄_{1:r,1:r} - λ′_i) l_{1:r,i} = -T_{1:r}^T Y_a l_{r+1:r+p,i}  (4.2)

or

θ̂_{a,j} = -P̄_{a,1:r} l_{1:r,i} l_{r+1:r+p,i}^{-1}, 1 ≤ j ≤ p  (4.3)

Appendix A.5

From eq 58, it follows that

(Z^T Z - Σ_E) p_i = λ_i p_i  (5.1)

To obtain the scores of Z that are corrected for the error, multiply eq 5.1 throughout by Z:

(Z Z^T Z - Z Σ_E) p_i = λ_i Z p_i  (5.2)

Equation 5.2 can be rewritten as

(Z Z^T - Z Σ_E Z^+) Z p_i = λ_i Z p_i  (5.3)

where the superscript plus implies the pseudoinverse of Z. Hence

(Z Z^T - Z Σ_E Z^+) u_i = λ_i u_i  (5.4)

For each value of λ_i, the pseudoscore is obtained as

t̆_i = u_i √λ_i  (5.5)

Appendix A.6

To obtain the analogy between PCR and SAPC, we reformulate eq 50 as follows:

{[Λ̄_{1:r,1:r}  T̄_{1:r}^T y ; y^T T̄_{1:r}  y^T y] - λ′_{r+1} I_{r+1,r+1}} p′_{r+1} = 0_{r+1}  (6.1)

The above form contains one redundant equation. Therefore, the upper and lower parts can be expanded separately as

[Λ̄_{1:r,1:r} - λ′_{r+1} I_{r×r}] θ′ = T̄_{1:r}^T y  (6.2)

and

y^T T̄_{1:r} θ′ = y^T y - λ′_{r+1}  (6.3)

where

θ′ = -p′_{r+1,1:r}/p′_{r+1,r+1}

Let λ′_{r+1} in eqs 6.2 and 6.3 be denoted as a and b, respectively. Therefore, eq 6.2 can be solved for θ′ as

θ′ = [Λ̄_{1:r,1:r} - a I_{r×r}]^{-1} T̄_{1:r}^T y  (6.4)

Substituting eq 6.4 for θ′ in eq 6.3,

y^T T̄_{1:r} [Λ̄_{1:r,1:r} - a I_{r×r}]^{-1} T̄_{1:r}^T y = y^T y - b  (6.5)
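The split of eq 6.1 into eqs 6.2 and 6.3 can be confirmed numerically on a small bordered matrix (the values for Λ̄, T̄^T y, and y^T y below are made up):

```python
import numpy as np

rng = np.random.default_rng(7)
Lam = np.diag([5.0, 3.0, 2.0])          # Lambda_bar (scores' covariance)
c = rng.standard_normal(3)              # stand-in for T^T y
d = 10.0                                # stand-in for y^T y
M = np.block([[Lam, c[:, None]], [c[None, :], np.array([[d]])]])

# smallest eigenpair plays the role of (lambda'_{r+1}, p'_{r+1})
w, V = np.linalg.eigh(M)
lam_p, p = w[0], V[:, 0]
theta_p = -p[:3] / p[3]                 # theta' = -p'_{1:r} / p'_{r+1}

assert np.allclose((Lam - lam_p * np.eye(3)) @ theta_p, c)   # eq 6.2
assert np.isclose(c @ theta_p, d - lam_p)                    # eq 6.3
```

Both relations hold simultaneously for the same eigenpair, which is why one of the r + 1 scalar equations in eq 6.1 is redundant.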

Equation 6.5 can be rewritten in the same form as eq 71.

Literature Cited

(1) Ricker, N. L. The Use of Biased Least-Squares Estimators for Parameters in Discrete-Time Pulse-Response Models. Ind. Eng. Chem. Res. 1988, 27, 343-350.
(2) Mejdell, T.; Skogestad, S. Estimation of Distillation Composition from Multiple Temperature Measurements Using Partial-Least-Squares Regression. Ind. Eng. Chem. Res. 1991, 30, 2543-2555.
(3) Kresta, J. V.; Marlin, T. E.; MacGregor, J. F. Development of Inferential Process Models Using PLS. Comput. Chem. Eng. 1994, 7, 597-611.
(4) Kano, M.; Miyazaki, K.; Hasebe, S.; Hashimoto, I. Inferential Control System of Distillation Compositions Using Dynamic Partial Least Squares Regression. J. Process Control 2000, 10, 157-166.
(5) Lakshminarayanan, S.; Shah, S. L.; Nandakumar, K. Modeling and Control of Multivariable Processes: The Dynamic Projections to Latent Structures Approach. AIChE J. 1997, 43, 2307-2322.
(6) Dayal, B. S.; MacGregor, J. F. Recursive Exponentially Weighted PLS and Its Applications to Adaptive Control and Prediction. J. Process Control 1997, 7, 169-179.
(7) Vijaysai, P.; Gudi, R. D.; Lakshminarayanan, S. Identification on Demand Using Block-wise Recursive Partial Least Squares Technique. Ind. Eng. Chem. Res. 2003, 42, 540-554.
(8) Flores-Cerrillo, J.; MacGregor, J. F. Within-Batch and Batch-to-Batch Inferential-Adaptive Control of Semibatch Reactors: A Partial Least Squares Approach. Ind. Eng. Chem. Res. 2003, 42, 3334-3345.
(9) Bertrand, D.; Qannari, E. M.; Vigneau, E. Latent Root Regression Analysis: An Alternative Method to PLS. Chemom. Intell. Lab. Syst. 2001, 58, 227-234.
(10) Chou, C. T.; Verhaegen, M. Subspace Algorithms for the Identification of Multivariable Dynamic Errors-in-Variables Models. Automatica 1997, 33 (10), 1857-1869.
(11) Li, W.; Qin, S. J. Consistent Dynamic PCA Based on Errors-in-Variables Subspace Identification. J. Process Control 2001, 11, 661-678.
(12) Roorda, S.; Heij, C. Global Total Least Squares Modeling for Multivariable Time Series. IEEE Trans. Autom. Control 1995, 40 (1), 50-63.
(13) Li, W.; Bhargava, A.; Shah, S. L. Adaptive Process Monitoring via Multichannel EIV Lattice Filters. AIChE J. 2002, 48 (4), 786-904.
(14) Roorda, S.; Heij, C. Global Total Least Squares Modeling for Multivariable Time Series. IEEE Trans. Autom. Control 1995, 40 (1), 50-63.
(15) Huang, B. Process Identification Based on Last Principal Component Analysis. J. Process Control 2001, 11, 19-33.
(16) Soderstrom, T.; Mahata, K. On Instrumental Variables and Total Least Squares Approaches for System Identification of Noisy Systems. Int. J. Control 2002, 75 (6), 381-389.
(17) Nounou, M. N.; Bakshi, B. R.; Goel, P. K.; Shek, K. Process Modeling by Bayesian Latent Variable Regression. AIChE J. 2002, 48, 1775-1793.
(18) Qin, S. J. Recursive PLS Algorithms for Adaptive Data Modeling. Comput. Chem. Eng. 1998, 48, 503-514.
(19) Van Huffel, S.; Vandewalle, J. Algebraic Relationships between Classical Regression and Total Least Squares Estimation. Linear Algebra and Applications; Elsevier Science Publishing Co.: New York, 1987; pp 93-149.
(20) Fierro, R. D.; Golub, G. H.; Hansen, P. C.; O'Leary, D. P. Regularization of Truncated Total Least Squares. SIAM J. Sci. Comput. 1997, 18 (4), 1223-1241.
(21) Strang, G. Linear Algebra and Its Applications; Harcourt Brace Jovanovich: San Diego, 1986.
(22) Geladi, P.; Kowalski, B. R. Partial Least Squares: A Tutorial. Anal. Chim. Acta 1986, 185, 1-17.
(23) Webster, J. T.; Gunst, R. F.; Mason, R. L. Latent Root Regression Analysis. Technometrics 1974, 16 (4), 513-522.
(24) MacGregor, J. F.; Marlin, T. E.; Kresta, J. V. Some Comments on Neural Networks and Other Empirical Modeling Methods. AIChE Symposium CPC-IV; Elsevier: South Padre, TX, 1991; pp 665-672.
(25) Beghelli, S.; Guidorzi, R. P.; Soverini, U. The Frisch Scheme in Dynamic System Identification. Automatica 1990, 26 (1), 171-176.
(26) Srinivasan, R.; Raghunathan, R. Use of Inverse Repeat Sequences for Identification of Chemical Process Systems. Ind. Eng. Chem. Res. 1999, 38, 3420-3432.
(27) Wise, B. M.; Gallagher, M. B. PLS Toolbox for Use with MATLAB, version 1.5; Eigenvector Research Inc.: Manson, WA, 1995.
(28) Strang, G. Linear Algebra and Its Applications; Harcourt Brace Jovanovich: San Diego, 1986.

Received for review August 15, 2003
Revised manuscript received April 23, 2004
Accepted September 10, 2004

IE030671G