Identification of Errors-in-Variables Models Using Dynamic Iterative Principal Component Analysis

Deepak Maurya,*,† Arun K. Tangirala,*,‡ and Shankar Narasimhan*,‡

†Department of Computer Science and ‡Department of Chemical Engineering, Indian Institute of Technology Madras, Chennai 600036, India
ABSTRACT: Identification of linear dynamic systems from input−output data has been a subject of study for several decades. A broad class of problems in this field pertains to what is widely known as the errors-in-variables (EIV) class, where both the input and output are observed with errors, in contrast to the traditional scenario where only the outputs are assumed to be corrupted with errors. In this work, we present a novel and systematic approach to the identification of dynamic models for the EIV case in the principal component analysis (PCA) framework. The key contribution of this work is a dynamic iterative PCA (DIPCA) algorithm that has the ability to determine the correct order of the process, handle unequal measurement noise variances (the heteroskedastic case), and provide accurate estimates of the model together with the noise covariance matrix under large-sample conditions. The work is presented within the scope of single-input, single-output (SISO) deterministic systems corrupted by white-noise errors. Simulation results are presented to demonstrate the effectiveness of the proposed method.
1. INTRODUCTION

Identification of dynamic systems from measurements is a crucial exercise in almost all spheres of process systems engineering. A vast repertoire of efficient methods for model development from input−output data is available at the user's disposal today. Among the numerous methods, the prediction-error minimization class constitutes the state-of-the-art techniques for identifying single-output (and possibly multi-input) processes.1 On the other hand, multi-input, multi-output (MIMO) processes with unknown process order and unconstrained state-space matrix structures are efficiently handled by subspace identification techniques.2 Both of these classes of methods are suitable for identifying systems given deterministic (or noise-free) inputs and the corresponding noisy output measurements. However, in several identification exercises it is quite often the case that the inputs also contain errors, leading to what is termed the errors-in-variables (EIV) situation. Figure 1 shows the architecture of the EIV dynamic model for linear systems. Given the noisy (stochastic) measurements of the input
and output, u and y, respectively, the goal is to identify the LTI model G(·) (in difference equation form) that connects the noise-free input and output.

The EIV class of problems in the context of identification has been studied for more than four decades.3−9 Two broad classes of representations, as in traditional identification, namely, the input−output and state-space descriptions, have been considered. A rich literature caters to the former class of representations; notable are the instrumental variable (IV), prediction-error (maximum likelihood), total least-squares (TLS), and frequency-domain methods.3,10−15 On the other hand, identification of state-space models in the EIV framework has received relatively less attention. The state-space representation offers the advantage of handling MIMO systems with ease. The subspace errors-in-variables formulation by Chou and Verhaegen9 is an early work in this direction, where an IV-based method is developed. The works of Li and Qin16 and Wang and Qin17 present algorithms for subspace identification using principal component analysis (PCA), a well-established multivariate data analysis technique. Notwithstanding the efficacy of these methods, a common shortcoming, regardless of the representation (input−output or state-space), is that the model order is determined in a heuristic manner (a detailed discussion follows shortly).
Figure 1. Linear dynamic EIV architecture.
Evidently, a method that simultaneously estimates the order, model, and noise covariances consistently is certainly attractive. The present work is aimed at realizing this objective. In particular, we consider the identification of input−output models for a dynamic EIV process. The problem is formulated in the TLS framework and solved using the (dynamic) PCA methodology. The TLS class of methods is known to offer efficient solutions to the EIV problem.14,18 PCA, known for its simplicity, is one of the most widely used and effective techniques for solving the TLS problem, albeit under certain assumptions.19 Applications of PCA span a diverse set including identification, dimensionality reduction, process monitoring,20−23 fault detection and diagnosis,24−28 and feature extraction, as well as serving as a preprocessing tool for other identification methodologies.29,30

A distinguishing factor in the modeling philosophy of the PCA-based approach vis-à-vis standard approaches (e.g., regression, subspace) is that it does not a priori distinguish between dependent and independent variables, whereas the latter approaches impose this distinction up front. Further, linear models are expressed as implicit constraints among variables in the PCA framework. These constraints are identified by either a singular value decomposition (SVD) of the data matrix or an eigenvalue analysis of the sample covariance matrix. The distinction between dependent and independent variables is invoked in the postanalysis stage, but only when required by the application. For instance, fault detection applications do not require this distinction since they largely work with constraint residuals. In contrast, model identification or prediction applications invoke this distinction and accordingly partition the constraint matrix to extract the (causal) input−output regression model.

The generic formulation of PCA is suited for identifying steady-state or instantaneous linear relationships among variables. Dynamic PCA (DPCA) by Ku et al.31 was introduced as a natural extension of this formulation to identify dynamic relationships by applying PCA to a matrix of lagged variables. While this idea has been extensively applied in process monitoring, and to a certain extent in regression, a rigorous method of recovering the dynamic model of the error-free variables from the results of DPCA is yet to be addressed in the literature. Specifically, a theoretically rigorous approach for determining the order, the dynamic relationships, and the noise covariance matrix begs attention.

Order determination is a critical step in both the classical and the EIV identification problems, regardless of whether the system is represented as an input−output or state-space model. In classical identification of input−output models, orders are determined by a combination of trial-and-error, statistical analysis of residuals, and information-theoretic criteria such as the Akaike information criterion (AIC), Bayesian information criterion (BIC), and minimum description length (MDL).1,32,33 On the other hand, order determination in state-space modeling has relied on subspace identification methods, which are guaranteed to provide an exact answer for LTI systems only under noise-free conditions. In the noisy case, a heuristic approach is adopted by choosing the "smallest" singular values of the Hankel matrix2,6,34,35 or, more recently, by using information-theoretic criteria and rank tests of the cross-covariance matrix relating delayed inputs and the regressor vector.36 Recent developments in order determination for classical (non-EIV) identification use the instrumental variable approach to nullify the noise effects. To this end, works have either used modified information-criterion loss functions,37 the multiple correlation coefficient (MCC) and coefficient of determination,38 or analyzed the eigenvalues using ratios of determinants, termed the instrumental determinant ratio (IDR).39 Bauer40 proposes multiple information criteria based on the singular values and the estimated innovations variance for order estimation in the subspace framework.

For the EIV class of identification problems, determination of the correct order using PCA essentially amounts to determining the relevant number of principal components for a given data set. However, it is well-known that answers to this problem can be provided only under noise-free conditions. For noisy data, either only a working order can be determined using heuristic methods (akin to subspace identification methods for noisy data) or the order is assumed to be known a priori. In matters of heuristics, several ideas have been proposed, such as the AIC, MDL, the scree plot, the unity-eigenvalue rule, etc. (see ref 41 for a lively discussion). Different criteria have been used to arrive at the "best" working order, but not necessarily the correct order; see, for example, the works of refs 41 and 42, which are based on minimizing the variance of the reconstruction error (due to the discarded principal components). Li and Qin16 and Wang and Qin,17 in their works on identifying state-space models by combining the concepts of subspace identification, instrumental variable methods, and PCA, consider the problem of order determination; however, the solution that emerges is an AIC-based criterion, which is shown to work for the simulation case studies.

With respect to methods that assume known order, two works merit mention. The first pertains to the last principal component analysis (LPCA) technique of Huang,43 which is based on the DPCA idea. It recovers the dynamic model from the eigenvector of the sample covariance matrix corresponding to the smallest eigenvalue. However, it is severely limited by the assumption of only one constraint among the lagged variables, which holds only for single-output systems with accurate knowledge of the process order, an assumption that is largely unrealistic. Further, it assumes that the errors in the input and output have equal variances, which is again quite restrictive (see Section 2.3 for an illustration). The second work is that of Vijaysai et al.,44 who develop what they term generalized augmented PC-based regression. The idea therein is to scale the data with the square root of the inverse of the noise covariance matrix, which is assumed to be known. However, in almost all realistic situations, the error variances are also not known.

It is thus clear from the foregoing discussion that there is a strong case for a method that can "see through" the noise and determine the correct order of the process dynamics while simultaneously estimating the model parameters and the noise covariance matrix. The main objective of this work is to present an algorithm for (i) order determination, (ii) model identification in the input−output (difference equation/transfer function) framework, and (iii) estimation of the noise variances (in the input and output measurements). To this end, we present a systematic identification of linear dynamic processes from input−output data using what we term dynamic iterative PCA (DIPCA). The scope of this work is restricted to single-input, single-output (SISO) systems, primarily so that the above three objectives can be fulfilled in full. Furthermore, a solution in the SISO scenario is expected to form the building block for a prospective method for MIMO systems. In a few respects, the proposed work can be viewed as an extension of
the iterative PCA (IPCA) proposed by Narasimhan and Shah45 to overcome the shortcomings of PCA in identifying the number of steady-state relationships when the errors in variables have unequal variances that are unknown and possibly spatially correlated. The main strategy in IPCA is to scale the data with the square root of the inverse of the noise covariance matrix, followed by an application of ordinary PCA to the scaled data. The noise covariance matrix and the linear relations are estimated iteratively in IPCA, hence the name. Section 2.2 reviews the technical details of IPCA.

The basic strategy in the proposed algorithm is essentially a two-step procedure, described as follows. In the first step, IPCA is applied to the data matrix constructed from suitably lagged measurements (of inputs and outputs). This is the key step, since it facilitates determination of the order by exploiting the relationship between the unknown process order and two known quantities, namely, the number of constraints identified by IPCA and the maximum lag used in the construction of the data matrix. Applying the IPCA algorithm to this matrix recovers the model order and the noise covariance matrix, of course, in an iterative manner. It must be remarked that stacking lagged measurements to develop dynamic models is not an uncommon approach in system identification, for example, in subspace identification. However, the novelty of this work rests in the fact that the exact number of linear constraints among the lagged measurements, and consequently the order, can be obtained. It may be noted that this method of order determination belongs to the class of a priori methods, i.e., those based on data analysis, as against a posteriori methods that rely on the identified model (e.g., AIC). In the second step, the data matrix is reconfigured using the order estimated in the foregoing step and scaled using the estimated error standard deviations. Subsequently, DPCA is applied to recover the model parameters. It is shown through Monte Carlo (MC) simulations that the proposed algorithm provides accurate estimates of the model order and model parameters, and of the input−output delay as well.

A minor contribution of this work is a modified IPCA algorithm that takes into account the specific structure of the error covariance matrix arising from the fact that we work with lagged measurements of two variables, namely, the input and output. In general, the proposed method also offers provision for incorporating prior knowledge of any or all of the delay, order, input dynamics, and noise covariance matrix, if available. It must be noted that the development in this work is confined to the class of open-loop systems.

The rest of the paper is organized as follows. Section 2 reviews the basic ideas underpinning PCA and IPCA in solving the EIV linear regression problem, together with a review of DPCA and an example that highlights the limitations of that technique. In Section 3 we present the main contribution of this work, a systematic method for identifying the dynamic model and its accompaniments from data. Simulation results are presented and discussed in Section 4, wherein MC simulations are carried out to study the goodness of the estimates and to illustrate the impact of the maximum lag on model quality. The paper ends with a few concluding remarks in Section 5.
2. FOUNDATIONS

We begin with a review of PCA in the context of identification of linear steady-state (static) models for the EIV case, followed by a brief exposition of iterative PCA, which extends its applicability to steady-state processes with heteroskedastic errors, i.e., when the errors in the variables have nonidentical variances. Subsequently, we critically review the dynamic PCA proposed by Ku et al.31 for identifying models of dynamic LTI processes.

PCA has primarily been considered, in the area of multivariate process/signal analysis, as a statistical technique for dimensionality reduction or signal compression.46,47 Its application as a method for identifying a linear steady-state model in the EIV case is relatively less well-known; only a few technical references elucidate the regression facet of PCA.17,45,48,49 Although both applications implicitly or explicitly exploit the relationships between variables, a subtle but important difference between their requirements is noteworthy. Dimensionality reduction allows for approximate or lossy compression, so the accuracy of identifying the linear relationships is of less concern than meeting the compression requirements. Regression, on the other hand, is solely concerned with model identification, and therefore identifying an accurate model is of prime interest. Since the present work is concerned with identification of a linear dynamic model, we first review the mathematical underpinnings of PCA from an identification perspective.

Consider a linear time-invariant process described by M (noise-free) variables xi[k], i = 1, ···, M, that are related or constrained by d < M linear equations at any sampling instant k,

A0 x[k] = 0,  A0 ∈ ℝ^{d×M}   (1a)

where A0 is the constraint matrix and x[k] = [x1[k] ··· xM[k]]^T is the vector of noise-free variables at the kth instant. The objective of PCA-based regression in the EIV scenario is twofold: (i) determine the number of constraints (row dimension) d and (ii) estimate the constraint matrix A0 from N noisy measurements of the M variables xi[k], k = 0, 1, ···, N − 1. The measurements are assumed to be generated according to

z[k] = x[k] + e[k]   (2)

Introduce

X = [x[0] x[1] ··· x[N−1]]^T   (3a)
Z = [z[0] z[1] ··· z[N−1]]^T   (3b)

so that eq 2 can be written in matrix form as

Z = X + E   (4)

The following assumptions are made on the random errors:
1. e[k] ∼ N(0, σ²I)
2. E(e[j] e^T[k]) = σ² δjk I_{M×M}
3. E(x[j] e^T[k]) = 0, ∀ j, k
where E(·) is the expectation operator, δjk is the Kronecker delta, and e[k] is a vector of white-noise errors, with all elements having identical variance σ², as stated above.
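To make the setup concrete, the short NumPy sketch below generates data conforming to eqs 1a−4; the particular constraint matrix, dimensions, and noise level are illustrative assumptions rather than values taken from this work.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, d = 1000, 5, 2               # samples, variables, constraints (illustrative)

# An arbitrary d x M constraint matrix A0 (eq 1a): A0 x[k] = 0.
A0 = rng.standard_normal((d, M))

# Noise-free data: draw M - d free coordinates and map them onto a basis of
# null(A0), so that every row of X satisfies the constraints exactly.
null_basis = np.linalg.svd(A0)[2][d:, :]            # (M - d) x M rows span null(A0)
X = rng.standard_normal((N, M - d)) @ null_basis    # N x M

# Measurements (eqs 2-4): Z = X + E with homoskedastic white noise.
sigma = 0.1
Z = X + sigma * rng.standard_normal((N, M))
```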
The total least-squares method for obtaining an estimate of the model can be formulated as the following optimization problem:

min_{A, x[k]}  ∑_{k=1}^{N} (z[k] − x[k])^T (z[k] − x[k])   (5a)

subject to

A x[k] = 0_{d×1},  k = 1, ···, N   (5b)
A A^T = I_{d×d}   (5c)
It may be noted that the constraints in eq 5b can be alternatively expressed as

X A^T = 0_{N×d}   (6)

implying that the rows of A lie in the d-dimensional null space of X. Furthermore, the optimization in eq 5 yields only a basis for the row space of the true constraint matrix A0; thus, there exists a rotational ambiguity in the resulting solution. Finally, since any basis of this null space (of X) is a solution, an additional condition is required to formulate a well-posed optimization problem. The set of constraints in eq 5c serves precisely this purpose.

2.1. PCA for Steady-State Model Identification. We begin our review with the noise-free scenario. If there are no errors in the measurements, then one has direct access to X, and an orthonormal basis for the null space of X can be obtained using the singular value decomposition (SVD) of the data matrix,

X = U S V^T,  X ∈ ℝ^{N×M},  N > M   (7)

where
• U is an N × N unitary matrix, i.e., U^T U = I_{N×N}
• S is an N × M rectangular diagonal matrix whose diagonal entries are the singular values σ1 ≥ σ2 ≥ ··· ≥ σM of X
• V is an M × M unitary matrix, i.e., V^T V = I_{M×M}

When rank(X) = r < M, the last (M − r) singular values are identically zero, i.e., σ_{r+1} = σ_{r+2} = ··· = σM = 0. It follows that when X satisfies eq 1a, d singular values of X are identically zero. Consequently, the upper M × M part of S, denoted by S_econ, can be partitioned as

S_econ = [ S1  0 ;  0  S2 ],  S1 = diag(σ1, σ2, ···, σ_{M−d}),  S2 = 0_{d×d}   (8)

Subsequently, eq 7 can be written as

X = [U1 U2] [ S1  0 ;  0  S2 ] [V1 V2]^T = U1 S1 V1^T   (9)

where S1 and S2 are diagonal matrices of dimension (M − d) × (M − d) and d × d, respectively, and the dimensions of U1, U2, V1, and V2 are in accordance with those of S1 and S2. Further, V1 is the matrix of orthonormal right singular vectors corresponding to the (M − d) nonzero singular values along the diagonal of S1, while V2 is the matrix of orthonormal right singular vectors corresponding to the d zero singular values of X contained in S2. The right singular vectors corresponding to the d zero singular values provide a basis for the constraint matrix A0. This is easily verified by postmultiplying eq 9 by V2 and comparing with eq 6:

X V2 = U1 S1 V1^T V2 = 0  ⇒  Â = V2^T   (10)

Note that Â and A0 differ by a similarity transformation T s.t. T^T T = I. In general, this is not an issue, and it is completely alleviated in regression, since the variable set can be partitioned into d dependent variables, denoted by xD[k], and (M − d) independent variables, denoted by xI[k]. The regression coefficients can then be uniquely recovered as follows:

x[k] = [ xD[k] ; xI[k] ],  A = [AD  AI]   (11)
⇒  xD[k] = −AD^{−1} AI xI[k] ≜ B xI[k],  AD ∈ ℝ^{d×d}   (12)

It is clear that the coefficient matrix B is invariant to a similarity transformation of A; i.e., both A and TA yield the same B.

Instead of an SVD of the data matrix, an eigenvalue analysis of the sample covariance matrix Sx ≜ (1/N) X^T X can be used:

Sx V = V Λ,  Λ = diag(λ1, ···, λ_{M−d}, 0, ···, 0),  λ1 > ··· > λ_{M−d}   (13a)
V = [v1 v2 ··· vM],  Â = [v_{M−d+1} ··· vM]^T   (13b)

It should be noted that the diagonal entries of S in eq 7 are √N times the square roots of the eigenvalues of Sx, denoted by λi in eq 13.

If the measurements of X are noisy, an eigenvalue analysis of the sample covariance matrix Sz ≜ (1/N) Z^T Z does not result in any zero eigenvalues, since no pair of columns of Z is exactly linearly related. From the assumptions made regarding the measurement errors, it can be proved that

Σz = Σx + Σe   (14)

where

Σx = lim_{N→∞} E[Sx] = lim_{N→∞} (1/N) X^T X,  Σz = lim_{N→∞} E[Sz] = lim_{N→∞} (1/N) E[Z^T Z]   (15)

Σx exists under the quasi-stationarity assumption (Ljung1) on z[k]; roughly speaking, this assumption rules out "growing" or unbounded deterministic signals. Under the assumption of homoskedasticity as stated earlier, eq 14 can be restated as

Σz = Σx + σe² I   (16)
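A minimal continuation of the sketch above (reusing X, Z, M, and d) illustrates eqs 7−13: the constraint basis from the SVD, the rotation-invariant regression matrix B, and the eigenvalue route for noisy homoskedastic data. The choice of which variables are dependent is an illustrative assumption.

```python
# Noise-free case (eqs 7-10): right singular vectors of the d zero singular
# values form a basis A_hat for the constraint space (A0 up to rotation).
U, s, Vh = np.linalg.svd(X, full_matrices=False)
A_hat = Vh[-d:, :]

# eqs 11-12: take the first d variables as dependent (A_D assumed invertible);
# B is invariant to the rotational ambiguity in A_hat.
A_D, A_I = A_hat[:, :d], A_hat[:, d:]
B = -np.linalg.solve(A_D, A_I)              # x_D[k] = B x_I[k]

# Noisy, homoskedastic case (eqs 13-17): the eigenvectors of the sample
# covariance associated with the d smallest (equal) eigenvalues still give a
# consistent basis for the constraint space.
lam, V = np.linalg.eigh(Z.T @ Z / len(Z))   # eigenvalues in ascending order
A_noisy = V[:, :d].T
```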
Since σe²I is a diagonal matrix, the eigenvector and eigenvalue properties for sums of matrices can be used, namely:
1. The last d eigenvalues are equal to σe², i.e., λ_{M−d+1} = ··· = λM = σe².
2. The eigenvectors of Σz and Σx are identical.
By virtue of the first property, the last d singular values of Z are identical to √N σe. To summarize, under homoskedastic errors the following holds:

Z = [U1 U2] [ S1  0 ;  0  √N σe I_{d×d} ] [V1 V2]^T   (17)

Consequently, for the homoskedastic case, the ability of PCA to recover the regression model remains intact. Furthermore, under normally distributed errors, the eigenvectors of Sz are consistent estimates of the eigenvectors of Σz.48 We can therefore conclude that, if the errors in all the variables are normally distributed with identical variances and are spatially uncorrelated, PCA still serves as a powerful tool for deriving an asymptotically unbiased estimate of the constraint matrix.

For the general scenario, i.e., a Σe with unequal diagonal entries (and possibly nonzero off-diagonal entries), it is not possible to determine the actual number of constraints and, consequently, the constraint matrix A0 in eq 1a from an analysis of the eigenvalues. The problem is compounded further by the fact that, even when the dimension d is known, it is still not possible to consistently estimate the constraint matrix. In the absence of a rigorous method to address this issue, several heuristic methods for identifying the dimension d and the constraint matrix A0 were proposed until as recently as a decade ago (recall the short review of these methods in Section 1). A paradigm shift in this direction was witnessed with the introduction of iterative PCA (IPCA) by Narasimhan and Shah.45 A first of its kind, it overcomes several of the above shortcomings based on theoretical considerations, as briefly reviewed in the following section.

2.2. Iterative PCA. The key idea of iterative PCA, introduced by Narasimhan and Shah,45 is to "transform" the heteroskedastic case into the homoskedastic one by appropriately scaling the data, i.e., one works with ZW, where W ∈ ℝ^{M×M} is a suitable scaling matrix. This is similar to the strategy adopted in solving the weighted least-squares problem in classical linear regression with temporally heteroskedastic or correlated errors. Accordingly, IPCA scales the data with the inverse square root of the noise covariance matrix, i.e., W = Σe^{−1/2}. As a direct consequence, the errors in the scaled data matrix are homoskedastic, as explained below. Introduce Zs ≜ Z Σe^{−1/2}. Then eq 14 for the scaled data takes the form

Σzs = Σxs + I   (18)

clearly suggesting that the errors in the scaled variables have identical variance, in fact equal to unity. Subsequently, one applies PCA to the scaled data and recovers the constraint matrix for the original (unscaled) variables from that for the scaled ones. The natural requirement for this strategy to work is that the noise covariance matrix must be known (as in maximum likelihood PCA by Wentzell et al.50), which is seldom the case. A key feature of IPCA is that both the noise covariance and constraint matrices are estimated simultaneously by iterating between two steps: one step estimates the (basis for the) constraint matrix given the covariance matrix, while the second step estimates the covariance matrix from the estimate of A0. The former step is essentially PCA on scaled data, from which the constraint matrix is recovered, whereas the latter step involves solving an optimization problem, as described below.

Denote the estimate of A0 at the ith iteration by Â^(i) (the subscript is dropped since only a basis can be determined). Subsequently, one generates the constraint residuals

r^(i)[k] ≜ Â^(i) z[k] = Â^(i) x[k] + Â^(i) e[k]   (19)

If the estimate Â^(i) is indeed a basis for A0, then Â^(i) = T A0. As a result, the first term on the RHS vanishes, and the constraint residuals are uncorrelated and normally distributed with zero mean and covariance matrix Σr^(i) given by

Σr^(i) = Â^(i) Σe (Â^(i))^T   (20)

Consequently, an estimate of Σe is obtained by solving the likelihood problem45

min_{Σe}  N log |Â^(i) Σe (Â^(i))^T| + ∑_{k=1}^{N} (r^(i)[k])^T (Â^(i) Σe (Â^(i))^T)^{−1} r^(i)[k]   (21)

With an estimate of Σe in hand, one goes back to the former step of scaling the data, followed by a PCA of the scaled variables.

An important assumption underlying the above developments is that the size of A0, i.e., the number of constraints d, is known. Interestingly, the IPCA algorithm is equipped to determine this crucial piece of information as well. The method of determining d is straightforward considering eq 18: an eigenvalue analysis of the LHS matrix, by virtue of the eigenvalue shift theorem, results in

λ(Σzs) = λ(Σxs) + 1   (22)

Consequently, all zero-valued eigenvalues of Σxs map to unity-valued eigenvalues of Σzs, the number of which determines the number of constraints d. This applies to the sample covariance matrices Sx and Sz as well (observe that scaling preserves the rank of Σx). Thus, IPCA estimates the number of constraints d based on theoretical considerations. This makes the algorithm iterative in two layers: the first for estimating the basis of A and Σe, and the second for identifying d. It must be remarked that, in the determination of d, one begins from a "minimum" value and proceeds until the guessed d matches the number of unity eigenvalues (of the sample covariance matrix). The minimum value of d is governed by an identifiability constraint, which arises since Σe is also estimated from the data. Technically, the identifiability constraint arises because we are essentially recovering an M × M (symmetric) error covariance matrix from a lower-dimensional d × d residual covariance matrix using the relation in eq 20. Since Σr^(i) is symmetric, we have d(d + 1)/2 unique equations. The constraint is therefore

d(d + 1)/2 ≥ P   (23)

where P is the number of elements of Σe that need to be estimated; for a diagonal Σe, P = M. Therefore, the minimum guess value of d should be chosen to satisfy eq 23.
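The two-step IPCA iteration can be sketched in a few lines. In the sketch below, the likelihood problem of eq 21 is replaced by a simple linear least-squares fit of the diagonal of Σe to the residual covariance of eq 20, an illustrative simplification of the authors' procedure whose convergence is not guaranteed in general.

```python
import numpy as np

def ipca(Z, d, n_iter=50):
    """Alternate between (i) PCA on data scaled by the current noise variances
    and (ii) re-estimation of a diagonal Sigma_e from the constraint residuals."""
    N, M = Z.shape
    sig2 = np.ones(M)                              # initial guess of diag(Sigma_e)
    for _ in range(n_iter):
        w = sig2 ** -0.5                           # Zs = Z Sigma_e^(-1/2)  (eq 18)
        lam, V = np.linalg.eigh((Z * w).T @ (Z * w) / N)
        A = V[:, :d].T * w                         # constraint basis for unscaled z
        R = Z @ A.T                                # residuals r[k] = A z[k]  (eq 19)
        Sr = R.T @ R / N                           # sample version of eq 20
        # Solve Sr ~ A diag(sig2) A^T for sig2 in the least-squares sense.
        G = np.stack([np.outer(A[:, j], A[:, j]).ravel() for j in range(M)], axis=1)
        sig2 = np.clip(np.linalg.lstsq(G, Sr.ravel(), rcond=None)[0], 1e-10, None)
    return A, sig2, lam                            # eigenvalues near 1 flag constraints (eq 22)
```

In practice one would run such a routine for increasing guesses of d, starting from the identifiability minimum of eq 23, until exactly d of the returned eigenvalues are close to unity.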
In effect, the identifiability constraint amounts to requiring "sufficient" redundancy in the data for estimating the noise covariance matrix Σe in the heteroskedastic case. As an example, for an M = 5 variable process with diagonal Σe, the variables are required to satisfy at least d = 3 constraints in order to correctly recover the regression model and the noise covariance matrix. The reader may refer to ref 45 for a complete working algorithm for IPCA and further technical details such as convergence of the algorithm.

We close this section with a review of dynamic PCA (DPCA), which is an extension of classical PCA to the dynamic modeling problem. It may be noted that DPCA constitutes one of the core building blocks of the proposed methodology in this work, the other core block being IPCA with suitable modifications.

2.3. Dynamic PCA and Its Shortcomings. Dynamic PCA, as proposed by Ku et al.,31 extends the application of PCA to identifying dynamic process models. Of specific interest is the class of parametric deterministic SISO linear time-invariant dynamic input (u*)−output (y*) systems described by

y*[k] + ∑_{i=1}^{na} ai y*[k−i] = ∑_{j=D}^{nb} bj u*[k−j]   (24)

where na and nb are the output and input orders, respectively, and D ≤ nb is the input−output delay. For the purpose of the discussions to follow, we introduce

η ≜ max(na, nb)   (25)

which may be, with some abuse of terminology, called the equation order. Note that the equation order is identical to the process order whenever na ≥ nb. The EIV dynamic identification problem is that of estimating the following quantities:
1. Output and input orders na and nb, respectively
2. Input−output delay, D
3. The coefficients of the difference equation, {ai}, i = 1, ···, na, and {bj}, j = D, ···, nb
4. Variances of the errors in the output and input measurements, σey² and σeu², respectively
from measurements of y*[k] and u*[k], denoted by y[k] and u[k], respectively.

DPCA solves this problem by working with a matrix of lagged variables, since the dynamic model in eq 24 can be viewed as a static model on lagged variables. The columns of this lagged matrix are constructed by stacking {y[k]}, {y[k−1]}, ..., {y[k−Ly]} and {u[k]}, {u[k−1]}, ..., {u[k−Lu]}, where Ly and Lu are user-specified maximum lags for the output and input, respectively. However, as we shall illustrate shortly, DPCA succeeds in solving the identification problem only under highly restrictive conditions.

Two cases arise for the stacking orders, depending on the prior knowledge that is available: (i) output and input orders known a priori and (ii) unknown orders. In the first case, the output and input stacking lags can be chosen equal to the known orders, that is, Ly = na and Lu = nb, to construct the lagged data matrix. For this choice of the stacking lags, there exists only one constraint relating the lagged variables. If we further assume that σey² = σeu² (homoskedastic errors), then by applying PCA to the lagged data matrix and using the right singular vector corresponding to its smallest singular value, the constraint (or dynamic model) relating the variables can be obtained. This is the procedure proposed in DPCA by Ku et al.31 However, when the error variances in the input and output measurements are not identical, DPCA (like PCA) does not estimate the model accurately even when the input and output orders are known. The last principal component analysis (LPCA) proposed by ref 43 is identical to DPCA and possesses the same limitations. The following simulated example illustrates the adverse effect of heteroskedastic errors on the model estimated using DPCA.

2.3.1. Known Order. Consider a unit-delay, second-order, deterministic SISO system:

y*[k] + 0.4y*[k−1] + 0.6y*[k−2] = 1.2u*[k−1]   (26)

Assuming that the orders and delay are known, the lagged matrix is configured as follows:

z[k] = [y[k] y[k−1] y[k−2] u[k−1]]^T   (27)
ZL = [z[2] z[3] ··· z[N−1]]^T   (28)

N = 1023 observations are generated under two different conditions: (i) homoskedastic errors, wherein σey² = σeu² = 0.135, and (ii) the more realistic heteroskedastic errors, with σey² = 0.2385 and σeu² = 0.1006, such that the signal-to-noise ratio (SNR) at both input and output is fixed at 10. A full-band PRBS input is used under both conditions.

Homoskedastic errors: PCA of the lagged matrix ZL is performed, and the constraint (model) is estimated from the eigenvector corresponding to the minimum (last) eigenvalue. The parameter estimates from 100 runs of MC simulations are shown in Table 1 and are in close agreement with the true values; a simulation sketch reproducing this experiment follows Table 1.
Table 1. MC Simulation Results from Dynamic PCA for the Homoskedastic and Heteroskedastic Cases

parameter | true value | mean (σey² = σeu²) | 2 × std dev (σey² = σeu²) | mean (σey² ≠ σeu²) | 2 × std dev (σey² ≠ σeu²)
a1 | 0.4 | 0.3995 | ±0.0482 | 0.3454 | ±0.0588
a2 | 0.6 | 0.6002 | ±0.0232 | 0.5698 | ±0.0283
b1 | 1.2 | 1.1965 | ±0.0442 | 1.2892 | ±0.0486
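The known-order experiment above is straightforward to reproduce. The sketch below simulates eq 26, builds the lagged matrix of eqs 27 and 28, and reads the model off the last eigenvector; a random ±1 sequence stands in for the full-band PRBS input, which is a simplification.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
N = 1023
u_star = np.sign(rng.standard_normal(N))        # stand-in for a full-band PRBS
# eq 26: y*[k] = -0.4 y*[k-1] - 0.6 y*[k-2] + 1.2 u*[k-1]
y_star = lfilter([0.0, 1.2], [1.0, 0.4, 0.6], u_star)

s2 = 0.135                                      # homoskedastic case of Table 1
y = y_star + np.sqrt(s2) * rng.standard_normal(N)
u = u_star + np.sqrt(s2) * rng.standard_normal(N)

# eqs 27-28: z[k] = [y[k] y[k-1] y[k-2] u[k-1]]^T stacked over k = 2..N-1
ZL = np.column_stack([y[2:], y[1:-1], y[:-2], u[1:-1]])

lam, V = np.linalg.eigh(ZL.T @ ZL / len(ZL))    # DPCA: smallest eigenvalue first
a = V[:, 0] / V[0, 0]                           # scale leading coefficient to 1
print(a)        # approx [1, 0.4, 0.6, -1.2]; biased if the two variances differ
```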
Heteroskedastic errors: Once again, PCA is performed on ZL and the constraint is estimated from the eigenvector associated with the last eigenvalue. The resulting parameter estimates, however, as reported in Table 1, are biased in this case. The bias in the parameter estimates for the heteroskedastic case is to be expected from the arguments presented in Sections 2.1 and 2.2 on static and iterative PCA, respectively. In other words, DPCA inherits the shortcomings of PCA. These are further compounded for the unknown-order case, as explained next.

2.3.2. Unknown Order. When the orders and delay are unknown, the general practice is to set Ly = Lu = L (a user-specified value). The lagged matrix, denoted by ZL, is thereby constructed as follows:

zL[k] = [y[k] y[k−1] ··· y[k−L] u[k] u[k−1] ··· u[k−L]]^T   (29a)
ZL = [zL[L] zL[L+1] ··· zL[N]]^T   (29b)
Dynamic PCA essentially performs PCA of ZL, i.e., an eigenvalue analysis of the sample covariance matrix Sz ≜ (1/N) ZL^T ZL. However, as illustrated below, DPCA fails to recover the model even for homoskedastic errors. Consider the same second-order system described in eq 26. For illustrative purposes we choose the stacking lag L to be 3 (in general, since the model order is unknown, a "sufficiently large" value of L has to be chosen). The eigenvalues of the sample covariance matrix of ZL ∈ ℝ^{1020×8} are reported below:

Λ = diag([5.03 4.11 2.24 1.66 1.14 0.4068 0.1368 0.1358])

Observe that the last two eigenvalues are almost identical, indicating the presence of two linear relations among the noise-free variables, whereas there exists only a single dynamic relation as per the data-generating process in eq 26. Although at first glance this is surprising, it should be expected due to the excess stacking (greater than the true process order). Given that the process is second-order, a stacking lag of L = 3 results in two linear relations in the stacked variables

zL[k] = [y[k] y[k−1] y[k−2] y[k−3] u[k] u[k−1] u[k−2] u[k−3]]^T

one that relates the noise-free variables {y*[k], y*[k−1], y*[k−2], u*[k−1]} and another that constrains their one-step-shifted versions. The eigenvectors corresponding to the two nearly identical eigenvalues constitute the associated constraint matrix:

Â = [ −0.4570 0.2823 −0.0902 0.2717 0.0044 0.5540 −0.5677 0.0147 ;
      0.3747 0.5173 0.3704 0.2175 0.0005 −0.4573 −0.4454 −0.0038 ]

Unfortunately, there exists no systematic way of recovering the model (the single constraint) from these excess constraints. As a matter of fact, the foregoing issue has not received any form of attention or mention in the literature. The problem of excess relations is further compounded under heteroskedastic errors, wherein it is not even possible to correctly identify the number of linear relations, let alone recover the model from them. This is clear from the eigenvalues reported below for the heteroskedastic case (with L = 3):

Λ = diag([5.17 4.17 2.23 1.67 1.091 0.407 0.1866 0.1382])   (30)

To summarize, DPCA is best suited for known order and homoskedastic errors. Any deviation from these assumptions requires heuristics to be used with DPCA, as there exists no theoretically rigorous way of determining the system order and the model in the prevailing literature.

3. DYNAMIC IPCA FOR EIV MODEL IDENTIFICATION

To recapitulate, the complete EIV dynamic identification problem consists of estimating the following:
1. Output and input orders na and nb, respectively
2. Input−output delay, D
3. The coefficients of the difference equation, {ai}, i = 1, ···, na, and {bj}, j = D, ···, nb
4. Variances of the errors in the output and input measurements, σey² and σeu², respectively
from measurements of y*[k] and u*[k], denoted by y[k] and u[k], respectively. The proposed method, termed dynamic iterative PCA (DIPCA), tackles the above problem in a stepwise manner as described below.

3.1. Proposed Methodology. Primarily, the proposed DIPCA method consists of two steps. In the first step, we perform an iterative PCA on the matrix of lagged measurements (for an appropriately chosen L) and determine the process order η from the identified number of constraints. This step of determining the correct process order constitutes a key novel contribution of this work. An important byproduct of the first step is an estimate of the noise covariance matrix. Subsequently, in the second step, we reconfigure the data matrix as per the estimated order and scale it with Σ̂e^{−1/2}. A PCA of this scaled matrix is used to obtain the desired regression model. Estimation of the delay is implicit in the model identification step. For the special case when the error variances are known, we provide a theoretical proof (in the Supporting Information) that the process order, and consequently the model parameters, are consistently estimated by applying DPCA to the lagged scaled data matrix. We are, however, unable to provide a theoretical proof of the joint consistency of the estimates of the error variances, model order, and model parameters using DIPCA. Instead, Monte Carlo simulations are used to estimate confidence intervals (CIs) for the parameter estimates; the CIs obtained through simulations suggest that DIPCA can provide consistent estimates of all model parameters.

3.1.1. Order Determination. A matrix of lagged measurements ZL is constructed for a user-specified lag L (as in eq 29b), "large" enough to contain the model order and one that meets the identifiability constraint (discussed in the following subsection). Subsequently, we apply iterative PCA on ZL to determine the number of constraints among the underlying noise-free variables; denote this number by d. As demonstrated in Section 2.3.2, this approach of excess stacking leads to multiple constraints (unless L = η). The following relationship between the number of constraints d relating the stacked variables, the stacking lag L, and the equation order η can be derived:

d = L − η + 1   (31)

This is because, in addition to the single relation in eq 24, each of its L − η lag-shifted versions also holds among the stacked variables. Equation 31 holds the key to estimation of the equation order (and hence the process order), since d is determined by IPCA and L is known by virtue of user selection. The estimate of the equation order is thus straightforward,

η̂ = L − d̂ + 1   (32)

where d̂ is the estimate of the number of constraints from IPCA. To illustrate the proposed order determination procedure and the resulting eq 32, consider the second-order process in eq 26 and assume that the lag is set to L = 3. Observe that the process and equation orders are identical, i.e., η = na = 2, for this process. Applying IPCA to the stacked matrix identifies two eigenvalues close to unity, as reported below:

Λ = diag([24 20 12 10 9 3 1 0.99])   (33)

Thus, IPCA has identified two linear relations, while it is a fact that there exists only one unique dynamic relation among the noise-free variables; the additional relation is clearly an artifact of stacking excess lagged variables. Given that d̂ = 2 relations have been identified in a matrix of input−output variables lagged by L = 3, it is easy to see that the true order is η̂ = L − (d̂ − 1) = 3 − (2 − 1) = 2.
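In code, the order estimate of eq 32 is a one-liner once the IPCA eigenvalues are available; the unity tolerance below is an illustrative choice.

```python
import numpy as np

def estimate_order(eigvals, L, tol=0.1):
    """eq 32: eta_hat = L - d_hat + 1, where d_hat is the number of eigenvalues
    of the scaled sample covariance matrix that are close to unity."""
    d_hat = int(np.sum(np.abs(np.asarray(eigvals) - 1.0) < tol))
    return L - d_hat + 1

# Eigenvalues of eq 33 (L = 3): the last two are ~1, so eta_hat = 3 - 2 + 1 = 2.
print(estimate_order([24, 20, 12, 10, 9, 3, 1, 0.99], L=3))
```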
Returning to the main discussion, it must be noted that correct order determination is guaranteed by the ability of IPCA to identify the number of linear relations from noisy data, as reviewed in Section 2.2. The ability to correctly identify the linear relations, in turn, is linked to the simultaneous estimation of the noise covariance matrix, Σ̂e, as explained below.

3.1.2. Estimating Σe Using Constrained Optimization. The estimation of Σe proceeds along the lines of iterative PCA for the static case (Section 2.2) but calls for a modification to account for the fact that only two error variances have to be estimated for a SISO system. In fact, the modification is necessary for IPCA to overcome the identifiability constraint in eq 23 for the dynamic case, as explained below.

The generic formulation of IPCA, as presented in Section 2.2, assumes that all elements of the diagonal Σe are distinct. This leads to the identifiability constraint in eq 23, whose RHS for the diagonal case is M, the column dimension of Z, i.e., the number of process variables. The larger the value of M, the more redundancy (number of linear relations) is required. For the dynamic case, where Z is configured as in eq 29b, M = 2(L + 1), so that one would be required to choose large values of L in order to fulfill the identifiability constraint

d(d + 1)/2 ≥ 2(L + 1)   (34)

It may be noted that d = L − η + 1 for the dynamic case, as discussed in the foregoing section. However, the above inequality must be satisfied only when all 2(L + 1) stacked variables have different error variances. Given the configuration of ZL, it is clear that, for the SISO case, there exist only two distinct signals, namely, y[k] and u[k], while the rest are lagged versions of these signals. Consequently,

Σe = [ σey² I_{L+1}  0 ;  0  σeu² I_{L+1} ]   (35)

i.e., only two distinct elements of the noise covariance matrix have to be estimated. Thus, the RHS of the identifiability constraint in eq 23 is always 2, giving rise to the relaxed identifiability criterion for SISO dynamic (open-loop) linear model estimation,

d(d + 1)/2 ≥ 2   (36)

Observe that the minimum value of d that satisfies eq 36 is 2. Combining this fact with eq 31, it is clear that the minimum lag required for stacking is

Lmin = η + 1   (37)

In practice, since η is unknown, one chooses a "large enough" value of L; the reader may refer to Section 4.2.1 for guidelines on selecting an appropriate stacking lag. The IPCA method is revised to take the relaxed identifiability constraint in eq 36 into account. For this purpose, the optimization problem for estimating Σe in eq 21 is modified as stated below:

min_{Σe}  (N − L) log |Â^(i) Σe (Â^(i))^T| + ∑_{k=1}^{N−L} (r^(i)[k])^T (Â^(i) Σe (Â^(i))^T)^{−1} r^(i)[k]
subject to  diag(Σe) = [σey² 1_{L+1}  σeu² 1_{L+1}]   (38)

where 1_{L+1} denotes a row vector of (L + 1) unity entries and r^(i)[k] are the residuals defined in eq 19. The term N − L reflects the loss of L observations due to stacking. It may be emphasized that estimation of the noise variances from data without any prior knowledge (of the dynamic model) is one of the key contributions of this work; most approaches in the literature assume that the individual input and output noise variances, or their ratio, are known a priori.51
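A sketch of the constrained likelihood problem of eq 38 is given below, assuming the constraint matrix estimate A from the current IPCA iteration is available; the log-variance parametrization and the Nelder-Mead solver are implementation choices, not the authors' prescription.

```python
import numpy as np
from scipy.optimize import minimize

def estimate_noise_variances(ZL, A, L):
    """Fit the two distinct variances of eq 35 by minimizing the negative
    log-likelihood of the constraint residuals (eq 38)."""
    R = ZL @ A.T                         # residuals r[k] = A z_L[k], (N - L) rows
    Sr = R.T @ R                         # sum_k r[k] r[k]^T

    def nll(theta):
        s2y, s2u = np.exp(theta)         # positivity via log-parametrization
        Sigma_e = np.diag(np.r_[np.full(L + 1, s2y), np.full(L + 1, s2u)])
        C = A @ Sigma_e @ A.T
        return len(R) * np.linalg.slogdet(C)[1] + np.trace(np.linalg.solve(C, Sr))

    res = minimize(nll, x0=np.zeros(2), method="Nelder-Mead")
    return np.exp(res.x)                 # (sigma_ey^2, sigma_eu^2)
```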
We complete our presentation with a description of the final step, i.e., recovering the input−output difference equation in eq 24.

3.1.3. Estimation of the Dynamic Model. The object of interest is the parameter vector of the linear difference equation in eq 24,

θ = [a1 ··· a_{na}  b_D ··· b_{nb}]   (39)

which can be recovered in two different ways from the results of the foregoing step. The first approach makes use of the constraint matrix estimate ÂL obtained through the IPCA of ZL. This matrix satisfies

ÂL x̂L = 0,  ÂL ∈ ℝ^{d × 2(L+1)}   (40)

where

x̂L = [ŷ*[k] ŷ*[k−1] ··· ŷ*[k−L] û*[k] û*[k−1] ··· û*[k−L]]^T   (41)

for the user-specified value of L. As explained in Section 2.3, d > 1 due to the excess stacking and is given by eq 31. The corresponding true constraint matrix, call it A_{L,0}, contains shifted versions of the single constraint; specifically, θ appears in shifted form in each row of this matrix. For example, when the process is first-order with unit delay and ZL is assembled with L = 2, the true constraint matrix is

A_{2,0} = [ 1 a1 0 b0 b1 0 ;  0 1 a1 0 b0 b1 ]   (42)

whereas the estimated constraint matrix Â2 is a full 2 × 6 matrix. In general, the structure of A_{L,0} in terms of the parameter vector θ is known, and the estimated constraint matrix is a rotated version of the true A_{L,0}. Therefore, it is possible to recover the parameter vector by first suitably rotating the estimated constraint matrix ÂL so that the rotated version R_L ÂL is structurally identical to A_{L,0}. This approach was adopted in ref 52, wherein the rotation matrix R_L is estimated using a least-squares method; it results in good estimates of the parameter vector. A demerit of this approach, however, is that it involves a cumbersome and possibly suboptimal estimation of the rotation matrix and hence of the model. Furthermore, there exists more than one way of estimating the rotation matrix, each of which yields an estimate of different quality.
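For concreteness, the sketch below implements one rotation-based recovery for the first-order, unit-delay structure of eq 42 (L = 2). The particular structural ones and zeros enforced, and the least-squares fit, are one of several possible choices, which is precisely the ambiguity noted above.

```python
import numpy as np

def recover_by_rotation(A_hat):
    """Rotate a 2 x 6 estimated constraint basis so that R @ A_hat matches the
    structure of A_{2,0} in eq 42, then read off the parameters."""
    R = np.zeros((2, 2))
    # Row 1 of R A_hat must be (1, a1, 0, b0, b1, 0): fix columns 0, 2, 5.
    R[0] = np.linalg.lstsq(A_hat[:, [0, 2, 5]].T, np.array([1.0, 0, 0]), rcond=None)[0]
    # Row 2 must be (0, 1, a1, 0, b0, b1): fix columns 0, 1, 3.
    R[1] = np.linalg.lstsq(A_hat[:, [0, 1, 3]].T, np.array([0.0, 1, 0]), rcond=None)[0]
    A_rot = R @ A_hat
    a1, b0, b1 = A_rot[0, 1], A_rot[0, 3], A_rot[0, 4]
    return a1, b0, b1
```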
We propose, in this work, an alternative method that is simple and direct, making use of only the order and error covariance matrix estimates obtained in the previous step. The key idea is to estimate the model parameters directly by stacking the variables exactly up to the estimated order and scaling the resulting matrix with Σ̂e^{−1/2}. An important consequence of this reconfiguration and scaling is that only a single relation is identified (given that we are dealing with a SISO system). The eigenvector corresponding to the last (unity) eigenvalue of the covariance matrix of this reconfigured matrix directly gives the dynamic model for the scaled data, from which the difference equation for the original (raw) data is recovered in a straightforward manner, as outlined below.

Let the transpose of the eigenvector (estimated constraint vector) corresponding to the smallest singular value of Z_{s,η̂} be denoted by â. The constraint vector is partitioned into âD and âI, corresponding to the scaled dependent (lagged output) and independent (lagged input) variables, respectively:

[âD  âI] x_{s,η̂}[k] = 0   (43)

where x_{s,η̂}[k] = [y*_{s,η̂} ; u*_{s,η̂}], so that

âD y*_{s,η̂} = −âI u*_{s,η̂}   (44)

with

y*_{s,η̂} = (1/σ̂ey) [y*[k] y*[k−1] ··· y*[k−η̂]]^T
u*_{s,η̂} = (1/σ̂eu) [u*[k] u*[k−1] ··· u*[k−η̂]]^T

and the subscript s indicating scaled versions of the respective variables. Finally, the regression equation in eq 44 is rewritten such that the leading coefficient (on y*[k]) is adjusted to unity, so as to bring it on par with the standard difference equation form of eq 24.

This completes the presentation of the proposed method for identifying the dynamic model from data using DIPCA. The method is self-reliant in the sense that the order and the noise covariance matrix are also estimated from data. The complete DIPCA algorithm is summarized in Table 2.

Table 2. Proposed DIPCA Algorithm for EIV Identification
1. Construct the data matrix ZL by choosing a sufficiently large stacking order L. If a guess of an upper bound on the order is available, set L ≥ Lmin = ηguess + 1.
2. Identify the number of linear relations d̂ and estimate the noise covariance matrix Σ̂e using the modified IPCA (see the remark below).
3. Determine the order of the difference equation using the relation η̂ = L − d̂ + 1.
4. Reconfigure the data matrix using a lag equal to the estimated equation order η̂ from step 3, and perform an eigenvalue decomposition of its covariance matrix after scaling with the inverse square root of the Σ̂e estimated in step 2.
5. Finally, obtain the difference equation model from the eigenvector corresponding to the unity (last) eigenvalue of the covariance matrix of Zη̂.
Remark: The minimum guess value of d (regardless of the choice of L) is two, to ensure identifiability. On the other hand, the maximum guess value of d is the stacking lag L, which corresponds to a first-order process (refer to eq 32). This serves as a useful guideline for estimating the number of linear relations in step 2 of the algorithm.

The next section presents illustrations of the proposed approach on the second-order system of eq 26, along with an analysis of the quality of the resulting estimates. We also present results on the effect of the stacking order on the parameter estimates and provide guidelines for choosing it appropriately.

4. SIMULATION RESULTS

We take up two case studies for demonstration. The first is the follow-through example introduced earlier in eq 26; the second corresponds to a third-order system with a delay of three samples and second-order numerator dynamics.

4.1. Example 1: SISO Second-Order System. Recall the system in eq 26,

y*[k] + 0.4y*[k−1] + 0.6y*[k−2] = 1.2u*[k−1]

for which the equation order η is 2. For data generation, the input u* is chosen to be a full-length, full-band (white-noise-like) PRBS signal of length N = 1023 observations, and the noise-free output y* is generated as per eq 26. Measurements y[k] and u[k] are generated by adding zero-mean Gaussian white noise to the noise-free signals, with σeu² = 0.1103 and σey² = 0.2477; these noise variances correspond to an SNR of 10. A snapshot of the input and output data is shown in Figure 2.

As a first step of the algorithm in Table 2, the matrix of lagged variables is constructed by setting L = 5 (implying that the maximum possible order η of the difference equation is 4). Following the guidelines in the second step of the algorithm, we examine the eigenvalues resulting from IPCA for dguess = 2 through dguess = 5. Figure 3a−d displays the logarithm of the eigenvalues for the chosen values of dguess. From these plots it can be inferred that, for dguess = 2 and dguess = 3, the number of eigenvalues close to unity exceeds the guessed number of linear relations, while for dguess = 4 there is an exact match between the guessed number of linear relations and the number of unity eigenvalues. For dguess = 5, a complete breakdown occurs, in the sense that the last dguess eigenvalues differ from unity. It can therefore be concluded that there exist d̂ = 4 linear relations among the 2(L + 1) = 12 lagged variables. Consequently, the order of the difference equation is η̂ = L − d̂ + 1 = 2, which agrees with the true order. The estimated noise variances for the output and input are σ̂ey² = 0.2460 and σ̂eu² = 0.1144, which are also close to the true values.

With the identified order, the matrix of lagged variables is reconfigured by setting L = η̂ = 2 and scaled using the estimated standard deviations of the input and output noise. Applying the last step of the algorithm in Table 2, we obtain the parameters of the input−output difference equation from the eigenvector corresponding to the unity (last) eigenvalue of the covariance matrix:

y*[k] + 0.396y*[k−1] + 0.61y*[k−2] = 0.036u*[k] + 1.213u*[k−1] + 0.009u*[k−2]   (45)

It can be observed from eq 45 that the estimated parameters are in close agreement with the true values in eq 26. Note that the true values of the coefficients on u*[k] and u*[k−2] are zero, whereas the estimated values are nonzero due to finite-sample errors.
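Putting the steps of Table 2 together, the following end-to-end sketch mirrors Example 1; the ±1 input, the simplified least-squares variance update (in place of eq 38), and the unity tolerance are all illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np
from scipy.signal import lfilter

def lagged(v, L, N):                     # columns v[k], v[k-1], ..., v[k-L]
    return np.column_stack([v[L - j : N - j] for j in range(L + 1)])

def ipca_siso(ZL, d, L, n_iter=200):
    """IPCA with the two-variance structure of eq 35 (simplified update)."""
    N = len(ZL)
    s2 = np.ones(2)
    for _ in range(n_iter):
        w = np.r_[np.full(L + 1, s2[0]), np.full(L + 1, s2[1])] ** -0.5
        lam, V = np.linalg.eigh((ZL * w).T @ (ZL * w) / N)
        A = V[:, :d].T * w                          # constraints for unscaled data
        Sr = (ZL @ A.T).T @ (ZL @ A.T) / N          # residual covariance (eq 20)
        Gy = (A[:, :L + 1] @ A[:, :L + 1].T).ravel()
        Gu = (A[:, L + 1:] @ A[:, L + 1:].T).ravel()
        s2 = np.clip(np.linalg.lstsq(np.stack([Gy, Gu], 1),
                                     Sr.ravel(), rcond=None)[0], 1e-10, None)
    return lam, s2

rng = np.random.default_rng(2)
N = 1023
u_star = np.sign(rng.standard_normal(N))            # stand-in for the PRBS input
y_star = lfilter([0.0, 1.2], [1.0, 0.4, 0.6], u_star)   # eq 26
y = y_star + np.sqrt(0.2477) * rng.standard_normal(N)
u = u_star + np.sqrt(0.1103) * rng.standard_normal(N)

L = 5
ZL = np.hstack([lagged(y, L, N), lagged(u, L, N)])
for d_guess in range(2, L + 1):                     # step 2 of Table 2
    lam, s2 = ipca_siso(ZL, d_guess, L)
    if np.sum(np.abs(lam - 1) < 0.1) == d_guess:
        break
eta = L - d_guess + 1                               # step 3: eq 32

Zr = np.hstack([lagged(y, eta, N), lagged(u, eta, N)])  # steps 4-5
w = np.r_[np.full(eta + 1, s2[0]), np.full(eta + 1, s2[1])] ** -0.5
lam, V = np.linalg.eigh((Zr * w).T @ (Zr * w) / len(Zr))
theta = (V[:, 0] * w) / (V[0, 0] * w[0])            # leading coefficient set to 1
print(eta, s2, theta)   # u-coefficients carry the opposite sign of the b's
```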
Figure 2. Measurements for SNR = 10.
Figure 3. Eigenvalues by DIPCA.
In order to establish that these coefficients are indeed insignificant and to study the statistical properties of the parameter estimates, we carry out MC simulations of the proposed DIPCA-based identification. Results from 200 runs are analyzed to study the distributional properties and variability of the parameter estimates. Histograms of the parameter estimates, shown in Figure 4, indicate that the distributions of the parameters may reasonably be assumed Gaussian. Based on this, the 95% confidence regions are estimated and presented in Table 3.
From the results of the MC studies, it can be concluded that the coefficients of the regressors u*[k] and u*[k−2] are insignificant (at the 95% confidence level). As a final step, the model is re-estimated by excluding these regressors from the matrix of lagged variables. The estimated difference equation is

y*[k] + 0.401(±0.0113) y*[k−1] + 0.6004(±0.012) y*[k−2] = 1.202(±0.03) u*[k−1]

where the coefficient estimates are averages computed from 200 MC simulations and the values in parentheses are the standard deviations of the respective parameter estimates. It is clear that the algorithm has correctly estimated the order and delay and provided accurate estimates of the parameters. In practice, instead of MC simulations, bootstrapping techniques53,54 can be used to estimate the uncertainty in the parameter estimates.
Figure 4. Estimates from MC simulation for lag order = 5, SNR = 10.
Table 3. MC Simulation Results from DIPCA for N = 1023, SNR = 10, and L = 5

parameter | true value | 95% CI
a1 | 0.4 | [0.355, 0.446]
a2 | 0.6 | [0.575, 0.625]
b0 | 0 | [−0.045, 0.047]
b1 | 1.2 | [1.137, 1.268]
b2 | 0 | [−0.089, 0.085]
4.2. Analysis of DIPCA. From the foregoing description, it can be seen that the only user-defined parameter of the DIPCA algorithm is the stacking order L. It may be recalled that the minimum value of L is 2 for identifiability reasons; this choice allows the user to estimate a first-order model. Any higher value of L implies that the maximum possible equation order for the governing difference equation is η = L − 1. In the absence of any prior knowledge, it is recommended to choose L sufficiently high. In the brief discussion to follow, we examine the impact of choosing a high stacking lag on the estimates of the noise variances and the model parameters, and we then study the quality of the DIPCA estimates as a function of sample size using MC simulations.
Figure 5. Parameter estimates for different choices of stacking order L.
4.2.1. Impact of Stacking Order L. Consider the same system specified in eq 26 and a similar data generation procedure. As the purpose of these simulations is to study the behavior of the estimates with respect to the stacking order, we assume the system order to be known. We vary the stacking order from 3 to 80 in steps of 5 but use the same set of 1023 observations for all stacking orders; MC simulations of 100 runs are performed for each case to estimate the confidence intervals. The estimated variances and parameters of the dynamic model from these MC simulations are shown in Figure 5.

From the results reported in Figure 5, it may be concluded that the accuracy of the parameter estimates is almost constant over a large range of stacking orders. At small values of L, naturally, one expects some improvement with an increase in L, but the improvement is marginal thereafter. Furthermore, the accuracy is not significantly impaired even for a large stacking order. This characteristic of the proposed method may be partly attributed to the third step of the algorithm, wherein the matrix of lagged variables is reconfigured with the identified order, regardless of the choice of L. It may further be noted that these simulations are carried out using a sufficiently large sample size (1023 samples) relative to the maximum stacking order (L = 80) under study; the results may not necessarily hold for small sample sizes or lags comparable to N. Based on these observations, a value of L that is sufficiently large, but small relative to N, is recommended in practice. Naturally, any prior knowledge of the range of the process or equation order can be used to make a more informed choice. We turn our attention next to the effect of sample size N on the accuracy of the parameter estimates for a fixed choice of L.

4.2.2. Consistency of Model Parameters. In this subsection, the consistency of the estimated order and model parameters is investigated via MC simulations. A theoretical analysis of the consistency of the DIPCA estimates is difficult due to the complicated nature of the method; we therefore investigate this property through a numerical study, using the second-order system of eq 26 with noise variances σey² = 0.6039 and σeu² = 0.2510, corresponding to an SNR of 4. A low SNR is deliberately chosen to study the ability of the proposed algorithm under such conditions. We fix the stacking order at 5 and estimate the parameters for three sample sizes, N = 511, 1023, and 8191. A plot of the (logarithm of the) eigenvalues resulting from the (second step of the) DIPCA algorithm with dguess = 4 for the chosen sample sizes is shown in Figure 6. For all sample sizes, the order is correctly estimated to be η = 5 − 4 + 1 = 2, as the last four eigenvalues are unity. Running the full DIPCA algorithm, the model parameters are estimated for each of the three scenarios, with 200 MC runs per scenario. The 95% confidence intervals are reported in Table 4. The results indicate that the estimates converge to the corresponding true values, as seen from the shrinking width of the CIs with increasing sample size; the CI for each N contains the true value. To summarize, the numerical study indicates that the proposed DIPCA algorithm may be able to provide consistent estimates of the parameters.
4.3. Example 2: Third Order System. The purpose of this case study is to demonstrate the effectiveness of the proposed algorithm on a system with challenging characteristics. For this purpose, we consider a system with third-order dynamics, an input order greater than the output order, and three sample delays:

y*[k] − 0.5y*[k − 3] = 2u*[k − 3] − 1.8u*[k − 5]   (46)
Note that, unlike in the previous case study, the equation order η = 5 differs from the order of the process dynamics. To generate the data, the process in eq 46 is excited with a full-band PRBS input u* of length N = 1023 sampling instants. As before, measurements of the input and output are generated by adding zero-mean Gaussian white noise, with variances of 0.0965 and 0.8697, to the noise-free inputs and outputs, respectively. A snapshot of the input and output is shown in Figure 7.

The algorithm is implemented with L = 7 (implying a maximum possible sixth-order difference equation), and the DIPCA algorithm is applied as explained in the previous case study. Eigenvalues obtained from the DIPCA algorithm for dguess = 2, 3, and 4 are shown in Figure 8. The number of unity eigenvalues equals the number of guessed relations only for dguess = 3, implying that the number of relations among the 16 lagged variables of Z7 is d̂ = 3. Thus, the estimate of the equation order as per eq 32 is η̂ = 7 − 3 + 1 = 5, which equals the true order of the process. The estimated noise variances are [0.0917, 0.8850], in close agreement with the true values [0.0965, 0.8697] used in the simulation. The matrix ZL is reconfigured with L = η̂ = 5, and the final step of the DIPCA algorithm in Table 2 is implemented to estimate the parameters of the difference equation:

y*[k] − 0.034y*[k − 1] − 0.043y*[k − 2] − 0.543y*[k − 3] − 0.023y*[k − 4] − 0.017y*[k − 5] = 0.019u*[k] + 0.047u*[k − 1] − 0.023u*[k − 2] + 1.978u*[k − 3] − 0.092u*[k − 4] − 1.85u*[k − 5]   (47)
It can be observed from eq 47 that the estimated parameters are in close agreement with the true values in eq 46. We proceed in a manner similar to the previous case study for the determination of distributional properties, variability in parameter estimates, and significant/insignificant coefficients.
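A minimal sketch of the data generation just described is given below; a ±1 random binary sequence stands in for the designed full-band PRBS, and the variable names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1023

# Surrogate for a full-band PRBS: a +/-1 random binary sequence
u_star = np.where(rng.random(N) > 0.5, 1.0, -1.0)

# Simulate eq 46: y*[k] = 0.5 y*[k-3] + 2 u*[k-3] - 1.8 u*[k-5]
y_star = np.zeros(N)
for k in range(5, N):
    y_star[k] = 0.5 * y_star[k - 3] + 2.0 * u_star[k - 3] - 1.8 * u_star[k - 5]

# Add zero-mean Gaussian white measurement noise (variances quoted above)
u = u_star + rng.normal(0.0, np.sqrt(0.0965), N)
y = y_star + rng.normal(0.0, np.sqrt(0.8697), N)

# Stacked lagged-data matrix Z_L for L = 7: columns
# [y[k], ..., y[k-7], u[k], ..., u[k-7]], giving 2(L + 1) = 16 variables
L = 7
Z = np.column_stack([y[L - i: N - i] for i in range(L + 1)] +
                    [u[L - i: N - i] for i in range(L + 1)])
print(Z.shape)   # (N - L, 16)
```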
Figure 7. Measurements for SNR = 10.
Figure 8. Eigenvalues by DIPCA for Z7.
Averages and 95% confidence intervals (half-width = 1.96 × std. dev.) of the parameter estimates from the DIPCA algorithm, computed from 200 runs of MC simulations, are reported in Table 5.
Table 5. MC Simulation Results for Example 2

parameter   true value   95% CI
a1          0            [−0.161, 0.218]
a2          0            [−0.133, 0.171]
a3          −0.5         [−0.635, −0.316]
a4          0            [−0.054, 0.062]
a5          0            [−0.053, 0.069]
b0          0            [−0.090, 0.088]
b1          0            [−0.088, 0.081]
b2          0            [−0.084, 0.087]
b3          2            [1.890, 2.110]
b4          0            [−0.323, 0.435]
b5          −1.8         [−2.103, −1.432]

It is clear from Table 5 that the estimated coefficients associated with y*[k − 1], y*[k − 2], y*[k − 4], y*[k − 5], u*[k], u*[k − 1], u*[k − 2], and u*[k − 4] are insignificant, since their confidence intervals contain zero. Subsequently, the difference equation is re-estimated with only the relevant regressors included:

y*[k] − 0.494(±0.081)y*[k − 3] = 2.001(±0.205)u*[k − 3] − 1.793(±0.067)u*[k − 5]

Comparing with the true system given by eq 46, it is evident that the DIPCA algorithm has accurately estimated the input and output orders, the delay, and the parameters of the system.
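The significance screening used above amounts to checking whether each Monte Carlo confidence interval contains zero. A minimal sketch of this step, assuming the MC estimates are collected in an array theta_mc of shape (n_runs, n_params):

```python
import numpy as np

def significant_coefficients(theta_mc, names, z=1.96):
    """Flag coefficients whose 95% MC confidence interval excludes zero."""
    mean = theta_mc.mean(axis=0)
    half = z * theta_mc.std(axis=0, ddof=1)   # half-width = 1.96 x std. dev.
    lo, hi = mean - half, mean + half
    significant = (lo > 0) | (hi < 0)         # CI does not straddle zero
    return {name: bool(flag) for name, flag in zip(names, significant)}

# names = ['a1', 'a2', 'a3', 'a4', 'a5', 'b0', 'b1', 'b2', 'b3', 'b4', 'b5']
# significant_coefficients(theta_mc, names)   # True only for a3, b3, b5 here
```

Regressors flagged as insignificant are dropped, and the model is re-estimated on the reduced regressor set, as done above.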
4.4. Nonlinear Case Study: Two Noninteracting Tank System. In this case study, we showcase the ability of the proposed algorithm to estimate a reasonably good linear approximation of a nonlinear system perturbed around its steady state. For this purpose, we consider a system comprising two noninteracting tanks in series6 as shown in Figure 9. The objective is to build a model that explains changes in the level h2(t) with respect to changes in the inlet flow rate Fi(t). The deterministic first-principles model of the liquid-level system (based on conservation of mass) is

dh1(t)/dt + (Cv1/A1)√h1(t) = (1/A1)Fi(t)   (48a)

dh2(t)/dt + (Cv2/A2)√h2(t) = (Cv1/A2)√h1(t)   (48b)

Figure 9. Two tank noninteracting system.

The system is brought to a steady state before exciting it with the designed input. The operating conditions are stated in Table 6.

Table 6. Operating Conditions

Cv1    Cv2    A1     A2     h1ss    h2ss
1.8    0.5    2.4    1.2    1.23    16

The input Fi(t) used for the simulation is a PRBS signal around the nominal operating point. Gaussian white noise of different variances is added to the noise-free data (sampling interval of 1 s) to generate noisy measurements, with the variances chosen such that an SNR of 10 is maintained. A snapshot of the noisy input−output data is shown in Figure 10.

Figure 10. Snapshot of input−output data.
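The noise variances that realize a target SNR can be set from the variance of the noise-free deviations. A minimal sketch, assuming noise-free records u_star and y_star from the simulated tank system (names are illustrative):

```python
import numpy as np

def add_noise_for_snr(x_star, snr, rng):
    """Corrupt x_star with zero-mean Gaussian white noise whose variance
    equals var(noise-free signal) / snr, so the specified SNR holds."""
    var_e = np.var(x_star) / snr   # variance of deviations about the mean
    return x_star + rng.normal(0.0, np.sqrt(var_e), np.size(x_star))

rng = np.random.default_rng(1)
# u = add_noise_for_snr(u_star, snr=10, rng=rng)   # noisy input, SNR = 10
# y = add_noise_for_snr(y_star, snr=10, rng=rng)   # noisy output, SNR = 10
```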
The DIPCA algorithm is used for model identification with a stacking order of L = 4 and dguess = 3. The last six converged eigenvalues are

Λ = [40  31  1.74  1.038  0.986  0.975]   (49)

As the last three eigenvalues are close to unity, the equation order is estimated as η̂ = L − d̂ + 1 = 4 − 3 + 1 = 2. The estimated difference equation is

y*[k] − 1.4215y*[k − 1] + 0.4693y*[k − 2] = 0.4357u*[k − 1] + 0.3162u*[k − 2]   (50)

To validate this result, the true process described by eqs 48a and 48b is linearized around the steady state and discretized under the zero-order hold (ZOH) assumption. The continuous-time transfer function in the Laplace domain derived from the first-principles model is

G(s) = h2(s)/Fi(s) = 0.2813/(s² + 0.3896s + 0.0176)   (51)

Discretization of this transfer function with a sampling time of 1 s under the ZOH assumption leads to

y*[k] − 1.41y*[k − 1] + 0.46y*[k − 2] = 0.44u*[k − 1] + 0.33u*[k − 2]   (52)

The estimates obtained from the DIPCA algorithm in eq 50 are in close agreement with the discretized model in eq 52. This demonstrates the versatility of the DIPCA algorithm in estimating a linearized model of even a nonlinear system around its steady state.
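The linearized model in eq 51 can be reproduced from the operating conditions in Table 6 by a first-order Taylor expansion of the square-root outflow terms about the steady state. The sketch below is a minimal verification of this step (variable names are our own); the ZOH discretization for comparison with eq 52 can then be obtained, e.g., with scipy.signal.cont2discrete.

```python
import numpy as np

# Operating conditions from Table 6
Cv1, Cv2, A1, A2 = 1.8, 0.5, 2.4, 1.2
h1ss, h2ss = 1.23, 16.0

# Linearization of the sqrt(h) outflow terms: d(sqrt(h))/dh = 1/(2 sqrt(h_ss))
p1 = Cv1 / (2 * A1 * np.sqrt(h1ss))    # pole of linearized tank 1, ~0.338
p2 = Cv2 / (2 * A2 * np.sqrt(h2ss))    # pole of linearized tank 2, ~0.052

# G(s) = K / (s^2 + (p1 + p2) s + p1 p2), cf. eq 51
K = Cv1 / (2 * A1 * A2 * np.sqrt(h1ss))
print(round(K, 4), round(p1 + p2, 4), round(p1 * p2, 4))
# ~0.2818, ~0.3902, ~0.0176 -- close to the coefficients reported in eq 51
```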
5. CONCLUSIONS

This work presents a novel approach for the estimation of dynamic models of linear open-loop processes in the EIV case, specifically from measurements of the input and output corrupted by white, spatially uncorrelated, heteroskedastic noise. The most important feature of the algorithm is that it is nearly automated with respect to order determination, noise covariance estimation, and model identification. Conceptually, the method builds on the ideas of iterative PCA, which was originally formulated for static systems, and dynamic PCA, which works only for a known order and the homoskedastic case, i.e., equal noise variances. A key contribution of this work is the determination of order purely from noisy data. Heteroskedasticity in the data is handled by scaling the data with the noise covariance matrix, which is estimated simultaneously with the model parameters. The sole user-defined parameter in the proposed algorithm is the stacking order L, for which guidelines have been provided in this paper; in the absence of user input, a default high value of L can be chosen, rendering the algorithm nearly automated. The developed DIPCA algorithm has been shown, through numerical studies, to produce accurate estimates even at low SNR for the case studies under consideration. The performance of the proposed method for small sample sizes, its comparison with other methods, and its extension to the identification of multi-input, multi-output systems are important problems for future research.
■ ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.iecr.8b01374. Proof of consistent estimation of the process order and model parameters by the proposed algorithm; it should be noted that this proof is conditioned on the availability of the true noise covariance matrix (PDF).
■ AUTHOR INFORMATION

Corresponding Authors
*(D.M.) E-mail: [email protected].
*(A.K.T.) E-mail: [email protected].
*(S.N.) E-mail: [email protected].

ORCID
Arun K. Tangirala: 0000-0002-7921-5340

Notes
The authors declare no competing financial interest.
■ REFERENCES
(1) Ljung, L. System Identification - A Theory for the User; Prentice Hall International: Upper Saddle River, NJ, U.S.A., 1999.
(2) Overschee, P. V.; de Moor, B. Subspace Identification for Linear Systems: Theory, Implementation and Applications; Kluwer Academic Publishers: The Netherlands, 1996.
(3) Söderström, T. Errors-in-variables methods in system identification. Automatica 2007, 43, 939−958.
(4) Söderström, T. Errors-in-Variables Methods in System Identification; Springer International Publishing: 2018.
(5) Söderström, T.; Stoica, P. Instrumental Variable Methods for System Identification; Springer: Berlin, Heidelberg, 1983; Vol. 57.
(6) Tangirala, A. K. Principles of System Identification: Theory and Practice; CRC Press, Taylor & Francis Group: Boca Raton, FL, U.S.A., 2014.
(7) Mahata, K. An improved bias-compensation approach for errors-in-variables model identification. Automatica 2007, 43, 1339−1354.
(8) Fernando, K. V.; Nicholson, H. Identification of linear systems with input and output noise: the Koopmans-Levin method. IEE Proc.-D: Control Theory Appl. 1985, 132, 30−36.
(9) Chou, C.; Verhaegen, M. Subspace algorithms for the identification of multivariable dynamic errors-in-variables models. Automatica 1997, 33, 1857−1869.
(10) Söderström, T.; Stoica, P. Instrumental variable methods for system identification. Circuits, Systems and Signal Processing 2002, 21, 1−9.
(11) Wong, K.; Polak, E. Identification of linear discrete time systems using the instrumental variable method. IEEE Trans. Autom. Control 1967, 12, 707−718.
(12) Young, P. An instrumental variable method for real-time identification of a noisy process. Automatica 1970, 6, 271−287.
(13) Söderström, T.; Soverini, U.; Mahata, K. Perspectives on errors-in-variables estimation for dynamic systems. Signal Processing 2002, 82, 1139−1154.
(14) Huffel, S. V.; Vandewalle, J. The Total Least Squares Problem: Computational Aspects and Analysis; SIAM: Philadelphia, PA, U.S.A., 1991.
(15) Pintelon, R.; Schoukens, J. Frequency domain maximum likelihood estimation of linear dynamic errors-in-variables models. Automatica 2007, 43, 621−630.
(16) Li, W.; Qin, S. J. Consistent dynamic PCA based on errors-in-variables subspace identification. J. Process Control 2001, 11, 661−676.
(17) Wang, J.; Qin, S. A new subspace identification approach based on principal component analysis. J. Process Control 2002, 12, 841−855.
(18) Kukush, A.; Markovsky, I.; Van Huffel, S. Consistency of the structured total least squares estimator in a multivariate errors-in-variables model. Journal of Statistical Planning and Inference 2005, 133, 315−358.
(19) Rao, C. R. The use and interpretation of principal component analysis in applied research. Sankhyā: The Indian Journal of Statistics, Series A 1964, 329−358.
(20) Wang, X.; Kruger, U.; Irwin, G. W. Process monitoring approach using fast moving window PCA. Ind. Eng. Chem. Res. 2005, 44, 5691−5702.
(21) Ge, Z.; Song, Z. Distributed PCA model for plant-wide process monitoring. Ind. Eng. Chem. Res. 2013, 52, 1947−1957.
(22) Chen, J.; Wang, W.-Y. PCA-ARMA-Based Control Charts for Performance Monitoring of Multivariable Feedback Control. Ind. Eng. Chem. Res. 2010, 49, 2228−2241.
(23) Alcala, C. F.; Qin, S. J. Reconstruction-based contribution for process monitoring with kernel principal component analysis. Ind. Eng. Chem. Res. 2010, 49, 7849−7857.
(24) Luo, R.; Misra, M.; Himmelblau, D. M. Sensor fault detection via multiscale analysis and dynamic PCA. Ind. Eng. Chem. Res. 1999, 38, 1489−1495.
(25) Liu, J.; Chen, D.-S. Fault detection and identification using modified Bayesian classification on PCA subspace. Ind. Eng. Chem. Res. 2009, 48, 3059−3077.
(26) Wang, J.; Wei, H.; Cao, L.; Jin, Q. Soft-Transition Sub-PCA fault monitoring of batch processes. Ind. Eng. Chem. Res. 2013, 52, 9879−9888.
(27) Yao, Y.; Gao, F. Batch process monitoring in score space of two-dimensional dynamic principal component analysis (PCA). Ind. Eng. Chem. Res. 2007, 46, 8033−8043.
(28) Chen, J.; Liu, K.-C. On-line batch process monitoring using dynamic PCA and dynamic PLS models. Chem. Eng. Sci. 2002, 57, 63−75.
(29) Zhang, Y.; Edgar, T. F. PCA combined model-based design of experiments (DOE) criteria for differential and algebraic system parameter estimation. Ind. Eng. Chem. Res. 2008, 47, 7772−7783.
(30) Stumpe, B.; Engel, T.; Steinweg, B.; Marschner, B. Application of PCA and SIMCA statistical analysis of FT-IR spectra for the classification and identification of different slag types with environmental origin. Environ. Sci. Technol. 2012, 46, 3964−3972.
(31) Ku, W.; Storer, R.; Georgakis, C. Disturbance detection and isolation by dynamic principal component analysis. Chemom. Intell. Lab. Syst. 1995, 30, 179−196.
(32) Lind, I.; Ljung, L. Regressor selection with the analysis of variance method. Automatica 2005, 41, 693−700.
(33) Schoukens, J.; Rolain, Y.; Pintelon, R. Modified AIC rule for model selection in combination with prior estimated noise models. Automatica 2002, 38, 903−906.
(34) Ho, B.; Kalman, R. E. Effective construction of linear state-variable models from input/output functions. at-Automatisierungstechnik 1966, 14, 545−548.
(35) Kung, S.-Y. A new identification and model reduction algorithm via singular value decomposition. Presented at the 12th Asilomar Conference on Circuits, Systems, and Computers, 1978; pp 705−714.
(36) Söderström, T.; Wang, L. On model order determination for errors-in-variables estimation. IFAC Proceedings Volumes 2012, 45, 1347−1352.
(37) Duong, H. N.; Landau, I. D. An IV based criterion for model order selection. Automatica 1996, 32, 909−914.
(38) Young, P.; Jakeman, A.; McMurtrie, R. An instrumental variable method for model order identification. Automatica 1980, 16, 281−294.
(39) Wellstead, P. E. An instrumental product moment test for model order estimation. Automatica 1978, 14, 89−91.
(40) Bauer, D. Order estimation for subspace methods. Automatica 2001, 37, 1561−1573.
(41) Valle, S.; Li, W.; Qin, S. J. Selection of the number of principal components: the variance of the reconstruction error criterion with a comparison to other methods. Ind. Eng. Chem. Res. 1999, 38, 4389−4401.
(42) Qin, S. J.; Dunia, R. Determining the number of principal components for best reconstruction. J. Process Control 2000, 10, 245−250.
(43) Huang, B. Process identification based on last principal component analysis. J. Process Control 2001, 11, 19−33.
(44) Vijaysai, P.; Gudi, R.; Lakshminarayanan, S. Errors-in-Variables-Based Modeling Using Augmented Principal Components. Ind. Eng. Chem. Res. 2005, 44, 368−380.
(45) Narasimhan, S.; Shah, S. Model identification and error covariance matrix estimation from noisy data using PCA. Control Engineering Practice 2008, 16, 146−155.
(46) Dai, Y.; Guan, J.; Quan, W.; Xu, C.; Zhang, H. PCA-based dimensionality reduction method for user information in Universal Network. 2012, 01, 70−74.
(47) Abdi, H.; Williams, L. J. Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics 2010, 2, 433−459.
(48) Jolliffe, I. T. Principal Component Analysis; Springer Series in Statistics; Springer-Verlag: New York, U.S.A., 2002.
(49) Narasimhan, S.; Bhatt, N. Deconstructing principal component analysis using a data reconciliation perspective. Comput. Chem. Eng. 2015, 77, 74−84.
(50) Wentzell, P. D.; Andrews, D. T.; Hamilton, D. C.; Faber, K.; Kowalski, B. R. Maximum Likelihood Principal Component Analysis. J. Chemom. 1997, 11, 339−366.
(51) Vajk, I.; Hetthessy, J. On the Generalization of the Koopmans-Levin Estimation Method. 2005, 4134−4139.
(52) Maurya, D.; Tangirala, A. K.; Narasimhan, S. Identification of Linear Dynamic Systems using Dynamic Iterative Principal Component Analysis. IFAC-PapersOnLine 2016, 49, 1014−1019.
(53) Efron, B.; Tibshirani, R. An Introduction to the Bootstrap; Chapman & Hall: New York, 1993.
(54) Shumway, R. H.; Stoffer, D. S. Time Series Analysis and Its Applications: With R Examples; Springer Science & Business Media: 2010.