ARTICLE pubs.acs.org/IECR

Estimation of Instrument Variance and Bias Using Bayesian Methods
Ruben Gonzalez,† Biao Huang,*,† Fangwei Xu,‡ and Aris Espejo‡

† Department of Chemical and Materials Engineering, University of Alberta, Edmonton, Alberta T6G 2G6, Canada
‡ Syncrude Canada Ltd., Fort McMurray, Alberta T9H 3L1, Canada

ABSTRACT: Imprecision of sensors is one of the main causes of poor control and process performance. Often, instrument measurement bias and variance change over time, and online calibration/re-estimation is necessary. Originating from a real industrial application problem, this paper proposes a Bayesian approach to determine the inconsistency of sensors, based on mass-balance principles. A mass-balance factor model is then introduced, where the factor analysis method is used to determine initial values for estimating instrument noise and process disturbance variance. Because of the structural constraint of mass-balance equations, a gray-box estimation procedure must be adopted, for which Bayesian network estimation via the expectation-maximization (EM) algorithm is a very suitable method. Therefore, this paper uses factor analysis to determine the initial values and, afterward, estimates process and sensor variance by means of Bayesian estimation. After estimating the process and instrument variance, the process steady state and instrument bias can be similarly estimated.

1. INTRODUCTION
Process instrumentation is a crucial element for controlling and monitoring process control systems. However, instrumentation is subject to random measurement error due to imprecision and, more seriously, it is often subject to systematic gross errors caused by instrument bias and, in some cases, process material loss. Identifying true process operating points and detecting gross errors have traditionally been performed by data reconciliation and gross error detection techniques. However, conventional data reconciliation calculates a weighted average, where the weights are the inverse of the estimated instrument noise variance. In practice, it is difficult to estimate instrument noise variance from the data, because of the often-inevitable presence of other process disturbances, which contaminates these estimates.
Sets of process data contaminated by both process and instrument noise can be described by a factor model based on mass balances. These types of factor models have predefined relationships between the instruments and a set of process variables. The task at hand is to estimate the combination of process and instrument variances that best explains the data, a task for which Bayesian estimation is an appropriate tool. While the major focus is on estimating process and instrument variance, which is a task that is not explicitly attempted by data reconciliation methods, Bayesian estimation can also be used to estimate process means and instrument bias.
A better understanding of this work can be obtained by reviewing the progress of data reconciliation. The first generation of data reconciliation techniques1 made use of statistical tests such as those implemented by Reilly and Carpani,2 which obtain a χ2 value from the optimization of the mass-balance-based objective function. A later family of techniques proposed by Tamhane3 and Crowe et al.4 analyzed residuals of process constraints and applied univariate tests to each instrument.
More-advanced techniques for data reconciliation were proposed by Tong and Crowe,1 who used principal component analysis (PCA) to draw on principal relations between various instruments. Prior to this, data reconciliation assumed that all hidden process states were invariant, thus independent, so that

all variance can be attributed to instruments. Using PCA implicitly allowed the tests to take process noise into account, and therefore to consider common trends of the set of instruments over time, resulting in a more-intelligent testing procedure. Ozyurt and Pike5 built on the objective function approach to detect gross errors by introducing a framework based on the contaminated Gaussian distribution; after this, Schladt and Hu6 continued with the practical use of data reconciliation to enhance soft sensors for key process measurements.
The Bayesian network approach for gross error detection (or bias detection in the absence of process material loss) is closely related to the PCA method used by Tong and Crowe,1 because it allows for the separation of process noise from measurement noise to gain more-informative estimates of gross errors. However, the main difference between these techniques is that the PCA method focuses on estimating the covariance matrix from the data and afterward imposing mass-balance relationships, while Bayesian network techniques focus on estimating a model that is simultaneously consistent with mass balances and the measurement noise covariance. Mass-balance factor models contain information similar to that contained in principal component models; however, mass-balance factor models are more explicit in their handling of process variance. These models can be used to apply more-advanced methods similar to those of Schladt and Hu,6 except in ways that are completely consistent with Bayesian statistics.
This work has made two main contributions. The first contribution is the use of Bayesian networks to separate and estimate measurement noise and process disturbance variances using historical data, as well as estimating process means and gross errors.
The second contribution is the method of estimating capacity parameters by integrating process data over different time intervals and performing a regression analysis on the

Received: August 24, 2010
Revised: January 31, 2011
Accepted: March 31, 2011
Published: April 14, 2011

dx.doi.org/10.1021/ie101770p | Ind. Eng. Chem. Res. 2011, 50, 6229–6239

Industrial & Engineering Chemistry Research


Figure 1. Solids handling and slurry preparation.

covariance matrices; this was a secondary procedure, but one necessary for practical application.
This paper is divided into the following sections. A problem from industry is presented in section 2, along with an introduction of the corresponding Bayesian network. Section 3 contains an overview of the EM algorithm and the use of principal component analysis (PCA) and factor analysis (FA) to obtain an initial variance guess. Section 4 develops gross error estimation from a defined Bayesian network. A simulation of sequential variance and bias estimation is shown in section 5, while section 6 presents a framework for capacity regression. Finally, section 7 contains an industrial application of capacity regression, variance estimation, and bias estimation.

2. MOTIVATION 2.1. Industrial Example. Weightometers are instruments that are designed to measure solid mass flow; however, they make use of many mechanical parts and, therefore, normally require considerable maintenance. For example, in the DeBeers mine, if maintenance is not frequently performed, the gross errors in weightometers can reach 20%.7 Common causes for inaccuracy

include mass spillage onto weighing components, belt vibrations, and belt shifting. The weightometers considered in this analysis were used for oil sands mining operations near Fort McMurray, Alberta, Canada. These weightometers were subject to even more complications than those at the DeBeers mine mentioned in ref 7, namely, the adhesive consistency of the oil sand. Such conditions often lead to poor weightometer performance, despite following the recommended maintenance practices. Because of the nature of oil sands operations, weightometers are often unable to achieve their design performance without maintenance that is above and beyond the usual requirements. Ultimately, work in this area can enable the on-site monitoring of instrument performance and correction, which could reduce maintenance costs and aid in dealing with the unavoidable presence of weightometer error.
Consider the oil sands slurry preparation system shown in Figure 1, which contains seven instruments:
(1) truckload values from the WENCO Mining Database (WENCO)
(2) crusher weightometer (WI 1)
(3) surge level indicator (LI 1)
(4) mix box feed conveyor (WI 2)


(5) slurry flow meter (FM S)
(6) slurry density meter (DM S)
(7) water flow meter (FM W)
From the flowchart in Figure 1, WENCO data were available in a database that recorded the mass of each truckload and the time at which it was dumped into the dump hopper. WI 1 is a weightometer on the crusher conveyor belt that feeds into the surge pile. LI 1 gives a reading of the relative level of the surge pile (from 0% to 100%). Since this is a percentage capacity reading, one of the tasks is to estimate its total capacity. WI 2 is a weightometer for the mix box feed conveyor, which transports oil sand from the surge pile to the slurry mix box. In this mix box, a slurry is prepared by adding controlled amounts of water to the oil sand. FM W measures the volumetric flow rate of water; since the density of water is known, the conversion to mass flow is trivial. The volumetric effluent slurry flow is measured by FM S, with the density measured by DM S, so that a mass flow rate can be calculated.
2.2. Problem Formulation. Mass balances serve as a method to keep track of process material over a given time window Δt. Since mass balances are an accumulative comparison of mass flow over time, flow rates should be integrated over time; compositions and properties, however, do not have rate-related units and, therefore, should be averaged over time instead of integrated. Capacity values are inherently cumulative mass values and, therefore, should not be integrated; instead, they should be differenced between the beginning and end of the time interval. The procedure for calculating the mass-balance observation vector Y is shown in eq 1 for the industrial example:

Y_1(i) = ∫_{t_i}^{t_i+Δt} WENCO dt
Y_2(i) = ∫_{t_i}^{t_i+Δt} WI 1 dt
Y_3(i) = c_H (LI 1(t_i+Δt) − LI 1(t_i))
Y_4(i) = ∫_{t_i}^{t_i+Δt} WI 2 dt
Y_5(i) = ∫_{t_i}^{t_i+Δt} FM S dt
Y_6(i) = (1/Δt) ∫_{t_i}^{t_i+Δt} DM S dt
Y_7(i) = ∫_{t_i}^{t_i+Δt} FM W dt    (1)
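As a concrete illustration of eq 1, the window calculations can be sketched numerically. This is a minimal sketch, not the authors' implementation: the dictionary keys and the trapezoidal integration rule are assumptions made for illustration.

```python
import numpy as np

def mass_balance_observations(t, signals, c_H, dt_window):
    """Build one mass-balance observation vector Y(i) over a window of
    length dt_window, following eq 1: integrate the flow signals,
    difference the (scaled) level signal, and time-average the density.
    `signals` maps hypothetical instrument names to sampled series on t."""
    Y = np.empty(7)
    # Flow-rate instruments are integrated over the window (trapezoid rule).
    Y[0] = np.trapz(signals["WENCO"], t)
    Y[1] = np.trapz(signals["WI1"], t)
    # The surge level is a capacity reading: scale by c_H, then difference
    # the endpoints rather than integrating.
    Y[2] = c_H * (signals["LI1"][-1] - signals["LI1"][0])
    Y[3] = np.trapz(signals["WI2"], t)
    Y[4] = np.trapz(signals["FMS"], t)
    # Density has no rate units: average over the window instead.
    Y[5] = np.trapz(signals["DMS"], t) / dt_window
    Y[6] = np.trapz(signals["FMW"], t)
    return Y
```

For example, a constant 10 t/min reading on WI 1 over a 60 min window integrates to 600 t, while a constant density reading averages back to itself.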

Note that, since LI 1 gives readings in terms of percent capacity, it must be appropriately scaled by the hopper scalar (cH) to make the conversion to mass units. Since all instrument measurements in eq 1 contain noise, raw measurements can be treated as Gaussian random variables. Measurement variance is dependent on the sample time interval Δt, because of the effect of integrating noise over this interval; because of this, mass-balance variance estimates are only appropriate given a certain sampling time. The only exception to this rule is intermediate holding, which has no integration; such observations would have constant variance, regardless of Δt. 2.3. A Note on Notation. In this paper, numerical subscripts indicate the index of an array. For example, C2 would denote the second entry of a vector, while S12 would denote the first row, second column entry of a two-dimensional array. Subscripts with curved brackets indicate that the normal script variable is defined

or specified by the subscript variable within the brackets. For example, ε_(Σ) would indicate that ε is process noise that corresponds to the observation covariance Σ; another example would be σ_{2(Δt)}^2, which indicates that the measurement noise variance σ_2^2 is dependent on a particular value of Δt, in the same way that σ_{2(1)}^2 would be the value of σ_2^2 that corresponds to Δt = 1 (or 1 min).
2.4. Mass Balances: Factor Model Representation. Factor models can be used to represent the mass balance of a process at steady state. Factor models, as defined in this work, are composed of two main types of variables: hidden state variables (X), and observed variables (Y), which are affected by the hidden states. Since the factor models of interest represent a mass balance, the hidden states X are a set of minimal process parameters that define the mass balance. However, the hidden states are subject to process noise, even at steady state. Because of control systems or the process itself, noise within the hidden states can be cross-correlated; thus, X can be represented by eqs 2 and 3.

n ~ N(0, I)    (2)

X = μ + L_x n    (3)

where μ is the process mean vector at a given steady state, n a standardized source of variability, and L_x a loading matrix that produces cross-correlated noise. In this way, cross-correlated process noise (δ) can be summarized as follows:

δ = L_x n    (4)

Since n has covariance I, the covariance of δ can be determined as

Σ_X = L_x L_x'    (5)
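The relationship in eqs 2-5 can be checked with a small Monte Carlo sketch. The loading matrix below is taken from the paper's later simulation example (eq 36); everything else is illustrative.

```python
import numpy as np

# Sketch of eqs 2-5: generate cross-correlated process noise X = mu + L_x n
# and confirm that the sample covariance of X approaches Sigma_X = L_x L_x'.
rng = np.random.default_rng(0)
L_x = np.array([[300.0, 0.0, 0.0],
                [150.0, 40.0, 0.0],
                [0.0, 0.0, 120.0]])
mu = np.array([1000.0, 500.0, 0.0])

n = rng.standard_normal((3, 100_000))   # n ~ N(0, I), eq 2
X = mu[:, None] + L_x @ n               # X = mu + L_x n, eq 3

Sigma_X = L_x @ L_x.T                   # theoretical covariance, eq 5
Sigma_hat = np.cov(X)                   # sample covariance (rows = variables)
```

Note that the second row of L_x implies a standard deviation of sqrt(150² + 40²) ≈ 155.2 for X2, matching the value listed in Table 1.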

For the industrial example, three hidden states are considered over a given time period: X1 is the accumulated oil sands throughput (centered around the slurry mix box), X2 is the accumulated water throughput, and X3 is the net oil sands accumulation in the surge hopper.
The second part of the factor model deals with the observation variables Y, as shown in eq 7 below. This is the general nonlinear representation of a factor model with gross errors β, which often represent instrument bias or process material loss.

ε ~ N(0, Σ_Y)    (6)

Y = f(X) + β + ε    (7)

where f(X) is a nonlinear observation function of the hidden state X, β represents gross errors, and Σ_Y is the measurement noise covariance matrix or, equivalently, the covariance of Y|X. Σ_Y is diagonal, since it is assumed that measurement noises are mutually uncorrelated. Linear approximations of the observation model take the form shown in the following equation:

Y = CX + r + β + ε    (8)

where C is an observation matrix of linearized coefficients and r is a vector of linearized reference points. C and r can be calculated by means of a Taylor series approximation, given by the following equations:

C_{ij} = ∂f(μ)_i / ∂μ_j    (9)

r = f(μ) − Cμ    (10)


where f(μ)_i is the ith element of f(μ) (where i ranges from 1 to the number of instruments) and μ_j is the jth element of μ (where j ranges from 1 to the number of hidden states). For most measurements, Taylor series approximations are not necessary. When measurements are a direct combination of hidden states having the same units, the coefficients of C are simply 1. For example, Y2 is a direct combination of the mass flow through the mix box and the mass accumulation within the hopper. Linear combinations can also have simple unit conversions. For example, Y5 is a volumetric reading that is affected by the mass flow rates of oil sands and water, which are hidden states; the corresponding elements of C are the specific volumes of oil sands and water, which serve as unit-conversion constants. For this industrial example, a Taylor series approximation is only required for the density meter, which is dependent on the oil sand mass flow X1 and the water mass flow X2, as shown in eq 11.

ρ_slurry = (ρ_sand X1 + ρ_water X2) / (X1 + X2)    (11)
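The Taylor coefficients of eqs 9 and 10 for the density reading can be sketched with a numerical derivative. The densities 2.1 and 1.0 and the operating point are taken from the paper's simulation example; the central-difference scheme is an assumption for illustration.

```python
import numpy as np

# Sketch: numerically linearize the slurry density (eq 11) around the
# operating point mu to get its row of C (eq 9) and its entry of r (eq 10).
rho_sand, rho_water = 2.1, 1.0

def f_density(x):
    x1, x2 = x  # oil sand and water mass flows (hidden states)
    return (rho_sand * x1 + rho_water * x2) / (x1 + x2)

mu = np.array([1000.0, 500.0])
h = 1e-3
# Central differences give the Taylor coefficients C_o and C_w (eq 9).
C_row = np.array([
    (f_density(mu + h * e) - f_density(mu - h * e)) / (2 * h)
    for e in np.eye(2)
])
r_entry = f_density(mu) - C_row @ mu  # linearization offset (eq 10)
```

Analytically, ∂ρ/∂X1 = X2(ρ_sand − ρ_water)/(X1 + X2)² and ∂ρ/∂X2 = −X1(ρ_sand − ρ_water)/(X1 + X2)², which the numerical derivative reproduces.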

Because there is no way to separate this equation into a linear combination of X1 and X2, a Taylor series approximation has been applied.
2.5. Conversion to a General Factor Model. The mass-balance factor model defined by eqs 6 and 8 is based on a set of observation variables Y that are defined by a set of cross-correlated hidden variables X and an observation matrix C. However, general factor analysis methods estimate the loadings of hidden variables having a mean of zero and a covariance of I, which is a property of the standardized variation source n in eq 3. A general factor model can be obtained by substituting μ + L_x n for X in eq 8. This results in observations of Y being explained in terms of the standardized latent variables n, instead of the state variables X:

Y = C(μ + L_x n) + r + β + ε    (12)

This results in Y having the following distribution:

Y ~ N_p(Cμ + r + β, Σ_Y + C L_x L_x' C')    (13)

where C and r are specified by process knowledge, so they do not need to be estimated. The principal interest of this work is to estimate the measurement noise and process disturbance covariances, Σ_Y and L_x L_x', in a way that is consistent with the mass-balance model structure. Estimating these parameters also allows us to obtain a rudimentary estimate of the instrument bias (β).
2.6. Bayesian Networks. Estimating Σ_Y and L_x L_x' will be done using a Bayesian network approach, which can explicitly estimate the statistics of X, as well as of Y given X, by allowing us to fix the values of C. This results in variance estimates that are consistent with the mass-balance model structure. A Bayesian network is a visual representation of causal relationships denoted by nodes and arrows; arrows always point from parent to child, which indicates the type of reasoning that must be done. If a parent value is known, it can be used to predict the values of child nodes (see Figure 2). If a child value is known, as shown in Figure 3, it can be used to diagnose the values of the parent nodes. A comprehensive description of Bayesian inference for Gaussian distributions is given by Bolstad8 in his textbook.
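The diagnostic reasoning of Figure 3 can be sketched with a two-node Gaussian network: a parent state X and a child measurement Y = X + noise. Observing the child updates the belief about the parent via the standard conjugate Gaussian update; the numbers below are illustrative only.

```python
# Toy sketch of diagnostic reasoning in a two-node Gaussian Bayesian network:
# parent X ~ N(mu_x, var_x), child Y = X + noise, noise ~ N(0, var_y).
# Observing the child value y updates the belief about the parent.
mu_x, var_x = 1000.0, 300.0**2   # prior on the parent (illustrative numbers)
var_y = 90.0**2                  # measurement noise variance of the child
y_obs = 1150.0                   # observed child value

# Posterior of X | Y = y for jointly Gaussian variables:
k = var_x / (var_x + var_y)      # gain weighting prior against the data
mu_post = mu_x + k * (y_obs - mu_x)
var_post = (1.0 - k) * var_x     # posterior variance shrinks below the prior
```

The posterior precision is the sum of the prior and measurement precisions, which is the same weighting-by-inverse-variance idea used in data reconciliation.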

Figure 2. Predictive reasoning.

Figure 3. Diagnostic reasoning.

3. BAYESIAN NETWORKS AND VARIANCE ESTIMATION
Parameter estimation using Bayesian networks is discussed in this section. The parameters of interest are the means and variances of X and β (shown in eqs 3 and 8). However, the chosen Bayesian network technique requires an initial value. By removing the sample mean, the initial-value problem is reduced to one that can be solved using factor analysis (FA); FA does not require an initial value. FA yields a solution for the covariances of X and Y|X, but it does not necessarily conserve the relationships between instruments and hidden states represented by the C matrix in eq 8. Furthermore, FA cannot estimate means as parameters; these are the reasons why Bayesian learning is the principal tool for this work. Nevertheless, because FA does not require initialization, its independent solution can be used as an initial value for the Bayesian variance estimation problem. The FA-initialized Bayesian variance estimation solution can then be used as part of the mean estimation procedure, which also uses Bayesian network techniques.
3.1. Bayesian Example. Consider the industrial example. To remove bias from the measurements, means are subtracted so that we can focus on variance estimation. This converts eqs 3 and 8 to the form shown in eqs 14 and 15:

[x1 − x̄1; x2 − x̄2; x3 − x̄3] = [300 0 0; 150 40 0; 0 0 120] [n1; n2; n3]    (14)

[y1 − ȳ1; y2 − ȳ2; y3 − ȳ3; y4 − ȳ4; y5 − ȳ5; y6 − ȳ6; y7 − ȳ7] =
[1 0 1; 1 0 1; 0 0 1; 1 0 0; 1/2.1 1.0 0; C_o C_w 0; 0 1 0] [x1 − x̄1; x2 − x̄2; x3 − x̄3] + [ε1; ε2; ε3; ε4; ε5; ε6; ε7]    (15)

where x1 represents the overall oil sands flow, x2 represents the water flow, and x3 represents the hopper level change, while y_i represents a measurement output. C_o and C_w are parameters that


are obtained using Taylor series approximations, and therefore are dependent on the operating condition. The model from eqs 14 and 15 has the corresponding Bayesian network shown in Figure 4. Note that solid arrows symbolize relationships that are predetermined, and broken arrows symbolize relationships that must be estimated. Similarly, solid nodes have a known variance, while broken nodes require variance estimation. According to this convention, one can observe in Figure 4 that the relationships between observed variables and hidden variables are known; however, the dotted lines around the nodes indicate that the statistics for these nodes are not complete; namely, the variance of the nodes is of interest. Furthermore, X1 and X2 are shown to be correlated, but the dotted line indicates that the correlation between the two variables is also unknown.

Figure 4. Bayesian representation of the mass-balance factor model.

3.2. Bayesian Parameter Estimation with Hidden Variables. Data from this model contain underlying variables that are hidden (we treat real states as hidden and measurements as observed); thus, the parameter estimation method must be able to deal with hidden variables. The most widely used parameter estimation methods that can deal with hidden variables are Gibbs sampling (GS) and expectation maximization (EM). However, GS is often only used as a last resort, because of its lower accuracy; thus, the EM algorithm is of interest here. Detailed information about the EM algorithm can be found in the work of Korb and Nicholson9 and in the work of Dempster et al.10 Being an iterative approach, the EM algorithm obtains a new estimate of parameters θ̂_{n+1}, given a previous set of parameters θ̂_n, by successive substitution and maximization.

θ̂_{n+1} = arg max_θ ∫ P(X|Y, θ̂_n) ln[P(Y, X|θ)] dX    (16)

where X is a missing or hidden variable, Y a set of observed data, and θ̂ an estimate of the real parameter set θ. This process can be broken down into two steps:
(1) Expectation: Given a set of initial values θ̂_n, we can compute the probability density function (PDF) P(X|Y, θ̂_n) as a function of X, conditioned on the set of parameters θ̂_n, which are held constant:

P(X|Y, θ̂_n) = P(Y|X, θ̂_n) P(X|θ̂_n) / ∫ P(Y|X, θ̂_n) P(X|θ̂_n) dX    (17)

where θ̂_n is a set of node-related parameters such as means, variances, and path coefficients at iteration n, P(X|θ̂_n) is the multivariate normal PDF given by eqs 2 and 3, and P(Y|X, θ̂_n) is another multivariate normal PDF given by eq 8. One should also note that ∫ P(Y|X, θ̂_n) P(X|θ̂_n) dX is simply a normalization constant, since θ̂_n is constant and X is marginalized out; this results in eq 17 being a relatively simple function of X. At this point, it is also useful to note that

ln[P(Y, X|θ)] = ln[P(Y|X, θ) P(X|θ)]    (18)

which makes explicit use of the model-defined PDFs P(Y|X, θ) and P(X|θ); these PDFs are similar to those used in eq 17, but are written in terms of the unknown real parameter set θ, instead of the fixed estimates θ̂_n. Combining eqs 17 and 18 and marginalizing out X by integration results in the expectation shown below:

E_{X|Y, θ̂_n}{ln P(Y, X|θ)} = ∫ P(X|Y, θ̂_n) ln[P(Y, X|θ)] dX    (19)

This expectation is calculated from the previously fixed parameters θ̂_n and the variable parameters θ, which are to be optimized in the maximization step.
(2) Maximization: The objective function obtained in eq 19, which is a function of θ since X was marginalized out, can be maximized according to eq 16.
Calculations involving Bayesian network estimation and the EM algorithm were done using the BNT routine, which is an open-source toolbox for MATLAB written by Murphy.11 The files have been made open to the public via the University of British Columbia Computer Science website, as well as Google Project Hosting. Details on software use are given by Murphy11 at the URL given in the reference section. For convenience of notation, we use EM(θ|Y) to denote the parameter estimated by the EM algorithm and the data that are used. Thus,

[θ̂, X̂] = EM([θ, X] | [Y])    (20)
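For the zero-mean linear model of eqs 14 and 15, the E- and M-steps have closed forms. The sketch below is a simplified stand-in for the BNT routines the paper uses, assuming a fixed C, a full process covariance Σ_X, and a diagonal Σ_Y; the function name and interface are illustrative.

```python
import numpy as np

# Simplified EM sketch (eq 16) for the centered model Y = C X + eps with C
# fixed by the mass balance. The E-step computes the Gaussian posterior
# P(X | Y, theta_n); the M-step re-estimates Sigma_X and diagonal Sigma_Y.
def em_variances(Y, C, Sigma_X, sigma_y, n_iter=200):
    """Y: (N, p) centered data; C: (p, k) fixed observation matrix;
    Sigma_X: (k, k) initial process covariance; sigma_y: (p,) initial
    measurement noise variances. Returns updated (Sigma_X, sigma_y)."""
    N = Y.shape[0]
    for _ in range(n_iter):
        # E-step: posterior covariance and means of the hidden states.
        Sy_inv = np.diag(1.0 / sigma_y)
        S_post = np.linalg.inv(C.T @ Sy_inv @ C + np.linalg.inv(Sigma_X))
        M = Y @ Sy_inv @ C @ S_post        # posterior means, one row per sample
        # M-step: expected sufficient statistics maximize eq 19.
        Sigma_X = (M.T @ M) / N + S_post
        resid = Y - M @ C.T
        sigma_y = (resid**2).mean(axis=0) + np.einsum("ij,jk,ik->i", C, S_post, C)
    return Sigma_X, sigma_y
```

On synthetic data with a known single hidden state, the iteration recovers both the process variance and the per-instrument noise variances.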

if θ and X are both estimated using data given in Y.
3.3. Principal Component Analysis. Principal component analysis (PCA) aims to break down the sample covariance matrix into linearly independent components, and it is often used as a first step toward factor analysis to determine the number of latent variables. In general, the goal of PCA is to rotate the axes of the p-dimensional covariance matrix to coincide with the directions of maximum variability, resulting in a diagonal matrix.12 This allows most of the variance to be described by a few major underlying uncorrelated variables. Performing PCA starts with an estimate of the covariance matrix S of steady-state data, given below.

S = (1/(n − 1)) Σ_{i=1}^{n} (Y_i − Ȳ)(Y_i − Ȳ)'    (21)

where Y_i is the ith sample of the p-dimensional measurement vector Y at steady state. It is important that the Y measurements used to estimate S be quality data. When instrument measurements have very different units or scales, the standardized PCA method is often best to use, as in the case of this analysis. Consequently, the sample correlation matrix (R) is analyzed instead of the sample covariance matrix (S). R can be obtained by transforming S, using S_y, which is the main diagonal of S. This transformation is shown


in eq 23.

S_y = diag(S_{1,1}, S_{2,2}, ..., S_{p,p})    (22)

R = S_y^{−1/2} S S_y^{−1/2}    (23)

Let q ≤ p represent the number of principal components considered when performing PCA. E_{(q)} = [e_1, e_2, ..., e_q] would then be a matrix of q eigenvectors, and Λ_{(q)} a diagonal matrix of eigenvalues λ_1, ..., λ_q of the correlation matrix R. The correlation matrix R can be redefined according to the spectral decomposition in eq 24, leading to the approximation in eq 26.

R = E_{(p)} Λ_{(p)} E_{(p)}'    (24)

L̃_s = E_{(q)} Λ_{(q)}^{1/2}    (25)

R ≈ L̃_s L̃_s'    (26)

where L̃_s is a matrix of factor loadings according to PCA, defined according to eq 25. Equation 26 should hold true as long as q is large enough to capture sufficient variance. The objective is to find the minimal value of q that would still capture most of the variance expressed in eq 24. PCA is actually an approximate FA solution; the objective of PCA here is to determine an adequate number of factor loadings to use in FA.
3.4. Factor Analysis by Modified Principal Components. While there are many methods to perform FA, the modified PCA method was used for this paper. This method of FA is similar to PCA, but accounts for noise within the individual observations (Y1 ... Yp) themselves, which is denoted as Ψ_s. FA techniques attempt to separate R into its diagonal specific variance Ψ_s and its loading-based communality L_s L_s', as shown in eq 27.

R ≈ L_s L_s' + Ψ_s    (27)

L_s and Ψ_s are directly estimated from R, using the iterative principal factor solution, which can be found in the work of Johnson and Wichern.12 Once Ψ_s and L_s have been estimated, they can be converted to their unstandardized forms, according to eqs 28 and 29.

L = S_y^{1/2} L_s    (28)

Ψ = S_y^{1/2} Ψ_s S_y^{1/2}    (29)
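The iterated principal factor solution of eq 27 can be sketched by alternating between a communality estimate and a rank-q extraction from the reduced correlation matrix. This follows the textbook method the paper cites in outline; it is not the authors' exact implementation, and the function name is illustrative.

```python
import numpy as np

# Sketch of the iterated principal factor solution (eq 27): alternate between
# estimating specific variances Psi_s and re-extracting q loadings from the
# reduced correlation matrix R - diag(Psi_s).
def principal_factor(R, q, n_iter=100):
    p = R.shape[0]
    Psi_s = np.zeros(p)                     # initial specific variances
    for _ in range(n_iter):
        R_red = R - np.diag(Psi_s)          # reduced correlation matrix
        w, V = np.linalg.eigh(R_red)
        idx = np.argsort(w)[::-1][:q]       # q largest eigenvalues
        L_s = V[:, idx] * np.sqrt(np.clip(w[idx], 0.0, None))
        # Communality update: whatever the loadings miss is specific variance.
        Psi_s = np.clip(np.diag(R - L_s @ L_s.T), 1e-6, None)
    return L_s, Psi_s
```

For a correlation matrix generated exactly by a one-factor model, the iteration recovers the loadings (up to sign) and the specific variances.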

FA makes estimates on the assumption that the model can take the form of a zero-mean factor model, as shown in eq 30.

Ỹ ≈ L n + ε_(Ψ)    (30)

where Ỹ represents conditioned observation data (with zero mean); n is a source of variation, identical to the n used in eq 3; and ε_(Ψ) is observation noise with an estimated covariance Ψ. Ideally, ε_(Ψ) = ε, but estimates of Ψ are not constrained to be consistent with mass balances; thus, one must distinguish ε_(Ψ) from ε. Because eq 13 defines the distribution of a zero-mean linear factor model, it is possible to perform a substitution to generate an FA model. When subtracting the sample mean value of Y from eq 12, the factor model reduces to eq 31.

Figure 5. Bayesian representation with bias.

Y − Ȳ = C L_x n + ε    (31)

If the data can be arranged in a manner consistent with eq 30, FA estimates the loading matrix L, as well as the covariance of ε_(Ψ). It becomes clear from eqs 30 and 31 that FA attempts to estimate a value for L = C L_x, as well as Σ_Y = Ψ. The principal cause for concern in using Ψ as an estimate of Σ_Y is that black-box estimation methods such as FA cannot force consistency with the mass-balance equation structure onto estimates of L and Ψ. Particular trouble arises when one set of measurements for an instrument seems to be relatively uncorrelated with the others. When this happens, it is difficult to determine whether the noise is coming from the instrument or from a separate hidden state n. Nevertheless, measurements that cause such confusion are often the exception and not the norm. Thus, as an initial guess for Σ_Y, Ψ is still valuable; the subsequent Bayesian estimation will essentially make corrections to Ψ by forcing mass-balance consistency to obtain Σ_Y.

4. ESTIMATING THE GROSS ERROR DISTRIBUTION
While the principal focus of this work is using Bayesian networks to estimate process and measurement noise covariance, the method can also be used to estimate gross errors such as instrument bias. Since the covariance matrices are now specified, the Bayesian network can be simplified to a form such as that shown in Figure 5. Data quality requirements for bias estimation are not as strict as those for variance estimation; thus, the data where gross error is suspected can be independent of the data used to estimate variance, as long as the assumption of unchanging variance is reasonable. Since bias is a property of the mean, means should not be removed from the data in this procedure (thus resuming direct use of eqs 3 and 8 as the model). The procedure to estimate gross errors such as bias follows the philosophy of traditional data reconciliation and gross error detection procedures. In traditional methods,5 the first step is to determine weighting factors for each instrument, which are related to the instrument variances. The second step uses all instruments to estimate a steady state; this state is known as the reconciled estimate. The reconciled estimate is then used as a reference point against each of the instruments. If any set of instruments is far enough away from the reconciled estimate to be considered suspicious, the reconciled estimate is recalculated with the suspect instruments removed.
The first step is to estimate the weighting factors. For this method, the weighting factors are essentially Σ_Y and Σ_X, whose estimation is represented by

[Σ̂_Y, Σ̂_X] = EM([Σ_Y, Σ_X] | (Y − Ȳ))    (32)


Table 1. Variable Definitions

symbol | mean | standard deviation | meaning
x1 | 1000 | 300 | real oil sand flow
x2 | 500 | 155.2 | real water flow
x3 | 0 | 120 | real hopper level change
y1 | 0 | 200 | WENCO database value
y2 | 0 | 90 | first weightometer readings
y3 | 0 | 130 | hopper level readings
y4 | 0 | 80 | second weightometer readings
y5 | 0 | 60 | slurry flow meter readings
y6 | 0 | 0.2 | density meter readings
y7 | 0 | 50 | water flow meter readings
β1 | 0 | 0 | bias
β2 | 30 | 0 | bias
β3 | 0 | 0 | bias
β4 | 300 | 0 | bias
β5 | 80 | 0 | bias
β6 | 0.2 | 0 | bias
β7 | 0 | 0 | bias

Table 2. Correlation Matrix for a Direct Nonlinear System

     y1   y2   y3   y4   y5   y6   y7
y1  1.00 0.83 0.15 0.78 0.79 0.03 0.72
y2  0.83 1.00 0.17 0.89 0.88 0.03 0.82
y3  0.15 0.17 1.00 0.07 0.07 0.02 0.07
y4  0.78 0.89 0.07 1.00 0.94 0.03 0.86
y5  0.79 0.88 0.07 0.94 1.00 0.05 0.91
y6  0.03 0.03 0.02 0.03 0.05 1.00 0.07
y7  0.72 0.82 0.07 0.86 0.91 0.07 1.00

with an initial guess for Σ_Y and Σ_X given by FA. The second step is to obtain an initial reconciled estimate for the state X.

μ̂_X = EM(μ_X | [Y, Σ̂_Y, Σ̂_X])    (33)

Because of the degrees of freedom, one cannot simultaneously estimate the means of X and β and obtain a unique solution. This is why traditional methods assume β is equal to zero when obtaining a preliminary estimate of the steady state. However, one can improve the estimate μ̂X by simultaneously estimating the variance of β, denoted as Σβ0. This allows for the penalization of instruments whose distance from the mean is substantial, as they have correspondingly large values of Σβ0:

[μ̂X, Σ̂β0] = EM([μX, Σβ0] | [Y, Σ̂Y, Σ̂X])    (34)

A suitable initial value for μX can be obtained from process design specifications, while a suitable initial value for Σβ0 is ΣY, because the two values have the same units and usually have similar scale. Once μ̂X is estimated, gross error estimates β̂ and their variances Σβ can be obtained using μ̂X as a reference point:

[β̂, Σ̂β] = EM([β, Σβ] | [Y, μ̂X, Σ̂Y, Σ̂X])    (35)

A suitable initial value for β is 0, while a suitable initial value for Σβ is Σβ0. Using β̂ and Σ̂β, one can perform hypothesis testing to determine whether or not β = 0. The risk level (R) can be chosen based on false-positive severity; since data are normally abundant (well over 300 data points), the standard normal distribution can be used instead of Student's t-distribution. A risk level of R = 0.1 is used for the purpose of this work, because of the relatively benign consequences of false-positive results (typical test procedures use values of R = 0.05). If any of these tests fail, one should either remove the suspect data from Y or correct that particular set of measurements using β̂. After corrupted data have been removed or corrected, μ̂X should be re-estimated using eq 33 or eq 34, to remove or reduce corruption in the state estimate. This allows the next set of estimates for β to be more accurate, because the reference point is less contaminated.

5. SIMULATION OF THE OVERALL PROCESS
Before applying these methods to real process data, a simulation of the industrial system was performed by generating data for a fictional operating point. Verification of this method is best done through simulation, since the true parameters are available beforehand. The real parameters are given in Table 1. Note that gross errors were introduced into this simulation in the form of instrument bias. Gaussian data were generated for the values of n and ε, so that simulated values for X could be obtained using eq 3 and Y could be calculated according to the nonlinear model in eq 7. Nine hundred (900) data points were generated for the system. When performing the appropriate substitutions in eqs 3 and 7, simulated data can be directly calculated according to eqs 36 and 37:

[x1]   [1000]   [300   0    0] [n1]
[x2] = [ 500] + [150  40    0] [n2]    (36)
[x3]   [   0]   [  0   0  120] [n3]

[y1]   [x1 + x3                          ]   [200 n4 ]
[y2]   [x2 + x3                          ]   [ 90 n5 ]
[y3]   [x3                               ]   [130 n6 ]
[y4] = [x1                               ] + [ 80 n7 ]    (37)
[y5]   [2.1⁻¹ x1 + 1.0⁻¹ x2              ]   [ 60 n8 ]
[y6]   [(x1 + x2)(2.1⁻¹ x1 + 1.0⁻¹ x2)⁻¹ ]   [0.2 n9 ]
[y7]   [1.0⁻¹ x2                         ]   [ 50 n10]

where ni is randomly generated Gaussian noise with a mean of 0 and a variance of 1. This simulation resulted in the correlation matrix given in Table 2. As mentioned previously, all relationships are linear, except for the case of y6. Performing a Taylor series approximation around the operating points x1 = 1000 and x2 = 500 yields the following parameters: Csand = 1.836 × 10⁻⁴ and Cwater = 4.492 × 10⁻⁴.

5.1. Variance Estimation Results: EM by Uninformed Initial Guesses. When performing parameter estimation, it was found that certain parameters were sensitive to initial values. Most parameters converged well, except those for y4 and y7. Again, if the initial values are not carefully considered, the model may assume that one instrument is unduly precise in measuring the corresponding underlying variable. A summary of results can be found in Table 5, which is presented later in this paper.
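A sketch of the data-generation step follows. It assumes the reading of eq 37 in which y6 is slurry mass flow divided by volumetric flow, and adds the nonzero biases from Table 1 (β2, β4, β5, β6); the seed and variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 900
n = rng.standard_normal((N, 10))        # n1..n10, iid N(0, 1)

# eq 36: hidden states
x1 = 1000 + 300 * n[:, 0]
x2 = 500 + 150 * n[:, 0] + 40 * n[:, 1]
x3 = 120 * n[:, 2]

# eq 37 plus the Table 1 biases (beta2 = 30, beta4 = 300, beta5 = 80, beta6 = 0.2)
y1 = x1 + x3 + 200 * n[:, 3]
y2 = x2 + x3 + 90 * n[:, 4] + 30
y3 = x3 + 130 * n[:, 5]
y4 = x1 + 80 * n[:, 6] + 300
y5 = x1 / 2.1 + x2 / 1.0 + 60 * n[:, 7] + 80
# density assumed = mass flow / volumetric flow (a reconstruction)
y6 = (x1 + x2) / (x1 / 2.1 + x2 / 1.0) + 0.2 * n[:, 8] + 0.2
y7 = x2 / 1.0 + 50 * n[:, 9]
Y = np.column_stack([y1, y2, y3, y4, y5, y6, y7])
```

Note that x2 inherits part of its variance from n1, so its total standard deviation is sqrt(150² + 40²) ≈ 155.2, matching Table 1.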


Figure 6. FA validation using principal component analysis (PCA).

Figure 7. Approximate factor analysis model.

Table 3. PCA Vectors Determined by Decreasing Principal Component (PC)

variable  62.5%  15.4%  14.3%  3.8%  2.1%  1.2%  0.8%
y1        0.42   0.16   0.05   0.82  0.33  0.10  0.05
y2        0.45   0.14   0.04   0.00  0.42  0.77  0.05
y3        0.01   0.95   0.09   0.24  0.04  0.17  0.01
y4        0.46   0.10   0.01   0.11  0.49  0.47  0.55
y5        0.46   0.11   0.01   0.19  0.06  0.34  0.78
y6        0.03   0.11   0.99   0.04  0.03  0.00  0.00
y7        0.44   0.12   0.03   0.47  0.69  0.15  0.26

Table 4. Loadings and Noise from FA Estimation Versus Simulated Values

          Estimated               Simulated
variable  L1     L2     L3       L1      L2   L3
y1        335.8  54.9   20.55    300     120  0
y2        336.3  53.4   20.04    300     120  0
y3        4.5    158.4  28.2     0       120  0
y4        316.2  33.6   2.03     300     0    0
y5        282.7  30.8   4.16     292.9   0    40
y6        0.01   0.04   0.185    0.0012  0    0.0195
y7        124    14.1   5.58     150     0    40

5.2. Variance Estimation Results: PCA and FA Results. PCA was performed with the intent of determining how many hidden variables were needed to explain the covariance (or correlation) matrix. Figure 6a displays a plot of the percent variance explained by the nth principal component. Because improvements were marginal after three principal components, three underlying latent variables are sufficient for this model (which happens to be the number of hidden process states). From these results, it was shown that roughly 92.1% of the variance can be explained using three principal components. Table 3 displays the principal components (PCs), labeled by their scores in units of percent variance. When using three factors (as decided from PCA), roughly 98.4% of the variance can be explained when independent noise is taken into account.

One can now use this information to construct a network model. From the overall assumption that three hidden variables cause measurement variance, the corresponding Bayesian network is shown in Figure 7. The approximate FA model is simplified by omitting connections between nodes with negligible correlation, based on the loadings in Table 3. Factor loadings, along with the estimated standard deviations, are displayed in Table 4. Note that these are scaled so that the underlying factors have a mean of 0 and a standard deviation of 1 (explicit in eq 2). Real values for CLx are presented in Table 4, along with the sample standard deviation calculated from simulated data. Because loadings were known beforehand through mass balances, the principal interest is in estimating the variance of the measurement noise. Overall, noise values produced by FA are fairly close to the simulated values. The only issues lie with the noise variances for y3 and y6, both of which are severely underestimated; this is because the instruments used to measure y3 (surge level) and y6 (density) are relatively uncorrelated with the rest of the instruments.

Table 5. Summary Table of Variance Estimates

variable  simulated σ  EM     FA     EM with FA
x1        319          314    333    314
x2        161          180    155    174
x3        117          126    161    121
y1        222          207    212    203.7
y2        99.9         105    102    94.3
y3        144          133    62.8   114.1
y4        88.8         2.25   94.2   89.8
y5        66.6         83.7   62.9   56.9
y6        0.222        0.221  0.063  0.220
y7        55.5         14.7   59.9   47.69

If measurements for an instrument are relatively uncorrelated with the others, FA will attempt to heavily associate this independence with a separate underlying cause ni and reduce the effect of ni on the other measured variables. This is easily seen in Table 5, where the variance for x3 is highly overestimated, because x3 is the only hidden variable that has a significant effect on y3. Under these circumstances, it is difficult to determine how much of the measurement variance is caused by the instrument and how much is caused by the hidden process. With the measurement variance calculated, the variance of the hidden variables can be estimated either by analyzing the loadings or, since measurement noise was assumed to be uncorrelated with process noise, by using eq 38 to estimate the process noise variance:

σmeasured² = σprocess² + σinstrument²    (38)

Results from FA are compared with other results in Table 5.

5.3. Variance Estimation Results: Bayesian Estimation with FA Initial Values. While FA results were suboptimal, they
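For obtaining FA-style initial values, one standard option (an illustration, not necessarily the FA routine used by the authors) is principal-factor iteration on the sample covariance; the diagonal part it returns can seed ΣY:

```python
import numpy as np

def principal_factor(S, k, n_iter=500):
    """Split a sample covariance S into a rank-k common part L @ L.T and a
    diagonal unique part diag(psi) by principal-factor iteration.
    k = 3 here, following the PCA scree; psi serves as the initial guess
    for the instrument noise variances in the EM step."""
    psi = 0.5 * np.diag(S)                    # crude starting uniquenesses
    for _ in range(n_iter):
        vals, vecs = np.linalg.eigh(S - np.diag(psi))
        top = np.argsort(vals)[::-1][:k]      # k largest eigenvalues
        L = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        psi = np.clip(np.diag(S) - np.sum(L**2, axis=1), 1e-8, None)
    return L, psi
```

As the text notes, such unconstrained FA can misattribute the variance of weakly correlated instruments (y3, y6); the EM step with the mass-balance structure corrects this.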


Table 6. Estimated Bias from Bayesian Estimation (Zero Bias Initial)

variable  simulated  EM     σβ     percentile error
β1        0          15.2   122    10.5
β2        30         58.5   58.8   37.2
β3        0          29.9   76.8   30.3
β4        300        235    50.6   79.8
β5        80         104    43.4   42.7
β6        0.2        0.223  0.199  9.02
β7        0          4.58   33.3   10.9

Table 7. Estimated Bias from Bayesian Estimation (Modified Bias Initial)

variable  simulated  EM     σβ     percentile error
β1        0          15.7   118    10.6
β2        30         58.7   57.0   38.5
β3        0          7.01   74.3   7.52
β4        300        275    49.0   38.8
β5        80         96.5   41.9   30.5
β6        0.2        0.211  0.199  4.18
β7        0          4.38   33.2   10.8

were found to be an effective starting point for Bayesian estimation by means of the EM algorithm. Bayesian estimation did not bring a dramatic improvement in variance estimation for most instrument measurements, when compared to the FA initial values. However, it significantly improved variance estimates for difficult-to-estimate measurements such as y3 and y6, leading to a much more reliable estimation overall. A summary of all results is shown in Table 5.

5.4. Bias Estimation: Bayesian Estimation Results. After the variance parameters were estimated, Bayesian estimation was used to estimate gross errors from bias using the procedure outlined in eqs 32–35. A summary of results is given in Table 6. Mean and standard deviation values are given, and estimated error percentiles were calculated according to the normal distribution. Since the initial bias guess was zero, the bias estimates were somewhat conservative. Because β4 was significant, dataset Y4 was corrected with the estimated bias, and the estimation procedure outlined in eqs 32–35 was performed again. The estimates shown in Table 7 give improved bias estimation results, since the state estimate was less contaminated.
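The bias significance test described in section 4 (normal approximation, risk level R = 0.1) can be sketched as follows; the function name and interface are illustrative:

```python
import math

def bias_is_significant(beta_hat, sigma_beta, risk=0.1):
    """Two-sided test of H0: beta = 0 using the standard normal
    distribution, as justified in the text by the large sample sizes.
    Returns (reject, p_value)."""
    z = abs(beta_hat) / sigma_beta
    # two-sided p-value from the standard normal CDF via erf
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))
    return p < risk, p
```

With the Table 6 values, β4 = 235 with σβ = 50.6 is flagged as a gross error, while β1 = 15.2 with σβ = 122 is not.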

6. SPECIAL CASE: ESTIMATING CAPACITY WITH NO SCALING
6.1. Time Dependency as Information. There is one additional challenge that must be overcome before FA and Bayesian estimation can be used for industrial data. The hopper level indicator gave a percent capacity reading, but the maximum capacity was not known. Therefore, the instrument readings could not be easily used to indicate the change in hopper mass, which is a necessary hidden state to complete the mass balance. The aim is to obtain a hopper scalar (cH) that would enable the reliable estimation of the real surge hopper mass accumulation (x3) using surge hopper level readings (y3):

x3 = cH y3    (39)

Returning to the industrial example in Figure 1, consider a subsystem that focuses on WI 1, LI 1, and WI 2, where an intermediate holding vessel is placed between two streams; for this subsystem, the observation vector Y is redefined as follows:

y1(i) = LI 1(ti + Δt) − LI 1(ti)    (40)

y2(i) = ∫[ti, ti+Δt] (WEI 1 − WEI 2) dt    (41)
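Building the windowed observations of eqs 40 and 41 from equally sampled series can be sketched as below (the rectangle-rule integration and the function name are assumptions):

```python
import numpy as np

def windowed_observations(level, w1, w2, dt_steps):
    """level: LI 1 samples; w1, w2: WEI 1 and WEI 2 flow samples on the
    same clock; dt_steps: window length in samples.  y1 is the level
    difference over each window (eq 40); y2 sums the flow difference over
    the same window, a rectangle-rule stand-in for the integral of eq 41."""
    level = np.asarray(level, float)
    diff = np.asarray(w1, float) - np.asarray(w2, float)
    y1 = level[dt_steps:] - level[:-dt_steps]
    csum = np.concatenate([[0.0], np.cumsum(diff)])   # running integral
    y2 = (csum[dt_steps:] - csum[:-dt_steps])[: len(y1)]
    return y1, y2
```

Sectioning the same series with different values of dt_steps yields the family of sample covariance matrices S(Δt) used in the regression below.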

Note that, unlike eq 1, eq 40 does not include cH in the observation vector terms, because it is unknown. Traditionally, cross-calibration can be done by a comparison of means for flow

measurements; however, proper cross-calibration is difficult for capacity change units, because they have zero mean. Furthermore, if the process variance is much less than the measurement variance (as in a steady-state process), instrument measurement correlation is very low, making calibration by direct regression unfeasible. The corresponding factor model (similar to eqs 3 and 8) is shown in eq 42:

x = Lx(Δt) n

[y1]   [C1]       [ε1    ]
[y2] = [C2] x  +  [ε2(Δt)]    (42)

where the subscript (Δt) indicates that the corresponding variable is dependent on Δt, x is a single hidden state (the real surge hopper accumulation), and n is standard Gaussian noise with a mean of 0 and a variance of 1. y1 represents the difference in surge hopper readings, while y2 is the summed difference between weightometer readings. C is the observation matrix, and Lx(Δt) is the loading on the process noise (which is equal to the standard deviation of x) and is dependent on Δt; ε2(Δt) is the measurement noise from the weightometer difference and is also a function of Δt, while ε1, the hopper level noise, is constant for all values of Δt. The expected covariance structure from this model is represented below:

         [σ1² + C1² Lx(Δt)²    C1 C2 Lx(Δt)²       ]
cov(y) = [                                         ]    (43)
         [C1 C2 Lx(Δt)²        σ2(Δt)² + C2² Lx(Δt)²]

The covariance at one time interval specifies three combinations of parameters, but there are five unknown parameters (Lx(Δt), C1, C2, σ1², and σ2(Δt)²). Thus, a solution from a straightforward mass-balance factor model is unavailable. Nevertheless, because some parameters vary with Δt while others do not, it is possible to use this information to perform a regression analysis.

Before proceeding, some parameters can be specified. Continuous flow measurements give direct units; thus, C2 has a value of 1 by definition, while C1 is equal to cH⁻¹, since C1 converts x values to y and cH converts y values to x. When making the appropriate substitutions for C1 and C2, eq 43 reduces to eq 44. Let S(Δt) be the sample covariance matrix of instrument data, sectioned off according to a given sampling time Δt; its expected value, according to this model, is

           [σ1² + cH⁻² Lx(Δt)²    cH⁻¹ Lx(Δt)²     ]
E(S(Δt)) = [                                       ]    (44)
           [cH⁻¹ Lx(Δt)²          σ2(Δt)² + Lx(Δt)²]

Sample covariance matrices are directly available for chosen values of Δt; because of this, it is possible to exploit this model

dx.doi.org/10.1021/ie101770p |Ind. Eng. Chem. Res. 2011, 50, 6229–6239

Industrial & Engineering Chemistry Research

ARTICLE

Figure 8. Regression analysis for noise: (a) hopper noise correlation and (b) weightometer noise correlation.

Figure 9. Regression focusing on σ1² estimation: (a) hopper noise correlation and (b) σ1² (given cH) correlation.

structure and perform regression for covariance values over different values of Δt. Since a 2 × 2 covariance matrix has three unique entries, there are three possible regression equations:

S11(Δt) ≈ σ1² + cH⁻² Lx(Δt)²    (45)

S12(Δt) ≈ cH⁻¹ Lx(Δt)²    (46)

S22(Δt) ≈ σ2(Δt)² + Lx(Δt)²    (47)

where Sij(Δt) is the entry corresponding to the ith row and jth column of the sample covariance matrix S(Δt); recall that the subscript (Δt) denotes that the variable is a function of Δt. Direct values for Lx(Δt)² are not available; however, eq 46 indicates that values for cH⁻¹ Lx(Δt)² can be obtained directly from the sample covariance matrix. By substitution, the relationships shown in eqs 45–47 can be reduced to the models shown in eqs 48 and 49:

S11(Δt) ≈ σ1² + cH⁻¹ S12(Δt)    (48)

S22(Δt) ≈ σ2(Δt)² + cH S12(Δt)    (49)
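The regression of eq 48 can be sketched as an ordinary least-squares fit of S11(Δt) against S12(Δt); cH is the reciprocal of the fitted slope, and σ1² is the intercept (the function name is illustrative):

```python
import numpy as np

def fit_hopper_scalar(s11, s12):
    """Least-squares fit of eq 48, S11 = sigma1^2 + (1/cH) * S12, over
    covariance entries computed at several window lengths dt.
    Returns (cH, sigma1_sq)."""
    s11 = np.asarray(s11, float)
    s12 = np.asarray(s12, float)
    A = np.column_stack([np.ones_like(s12), s12])   # [intercept, slope] design
    (sigma1_sq, slope), *_ = np.linalg.lstsq(A, s11, rcond=None)
    return 1.0 / slope, sigma1_sq
```

Because σ1² and cH are both independent of Δt, every window length contributes a valid point to the same line, which is what makes eq 48 the preferred regression.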

The model shown in eq 48 can be used directly for regression, since the surge level indicator variance (σ1²) and the surge hopper scalar (cH) are independent of Δt; however, in eq 49, the weightometer variance (σ2(Δt)²) is an unknown function of Δt. If σ2(Δt)² and Lx(Δt)² are not independent, regression is not possible. Thus, because of direct knowledge of time independence, eq 48 is much more suitable for regression analysis.

6.2. Regression Results Using Industrial Process Data. Using industrial data, the first task was to perform regression analysis on eq 48 to get an estimate of cH. Figure 8a displays the data for linear regression and the parameter estimation results for cH as well as σ1². Both figures pertain to time intervals between 4 min and 30 min. Figure 8a shows a very clear linear trend, which results in a precise value of the cH parameter, which is the inverse of the slope. Since the σ1² parameter is somewhat insignificant at this scale, confidence intervals were wider. For this time-interval region, it was also found in Figure 8b that the weightometer variance σ2(Δt)² was linear with respect to Δt². Because the level indicator balances have instrument noise variance that is independent of time, the level indicator instrument variance σ1² can be estimated as the limit of the trend as Δt → 0, i.e., the y-intercept. To obtain a more precise estimate of this parameter, data points were generated by

Figure 10. QQ plot for variance estimation data (300 data points).

sectioning the data into even smaller time windows, as σ1² becomes more significant in this region. Data were generated for time intervals of 2–10 min. (See Figure 9.) Confidence intervals for σ1² were much tighter, and marginal plots of σ1² showed little trend activity. Now that this method has been verified to determine the surge hopper scalar accurately, FA and Bayesian estimation can be performed on industrial data.

7. INDUSTRIAL APPLICATION
For the industrial case, the variance and bias of instrument measurements were estimated in the same manner as in the simulation. When performing process and instrument noise variance estimation, special attention was given to obtaining a dataset wherein measurements followed a multivariate normal distribution. The degree of multivariate normality can be represented by a QQ plot (shown in Figure 10). If the QQ plot can be considered linear, it is likely that the data are multivariate normal; one can use the coefficient rQ as a quantitative indicator of linear correlation. In this case, the rQ coefficient obtained from 300 data points is 0.9942, which is greater than the 0.9935 threshold for a significance of R = 0.1 but less than the 0.9953 threshold for a significance of R = 0.05.12 Note that multivariate normality is only required in the instrument and process variance estimation step shown in eq 32. The larger independent dataset (used in the steps represented by eqs 33–35) does not need to be multivariate normal, and it generally will not be if the bias is not constant. Nevertheless, steady state is still a requirement for these steps. Data indicating


Table 8. Estimated Variance from Bayesian Estimation (FA Initial)

variable  FA     EM with FA
x1        386    246
x2        40.8   55.8
x3        423    329
y1        364    429
y2        151    179
y3        153    207
y4        51     104
y5        11.2   12.2
y6        0.011  0.102
y7        9.75   18.1

Table 9. Industrial Estimated Bias from Bayesian Estimation

variable  σβ prior  μβ     σβ posterior  percentile
β1        250       47.6   237           15.9
β2        260       80.9   242           26.2
β3        2.17      0.01   2.1           0.27
β4        357       306    80.6          100.0
β5        22.6      15.2   14.6          70.3
β6        0.18      0.16   0.13          78.8
β7        19.7      5.22   14.4          28.4

steady-state operation are not included in this paper, for proprietary reasons. After verifying steady state and multivariate normality, variance estimation was performed. Summaries of the resulting measurement noise variances are shown in Table 8. One observation was that the WENCO variance was quite substantial. After the variance parameters were estimated, bias means and variances were obtained in the same manner as previously, using Bayesian estimation; the results are shown in Table 9. σβ prior represents the root-mean-square (rms) error of each set of instrument measurements, which includes both random and systematic error; the rms error is used to adjust the instrument measurement weight for the first state estimate. μβ is the estimated mean gross error, while σβ posterior represents the random error of each instrument after the state is estimated. Percentile confidence results are given in the final column. The major concern from the industrial standpoint was the second weightometer Y4, which was reported to have biases of up to +30%. One can see from the results that the corresponding bias β4 is substantial, with a confidence of 99.997%. This result has been confirmed by engineers within the industry.
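The rQ coefficient used in the QQ-plot check of section 7 can be computed for a sample as below; the (i − 0.5)/n plotting positions are an assumed convention:

```python
import numpy as np
from statistics import NormalDist

def r_q(sample):
    """Correlation between the ordered sample and standard-normal
    quantiles at plotting positions (i - 0.5)/n.  Values near 1 support
    normality and are compared against tabulated thresholds (e.g., the
    0.9935 threshold at alpha = 0.1 for roughly 300 points)."""
    x = np.sort(np.asarray(sample, float))
    n = len(x)
    q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return float(np.corrcoef(x, q)[0, 1])
```

For the multivariate check, the same idea is applied to a derived univariate statistic; a roughly normal sample yields rQ close to 1, while a skewed sample yields a visibly smaller value.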

8. CONCLUSIONS
While Bayesian estimation is capable of optimizing parameters in a way that is consistent with both mass balances and measurement noise covariance, this method comes at the price of requiring an initial value. However, factor analysis (FA) is capable of providing effective initial values. When FA is used to initialize Bayesian estimation, results tend to converge well, since Bayesian estimation can force mass-balance consistency onto the estimates. Results for this method were consistent both in simulation and in practice.

When trying to obtain adequate scaling for capacity, the mass-balance factor model is adequate for specifying a regression model with respect to the time window. The independence of the capacity measurement variance with respect to Δt allowed the use of linear regression analysis to obtain an estimate of the scaling factor. Such an approach solves the problem of implementing the Bayesian estimation approach when intermediate holding measurements are not properly scaled.

AUTHOR INFORMATION
Corresponding Author
*Tel.: +1-780-492-9016. Fax: +1-780-492-2881. E-mail: bhuang@ualberta.ca.

REFERENCES
(1) Tong, H.; Crowe, C. M. AIChE J. 1995, 41, 1712–1722.
(2) Reilly, P.; Carpani, R. E. Application of Statistical Theory of Adjustments to Material Balances. Presented at the 13th Canadian Chemical Engineering Conference, Montreal, Quebec, Canada, 1963.
(3) Mah, R. S. H.; Tamhane, A. C. AIChE J. 1982, 28, 828–830.
(4) Crowe, C. M.; Campos, Y. A. G.; Hrymak, A. AIChE J. 1983, 29, 881–888.
(5) Ozyurt, D.; Pike, R. Comput. Chem. Eng. 2004, 28, 381–402.
(6) Schladt, M.; Hu, B. Chem. Eng. Process. 2007, 46, 1107–1115.
(7) Pan, X.; Metzner, G.; Selby, N.; Visser, C.; Letwaba, T.; Coleman, K. J. S. Afr. Inst. Min. Metall. 2003, 38, 291–296.
(8) Bolstad, W. Introduction to Bayesian Statistics, 1st ed.; John Wiley and Sons: New York, 2004.
(9) Korb, K. B.; Nicholson, A. E. Bayesian Artificial Intelligence, 1st ed.; Chapman & Hall/CRC: Boca Raton, FL, 2004.
(10) Dempster, A.; Laird, N.; Rubin, D. J. R. Stat. Soc., Ser. B 1977, 39, 1–38.
(11) Murphy, K. Bayesian Network Tool Box and Manual; 2007. (Available via the Internet at http://code.google.com/p/bnt/.)
(12) Johnson, R. A.; Wichern, D. W. Applied Multivariate Statistical Analysis, 6th ed.; Prentice Hall: Englewood Cliffs, NJ, 2008.