Statistical Monitoring of Wastewater Treatment Plants Using

using double-weighted local outlier factor and its application to nonlinear process monitoring. Xiaogang Deng , Lei Wang. ISA Transactions 2018 72...
0 downloads 0 Views 5MB Size
Article pubs.acs.org/IECR

Statistical Monitoring of Wastewater Treatment Plants Using Variational Bayesian PCA Yiqi Liu,† Yongping Pan,‡ Zonghai Sun,† and Daoping Huang*,† †

School of Automation Science & Engineering, South China University of Technology, Wushang Road, Guang Zhou 510640, China Department of Biomedical Engineering, National University of Singapore, Singapore Medical Drive, 117575, Singapore



ABSTRACT: Multivariate statistical projection methods such as principal component analysis (PCA) are the most common strategy for process monitoring in wastewater treatment plants (WWTPs). Such monitoring strategies can indeed recognize faults and achieve better control performance for fully observed data sets but can be more difficult in the case of having missing data. This study presents a variational Bayesian PCA (VBPCA) based methodology for fault detection in the WWTPs. This methodology not only is robust against missing data but also reconstructs missing data. Furthermore, a novel historical data preprocess method is proposed to deal with diurnal behaviors in the WWTPs with fast sample rate. These methodologies have been validated by process data collected from two WWTPs with different process characteristics and different sample rates. The results showed that the proposed methodology is capable of detecting sensor faults and process faults with good accuracy under different scenarios (highly and lowly instrumented WWTP).

1. INTRODUCTION Due to the increasing concern about the protection of the environment and the stringent effluent quality requirements imposed by law, there is a strong need to monitor wastewater treatment plants (WWTPs) in order to detect early and identify the root cause of a fault (fault diagnosis) or the abnormality that might negatively affect the process. With online sensors available in the WWTPs, the huge amount of correlated data requires proper techniques to extract useful information.1 Multivariate statistical projection methods, such as principal component analysis, partial least-squares (PLS), and independent component analysis (ICA), provide an alternative to make data more comprehensible by extracting relevant information.2−6 These methods utilize data directly collected from the process to build an empirical model, which serves as a reference of the desired process behavior and against which new data can be compared. Also, they provide graphical tools easy to apply and interpret by process operators. Recent applications of these approaches and their modifications to wastewater treatment operation include continuous and batch processes.7−11 However, municipal WWTPs are exposed to disturbances such as heavy rains and snow. These variations and uncertainties in a WWTP often make the fault detection model unreliable. A potential way to solve these problems is to involve probabilistic strategies in the PCA model.12,13 These probabilistic strategies often attempted to identify parameters through the least-squares method for optimization by assuming they follow a Gaussian distribution.14 However, parameter identification using the least-squares method for optimization potentially results in an overfitting problem. It is, therefore, necessary to offer a reliable parameter learning methodology to solve this problem. Another issue we need to pay more attention to is that, due to the demanding conditions in the biological WWTPs, missing measurements and irregularly sampled data are commonly experienced mainly because of hardware sensor failure or routine maintenance, data © 2014 American Chemical Society

acquisition, or system malfunction. These often distort the covariance matrix of PCA even resorting to regular probabilistic PCA. To address this issue, Qin et al. proposed a variable reconstruction based PCA fault detection method,15 which dealt with fault detection and missing data reconstruction simultaneously. However, this work is premised on the assumption that the dimensions of missing data are not more than two; in other words, only one sensor failure is permitted every time step. To recover missing data in the multivariate data sets, Arteaga and Ferrer gave a detailed analysis, showing that three different techniques, two projection based techniques and a third regression based one, could be used to handle this problem.16 Even though the regression based technique prevailed in the past few years,10,17 projection based techniques still have potential to be explored. Therefore, we extended VBPCA for fault detection and diagnosis. It can cope with the overfitting problem by penalizing parameter values which correspond to more complex explanations of the data. Furthermore, using Bayesian learning to identify PCA parameters, missing data can be reconstructed smoothly. Additionally, it offered a more natural way to combine squared prediction error (SPE) and T2 index different from the method proposed by Qin.18,19 A probability given by this VB learning for the fault detection was generated as well; uncertainty information for unknown quantities, therefore, can be quantified by doing so and further used to detect unreliable results. Additionally, in a municipal WWTP, the process data often has typical diurnal and seasonal trends, e.g., in influent flow rate, concentrations, and temperature. Also, differences in the wastewater composition between weekdays and weekends exist. Received: Revised: Accepted: Published: 3272

November 8, 2013 January 16, 2014 January 17, 2014 January 17, 2014 dx.doi.org/10.1021/ie403788v | Ind. Eng. Chem. Res. 2014, 53, 3272−3282

Industrial & Engineering Chemistry Research

Article

These characteristics often result in different patterns for a highly and poor instrumented WWTP without paying enough attention by authors,20,21 only being simply treated in a unified manner when using the fault detection method. For a large and good equipped WWTP, it is common to collect data from sensors in a 15 min or faster sample rate. This means lots of dynamic behaviors and uncertainties tend to influence the data set significantly and the corresponding wastewater influent variations along with the daily urban life are necessary to be taken into account, whereas, since lots of fast online sensors are unavailable and due to cost savings requirements for small WWTPs in the rural areas, the sample rate for poor instrumented WWTP is 1 day commonly, thus leading to fewer dynamic behaviors or even none. Therefore, unlike the traditional unified method used for data preprocessing in WWTPs, we proposed a different data preprocess method for typical highly and lowly equipped wastewater treatment plants. The remaining sections of this paper are organized as follows. Section 2 gives some basic theories on variational Bayesian learning PCA and its extension to fault detection. Section 3 presents the performance of the VBPCA based fault detection method through a wastewater treatment process simulation on the BSM1 platform and a real wastewater process. Finally, the work is concluded.

SPE lim

where θi = for i = 1, 2, and 3, h0 = 1 − (2θ1θ3/3θ22), and cα is the normal deviate corresponding to the upper (1 − α) percentile.

3. VBPCA BASED PROCESS MONITORING The motivation behind VBPCA is to find the most probable parameter set θ = {W, vx} in the model structure: where x ∈ R and W ∈ R denotes the loading matrix. Both the principal components t and the noise e are assumed normally distributed: d×n

(4)

k(n − 1) F(k , n − 1, α) n−k

(9)

(10) t|x

t|x

N (t |x: MPTx , vxM )

T

−1

T

(11)

The probability density of e|x is derived from eqs 6−9. According to the conditional Gaussian property, p(e|x) = N(e|x: μe|x, Σe|x), where μe|x = (I − WMWT)x and Σe|x = vxWMWT; thus, N (e|x: (I − WMWT )x , vxWMWT )

(12)

It is obvious that the observations can be decomposed into the systematic part and the noise part for further fault diagnosis: x = WE[t |x] + E[e|x] = (WMWT )x + (I − WMWT )x (13)

where x = E[x|x]. 3.1. Monitoring the Systematic Part. In the monitoring process, t can be substituted by its estimate t = E[t|x] = MWTx, since the latent variables cannot be measured directly. Given that sample x satisfies eq 14, the process is considered as in the control with α, and the corresponding probability of t is given by eq 15.13

where Λk is a diagonal matrix constructed by the first k eigenvalues of Σ. The control limit of T2 is calculated by Tlim 2 =

p(e) = N (e: 0, vxI )

where p(t|x) = N(t|x: μ , Σ ), μ = (W W + vxI) W x, and Σt|x = vx(WTW + vxI)−1. Let M = vx(WTW + vxI)−1. We can reformulate it as

where xnew = Wkt. Two statistical variables T2 and SPE can be calculated by t and r, respectively:

SPE = r Tr

(8)

t|x

(2)

(3)

p(t ) = N (t : 0, I )

N (x: 0, WWT + vxI )

(1)

T 2 = t Λk −1t T

d×d

where vx is the noise variance. PCA is actually a special case of the VBPCA model restricted to vx. The parameter set θ can be estimated by the VB learning, as illustrated in the following section. The core in VBPCA is the evaluation of the probability densities of all variables in the model: p(t), p(e), and p(x) and also posteriors p(t|x) and p(e|x). Since p(tj) = N(tj: 0, I) and p(ej) = N(ej: 0, vxI) are already assumed in the VBPCA model and the remaining variables are p(x), p(t|x), and p(e|x), it is obvious that p(x) is subject to normal distribution:

Denote λj (j = 1, ..., n) as the eigenvalues of the matrix Σ that are arranged in descending order to determine the principal components (PCs), and their corresponding eigenvectors are the principal component loadings Wj. If the first k PCs are selected, the resulting residual is defined as r = (I − WkTWk)xnew

(7)

x = Wt + e

T

XX n−1

(6)

∑nj=k+1 λij

2. PRINCIPAL COMPONENT ANALYSIS Modern industrial processes often present huge amounts of process data due to the large number of frequently measured variables. One of the most widely used methods to deal with this problem in industries is PCA. PCA is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components (PCs). Due to its sensitivity to the relative scaling of the original variables, PCA is wildly used for fault detection and diagnosis in industries.22 Let n scaled observations generate a data matrix X ∈ Rd×n. Its corresponding covariance matrix of X is defined as23 Σ=

⎡ ⎤1/ h0 cα 2θ2h0 2 θ2h0(h0 − 1) ⎥ ⎢ = θ1 1 + + ⎢ ⎥ θ1 θ12 ⎣ ⎦

∥t∥2 ≤ χ1 − α ; c −2 (5)

(14)

p(t ) = (2π )(−0.5c) exp( −0.5 ∥t∥2 )

where F(k, n − 1, α) is an F distribution with degrees of freedom k and n − 1 with significance level a and the control limit of SPE is obtained by

(15)

2

Equation 14 is a T test actually, representing systematic information in the data set. 3273

dx.doi.org/10.1021/ie403788v | Ind. Eng. Chem. Res. 2014, 53, 3272−3282

Industrial & Engineering Chemistry Research

Article

W:k (scalar for each k). In this case, the model parameters θ = (W, t, m) are treated as hidden variables. Application of the EM algorithm to this problem requires computation of the posterior p(θ|x, e) in the E-step, which is generally intractable. The variational view of the EM algorithm justifies replacing the actual posterior p(θ|x, e) with a simpler probability density function (pdf) q(θ), which is selected to have a tractable form. Therefore, using the variational approach,24,25 the E-step is modified to update the approximation q(θ) so as to minimize the cost function

3.2. Monitoring the Noise Part. Since the squared Mahalanobis norm of e, ∥vx−0.5e∥2, follows χ(d)2, we can substitute undetectable e by e = E[e|x] = (I − WvxWT)x. Therefore, the in-model test can be given by eq 16. The happening probability density of e from the model is obtained by eq 17.13 ∥vx −0.5e∥2 ≤ χ(1 − α , d)−2

(16)

p(e) = (2πvx −0.5d exp( −0.5vx −1 ∥e∥2 )

(17)

Equation 16 is comparable to the SPE test expressed, representing noise information in the data set. 3.3. Monitoring of the Measurement Variable. In the VBPCA model, the covariance matrix of x is expressed by W and vx, i.e., N(x: 0, WWT + vxI). And the squared Mahalanobis norm of x follows a χ(d)2 distribution. Thus, the process distribution and corresponding probability of the given sample are shown as eqs 18 and 19. ∥(WWT + vxI )−0.5 x∥2 ≤ χ(1 − α ; d)−2

c(q) =

(18)

(19)

Let us define the combination of the noise and systematic parts as a combined index. In practice, a single monitoring index has been proposed before by Qin et al.,18 i.e., (SPE(x)/ δ2) + (T2(x)/χ2). The drawback of this method is that they need to compute SPE and T2 separately and do not take into account uncertainties in the fault detection process. However, the combined index proposed is obtained by calculating the probabilistic distribution of eq 7 directly. To obtain the proper PC number, we denote the systematic part of the ith variable in x as xsi . Then, the variance explanation ratio of xi is defined by ri = diag(WWT)i ∗ [diag(WWT)i + vx]−1, where diag(.)i and diag(.)i−1 denote the ith diagonal element and its inverse, respectively. Because Δri = ri(l + 1) − ri(l) ≥ 0 and l ∈ (dim(x) − 1), the PC number l can be decided by calculating max({ri(l)i∈d} ≤ ϵ). 3.4. Variational Bayesian PCA. All the statistical decisions in the monitoring index 14−19 are totally dependent on the adequate estimate of the model parameters, W and vx. The traditional probabilistic PCA parameter identification always results in an overfitting problem due to using the least-squares method for optimization.14 A common way to cope with this problem is to penalize parameter values which correspond to more complex explanations of the data. A natural regularization in PCA is using penalization of large values in matrices W and t. In the Bayesian formulation, this is equivalent to introducing a prior over the model parameters. To approximate the averaging value as well, the model in eq 7 can be reformulated as

x = Wt + m + e

∏ N(W:k: 0, vw ,kI ) k=1

∏ q(mi) ∏ q(wi) ∏ q(t j) (24)

j=1

i=1

i=1

Then, the cost function can be minimized alternately w.r.t. one factor, q(θi), in eq 24 while keeping the other ones fixed. Because we use conjugate priors in eqs 21 and 22, it is possible to find optimal pdf’s q(θi) on each step: The optimal q(mi), q(wi), and q(tj) are Gaussian, and their update boils down to reestimation of the corresponding means and covariance matrices. The mean parameters W, X, and m of the approximating pdf’s can be used as estimates of the corresponding model parameters. The cost function of VBPCA Cvb =

q(θ )

∫ q(θ) log p(X , θ|v , v x

d

+

∑ i=1 n

+

(20)

∑ j=1

+

∑ ij ∈ O

(21)

w , kvm)



d 1 T m m + log 2πvm 2wm 2

=

1 T 1 Wi diag(vw , k −1)Pi + 2 2

c

∑ 2πvw ,k k=1

c 1 t t j t j + log 2π 2 2 1 1 (xij − mi − wiT t j)2 + log 2πvx 2vx 2

+ log(q(θ ))

c

p(W ) =

n

d

d

q(θ ) =

This equation can be complemented with Gaussian priors over the elements of vector m and matrix W:

p(m) = N (m : 0, vmI )

(23)

Thus, the cost function is a function of variational parameters θq, the parameters of the approximating pdf q. On the M-step, the approximation q(θ: θq) can be used as if it was the actual posterior p(θ|x, e). The VB learning approach to PCA can be implemented by minimization of eq 24 w.r.t. the approximating pdf q(θ) and e. The complexity of the cost function 24 depends on the form of the approximating pdf q(θ). A computationally convenient form for the PCA model is

p(x) = (2π )−0.5d det[(WWT + vxI )−1]−0.5 exp( −0.5xT (WWT + vxI )−1x)

q(θ )

∫ q(θ) log p(x , θ|e) dθ

(25)

is minimized w.r.t. the approximating pdf q(θ) and model hyperparameters vx, vw, and vm. Here we use the notation θ = (W, t, m) and ⟨f(θ)⟩ = ∫ f(θ) q(θ) d(θ). To allow for a better optimization procedure, q is used in the following form as eq 26

(22)

where m represents the averaging value, vm is the prior variance for elements of m, vw,k is the prior variance for the elements of 3274

dx.doi.org/10.1021/ie403788v | Ind. Eng. Chem. Res. 2014, 53, 3272−3282

Industrial & Engineering Chemistry Research

Article

Figure 1. The schematic of the wastewater plant for BSM1. d

q(W , T , m) =

d

After setting initial values of the maximum number of iterations and the termination by rms training error, we can obtain all values for VBPCA by calculating eqs 27−35 iteratively until the minimum training error. In order to use eqs 7 and 14−19 process monitoring, we just set W = W̅ and m = m̅ and vx is obtained after the identification process convergence. VB learning is the sensitive posterior probability mass rather than the posterior probability density, thus making it more resistant against overfitting compared to other estimation methods (least squares, point estimation). Furthermore, since uncertainty information for unknown quantities can be obtained by this VB learning, it is useful to detect unreliable results (for example, unreliable reconstruction of missing data).

∏ N(m: m̅ i , m̃ i) ∏ N(wi: wi̅ , wĩ ) i=1

i=1 n

∏ N(t j: ti̅ , ti)̃ (26)

j=1

The update of the principal components is performed as eqs 27 and 28 tj̃ = vx(vxI +

tj̅ =

∑ [wi̅ ∗ wi̅ T + wĩ ])−1 i ∈ Oj

(27)

1 t j ∑ wi(xij − m̅ i) vx i ∈ O ̅

(28)

j

4. CASE STUDIES 4.1. Benchmark Simulation Model 1 (BSM1). (1). Background. The simulation platform used is the BSM1 developed by the second IWA Respirometry Task Group together with (COST) 682/624 Actions, providing an unbiased benchmarking system for comparing various control strategies without reference to a particular facility (http://www.benchmarkwwtp. org/). For the present study, a relatively simple plant layout is selected in the simulation benchmark (Figure 1). This layout combines nitrification with pre-denitrification, which is most commonly used for nitrogen removal. The plant is designed to treat an average flow of 20 000 m3/day with an average biodegradable chemical oxygen demand (COD) concentration of 300 mg/L. The plant consists of five compartment biological tanks (5999 m3) and a secondary settler (6000 m3). The last three compartments of the bioreactor are aerated, whereas the others are not. Complete mixing is assumed to occur in all compartments. The nonreactive secondary settler is modeled with a series of 10 layers with an area of 1500 m2 and a depth of 4 m. There are two internal recycles: the nitrate internal recycle from the last tank to the first tank and the activated sludge recycle from the underflow of the secondary settler to the front end of the plant. The IAWQ Activated Sludge Model No.1 (ASM1) is chosen to simulate the biological process. This model describes removal of organic matter, nitrification, and denitrification. The feed from the final biological reactor is connected to the sixth layer from the bottom of the secondary settler. The scenario with dry weather conditions is tested in this case study. The simulated BSM1 comprises 28 days of influent data recorded at 15 min intervals. (2). Historical Data Preprocess. In this case, seven important variables were selected for monitoring with lots of missing data at each data column displayed in Table 1.

where (.) is the mean value, (.)͠ is the variance, and j = 1, ..., n. And the update of the bias term and matrix W is implemented using eqs 29−32 vm m̅ i = ∑ [xij − wi̅ T tj̅ ] |Oi|(vm + vx /|Oi|) j ∈ O (29) i

vxvm m̃ i = |Oi|(vm + vx /|Oi|) wĩ = vx(vx diag(vw , k −1) +

(30)

∑ [ tj̅ tj̅ T + tj̃]) j ∈ Oi

1 wi̅ = wĩ vx



(31)

tj̅(xij − m̅ i)

j ∈ Oi

(32)

where O is the set of indices i, j for which xi,j is observed, Oi is the set of indices j for which xi,j is observed, Oj is the set of indices i for which xi,j is observed, |O| is the number of indices in O, i = 1, ..., d and the variance parameters: vx =

1 N

∑ [(xij − wi̅ T tj̅ − m̅ i)2 + m̃ i + wi̅ T tjw̃ i̅ + tj̅ Twtĩ j̃ ij ∈ O

̃ ĩ )] + tr(tjw

vw , k =

vm =

1 d

1 d

(33)

d

∑ (wik̅ 2 + wik̃ ) i=1

(34)

d

∑ (m̅ i 2 + m̃ i) i=1

(35)

where w̃ ik is the kth element on the diagonal of w̃ i. 3275

dx.doi.org/10.1021/ie403788v | Ind. Eng. Chem. Res. 2014, 53, 3272−3282

Industrial & Engineering Chemistry Research

Article

process properly and provide a more accurate model for fault detection and diagnosis. (3). Process Monitoring Validation. On the basis of the dry weather influent loads, we simulated the operation of the WWTP over a period of 4 weeks. The first two weeks were used for model training, while the remaining was for testing. All seven sensors gave correct measured values during the 4 weeks at the beginning. In order to validate the performance of data preprocess, the PCA fault detection model was constructed using the nonmissing data set first. In this model building, three PCs were selected according to the variance contribution. Two fault detection models were built on the basis of the unwhitening (not to subtract typical data) and whitening (to subtract typical data) data set. Figure 4 (left) reports that the PCA based model without subtracting typical data had a lot of faulty alarms for both SPE and T2. However, the PCA based model using whitening data had better performance with only a few points crossing the 98% control limit (Figure 4 (right)). These suggested that, by subtracting typical data, the dynamic behaviors during the wastewater plant data set are capable of being broken up, thereby ensuring a better PCA model for fault detection. In the next section, the effect of missing data was evaluated via the PCA model. 30% percent of the raw data was deleted and further used for PCA model building. In principal, to alleviate the negative influence of missing data, they should be replaced by mean values or median values. As can be seen in Figure 5a,c, due to missing data, the decision is difficult to make. One of the potential ways is to replace missing data with mean values. Figure 5 demonstrated that the PCA model did not perform well in terms of SPE and T2 index, even though missing data were replaced by mean values. One reason why this solution always fails was that the mean values take into account merely the influence from itself and exclude its correlation with other variables. The other was that the uncertainties and noise estimation were not considered properly. Also, due to such an effect, the 98% control limit of SPE obtained from original perfect data (raw data without missing points) decreased from 4.9 to 4.3 (scenario using mean values for replacement), illustrating the covariance matrix (see eq 1) and SPE control limit (see eq 6) of PCA were distorted dramatically. The whitening data samples were used to construct VBPCA models next, which is the same as PCA model building. Two PCs, which contributed a lot to not only detection ability but

Table 1. The Selected Sensors for Validation measurements 1 2 3 4 5 6 7

ammonia concentration in influent (g N m−3) influent flow rate (m3 day−1) dissolved oxygen in the third tank (g COD m−3) dissolved oxygen in the fourth tank (g COD m−3) dissolved oxygen in the fifth tank (g COD m−3) KLa (h−1) nitrate concentration in the fifth tank (g N m−3)

number of missing variables 201 47 48 214 134 34 168

Traditional PCA always removes incomplete samples or replaces them with mean values or median values, leading to a considerable loss of information and biased monitoring models. Unlike these ad hoc methods, VBPCA provides a more natural way to deal with this problem by VB learning. Model development is initially attempted using raw data. However, as depicted in Figure 2, the analysis of historical data sets revealed that, during dry-weather periods, daily flow and nitrate concentration follow a very similar pattern. In this light, a new approach based on the typical daily data profile was proposed where the error between historical data and typical data is used as the input to VBPCA model. In this respect, the typical data profile under dry weather conditions was generated. This was done by stacking all the historical data in one set (from 0 to 1 day), and from it calculating the average flow at each time step and the 95% confidence intervals. Following this procedure, outliers outside the 95% confidence intervals were discarded and the average was calculated again. Finally, the typical dry weather daily data profile was smoothed using a moving average filter with a window size of 13. Results of flow rate and nitrate concentrations are presented in Figure 2. Analysis of the data sets showed that daily data from Monday to Friday were very similar, whereas a slightly different pattern (shift in amplitude) was observed on the weekend profiles. Therefore, two typical data profiles were generated: one for weekdays and one for weekends. Such a pattern is consistent with the water consumed by inhabitants during everyday life. Results about flow rate are presented in Figure 3a. By subtracting typical data from historical data, we can capture the true variation of the

Figure 2. Flow rate data and nitrate concentration data in the fifth tank: (left) flow rate; (right) nitrate concentration. 3276

dx.doi.org/10.1021/ie403788v | Ind. Eng. Chem. Res. 2014, 53, 3272−3282

Industrial & Engineering Chemistry Research

Article

Figure 3. (a) Typical flow rate profiles for weekdays and weekends. (b) Variation explanation by each PC.

Figure 4. (left) PCA based fault detection with unwhitening preprocess. (right) PCA based fault detection with whitening preprocess.

Then, to access the fault detection ability of VBPCA, two fault cases were generated during the last week on the second sensor tabulated as Table 2. In the first type of fault, normal data was corrupted by a 35% abrupt change from day 11.4479 to 12.4896. Results from Figure 7a suggest that the combined index based fault detection method can detect the fault accurately with its corresponding faulty probability decreasing to zero sharply (Figure 7b). The contribution plot for the combined index can discriminate between the faulty sensor and the normal sensors (Figure 8b). It is also obvious in Figure 8a that the faulty data points deviate from the major pattern due to the negative influence of abrupt changes. To further illustrate the efficiency of VBPCA, the second scenario with drift fault exerting on the second sensor was also simulated, showing in Figure 7c that it achieved a good performance with accurate detection when the drift fault was happening. The probability of the test decreased to zero immediately. The contribution plot for fault isolation was shown in Figure 8d. The result clearly shows that the contribution plot of the VBPCA strategy successfully separated the out of control data from normal operation ones. Similarly, a significant deviation from the normal pattern occurred in the PC1−PC2 plot in Figure 8c. It

also isolation power, were chosen for VBPCA. Even though one PC has been more than enough to account for all the variance in the VPCA model, as depicted in Figure 3b, using two PCs is easier and more efficient for showing the fault detection results in this scenario. Due to the good performance of variational Bayesian learning, the monitoring model was built without necessarily dealing with missing data additionally. The combined index obtained from VBPCA was used to test 2 weeks of data under normal operation conditions first. As depicted in Figure 6a, the VBPCA model performed well with almost all points under the 98% control limit from day 9.3646 to 13.9896. From the engineering view, the data points exceeding a control limit less than or equal to 3 times successively are considered as normal in the fast sample rate scenario. Figure 6b reveals that the probability of failures changed along with the combined index based fault detection method. The SPE and T2 results are shown in Figure 6c,d, demonstrating that the VBPCA model performed well with almost all combined index points under the 98% control limit. This resulted from the fact that the correlation effect from other variables and uncertainties was considered seriously, thus laying a good foundation for missing data reconstruction. 3277

dx.doi.org/10.1021/ie403788v | Ind. Eng. Chem. Res. 2014, 53, 3272−3282

Industrial & Engineering Chemistry Research

Article

Table 2. Faults Generated on the Second Sensor fault scenarios

comments on the fault type

case 1 case 2

35% positive bias (abrupt change fault) 35% positive bias of standard deviation (drift fault)

wastewater is treated in the bioreactor first, where the level of substrate is reduced by the action of the microorganisms. Second, the wastewater flows to a secondary settler for the biomass sledges settlement. Thus, clean water locates at the top of the settler and is carried out of the plant. A fraction of the sludge is returned to the input of the bioreactor in order to maintain an appropriate level of biomass, allowing the oxidation of the organic matter. The remaining sludge is purged. A lot of variables related with the organic matter and microorganisms are measured in the plant, giving a lot of information that is difficult to manage. This plant treats a flow of 35 000 m3/d mainly wastewater. A total of 527 days of data daily recording from an online sensor have been recorded; each consists of 38 process variables. However, only 460 days of data were selected and considered in this study. Of these samples, 400 were utilized for training and 60 samples in the remaining were used to test the VBPCA based fault detection method; explanatory variables X are 22 input variables, which are depicted in Table 3. (2). Process Monitoring Validation. Different from the previous fast sample rate case with strong dynamic characteristic, this typical trend was not observed on the historical data sets of this case, and for the sake of simplicity, the typical flow profile was not considered. Before the model training, 8 data points with storm influence were deleted. The remaining 392 points were kept for model training. In this training set, lots of missing data existing as tabulated in Table 3 could result in an ill-suited covariance matrix, and thus, it could be impossible to further build a traditional PCA model. As such, for comparative purposes, a traditional PCA based fault detection method was presented first. Twelve PCs were selected for the PCA model according to the cumulated contribution of more than 95%. Data from day 401 to 460 were employed as the test set. This test contained lots of missing data. Also, in days 421, 424, and 427, a solid overloading problem occurred, which has a high

Figure 5. PCA based fault detection method performance with missing data and mean values replacement.

is evident VBPCA can achieve a good performance for monitoring WWTP containing lots of missing data. 4.2. A Real Wastewater Treatment Plant. (1). Background. The application considered in this case is an activated sludge WWTP, which is designed for the removal of organic matter and nutrients. In this process, the influent rate and the population of microorganisms (both in quality and number of species) vary over time and process knowledge is very limited. Furthermore, an online analyzer tends to be unavailable due to its sensitivity in weather conditions and seasonal changes. Such complexity and fluctuations often result in the degradation of the online analyzer performance or even failure. As shown in Figure 9, the proposed wastewater plant process26 is comprised of four elements: pretreatment, primary settlers, aeration tankers, and secondary settlers. After the primary settler,

Figure 6. Fault detection based on combined index, SPE, and T2 index and corresponding probability for combined index. 3278

dx.doi.org/10.1021/ie403788v | Ind. Eng. Chem. Res. 2014, 53, 3272−3282

Industrial & Engineering Chemistry Research

Article

Figure 7. Fault detection based on combined index in the abrupt change and drift scenarios.

model not capable of detecting the solid overload problem, except for day 421. To alleviate such an issue, the missing data was replaced by mean values. This enabled the PCA model to have a better performance with more obvious SPE and T2 values (Figure 10b,d). However, the solid overload in day 424 was still not detected accurately. This is potentially because the mean value replacement distorted the covariance matrix and uncertainties were not characterized properly. Rather than the traditional PCA model, VBPCA was employed in the following section. Nine PCs were selected for the VBPCA model according to the cumulated contribution of more than 98%, as shown in Figure 11. The reason why VBPCA can use less PCs to represent more information (12 PCs, 95% in the PCA model) could be the fact that, by estimating mean values and noise variance vm in the data set using VBPCA properly, more identical information could be fused accurately. In days 421, 424, and 427, a solid overloading problem occurred and triggered the detection index over 98% control limit obviously as displayed in Figure 12a,c,e, illustrating that the VBPCA model performed well. However, for day 427, the SPE index missed the fault mainly because the T2 index extracted too much information from SPE. To isolate the faulty

Figure 8. Fault isolation in the abrupt change and drift scenarios.

potential to lead to the wastewater reaching the output of the wastewater plant without being proper treated. It is obvious in Figure 10a,c that, due to the missing data, SPE or T2 cannot be calculated properly by the PCA model. This made the PCA

Figure 9. The wastewater plant. 3279

dx.doi.org/10.1021/ie403788v | Ind. Eng. Chem. Res. 2014, 53, 3272−3282

Industrial & Engineering Chemistry Research

Article

Table 3. The Parameters Selected for Monitoring locations influent to WWTP

primary settlers secondary settlers

effluent

primary settlers (calculated efficiency measures)

secondary settlers (calculated efficiency measures)

overall plant (calculated efficiency measures)

variable comments biological oxygen demand chemical oxygen demand suspended solids volatile supended solids sediments biological oxygen demand pH biological oxygen demand chemical oxygen demand suspended solids sedimentable solids pH chemical oxygen demand sedimentable solids biological oxygen demand suspended solids biological oxygen demand chemical oxygen demand biological oxygen demand chemical oxygen demand suspended solids sedimentable solids

variable number and corresponding values

number of missing variables

1: DBO-E

23

2: DQO-E

6

20: SS-E

1

21: SSV-E

11

22: SED-E 3: DBO-P

25 40

4: PH-D 5: DBO-D

0 28

6: DQO-D

9

7: SS-D

2

8: SED-D

25

9: PH-S 10: DQO-S

1 18

11: SED-S

28

12: RD-DBO-P

62

13: RD-SS-P

4

14: RD-DBO-S

40

15: RD-DQO-S

26

16: RD-DBO-G

36

17: RD-DQO-G

25

18: RD-SS-G 19: RD-SED-G

Figure 10. PCA based fault detection method performance with missing data.

8 31

Figure 11. Principal component selection.

part, the contribution plot in Figure 14a reveals sedimentable solids and suspended solids in the overall plant data and suspended solids in the influent increased dramatically, demonstrating that solids overload happened in this process. Following the first fault, the second fault with 50 abrupt changes from day 448 to 458 happened, which was detected by SPE and combined indexes efficiently however missed by T2. It is important to note that the combined index was the only one that can detect both process faults (solids overload) and sensor fault. Further analysis on the contribution plot showed that the SS sensor value was larger than the other part significantly. This is resulting from the fact that the SS sensor suffered from an abrupt change fault with a 50 range abrupt change. Also, as observed in Figure 13, the first fault data points with solid overload shifted away from the normal feature, whereas the second fault is still consistent with the normal feature. This is because PC1 and PC2 together explained only 64.78% contribution, which is not enough to capture the information for the second fault. Figure 12b,d,f reveals that, other than the recognition of fault information, the VBPCA based fault

detection method can even provide probability information, thus facilitating fault detection uncertainty description. Overall, it can be concluded that the proposed fault detection methodology has more potential to perform well without necessarily deleting data or substituting the missing data by means or medians like traditional PCA. Due to incomplete data, the estimation of the covariance matrix of data becomes difficult and therefore the solution by Eigen-decomposition is infeasible. Furthermore, the convergence to a unique solution cannot be guaranteed even for the simplest PCA. Thus, the relevant probabilistic models are definitely more complex compared to the complete data. For example, the posterior of principal component tj̃ is no longer the same for each sample. In both cases, the maximum number of iterations and the terminated training error are 30 and 10−3, respectively. In fact, the VBPCA model is robust to both parameters and converged to the terminated training error only by 20 steps. The initialized values for other parameters, such as 3280

dx.doi.org/10.1021/ie403788v | Ind. Eng. Chem. Res. 2014, 53, 3272−3282

Industrial & Engineering Chemistry Research

Article

Figure 12. Fault detection using T2, SPE, and combined index when the solids overload problem and the SS sensor fault occurred.

could allow stabilization of the modeling data and make sure the resulting data follow a Gausian distribution. For example, the m values of the data with a fast sample rate are subtracted after the dynamic behavior is broken up by removing typical data profiles, whereas the subtraction of m values in the data with a slow sample rate can be performed directly. The VBPCA monitoring approaches considered in this paper can be extended to other chemical processes due to their popularity of the incomplete data set in many processes. Furthermore, the historical data preprocess is suitable for many other processes, which exhibit a similar and significant dynamic pattern in different periods. Figure 13. PC1−PC2 plot.

5. CONCLUSIONS A methodology to monitor wastewater plants was successfully developed on the basis of VBPCA model, which allows a more effective way to build the PCA model without resulting in an illconditioned matrix on one hand and even providing uncertainty information compared with the traditional PCA based fault detection method on the other. Furthermore, for the WWTP with more obvious dynamic behavior, a typical pattern in daily data was summarized and consequently offered more potential to put into practical WWTP operation. The

W, can be learned by the variational Bayesian learning algorithm automatically by stating from random values. It is important to note that historical data preprocess with typical data profile is suitable for the highly instrumented plant with a fast sample rate. In the contrary, the data with a slow sample rate exhibit slight dynamic behavior (1 day for poor instrumented plant), and subtraction of this typical pattern is unnecessary. For both scenarios, subtraction of m from the data 3281

dx.doi.org/10.1021/ie403788v | Ind. Eng. Chem. Res. 2014, 53, 3272−3282

Industrial & Engineering Chemistry Research

Article

Figure 14. Fault isolation based on the contribution plot for fault 1 and fault 2. (9) Fuente, M. J.; Vega, P.; Zarrop, M.; Poch, M. Fault detection in a real wastewater plant using parameter-estimation techniques. Control Eng. Pract. 1996, 4, 1089. (10) Lieftucht, D.; Völker, M.; Sonntag, C.; Kruger, U.; Irwin, G. W.; Engell, S. Improved fault diagnosis in multivariate systems using regression-based reconstruction. Control Eng. Pract. 2009, 17, 478. (11) Yiqi, L.; Daoping, H.; Zhifu, L. A SEVA soft sensor method based on self-calibration model and uncertainty description algorithm. Chemom. Intell. Lab. Syst. 2013, 126, 38. (12) Ge, Z.; Song, Z. Mixture Bayesian regularization method of PPCA for multimode process monitoring. AIChE J. 2010, 56, 2838. (13) Kim, D.; Lee, I.-B. Process monitoring based on probabilistic PCA. Chemom. Intell. Lab. Syst. 2003, 67, 109. (14) Ilin, A.; Raiko, T. Practical Approaches to Principal Component Analysis in the Presence of Missing Values. J. Mach. Learn. Res. 2010, 11, 1957. (15) Qin, S. J.; Yue, H.; Dunia, R. Self-Validating Inferential Sensors with Application to Air Emission Monitoring. Ind. Eng. Chem. Res. 1997, 36, 1675. (16) Arteaga, F.; Ferrer, A. Dealing with missing data in MSPC: several methods, different interpretations, some examples. J. Chemom. 2002, 16, 408. (17) Lieftucht, D.; Kruger, U.; Irwin, G. W.; Treasure, R. J. Fault reconstruction in linear dynamic systems using multivariate statistics. IEE Proc.: Control Theory Appl. 2006, 153, 437. (18) Qin, S. J. Survey on data-driven industrial process monitoring and diagnosis. Annu. Rev. Control 2012, 36, 220. (19) Pranatyasto, T. N.; Qin, S. J. Sensor validation and process fault diagnosis for FCC units under MPC feedback. Control Eng. Pract. 2001, 9, 877. (20) Corona, F.; Mulas, M.; Haimi, H.; Sundell, L.; Heinonen, M.; Vahala, R. Monitoring nitrate concentrations in the denitrifying postfiltration unit of a municipal wastewater treatment plant. J. Process Control 2013, 23, 158. (21) Corominas, L.; Villez, K.; Aguado, D.; Rieger, L.; Rosén, C.; Vanrolleghem, P. A. Performance evaluation of fault detection methods for wastewater treatment processes. Biotechnol. Bioeng. 2011, 108, 333. (22) Gustafsson, F. Statistical signal processing approaches to fault detection. Annu. Rev. Control 2007, 31, 41. (23) Cheng, C.; Chiu, M.-S. Nonlinear process monitoring using JITL-PCA. Chemom. Intell. Lab. Syst. 2005, 76, 1. (24) Neal, R. M.; Hinton, G. E. A view of the EM algorithm that justifies incremental, sparse, and other variants. Learning in graphical models; MIT Press: Cambridge, MA, 1999; p 355. (25) Wallace, C. S. Classification by minimum-message-length inference. In Advances in Computing and Information  ICCI ’90; Akl, S. G., Fiala, F., Koczkodaj, W. W., Eds.; Springer: Berlin, Heidelberg, 1990; Vol. 468, p 72. (26) Blake, C. L.; Merz, C. J. UCI Repository of machine learning databases; University of California: Irvine, CA, 2007; http://www.ics. uci.edu/∼mlearn/MLRepository.html; Feb 18, 1998.

simulation study results showed that the proposed method can achieve good performance even with lots of missing data in the data set during the BSM1 simulation platform and data set in a real WWTP. This study further demonstrated the potential of performing the fault detection method together with uncertainty information.



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Tel.: +86 13610008387. Fax: +86 20 87114189. Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS



REFERENCES

This work was supported by the Specialized Research Fund for the Doctoral Program of Higher Education of China (20120172110026) and the Fundamental Research Funds for the Central Universities, SCUT (2014ZB0028, 2014ZZ0043). Besides, many thanks should also be given to the reviewers for this paper.

(1) Aguado, D.; Rosen, C. Multivariate statistical monitoring of continuous wastewater treatment plants. Eng. Appl. Artif. Intell. 2008, 21, 1080. (2) Lee, C.; Choi, S. W.; Lee, I. B. Sensor fault identification based on time-lagged PCA in dynamic processes. Chemom. Intell. Lab. Syst. 2004, 70, 165. (3) Wold, S.; Sjöström, M.; Eriksson, L. PLS-regression: a basic tool of chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109. (4) Lee, J.-M.; Yoo, C.; Lee, I.-B. Statistical process monitoring with independent component analysis. J. Process Control 2004, 14, 467. (5) Gang, L.; Alcala, C. F.; Qin, S. J.; Donghua, Z. Generalized Reconstruction-Based Contributions for Output-Relevant Fault Diagnosis With Application to the Tennessee Eastman Process. IEEE Trans. Control Syst. Technol. 2011, 19, 1114. (6) Detroja, K. P.; Gudi, R. D.; Patwardhan, S. C. Plant-wide detection and diagnosis using correspondence analysis. Control Eng. Pract. 2007, 15, 1468. (7) Garcı ́a-Á lvarez, D. Fault detection using Principal Component Analysis (PCA) in a Wastewater Treatment Plant (WWTP), 7th IFAC International Symposium on Fault Detection, Supervision and Safety of Technical Systems, 2011. (8) Padhee, S.; Gupta, N.; Kaur, G. Data Driven Multivariate Technique for Fault Detection of Waste Water Treatment Plant. Int. J. Eng. Adv. Technol. 2012, 1, 45. 3282

dx.doi.org/10.1021/ie403788v | Ind. Eng. Chem. Res. 2014, 53, 3272−3282