Multivariate Online Monitoring of a Full-Scale Biological Anaerobic

AdVanced EnVironmental Biotechnology Research Center, Department of Chemical Engineering/School of. EnVironmental Science and Engineering, Pohang ...
1 downloads 0 Views 440KB Size
Ind. Eng. Chem. Res. 2006, 45, 4335-4344

4335

Multivariate Online Monitoring of a Full-Scale Biological Anaerobic Filter Process Using Kernel-Based Algorithms Dae Sung Lee,*,†,‡ Min Woo Lee,† Seung Han Woo,§ Young-Ju Kim,‡ and Jong Moon Park*,† AdVanced EnVironmental Biotechnology Research Center, Department of Chemical Engineering/School of EnVironmental Science and Engineering, Pohang UniVersity of Science and Technology, San 31, Hyoja-dong, Nam-gu, Pohang, Gyeongbuk 790-784, Republic of Korea, Department of Chemical Engineering, Hanbat National UniVersity, San 16-1, Deokmyeong-dong, Yuseong-gu, Daejeon 305-719, Republic of Korea, and Department of EnVironmental Engineering, Kyungpook National UniVersity, Sankyuk-dong, Buk-gu, Daegu 702-701, Republic of Korea

Multivariate statistical process control such as principal component analysis (PCA) and partial least squares (PLS) has been effectively utilized to analyze large databases accumulated in industrial plants in order to improve process performance and product quality. However, because both PCA and PLS are basically linear methods, nonlinearity in most chemical and biological processes is still a significant problem for their practical applications. Kernel-based algorithms are potentially very efficient for monitoring process disturbances and predicting key quality variables of nonlinear processes by mapping an original input space into a high-dimensional feature space. Nonlinear data structure in the original space is most likely to be linear at the high-dimensional feature space. Kernel-based methods essentially require only linear algebra, making them as simple as linear multivariate projection methods, and can handle a wide range of nonlinearities because of their ability to use different kernel functions. In this work, kernel-based algorithms such as kernel PCA (KPCA) and kernel PLS (KPLS) are applied to a full-scale biological anaerobic filter process treating highstrength organic wastewater in a petrochemical plant. KPCA is applied to detect process disturbances in real-time and KPLS to predict inferentially key process variables in the anaerobic filter process. The proposed kernel-based approaches could effectively capture the nonlinear relationship in the process variables and show far better performance in process monitoring and prediction of quality variables compared to the conventional PCA and PLS methods. Introduction With the advent of improved online sensor technology and automation, industrial processes with well-equipped computerized measurement devices produce large amounts of data. For most industrial processes, first principal models describing a specific plant are too complicated and hard to justify. Therefore, it becomes very important and essential to extract the useful information from the measurement data for improving process performance and product quality. However, this task is a really challenging one because of the complexities of industrial processes originating from the large number of measurement variables, strong interactions among those measurements, missing data, and considerable effects of the control loops. Multivariate statistical process control has been successfully applied for monitoring and modeling of chemical/biological processes.1-5 Principal component analysis (PCA) and partial least squares (PLS) are projection methods for analyzing a historical reference distribution of the measurement trajectories from past successful operations in a reduced latent vector space and comparing the behaviors of new operations to this reference distribution. However, when PCA and PLS are applied to chemical and biological processes, there have been some difficulties in their practical applications. Although they are linear methods in the basic form, most real problems are inherently nonlinear. The minor latent variables from linear PCA * To whom correspondence should be addressed. Tel.: +82-54-2798650.Fax: +82-54-279-8659.E-mail: [email protected],[email protected]. † Pohang University of Science and Technology. ‡ Kyungpook National University. § Hanbat National University.

and PLS models cannot always be discarded because they may not only describe noise or negligible variance/covariance structures in the data. They may encapsulate significant information about the nonlinear nature of the problem.6 A number of methods have been proposed to integrate nonlinear features within both linear PCA and PLS frameworks. A quadratic PLS (QPLS) modeling method was proposed to fit the functional relationship between each pair of latent scores by quadratic regression.7 Nonlinear PCA algorithms describing the relationship between original variables and scores with nonlinear functions such as neural networks were applied to process monitoring.8-10 Neural networks were also incorporated into linear PLS to identify the relationship between the input and the output scores, while retaining the outer mapping framework of the linear PLS algorithm.11,12 In recent years, nonlinear kernel-based algorithms such as kernel PCA (KPCA) and kernel PLS (KPLS) have been proposed.13-15 The basic idea of KPCA and KPLS is first to map each point in an original data space into a feature space via nonlinear mapping and then to develop linear PCA and PLS models in the mapped space. According to Cover’s theorem, nonlinear data structure in the original space is most likely to be linear after high-dimensional nonlinear mapping.16 KPCA and KPLS can efficiently compute latent variables in the feature space by means of integral operators and nonlinear kernel functions. Compared to other nonlinear methods, the main advantage of kernel-based algorithms is that they do not involve nonlinear optimization. They essentially require only linear algebra, making them as simple as the conventional linear PCA and PLS. In addition, because of their ability to use different

10.1021/ie050916k CCC: $33.50 © 2006 American Chemical Society Published on Web 05/04/2006

4336

Ind. Eng. Chem. Res., Vol. 45, No. 12, 2006

Figure 1. Schematic diagram of the anaerobic filter process. Table 1. Process Variables in the Full-Scale Biological Anaerobic Filter Plant variable

symbol

unit

description

predictor (X)

Qin TODin pHin Tin Tequ pHeff QR Qhigh

ton/hr mg/L

Qgas TODt

m3/h mg/L

TODr CT

g/h h

HRT

h

TODeff QCH4

mg/L m3/h

influent mass flow rate influent TOD concentration influent pH influent temperature temperature at equalization tank effluent pH recycle mass flow rate flowrate from the concentrated wastewater tank biogas flow rate TOD flow rate ) (QinTODin + QRTODeff)/(Qin + QR) TOD loading rate ) QinTODin contact time ) volume of the reactor/(Qin + QR) hydraulic retention time ) volume of the reactor/Qin effluent TOD concentration methane gas flow rate

response (Y)

°C °C ton/h m3/h

kernel functions, KPCA and KPLS can handle a wide range of nonlinearities. Anaerobic treatment of high-strength organic wastewater has been recognized as a potential alternative of aerobic treatment because of its several advantages such as lower nutrient requirement, less surplus sludge production, and energy recovery via methane production. However, stable operation of full-scale anaerobic processes is a difficult problem because of the complicated nature of the anaerobic process itself. The slow growth rate of methanogenic bacteria and their high sensitivity to toxic pollutants in wastewater are often regarded as significant obstacles to the anaerobic treatment of industrial wastewater. Online monitoring of anaerobic treatment processes, therefore, becomes very important to enhance the process performance by detecting disturbances leading to abnormal process operation and predicting key process variables in an early stage. The earlier a potential fault can be detected, the less severe its influence will be and the corresponding corrective action will consequently be more constrained.17 In addition, anaerobic processes are highly nonlinear, time-varying, and subject to significant disturbances such as hydraulic changes, composition variations, and equipment defects. Small changes in concentrations or flows can have a large effect on the kinetics of biological reactions, leading to process performance deterioration. In the application investigated here, the nonlinear kernel

methodologies are applied to develop an online monitoring system for a full-scale biological anaerobic filter process. KPCA is developed for monitoring of ongoing process behaviors and KPLS for inferential prediction of key process variables. These two methods are essential for an online process monitoring system of nonlinear biological processes. Like the conventional PCA and PLS approaches, the only information required to develop monitoring charts and to infer the values of product quality is the historical data collected from the past successful operations. The effectiveness of the kernel-based methods are demonstrated by monitoring and modeling capabilities based on their performance characteristics and prediction accuracy compared with other PCA and PLS methods. The primary goal of this study is to give operators and process engineers a reliable and accurate monitoring system that would allow them to arrive at the optimum operational strategy for the anaerobic filter process in real time. Monitoring and Modeling Algorithms Principal Component Analysis. In mathematical terms, PCA relies upon singular value decomposition of the covariance or correlation matrix of the process data. The input data matrix X with m measurements from n process variables can be decomposed into a series of principal components consisting of score vectors t and loadings v, plus residuals E.

Figure 2. Score plot of the training data set. The circles represent an operation data set in time series. The dotted ellipse represents the 99% confidence limit.

Ind. Eng. Chem. Res., Vol. 45, No. 12, 2006 4337

Figure 3. Linear PCA monitoring charts with 99% confidence limits.

Figure 4. Graphical representation of nonlinearity measured for accuracy bounds of four disjunct regions. a

X)

tivi + E ∑ i)1

(1)

The score vectors contain information on how the samples relate to each other. The loading vectors define the reduced dimension space and contain information on how the variables relate to each other. A few principal components can express most of the variability in the data when there is a high degree of correlation among the data [a , min(m, n)]. Here a is chosen such that most of the systematic variability of the process data is described by these principal components and that the residual matrix E is as small as possible in a least-squares sense. When new process data are obtained, they are projected onto the process subspace and noise space, respectively. Monitoring charts are generally based on the Q and D statistics in which control limits are used to determine whether the process is in control or not. The assumption behind these approximate confidence limits is that the underlying process exhibits a multivariate normal distribution with a population mean zero. This is to be expected because any linear combination of random variables, according to the Central Limit Theorem, should tend toward a normal distribution. The Q statistic is a measure of the lack of fit with the established model. For new observation i, Qi is calculated as follows:

Qi ) eieiT ∼ gχ2(d)

(2)

Figure 5. Normal probability plot of the first scores from PCA models.

where ei is an element of E. Qi indicates the distance between

the actual values of the new observation and the projected values

4338

Ind. Eng. Chem. Res., Vol. 45, No. 12, 2006

Figure 6. KPCA monitoring charts with 99% confidence limits.

onto the reduced space. The distribution of the calculated Qistatistic values can be approximated by a χ2 distribution, gχ2(d), where g is a constant and d is the effective degrees of freedom of the χ2 distribution. The D statistic, or Hotelling T 2 statistic, measures the degree to which data fit the developed model:

Di ) tiTΛ-1ti ∼

a(m - 1) F (m - a) a,m-a

(3)

where Λ is the diagonal matrix of the inverse of the eigenvalues associated with the retained principal components. The D statistic gives a measure of the Mahalanobis distance in the reduced space between the position of the new observation and the origin that designates the point with the average process behavior. The distribution of the D statistic can be approximated by an F distribution, Fa,m-a, and confidence limits for the D statistic are calculated from this F distribution. Kernel Principal Component Analysis. KPCA is a nonlinear PCA method that maps the original input data into a highdimensional feature space F where a linear PCA model is created.13 Consider a nonlinear transformation of x into a feature space F; Φ: x ∈ Rn f Φ(x) ∈ F. Then the sample covariance in the feature space is given by

C)

1

m

∑Φ(xi) Φ(xi)T m i)1

(4)

where xi is an element of X and Φ(x) is the centered nonlinear mapping of the input vector x. To diagonalize the covariance matrix, we have to find eigenvalues λ g 0 and nonzero eigenvectors vf ∈ F satisfying the eigenvalue problem in the feature space: m

λvf ) Cvf )

[Φ(xi)‚vf]Φ(xi) ∑ i)1

(5)

Because vf is spanned by Φ(x1), Φ(x2), ..., Φ(xm), eq 5, after multiplication with Φ(xk) from the left, can be written as

m

∑ i)1

λ

Ri[Φ(xk)‚Φ(xi)] )

1

m



m i)1

m

Φ(xk)][Φ(xj)‚Φ(xi)] ∑ j)1

Ri[Φ(xk)‚

for k ) 1, 2, ..., m (6)

where R is the expansion coefficient. When a kernel matrix K (m × m) is defined by K(xi,xj) ) Φ(xi)TΦ(xj), then the eigenvalue problem of eq 6 can be represented as

mλR ) KR

(7)

for nonzero eigenvalues where R ) [R1, R2, ..., Rm]T. The solutions should be further normalized by imposing λk(Rk‚Rk) ) 1 in F. The loading vectors vf in the feature space are then used to extract the principal components of a test vector x in F: m

tf,k ) [vf‚Φ(xj)] )

Rki [Φ(xi)‚Φ(x)] ∑ i)1

(8)

for k ) 1, 2, ..., h, where h is the number of principal components in KPCA. In KPCA, monitoring charts similar to linear PCA, based on the Qf and Df statistics in the feature space, can be used for process monitoring. For new observation i, Qf,i is calculated as follows:15

ˆ h(xi)||2 ∼ gfχ2(df) Qf,i ) ||Φ(xi) - Φ

(9)

where Φ ˆ (x) is a reconstructed feature vector. The distribution of the calculated Qf,i values can also be approximated by a χ2 distribution, Qf,i, where gf is a constant and df is the effective degrees of freedom of the χ2 distribution. The Df statistic is defined as the sum of the normalized squared scores in the feature space:

Df,i ) tf,iTΛf-1tf,i ∼

h(m - 1) F m - h h,m-h

(10)

Ind. Eng. Chem. Res., Vol. 45, No. 12, 2006 4339

Figure 7. Prediction results of the linear PLS (gray dots, measured values; solid lines, predicted values).

where Λf is the diagonal matrix of the inverse of the eigenvalues associated with the retained principal components in the feature space. Partial Least Squares. PLS is a regression technique for modeling a linear relationship between the output data matrix (responses) Y and the input data matrix (predictors) X. PLS reduces the dimension of the predictor variables X by extracting factors or latent variables that are correlated with responses Y while capturing a large amount of the variations in X. This means that PLS maximizes the covariance between matrices X and Y. In PLS, the scaled matrices X and Y are decomposed into score vectors (u and w), loading vectors (p and q), and residual error matrices (F and G): l

X ) UPT + F )

uipiT + F ∑ i)1 l

Y ) WQT + G )

wiqiT + G ∑ i)1

(11)

where l is the number of latent variables. In an inner relationship, the score vector u is linearly regressed against the score vector w:

wi ) biui + Ei

(12)

where b is a regression coefficient that is determined by minimizing the residual E. It is crucial to determine the optimal number of latent variables, and cross-validation is a practical and reliable way to test the predictive significance of each PLS component. There are several algorithms to calculate the PLS model parameters. In this work, the NIPALS (nonlinear iterative PLS) algorithm was used with the exchange of scores.18 Nonlinear PLS. To capture nonlinear structures between the predictor block and the responses, the PLS model can be extended to nonlinear PLS models. Major approaches have been to incorporate nonlinear functions within the linear PLS framework. Especially, quadratic functions and neural networks have been used to identify the nonlinear inner mapping between

the input and output latent variables. The QPLS method makes a polynomial fit for the PLS inner relationship.7 QPLS works just like PLS and uses the NIPALS algorithm to calculate the latent variables. Once a pair of latent variables is calculated, polynomial functions are used to model the functional relationship between the pair of latent variables. In this study, the degree of the polynomial used was two and the number of latent variables in the model was set to six on the basis of the crossvalidation analysis. Neural network PLS (NNPLS) is an integration of neural networks with PLS to model nonlinear processes with input collinearity.11 The input and output variables are projected onto the latent space to remove collinearity, and then each latent variable pair is mapped with a single-input single-output neural network. The neural network is trained to capture the nonlinearity in the projected latent space. In this application, a feedback neural network (FBNN) with sigmoidal functions was used to identify the nonlinear inner regression models for each of the six latent variables. Kernel Partial Least Squares. The KPLS method is also based on mapping of the original input data into a highdimensional feature space F. Good generalization properties of the KPLS model are then achieved by appropriate estimation of the regression coefficients in F and by the selection of an appropriate kernel function. In this application, we did not identify what kind of nonlinearities exists among variables. However, we assumed that the mapped data in a highdimensional feature space could be linear in relationships and a linear PLS model in the feature space could work very well even though the original data were from a highly nonlinear process. By nonlinear mapping Φ: x ∈ Rn f Φ(x) ∈ F, a KPLS algorithm can be derived from a sequence of NIPALS steps and has the following formulation.14 1. Set w equal to any column of Y. 2. Calculate scores u ) ΦΦTw and then normalize u to |u| ) 1. 3. Regress columns of Y on u: c ) YTu, where c is a weight vector.

4340

Ind. Eng. Chem. Res., Vol. 45, No. 12, 2006

Figure 8. Three-dimensional surface plot of the effect of the number of principal components and the width parameter on the RMSE values in KPLS.

4. Calculate a new score vector for Y: w ) Yc and then normalize w to |w| ) 1. 5. Repeat steps 2-4 until convergence of w. 6. Deflate ΦΦT and Y matrices:

ΦΦT ) (Φ - uuTΦ)(Φ - uuTΦ)T

(13)

Y ) Y - uuTY

(14)

7. Go to step 1 to calculate the next latent variable. Without explicitly mapping into the high-dimensional feature space, a kernel function can be used to compute the dot products as follows:

K(xi,xj) ) Φ(xi)TΦ(xj)

(15)

The deflation of the ΦΦT ) K matrix after extraction of the u components is then given by

K) (I - uuT)K(I - uuT)

(16)

where I is an m-dimensional identity matrix. Taking into account normalized scores u, the prediction of the KPLS model on training data is defined as

Y ) KW(UTKW)-1UTY ) UUTY

(17)

For predictions on new observation data, the regression can be written as

Yt ) KtW(UTKW)-1UTY

(18)

where Kt is the test matrix whose elements are Kij ) K(xi,xj), where xi and xj are the testing and training data points, respectively. Process Description A biological anaerobic filter process in a petrochemical company in Korea was designed for removal of the wastewater discharged from a purified terephthalic acid manufacturing plant (Figure 1). The anaerobic filter process packed with honeycomb packing materials is operated in a downflow mode. The reactor volume is 10 300 m3 with a gas headspace volume. For stable operation of the anaerobic filter process, the organic loading

rate and the pH of the feed stream are manually controlled by changing the addition amounts of a highly concentrated organic wastewater and sodium hydroxide, respectively. The effluent is recycled to the reactor for the purpose of mixing and dilution of the influent wastewater. The temperature of the influent is controlled at 38 °C by a cooling tower. The influent wastewater is around 8900 mg/L as the total oxygen demand (TOD) and mainly consists of acetic acid, terephthalic acid, benzoic acid, and p-toluic acid. The theoretical hydraulic retention time of the anaerobic filter process is approximately 2.3 days, but the actual values of mean residence time measured by trace tests ranged from 0.8 to 1.4 days. To overcome the media plugging problems that were determined as the cause of the difference between the theoretical hydraulic retention time and the mean residence time, backflushing has been carried out by using nitrogen gas at intervals. Online measurement variables are automatically stored at a database by a data acquisition system (Honeywell, Morristown, NJ). All variables used in model identification are shown in Table 1. The predictor matrix X consists of 10 online measurement variables and 4 augmented variables (TODQ, TODT, CT, and HRT) because they are closely related to the dynamics of the anaerobic filter process. The expansion of the input dimension could improve the performance of all of the developed models. The response matrix Y consists of two variables: TOD in the effluent and the flow rate of methane gas from the process. These two variables are selected because they are key indicators for the performance of the anaerobic filter process. Results and Discussion Historical Process Data. The process data used in this study were routinely measured with operation data sets for 6 months. Because the measurement data were collected with varying sampling intervals, the hourly mean values for each variable were used in the developed models. It corresponds to 4000 operation data sets. Statistical outliers that might appear because of measurement errors were removed from the data set on the basis of PCA.19 The data set was divided into two parts. The first 2000 observations collected when the anaerobic filter process was at normal operation conditions were used as a

Ind. Eng. Chem. Res., Vol. 45, No. 12, 2006 4341

Figure 9. Prediction results of KPLS (gray dot, measured values; solid lines, predicted values).

training data set. The remaining 2000 consecutive observations were used as a testing data set in order to verify the proposed methods. Process Monitoring System. A real-time monitoring system for the anaerobic filter process was first developed using a linear PCA model to detect abnormal process behaviors. Four principal components, which explained approximately 83.0% of the total variability in the training data set, were determined by crossvalidation.20 Inclusion of additional loads could not give an improvement in the monitoring charts on the basis of reducing false alarms. Figure 2 shows a score plot of the training data set in the space of the resulting first two principal components. The ellipse represents the 99% confidence limit, and each point indicates all of the process variables at a certain point in time. Under normal operating conditions, most of the points fall within the 99% confidence limit. Figure 3 represents Q- and D-statistic monitoring charts of the linear PCA model. The Q-statistic values of the testing data set were mostly inside the 99% confidence limit and exceeded the limit only when process disturbances occurred. On the other hand, D-statistic values violated its confidence limit from the beginning of the testing data set until sample 2900. This indicates that the linear PCA model could not adequately describe the anaerobic filter process that is inherently nonlinear and exposed to various disturbances such as influent composition variations, temperature changes, and equipment defects. A nonlinearity measure for the PCA model was calculated to assess the nonlinearity in data.21 The operation data sets were divided into four disjunct regions of 1000 samples each. The results of applying the new nonlinearity measure clearly showed that the anaerobic filter process is nonlinear, as shown in Figure 4. To develop a more reliable monitoring system for the anaerobic filter process, a KPCA model was developed based on the same data sets. There exist a number of kernel functions, and the choice of the kernel function implicitly determines the mapping Φ and the feature space F.13 However, a guideline of how to select the optimal kernel function for a given system is not available. In this application, the radial basis kernel function K(x,y) ) exp(-|x - y|2/σ) with width parameter σ was determined as the best one from the criterion that how fast and

correctly the model could detect known disturbances in the testing data set. The width parameter σ was tuned on the given data set, and σ ) 5 was determined as the optimal value. The cross-validation method may also be used to determine the number of principal components in the feature space, but it needs a high computational load. Therefore, the number of significant principal components was calculated using the cumulative percent variance (CPV) method.22 The CPV is a measure of the percent variance captured by the first h principal components: h

CPV(h) )

λi ∑ i)1

× 100%

(19)

trace(K)

The optimal number of principal components was chosen as 11 when CPV reached a predetermined limit (80%). In general, the selected number of principal components for KPCA is larger than that for linear PCA because KPCA extracts them from the high-dimensional feature space while linear PCA extracts them from the relatively small input space. The normal probability plots of the first score vectors calculated by both linear PCA and KPCA are shown in Figure 5. Figure 5a shows that the first scores in the linear PCA do not follow Gaussian distribution, but the first scores (Figure 5b) in the feature space from KPCA are distributed more likely as a Gaussian type. Figure 6 presents Q- and D-statistic monitoring charts of the KPCA model. The D-statistic values were mostly well inside the confidence limit. This implies that, because the KPCA model could capture the nonlinearity of the anaerobic filter process, the false alarms between the beginning of the testing data set and sample 2900 were remarkably reduced compared with those of the linear PCA model. The fault detection capability of both the linear PCA and KPCA was determined by calculating the false alarm (type I error) rate during normal operating conditions and the missed detection (type II error) rate for the identified 25 faults of the testing data set. The linear PCA and KPCA gave the false alarm rates of 46.5% and 0.5%, respectively. Because KPCA could

4342

Ind. Eng. Chem. Res., Vol. 45, No. 12, 2006

Figure 10. Parity plots from the linear PLS and KPLS models.

detect nonlinear process behaviors in the system, it dramatically reduced false alarms compared to the linear PCA. The missed detection rates of the linear PCA model and the KPCA model at the 99% confidence limit were 8% and 4%, respectively. On the basis of both the false alarm rate and the missed detection rate, the KPCA model gave a far better monitoring performance than the linear PCA model. Inferential Prediction of Key Process Variables. PLS modeling approaches, which have a distinct ability to model chemical and biological processes only based on historical operational data, were employed to develop an inferential prediction model in minimal time and with minimal cost. Four different PLS modeling strategies were applied to the anaerobic filter process. The capabilities of the PLS modeling approaches were assessed through their prediction accuracy and performance characteristics. The performance of each model was evaluated in terms of the root-mean-square-error (RMSE) criterion. The RMSE performance index was defined as

RMSE )

∑(yˆ - y)2

x

m

(20)

where y is the measured values, yˆ is the corresponding predicted values, and m is the number of observations.

Table 2. Comparison of the Models’ Performances model

RMSEtraining

RMSEtesting

BIC

linear PLS QPLS NNPLS KPLS

105.1 50.5 41.9 20.4

246.1 140.2 105.8 44.8

-18815 -17024 -16542 -15288

First, a linear PLS model was built between the predictor X and the response Y. On the basis of the cross-validation results, three latent variables were included in the model. This explained 53.76% of the variance of matrix X and 80.20% of matrix Y. The RMSE values for the training and testing data sets were 105.1 and 246.1, respectively (Table 2). The simulation results of the linear PLS model are shown in Figure 7. The model predicted the dynamics of the anaerobic filter process with a relatively good accuracy for the training data set, but there was a significant mismatch between the model prediction and the measured values in the later part of TODeff profiles in the testing data set. This exemplified the weakness of the linear multivariate regression model such as linear PLS. When both QPLS and NNPLS methods were used for modeling of the inner relationships, the prediction capabilities were largely improved (Table 2). The increased prediction performance of the nonlinear PLS models could be explained by the fact that the anaerobic filter process is an inherently nonlinear system and the nonlinear

Ind. Eng. Chem. Res., Vol. 45, No. 12, 2006 4343

methods could effectively capture the nonlinear relationship in process variables. A KPLS model was then developed with a radial basis kernel function. In this application, the radial basis function was determined as the best one from the RMSE value of the testing data set. Both the width parameter σ and the number of principal components in the feature space should be determined to optimize the KPLS model. For this purpose, a three-dimensional response surface, based on the RMSE value, was generated to study the interaction between the two parameters as shown in Figure 8. The surface plot shows a clear minimum point when the number of principal components and the width of the kernel function were 12 and 5, respectively. These values were used as the optimal parameters in the KPLS model. The developed KPLS model explained 95.60% of the variance of matrix Y. The captured variance of matrix X could not be calculated because it is impossible to find an inverse mapping function from the feature space to the original space. The RMSE values for the training and testing data sets were 20.4 and 44.8, respectively. As shown in Table 2, KPLS gave the best prediction performance compared to other linear and nonlinear PLS methods. This clearly indicates that KPLS benefitting from the linear data structure in the feature space could capture the nonlinearities in the original data space better than other nonlinear PLS methods. Figure 9 shows the simulation results of the KPLS model. This figure, when it is compared with Figure 7, also clearly shows that KPLS gives much better prediction results than the linear PLS model. The linear PLS model was not able to predict TOD concentrations in the effluent very well, and it showed bias in the prediction of the testing data set. However, the KPLS model gave very little bias for the same testing data set. The measured data versus the predicted values from both the PLS and the KPLS models are shown in Figure 10. Those plots visualize the performance of the models in an obvious way. The r2 value represents the fraction of the variance explained in the operation data by the model. However, the goodness of fit for the different PLS modeling approaches with different numbers of degrees of freedom cannot be assessed only by the RMSE values. More complex models with larger numbers of parameters will improve the model fit to the data because it reduces the RMSE of the residuals between the model predictions and the corresponding measured values. It is, therefore, necessary to have quantitative measures of model adequacy in order to decide between competing model structures. For large data sets, the appropriate criterion to use is the Bayesian information criterion (BIC):23

BIC ) L(ϑˆ |yij) -

np m ln 2 2π

(21)

where L is the likelihood function of the model, ϑˆ denotes the maximum likelihood estimates of the vector of unknown parameters, np is the number of parameters, and m is the number of measurements. For the likelihood function L, after taking the natural logarithm and maximizing with respect to the unknown parameters, the maximized logarithmic likelihood can be obtained:24

L(ϑˆ |yij) ) -

m ln{ 2

∑[yij - f(ϑˆ |xij)]2}

(22)

where f(ϑˆ |xij) is the model output at the ith value of the input xj. From eqs 21 and 22, a model with a high BIC is preferable to one with a lower value.25 As shown in Table 2, the KPLS

model’s BIC was higher than that of any other PLS model, implying that the KPLS model is a much better model in this direct quantitative comparison. These results consistently showed that the KPLS model outperformed other linear and nonlinear PLS models in the aspect of both model prediction and complexity. Conclusion A nonlinear process monitoring method based on kernel algorithms has been proposed. Kernel-based projection methods such as KPCA and KPLS map a nonlinear input space into a high-dimensional feature space where the data structure is likely to be linear. Principal components in the feature space can be calculated by means of integral operators and nonlinear kernel functions. They require only linear algebra to develop a process monitoring system compared to other nonlinear methods that involve nonlinear optimization. In this work, the KPCA and KPLS methods were applied to a full-scale biological anaerobic filter process. First, a KPCA model was developed to monitor process abnormal behaviors in real time. KPCA could capture obvious nonlinear behaviors in the normal operational region and significantly reduced the false alarms compared with the conventional PCA. Then four different PLS models were developed to predict key process variables. The KPLS model gave a much better prediction performance than other linear and nonlinear PLS methods. The increased prediction performance of KPLS could be explained by the fact that the biological anaerobic filter process is an inherently nonlinear process, and the KPLS model could capture the nonlinearities in the original data space benefiting from the linear data structure in the feature space. The successful application of the proposed methods to the anaerobic filter process has demonstrated the feasibility and effectiveness of the kernel-based monitoring and modeling algorithms. The methodology is fairly general and is applicable to most chemical and biological wastewater treatment plants. Acknowledgment The work was financially supported by the ERC program of MOST/KOSEF (Grant R11-2003-006-01001-1) through the Advanced Environmental Biotechnology Research Center at POSTECH. This work was also supported by the ET eduinnovation Project of Ministry of Environment in 2006. Literature Cited (1) Wise, B. M.; Gallagher, N. B. The Process Chemometrics Approach to Process Monitoring and Fault Detection. J. Process Control 1996, 6, 329. (2) Kourti, T. Process Analysis and Abnormal Situation Detection: from Theory to Practice. IEEE Control Syst. Mag. 2002, Oct, 10. (3) Lee, D. S.; Jeon, C. O.; Park, J. M.; Chang, K. S. Hybrid Neural Network Modeling of a Full-scale Industrial Wastewater Treatment Process. Biotechnol. Bioeng. 2002, 78, 670. (4) Lee, D. S.; Vanrolleghem, P. A. Monitoring of a Sequencing Batch Reactor Using Adaptive Multiblock Principal Component Analysis. Biotechnol. Bioeng. 2003, 82, 489. (5) Lee, D. S.; Park, J. M.; Vanrolleghem, P. A. Adaptive Multiscale Principal Component Analysis for On-line Monitoring of a Sequencing Batch Reactor. J. Biotechnol. 2005, 116, 195. (6) Baffi, G.; Martin, E. B.; Morris, A. J. Non-linear Dynamic Projection to Latent Structures Modeling. Chemom. Intell. Lab. Syst. 2000, 52, 5. (7) Wold, S.; Kettaneh-Wold, N.; Skagerberg, B. Nonlinear PLS Modeling. Chemom. Intell. Lab. Syst. 1989, 7, 53. (8) Kramer, M. A. Nonlinear Principal Component Analysis Using Autoassociative Neural Networks. AIChE J. 1991, 37, 233. (9) Dong, D.; McAvoy, T. J. Nonlinear Principal Component Analysis Based on Principal Curves and Neural Networks. Comput. Chem. Eng. 1996, 20, 65.

4344

Ind. Eng. Chem. Res., Vol. 45, No. 12, 2006

(10) Jia, F.; Martin, E. B.; Morris, A. J. Non-linear Principal Component Analysis for Process Fault Detection. Comput. Chem. Eng. 1998, 22, S851. (11) Qin, S. J.; McAvoy, T. J. Non-linear PLS Modelling Using Neural Networks. Comput. Chem. Eng. 1992, 23, 395. (12) Lee, D. S.; Vanrolleghem, P. A.; Park, J. M. Parallel Hybrid Modeling Methods for a Full-scale Cokes Wastewater Treatment Plant. J. Biotechnol. 2005, 115, 317. (13) Scho¨lkopf, B.; Smola, A. J.; Mu¨ller, K. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Comput. 1998, 10, 1299. (14) Rosipal, R.; Trejo, L. J. Kernel Partial Least Squares Regression in Reproducing Kernel Hilbert Space. J. Mach. Learning Res. 2001, 2, 97. (15) Lee, J. M.; Yoo, C.; Choi, S. W.; Vanrolleghem, P. A.; Lee, I. B. Nonlinear Process Monitoring Using Kernel Principal Component Analysis. Chem. Eng. Sci. 2004, 59, 223. (16) Haykin, S. Neural Network; Prentice Hall: Englewood Cliffs, NJ, 1999. (17) Lennox, B.; Montague, G. A.; Hiden, H. G.; Kornfeld, G.; Goulding, P. R. Process Monitoring of an Industrial Fed-batch Fermentation. Biotechnol. Bioeng. 2001, 74, 125. (18) Geladi, P.; Kowalski, B. R. Partial Least Squares Regression: A Tutorial. Anal. Chim. Acta 1986, 185, 1. (19) Wold, S. Cross-validatory Estimation of Components in Factor and Principal Components Models. Technometrics 1978, 20, 397.

(20) Krzanowski, W. J. Cross-validation in Principal Component Analysis. Biometrics 1987, 43, 575. (21) Kruger, U.; Antory, D.; Hahn J.; Irwin, G. W.; McCullough, G. Introduction of a nonlinearity measure for principal component models. Comput. Chem. Eng. 2005, 29, 2355. (22) Li, W.; Yue, H.; Valle-Cervants, S.; Qin, S. J. Recursive PCA for Adaptive Process Monitoring. J. Process Control 2000, 10, 471. (23) Leonard, T.; Hsu, J. S. J. Bayesian Methods; Cambridge University Press: New York, 1999. (24) Main, I.; Leonard, T.; Papasouliotis, O.; Hatton, C. G.; Meredith, P. G. One Slope or Two? Detecting Statistically Significant Breaks of Slope in Geophysical Data with Application to Fracture Scaling. Geophys. Res. Lett. 1999, 26, 2801. (25) Seher, T.; Main, I. G. A Statistical Evaluation of a ‘Stress-forecast’ Earthquake. Geophys. J. Int. 2004, 157, 187.

ReceiVed for reView August 8, 2005 ReVised manuscript receiVed March 23, 2006 Accepted March 29, 2006 IE050916K