Ind. Eng. Chem. Res. 2005, 44, 3973-3982

Application of a Moving-Window-Adaptive Neural Network to the Modeling of a Full-Scale Anaerobic Filter Process

Min W. Lee, Jea Y. Joung, Dae S. Lee, and Jong M. Park*
Department of Chemical Engineering, School of Environmental Science and Engineering, POSTECH, San 31, Hyoja-dong, Pohang, Kyungbuk 790-784, Korea

Seung H. Woo
Department of Chemical Engineering, Hanbat National University, San 16-1, Dukmyung-dong, Yuseong-gu, Daejeon 305-719, Korea

To explore the complex dynamics of a full-scale anaerobic filter process treating the wastewater from a purified terephthalic acid manufacturing industry, a new modeling approach based on a moving-window-adaptive neural network is proposed. The essential feature of this modeling approach is that the neural network model is automatically updated whenever a new data block is available so that it can effectively capture the slowly changing process dynamics. To elucidate the advances of the proposed method, four different modeling approaches combined with the concepts of autoregressive with exogenous (ARX) input and a finite impulse response model were evaluated and compared. During each model identification process, a modified cross-validation technique was used to avoid the overfitting problem of a neural network. Among the tested models, a moving-window-adaptive ARX neural network model showed the best prediction ability with the smallest validation error. To investigate the feasibility of this model, various dynamic simulations were performed. Although some limitations such as ambiguity in sensitivity analysis and instability in long-term simulation were identified, it is considered that the moving-window-adaptive ARX neural network model could provide a useful guideline to explore the complicated dynamics of the anaerobic filter process.

Introduction

Anaerobic treatment of wastewater has long been recognized as a potential alternative to aerobic treatment because of several advantages such as lower nutrient requirement, less surplus sludge production, and energy recovery via methane production.
However, the slow growth rate of methanogenic bacteria and their high sensitivity to toxic pollutants were often pointed out as significant obstacles to the treatment of industrial wastewater with a complex feed composition.1 These obstacles have been gradually overcome along with the development of high-rate anaerobic processes such as an anaerobic fluidized-bed reactor, anaerobic filter (AF), and upflow anaerobic sludge blanket reactor, which could achieve a high reaction rate per unit reactor volume by retaining a high concentration of biomass in the reactor independently of incoming wastewater.2 As a consequence, however, stable operation of a full-scale process has become a challenging problem because of the complicated nature of the high-rate anaerobic process itself.

Mathematical modeling has made a remarkable contribution to the stable operation of the anaerobic wastewater treatment process (AWTP). An adequate dynamic model could enhance the understanding of relevant biological phenomena and provide the basis for an operational optimization and control strategy. There are numerous mathematical models to describe AWTP in the literature3-7 that can be classified as kinetic models. Kinetic models are greatly valuable because they are based on real microbial reaction kinetics including mass-transfer phenomena in biofilm or sludge granule and so can provide rational insights into the physical and biological principles of various AWTPs. However, kinetic models have some limitations when they are applied to the modeling of a full-scale AWTP. First, most kinetic models have too many kinetic or stoichiometric parameters to be determined, which requires laborious parameter calibration work. Second, some critical state variables used in the model (for example, the biofilm thickness in an AF) are difficult or even impossible to quantify in a full-scale AWTP, so that model validation is practically prohibitive. Third, the model parameters are highly dependent on specific environmental conditions, rendering their transfer to a different wastewater or slightly modified reactor configuration futile.8

* To whom correspondence should be addressed. Tel.: +82-54-279-2275. Fax: +82-54-279-8299. E-mail: jmpark@postech.ac.kr.

More recently, the neural network model has received considerable attention in the modeling of AWTPs. The neural network is an artificial intelligence tool that can identify an arbitrary nonlinear functional relationship between input and output data sets. One of the most valuable aspects of the neural network model may be its wide applicability. Because the model relationship is deduced only from the historical data of the model process, any type of AWTP, especially a full-scale process, can be effectively modeled if a large volume of operational data is available. On the other hand, the neural network model has been criticized for a lack of description for physical and biological principles involved in the process, which can result in

10.1021/ie048944a CCC: $30.25 © 2005 American Chemical Society Published on Web 05/04/2005

Ind. Eng. Chem. Res., Vol. 44, No. 11, 2005

Figure 1. Schematic diagram of the AF process to treat PTA wastewater.

poor extrapolating capability.9 In addition, the overfitting problem of the neural network10 often leads to a faulty result, which can be a serious shortcoming in applying the model to process optimization and control. Although there are several reports that the neural network model was successfully applied to the modeling of AWTPs,8,11,12 it seems that these adverse aspects of the neural network model were rarely considered during the model development. In this study, a full-scale AF process to treat the wastewater from a purified terephthalic acid (PTA) manufacturing industry was selected as a model process, and various modeling approaches based on a feedforward back-propagation neural network were investigated to explore the process dynamics. Because only limited state variables were contained in the operation database, which might be insufficient to describe the full dynamics of the AF process, the model performance was enhanced by introducing the concepts of autoregressive with exogenous (ARX) input and a finite impulse response (FIR) model.13,14 In this model development, special effort was given on the optimum structure of the model to avoid the overfitting problem. Although several methods were available for this purpose,10,15 a modified cross-validation technique was adopted to reduce the computational load. Similar to most dynamic processes, the process dynamics of the AF is slowly changed because of its inherent nature such as the proceeding of media plugging and temporary inhibition by toxic compounds. It has been reported that an adaptive modeling concept could effectively capture this slowly changing process dynamics.16,17 In this work, therefore, a new protocol to adaptively update the neural network model based on the concept of moving window was finally presented. 
The ultimate goal of this modeling study is to provide a guideline that would allow operators and process engineers to deduce an optimum operational strategy for different dynamic situations using an accurate neural network model. The prediction performances of different modeling strategies were compared, and the dynamics of the AF process was intensively investigated by using the model showing the best prediction performance.

Description of the Model Process

The model process is a full-scale downflow AF process to treat the wastewater discharged from a PTA manufacturing plant (Samsung Petrochemical Co. Ltd., Ulsan, Korea). As shown in Figure 1, the AF process was designed for the preliminary conversion of organic pollutants in wastewater into methane gas, reducing the organic loading to the following activated sludge process operated to meet final effluent criteria. The feed temperature was controlled at 38 °C by a cooling tower. The effluent is recycled to the front of the reactor for the purpose of mixing and dilution of feed wastewater. An operation database consisting of online-measured variables was available, which was automatically accumulated by a data acquisition system (Honeywell, Morristown, NJ). Besides these online-measured variables, some important components of wastewater were measured occasionally by offline analysis during this study. The overall feed composition (after the addition of extremely high strength wastewater and sodium hydroxide) is summarized in Table 1.

Table 1. Composition of PTA Manufacturing Wastewater

component                 average value
------------------------  -------------------
pH                        5.5
total oxygen demand       7851 g of TOD/m3
chemical oxygen demand    8425 g of CODcr/m3
suspended solid           47 g/m3
acetic acid               2125 g/m3
terephthalic acid         802 g/m3
benzoic acid              662 g/m3
p-toluic acid             551 g/m3
trimellitic acid          169 g/m3
o-phthalic acid           90 g/m3
4-carboxybenzaldehyde     13 g/m3

Figure 2 shows the time profiles of some key variables that are closely related to the performance of the AF process. The typical volumetric organic loading rate ranged from 3 to 3.8 kg of total oxygen demand (TOD)/m3/day, and its average value was about 3.36 kg of TOD/m3/day. There was a remarkable decrease of the volumetric organic loading rate at the latter part of the operational period, which severely influenced the performance of the AF process. This fluctuation was due to the scheduled process shutdown for maintenance of the PTA manufacturing plant. The effluent pH was maintained around 7.0, which is the target value in controlling the influent pH by the addition of sodium hydroxide. When the effluent pH deviated from the target value, the influent pH was changed to compensate for this deviation. The TOD removal efficiency normally varied between 70 and 80% during this study and decreased gradually over the long term. This long-term decrease of the removal efficiency

Figure 2. Overall performance of the AF process to treat PTA wastewater.

Table 2. List of Variables Used in the Model and Their Classification

variable  description [unit]                          classification* (domain)
--------  ------------------------------------------  ---------------------------
QIN       feed flow rate [m3/h]                       disturbance variable (X)
TEQU      temperature of the equalization tank [°C]   disturbance variable (X)
QR        recycle flow rate [m3/h]                    disturbance variable (X)
TODIN     total oxygen demand of feed flow [g/m3]     control variable (X)
pHIN      pH of feed flow [dimensionless]             control variable (X)
pHOUT     pH of the effluent [dimensionless]          resultant state variable (Y)
QGAS      total gas production rate [m3/h]            resultant state variable (Y)
QCO2      carbon dioxide production rate [m3/h]       resultant state variable (Y)
TODOUT    total oxygen demand of effluent [g/m3]      resultant state variable (Y)
QCH4      methane production rate [m3/h]              resultant state variable (Y)

is believed to be due to the proceeding of media plugging because plugging control such as backflushing had never been conducted before this study, and the results of several tracer tests revealed that the actual to theoretical hydraulic retention time (HRT) ratio was extremely low. The theoretical HRT of the AF process was about 56 h on average, but the actual values of the mean residence time determined by tracer tests ranged from 20 to 33 h. During this study, backflushing was carried out at intervals by using nitrogen gas and the treated effluent, which could be identified by a sudden decrease of the methane content in Figure 2.

As can be deduced from Figure 2, the AF process has an exceedingly complicated nonlinear dynamic nature. Although some important operational criteria such as the TOD loading rate and the pH of the reactor were controlled by field operators, the removal efficiency still fluctuated severely. This fluctuation seems to originate from the rapidly changing feed composition. It is well-known that benzoic acid (BA) and acetic acid (HAc) are easily degraded under anaerobic conditions, but the anaerobic degradations of terephthalic acid (TA) and p-toluic acid are so slow that they limit the overall process performance.18,19 According to the research of Kleerebezem et al.,18,20 both BA and HAc are inhibitory to the anaerobic degradation of TA. Because BA and HAc are not only the major pollutants in feed wastewater but also the intermediates in TA degradation, the overall performance of the AF process is severely influenced by the change of the feed composition and the extents to which BA, HAc, and TA are degraded. Although the time profiles of BA, HAc, and TA are not provided here because of the limited number of data, the average removal efficiency for each compound was 80.9%, 72.1%, and 60.6%, respectively, which implies that the AF process was usually under rate-limiting conditions in acidogenesis of TA.

Model Identification

Conventional Neural Network Model. The first step of the process modeling using a neural network is to select the variables that could effectively describe the model process and to assign them properly to the input and output spaces of the model. Although some variables related to the feed composition (for example, the concentrations of TA and BA) would better explain the process dynamics, only online-measured variables were considered in this study because the neural network model usually requires abundant data sets with equal sampling time intervals. Table 2 lists the variables used in the model and their basic classification according to their inherent nature in process dynamics. The model input space, X,

consisted of disturbance variables and control variables, which would eventually influence the dynamic state of the AF process that could be described by the resultant state variables of the model output space, Y. Then, the process dynamics at time t could be described by the following equation:

$$ y(t+1) = f[x(t)] \tag{1} $$

where x and y are vectors of variables in the model input and output space, respectively. This systematic partitioning of variables into the model input and output spaces is straightforward and crucial to long-term simulation using the model. The changes of process dynamics far from time t can be recursively predicted by introducing the concept of ARX input13 as follows:

$$ y(t+1) = f[x(t), y(t)] \tag{2} $$
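As an illustration of the ARX form in eq 2, the following minimal Python sketch pairs each regressor [x(t), y(t)] with its target y(t+1) and then rolls a model forward by feeding every prediction back as the next y(t). The function names and the one-variable toy model are assumptions made for this example; they stand in for the trained neural network and are not taken from the study.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def build_arx_pairs(
    x: Sequence[Vector], y: Sequence[Vector]
) -> Tuple[List[Vector], List[Vector]]:
    """Pair the regressor [x(t), y(t)] with the target y(t+1), as in eq 2."""
    if len(x) != len(y):
        raise ValueError("x and y must cover the same time steps")
    inputs = [list(x[t]) + list(y[t]) for t in range(len(x) - 1)]
    targets = [list(y[t + 1]) for t in range(len(y) - 1)]
    return inputs, targets


def simulate_recursively(
    f: Callable[[Vector], Vector], x_future: Sequence[Vector], y_start: Vector
) -> List[Vector]:
    """Roll the model forward, reusing each prediction as the next y(t)."""
    y_current = list(y_start)
    trajectory = []
    for x_t in x_future:
        y_current = f(list(x_t) + y_current)
        trajectory.append(y_current)
    return trajectory


def toy_model(regressor: Vector) -> Vector:
    """Hypothetical stand-in for a trained network: y(t+1) = 0.5*y(t) + x(t)."""
    x_t, y_t = regressor
    return [0.5 * y_t + x_t]


# One input variable and one output variable over four time steps.
x = [[1.0], [1.0], [1.0], [1.0]]
y = [[0.0], [1.0], [1.5], [1.75]]
inputs, targets = build_arx_pairs(x, y)
predicted = simulate_recursively(toy_model, x[:3], y[0])
```

Because the toy data were generated by the toy model itself, the recursive prediction reproduces the targets exactly; with a real network the fed-back predictions carry their errors forward from step to step.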

A variety of neural network architectures and training algorithms have been proposed and used for the modeling of nonlinear dynamic systems. In this study, a multilayer feedforward back-propagation neural network, which has been most frequently applied in chemical and biological processes, was employed as a basic skeleton of the neural network model and the Levenberg-Marquardt algorithm21 was used for the model training process. The overall performance of the model is dependent on the structure of the neural network, i.e., the number of hidden layers, the number of nodes for each hidden layer, and the activation function. We chose the hyperbolic tangent function as the activation function because of its symmetry around the origin and easily computable derivatives. An overparametrized neural network model that has so many hidden layers and nodes can be trained to the extent that the estimated values are completely matched with real output values for the training data sets. However, when the trained model is subsequently used for the validation data sets, a very poor prediction result is usually observed. This overfitting problem has been issued in determining the optimal structure of the neural network model, and several strategies to overcome it have been proposed.10,15 In this study, the cross-validation technique described by Anders and Korn15 was used after combining it with the concept of early stopping10 to reduce the computational load. Figure 3 represents the flowchart of the model training algorithm, which is supervised by cross validation. To perform the cross validation, the training data sets are divided into M subblocks; one of them is assigned as the validation block, and others are assigned as new training blocks. Then, the model is trained until it is identified that further training will result in an increase of the validation error. 
At the termination of model training, the validation error is assigned to the mean-squared prediction error (MSPE) for the tested validation block. This procedure is repeated until all sub-blocks are tested for validation, and the average MSPE on the M sub-blocks is defined as the cross-validation error (CVE):

$$ \mathrm{CVE} = \frac{1}{M} \sum_{m=1}^{M} \mathrm{MSPE}_m \tag{3} $$

The CVEs are evaluated for the different structures of neural networks, and the optimum structure corresponding to the minimum CVE is finally adopted.

Figure 3. Flowchart of the model training algorithm supervised by cross validation.

In this modeling study, hourly average data sets with 5832 total observations were used and the first 1000 observations were used for the model training. The cross-validation technique used for the determination of the optimum structure of a simple ARX-type neural network revealed that the CVE converged on a minimum after the node number of the first hidden layer was increased to 5, regardless of the tested sub-block sizes for cross validation. Moreover, it was identified that the inclusion of the second hidden layer always resulted in a worse validation performance of the model. Therefore, it was presumed that the optimum structure of a simple ARX-type model consists of one hidden layer with five nodes. We also chose 50 as the sub-block size for cross validation, and this value was used in all subsequent model training. It should be noted that the training result for a given neural network structure can vary slightly depending on the initial condition of the model parameters. Therefore, the calculation of CVE for each neural network structure was conducted 50 times, and its average value was used to determine the optimum model structure. All calculations were performed using MATLAB software (The MathWorks Inc., Natick, MA).

A more generalized ARX model, combined with the concept of FIR, was also considered to enhance the prediction ability of the model (based on the research of Teppola et al.13 and Nikolaou and Vuthandam14). This FIR-ARX-type neural network model can be represented as follows:

$$ y(t+1) = f[x(t), x(t-1), \ldots, x(t-l_x), y(t), y(t-1), \ldots, y(t-l_y)] \tag{4} $$

where l_x and l_y are the time lags to be considered in the model input and output space, respectively. This generalized model could be efficient in modeling the dynamic process with the nature of time-delayed response.
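The cross-validation procedure behind the CVE of eq 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a one-parameter toy model y = w·x and plain gradient descent replace the neural network and the Levenberg-Marquardt training used in the study, and all function names, the learning rate, and the toy data are hypothetical. What the sketch does preserve is the logic: each of the M sub-blocks serves once as the validation block, training stops as soon as a further step would raise the validation error, and the resulting MSPE values are averaged.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (input, target) for a toy scalar model


def mspe(w: float, block: List[Sample]) -> float:
    """Mean-squared prediction error of the toy model y = w * x on one block."""
    return sum((w * x - y) ** 2 for x, y in block) / len(block)


def train_with_early_stopping(
    train: List[Sample], val: List[Sample], lr: float = 0.05, max_epochs: int = 500
) -> float:
    """Train on the training blocks; stop once the validation error would rise."""
    w = 0.0
    best_val = mspe(w, val)
    for _ in range(max_epochs):
        grad = sum(2 * (w * x - y) * x for x, y in train) / len(train)
        w_next = w - lr * grad
        val_next = mspe(w_next, val)
        if val_next > best_val:  # further training would increase validation error
            break
        w, best_val = w_next, val_next
    return best_val  # MSPE assigned to this validation block


def cross_validation_error(data: List[Sample], block_size: int) -> float:
    """Average the MSPE over M sub-blocks, each used once for validation (eq 3)."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    errors = []
    for m, val in enumerate(blocks):
        train = [s for k, b in enumerate(blocks) if k != m for s in b]
        errors.append(train_with_early_stopping(train, val))
    return sum(errors) / len(blocks)


# Noise-free toy data following y = 2x, split into M = 3 sub-blocks of 4 samples.
data = [(0.1 * k, 0.2 * k) for k in range(1, 13)]
cve = cross_validation_error(data, block_size=4)
```

To compare candidate network structures in this manner, one would compute such a CVE for each structure (in the study, averaged over repeated random initializations) and keep the structure with the smallest value.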

Figure 4. Conceptual diagram of a moving-window-adaptive neural network.

However, some adverse aspects such as the overfitting problem and high uncertainty of the model were also mentioned by Nikolaou and Vuthandam.14 In this study, several combinations of l_x and l_y were tested for the fixed hidden layer structure (one hidden layer with five nodes, corresponding to the optimum hidden layer structure of the simple ARX model). Each model was trained by using the cross-validation algorithm described previously. Because none of the results was satisfactory, which might be due to the overfitting problem, only one case study result (l_x = 2 and l_y = 1) will be provided for comparison of the model performance.

Moving-Window-Adaptive Neural Network Model. Figure 4 shows the scheme of the adaptive neural network, which was designed based on the concept of moving window proposed by Qin16 and Vijaysai et al.17 Initially, a conventional neural network model is prepared by using the previously described training algorithm and used to predict the process dynamics until a number of new data sets are accumulated. At the moment that a new data block is available, the neural network model is automatically updated and used to predict the process dynamics thereafter. This model update is repeated whenever a new data block is available. The strategy of the model update is quite simple. First, the global block, or moving window, is reconstructed by including the new data block and discarding the oldest sub-block. Then, the new oldest sub-block of the reconstructed global block is assigned as the validation block and the remaining sub-blocks are assigned as new training blocks. Finally, the original model is trained again until it is identified that further training will result in an increase of the validation error. Because the original model has already captured the information about the process dynamics contained in the historical data sets, the objective of the model update is to capture the new information of the new data block.

The present moving-window concept can be applied to any type of neural network model. In this study, its feasibility was tested for both the simple ARX model and the FIR-ARX model described previously. The size of the moving window and its sub-block size for the model update were identical with the global training block size of the original model (1000) and the sub-block size for cross validation in the model training (50), respectively.
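The bookkeeping of one model update can be sketched as follows. This Python fragment covers only the window reconstruction and the split into validation and training blocks; the retraining with early stopping is left out. The function name is hypothetical, and the sketch assumes that each new data block has the same length as a sub-block (50 observations), which the text suggests but does not state explicitly.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def update_window(
    window: Sequence[T], new_block: Sequence[T], sub_block_size: int
) -> Tuple[List[T], List[T], List[T]]:
    """Slide the window by one sub-block and split it for retraining.

    Returns (new_window, validation_block, training_blocks).
    """
    if len(new_block) != sub_block_size:
        raise ValueError("the new data block must match the sub-block size")
    if len(window) % sub_block_size != 0 or len(window) < 2 * sub_block_size:
        raise ValueError("window must hold at least two whole sub-blocks")
    # Discard the oldest sub-block and append the new data block.
    new_window = list(window[sub_block_size:]) + list(new_block)
    # The new oldest sub-block supervises early stopping; the rest is for training.
    validation_block = new_window[:sub_block_size]
    training_blocks = new_window[sub_block_size:]
    return new_window, validation_block, training_blocks


# Sizes from the text: a 1000-observation window updated in blocks of 50.
window = list(range(1000))           # observation indices 0..999
new_block = list(range(1000, 1050))  # the next 50 hourly observations
window, val, train = update_window(window, new_block, sub_block_size=50)
```

After this call the window covers observations 50 to 1049, the validation block is observations 50 to 99, and the remaining 950 observations form the training blocks on which the existing model would be trained further.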

Results and Discussion

Comparison of the Model Performance. Microbial reactions usually have an autocatalytic nature, which implies that the reaction kinetics is strongly dependent on the dynamic state of the microorganism itself. For example, the reaction rate of methanogenic bacteria for a given process disturbance can vary depending on their instantaneous activities. In this study the use of ARX input was inevitable to obtain a satisfactory modeling result because the variables in the model output space are closely related to the dynamic states of the microorganisms involved in the AF process. Figure 5 shows a typical simulation result obtained by using the simple ARX model. As mentioned previously, the training result for a given neural network structure can vary depending on the initial condition of the model parameters, and thus the simulation result is not unique. This ambiguity of the model mainly arises from the facts that the neural network model to describe the AF process was designed to have a multiple input and multiple output (MIMO) structure and the model training was stopped early by cross validation. In general, the training of the neural network model is carried out by way of minimizing the global error, i.e., mean squared error. In a MIMO-type model, therefore, there can be multiple optima corresponding to the different combinations of individual errors for output variables, which result in a single global error. Fortunately, the possible presence of multiple optima has no essential effect on the training result if the model is fully trained up to a sufficiently small global error. With the cross validation, however, the model training was stopped early when it was identified that further training would increase the validation error.
Because the change of the validation error during the training process is governed by the searching pathway for the optimum, which is obviously dependent on the initial condition of the model parameters, the finally obtained model should be considered as only a partially optimized one for a given initial condition of the model parameters. Nevertheless, the trained model was still meaningful because the model structure was well optimized. The training results with different initial conditions were only slightly varied, showing very similar profiles for the behaviors of the AF process.
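The remark that different combinations of individual output errors can give one and the same global error can be made concrete with a small Python sketch. The per-output values below are invented for illustration and are not results from the study, and the global error is assumed to be the unweighted mean of the per-output mean squared errors on scaled data.

```python
import math
from typing import Dict


def global_mse(per_output_mse: Dict[str, float]) -> float:
    """Unweighted mean of the per-output mean squared errors (scaled data)."""
    return sum(per_output_mse.values()) / len(per_output_mse)


# Hypothetical per-output errors of two models trained from different
# initial parameters (illustrative numbers only).
model_a = {"pHOUT": 0.01, "QGAS": 0.02, "QCO2": 0.03, "TODOUT": 0.02, "QCH4": 0.02}
model_b = {"pHOUT": 0.03, "QGAS": 0.01, "QCO2": 0.01, "TODOUT": 0.03, "QCH4": 0.02}

# Both models reach the same global error although they distribute it
# differently over the five output variables.
same_global_error = math.isclose(global_mse(model_a), global_mse(model_b))
```

A training run that is stopped early can therefore end at either of two such models, which is one way to read the dependence on the initial parameters described above.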

Figure 5. Prediction of the time profiles of resultant state variables using a simple ARX model. Gray circle: measured data. Solid line: simulation data.

Figure 6. Score plots of the operational data of the AF process for the first two principal components. The dotted line represents the 95% confidence limit.

As can be seen in Figure 5, the model could accurately predict the process dynamics for the training part. Initially, the prediction accuracy for the validation part was also satisfactory. However, it was observed that the prediction accuracy for the validation part gradually declined with time. This decrease of the prediction

Figure 7. Prediction of the time profiles of resultant state variables using the moving-window-adaptive ARX model. Gray circle: measured data. Solid line: simulation data.

accuracy might be due to the change of the process behavior in the long-term perspective. Because the neural network model is usually derived from a limited historical database, it cannot describe well the slowly changing process dynamics that are not reflected in the training data sets. Figure 6 represents the score plot of the principal component analysis for the whole operational data sets used in this modeling study. It is well-known that the score plot can effectively represent the process behaviors by projecting the original multivariate data onto the latent variable space with reduced dimension, which is usually more efficient in analyzing the trend of data.22 As can be seen in Figure 6, the overall behavior of the AF process changed slowly but definitely with time.

Figure 7 shows a simulation result that was obtained by using the moving-window-adaptive ARX model. It should be noted that the prediction accuracy for the training part is identical with that of Figure 5 because the same model was initially used for the purpose of comparison. The difference is that in the moving-window ARX model the neural network model was adaptively updated by the algorithm described previously with the assumption that the validation data sets were incorporated into the database in an online manner. One can easily identify that the prediction accuracy for the validation part was greatly enhanced when the moving-window-adaptive ARX model was used. In the moving-window-adaptive ARX model, the model was repeatedly updated, capturing the information about the

Table 3. Average Mean Squared Errors^a for the Different Modeling Strategies

model                             training   validation
--------------------------------  ---------  ----------
ARX                               0.0138     0.0503
FIR-ARX                           0.0133     0.0551
moving-window-adaptive ARX        0.0138     0.0242
moving-window-adaptive FIR-ARX    0.0133     0.0277

^a The mean squared error values were calculated using scaled data.

process dynamics from the incoming operational data so that the slowly changing process behavior was effectively reflected in the model. The moving-window-adaptive ARX model described the AF process well even for the times corresponding to the process shutdown.

In this study, four different neural network modeling approaches were considered, and their prediction performances are compared in Table 3. The simulation was conducted 100 times for each modeling strategy to obtain a representative average mean squared error value. For all models, the mean squared errors of the validation part were higher than those of the training part, which implies that the neural network model inherently has a limitation in predicting the process dynamics not considered during the model training process. This limitation could be significantly reduced by introducing the concept of an adaptive model. It should be noted that within the same category, i.e., moving window adaptive or not, the performance of the FIR-ARX model was always inferior to that of the ARX

Table 4. Dynamic Situations^a That Were Selected for the Simulation of Process Dynamics

variable  situation 1  situation 2  average
--------  -----------  -----------  --------
QIN       189.87       179.98       183.06
TEQU      62.74        61.99        62.06
QR        470.12       500.10       472.91
TODIN     7627.70      7119.50      7850.80
pHIN      5.52         5.66         5.55
pHOUT     6.93         6.98         6.97
QGAS      776.45       655.66       778.58
QCO2      220.67       178.27       220.09
TODOUT    1780.40      1771.10      1921.80
QCH4      538.38       463.64       538.27

^a Situations 1 and 2 correspond to t = 1232 and 4951, respectively.

model despite the better training performance. This might be due to the overfitting problem because the FIR-ARX model had more model parameters than the ARX model. Among the tested models, the moving-window-adaptive ARX model showed the best prediction performance and was used in the subsequent simulation study.

Simulation of Process Dynamics and Some Limitations of the Model. To confirm the feasibility of the moving-window-adaptive ARX model, several dynamic simulations were conducted. First, two distinct dynamic situations with different sets of model input variables were selected from the operation database, and the effects of control variables, i.e., TODIN and pHIN, on the instantaneous performance of the AF process were investigated. Table 4 lists the selected dynamic situations, and Figure 8 shows the simulation results. The simulations showed that under both tested dynamic situations TODOUT decreased while QCH4 increased with a decrease of pHIN, which indicates that the performance of the AF process could be instantaneously enhanced by lowering pHIN. This result seems to be quite consistent with the actual process dynamics. As illustrated previously, the AF process was usually operated under the rate-limiting condition in acidogenesis of TA. Moreover, the pHOUT was maintained around 7.0, which is favorable to methanogens but relatively higher than the optimum pH range of acidogens.23 Therefore, the control action to lower the reactor pH would stimulate the activities of acidogens and result in a temporary increase of the overall performance. On the other hand, the simulations showed that both TODOUT and QCH4 increased with increasing TODIN. Because the objective of the AF process was the reduction of TOD as well as the production of methane gas, there would be a tradeoff problem in determining the optimum TODIN if the model was used for the process optimization. It should be noted that the sensitivity of process dynamics to the control variables could be changeable

Figure 8. Comparison of the instantaneous dynamics of the AF process at different dynamic situations. (a) Situation 1: corresponding to t = 1232. (b) Situation 2: corresponding to t = 4951.

Figure 9. Long-term simulation results of the CV effects on the performance of the AF process at dynamic situation 2 (t = 4951). (a) Effect of TODIN for fixed pHIN = 5.66. (b) Effect of pHIN for fixed TODIN = 7120 g/m3. The fixed ones are the actual values of the control variables used in the operation of the AF process.

depending on the dynamic situations. For instance, with respect to TODIN, the sensitivity of TODOUT at situation 1 was relatively higher than that at situation 2, whereas the opposite held for the sensitivity of QCH4 (Figure 8). The major difference between the two dynamic situations is the availability of an additional process capacity. As can be seen from Table 4, the AF process was operated near its full capacity at situation 1. In this case, the control action of increasing TODIN would result in only a slight additional degradation of incoming TOD, leading to a large increase in TODOUT but a small increase in QCH4. On the contrary, when an additional process capacity was available (at situation 2, the AF process was recovering its normal performance after temporary process shutdown), the control action of increasing TODIN would result in a significant additional degradation of incoming TOD, leading to a large increase in QCH4 but a small increase in TODOUT. The sensitivity of process dynamics with respect to pHIN is rather ambiguous. It seems that this ambiguity might be due to the highly complicated dynamic nature of the AF process.

Because the neural network model can be easily incorporated with an online data acquisition system, there have been several attempts to apply the neural network model to the real-time control of the target process.24-26 For application to control, the neural network model should have long-term prediction ability because the optimum set of control variables is usually determined from the changes of process dynamics during a certain time horizon. To investigate the possibility of applying the present moving-window-adaptive ARX model to process control, the long-term effects of the control variables on the process performance were also simulated in this study. For most dynamic situations, the moving-window-adaptive ARX model could stably predict the long-term changes of process dynamics, leading to a steady state corresponding to the tested values of the control variables. For a certain dynamic situation, however, some simulated time profiles were not stable and finally converged on suspicious steady states that severely deviated from others depending on the tested values of the control variables (an example is shown in Figure 9). It seems that this instability in long-term prediction might be one of the general limitations of the neural network model. The prediction ability of the neural network model is basically achieved by the model training process, which only focuses on the identification of the instantaneous dynamic relationships between actually measured model input variables at time t and undetermined model output variables at time t + dt with an acceptable error. When the model is used in long-term prediction, the overall prediction error gradually increases with time and there are more chances to encounter a dynamic situation that was never considered in the model training. Once such an unfamiliar dynamic situation is encountered, the prediction result after that time is thoroughly unreliable even if it converges on a steady state.

Although the moving-window-adaptive ARX model showed the best prediction ability compared to others, it seems to still contain some limitations such as ambiguity in sensitivity analysis and instability in long-term simulation. We believe that these limitations could be resolved if some variables representing the wastewater composition were included in the model. It is also expected that the unrelenting use of the model


will eventually yield a fully satisfactory model covering the entire dynamics of the modeled process, because the model is updated adaptively.

Conclusions

To provide a guideline for the operational optimization and control strategy of a full-scale AF process, different neural network modeling approaches were investigated, focusing especially on the optimum structure of the neural network to avoid the overfitting problem. With the aid of a modified cross-validation technique, the optimum structure of the neural network could be determined. Despite the optimized model structure, however, the conventional neural network models, i.e., the simple ARX model and the FIR-ARX model, showed limited prediction accuracy because of the slowly changing dynamics of the AF process. The prediction accuracy was greatly enhanced when the concept of the moving-window-adaptive neural network was incorporated. Among the tested models, the moving-window-adaptive ARX model showed the best prediction ability. Several dynamic simulation results supported the conclusion that the model could more effectively describe the complicated dynamic nature of the AF process, though some limitations were also found. We believe that the present modeling approach using a moving-window-adaptive neural network can be widely applied to the exploration of the databases of various wastewater treatment processes that have complex and time-varying process dynamics.

Acknowledgment

This work was financially supported in part by the Samsung Petrochemical Co. Ltd. and by the Korea Science and Engineering Foundation through the Advanced Environmental Biotechnology Research Center at Pohang University of Science and Technology.

Literature Cited

(1) Switzenbaum, M. S. Anaerobic fixed film wastewater treatment. Enzyme Microb. Technol. 1983, 5, 242.
(2) Barber, W. P.; Stuckey, D. C. The use of the anaerobic baffled reactor (ABR) for wastewater treatment: a review. Water Res. 1999, 33, 1559.
(3) Gupta, N.; Gupta, S. K.; Ramachandran, K. B. Modelling and simulation of anaerobic stratified biofilm for methane production and prediction of multiple steady states. Chem. Eng. J. 1997, 65, 37.
(4) Wu, C. S.; Huang, J. S.; Yan, J. L.; Jih, C. G. Consecutive reaction kinetics involving distributed fraction of methanogens in fluidized-bed bioreactors. Biotechnol. Bioeng. 1998, 57, 367.
(5) Angelidaki, I.; Ellegaard, L.; Ahring, B. K. A comprehensive model of anaerobic bioconversion of complex substrates to biogas. Biotechnol. Bioeng. 1999, 63, 363.
(6) Bernard, O.; Hadj-Sadok, Z.; Dochain, D.; Genovesi, A.; Steyer, J. P. Dynamic model development and parameter identification for an anaerobic wastewater treatment process. Biotechnol. Bioeng. 2001, 75, 424.

(7) Batstone, D. J.; Keller, J.; Angelidaki, I.; Kalyuzhnyi, S. V.; Pavlostathis, S. G.; Rozzi, A.; Sanders, W. T. M.; Siegrist, H.; Vavilin, V. A. The IWA anaerobic digestion model no 1 (ADM1). Water Sci. Technol. 2002, 45, 65.
(8) Tay, J. H.; Zhang, X. A fast predicting neural fuzzy model for high-rate anaerobic wastewater treatment systems. Water Res. 2000, 34, 2849.
(9) Lee, D. S.; Jeon, C. O.; Park, J. M.; Chang, K. S. Hybrid neural network modeling of a full-scale industrial wastewater treatment process. Biotechnol. Bioeng. 2002, 78, 670.
(10) Prechelt, L. Automatic early stopping using cross validation: quantifying the criteria. Neural Networks 1998, 11, 761.
(11) Premier, G. C.; Dinsdale, R.; Guwy, A. J.; Hawkes, F. R.; Hawkes, D. L.; Wilcox, S. J. A comparison of the ability of black box and neural network models of ARX structure to represent a fluidized bed anaerobic digestion process. Water Res. 1999, 33, 1027.
(12) Sinha, S.; Bose, P.; Jawed, M.; John, S.; Tare, V. Application of neural network for simulation of upflow anaerobic sludge blanket (UASB) reactor performance. Biotechnol. Bioeng. 2002, 77, 806.
(13) Teppola, P.; Mujunen, S. P.; Minkkinen, P. Partial least squares modeling of an activated sludge plant: a case study. Chemom. Intell. Lab. Syst. 1997, 38, 197.
(14) Nikolaou, M.; Vuthandam, P. FIR model identification: parsimony through kernel compression with wavelets. AIChE J. 1998, 44, 141.
(15) Anders, U.; Korn, O. Model selection in neural networks. Neural Networks 1999, 12, 309.
(16) Qin, S. J. Recursive PLS algorithms for adaptive data modeling. Comput. Chem. Eng. 1998, 22, 503.
(17) Vijaysai, P.; Gudi, R. D.; Lakshminarayanan, S. Identification on demand using a blockwise recursive partial least-squares technique. Ind. Eng. Chem. Res. 2003, 42, 540.
(18) Kleerebezem, R.; Mortier, J.; Hulshoff Pol, L. W.; Lettinga, G. Anaerobic pre-treatment of petrochemical effluents: terephthalic acid wastewater. Water Sci. Technol. 1997, 36, 237.
(19) Cheng, S. S.; Ho, C. Y.; Wu, J. H. Pilot study of UASB process treating PTA manufacturing wastewater. Water Sci. Technol. 1997, 36, 73.
(20) Kleerebezem, R.; Hulshoff Pol, L. W.; Lettinga, G. The role of benzoate in anaerobic degradation of terephthalate. Appl. Environ. Microbiol. 1999, 65, 1161.
(21) Hagan, M. T.; Menhaj, M. Training feedforward networks with the Marquardt algorithm. IEEE Trans. Neural Networks 1994, 5, 989.
(22) Kourti, T.; MacGregor, J. F. Process analysis, monitoring and diagnosis, using multivariate projection methods. Chemom. Intell. Lab. Syst. 1995, 28, 3.
(23) Bailey, J. E.; Ollis, D. F. Biochemical Engineering Fundamentals; McGraw-Hill: New York, 1986.
(24) Dirion, J. L.; Cabassud, M.; Le Lann, M. V.; Casamatta, G. Development of adaptive neural networks for flexible control of batch processes. Chem. Eng. J. 1996, 63, 65.
(25) Emmanouilides, C.; Petrou, L. Identification and control of anaerobic digesters using adaptive, on-line trained neural networks. Comput. Chem. Eng. 1997, 21, 113.
(26) Holubar, P.; Zani, L.; Hager, M.; Froschl, W.; Radak, Z.; Braun, R. Advanced controlling of anaerobic digestion by means of hierarchical neural networks. Water Res. 2002, 36, 2582.

Received for review November 1, 2004
Revised manuscript received February 6, 2005
Accepted April 8, 2005
IE048944A