Environ. Sci. Technol. 2001, 35, 157-162
Use of Neural Network Models To Predict Industrial Bioreactor Effluent Quality

GLENDA M. PIGRAM
Chevron Chemical Co., 100 Chevron Way, Richmond, California 94802

THOMAS R. MACDONALD*
Environmental Science Department, USF, 2130 Fulton Street, San Francisco, California 94117-1080
Engineered bioreactors are useful tools for degrading wastes from crude oil refining facilities. One such bioreactor forms part of the wastewater remediation process used at a refinery in the San Francisco Bay Area. The flow rate and chemical concentrations of the waste vary, and it is necessary to be able to predict the efficiency of the reactor degradation process for this varied input. The complex biological, physical, and chemical processes of the reactor make deterministic modeling unsuitable. Therefore, predictive modeling for this system was performed using a neural network model. A predictive, time-series neural network model requires a complete data set. Often, in the case of a large industrial facility, data are missing. Various techniques can be used to reconstruct missing data, but comparisons of techniques have not been performed for large-scale remediation processes. In this manuscript, four techniques are used for reconstructing missing data to examine which ones provide superior predictive capabilities. It was found that the interpolated and moving average values methods provided the best predictions. The mean and median replacement methods, commonly used in neural network modeling, provided much poorer predictions. Another goal of this study is to determine which water quality parameters are more accurately predicted than others. In this study, pH was the most accurately predicted, while ammonia and total phenolics concentrations were the least accurately predicted.
* Corresponding author phone: (415)422-5895; fax: (415)422-6363; e-mail: [email protected].
10.1021/es001264o CCC: $20.00. Published on Web 12/01/2000. © 2001 American Chemical Society.

Introduction
Neural networks have been used as models for prediction of biological processes, particularly in the areas of wastewater treatment and bioreactors (1-3). Other studies have been performed on the use of neural networks to control bioreactors in both simulated and laboratory-scale systems (4-7). Application of neural networks to industrial-scale bioreactors generally has not been performed due to practical considerations. Industrial facilities cannot be controlled as readily as laboratory-scale systems. For example, in this study, the bioreactor covers over 28 acres, and factors such as flow rate and mixing are not easily controlled in such a large system. In cases where neural networks have been applied to large-scale municipal wastewater treatment systems, either
complete data sets existed or the data set was weekly averaged (8-10). Previous studies have replaced missing data prior to neural network modeling in other applications such as prediction of business earnings and identification of phytoplankton species (11, 12). These studies typically have only a few missing values and one or two parameters at most, and they use techniques such as averaging, linear regression, maximum likelihood estimation, and even additional neural networks to reconstruct the missing values. There is an apparent lack of studies applying these same data reconstruction techniques to wastewater treatment systems, particularly those of industrial scale. The importance of the present study is that it applies several different techniques to reconstruct missing values for developing neural network models used to predict daily bioreactor effluent water quality in an industrial system.

Industrial facilities often have a large amount of historical daily data available that include operational and seasonal variations. These variations need to be taken into account when making predictions, so it is critical to use as much of these data as possible. Unfortunately, data at these facilities are often incomplete, or certain parameters may be missing. Time series without missing data may therefore exist only for short periods, which do not provide a complete picture of facility variations.

We used daily input and output sample data for a period of approximately 1.5 years from a bioreactor that is part of the wastewater treatment system at a refinery in the San Francisco Bay Area. Thirteen parameters were sampled daily, but there were days when samples were not taken or for which data are missing. These sample data are used to make predictions of six output parameters from the reactor. The complexity of the bioreactor system makes deterministic modeling infeasible. Therefore, a neural network model was used to predict reactor output. The data time series has missing data points, and time-series neural networks require a complete data set. The cases with one or more missing values could not be deleted because that would leave an incomplete time series as well as an insufficient number of complete cases to train a network for such a complex process. We needed to find techniques for replacing large amounts of missing data for modeling our industrial facility. A requirement of these techniques is that they be straightforward enough to allow ease of use and timely predictions for practical integration into daily industrial practice.

In this paper, we examine four techniques for replacing the missing data, and neural network models are developed from each of the resulting data sets. The accuracy of the neural network predictions is compared to determine the best reconstruction technique. We also wanted to understand which of the parameters could be best predicted with a neural network model.
Experimental Section
Wastewater Treatment Facility. The biological wastewater treatment system studied in this paper is part of a larger wastewater treatment facility within a petroleum refinery in California. Common contaminants of the wastewater in this facility are petroleum hydrocarbons, bases, metals, and settleable and dissolved solids. Figure 1 is an overview of the entire wastewater treatment system. Three composite wastewater streams pass through American Petroleum Institute (API) separators where the aqueous and hydrocarbon phases are separated. The surface oils and settleable solids are removed. The surface oil is recycled to a processing plant, and the solids are handled as hazardous waste. The aqueous phase from the API separators is then fed via gravity through the bioreactor for the second phase of treatment. After treatment in the bioreactor, the wastewater is divided into two streams. One stream passes through wetlands for treatment prior to filtration with activated carbon, testing, and discharge to the estuary via a deep-water outfall. The other stream is also filtered with activated carbon, tested, and then discharged to the estuary via a deep-water outfall. During the dry season, approximately 4 million gallons of treated water are discharged to the estuary each day (13). Proper maintenance of all units in the treatment system is necessary to maintain effective treatment of this volume of wastewater. An effective predictive model of the bioreactor would aid in operation and maintenance of this system and also aid the refinery in better control of the effluent water quality to meet their effluent discharge requirements.

FIGURE 1. Overview of refinery wastewater treatment system.

Bioreactor. The bioreactor is an uncovered 28.9 acre lagoon that is designed to treat up to 16 000 pounds biological oxygen demand (BOD) per day (13). The bioreactor is operated in a continuous manner, and it consists of two main sections separated by a wall as shown in the bioreactor portion of Figure 1. The first section of the bioreactor treats the wastewater via aeration and bacterial activity. Approximately 1100 submerged static aerators discharge air into the bioreactor sections identified as quadrants I and II. Bacteria grow and feed on the waste, thus degrading the organic material. The second section of the bioreactor is composed of quadrants III and IV. These quadrants act as settling areas where solids and microorganisms are separated from the water and deposited on the bioreactor bottom. The bioreactor effluent may be recycled to the influent when needed.

FIGURE 2. Dye study of bioreactor flow.
TABLE 1. Neural Network Input and Output Parameters

input parameters                                   output parameters
TOC (ppm)                                          TOC (ppm)
surfactants (ppm)                                  surfactants (ppm)
pH                                                 pH
ammonia (ppm)                                      ammonia (ppm)
salinity (ppt)                                     salinity (ppt)
total phenolics (ppm)                              total phenolics (ppm)
organic acids (ppm)
bioreactor quadrant I dissolved oxygen (ppm)
bioreactor quadrant II dissolved oxygen (ppm)
bioreactor quadrant III dissolved oxygen (ppm)
bioreactor aerator air flow (mscfm)
bioreactor level (ft)
bioreactor effluent flowrate (mmgpd)

A tracer study was conducted to evaluate the residence time distribution in the bioreactor as shown in Figure 2. It was found that the residence time in the bioreactor ranges from several days to more than 30 days with an average residence time of approximately 10 days. Not only is there some variation in residence time, but waste stream compositions may also vary due to operational changes within individual process units in the refinery. These variations can affect the ability of the bioreactor to treat the waste on a single pass through the system. A better understanding of the relationship between bioreactor influent and effluent water quality parameters would enhance control and utilization of the bioreactor.

Bioreactor influent was measured daily for 515 days via 24-h composite samples for pH, total organic carbon (TOC), ammonia, surfactants, salinity, total phenolics, and various metals. Influent composite samples were collected at the point marked “Bio In” in Figure 1. The bioreactor
effluent was also measured daily via 24-h composite samples for these same parameters at the point marked “Bio Out” in Figure 1. Within each of the four quadrants of the bioreactor, dissolved oxygen (DO) was measured daily. In addition, organic acid, bioreactor level, rate of aeration, and effluent discharge flow rate were measured daily. All of these parameters are used in the neural network model.

Data Set. The data set for training and validating the neural network model consists of daily measurements of the bioreactor parameters for 515 days from January 1993 to May 1994. Table 1 lists the bioreactor parameters as neural network inputs and outputs. There are 13 inputs consisting of measurements of bioreactor influent and bioreactor physical characteristics. There are six output bioreactor effluent parameters predicted by the neural network. Metal concentrations were not included as parameters since these values were low, and there was no significant change between influent and effluent concentrations. A complete data set of 515 days of data that has 19 parameters per day should contain a total of 9785 individual data points. In this data set, however, only 7298 individual data points exist. The missing data points, or 25.4% of the data set, must be reconstructed to develop the neural network model for a continuous time series.

Missing Data Reconstruction Techniques. Four different techniques were chosen to reconstruct the missing values in the data set. These techniques consisted of straightforward arithmetic calculations, which could readily be performed daily on the industrial process. The first technique replaces the missing values with the mean value of the parameter. The second technique replaces the missing data with the median value of the parameter. The third technique replaces the missing data with a moving 10-day average value. The fourth technique replaces missing data using a linear interpolation between the last known values preceding and following the missing data values.

Neural Network Modeling. A neural network is a mathematical simulation of the neurological functioning of a brain. A neural network consists of neurons and connections between neurons that respond to inputs in a manner similar to the way a brain functions. The neurons or “nodes” are grouped in layers. A multilayer perceptron (MLP) neural network consists of an input layer, a number of hidden layers, and an output layer. The nodes within each layer are connected to the nodes in the next layer by weighted connections, and a transfer function is applied as the signals are passed from the hidden to the output layers. This type of neural network is called a feedforward neural network since the connections are all forward in direction. A neural network is “trained” by presenting it with known inputs and outputs, and it can learn the patterns of these inputs and outputs by adjusting the weights of the connections. During the neural network modeling process, the weights of the connections are adjusted until the error between the predicted outputs and the actual outputs is minimized. One way of efficiently adjusting the weights to minimize the error is through the conjugate gradient descent method. An effective neural network can then accurately predict outputs when it is given data previously unseen by the network, known as the test set. The conjugate gradient training algorithm is used to minimize an error function, such as the sum-of-squared error

$$e = \sum_{i=1}^{N} \left\{ y_{i,\mathrm{calculated}} - y_{i,\mathrm{desired}} \right\}^{2} \qquad (1)$$

where $N$ is the total number of output values used for training and $y$ represents the output values (14, 15). Figure 3 represents an MLP neural network, which is well suited for regression problems with continuous parameters (15-17). Note that any number of additional hidden layers could be added, but typically an MLP has between one and three hidden layers (15-17). The conjugate gradient descent method allows efficient training of MLPs. Conjugate gradient descent is a batch method where the individual weights are all updated at once according to

$$\mathbf{w}_{n+1} = \mathbf{w}_{n} + \eta_{n} \mathbf{d}_{n} \qquad (2)$$

where $n$ is the iteration number, $\mathbf{w}$ is the vector of all the individual weights, $\eta$ is the step length, and $\mathbf{d}$ is the step direction vector found by performing line searches across the multidimensional error surface of the network. The direction of the line search for iteration $n$, $\mathbf{d}_n$, is chosen to minimize the error in that particular direction while still ensuring that the previously minimized directions for iterations 1 to $n-1$ remain minimized. This provides an efficient method for large networks that are typical of time-series problems. The direction vector for each successive iteration is found by

$$\mathbf{d}_{n} = -\mathbf{g}_{n} + \beta_{n} \mathbf{d}_{n-1} \qquad (3)$$

where $\mathbf{g}_n$ is the gradient of the error function and $\beta_n$ is a momentum parameter calculated for each iteration. $\beta_n$ is found using the Polak-Ribiere method modified to ensure convergence, which generally yields the best results (15, 16):

$$\beta_{n} = \max\left\langle \frac{\mathbf{g}_{n}^{T} (\mathbf{g}_{n} - \mathbf{g}_{n-1})}{\mathbf{g}_{n-1}^{T} \mathbf{g}_{n-1}},\; 0 \right\rangle \qquad (4)$$

FIGURE 3. Schematic of multilayer perceptron neural network showing weighted connections between input, hidden, and output layers.

The value of $\eta_n$ in eq 2 is found using Brent’s method to perform a line search to find the value of $\eta_n$ that minimizes the error function, $e$. This iterative procedure is performed until the weights are found to provide a network with an acceptably low error and that generalizes well.

The input values for the time series problem are used to calculate the output value at a given time step from measured values from prior time steps (15, 17). The values from previous time steps that are input into the neural network include all of the parameters measured at the bioreactor influent point and at the effluent from the bioreactor from a set number of previous time steps. The number of previous time steps that should be used is based upon the physical problem as well as the ability of the network to provide improved results (17). The reactor’s residence time distribution in Figure 2 indicates an average residence time of approximately 10 days that tails off gradually after approximately 12 days. Using this residence time distribution, we performed multiple neural network runs to find the optimal number of prior time steps to use in the model. From these runs, it was found that 30 days of prior time steps should be used. Thus, with 13 parameter values measured at the bioreactor inflow and six parameters measured at the outflow over 30 prior time steps, the output at the given time step is predicted using a neural network with 570 input values.

The final network architecture was determined through multiple runs to test various architectures. These included multilayer perceptrons with both one and two hidden layers, which have been shown to be effective for bioreactor problems (17, 18). It was found that networks with two hidden layers provided improved results without loss of generality, which agrees with previous findings for fermentation reactors (17). The optimal number of nodes for the hidden layers varied slightly for each output parameter, but the performance of these architectures differed only slightly, so that the overall results remained nearly the same. The final architecture used to compare the different missing data replacement methods and output parameter predictions has an input layer of 570 values, a first hidden layer of 20 nodes followed by a hidden layer of six nodes, and then the output node for the desired output parameter value. The network was run for each of the six output parameters for each of the four missing data replacement methods. The input data were normalized using eq 5

$$y_{i} = \frac{x_{i} - x_{\min}}{x_{\max} - x_{\min}} \qquad (5)$$

where $y_i$ is the normalized value, $x_i$ is the parameter value, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum measured values, respectively, for the particular parameter. Normalization of the input data has been shown to reduce the error
in the neural network predictions (15-17, 19).

TABLE 2. Error Values and Correlation Coefficients for Network-Predicted Bioreactor Outputs

technique       measurement   salinity  phenolics  NH3    TOC    pH     surfactants
mean value      RMS           0.143     0.109      0.080  0.104  0.078  0.112
                r             0.785     0.621      0.936  0.235  0.874  0.716
                >30% error    0.215     0.531      0.628  0.200  0.040  0.257
median value    RMS           0.181     0.131      0.105  0.091  0.070  0.101
                r             0.532     0.377      0.886  0.553  0.907  0.777
                >30% error    0.367     0.605      0.690  0.126  0.048  0.197
10-day av       RMS           0.149     0.093      0.060  0.073  0.056  0.117
                r             0.717     0.726      0.962  0.729  0.936  0.689
                >30% error    0.262     0.421      0.553  0.092  0.025  0.280
interpolation   RMS           0.139     0.101      0.052  0.072  0.055  0.088
                r             0.756     0.664      0.972  0.739  0.939  0.839
                >30% error    0.215     0.500      0.520  0.105  0.025  0.184

A sigmoidal transfer function was also used as in eq 6 (14-17).

$$f(x) = \frac{1}{1 + e^{-x}} \qquad (6)$$
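The input preparation described above can be sketched as follows, assuming NumPy and a synthetic 515-by-19 array in place of the refinery data, which are not reproduced here. The sketch applies the min-max scaling of eq 5, builds one input row from the 30 preceding days of all 19 parameters (570 values), and passes the rows through an untrained 570-20-6-1 network using the sigmoid of eq 6. The function names, the random weights, and the use of the sigmoid on every layer including the output are illustrative assumptions, not details taken from the study.

```python
import numpy as np

def min_max_normalize(x):
    """Scale each column to [0, 1] as in eq 5 (assumes no column is constant)."""
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    return (x - x_min) / (x_max - x_min)

def sigmoid(x):
    """Sigmoidal transfer function of eq 6."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

def lagged_inputs(data, n_lags):
    """Flatten the n_lags days preceding each day into one input row.

    data has shape (n_days, n_params); the result has shape
    (n_days - n_lags, n_lags * n_params).
    """
    data = np.asarray(data, dtype=float)
    rows = [data[t - n_lags:t].ravel() for t in range(n_lags, data.shape[0])]
    return np.vstack(rows)

def forward(x, layers):
    """Feedforward pass; each layer is a (weights, biases) pair."""
    a = x
    for w, b in layers:
        a = sigmoid(a @ w + b)
    return a

rng = np.random.default_rng(0)
daily = rng.random((515, 19))            # synthetic stand-in, not plant data
normalized = min_max_normalize(daily)
inputs = lagged_inputs(normalized, n_lags=30)

# Hypothetical, untrained weights for a 570-20-6-1 architecture.
sizes = [570, 20, 6, 1]
layers = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
predictions = forward(inputs, layers)

print(inputs.shape)       # (485, 570)
print(predictions.shape)  # (485, 1)
```

The first 30 days serve only as lagged inputs, which is why 515 days yield 485 usable patterns.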
A sigmoidal transfer function is commonly used, and it has been successfully applied to other neural network models of bioreactor systems (5, 6, 17).

There were a total of 515 daily patterns, of which the first 30 were used only as time-series inputs. For each neural network model, the remaining data set of 485 daily patterns was divided into a training set, a verification set, and a test set. Every fourth pattern was used for the verification set, every fifth pattern was used for the test set, and the remaining 291 patterns were used as the training set. This division provides a large amount of data for training the network as well as adequate data for verification and testing. By dividing the sets up across the 485 days, the training, verification, and testing sets are more likely to include data from a wider range of facility operation conditions than if the data sets had been divided up sequentially. The network was trained using the training set to minimize the error, and the network was checked with the verification set after each iteration to prevent overlearning of the training set and loss of ability to generalize (15-17). If the error for the training set decreases at the expense of an increased verification set error, the training is halted. As a final check, the test set is used on the network to be sure that the network performs and generalizes well.

Comparison of neural network models has been made in other studies using the root-mean-square (RMS) error of the predicted vs actual outputs (20). RMS error is calculated by

$$\mathrm{RMS\ error} = \left\{ \frac{1}{x} \sum_{x} (d_{x} - y_{x})^{2} \right\}^{1/2} \qquad (7)$$

where $d_x$ and $y_x$ are actual and network-predicted values of a given parameter, respectively, and the sum runs over the $x$ values being compared. Since our data set includes some outputs that are reconstructed values and not actual measurements, those reconstructed values are not included in the calculation of the RMS error.
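The partitioning and scoring steps above can be sketched as follows, assuming NumPy. Counting patterns from 1 and letting a pattern that is both a fourth and a fifth pattern belong to both held-out sets is one reading that reproduces the stated 291 training patterns; the text does not say how such overlaps were handled. The `evaluate` function and its relative-error definition of the 30% threshold are likewise illustrative assumptions.

```python
import numpy as np

# Interleaved split of the 485 usable daily patterns (1-based numbering).
pattern = np.arange(1, 486)
is_verification = pattern % 4 == 0          # every fourth pattern
is_test = pattern % 5 == 0                  # every fifth pattern
is_training = ~(is_verification | is_test)  # everything else

print(int(is_training.sum()))               # 291

def evaluate(actual, predicted, measured):
    """Score predictions on measured (non-reconstructed) values only.

    Returns the RMS error of eq 7, the correlation coefficient r, and the
    fraction of predictions differing from the actual value by more than 30%.
    """
    measured = np.asarray(measured, dtype=bool)
    d = np.asarray(actual, dtype=float)[measured]
    y = np.asarray(predicted, dtype=float)[measured]
    rms = np.sqrt(np.mean((d - y) ** 2))
    r = np.corrcoef(d, y)[0, 1]
    frac_over_30 = np.mean(np.abs(y - d) > 0.30 * np.abs(d))
    return rms, r, frac_over_30

# Toy values: the third point is a reconstructed value and is skipped.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 2.0, 2.0, 4.0]
measured = [True, True, False, True]
rms, r, frac_over_30 = evaluate(actual, predicted, measured)
```

Under this reading, 121 patterns are used for verification and 97 for testing, with the 24 patterns divisible by 20 appearing in both.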
Results and Discussion
Table 2 shows a comparison of all four techniques using the RMS error values, correlation coefficient, r, and the fraction of values that are greater than 30% difference between the predicted and actual values. The RMS error provides a general quantity to judge overall performance, the correlation coefficient indicates whether the general trend is followed, and the fraction of values with errors greater than 30% helps indicate how well the network predicts extreme values. The parameter values used to calculate these quantities were normalized according to eq 5, so that direct comparisons could be made without bias toward a parameter’s magnitude or range. All of these results are calculated on measured data only and not on the reconstructed values.

These results show that a neural network approach to modeling an industrial-scale biological wastewater treatment system is feasible. Some parameters, such as pH, are predicted with little error. Other parameters, such as total phenolics, are more difficult to predict. Overall, the interpolation technique provided the most accurate results followed closely by the 10-day moving average technique. The mean and median techniques generally provided much poorer results. The interpolation technique had the lowest RMS error values for five bioreactor output parameters and the highest correlation coefficients for four bioreactor output parameters. The 10-day moving average had the lowest RMS error and the highest correlation coefficient for one output parameter, and the mean value technique had the highest correlation coefficient for one parameter. Thus, the interpolation technique had the best overall error and ability to replicate trends.

In many cases, the correlation coefficient is quite large, indicating excellent trend predictions for the networks. For example, five of the six parameters for both the interpolation and 10-day moving average techniques showed correlation coefficients greater than 0.70 between measurements and predictions. Even the poorer performing mean and median techniques showed four and three parameters, respectively, with correlation coefficients greater than 0.70.

FIGURE 4. Model vs actual pH values for neural network model based on interpolated technique for data reconstruction.

The prediction of pH illustrates the effect of the reconstruction technique on the ability of the network to accurately predict this effluent parameter.
Figure 4 shows the model vs actual values for pH using the interpolation technique for reconstructing the missing values. This technique gave the lowest RMS error, highest correlation coefficient, and lowest fraction of outputs with error greater than 30% for pH prediction. For comparison, Figure 5 shows the model vs actual values for pH using the mean value technique for missing data reconstruction. This technique gave the highest RMS error and lowest correlation coefficient for pH prediction. Visual comparison of the charts shows that the interpolation technique is better able to predict the peaks and valleys in the data set, while the mean value technique is biased toward the mean and does not do as good a job predicting pH values with larger variations from the mean. Overall, the ability to predict the pH values using the neural networks is excellent.

FIGURE 5. Model vs actual pH values for neural network model based on mean value technique for data reconstruction.

In addition to the ability of the neural network to provide a strong correlation with the data, it is also important for plant operations that the predictions not differ too greatly from the measurements. These results again indicate the superior performance of the interpolation technique followed by the 10-day moving average and then the mean and median replacement techniques. pH values had the lowest fraction of predictions that differed from measurements by more than 30%, followed by TOC and surfactants. Ammonia and total phenolics had the largest fraction. The ammonia error may be caused by error in the analytical test method. A 30% error in the mean ammonia concentration of 2.3 ppm is approximately ±0.69 ppm, so errors in the analytical method may account for a portion of the error. Similarly for total phenolics, there may be errors in the analytical test method. It is also possible that the neural network model does not include enough input parameters that affect total phenolics and ammonia concentrations in the bioreactor effluent. Figure 6 shows the model vs actual values of ammonia concentration using the interpolation technique to reconstruct missing values.
This method gave the most accurate prediction of this parameter. For comparison, Figure 7 shows the model vs actual values of ammonia concentration using the median value technique for reconstruction. This model was the least accurate predictor of ammonia concentration. A comparison of the two charts shows a definite improvement by using the interpolation technique. Using this technique, the ammonia measurements could be predicted reasonably well. Thus, these results indicate that neural networks can be effective in making predictions for each of the parameters using a practical data reconstruction technique. The lack of data can therefore be overcome when developing a neural network model for an industrial-scale bioreactor.

FIGURE 6. Model vs actual ammonia concentrations for neural network model based on interpolation technique for data reconstruction.

FIGURE 7. Model vs actual ammonia concentrations for neural network model based on median value technique for data reconstruction.

There are limitations to this study and the neural network models. Random sampling and measurement errors certainly occurred, but assuming they are truly random as well as small in magnitude compared to the true signal, this should not have had a major effect on the results. In addition, the model was constructed and validated using data over a 1.5 year period. If the bioreactor operates in conditions different from those during which samples were collected, it is unknown how precisely and accurately the neural network can predict the effluent parameters (15, 16). Further development of this model may apply these techniques to include upstream waste source data and correlate these data to the bioreactor effluent water quality.
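As a concrete illustration of the four reconstruction techniques compared in this study, the sketch below fills gaps (marked as NaN) in a single parameter series, assuming NumPy. The text does not specify how the 10-day moving average was computed, so the trailing window over the preceding days (including values already filled) and the fallback to the overall mean are assumptions, as are the function names and the toy series.

```python
import numpy as np

def fill_mean(x):
    """Replace gaps with the mean of the measured values."""
    x = np.array(x, dtype=float)
    x[np.isnan(x)] = np.nanmean(x)
    return x

def fill_median(x):
    """Replace gaps with the median of the measured values."""
    x = np.array(x, dtype=float)
    x[np.isnan(x)] = np.nanmedian(x)
    return x

def fill_moving_average(x, window=10):
    """Replace each gap with the average of the preceding `window` days.

    Assumption: a trailing window that may include earlier filled values;
    a gap on the first day falls back to the overall mean.
    """
    x = np.array(x, dtype=float)
    fallback = np.nanmean(x)
    for t in np.flatnonzero(np.isnan(x)):   # ascending order
        recent = x[max(0, t - window):t]
        recent = recent[~np.isnan(recent)]
        x[t] = recent.mean() if recent.size else fallback
    return x

def fill_interpolation(x):
    """Linearly interpolate between the known values around each gap.

    np.interp holds the nearest known value for gaps at either end.
    """
    x = np.array(x, dtype=float)
    t = np.arange(x.size)
    known = ~np.isnan(x)
    x[~known] = np.interp(t[~known], t[known], x[known])
    return x

# Toy pH-like series with three missing days (not plant data).
series = [7.0, np.nan, np.nan, 7.6, 8.2, np.nan, 7.8]
filled = {
    "mean": fill_mean(series),
    "median": fill_median(series),
    "moving average": fill_moving_average(series),
    "interpolation": fill_interpolation(series),
}
for name, values in filled.items():
    print(name, np.round(values, 2))
```

On this toy series the mean and median methods insert the same constant (7.65 and 7.7) into every gap, while interpolation follows the local trend (7.2, 7.4, and 8.0), which is consistent with the observation above that the mean value technique is biased toward the mean.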
Literature Cited
(1) Spall, J. C.; Cristion, J. A. IEEE Trans. Syst., Man, Cybern. 1997, 27, 369.
(2) Chaudhuri, B.; Modak, J. M. Bioprocess Eng. 1998, 19, 71.
(3) Zhao, H.; Hao, O. J.; McAvoy, T. J.; Chang, C. J. Environ. Eng. 1997, 123, 311.
(4) Gorinevsky, D. J. Dyn. Syst., Meas., Control. 1997, 119, 94.
(5) Muralikrishnan, G.; Chidambaram, M. Bioprocess Eng. 1995, 12, 35.
(6) Chtourou, M.; Najim, K.; Roux, G.; Dahhou, B. Bioprocess Eng. 1993, 8, 251.
(7) Normandin, A.; Thibault, J.; Grandjean, B. P. A. Bioprocess Eng. 1994, 10, 109.
(8) Boger, Z. ISA Trans. 1992, 31, 25.
(9) Zhang, Q.; Stanley, S. Water Resour. 1997, 31, 2340.
(10) Rodriguez, M. J.; West, J. R.; Powell, J.; Serodes, J. B. Water Sci. Technol. 1997, 36, 317.
(11) Gupta, A.; Lam, M. S. J. Oper. Res. Soc. 1996, 47, 229.
(12) Boddy, L.; Wilkins, M. F.; Morris, C. In Intelligent Engineering Systems Through Artificial Neural Networks; Dagli, C. H., Akay, M., Akay, A., Ersoy, O., Fernandez, B. R., Eds.; 1998; Vol. 8, p 655.
(13) Chevron USA Richmond Refinery. Utilities and Environmental Electronic Operating Manuals. Environmental Process Description; Richmond, California, 1998.
(14) Halpin, S. M.; Burch, R. F. IEEE Trans. Ind. Appl. 1997, 33, 1355.
(15) Bishop, C. M. Neural Networks for Pattern Recognition; Clarendon Press: Oxford, 1995.
(16) Haykin, S. Neural Networks: A Comprehensive Foundation; Prentice Hall: Upper Saddle River, 1999.
(17) Baughman, D. R.; Liu, Y. A. Neural Networks in Bioprocessing and Chemical Engineering; Academic Press: New York, 1995.
(18) Patnaik, P. R. Biotechnol. Tech. 1996, 10, 967.
(19) Sola, J.; Sevilla, J. IEEE Trans. Nucl. Sci. 1997, 44, 1464.
(20) Yabunaka, K.; Hosomi, M.; Murakami, A. Water Sci. Technol. 1997, 36, 89.
Received for review May 15, 2000. Revised manuscript received October 16, 2000. Accepted October 16, 2000. ES001264O