Energy & Fuels 2008, 22, 128–133
Application of the Clustering Hybrid Regression Approach to Model Xylose-Based Fermentative Hydrogen Production†
Nikhil,*,‡,§ Ari Visa,‡ Olli Yli-Harja,‡ Chiu-Yue Lin,∥ and Jaakko A. Puhakka§
Institute of Signal Processing and Institute of Environmental Engineering and Biotechnology, Tampere University of Technology, P.O. Box 553, FI-33101, Tampere, Finland, and Department of Environmental Engineering and Science, Feng Chia University, Taichung, 40724, Taiwan
Received May 27, 2007
The applicability of the CHR (clustering hybrid regression) approach was evaluated for modeling the H2 production rate from the metabolic end products (ethanol, acetate, butyrate, propionate, and valerate), the CO2 production rate, and the monitoring variables (pH, oxidation–reduction potential, and alkalinity) of the bioreactor system. Self-organizing maps (SOMs) were used to visualize and understand the relationships between the variables in the multidimensional data set. K-means clustering was used to partition the data set into statistically significant clusters, and a local multiple-regression model of the H2 production rate was formulated for each cluster. The data were obtained from a xylose-based (20 g COD/L) fermentative H2-producing continuously stirred tank reactor (CSTR). The bioreactor (working volume, 4 L) was operated for 376 days at 35 ± 1 °C and a hydraulic retention time of 12 h, and the data were collected after the bioreactor had reached steady-state conditions. Different metabolic patterns (with acetate and butyrate as the main metabolic products) in anaerobic xylose degradation were investigated. High H2 production rates were observed in two states: first, when butyrate metabolism was occurring at pH 7; and second, when acetate-coupled metabolism was taking place in the bioreactor at pH 6.7.
1. Introduction
Hydrogen (H2) is seen as a major energy carrier of the future. In addition, H2 has a wide range of industrial applications: in hydrogenation processes, such as the production of lower-molecular-weight compounds, the saturation of compounds, the cracking of hydrocarbons, and the removal of sulfur and nitrogen compounds; as an O2 scavenger to prevent corrosion and oxidation; and as a coolant in electrical generators.1 Today, the majority of H2 is produced from fossil fuels, although numerous ways to produce H2 from renewable energy sources exist.2,3 One alternative for producing sustainable H2 from renewable sources is microbiological fermentation or photosynthesis.4 Fermentation produces H2 at higher rates than photosynthesis and has the potential to combine organic waste management with simultaneous H2 production.2,3 Large amounts of organic waste are produced by agricultural, industrial, and domestic processes, and converting these wastes into valuable products such as hydrogen gas is sustainable. Glucose and xylose are produced at a concentration ratio of 55–65% to 35–45% during the saccharification of some organic wastes.5
H2 production is an intermediary step in the anaerobic degradation of organic material; H2 is produced in order to maintain the electron balance in the anaerobic system.4 The gases (H2 and CO2), organic acids (such as acetate, butyrate, propionate, and valerate), and alcohols (e.g., ethanol) are the end products of the bioprocess.1,3,4,6 There is a need to understand the relationships among these end products and to use this information to better understand the complex dynamic behavior of the system.6,7
In the modeling and control of complex systems such as biotechnological processes, it is usually assumed that a global, analytical system model can be defined.8 Generally, kinetic and stoichiometric models9–13 and models based on the Anaerobic Digestion Model No. 1 (ADM1)14–17 have been used to describe anaerobic bioprocesses. These models require detailed a priori knowledge of the bioprocess.6,12 In bioprocesses involving mixed cultures, such detailed a priori knowledge cannot always be established. Knowledge-mining-based models of H2 fermentation processes may offer a means to reveal hidden patterns in bioprocess data and to provide information for optimizing H2 production bioprocesses.
In this study, a xylose-based fermentative hydrogen production data set (obtained from ref 18) was used to evaluate the applicability of the clustering hybrid regression (CHR) approach in modeling the H2 production rate. CHR is an approach inspired by knowledge mining that does not require detailed a priori knowledge of the bioprocess. Clustering techniques (self-organizing maps (SOMs) and K-means) were applied to mine and cluster the end products of fermentative H2 production. The ability of the SOM to find biologically meaningful clusters has been demonstrated: SOMs have been used to cluster gene expression patterns from yeast and C. elegans19–21 and also on a cancer data set.22 Here, SOMs were used to visualize biologically significant metabolic patterns and to estimate the number of clusters (k), and the clusters were then formed using K-means clustering. The aim was to detect metabolic patterns in the bioprocess data set and to model the H2 production rate using the metabolite data and the metabolic patterns. The CHR approach was applied to model the H2 production rate based on the CO2 production rate, organic acids (acetate, butyrate, propionate, and valerate), alcohol (ethanol), pH, oxidation–reduction potential (ORP), and alkalinity.
† Presented at the International Conference on Bioenergy Outlook 2007, Singapore, April 26–27, 2007.
* Corresponding author. Tel.: +358 3 3115 4956. Fax: +358 3 3315 4989. E-mail: [email protected].
‡ Institute of Signal Processing, Tampere University of Technology.
§ Institute of Environmental Engineering and Biotechnology, Tampere University of Technology.
∥ Feng Chia University.
(1) Das, D.; Veziroglu, T. N. Hydrogen production by biological processes: a survey of literature. Int. J. Hydrogen Energy 2001, 26, 13–28.
(2) Benemann, J. Hydrogen biotechnology: Progress and prospects. Nat. Biotechnol. 1996, 14, 1101–1103.
(3) Levin, D. B.; Pitt, L.; Love, M. Biohydrogen production: prospects and limitations to practical application. Int. J. Hydrogen Energy 2004, 29, 173–185.
(4) Nandi, R.; Sengupta, S. Microbial production of hydrogen: An overview. Crit. Rev. Microbiol. 1998, 24, 61–84.
(5) Lavarack, B. P.; Griffin, G. J.; Rodman, D. The acid hydrolysis of sugarcane bagasse hemicellulose to produce xylose, arabinose, glucose and other products. Biomass Bioenergy 2002, 23, 367–380.
(6) Rodriguez, J.; Kleerebezem, R.; Lema, J. M.; van Loosdrecht, M. C. Modeling Product Formation in Anaerobic Mixed Culture Fermentations. Biotechnol. Bioeng. 2006, 93, 592–606.
(7) Lin, C.-Y.; Chang, R.-C. Fermentative hydrogen production at ambient temperature. Int. J. Hydrogen Energy 2004, 29, 715–720.
(8) Schugerl, K.; Bellgardt, K. H. Bioreaction engineering. Modeling and control; Springer-Verlag: Berlin, Heidelberg, New York, 2000.
(9) Bailey, E. J. Mathematical modeling and analysis in biochemical engineering: Past accomplishments and future opportunities. Biotechnol. Prog. 1998, 14, 8–20.
(10) Bernard, O.; Bastin, G. On the estimation of the pseudo-stoichiometric matrix for macroscopic mass balance modelling of biotechnological processes. Math. Biosci. 2005, 193, 51–77.
(11) Husain, A. Mathematical models of the kinetics of anaerobic digestion - a selected review. Biomass Bioenergy 1998, 14, 561–571.
(12) McCarty, P. L.; Mosey, F. E. Modelling of anaerobic digestion processes (a discussion of concepts). Water Sci. Technol. 1991, 24 (8), 123–129.
10.1021/ef700619v CCC: $40.75 © 2008 American Chemical Society. Published on Web 12/14/2007
CHR Modeling of Hydrogen Production
2. Materials and Methods
2.1. Experimental Setup and Data Set.
The data set used in this study was obtained from the xylose-based fermentative hydrogen-producing bioprocess described in ref 18. A chemostat-type anaerobic bioreactor (working volume 4 L) with a continuous feeding mode and a digested-gas-returned mixing mode23 was operated for 376 days at 35 ± 1 °C, a pH of 7.1, and a hydraulic retention time (HRT) of 12 h. The seed sludge, a natural anaerobic mixed microflora, was collected from a municipal sewage treatment plant (Taichung, Taiwan; activated sludge process) and screened through a no. 8 sieve (2.35 mm) to eliminate large particulate materials. The pH of the screened sludge was 7.4, and its volatile suspended solids (VSS) and total solids concentrations were 29 640 and 48 350 mg/L, respectively. The screened sludge was heat-treated at 100 °C for 45 min and then seeded into the bioreactor. The substrate used for enrichment was xylose (chemical oxygen demand (COD) 20 g/L), which contained sufficient inorganic nutrients24 for bacterial growth (mg/L): NH4HCO3 5240, K2HPO4 125, MgCl2·6H2O 15, FeSO4·7H2O 25, CuSO4·5H2O 5, CoCl2·5H2O 0.125, NaHCO3 6720. The parameters monitored during the experiments were pH, oxidation–reduction potential (ORP), alkalinity, gas production, volatile fatty acids (VFAs), and alcohol distribution. Ethanol and VFAs were analyzed with a gas chromatograph equipped with a flame ionization detector (glass column, 145 °C; injection temperature, 175 °C; carrier gas, N2; packing, FON 10%). Gas composition was analyzed with a gas chromatograph equipped with a thermal conductivity detector (column, 55 °C; injection temperature, 90 °C; carrier gas, Ar; packing, Porapak Q, mesh 80/100). The data set used in this study consists of 114 measured points of the following variables: pH, ORP, and alkalinity, and the acetate, butyrate, propionate, valerate, ethanol, CO2, and H2 production rates.
2.2. Self-Organizing Maps (SOMs). SOMs, also known as Kohonen maps, are unsupervised artificial neural networks based on competitive learning.25 SOMs are good visualization tools for nonlinear, high-dimensional data and are generally used in the data understanding phase of model development. The SOM technique has proven to be a valuable tool in data mining and knowledge discovery.
(13) Rani, K. Y.; Rao, V. S. R. Control of fermenters - a review. Bioprocess Eng. 1999, 21, 77–78.
(14) Batstone, D. J.; Keller, J.; Angelidaki, I.; Kalyuzhnyi, S. V.; Pavlostathis, S. G.; Rozzi, A.; Sanders, W. T. M.; Siegrist, H.; Vavilin, V. A. Anaerobic digestion model no. 1 (ADM1); IWA Task Group for mathematical modelling of anaerobic digestion processes; IWA Publishing: London, UK, 2002.
(15) Blumensaat, F.; Keller, J. Modelling of two-stage anaerobic digestion using the IWA Anaerobic Digestion Model No. 1 (ADM1). Water Res. 2005, 39, 171–183.
(16) Kalyuzhnyi, S. V. Batch anaerobic digestion of glucose and its mathematical modeling. II. Description, verification and application of model. Bioresour. Technol. 1997, 59, 249–258.
(17) Parker, W. J. Application of the ADM1 model to advanced anaerobic digestion. Bioresour. Technol. 2005, 96, 1832–1842.
(18) Lin, C.-Y.; Cheng, C.-H. Fermentative hydrogen production from xylose using anaerobic mixed microflora. Int. J. Hydrogen Energy 2006, 31, 832–840.
(19) Tamayo, P.; Slonim, D.; Mesirov, J.; Zhu, Q.; Kitareewan, S.; Dmitrovsky, E.; Lander, E.; Golub, T. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 2907–2912.
(20) Törönen, P.; Kolehmainen, M.; Wong, G.; Castren, E. Analysis of gene expression data using self-organizing maps. FEBS Lett. 1999, 451 (2), 142–146.
(21) Hill, A.; Hunter, C.; Tsung, B.; Tucker-Kellogg, G.; Brown, E. Genomic analysis of gene expression in C. elegans. Science 2000, 290, 809–812.
(22) Chen, D.-R.; Chang, R.-F.; Huang, Y.-L. Breast cancer diagnosis using self-organizing maps for sonography. Ultrasound Med. Biol. 2000, 26 (3), 405–411.
(23) Chen, C.-C.; Lin, C.-Y.; Chang, J.-S. Kinetics of hydrogen production with continuous anaerobic cultures utilizing sucrose as the limiting substrate. Appl. Microbiol. Biotechnol. 2001, 57, 56–64.
Energy & Fuels, Vol. 22, No. 1, 2008
Figure 1. Idea of the SOM.30 Every neuron contains a reference vector whose dimension is the same as that of the input pattern.
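The competitive-learning idea behind the SOM can be illustrated with a short sketch. It is a generic sequential (online) SOM written with NumPy, not the SOMPAK/MATLAB implementation used in this study; the 6 × 9 grid (chosen only because it gives 54 neurons), the Gaussian neighborhood, the linearly decaying learning rate and radius, and the random stand-in data are all assumptions made for illustration.

```python
import numpy as np


def train_som(data, rows=6, cols=9, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Sequential (online) SOM on a rows x cols grid; returns reference vectors and grid."""
    rng = np.random.default_rng(seed)
    n_samples = data.shape[0]
    # Grid coordinates of every neuron (neuron index = r * cols + c).
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Each neuron starts with a randomly chosen input pattern as its reference vector.
    weights = data[rng.integers(0, n_samples, rows * cols)].astype(float)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * n_samples
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(n_samples):
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)              # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 0.5  # neighborhood radius shrinks
            # Competition: the winner is the neuron whose reference vector is closest to x.
            winner = np.argmin(np.sum((weights - x) ** 2, axis=1))
            # Cooperation: the winner's grid neighbors are also pulled toward x.
            d2 = np.sum((grid - grid[winner]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights, grid


def quantization_error(data, weights):
    """Mean distance from each input pattern to its closest reference vector."""
    d = np.sqrt(((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2))
    return float(d.min(axis=1).mean())


# Random stand-in for a 114 x 11 standardized data set (not the study's data).
X = np.random.default_rng(1).standard_normal((114, 11))
W, grid = train_som(X)
print(W.shape, round(quantization_error(X, W), 3))
```

Each update moves a reference vector only part of the way toward the input, so the trained vectors stay within the range of the data; updating the winner together with its grid neighbors is what produces the topological ordering that the component planes rely on.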
More than 7000 scientific publications on the SOM have been written.26 It has been successfully applied in various engineering applications, such as pattern recognition, image analysis, process monitoring and control, and fault diagnosis.27,28 The principal goal of the SOM (Figure 1) is to transform an incoming signal pattern of arbitrary dimension into a one- or two-dimensional discrete map and to perform this transformation adaptively in a topologically ordered fashion. A SOM is therefore characterized by the formation of a topographic map of the input patterns, in which the spatial locations (i.e., coordinates) of the neurons in the lattice are indicative of intrinsic statistical features contained in the input patterns.25,29,30 Data elements that are close to each other and form a disjoint subset are said to belong to the same cluster, and elements of the same cluster are mapped to neurons that lie close to each other in a two-dimensional SOM display. The SOM tends to cluster the data points, find representatives (features), and minimize the mean-square error between the features and the data points they represent:

$$\min \sum_{x} \sum_{\substack{c \in M \\ x \Rightarrow c}} \sum_{k \in N} \eta(c, k)\, \lVert w_c - x \rVert^{2} \qquad (1)$$

where $x \Rightarrow c$ denotes that node $c$ (or its weight vector $w_c$) is the closest node to input $x$, and $\eta(c, k)$ is the neighborhood function of neuron $c$ and its neighbor $k$. $M$ represents the map grid, and $N$ is the neighborhood region. In our study, we used the free open-source SOMPAK toolbox31,32 for MATLAB33 to plot the SOM. The SOMPAK toolbox identifies the reference vectors automatically; in our study, 54 reference vectors (dimension = 11) were identified from the 114 input patterns.
2.3. Clustering. a. K-means Clustering. K-means clustering is a simple unsupervised learning algorithm for solving clustering problems. It partitions a given data set into a number of clusters (k) that is fixed a priori.34,35 In our research, SOM visualizations were used as a guide in choosing k. The steps involved in K-means clustering are presented as a flowchart in Figure 2. K-means is a simple algorithm that has been adapted to many problem domains.35 The algorithm is highly sensitive to the initial, randomly selected cluster centers; it should therefore be run multiple times to reduce this effect. To cluster the data set, the MATLAB33 function kmeans was used with correlation as the distance measure.
(24) Endo, G.; Noike, T.; Matsumoto, T. Characteristics of cellulose and glucose decomposition in acidogenic phase of anaerobic digestion. Proc. Soc. Civ. Eng. 1982, 325, 61–68.
(25) Kohonen, T. Self-organizing maps; Springer Series in Information Sciences; Springer: Berlin, Heidelberg, New York, 2001; Vol. 30.
(26) Bibliography of SOM papers. http://www.cis.hut.fi/research/sombibl/ (accessed Aug 2007).
(27) Simula, O.; Kangas, J. Process monitoring and visualization using self-organizing maps. In Neural Networks for Chemical Engineers; Computer-Aided Chemical Engineering; Elsevier: Amsterdam, 1995.
(28) Kasslin, M.; Kangas, J.; Simula, O. Process state monitoring using self-organizing maps. In Artificial Neural Networks; Aleksander, I., Taylor, J., Eds.; North Holland: Amsterdam, Netherlands, 1992.
(29) Kaski, S. Data exploration using self-organizing maps. D.Tech. Thesis, Helsinki University of Technology, Finland, 1997.
(30) Hautaniemi, S.; Yli-Harja, O.; Astola, J.; Kauraniemi, P.; Kallioniemi, A.; Wolf, M.; Ruiz, J.; Mousses, S.; Kallioniemi, O.-P. Analysis and visualization of gene expression microarray data in human cancer using self-organizing maps. Machine Learning 2003, 52, 45–66.
(31) Vesanto, J.; Himberg, J.; Alhoniemi, E.; Parhankangas, J. SOM Toolbox for Matlab 5; SOM Toolbox Team, Helsinki University of Technology: Finland, 2000. See also: http://www.cis.hut.fi/projects/somtoolbox/.
(32) Kohonen, T.; Hynninen, J.; Kangas, J.; Laaksonen, J. SOM_PAK: The Self-Organizing Map Program Package; Technical Report A31; Helsinki University of Technology, Laboratory of Computer and Information Science: FIN-02150 Espoo, Finland, 1996.
(33) The MathWorks. MATLAB-The Language of Technical Computing. http://www.mathworks.com/products/matlab/ (accessed Mar 2007).
(34) Jain, A. K.; Dubes, R. C. Algorithms for clustering data; Prentice Hall: Englewood Cliffs, NJ, 1988.
(35) Jain, A. K.; Murty, M. N.; Flynn, P. J. Data Clustering: A Review. ACM Comput. Surv. 1999, 31, 264–323.
Nikhil et al.
Figure 2. Algorithm for K-means clustering.
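The K-means loop with correlation as the distance measure can be sketched as follows. This is an illustrative NumPy reimplementation, not MATLAB's kmeans; in the spirit of MATLAB's correlation option, each point is treated as a sequence of values that is centered and scaled, so that one minus the dot product of two scaled points equals one minus their Pearson correlation. The restart count, the iteration limit, and the synthetic rising and falling profiles are assumptions.

```python
import numpy as np


def center_and_normalize(X):
    """Center each row and scale it to unit length; the dot product of two such
    rows is then their sample (Pearson) correlation."""
    Z = X - X.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z / np.where(norms == 0.0, 1.0, norms)


def kmeans_correlation(X, k, n_restarts=10, max_iter=100, seed=0):
    """K-means with (1 - correlation) as the distance; keeps the best of several restarts."""
    rng = np.random.default_rng(seed)
    Z = center_and_normalize(np.asarray(X, dtype=float))
    best_labels, best_cost = None, np.inf
    for _ in range(n_restarts):
        # Pick k distinct data points as the initial cluster centers.
        centers = Z[rng.choice(len(Z), size=k, replace=False)]
        labels = None
        for _ in range(max_iter):
            # Assign each point to the center with the smallest correlation distance.
            dist = 1.0 - Z @ center_and_normalize(centers).T
            new_labels = dist.argmin(axis=1)
            if labels is not None and np.array_equal(new_labels, labels):
                break  # no point changed cluster: converged
            labels = new_labels
            # Move each center to the mean of its (centered, normalized) members.
            for j in range(k):
                members = Z[labels == j]
                if len(members) > 0:
                    centers[j] = members.mean(axis=0)
        # Total within-cluster correlation distance of this run.
        cost = np.sum(1.0 - np.sum(Z * center_and_normalize(centers)[labels], axis=1))
        if cost < best_cost:
            best_labels, best_cost = labels.copy(), cost
    return best_labels, best_cost


# Synthetic check: 20 rising and 20 falling profiles with random scale and noise.
rng = np.random.default_rng(0)
up = np.arange(6.0)
down = up[::-1]
X = np.vstack([s * up + rng.normal(0, 0.1, 6) for s in rng.uniform(1, 3, 20)]
              + [s * down + rng.normal(0, 0.1, 6) for s in rng.uniform(1, 3, 20)])
labels, cost = kmeans_correlation(X, k=2)
print(labels)
```

Keeping the restart with the lowest total within-cluster distance is one simple way to act on the advice to run K-means several times; because correlation ignores scale, the rising profiles end up together regardless of their amplitude.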
Figure 3. Schema of clustering hybrid regression.
Table 1. Clustering Hybrid Regression (CHR) Model Parameter Values Obtained Using the MATLAB regstats Function (CO2PR, CO2 production rate; ORP, oxidation–reduction potential; Alk, alkalinity; EtOH, ethanol; HAc, acetate; HPr, propionate; HBu, butyrate; HVa, valerate)
parameter    cluster 1    cluster 2    cluster 3
ε              2.585      -145.7771     -96.8562
βCO2PR         0.3803        0.44         0.2117
βpH          -16.0763       16.1799      11.803
βORP          -0.1186       -0.1103      -0.0425
βAlk           0.0113       -0.0021       0.0024
βEtOH         -0.0081       -0.0117      -0.0039
βHAc           0.0021        0.003        0.0007
βHPr          -0.0052       -0.0069      -0.0046
βHBu           0.0079        0.0108       0.0073
βHVa           0.3022        0.075        0.0963
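To make the per-cluster structure of Table 1 concrete, the sketch below stores the coefficients and evaluates one local model. It reads the ε row as the constant term of each local regression (an interpretation: the text calls ε the noise term, while regstats reports the constant as the first coefficient of a linear model with intercept), and it expects inputs in the units of the original data set, which are not restated here; no measured values are supplied, and the names PREDICTORS, TABLE1, and local_hpr are hypothetical.

```python
import numpy as np

# Predictor order used for the beta values below (same order as Table 1).
PREDICTORS = ["CO2PR", "pH", "ORP", "Alk", "EtOH", "HAc", "HPr", "HBu", "HVa"]

# Coefficients transcribed from Table 1: (epsilon read as constant term, betas).
TABLE1 = {
    1: (2.585, [0.3803, -16.0763, -0.1186, 0.0113, -0.0081, 0.0021, -0.0052, 0.0079, 0.3022]),
    2: (-145.7771, [0.44, 16.1799, -0.1103, -0.0021, -0.0117, 0.003, -0.0069, 0.0108, 0.075]),
    3: (-96.8562, [0.2117, 11.803, -0.0425, 0.0024, -0.0039, 0.0007, -0.0046, 0.0073, 0.0963]),
}


def local_hpr(cluster, values):
    """Evaluate the local linear model of one cluster for a dict of predictor values."""
    const, betas = TABLE1[cluster]
    x = np.array([values[name] for name in PREDICTORS], dtype=float)
    return const + float(np.dot(betas, x))


# With every predictor at zero, only the constant term of cluster 1 remains.
print(local_hpr(1, {name: 0.0 for name in PREDICTORS}))  # prints 2.585
```

The same input vector gives a different H2 production rate in each cluster, which is exactly what the k rows of Eq. 5 express.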
b. Silhouette Plot. Silhouette plots were used to display and check the clustering results. Consider an object $v_i$ that belongs to cluster $C_j$, and let $C_h$ be the cluster closest to $v_i$ (according to average distance). The silhouette index36,37 is calculated as

$$s(v_i) = \frac{d(v_i, C_h) - d(v_i, C_j)}{\max\{d(v_i, C_j),\, d(v_i, C_h)\}}, \qquad -1 \le s(v_i) \le 1 \qquad (2)$$

where $d(v_i, C_j)$ is the average dissimilarity of $v_i$ to all other objects in its own cluster and $d(v_i, C_h)$ is the minimum, over all other clusters, of the average dissimilarity of $v_i$ to the objects of that cluster (i.e., its average dissimilarity to the closest cluster). When $s(v_i)$ is close to 1, $v_i$ is said to be "well clustered"; when $s(v_i)$ is close to 0, $v_i$ lies between two clusters; and when $s(v_i)$ is close to −1, $v_i$ is said to be "badly clustered". The largest overall average silhouette indicates the best clustering (i.e., the best number of clusters). The clusters were validated using the MATLAB33 function silhouette.
(36) Rousseeuw, P. J. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65.
(37) Kaufman, L.; Rousseeuw, P. J. Finding Groups in Data: An Introduction to Cluster Analysis; Wiley: New York, 1990.
(38) McGee, V. E.; Carleton, W. T. Piecewise Regression. J. Am. Stat. Assoc. 1970, 65, 1109–1124.
(39) Cleveland, W. S.; Grosse, E. H.; Shyu, W. M. Local regression models. In Statistical Models in S; Chambers, J. M., Hastie, T. J., Eds.; Chapman and Hall: London, 1992.
(40) Chen, Y.; Dong, G.; Han, J.; Wah, B. W.; Wang, J. Multi-Dimensional Regression Analysis of Time-Series Data Streams. Proceedings of the 28th International Conference on Very Large Data Bases, Hong Kong, China, Aug 20–23, 2002; pp 323–334.
(41) Karim, M. N.; Hodge, D.; Simon, L. Data-Based Modeling and Analysis of Bioprocesses: Some Real Experiences. Biotechnol. Prog. 2003, 19, 1591–1605.
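Equation 2 can be computed directly from a dissimilarity matrix. The NumPy sketch below is an illustration, not MATLAB's silhouette function; it takes $d(v_i, C_j)$ as the mean over the other members of the object's own cluster, assigns s = 0 to singleton clusters (Rousseeuw's convention), and offers 1 − Pearson correlation between rows as one possible dissimilarity, mirroring the correlation distance used for clustering. The four-point example uses synthetic points and Euclidean distances.

```python
import numpy as np


def correlation_dissimilarity(X):
    """Pairwise dissimilarity 1 - Pearson correlation between the rows of X."""
    return 1.0 - np.corrcoef(np.asarray(X, dtype=float))


def silhouette_values(D, labels):
    """Silhouette s(v_i) of Eq. 2 for every object, from a dissimilarity matrix D.

    Requires at least two clusters.
    """
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    s = np.zeros(len(labels))
    for i, own_label in enumerate(labels):
        own = labels == own_label
        n_own = int(own.sum())
        if n_own == 1:
            continue  # singleton cluster: s = 0 (Rousseeuw's convention)
        # d(v_i, C_j): average dissimilarity to the other members of its own cluster.
        a = (D[i, own].sum() - D[i, i]) / (n_own - 1)
        # d(v_i, C_h): smallest average dissimilarity to the members of another cluster.
        b = min(D[i, labels == c].mean() for c in clusters if c != own_label)
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return s


# Synthetic example: two tight pairs of points far apart, Euclidean distances.
pts = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
D = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2))
good = silhouette_values(D, [0, 0, 1, 1])  # pairs kept together
bad = silhouette_values(D, [0, 1, 0, 1])   # pairs split across clusters
print(good.mean().round(3), bad.mean().round(3))
```

For the well-separated labeling the mean silhouette is about 0.9, whereas the mixed labeling gives negative values for every point; comparing the mean silhouette across candidate values of k is the selection rule described above.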
Figure 4. Self-organizing maps (SOMs) of the bioreactor kinetic data. Dark red represents the highest value observed for each variable, and dark blue the lowest. Acetate coupled with ethanol and propionate is marked with a rectangular box; butyrate alone dominating is marked with a triangular boundary. HPR, hydrogen production rate; CO2PR, CO2 production rate; ORP, oxidation–reduction potential; ALK, alkalinity; HAc, acetate; HPr, propionate; HBu, butyrate; HVa, valerate; HBu/HAc, butyrate/acetate ratio.
2.4. Clustering Hybrid Regression (CHR). The CHR model (Figure 3) combines ideas from clustering,34,35 piecewise linear regression, and multiple regression techniques.38–41 Piecewise linear regression is a local modeling approach that fits different straight-line relationships over different intervals of the data range.38 Breakpoints, which define the interval boundaries, are the values at which the slope of the linear function changes. The regression function may be discontinuous at a breakpoint, but the model can be written so that the function is continuous at all points, including the breakpoints.39 When there is only one breakpoint, at $x = c$, the piecewise linear regression model can be written as

$$Y = a_1 + b_1 x \quad \text{for } x \le c \qquad (3)$$

$$Y = a_2 + b_2 x \quad \text{for } x > c \qquad (4)$$
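A least-squares fit of Eqs. 3 and 4 with a known breakpoint c can be sketched as follows. Continuity at the breakpoint is imposed here, as one possible choice, through the basis 1, x, and max(x − c, 0); the fitted hinge coefficient d then gives $b_2 = b_1 + d$ and $a_2 = a_1 - d\,c$. The function name fit_continuous_piecewise and the noise-free test data are illustrative assumptions.

```python
import numpy as np


def fit_continuous_piecewise(x, y, c):
    """Least-squares fit of Eqs. 3-4 with one known breakpoint c, continuous at x = c."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Basis 1, x, max(x - c, 0): the hinge term changes the slope at c without a jump.
    A = np.column_stack([np.ones_like(x), x, np.maximum(x - c, 0.0)])
    (a1, b1, d), *_ = np.linalg.lstsq(A, y, rcond=None)
    b2 = b1 + d      # slope for x > c
    a2 = a1 - d * c  # intercept for x > c, so that a1 + b1*c == a2 + b2*c
    return (a1, b1), (a2, b2)


x = np.linspace(0.0, 10.0, 21)
y = np.where(x <= 4.0, 1.0 + 2.0 * x, 13.0 - x)  # continuous at x = 4
(a1, b1), (a2, b2) = fit_continuous_piecewise(x, y, c=4.0)
print(round(a1, 6), round(b1, 6), round(a2, 6), round(b2, 6))
```

Fitting each interval separately would instead give Eqs. 3 and 4 without the continuity constraint; CHR follows that separate-fit idea, with K-means clusters taking the place of the x intervals.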
In the CHR approach, the clusters obtained from K-means clustering define these intervals (subsets of the data). The relationships between the response and the explanatory variables are then modeled with different regression parameter values for different clusters: for each cluster, a multiple regression is performed to analyze the relationship between the variables. The H2 production rate was modeled as a function of organic acids (acetate, butyrate, propionate, and valerate), alcohol (ethanol), CO2 production rate, pH, oxidation–reduction potential (ORP), and alkalinity. The computational problem to be solved in multiple regression analysis is to fit a straight line (or a hyperplane in n-dimensional space, where n is the number of independent variables) to a set of points.40,41 The mathematical form of the CHR model is

$$\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \vdots \\ Y_k \end{bmatrix} = \begin{bmatrix} \beta_{11} & \beta_{12} & \cdots & \beta_{1n} \\ \beta_{21} & \beta_{22} & \cdots & \beta_{2n} \\ \beta_{31} & \beta_{32} & \cdots & \beta_{3n} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{k1} & \beta_{k2} & \cdots & \beta_{kn} \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ \vdots \\ X_n \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_k \end{bmatrix} \qquad (5)$$

where k is the number of clusters and n is the number of predictor variables; β denotes the model parameters, X the predictor variables, and ε the noise term.
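The per-cluster regressions behind Eq. 5 amount to one ordinary least-squares fit per cluster label. The sketch below uses NumPy's lstsq with an explicit intercept column in place of MATLAB's regstats; the function names fit_chr and predict_chr, the minimum-size check, and the synthetic two-cluster data are assumptions. Predicting new measurements would additionally require assigning each new point to a cluster (for example, to the nearest K-means center), and the text notes that the predictive capabilities of the approach still need to be developed.

```python
import numpy as np


def fit_chr(X, y, labels):
    """Fit one multiple linear regression (with intercept) per cluster label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    labels = np.asarray(labels)
    models = {}
    for k in np.unique(labels):
        mask = labels == k
        A = np.column_stack([np.ones(mask.sum()), X[mask]])
        if mask.sum() < A.shape[1]:
            raise ValueError(f"cluster {k} has fewer points than parameters")
        coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        models[k.item()] = coef  # coef[0]: intercept, coef[1:]: one beta per predictor
    return models


def predict_chr(models, X, labels):
    """Evaluate each point with the local model of the cluster it belongs to."""
    X = np.asarray(X, dtype=float)
    return np.array([models[k][0] + row @ models[k][1:]
                     for row, k in zip(X, np.asarray(labels).tolist())])


# Synthetic data: two clusters, each generated by its own linear model.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
labels = np.repeat([0, 1], 30)
true = {0: np.array([1.0, 2.0, -1.0, 0.5]), 1: np.array([-3.0, 0.0, 4.0, 1.5])}
y = np.array([true[k][0] + row @ true[k][1:] for row, k in zip(X, labels.tolist())])
models = fit_chr(X, y, labels)
print(np.round(models[1], 3))
```

With nine predictors, as in Table 1, each cluster needs at least ten observations for the local fit to be determined.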
Figure 5. Silhouette plot of the bioreactor data. The clustering algorithm used was K-means with correlation as the distance measure. The mean silhouette value of the clusters was 0.5.
The MATLAB33 function regstats was used to estimate the model parameters (β and ε); the values obtained are shown in Table 1.
3. Results and Discussion
SOM maps with colored component planes (Figure 4) were used to visualize and identify the metabolic patterns and clusters of the fermentation process. The colored component planes were formed by splitting the SOM reference vectors into n components, where n is the dimension of the reference vector (n = 11 for our data set). In the context of bioprocess data, each component plane corresponds to one feature (variable) in the data set. The neurons in the component planes were color-shaded: dark red shades correspond to high values of the variable, yellow shades to moderate values, and dark blue shades to low values, where high and low are the maximum and minimum values of that variable in the data set.
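Splitting the reference vectors into component planes is a reshape of the codebook, one plane per variable. The sketch assumes that the neurons are stored row by row (neuron index = r·cols + c) and scales each plane with that variable's minimum and maximum in the data set, as described above, so that 0 maps to the low end (dark blue) and 1 to the high end (dark red) of a colormap; the function name component_planes and the tiny example are illustrative.

```python
import numpy as np


def component_planes(codebook, rows, cols, data):
    """Split a (rows*cols x n) SOM codebook into n planes of shape (rows, cols),
    each scaled to [0, 1] with the variable's data-set minimum and maximum."""
    codebook = np.asarray(codebook, dtype=float)
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(axis=0), data.max(axis=0)  # per-variable data-set min and max
    span = np.where(hi > lo, hi - lo, 1.0)       # avoid division by zero
    scaled = np.clip((codebook - lo) / span, 0.0, 1.0)  # 0 = lowest, 1 = highest
    # Column j of the codebook becomes plane j; neuron index = r * cols + c.
    return scaled.T.reshape(codebook.shape[1], rows, cols)


# Tiny example: a 2 x 3 map with 2 variables, using the codebook itself as "data".
cb = np.arange(12, dtype=float).reshape(6, 2)
planes = component_planes(cb, rows=2, cols=3, data=cb)
print(planes.shape)
print(planes[0])
```

Each plane can then be drawn with any blue-to-red colormap (for instance, matplotlib's imshow with cmap="jet"); placing the planes side by side is what allows high-value regions of different variables, such as acetate, ethanol, and propionate, to be compared at the same map positions.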
Figure 6. Combined plot of the CHR model. Red shows the experimental hydrogen production rate values; blue shows the values obtained from the CHR model. Correlation was used as the distance measure in the clustering.
SOM analyses suggested three metabolic patterns (clusters): (a) acetate (HAc) coupled with other metabolites (ethanol and propionate), marked with a rectangular box in Figure 4 (nodes with high values (red neurons) of propionate (HPr) and ethanol lie in the same topographic position (neighborhood region) as the nodes with high values of acetate); (b) butyrate (HBu) alone, marked with a triangular area in Figure 4; and (c) a transition state in which no metabolite was dominating. Thus, three metabolic states during the bioreactor operation were assumed, and k was chosen to be 3 on the basis of the SOM analysis. K-means clustering also showed that the clusters were well separated when k = 3: the mean silhouette value obtained for the clusters was 0.5 (Figure 5), indicating that the clusters are well separated and the data points are properly assigned. Increasing or decreasing k resulted in poorer separation of the clusters.
The CHR model was developed for the bioreactor. Multiple regression parameters were estimated locally for each cluster (Table 1), and a combined plot of these local models was produced (Figure 6). The CHR model was found to perform well in modeling the H2 production rate. The H2 production rate was modeled as a function of pH, ORP, and alkalinity and of the ethanol, acetate, propionate, and butyrate concentrations. From the SOM analyses, the H2 production rate was found to be highest when the pH was in the range 6.7–7. At pH 6.7, acetate-coupled metabolism appears to occur; at pH 7, butyrate metabolism alone appears to prevail, and at this stage the butyrate/acetate ratio is also highest. Similar experimental results have been reported.42 It has been suggested that the butyrate/acetate ratio is a quantitative indicator of H2 production.43–45
(42) Lin, C.-Y.; Hung, C.-H.; Chen, C.-H.; Chung, W.-T.; Cheng, L.-H. Effects of initial cultivation pH on fermentative hydrogen production from xylose using natural mixed cultures. Process Biochem. 2006, 41, 1383–1390.
(43) Khanal, S. K.; Chen, W.-H.; Li, L.; Sung, S. Biological hydrogen production: effects of pH and intermediate products. Int. J. Hydrogen Energy 2004, 29, 1123–1131.
(44) Lin, P.-Y.; Whang, L.-M.; Wu, Y.-R.; Ren, W.-J.; Hsiao, C.-J.; Li, S.-H.; Chang, J.-S. Biological hydrogen production of the genus Clostridium: Metabolic study and mathematical model simulation. Int. J. Hydrogen Energy 2007, 32 (12), 1728–1735.
In our study, the H2 production rate still appeared to be high at several time points when the butyrate/acetate ratio was low. Thus, a high butyrate/acetate ratio signifies a higher H2 production rate, but not vice versa. Further investigation of the butyrate/acetate ratio together with microbial community dynamics and other metabolic patterns during bioreactor operation could better explain the H2 production rate. The data set used in our study was unevenly spaced (in terms of measurement points). Interpolation methods were not used because the actual nonlinear relationships between the variables were not known; in such a situation, applying data preprocessing methods would have resulted in a loss of information. Taking these aspects of the data set into consideration, a simple and straightforward CHR approach was used. Further investigation of different H2-fermenting systems with the CHR model is required to confirm the capabilities of the model. The identification of distinct metabolic patterns and the choice of k for K-means clustering are user-defined, which is a limitation of the approach. Depending on the quantity and quality of the data set, other clustering techniques can also be used. Incorporating known nonlinearities between variables could further improve the model performance. The predictive capabilities of the modeling approach remain to be developed and evaluated.
4. Conclusions
The applicability of the CHR (clustering hybrid regression) model to an anaerobic xylose degradation process was investigated. The H2 production rate was modeled as a function of the CO2 production rate, organic acids (acetate, butyrate, propionate, and valerate), alcohol (ethanol), pH, oxidation–reduction potential (ORP), and alkalinity. Visualization of the multidimensional data set with SOMs revealed distinct metabolic patterns (with acetate and butyrate as the main metabolic products) during the bioreactor operation.
There were few instances of coupled acetate and ethanol, acetate and propionate, or acetate and butyrate metabolism. A high H2 production rate was observed when butyrate metabolism was occurring at pH 7 and when acetate-coupled metabolism was taking place in the bioreactor at pH 6.7. It was also found that, in a continuously stirred tank reactor (CSTR), a high butyrate/acetate ratio signifies a higher H2 production rate, but not vice versa.
Acknowledgment. The authors acknowledge the support of HydrogenE (research project 2005–2008 under the Academy of Finland). This work was also supported by the Academy of Finland (application number 213462, Finnish Programme for Centres of Excellence in Research 2006–2011), the Taiwan National Science Council (Contract No. NSC91-2211-E-035-016), the National Science Council of Taiwan (NSC95-2221-E-035-094), and Feng Chia University (Grant No. 94GB66). We are highly thankful to Mr. Chao-Hui Cheng and Mr. Chyi-How Lay of Feng Chia University, Taiwan, for providing the experimental data set.
(45) Hawkes, F. R.; Hussy, I.; Kyazze, G.; Dinsdale, R.; Hawkes, D. L. Continuous dark fermentative hydrogen production by mesophilic microflora: Principles and progress. Int. J. Hydrogen Energy 2007, 32, 172–184.
EF700619V