Correspondence pubs.acs.org/IECR

Comment on “Enhanced Biosorptive Remediation of Hexavalent Chromium Using Chemotailored Biomass of a Novel Soil Isolate Bacillus aryabhattai ITBHU02: Process Variables Optimization through Artificial Neural Network Linked Genetic Algorithm”

Xiang Cheng and Shaojun Li*

Key Laboratory of Advanced Control and Optimization for Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai 200237, People’s Republic of China


equal to zero, the problem is correctly specified and has either a unique solution or a limited number of solutions. An overspecified problem can be solved by an optimization approach (least squares, etc.) to obtain the “best” answer, that is, the one that most closely satisfies the most equations. Because measurements of an actual process always contain errors, process modeling must use enough data to ensure that the problem is overspecified, especially when no expert knowledge is available. It has been proved that a three-layer BP neural network can fit any nonlinear function to arbitrary precision (ref 5). As a result, three-layer BP networks are the most commonly used ANNs. In ref 1, a three-layer BP neural network was designed to model the removal of Cr(VI) from water in sewage treatment. The network has a “4−11−1” structure, where the numbers of neurons in the input layer, hidden layer, and output layer are 4, 11, and 1, respectively. That is, the BP ANN has 67 parameters, comprising 55 weights (4 × 11 + 11 × 1) and 12 thresholds (11 + 1). According to the degrees of freedom analysis, at least 67 groups of data are needed to determine the process model correctly. Only 24 groups of training data were used in ref 1. Although the neural network can be trained, the model is not mathematically justified. In this case, the theoretical minimum of the training objective is zero and there are infinitely many solutions. It is almost impossible to determine the best fitting parameters from the available data. The result is that the training error is low while the predicting error is very high. This is known as overfitting. Overfitting often occurs in ANNs when one tries to determine many parameters from insufficient data. Because an ANN is a “black-box” model, overfitting can have much more serious consequences than in other fitting methods. Some researchers (refs 6 and 7) have used different strategies to avoid the overfitting problem.
For example, Napoli and Xibilia (ref 7) proposed a strategy for identifying nonlinear models of industrial processes when only a limited number of experimental data are available. They improved the generalization capability of the model by integrating bootstrap resampling and noise injection. In ref 1, the authors stated that the mean square error (MSE) values for the training and test datasets were 0.025 (E = 0.025) and 0.043, respectively. However, the authors of this comment find that the two values should be 0.067 and 0.051, according to the training data in ref 1. The authors of this paper have trained the

Sir: The artificial neural network (ANN) model has been widely used to model chemical processes. Such a model can then be used to predict, optimize, or control the process. A paper that used an ANN to model the removal of hexavalent chromium [Cr(VI)] from water in sewage treatment was published in a recent issue of Industrial & Engineering Chemistry Research (ref 1). This comment discusses that paper in detail.



MATHEMATICAL INDETERMINACY

Because of its prediction robustness, parallel processing capability, self-learning ability, and good nonlinear fitting performance, the artificial neural network (ANN) has been widely used in many fields (refs 2 and 3), especially in process modeling. The ANN is a data-driven process modeling method. Usually, this type of method is called a black-box model. Because it does not require process knowledge, the ANN is widely used to model processes either on its own or in combination with first-principles methods. However, it also has some drawbacks. One of the main problems is overfitting, especially when the training sample is small (ref 4). The back-propagation (BP) neural network is the most commonly used ANN. Training a BP network means minimizing the sum of squared errors between actual and predicted data by continuously adjusting, and finally determining, the weights and thresholds of the neurons. From a mathematical perspective, the weights and thresholds of an ANN are equivalent to the unknown variables of an optimization problem, and each group of training data corresponds to one equation. One of the most widely used tools for describing the relationship between unknown variables and equations is degrees of freedom analysis. The “degrees of freedom” of a problem measure whether a system is properly specified. The analysis uses the equation

f = V − E − S    (1)

where f is the number of degrees of freedom (it equals zero when a problem is properly specified), V is the number of independent variables, E is the number of independent equations, and S is the number of specifications imposed on the variables. Often, eq 1 is presented without the specifications term. If the degree of freedom is positive, the problem is said to be underspecified and an infinite number of solutions is possible. If the degree of freedom is negative, the problem is overspecified and cannot be solved exactly. If the degree of freedom is

© 2014 American Chemical Society

Published: April 15, 2014

dx.doi.org/10.1021/ie501189c | Ind. Eng. Chem. Res. 2014, 53, 7268−7270

Table 1. Training and Testing MSE Using Five Methods

                    MSE for Training Data              MSE for Testing Data
toolbox method   maximum   average   minimum      maximum    average    minimum
trainbr           0.0667    0.0614    0.0604     160.8935    81.3448    18.2666
traincgf          0.0667    0.0659    0.0632     684.6816   181.2996    16.1361
traincgb          0.0667    0.0656    0.0622     546.3653   176.5751    11.9917
trainlm           0.0667    0.0623    0.0604    1547.4880   322.0598    55.6611
trainrp           0.0741    0.0663    0.0642     696.4441   250.1772    36.9063



BP ANN with five training functions from the Neural Network Toolbox (ref 8) in Matlab. The structure (4−11−1) and training goal (MSE = 0.0667) of the BP ANN are the same as those in ref 1. Each function was run 100 times for statistical analysis. The average, minimum, and maximum training and predicting MSEs for the five functions are listed in Table 1. In Table 1, “trainbr” is the Bayesian regularization back-propagation function; “traincgf” is the conjugate gradient back-propagation function with Fletcher−Reeves updates; “traincgb” is the conjugate gradient back-propagation function with Powell−Beale restarts; “trainlm” is the Levenberg−Marquardt back-propagation function; and “trainrp” is the resilient back-propagation function. As shown in Table 1, the average predicting errors of the five functions are 81.34, 181.30, 176.58, 322.06, and 250.18, respectively. “trainbr”, which uses a Bayesian method, gives the best average result. However, its minimum predicting MSE, 18.266, is still almost 350 times larger than the value presented in ref 1. The average MSEs of the five methods are roughly 10^3−10^4 times larger than that value. The authors of this comment would like to know how the network in ref 1 was trained. Because of the mathematical indeterminacy, the model will have insufficient accuracy, which makes it difficult to apply.
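The size of the gap between the testing MSEs in Table 1 and the values quoted for ref 1 can be checked with a few lines of arithmetic. The following Python sketch is only an illustration: the dictionary and function names are hypothetical, the numbers are copied from Table 1 and from the text (0.043 as reported in ref 1, 0.051 as recomputed in this comment), and the text does not state which of the two reference values each quoted factor is based on, so the sketch accepts either.

```python
# Testing-set MSE values copied from Table 1 (100 runs per training function).
TEST_MSE = {
    "trainbr":  {"max": 160.8935,  "avg": 81.3448,  "min": 18.2666},
    "traincgf": {"max": 684.6816,  "avg": 181.2996, "min": 16.1361},
    "traincgb": {"max": 546.3653,  "avg": 176.5751, "min": 11.9917},
    "trainlm":  {"max": 1547.4880, "avg": 322.0598, "min": 55.6611},
    "trainrp":  {"max": 696.4441,  "avg": 250.1772, "min": 36.9063},
}

# Test MSE values quoted in the text (names are illustrative only).
REPORTED_TEST_MSE = 0.043    # value stated in ref 1
RECOMPUTED_TEST_MSE = 0.051  # value recomputed by the comment authors


def mse_ratios(reference, key="avg"):
    """Return {method: Table 1 testing MSE / reference} for the chosen statistic."""
    return {name: stats[key] / reference for name, stats in TEST_MSE.items()}


if __name__ == "__main__":
    for reference in (REPORTED_TEST_MSE, RECOMPUTED_TEST_MSE):
        print(f"reference test MSE = {reference}")
        for name, ratio in mse_ratios(reference).items():
            print(f"  {name:9s} average is {ratio:8.0f} times larger")
```

Passing `key="min"` gives the corresponding ratios for the best single run of each training function.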

OTHERS

The results listed in Table 2 of ref 1 for the linear regression model could not be obtained from RSM eq 16 in ref 1. It seems that the authors built the linear model using normalized variables but forgot to denormalize the parameters after fitting the model. According to the data, the linear model can be built by the least-squares method as follows:

y = −223.41 + 73.73 X_1 + 48.12 X_2 + 3.50 X_3 + 1.56 X_4 − 12.42 X_1^2 − 7.12 X_2^2 − 0.05 X_3^2 − 0.01 X_4^2 − 0.23 X_1 X_2 − 0.06 X_1 X_3 + 0.01 X_1 X_4 + 0.08 X_2 X_3 − 0.07 X_2 X_4 + 0.0019 X_3 X_4    (2)
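Equation 2 can be evaluated directly in unnormalized variables. The Python sketch below is a minimal transcription of the printed coefficients; the function name is hypothetical, and the mapping of X1 to X4 onto pH, biomass, temperature, and initial Cr(VI) concentration is an assumption based on the order in which those variables are listed elsewhere in this comment.

```python
def quadratic_model(x1, x2, x3, x4):
    """Evaluate eq 2 with the coefficients as printed (rounded).

    Assumed variable order: x1 = pH, x2 = biomass, x3 = temperature,
    x4 = initial Cr(VI) concentration.
    """
    return (
        -223.41
        + 73.73 * x1 + 48.12 * x2 + 3.50 * x3 + 1.56 * x4        # linear terms
        - 12.42 * x1**2 - 7.12 * x2**2                            # squared terms
        - 0.05 * x3**2 - 0.01 * x4**2
        - 0.23 * x1 * x2 - 0.06 * x1 * x3 + 0.01 * x1 * x4        # interactions
        + 0.08 * x2 * x3 - 0.07 * x2 * x4 + 0.0019 * x3 * x4
    )


if __name__ == "__main__":
    # The replicated operating point mentioned in this comment.
    print(quadratic_model(3, 3, 30, 80))
```

Because the printed coefficients are rounded to two decimal places, values computed this way should be treated as approximate: a rounding error of up to 0.005 in the coefficient of X_4^2 alone shifts the prediction by up to 32 units at X4 = 80.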

CONCLUDING REMARKS

In conclusion, the ANN is an effective approach for modeling a process. However, the structure of the ANN should be designed carefully, according to the experimental data. To ensure mathematical determinacy, the number of training samples should be no less than the number of undetermined parameters in the ANN. Otherwise, the ANN model will lose statistical significance because of overfitting or poor generalization ability. In addition, the data used to model a process should be as complete as possible, to avoid extrapolation by the ANN. Unfortunately, such misuse of the neural network technique is not uncommon, which prompted this comment to draw the attention of the research community.
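The determinacy condition stated above can be turned into a simple sizing check. The following Python sketch counts the weights and thresholds of a three-layer network in the same way as this comment does (4 × 11 + 11 × 1 weights and 11 + 1 thresholds for the 4−11−1 network) and finds the largest hidden layer whose parameter count does not exceed the number of training samples. The function names are hypothetical, and the check is only the counting rule of this comment, not a general guarantee of good generalization.

```python
def count_parameters(n_in, n_hidden, n_out):
    """Number of weights plus thresholds in a three-layer BP network."""
    weights = n_in * n_hidden + n_hidden * n_out
    thresholds = n_hidden + n_out
    return weights + thresholds


def max_hidden_nodes(n_in, n_out, n_samples):
    """Largest hidden-layer size whose parameter count is <= n_samples."""
    n_hidden = 0
    while count_parameters(n_in, n_hidden + 1, n_out) <= n_samples:
        n_hidden += 1
    return n_hidden


if __name__ == "__main__":
    print(count_parameters(4, 11, 1))   # parameters of the 4-11-1 network
    print(max_hidden_nodes(4, 1, 24))   # limit for 24 training samples
```

For a network with 4 inputs and 1 output the parameter count is 6h + 1 for h hidden nodes, so 24 training samples support at most 3 hidden nodes under this rule.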



PREPROCESSING OF DATA

As shown in ref 1, the training set contains 24 samples. However, 7 of these samples have the same input, with pH, biomass, temperature, and initial Cr(VI) concentration equal to 3, 3, 30, and 80, respectively, while their outputs (84.8, 85.8, 85.8, 84.8, 85.7, 84.8, and 85.8) differ. Usually, samples with the same input and different outputs are merged into a single sample with the average output. Using them as separate samples is equivalent to treating that sample as very important and giving it a larger weight in the modeling. If these data are considered as one sample with the average output, only 18 samples remain in this case. Thus, the degree of freedom in ref 1 is 49. The authors should reduce the number of hidden-layer nodes to obtain a good prediction model, especially when the model is further used to optimize the output. The predicting precision of an ANN model is very important. If the predicting precision is not very high, as in Table 1, the model is not suited to optimizing the output. Although the authors obtained a good result, it is not representative; not every reader will be lucky enough to obtain the desired results. In addition, an ANN can only learn from the cases that are presented to it. If a sample lies outside the range of the training data, as with the smallest and largest values of X4 (40 and 120) in ref 1, the ANN cannot be expected to make a correct prediction when it encounters such a previously unseen case. Extrapolation is dangerous with any model, but some types of neural networks may make particularly poor predictions under such circumstances.
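The replicate handling and the resulting degree of freedom described above can be reproduced with a short calculation. The Python sketch below merges samples that share the same input into one sample with the average output, using only the seven replicate outputs quoted in this comment; the remaining 17 samples of ref 1 are not reproduced here, so they enter only through the total count of 24. The function and variable names are hypothetical.

```python
from collections import defaultdict


def merge_replicates(samples):
    """Merge samples with identical inputs into one sample with the mean output.

    `samples` is an iterable of (inputs, output) pairs; returns {inputs: mean}.
    """
    groups = defaultdict(list)
    for inputs, output in samples:
        groups[tuple(inputs)].append(output)
    return {inputs: sum(outs) / len(outs) for inputs, outs in groups.items()}


# The seven replicates at pH 3, biomass 3, temperature 30, Cr(VI) 80.
replicates = [
    ((3, 3, 30, 80), y) for y in (84.8, 85.8, 85.8, 84.8, 85.7, 84.8, 85.8)
]
merged = merge_replicates(replicates)

N_TRAINING = 24    # training samples reported in ref 1
N_PARAMETERS = 67  # weights and thresholds of the 4-11-1 network

# Seven replicates collapse into one sample: 24 - 7 + 1 = 18.
n_unique = N_TRAINING - len(replicates) + len(merged)
# Eq 1 without the specifications term: f = V - E.
degrees_of_freedom = N_PARAMETERS - n_unique

if __name__ == "__main__":
    print(merged, n_unique, degrees_of_freedom)
```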



AUTHOR INFORMATION

Corresponding Author

*Tel.: +86-21-64253820. E-mail: [email protected].

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

The authors thank the National Natural Science Foundation (under Project No. 21176072) and the Fundamental Research Funds for the Central Universities for their financial support.



REFERENCES

(1) Verma, D. V.; Hasan, S. H.; Singh, D. K.; Singh, S.; Singh, Y. Enhanced Biosorptive Remediation of Hexavalent Chromium Using Chemotailored Biomass of a Novel Soil Isolate Bacillus aryabhattai ITBHU02: Process Variables Optimization through Artificial Neural Network Linked Genetic Algorithm. Ind. Eng. Chem. Res. 2014, 53 (9), 3669−3681.
(2) Judd, J. S. Neural Network Design and the Complexity of Learning; MIT Press: Cambridge, MA, 1990.
(3) Gevrey, M.; Dimopoulos, I.; Lek, S. Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecol. Modell. 2003, 160 (3), 249−264.
(4) Hawkins, D. M. The Problem of Overfitting. J. Chem. Inf. Comput. Sci. 2004, 44 (1), 1−12.
(5) Cybenko, G. Approximation by Superpositions of a Sigmoidal Function. Math. Control, Signals, Syst. 1989, 2, 303−314.
(6) Delin, S.; Soderstrom, M. Performance of soil electrical conductivity and different methods for mapping soil data from a small dataset. Soil Plant Sci. 2002, 52, 127−135.
(7) Napoli, G.; Xibilia, M. G. Soft Sensor for a Topping process in the case of small datasets. Comput. Chem. Eng. 2011, 35, 2447−2456.
(8) Neural Network Toolbox; http://www.mathworks.cn/help/pdf_doc/nnet/nnet_ref.pdf.
