Ind. Eng. Chem. Res. 2006, 45, 8225-8226
Response to “Comment on ‘Design of a Propane Ammoxidation Catalyst Using Artificial Neural Networks and Genetic Algorithms’”

Shigeharu Kito*,† and Tadashi Hattori‡

Department of Applied Information Science and Department of Applied Chemistry, Aichi Institute of Technology, Yakusa, Toyota, 470-0392 Japan

1. Introduction

Sir: In his comment on the article by Cundari et al.,1 Sha refers to our article2 and questions the validity of applying a supervised artificial neural network (ANN). Sha’s criticism of the use of supervised ANNs in both the article by Cundari et al. and ours is that the connecting weights between neurons cannot be determined if the number of those weights is larger than the number of teacher signals (training examples), as is the case in both articles. We think that Sha has confused information processing with mathematical soundness. Although Genel et al. pointed this out in their response3 to another of Sha’s comments,4 the point does not seem to have been taken into account in the present comment. In this note, therefore, we present the correct interpretation of information processing by supervised ANNs in a somewhat different way from Genel et al., and we add a comment, based on our experience, on the relation between the number of connecting weights in an ANN and the number of teacher signals in practical applications.

2. Information Processing by Neural Networks

ANNs, including supervised ones, express the relation between input and output as the collection of all the connecting weights among the (artificial) neurons, or processing units, that constitute the ANN. That is, information on the input-output mapping is distributed throughout the entire network, and for this reason neural networks are often called parallel distributed processors.5 In this respect, the collection of connecting weights as a whole is referred to as the network pattern.
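The distributed character of the mapping can be illustrated with a small numerical sketch. It is not taken from the works cited here: the one-hidden-layer network, the tanh hidden units, the layer sizes, and the random weights are all assumptions chosen only for illustration. Reordering the hidden units changes the individual weight values stored at each position, yet the input-output mapping is unchanged.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """One-hidden-layer network (assumed form): tanh hidden units, linear output."""
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 8, 1  # hypothetical sizes, for illustration only

# Random (untrained) weights; any values would serve for this demonstration.
W1 = rng.normal(size=(n_hidden, n_in))
b1 = rng.normal(size=n_hidden)
W2 = rng.normal(size=(n_out, n_hidden))
b2 = rng.normal(size=n_out)

# Cyclically shift the order of the hidden units: rows of W1 and b1,
# and the matching columns of W2, are reordered together.
perm = np.roll(np.arange(n_hidden), 1)
W1_p, b1_p, W2_p = W1[perm], b1[perm], W2[:, perm]

x = rng.normal(size=n_in)
y = forward(x, W1, b1, W2, b2)
y_p = forward(x, W1_p, b1_p, W2_p, b2)

same_mapping = np.allclose(y, y_p)       # outputs agree up to rounding
same_weights = np.array_equal(W1, W1_p)  # weight matrices differ entry by entry
print("same output:", same_mapping, "| same weight matrix:", same_weights)
```

The two weight sets are different network patterns for one and the same mapping, so the value found at any single position in the weight matrices carries no meaning by itself.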
In the case of the supervised ANNs discussed in Sha’s comment, a set of labeled teacher signals or training examples, each of which consists of a unique input signal and a corresponding desired response, is learned as the whole collection of connecting weights. This means that no individual connecting weight bears any explicit role, and so a discussion of the value of any particular weight is meaningless. This is a very large difference between the connecting weights of an ANN and the variables in, for example, simultaneous linear equations, which are themselves the objects of solution and discussion.

In supervised ANNs, the values of the connecting weights, the number of which is less than that of the teacher signals, are calculated by nonlinear programming so as to minimize the error between the desired output and the calculated output. In error back-propagation learning in particular, the steepest descent method is usually used. In these cases, different weight sets are obtained depending on the set of initial values of the weights. This corresponds to the fact that an input-output mapping can be represented by diverse network patterns. What really matters is that ANNs can represent a nonlinear input-output mapping as a network pattern; the manner in which the information is distributed over the connecting weights is of no great importance. Genel et al. expressed this point in the following statement: “ANN is purely phenomenological and does not inherently produce a mechanistic understanding of the physical phenomena”.3

A similar situation arises when we solve indeterminate simultaneous equations by the Gauss-Seidel iterative technique, starting from some initial values of the variables. In this case too, diverse solutions are obtained depending on the initial values used. This kind of solution is not mathematically sound, but the solutions are useful from the viewpoint of information processing. In the following, we give a good example of such a situation for a catalytic reaction.

* To whom correspondence should be addressed. E-mail: [email protected].
† Department of Applied Information Science.
‡ Department of Applied Chemistry.

3. Sensitivity Analysis by Using Numerical Partial Differentiation of Network Pattern

The present authors proposed a method of extracting the catalytic properties that control catalytic performance by numerical partial differentiation of the network pattern with respect to each of the assumed catalytic properties.6 In the case of the catalytic oxidation of methane on lanthanide oxides, a back-propagation-based ANN was trained by using only 10 teacher signals. The ANN consists of an input layer of 4 units, two hidden layers of 8 units each, and an output layer of 1 unit. In that case, the number of teacher signals is clearly far smaller than the number of connecting weights.
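As a rough check of these numbers, the following sketch counts the connecting weights of a fully connected 4-8-8-1 network and evaluates numerical partial derivatives of its output with respect to each input by central differences. Only the layer sizes and the number of teacher signals come from the text. Full connectivity between adjacent layers, the bias terms, the tanh hidden units, the random untrained weights, the step size, and all function names are assumptions made for illustration; this is not the implementation used in ref 6.

```python
import numpy as np

LAYER_SIZES = [4, 8, 8, 1]   # from the text: 4 inputs, two hidden layers of 8, 1 output
N_TEACHER_SIGNALS = 10       # from the text

def count_weights(sizes, with_biases=False):
    """Number of connections between adjacent, fully connected layers."""
    n = sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))
    if with_biases:
        n += sum(sizes[1:])  # one bias per non-input unit (assumption)
    return n

def init_network(sizes, rng):
    """Random, untrained weights and biases (illustration only)."""
    return [(rng.normal(size=(b, a)), rng.normal(size=b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """tanh hidden layers and a linear output unit (assumed activations)."""
    h = np.asarray(x, dtype=float)
    for i, (W, b) in enumerate(params):
        z = W @ h + b
        h = z if i == len(params) - 1 else np.tanh(z)
    return h[0]

def numerical_partials(params, x, step=1e-4):
    """Central-difference estimate of d(output)/d(input_i) for every input."""
    x = np.asarray(x, dtype=float)
    grads = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = step
        grads[i] = (forward(params, x + e) - forward(params, x - e)) / (2 * step)
    return grads

n_connections = count_weights(LAYER_SIZES)                    # 4*8 + 8*8 + 8*1
n_with_biases = count_weights(LAYER_SIZES, with_biases=True)  # plus 8 + 8 + 1 biases

params = init_network(LAYER_SIZES, np.random.default_rng(0))
partials = numerical_partials(params, np.zeros(4))

print("connecting weights:", n_connections, "| with biases:", n_with_biases)
print("teacher signals:", N_TEACHER_SIGNALS)
print("numerical partial derivatives at the origin:", partials)
```

Under these assumptions the network has 4·8 + 8·8 + 8·1 = 104 connecting weights (121 if bias terms are counted), roughly ten times the number of teacher signals. In a sensitivity analysis the derivative would be evaluated on a trained network and over a range of each input, so that a change of sign along one input can be detected.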
In these reactions, the catalyst activity (the amount of methane converted to carbon dioxide) shows a volcano-type correlation with the fourth ionization potential of the lanthanide oxide, which is one of the catalytic properties, and the sign of the derivative of the catalyst activity with respect to the fourth ionization potential changes from positive to negative. The numerical partial derivatives reproduced the volcano-type feature very well. Considering that derivatives are more sensitive to the state of the network pattern than the naive input-output mapping is, our results strongly support the discussion above.

4. Involved Comment on Scanty Set of Teacher Signals

In typical applications of neural networks to real-world problems, the issue of training a network that has a large number of connecting weights with a scanty number of training examples arises frequently. The results of predictions or correlations, however, are often very satisfactory. This is also the case in our work, including that2,6 referred to in this note. On the basis of our experience in the catalysis field, we think this may be due to the following feature of real-world problems. Only from the mathematical point of view is every input variable and output variable assumed to take values ranging from -∞ to +∞. As far as real-world problem solving is concerned, however, each variable, such as a physicochemical property, has its own range of variation governed by natural law, and so it is probable that the portion of the input-output mapping function, or hyperplane, of interest is very restricted and very simple or nearly linear. As a result, the neural network may have to learn just a straightforward relation, and in such a case a scanty number of teacher signals can be adequate to give a satisfactory result.

Finally, the authors sincerely appreciate that Sha took an interest in our article and referred to it as pioneering work.

10.1021/ie061272z CCC: $33.50 © 2006 American Chemical Society Published on Web 10/28/2006

Literature Cited

(1) Cundari, T. R.; Deng, J.; Zhao, Y. Design of a Propane Ammoxidation Catalyst using Artificial Neural Networks and Genetic Algorithms. Ind. Eng. Chem. Res. 2001, 40, 5475.
(2) Kito, S.; Hattori, T.; Murakami, Y. Estimation of the Acid Strength of Mixed Oxides by a Neural Network. Ind. Eng. Chem. Res. 1992, 31, 979-981.
(3) Genel, K.; Kurnaz, S. C.; Durman, M. Response to Sha’s Comment on our article titled “Modeling of Tribological Properties of Alumina Fiber Reinforced Zinc-Aluminum Composites using Artificial Neural Network”. Mater. Sci. Eng., A 2004, 379, 457.
(4) Sha, W. Comment on “Modeling of Tribological Properties of Alumina Fiber Reinforced Zinc-Aluminum Composites using Artificial Neural Network” by K. Genel et al. [Mater. Sci. Eng., A 2003, 363, 203]. Mater. Sci. Eng., A 2004, 372, 334.
(5) The computational model of the ANN appears explicitly or implicitly almost everywhere. For example, see: Rumelhart, D. E.; McClelland, J. L.; PDP Research Group. Parallel Distributed Processing; MIT Press: Cambridge, MA, 1986; Vol. 1; and Haykin, S. Neural Networks: A Comprehensive Foundation, 2nd ed.; Prentice Hall: Upper Saddle River, NJ, 1999.
(6) Kito, S.; Hattori, T. Analysis of Catalytic Performance by Partial Differentiation of Neural Network Pattern. Presented at the 19th International Symposium on Chemical Reaction Engineering (ISCRE 19), Berlin, 2006; 573; and Analysis of Catalytic Performance by Partial Differentiation of Neural Network Pattern. Chem. Eng. Sci., to be published.