BOOK REVIEWS
applications of GFA: (1) use of a genetic algorithm to establish reliable QSAR and (2) application of QSAR in molecular diversity experiments. The next chapter (by S. P. van Helden, H. Hamersma, and V. J. van Geerestein, 30 pp) illustrates in some detail the use of a genetic algorithm combined with neural networks in predicting progesterone receptor binding of 56 steroids. Use of over 50 quantum chemical and steric descriptors results in a nonlinear relationship with r² = 0.64, which is comparable to the results obtained by stepwise regression and PLS. The best model, using neural networks with GFA for selection of variables, gives r² = 0.88. The authors conclude that this approach is superior to the alternatives (stepwise regression, PLS, CoMFA, and PCA (principal component analysis)), which are labeled as “inadequate because the data set contains nonlinear relationships”. Perhaps this is a premature judgment, since the alternatives (which can handle nonlinear relationships when suitably modified) have not been explored so thoroughly. Perhaps the sample of the 56 steroids considered can be taken as a standard set of compounds on which diverse methodologies (augmented by use of alternative descriptors, cf. the modified quote of Rutherford!) ought to be compared, since even the best result reported here (r² = 0.88) is not so impressive. D. E. Walters and T. D. Muhammad (18 pp) consider a procedure for construction of a receptor model in the absence of a receptor crystal structure. They considered two dozen sweeteners whose potency varied by five orders of magnitude. The compounds are of varied structure (aspartic acid derivatives, arylurea derivatives, and guanidine derivatives), which also makes this an attractive set of structures for testing different methodologies. In the next chapter (32 pp) G. Jones, P. Willett, and R. C. Glen use a genetic algorithm in substructure searching of three-dimensional compounds.
This is extended to a molecular recognition problem, which is considerably more involved as it requires solving multiple-minimum problems and generating suitable target functions. The article considers use of a genetic algorithm for flexible ligand docking and for flexible molecular overlay. C. Putavy, J. Devillers, and D. Domine (26 pp) used a classical genetic algorithm for the selection of aromatic substituents for designing a test series. Over 160 substituents were considered, each described by half a dozen parameters, including the π constant, H-bonding acceptor and donor abilities, and molar refractivity. Although the results of this study are of a preliminary nature, they appear very promising. Not only was the best series obtained, but the procedure also yields a population (of compounds) that allows synthetic chemists some freedom in selecting the target structure. V. Venkatasubramanian, A. Sundaram, K. Chan, and J. M. Caruthers (32 pp) consider combined GA and NN to approach real-life interactive CAMD (computer-aided molecular design). In particular, they address genetic algorithms for the inverse problem and discuss the characterization of the search space, given that the GA-based design framework sometimes (under difficult circumstances) failed to locate the target. The last chapter (20 pp) by J. Devillers and C. Putavy illustrates yet another hybrid system of combined NN and GA. Each chapter is preceded by a short abstract and ends with an extensive list of references that many may find very beneficial. The first, introductory chapter has almost 200 references cited. In view of the extensive literature, almost 500 references, an author index would seem useful. Equally, the large number of abbreviations (almost 50) could be collected in a single index table where they could be briefly explained (with indications of the pages where they appeared). The index at the end of the book is somewhat terse.
For instance, stepwise regression is not included, CAMD is not listed as an abbreviation, and as discussed on pp 286-299, QSPR appears also on p 278 (not indexed), molecular mechanics (p 279) is not indexed, correlation coefficient (p 196) is not indexed, etc. Despite these minor limitations, which in no significant way diminish the usefulness of the present book, the book is a valuable addition to the growing literature associated with the use of computers in chemistry. With the remaining books in this series, it ought to find
a place on the desk of anyone who wishes to be kept abreast of recent advances in QSAR.
J. Chem. Inf. Comput. Sci., Vol. 37, No. 3, 1997 627
Milan Randić Drake University CI970385D S0095-2338(97)00385-5
Computer Software Applications in Chemistry. By Peter C. Jurs. Second Edition. John Wiley & Sons, Inc.: New York, 1996. 291 pp with bibliographical references and index. $49.95. ISBN 0-471-10587-2.
Peter C. Jurs’ Computer Software Applications in Chemistry has been published in its second edition after a very successful reception of the first. This work makes for interesting reading since the author is an established scientist, educator, and writer. With the presence of desktop computers on literally every desk and workbench in every laboratory, computers have become an indispensable tool in the working life of every chemist. Because of this computer accessibility, the author rightly points out in the Preface that “the need for computer skills on the part of practicing chemists continues to grow”. It has become imperative for everyone working in a laboratory to be not only computer literate but also software literate. The present book offers a convenient stepping stone in that direction. This book contains 18 chapters covering topics in developing mathematical algorithms for solving chemical problems and some novel applications of the developed software. Although the chapters are logically arranged, it might have been better to divide them into two parts: the first part covering the first nine chapters on the basic concepts and the second part covering the more novel applications. Chapter 1 provides the essential introduction to the development of scientific computers, their applications, and the design of algorithms. Basic concepts of statistics, including errors, propagation of errors, and the floating-point number system, are introduced quite aptly in the second chapter. In the next seven chapters, linear and nonlinear curve fitting, matrix manipulation, solution of differential equations, numerical integration, simulation, and optimization methods are presented with chemical examples and listings of the corresponding programs in the FORTRAN language.
Most scientists agree that FORTRAN is still the language of choice for solving scientific problems, which, of course, may be simply a matter of age-old habit! In chapter 3, the example of enzyme kinetics is particularly useful since the Michaelis-Menten hyperbolic equation and its linear counterpart, the Lineweaver-Burk equation, appear in many areas of chemistry and biochemistry under slightly different forms. Differences in the parameter values obtained by fitting the hyperbolic and linear forms of the same equation to a given set of data are discussed. It would have been better if the influence of weighting on the fitted values had been more clearly emphasized. Chapter 6, dealing with the numerical solution of differential equations, is very well written, with useful examples from the realm of chemical kinetics, an area that challenges many chemists with programming skills. Chapters 10-18 focus on the more current areas of chemistry and computers. With the explosion of chemical databases and information sources, the demand for retrieval and search of chemical structures and related information is increasing tremendously. Methods based on graph theory, pattern recognition, neural networks, and artificial intelligence are developed every day to meet this demand. These techniques are catalogued nicely, with some examples and extensive references, in these chapters. The title of the book includes the word “software”, and though many programs are listed in the book, it would have been much better to provide a disk with the book so that the reader could use the programs, directly or after modification, for specific purposes. This book has been written for the advanced undergraduate or graduate student and is ideally suited for a course on “Computer Applications in Chemistry”. However, for this purpose the book would have served better if it had included practice problems or projects at the end of every chapter.
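The chapter 3 point about hyperbolic versus linearized fits can be made concrete with a small sketch. The Python below is not taken from the book (whose listings are in FORTRAN); the function names, the substrate concentrations, and the rate values are all hypothetical, chosen only for illustration. Under those assumptions it fits the same data in two ways: an unweighted straight line through the double-reciprocal (Lineweaver-Burk) points, and a plain Gauss-Newton least-squares fit of the hyperbola itself.

```python
# Hypothetical illustration: names and data are assumptions, not from the book.

def michaelis_menten(s, vmax, km):
    """Rate predicted by v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)


def fit_lineweaver_burk(s_values, v_values):
    """Unweighted least-squares line 1/v = (Km/Vmax) * (1/S) + 1/Vmax."""
    xs = [1.0 / s for s in s_values]
    ys = [1.0 / v for v in v_values]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    vmax = 1.0 / intercept
    km = slope * vmax
    return vmax, km


def fit_hyperbola(s_values, v_values, vmax, km, iterations=50):
    """Gauss-Newton least squares on the hyperbolic form, from a starting guess."""
    for _ in range(iterations):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for s, v in zip(s_values, v_values):
            d_vmax = s / (km + s)               # partial derivative w.r.t. Vmax
            d_km = -vmax * s / (km + s) ** 2    # partial derivative w.r.t. Km
            r = v - michaelis_menten(s, vmax, km)
            a11 += d_vmax * d_vmax
            a12 += d_vmax * d_km
            a22 += d_km * d_km
            b1 += d_vmax * r
            b2 += d_km * r
        det = a11 * a22 - a12 * a12
        # Solve the 2x2 normal equations for the parameter update.
        vmax += (b1 * a22 - a12 * b2) / det
        km += (a11 * b2 - a12 * b1) / det
    return vmax, km


def sum_squared_residuals(s_values, v_values, vmax, km):
    """Sum of squared errors measured in the original rate units."""
    return sum((v - michaelis_menten(s, vmax, km)) ** 2
               for s, v in zip(s_values, v_values))


# Made-up rates scattered around Vmax = 10, Km = 2.
substrate = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
rates = [2.3, 3.1, 5.1, 6.6, 8.2, 8.8]

lb_params = fit_lineweaver_burk(substrate, rates)
hyp_params = fit_hyperbola(substrate, rates, *lb_params)

print("Lineweaver-Burk (Vmax, Km):", lb_params)
print("Hyperbolic fit  (Vmax, Km):", hyp_params)
print("SSR, Lineweaver-Burk:", sum_squared_residuals(substrate, rates, *lb_params))
print("SSR, hyperbolic fit: ", sum_squared_residuals(substrate, rates, *hyp_params))
```

Taking reciprocals inflates the errors of the smallest rates, so the unweighted double-reciprocal line is pulled toward the low-concentration points. This is the weighting issue raised above: if the error in v is roughly constant, standard error propagation suggests weighting each reciprocal point in proportion to v to the fourth power.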
Computer Software Applications in Chemistry is a very well-organized and well-written book that should find a home on every practicing chemist’s bookshelf, close to their computer. Transferring the listed programs to the computer would be the most exciting step one could take to enter the world of programming. I cannot wait to do exactly that!
Narinder Singh University of Kansas Medical Center CI970383T S0095-2338(97)00383-1
Online Searching: A Scientist’s Perspective. A Guide for the Chemical and Life Sciences. Damon D. Ridley. John Wiley & Sons: Chichester. 1996. 344 + xx pages. List Price $79.95 (hdbk). ISBN 0-471-96520-0 hdbk, 0-471-96521-9 pbk.
After attempting to cultivate the market for end-user access to information, publishers and vendors have recently made great progress in providing end-user-friendly information access systems. For example, STN recently announced STN Easy, a Web-based product analogous to KR ScienceBase. These programs require of the end-user even less prior knowledge about information resources and access than SciFinder, the product for end-users that appeared in 1995. Educators, especially in university departments of chemistry, are clamoring for inexpensive access to such products, because many say they cannot provide the training in information access needed for research. The latter point is debatable, because many educators do provide such training. Many in the information industry insist that end-user products like those listed above are great but that end-users would use them even more effectively if they had education and training in the fundamentals of chemical and technical information. This book by Damon Ridley can help provide this proficiency in both the fundamentals and pragmatics of technical information. Ridley aims the book at end-users, stressing the particulars of searching online databases in general, especially databases of scientific information, but focusing primarily on searching chemical databases online on the STN network. This pragmatic approach to searching is especially valuable for those end-users who do not have access to end-user search aids. Since Ridley’s book is primarily aimed at end-users not in a classroom setting, it falls somewhere in between prior books for end-users like Online Information Hunting (N. Goldmann, McGraw-Hill, 1992) and course textbooks like Chemical Information Sources (G. Wiggins, McGraw-Hill, 1991).
Ridley is far less confrontational vis-à-vis information specialists than Goldmann and is, in my opinion, more upbeat and informative. The writing is apparently patterned after oral presentations, and exclamation marks are used liberally, which one hopes will not be a turn-off for readers. The emphasis is definitely on searching, but knowledge of the construction and maintenance of databases is provided so that the end-user may search better. The first two chapters cover the general topics of online searching and basic commands and tools. Chapters 3-7 cover bibliographic database searching, followed by chapters on full text files, patents, and special topics. Chapters 11-17 cover searching for substances, concluding with chapters on property data and chemical reactions. Structure searching methods include names and nomenclature, molecular formulas, and construction of structures/substructures, including the trade-offs encountered with the various methods. The examples are all based on the STN Messenger system and STN files and emphasize chemical topics. However, the methods are translatable to other search systems and subject disciplines, and command comparison charts for STN, KR DIALOG, and ORBIT are provided in Appendix 1. Ridley develops some rather interesting concepts, including quotes like, “Online searching requires a meeting of three minds: the author, the indexer, and the searcher...”; “It is just as important to search the literature properly as it is to conduct proper research...”; “A conservative estimate is that online costs are only 25% of the real costs of searching and maintaining the hard copy library”; and “Indeed online searching is best conducted through a close association between the information specialist and the scientist, and each has special roles.” The reviewer believes strongly that search aids like SciFinder, STN Easy, and KR ScienceBase promote the acquisition of more information
by end-users than they would acquire without use of these products. However, I believe just as strongly that knowledge of information and database fundamentals can help end-users acquire even better information more efficiently, with or without the use of information professionals or search aid programs. The preferable method for students would be classroom instruction with texts like Wiggins. However, if the training and education of an end-user has been deficient in this area, use of Ridley’s book can help the end-user to be a more effective user of information and a better contributor to the user’s organization. Students and educators may question the price, but just as information is not free, neither is training for information.
Robert E. Buntrock Buntrock Associates, Inc. CI9703819 S0095-2338(97)00381-8
Neural Networks in QSAR and Drug Design. Edited by J. Devillers. Vol. 2 in the Series: Principles of QSAR and Drug Design. Academic Press: San Diego, 1996, 284 pp. ISBN 0-12-213815-5. The list price of this book is 65.00 pounds sterling.
The 11 chapters of this hard-cover book are written by authors who are active research workers in the fields of neural networks (NNs) and/or quantitative structure-activity relationships (QSARs) or quantitative structure-property relationships (QSPRs). The editor is at the same time the author of the first chapter (Strengths and Weaknesses of the Backpropagation Neural Network in QSAR and QSPR Studies), and one of the co-authors of five other chapters, among which we cite AUTOLOG Versus Neural Network Estimation of n-Octanol/Water Partition Coefficients; Use of a Backpropagation Neural Network and Autocorrelation Descriptors for Predicting the Biodegradation of Organic Chemicals; and A Neural Structure-Odor Threshold Model for Chemicals of Environmental and Industrial Concern. A total of more than 10 000 bibliographical references for all chapters is made available to the readers, and the chapters are presented in an easily accessible and very readable style. In the first chapter, an extensive review is presented of the standard backpropagation neural network (BNN) algorithm and its variations, with practical recipes for selecting the number of neurons in the various layers, the learning rate, and the momentum. A selected list of addresses on the Internet is appended for obtaining information on artificial neural networks such as software availability, conferences, etc. The great advantages of BNNs are their ability to find nonlinear or multilinear relationships, to learn from examples, and to make successful interpolations (less so for extrapolations), even starting from a set of noisy, incomplete, and sometimes faulty data.
In order to counter some of the drawbacks of BNNs, one has to devise validation tests, which are reviewed critically. Other paradigms are also presented in the book. Thus, Kohonen mapping and ReNDeR (reversible nonlinear dimension reduction) are introduced by Livingstone in a chapter entitled Multivariate Data Display Using Neural Networks and illustrated by Manallack and co-workers in a chapter dealing with nicotinic agonists. The adaptive resonance theory (ART) neural networks are discussed in the chapter Adaptive Resonance Theory Based Neural Networks Explored for Pattern Recognition Analysis of QSAR Data by Wienke and co-workers, and an original hybrid mapping called nonlinear neural mapping (N2M) is clearly presented in the chapter entitled A New Nonlinear Neural Mapping Technique for Visual Exploration of QSAR Data by Domine and co-workers. A chapter by Gasteiger and his co-workers is entitled Evaluation of Molecular Surface Properties Using a Kohonen Neural Network; it is illustrated with color plates and deals with data sets of ryanodines, cardiac glycosides, and steroids. Maggiora and co-workers provided a chapter entitled Combining Fuzzy Clustering and Neural Networks to Predict Protein Structural Classes, illustrating thereby a hybrid system. This chapter is accompanied by a color plate with stereo drawings showing how one can group together structural classes of proteins, namely all-α helices,