Genetic Algorithms in Molecular Modeling. Edited by James Devillers

Edited by James Devillers. Academic Press: London, 1996. xi + 327 pp. $74.95. ISBN 0-12-213810-4. Milan Randić. Drake University. J. Chem. Inf. Compu...
626 J. Chem. Inf. Comput. Sci., Vol. 37, No. 3, 1997

Large-scale scientific data projects, both national and international in scope, are used to demonstrate the variety of information issues. Examples of the topics dealt with are as follows: the Internet and Web services, free or fair circulation of scientific data, distributed networks, security, and national legislation. Part 2 provides examples from all over the globe of large information systems in use by the different scientific communities. The essays are written by information scientists from the field and/or the specific data project. Wherever relevant, Web addresses are provided for the project and all the project participants. The inclusion of these Web addresses is a very helpful aid for readers, and all of the addresses were correct at the time of review. The monograph’s emphasis is largely on the current state of the technology, with some discussion of the direction these different projects may take. It is important to note that the projects in place are by no means of equal strength. Some regions are still dealing with limited Internet connections, while others have already provided universal desktop access. East Asia struggles to standardize software languages, hardware, alphabets, and Internet access, while the West pushes toward something ominously called “the federation”. The interesting aspect of the federation is that it aims toward “interoperability” at a local, collaborative, and community level. It is, in essence, the information science manifestation of how scientific progress is attained: a unique balance in which raw workstation data are finessed into the larger data pool and so added to the greater knowledge of the community (biology, chemistry, etc.). The monograph is exciting in many ways: some organized projects are in the last stages of fine-tuning, while others are just approaching a prototype. 
However, the issues all parties and countries face, regardless of the strides already made, are issues of standardization, price, fairness of access, flexibility for local environments, and strength, or robustness, for international data collaborations. One chapter discusses the fairness of scientists having to pay for data to which they contribute gratis. This section rings especially true for those working in academic libraries and laboratories; science faculty are only too familiar with the ever-rising prices of scientific journals and databases. The monograph does not assume that the reader has extensive information or technological literacy. The early chapters provide elegant and comprehensive overviews of the Internet, the Web, hypertext, multimedia, and database retrieval and management. In addition to presenting the issues, the authors also present their solutions to data management and retrieval problems. The problem with this monograph is the same problem that any state-of-the-art analysis has: there are already advances and refinements to much of the software mentioned in the text. This is an insurmountable problem where advances and the state of the art are concerned; this kind of material may be better suited to theme issues of relevant journals. The price of this work, $115.00, is high enough that any potential buyer may want to review the book prior to acquisition in order to determine which chapters still apply and which ones are no longer quite so valid. The audience for this work is broad: science and technology libraries, scientific laboratories struggling with workstation implementation, students of information science, and data management teams; in short, any individual involved in the provision, development, and management of scientific data.

Veronica Calderhead
Rutgers University
CI9703866 S0095-2338(97)00386-7

BOOK REVIEWS

Reviews in Computational Chemistry. Vol. 8. Edited by Kenny B. Lipkowitz and Donald B. Boyd. VCH Publishers, Inc., 220 East 23rd Street, New York, N.Y. 10010. xxi + 324 pp., June 1996. List Price $110.00. ISBN 1-56081-929-4 (Hard Copy), ISSN 1069-3599.

This series brings together respected experts in the field of computer-aided molecular research. Computational chemistry is increasingly used in conjunction with organic, inorganic, medicinal, biological, physical, and analytical chemistry. This volume examines various aspects of computations in treating fullerenes and carbon aggregates, pseudopotential calculations of transition metal compounds, core potential approaches to the chemistry of the heavier elements, relativistic effects in chemistry, and the ab initio computation of NMR chemical shielding. This volume, the eighth of Reviews in Computational Chemistry, represents the editors’ ongoing effort to provide tutorials and reviews for both the novice and the experienced computational chemist. The five chapters are written for newcomers learning about molecular modeling techniques as well as for seasoned professionals who need to acquire expertise in areas outside their own. All the chapters in the volume have a quantum mechanical theme. In Chapter 1, the authors show how the ubiquitous semiempirical molecular orbital techniques need to be adjusted to correctly determine the three-dimensional geometries, energies, and properties of fullerenes and carbon aggregates. Chapters 2 and 3 elucidate the so-called effective core potential or pseudopotential methods that have proved invaluable for handling transition metals and other heavy metals. Quantum theory for describing relativistic effects, particularly important to heavy metals, is presented in Chapter 4. In Chapter 5, the author reviews NMR chemical shifts and explains the methodology with examples of heterocycles, buckminsterfullerenes, proteins, and other large molecules. The volume contains an excellent author and subject index. Information about Reviews in Computational Chemistry is now available on the World Wide Web (http://www.chem.iupui.edu/~boyd/rcc.html).

Venkat K. Raman
Chemical Abstracts Service
CI970387Y S0095-2338(97)00387-9

Genetic Algorithms in Molecular Modeling. Edited by James Devillers. Academic Press: London, 1996. xi + 327 pp. $74.95. ISBN 0-12-213810-4.

This is the first book in the new series Principles of QSAR and Drug Design, edited by J. Devillers. The series is a welcome addition to a literature on QSAR and drug design that is scattered across more than a dozen journals, and if judged by this first volume, the introduction of the series is timely. QSAR, the quantitative structure-activity relationship, has grown considerably in the last 20 years, not only in the volume of research devoted to the discipline but also in the diversity of methodologies applied to it. For example, the relatively recent methodologies include the partial least squares method, cellular automata, neural networks, orthogonalized multiple regression analysis, and genetic algorithms, to which this book is devoted. The book consists of a dozen chapters written by leading researchers in the field, starting with introductory chapters on genetic algorithms in computer-aided molecular design (34 pp by J. Devillers), an overview of genetic methods (32 pp by B. T. Luke), and genetic algorithms in feature selection (20 pp by R. Leardi). The remaining nine chapters are devoted to different applications of the genetic algorithm. D. Rogers (22 pp) illustrates nonlinear modeling with splines and makes a comparison between GFA (genetic function approximation) and PLS (partial least squares). He starts with a quote from Ernest Rutherford: “If your experiment needs statistics, you ought to have done a better experiment”, which only reminds us of the bias against, and misunderstanding of, statistics at the turn of this century. It would be nice to know how Stanislaw Ulam (the father of the “Monte Carlo” method) would have replied to such criticism, but Rutherford’s quote is not entirely out of place if suitably modified: “If your experiment needs better statistics, you ought to have used better descriptors”. W. J. Dunn and D. 
Rogers (22 pp) continue by introducing PLS and combining the advantages of PLS (extraction of latent variables approximately along the axes of greatest variation, optimal correlation) with the model-generating ability of genetic algorithms to create a modified genetic PLS. A. J. Hopfinger and H. C. Patel consider two applications of GFA: (1) use of a genetic algorithm to establish a reliable QSAR and (2) application of QSAR in molecular diversity experiments. The next chapter (30 pp, by S. P. van Helden, H. Hamersma, and V. J. van Geerestein) illustrates in some detail the use of a genetic algorithm combined with neural networks in predicting the progesterone receptor binding of 56 steroids. Use of over 50 quantum chemical and steric descriptors results in a nonlinear relationship with r² = 0.64, which is comparable to the results obtained by stepwise regression and PLS. The best model, using neural networks with GFA for the selection of variables, gives r² = 0.88. The authors conclude that this approach is superior to the alternatives (stepwise regression, PLS, CoMFA, and PCA (principal component analysis)), which are labeled as “inadequate because the data set contains nonlinear relationships”. Perhaps this is a premature judgment, since the alternatives (which can handle nonlinear relationships when suitably modified) have not been explored so thoroughly. Perhaps the sample of 56 steroids considered can be taken as a standard set of compounds on which diverse methodologies (augmented by use of alternative descriptors, cf. the modified quote of Rutherford!) ought to be compared, since even the best result reported here (r² = 0.88) is not so impressive. D. E. Walters and T. D. Muhammad (18 pp) consider a procedure for the construction of a receptor model in the absence of a receptor crystal structure. They considered two dozen sweeteners whose potency varied by five orders of magnitude. The compounds are of varied structure (aspartic acid derivatives, arylurea derivatives, and guanidine derivatives), which also makes this an attractive set of structures for testing different methodologies. In the next chapter (32 pp) G. Jones, P. Willett, and R. C. Glen use a genetic algorithm in substructure searching of three-dimensional compounds. 
This is extended to a molecular recognition problem, which is considerably more involved, as it requires solving multiple-minima problems and generating suitable target functions. The article considers the use of a genetic algorithm for flexible ligand docking and for flexible molecular overlay. C. Putavy, J. Devillers, and D. Domine (26 pp) used a classical genetic algorithm for the selection of aromatic substituents for designing a test series. Over 160 substituents were considered, each described by half a dozen parameters, including the π constant, H-bonding acceptor and donor abilities, and molar refractivity. Although the results of this study are of a preliminary nature, they appear very promising. Not only was the best series obtained, but the procedure also yields a population (of compounds) that allows synthetic chemists some freedom in selecting the target structure. V. Venkatasubramanian, A. Sundaram, K. Chan, and J. M. Caruthers (32 pp) consider a combination of GA and NN for approaching real-life interactive CAMD (computer-aided molecular design). In particular, they address genetic algorithms for the inverse problem and discuss the characterization of the search space, given that the GA-based design framework sometimes (under difficult circumstances) failed to locate the target. The last chapter (20 pp), by J. Devillers and C. Putavy, illustrates yet another hybrid system combining NN and GA. Each chapter is preceded by a short abstract and ends with an extensive list of references that many may find very beneficial. The first, introductory chapter alone cites almost 200 references. In view of the extensive literature, almost 500 references, an author index would seem useful. Equally, the large number of abbreviations (almost 50) could be collected in a single table where they could be briefly explained (with indications of the pages where they appear). The index at the end of the book is somewhat terse. 
For instance, stepwise regression is not included, CAMD is not listed as an abbreviation, QSPR is indexed as discussed on pp 286-299 but also appears on p 278 (not indexed), molecular mechanics (p 279) is not indexed, correlation coefficient (p 196) is not indexed, etc. Despite these minor limitations, which in no significant way diminish its usefulness, the book is a valuable addition to the growing literature associated with the use of computers in chemistry. With the remaining books in this series, it ought to find a place on the desk of anyone who wishes to be kept abreast of recent advances in QSAR.

Milan Randić
Drake University
CI970385D S0095-2338(97)00385-5

Computer Software Applications in Chemistry. By Peter C. Jurs. Second Edition. John Wiley & Sons, Inc.: New York, 1996. 291 pp with bibliographical references and index. $49.95. ISBN 0-471-10587-2.

Peter C. Jurs’ Computer Software Applications in Chemistry has appeared in a second edition after the very successful reception of the first. This work makes for interesting reading, since the author is an established scientist, educator, and writer. With the presence of desktop computers on literally every desk and workbench in every laboratory, computers have become an indispensable tool in the working life of every chemist. Because of this computer accessibility, the author rightly points out in the Preface that “the need for computer skills on the part of practicing chemists continues to grow”. It has become imperative for everyone working in a laboratory to be not only computer literate but also software literate. The present book offers a convenient stepping stone in that direction. This book contains 18 chapters covering topics in developing mathematical algorithms for solving chemical problems and some novel applications of the developed software. Although the chapters are logically arranged, it might have been better to divide them into two parts: the first covering the first nine chapters on the basic concepts and the second covering the more novel applications. Chapter 1 provides the essential introduction to the development of scientific computers, their applications, and the design of algorithms. Basic concepts of statistics, including errors, propagation of errors, and the floating-point number system, are introduced quite aptly in the second chapter. In the next seven chapters, linear and nonlinear curve fitting, matrix manipulation, solution of differential equations, numerical integration, simulation, and optimization methods are presented with chemical examples and listings of the corresponding programs in FORTRAN. 
Most scientists agree that FORTRAN is still the language of choice for solving scientific problems, which, of course, may be simply a matter of age-old habit! In chapter 3, the example of enzyme kinetics is particularly useful, since the Michaelis-Menten hyperbolic equation and its linear counterpart, the Lineweaver-Burk equation, appear in many areas of chemistry and biochemistry under slightly different forms. Differences in the parameter values obtained by fitting the hyperbolic and linear forms of the same equation to a given set of data are discussed. It would have been better if the influence of weighting on the fitted values had been more clearly emphasized. Chapter 6, dealing with the numerical solution of differential equations, is very well written, with useful examples from the realm of chemical kinetics, an area that challenges many chemists with programming skills. Chapters 10-18 focus on the more current areas of chemistry and computers. With the explosion of chemical databases and information sources, the demand for retrieval and searching of chemical structures and related information is increasing tremendously. Methods based on graph theory, pattern recognition, neural networks, and artificial intelligence are being developed every day to meet this demand. These techniques are catalogued nicely, with some examples and extensive references, in these chapters. The title of the book includes the word “software”, and though many programs are listed in the book, it would have been much better to provide a disk with the book so that the reader could use the programs, directly or after modification, for specific purposes. This book has been written for the advanced undergraduate or graduate student and is ideally suited for a course on “Computer Applications in Chemistry”. However, for this purpose the book would have served better if it had included practice problems or projects at the end of every chapter. 
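The remark on chapter 3 about hyperbolic versus linear fitting, and about the influence of weighting, can be made concrete with a small sketch. The code below is not taken from the book (whose listings are in FORTRAN); it is an illustrative Python example written for this review's point, and the function names, the data values, and the choice of weights proportional to v⁴ (which assumes a constant absolute error in the measured rate v) are all assumptions. It fits the same data with an unweighted Lineweaver-Burk line, a weighted Lineweaver-Burk line, and a Gauss-Newton fit of the untransformed hyperbola.

```python
def michaelis_menten(s, vmax, km):
    """Rate v at substrate concentration s."""
    return vmax * s / (km + s)


def line_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    b = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    a = (sy - b * sx) / sw
    return a, b


def lineweaver_burk(s, v, weighted=False):
    """Fit 1/v = 1/Vmax + (Km/Vmax)(1/s); returns (Vmax, Km).

    With weighted=True each point gets weight v**4, the usual correction
    when v (not 1/v) is assumed to carry a constant absolute error.
    """
    x = [1.0 / si for si in s]
    y = [1.0 / vi for vi in v]
    w = [vi ** 4 if weighted else 1.0 for vi in v]
    a, b = line_fit(x, y, w)
    return 1.0 / a, b / a


def hyperbolic_fit(s, v, vmax, km, iterations=50):
    """Gauss-Newton least squares on the untransformed equation."""
    for _ in range(iterations):
        a = b = c = g0 = g1 = 0.0
        for si, vi in zip(s, v):
            d = km + si
            r = vi - vmax * si / d        # residual
            j0 = si / d                   # d(model)/d(Vmax)
            j1 = -vmax * si / (d * d)     # d(model)/d(Km)
            a += j0 * j0
            b += j0 * j1
            c += j1 * j1
            g0 += j0 * r
            g1 += j1 * r
        det = a * c - b * b
        vmax += (g0 * c - b * g1) / det
        km += (a * g1 - b * g0) / det
    return vmax, km


if __name__ == "__main__":
    # Hypothetical data: exact rates for Vmax = 10, Km = 2 with small
    # alternating errors added by hand (not data from the book).
    s = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
    errors = [0.3, -0.3, 0.3, -0.3, 0.3, -0.3]
    v = [michaelis_menten(si, 10.0, 2.0) + e for si, e in zip(s, errors)]

    start = lineweaver_burk(s, v)
    print("Lineweaver-Burk, unweighted:", start)
    print("Lineweaver-Burk, weighted:  ", lineweaver_burk(s, v, weighted=True))
    print("Hyperbolic (Gauss-Newton):  ", hyperbolic_fit(s, v, *start))
```

On error-free data all three routes return the same parameters; they differ only once the rates carry error, because the reciprocal transformation inflates the influence of the smallest rates unless weights compensate for it.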
Computer Software Applications in Chemistry is a very well-organized and well-written book that should find a home in every practicing