Book Reviews - American Chemical Society

936 J. Chem. Inf. Comput. Sci., Vol. 33, No. 6, 1993

BOOK REVIEWS

Chemometrics Tutorials, Collected from Chemometrics and Intelligent Laboratory Systems - An International Journal, Volumes 1-5. Edited by D. L. Massart, R. G. Brereton, R. E. Dessy, P. K. Hopke, C. H. Spiegelman, and W. Wegschneider. Elsevier: Amsterdam, NY. 1990. 428 pp. $78.50. ISBN 0-444-88837-3.

It is hard to find a definition of “chemometrics”. The term, not in the American Heritage Dictionary on the reviewer’s hard disk nor in any dictionary to which he has easy access, was introduced by Bruce Kowalski in 1975 (1) in an article in this journal. The first monograph on chemometrics appeared in 1978 (2), and several texts have been published recently (3-5). The two primary journals of the field are Chemometrics and Intelligent Laboratory Systems - An International Journal and The Journal of Chemometrics, begun in 1986 and 1987, respectively. A search on the Chemical Abstracts database on the journal CODENs shows 411 articles from the former and 162 from the latter (as of July 1993). While the editors of Chemometrics Tutorials do not define the term in their Foreword, the content of the volume suggests the following definition: chemometrics is a subbranch of analytical chemistry dealing with advanced statistical and mathematical methods for analyzing chemical data and with the experimental data collection methods which make these techniques both feasible and necessary.

The editors of Chemometrics and Intelligent Laboratory Systems have a policy of publishing tutorial papers solicited from experts in various disciplines relating to chemometrics. As they state in the Foreword: “Many similar concepts occur throughout several subdisciplines of science and are referred to in different contexts by different groups of investigators. A good tutorial will explain, exemplify and compare these methods and ideas, drawing on key applications in different areas.
To workers in the field of chemometrics and the related area of laboratory computing these tutorials are of particular importance: concepts and methods developed by computer scientists, mathematicians, and statisticians are first applied and refined by chemists and then used by investigators in fields as diverse as pharmacy, geochemistry, environmental chemistry, and so on, so there is a strongly defined need for interdisciplinary communication.”

Chemometrics Tutorials has 28 chapters, written by 35 different authors, over a third of whom are not chemists. The longest section is on multivariate and related methods (13 chapters), with other sections covering experimental design and optimization (4 chapters), expert systems (4 chapters), signal processing, time series, and continuous processes (3 chapters), computers in the laboratory (3 chapters), and fuzzy methods (1 chapter). As might be expected from a book with 35 coauthors, the style and approach of the chapters vary widely. Most of the authors are European; only six are from the USA and one from Canada. Some chapters are almost devoid of mathematical symbolism, whereas others rely heavily on matrix algebra. Virtually all have useful references which can serve as a further entry into the topic, current as of late 1989.

The least successful section is the very first in the book, Computers in the Laboratory. The article on scientific word processing by Dessy is dated and out of place; there is such a large market for technical word processing that the rate of introduction of new products is too fast to capture in such a genre as the tutorial article reprinted from a primary journal (and, admittedly, reviewed 3 years later!). The two other articles, one by McDowall, Pearce, and Murkitt on the LIMS infrastructure and the other by Flerackers on scientific programming with GKS, also address technical issues which have been made somewhat irrelevant by the advance of software engineering and graphics hardware.
The next section on expert systems is more interesting because of its controversy. In his chapter “Dendral and Meta-Dendral - The Myth and The Reality”, computer scientist N. A. B. Gray reviews the history of the first large-scale application of artificial intelligence to chemistry and critically assesses its impact on the practice of chemometrics. To quote from the beginning of the article: “Dendral is presented as an AI success - a program that has ‘redefined the role of humans and computers in chemical research.’ Dendral’s supposed success is cited as an example of AI applied to the real world and used as an argument justifying wider attempts to apply AI techniques. However, this popular perception of Dendral is misleading.” Gray uses the Science Citation Index to show that the methods developed within Meta-Dendral to predict mass spectral fragmentation patterns of keto-androstanes were not cited or used successfully by anyone outside the project. He concludes that although the scientific work published on the Dendral project is correct and has led to some advances in combinatorial mathematics, “the project is usually misinterpreted within the AI community as being more influential in chemistry than it really has been.” There is a rebutting article by Buchanan, Feigenbaum, and Lederberg, with a response from Gray. To this reviewer, the real impact of Dendral was the introduction of the idea of expert systems to chemists. Computational chemists continue to explore their use (6). Even though the Dendral software itself is not widely in use, the chemometrics community has been enriched by its legacy.

An excellent two-part article on Prolog for chemists by Kleywegt, Luinge, and Schuman describes a declarative programming language (PROgramming in LOGic) which is of growing interest to chemists (35 references in Chemical Abstracts since 1987). The facts, queries, and rules illustrating Prolog’s syntax are all taken from chemical examples, as are the three longer sample programs: reasoning about ligand field splitting patterns, transcribing DNA sequences into proteins, and calculating the Randić index for molecular graphs. For the chemist-programmer thinking of exploring or even developing an expert system, these two articles are indispensable.

The section on experimental design and optimization is helpful. In the article by Morgan, Burton, and Church on practical exploratory experimental designs, there are worked examples of single factor and factorial designs, as well as a good description of the efficiencies gained when fractional factorial designs are used. A closely related article on optimization via simplex by Burton and Nickless contains the background, definitions, and a simple application for optimizing the procedure for determining sulfur dioxide in air.
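The efficiency gained from fractional factorial designs, noted above, can be illustrated with a minimal sketch. It is not taken from the article by Morgan, Burton, and Church; the function names and the choice of three two-level factors are assumptions made here for illustration. A full two-level design in three factors needs 8 runs, while a half fraction built by setting the third factor equal to the product of the first two needs only 4, at the cost of confounding that factor's main effect with the two-factor interaction.

```python
from itertools import product


def full_factorial(n_factors):
    """Return all 2**n_factors runs, each factor at coded levels -1 and +1."""
    return [list(run) for run in product((-1, 1), repeat=n_factors)]


def half_fraction(n_factors):
    """Return 2**(n_factors - 1) runs of a half-fraction design.

    Illustrative generator (an assumption of this sketch): the last factor
    is set equal to the product of all the other factors.
    """
    runs = []
    for base in product((-1, 1), repeat=n_factors - 1):
        last = 1
        for level in base:
            last *= level
        runs.append(list(base) + [last])
    return runs


full = full_factorial(3)
half = half_fraction(3)
print(len(full), "runs in the full design;", len(half), "in the half fraction")
for run in half:
    print(run)
```

Each factor still appears equally often at its low and high level in the half fraction, which is what keeps the main-effect estimates balanced with half the experimental effort.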
In two useful articles, Berridge discusses chemometrics and method development in high-performance liquid chromatography; the second, for example, presents optimization response functions for sequential design, the selection of a simplex optimization algorithm, and application to the separation of a six-solute mixture by HPLC. Sequential and simultaneous experimental designs are compared and contrasted.

In the section on signal processing are to be found articles by Brereton on Fourier transforms and their applications to spectroscopic and related data; by A. G. Marshall on dispersion and absorption spectra and their representation by dispersion-absorption plots (Cole-Cole and progeny); and by Kateman on sampling theory and its statistical descriptors, such as autocorrelation and semivariance.

The largest section of Chemometrics Tutorials deals with multivariate analysis, the topic most widely identified with chemometrics by the chemistry community at large. There are two introductory articles, one on principal component analysis by Wold, Esbensen, and Geladi and the other by Mellinger on the methods of multivariate analysis. In the limited reading and practice of multivariate analysis by this reviewer, he has been discouraged by the vast, specialized, and often redundant vocabulary used by different practitioners. What a relief to come across the following passage from Mellinger: “...a prolific vocabulary has emerged during the development of the field of multivariate statistics, which is both confusing and discouraging for new users. It is hoped that the few comments that follow will not add to the confusion.
...It is an unfortunate fact that the human vocabulary is very limited considering the variety of concepts that human beings can generate: the additional fact that discipline in vocabulary usage is a difficult task does not help the situation.” In order for any aspects of chemometrics to find their way into the undergraduate chemistry curriculum, this confusion of vocabulary must be sorted out, hopefully by a textbook author who writes simply and clearly, with examples that students with limited chemical experience can understand. Some of the authors of these tutorial articles appear to have this knack, but many do not.

One of the strengths of the editors’ insistence on a multidisciplinary approach is evident in the section on multivariate analysis. Here we find articles by scientists working in pharmaceutics, by Lewi (“Spectral Map Analysis: Factorial Analysis of Contrasts, Especially from Log Ratios”) and by Thielmans, Lewi, and Massart (“Similarities and Differences among Multivariate Display Techniques Illustrated by Belgian Cancer Mortality Distribution Data”); in statistics and chemometrics, by Christie (“Some Fundamental Criteria for Multivariate Correlation Methodologies”), by Windig (“Mixture Analysis of Spectral Data by Multivariate Methods”), and by Kvalheim (“Interpretation of Direct Latent-Variable Projection Methods and Their Aims and Use in the Analysis of Multicomponent Spectroscopic and Chromatographic Data”); in biological taxonomy, by Vogt (“Soft Modelling and Chemosystematics”); and in geology, by Birks (“Multivariate Analysis in Geology and Geochemistry: An Introduction”), by Reyment (“Multivariate Analysis in Geoscience: Facts, Fallacies, and the Future”), by Mellinger (“Interpretation of Lithogeochemistry Using Correspondence Analysis”), and by Birks (“Multivariate Analysis of Stratigraphic Data in Geology: A Review”).

Finally, there is a delightful article by Otto: “Fuzzy Theory Explained”. The concept of fuzzy sets was introduced in 1965 and has since been used in many fields as a way of representing vague statements and uncertain knowledge in computer programs. Chemical Abstracts has 366 references to “fuzzy”, and a scan of some of them showed applications of fuzzy set theory in such areas as instrument control, environmental decision making, and physical property databases. One does not usually come across a learned article in a primary research journal which devotes two full pages to the question, “how much is twenty minus fourteen?” Otto shows how fuzzy set theory can be applied to the identification of components in the infrared spectrum of a mixture.

Should a library buy this volume? If the primary journal itself is not available, then by all means yes. It will not serve as a text or even an adjunct volume anywhere in the undergraduate chemistry curriculum. For the graduate or professional chemist, however, who needs to know more about chemometrics and finds the style or coverage of the few available texts a drawback, this volume serves as an excellent introduction to the subject.


REFERENCES AND NOTES

(1) Kowalski, B. R. Chemometrics. Views and propositions. J. Chem. Inf. Comput. Sci. 1975, 15, 201-3.

(2) Chemometrics: Theory and Application; Kowalski, B. R., Ed.; ACS Symposium Series, Vol. 52; American Chemical Society: Washington, D.C., 1977 (English).

(3) Chemometrics: Experimental Design (Analytical Chemistry by Open Learning); Morgan, E., Ed.; John Wiley & Sons: Chichester, U.K., 1991 (English).

(4) Chemometrics: A textbook; Massart, D. L., Vandeginste, B. G. M., Deming, S. N., Michotte, Y., Kaufman, L., Eds.; Elsevier: Amsterdam, 1987 (English).

(5) Chemometrics. Chemical Analysis, Vol. 82; Sharaf, M. A., Illman, D. L., Kowalski, B. R., Eds.; John Wiley & Sons: New York, 1986 (English).

(6) See, for example: Expert System Applications in Chemistry (Developed for the Symposium Sponsored by the Division of Computers in Chemistry at the 196th National Meeting of the American Chemical Society, Los Angeles, CA, Sep 25-30, 1988); Hohne, B. A., Pierce, T. H., Eds.; American Chemical Society: Washington, D.C., 1988 (English).

Allan L. Smith Chemistry Department Drexel University

Practical Curve Fitting and Data Analysis: Software and Self-Instruction for Scientists and Engineers. By Joseph H. Noggle. PTR Prentice-Hall: Englewood Cliffs, NJ. 1993. 192 pp with index. Includes EZFIT software on 3.5-in. diskette (5.25 in. available). $48.

Practical Curve Fitting and Data Analysis: Software and Self-Instruction for Scientists and Engineers, by Joseph H. Noggle, is an excellent, concise text which demonstrates the application of linear and nonlinear (simplex) regression and interpolation to a wide selection of chemical problems. The thrust of the book is to give a feel for curve fitting and a sensitivity to indications, both graphical and statistical, that a method is appropriate.

The book is supplied with a software package called EZFIT, written by the author. This is a surprisingly powerful and easy-to-use curve fitting and plotting program, not a tutorial. A set of 58 data files is included for use with the EZFIT program to illustrate and complement the lessons in the text. The selection is as wide as the imagination of an experienced teacher of physical chemistry, from whose courses many of the data sets are developed. The software and text do work well together, and the exercises may captivate the user and occupy him/her late into the night. Users are likely to find immediate application for the information and tools provided.

The software runs on minimal IBM or compatible systems under DOS 3.0 or later. The main menu includes linear, simplex, or interpolation methods, and logical submenus are easily mastered. EZFIT is the perfect (almost necessary) complement to expensive commercial programs, like PlotIt (1) or Table Curve (2), providing the intuition and “feel” necessary to avoid the pitfalls of their power and automation.
For example, if the LINEAR.DAT data file from EZFIT is loaded into Table Curve, 40 model equations with r² > 0.99 are automatically presented in order of r² (an 11-term polynomial ranks first) or F statistic (the simple linear model ranks first). The EZFIT tutorials provide statistical and chemical criteria for selection of the best fit, but the user must choose or construct the model (from a virtually limitless menu). For approximately 10 times the cost, Table Curve may provide 10 times the convenience, but both programs display graphs and plots of residuals on the screen instantaneously, print to Hewlett-Packard plotters and LaserJets as well as other printers, and create PCL, HPGL, or ASCII files, which allow easy importation into almost any program. Data input is via the keyboard or ASCII files, but EZFIT can also be used to digitize printed graphs, using the Hewlett-Packard plotter to follow printed plots manually. The commercial program “Un-Plot-It” (3) has the same function but includes an automatic mode. EZFIT does all operations virtually instantaneously on a 486/66 machine; a 200-iteration simplex takes only a few seconds on a 10-MHz 286 machine.

The chapter on linear regression begins with applications of Beer’s law which serve to familiarize the user with EZFIT. Subsequent lessons are an excellent combination of review and introduction of new techniques or concepts. They sometimes illustrate how artifacts can be revealed and judiciously eliminated by the methods of linear regression, as exemplified by rejection of disparate data in the analysis of the rotational spectrum of DCl. But the exercises also reveal problems that arise from simpleminded application of linear regression, suggesting the need for more sophisticated methods which are introduced in later chapters. For example, systematic variation of residuals in the Clausius-Clapeyron plot of liquid CO2 indicates a failure of the regression model.
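The warning carried by systematic residuals can be sketched in a few lines. The data below are synthetic (a straight line plus a small quadratic term), not the CO2 vapor pressure data from the book, and the function names are hypothetical; the point is only that a straight-line fit can return r² above 0.99 while its residuals run in an obvious pattern of signs.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def residual_report(xs, ys):
    """Return r^2 and the string of residual signs for a straight-line fit."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    y_mean = sum(ys) / len(ys)
    ss_total = sum((y - y_mean) ** 2 for y in ys)
    ss_resid = sum(r * r for r in residuals)
    signs = "".join("+" if r > 0 else "-" for r in residuals)
    return 1.0 - ss_resid / ss_total, signs


# Synthetic, slightly curved data (an assumption made for illustration).
xs = list(range(9))
ys = [2 + 3 * x + 0.1 * x * x for x in xs]

r2, signs = residual_report(xs, ys)
print(f"r^2 = {r2:.4f}, residual signs: {signs}")
# For these synthetic points the sign pattern is ++-----++
```

Randomly scattered residuals would change sign irregularly; a long run of one sign bracketed by the other, as here, suggests that the model itself, rather than the noise, is at fault.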
Distortions of the data space in double reciprocal linear regressions are revealed by analysis of residuals, as illustrated by the Lineweaver-Burk plot for the amylase-catalyzed degradation of starch. The author shows how its inconsistencies with the Hanes plot for the same system reveal the need for weighted linear or nonlinear methods. Linear methods, including coordinate or functional transformations, are applied to examples from chemical kinetics, flow of water in pipes and from a tank, Arrhenius plots, and multisite equilibria in macromolecules.

In a nine-page chapter and the related software, Noggle presents an excellent digest of weighted regression methods for chemists. Examples include second-order kinetics and an improved treatment of enzyme catalysis. The final example is of radioactive decay with Poisson noise (the file FISH.DAT, no dry humor here). A table shows the proper weights to apply when the variables are transformed in common ways, so that points whose errors may be exaggerated by the transformation are given less weight in the regression. The software handles weighted regressions conveniently.

A chapter on polynomials and other linear models contains several examples of interest to chemists, including interpolation of a table of water viscosities, temperature calibration of a thermocouple, heat capacity of chlorine as a function of temperature, the van Deemter equation, determination of particle size distribution by sedimentation rate analysis, determination of activity coefficients by vapor pressure methods, and extrapolation to infinite dilution of (1) the dissociation constant of acetic acid, (2) equivalent conductivity of sodium iodide, and (3) the osmotic pressure of poly(isobutylene) in cyclohexane. Polynomials up to order 15 are allowed by the software, and the user can enter polynomials of the very general form $Y = B_1 f_1(X) + B_2 f_2(X) + \cdots$, where $f_i(X)$ can be any of nine functions including Gaussians, exponentials, trigonometric, reciprocal, and power functions.

The fifteen-page chapter on interpolation discusses cubic spline, Newton divided difference, and B spline interpolations.
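Of the interpolation methods named above, the Newton divided-difference form is compact enough to sketch here. This is a generic textbook version, not EZFIT's implementation, and the three data points are synthetic values of y = x² chosen so that the result is easy to check by hand.

```python
def divided_differences(xs, ys):
    """Return the Newton coefficients f[x0], f[x0,x1], ..., f[x0..xn]."""
    coeffs = list(ys)
    n = len(xs)
    for order in range(1, n):
        # Work from the end so lower-order entries are still available.
        for i in range(n - 1, order - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (xs[i] - xs[i - order])
    return coeffs


def newton_eval(xs, coeffs, x):
    """Evaluate the Newton-form polynomial at x by nested multiplication."""
    result = coeffs[-1]
    for i in range(len(coeffs) - 2, -1, -1):
        result = result * (x - xs[i]) + coeffs[i]
    return result


# Synthetic table: three points on y = x**2 (illustrative only).
xs = [1.0, 2.0, 4.0]
ys = [1.0, 4.0, 16.0]

coeffs = divided_differences(xs, ys)
print(coeffs)                     # [1.0, 3.0, 1.0]
print(newton_eval(xs, coeffs, 3.0))  # 9.0
```

Because three points determine a quadratic exactly, the interpolated value at x = 3 reproduces the underlying function; with real tabulated data the same routine gives an estimate whose quality depends on the spacing and smoothness of the table.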
Examples include the numerical integration of heat capacity vs temperature data to give the molar entropy of silver, numerical differentiation after interpolation of data from several titrations, and other applications. As usual, emphasis is placed on selection of appropriate methods.

Two chapters (56 pp) deal with theory and examples of nonlinear (simplex) regression. The first chapter is a general explication of the approach with its strengths and weaknesses, while the second chapter takes up several examples, including Gaussian and Lorentzian line shape functions, NMR relaxation, a two-step (mother/daughter) radioactive decay, deconvolution, and the kinetics problems which were earlier found not to be amenable to methods of linear regression. The program provides 28 simplex models. For comparison, Table Curve uses an automated Levenberg-Marquardt/matrix inversion method.

The book has very few, minor errors, mostly on the harmless level of the synopsis on the frontispiece, where the book is described as “Practical Cure Fitting and Data Analysis”. A good, but not lengthy, index is included, and appendices are devoted to propagation of errors and advanced features of EZFIT. The book is highly recommended for all chemists, including teachers who may use the examples directly. If you have started other books on data analysis and curve fitting only to lose motivation after the first chapter, get this book.


REFERENCES AND NOTES

(1) Scientific Programming Enterprises, P.O. Box 669, Haslett, MI 48840; $495.

(2) Jandel Scientific, 2591 Kerner Blvd., San Rafael, CA 94901; $495.

(3) Developed by Silk Scientific Corp. and distributed (about $300) by the American Chemical Society, Distribution Office, Dept. 113, P.O. Box 57135, West End Station, Washington, DC 20037.

Ed Vitz Department of Physical Sciences Kutztown University