Report

Stuart A. Borman, ANALYTICAL CHEMISTRY
SCIENTIFIC SOFTWARE

Until recently considered a narrow, vertical market that warranted little attention, the scientific software market is now quickly becoming a growth industry. This field has been dominated by scientific instrument companies writing programs to enhance and support their products, by computer companies such as Digital Equipment Corporation (DEC) that provide scientific software products to make their computers more useful, and by individual scientists honing their programming skills on Apple II computers while their spouses turned into computer widows and widowers.

Today, however, interest in the scientific software market is more widespread. For example, Lotus Development, one of the most successful and best funded software houses, recently announced formation of an engineering and scientific products division. This division's first product is TK!Solver, an equation-solving program that was initially developed by Software Arts Inc., a company acquired by Lotus earlier this year. Elsevier Science Publishing jumped into the fray a couple of years ago with a series of software packages for scientific applications, and Macmillan Software, a subsidiary of Macmillan Publishing, began marketing the Asyst scientific data acquisition and analysis package in 1984. International Business Machines (IBM) also began paying more attention to scientific applications when, in 1984, it introduced its Engineering/Scientific Series of products for data acquisition and control, high-speed computation, and computer graphics.

This month's REPORT covers various aspects of scientific software, including the following: evaluation and selection of commercial software products; program exchanges, catalogs, and other information sources; major data analysis packages, including RS/1 and Asyst; statistics and chemometrics software; and artificial intelligence (AI) for the scientific laboratory. In addition, commercially available scientific word-processing packages were covered in last month's FOCUS feature (1).

Photograph courtesy of Floating Point Systems
0003-2700/85/0357-983A$01.50/0 © 1985 American Chemical Society
ANALYTICAL CHEMISTRY, VOL. 57, NO. 9, AUGUST 1985 · 983 A

Evaluating software

As software becomes more sophisticated, shopping for software becomes increasingly difficult. In some cases, the company that manufactures your
scientific instruments may be able to supply all your software needs as well. However, if you are shopping elsewhere for software, you will want to begin by checking each candidate product for compatibility with your instruments and computers. For example, the Asyst data analysis product from Macmillan Software runs on IBM personal computers and compatibles equipped with an Intel 8087 (IBM PC and PC-XT) or an 80287 (IBM PC-AT) math coprocessor. RS/1 from BBN Software Products (a subsidiary of Bolt Beranek and Newman), a package that is also marketed by IBM and DEC under their own labels, runs on IBM PC-XT, PC-AT, and 3270 computers (with or without Intel math coprocessors), DEC Professional 350s and 380s, DEC PDPs and VAXes, and IBM mainframes.

For any software under consideration, you will also want to determine each package's resident memory and disk storage requirements. Further complications involve its compatibility with data acquisition hardware and software and with particular printers you may already own.

Once you've assured yourself of compatibility, obtain detailed information from vendors: written technical descriptions or even user manuals, if possible. Is the documentation well written? Demo disks, tutorial disks, and no-obligation trial periods can provide an in-depth understanding of a product's capabilities. Does the software have HELP files? Does the company provide a toll-free phone number for user assistance? Will the vendor supply a new disk if yours becomes damaged? Are you permitted to make copies for your own security, or can you share the wealth by making a disk copy for your lab partner? Most software vendors today guard against indiscriminate copying with security features of varying degrees of sophistication.

Source code

A continuing problem, according to some scientists, is the admittedly justifiable refusal of most software vendors to reveal the source code, the actual lines of programming language of which the program is composed.
These lines of code represent an enormous investment in development time, and most vendors choose not to place this investment at risk by revealing the source code. Purists argue that the scientific, mathematical, or logical validity of a program cannot be verified directly by users or by government regulatory agencies unless the source code is available. In reality, however, only a tiny percentage of users have the skill,
the time, or the inclination to decipher and validate the thousands of lines of code in even a relatively simple program. Thus, although scientific researchers are accustomed to having total control over all experimental variables, this is one situation in which the exercise of that control will remain impractical.

Nevertheless, scientific software must be certified and validated in some way. Frederick A. Putnam of Laboratory Technologies, the company that markets the Labtech Notebook scientific software package, contends that certification cannot be accomplished through publication of the source code because "you can't really verify that a piece of software does something by looking at the code. The bottom-line way to do it is to have an exhaustive set of test cases run. This, in fact, is how they validate Ada compilers and accounting systems."

Elsevier Science Publishing's "The Software Catalog: Science and Engineering" makes a similar recommendation: "Even with the most widely distributed generalized packages, we warn readers to be extremely cautious and test the results against known answers. Much . . . relatively new . . . software has not had enough time for thorough debugging and the errors could turn out to be very costly."

Exchanges, marketers, and information sources

Among the important scientific software information sources is Science Software Quarterly (SSQ, Center for Environmental Studies, Arizona State University, Tempe, Ariz. 85287; 602-965-3051). SSQ provides information and reviews on commercially available scientific software packages, free or almost-free public-domain software, and relatively inexpensive custom-made software written by working scientists. The latest issue
of the publication (Vol. 2, No. 1) includes reviews of a number of commercial packages of interest to analytical chemists: Asyst, Labtech Notebook, and three major statistical packages (Systat, SPSS-PC, and the microcomputer version of Minitab). SSQ includes news about hardware and software developments and a readers' question section, with responses from the editorial staff or from other readers. In addition, an annual readers' survey assesses the popularity and usefulness of commercially available software packages. One important source of scientific software is Indiana University's Quantum Chemistry Program Exchange (QCPE, Department of Chemistry, Indiana University, Bloomington, Ind. 47405; 812-335-4784), which, despite its name, covers the entire field of computational chemistry. Of particular interest to analytical chemists are QCPE's holdings in spectroscopy, including PAIRS, an AI program for the analysis of infrared spectra that was developed at Merck, Sharp & Dohme Research Laboratories (2). QCPE's programs are donated to the exchange by their authors in the interest of advancing chemical research and development, and they are sold by QCPE at very reasonable cost. QCPE publications include a quarterly bulletin, an annual guide and index in which all available programs are listed by application category, and a catalog that provides detailed abstracts and technical specifications for each program. A similar program was initiated about two years ago by Research Corporation, a nonprofit foundation that provides grants for scientific research. The Research Corporation/Research Software (RC/RS) program (6840 E. Broadway Blvd., Tucson, Ariz. 85710; 602-296-6400) acts as a national clearinghouse for computer software created by scientists and engineers at colleges, universities, and nonprofit research centers. RC/RS copyrights this software and markets it through nationally distributed catalogs and promotional materials. 
Software buyers pay a fee or royalty, 60% of which is returned to authors or their institutions; the rest goes to subsidize the costs of the program. According to RC/RS program manager Mark Ogram, the program differs from QCPE in two ways: its scope is broader, and whereas QCPE's programs are donated by their authors, RC/RS program authors can expect to be compensated monetarily. The RC/RS catalog currently includes about 70 titles, one of which is a program for X-ray fluorescence spectrometry.

Scientific Computing (Essex, Vt.) also purchases software written by scientists and other private individuals,
markets it, and returns royalties (generally 15-30%) to the authors. The company's first product is Sim-Soft ($895), a menu-driven analytical sample management program for the IBM PC and PC-XT that provides data storage and maintenance of data files in the laboratory.

One source of information about scientific software, "The Software Catalog: Science and Engineering" ($45), published by Elsevier Science Publishing and based on the International Software Database (Imprint Software Ltd., Fort Collins, Colo.), is, unfortunately, a bit lean on chemistry listings. According to Elsevier, the recently published 1985 edition of the catalog contains only 46 programs in the "Scientific: Chemistry" category out of a total of about 3200 programs. The catalog is arranged alphabetically by vendor name, but the listings are also cross-referenced by computer model, operating system, programming language, microprocessor type, application, and package name.

Asyst, RS/1, and other data processors

RS/1 and Asyst are full-performance software packages designed specifically for use by scientists, engineers, and mathematicians. In some respects, Asyst and RS/1 are functionally similar. Thus, for some microcomputer applications, these two systems are in competition with each other, although Asyst does not run on larger computers as does RS/1. If a multiuser, minicomputer-based data analysis system is desired, RS/1 would be the software of choice. The DEC VAX or IBM mainframe versions of RS/1, for example, could provide data analysis services to an entire laboratory. Asyst is a modular system, presently available as four modules: system/graphics/statistics, analysis, acquisition, and general-purpose interface bus (GPIB) acquisition. These can be purchased either separately or as a set.
Both Asyst and RS/1 provide the commonly used mathematical functions (such as those found on a scientific calculator), statistical functions (mean, standard deviation, and many more), least-squares approximations (linear and nonlinear), table (array) editing and manipulation, graphics output and display (including automated plotting routines), programming capability, and access to external word-processing software (1) for report writing. Asyst has a number of built-in functions that are not available with RS/1, including some vector and matrix arithmetic (determinants, Gram-Schmidt orthogonalization), baseline correction, peak-finding algorithms, digital smoothing, convolutions and filtering, and fast Fourier transformation. RS/1 functions that are not available with Asyst include pie chart and bar graph data presentation and special facilities to help users construct mathematical models of complex physical processes.

Figure 1. RS/1 integrates analytical, graphical, and reporting capabilities

Asyst permits arrays of up to 16 dimensions, and RS/1 supports unlimited multidimensional constructs. This should please chemometricians interested in multivariate data analysis. Although an Asyst table is limited to 64 Kbytes of data, there is no software limit on the size of RS/1 tables. Macmillan Software points out that Asyst's table size was intentionally limited to maintain rapid execution speed.

Real-time operation provides the ability to view a graphical display of experimental data while the experiment is proceeding, similar to viewing the experimental results on a strip chart recorder. Real-time operation also makes it possible to control the experiment with the computer. Asyst offers a choice between real-time and postprocessing operational modes and supports functions such as digital and analog I/O, triggering support, timing synchronization, and data buffering. Although RS/1 does not include data acquisition facilities and therefore operates only in the postprocessing mode, the integration of RS/1 with the Labtech Notebook software package makes Labtech Notebook's real-time I/O capabilities available to RS/1 users, albeit at the additional expense of purchasing Labtech Notebook. Other software packages that permit real-time data acquisition and control include Insight-1000 (Chesapeake Software), Appligration II (Dynamic Solutions), and Labtech Notebook itself.
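To make the digital smoothing and peak-finding functions mentioned above more concrete, here is a minimal Python sketch of both operations. It is an illustration only, not Asyst code: the function names, the choice of a Blackman-shaped weighting window, and the edge handling are all assumptions made for this example.

```python
import math


def blackman_window(n):
    """Return n coefficients of the classic Blackman window (0.42/0.5/0.08 form)."""
    if n == 1:
        return [1.0]
    return [
        0.42
        - 0.5 * math.cos(2 * math.pi * k / (n - 1))
        + 0.08 * math.cos(4 * math.pi * k / (n - 1))
        for k in range(n)
    ]


def smooth(data, width=5):
    """Weighted moving average over `width` points (width must be odd).

    The weights are the interior points of a Blackman window; samples
    beyond either end of the array are replaced by the nearest end value.
    """
    weights = blackman_window(width + 2)[1:-1]  # drop the two zero end points
    total = sum(weights)
    half = width // 2
    last = len(data) - 1
    out = []
    for i in range(len(data)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = min(max(i + j - half, 0), last)  # clamp index at the edges
            acc += w * data[idx]
        out.append(acc / total)
    return out


def local_maxima(data):
    """Return indices of interior points higher than their left neighbor
    and at least as high as their right neighbor."""
    return [
        i
        for i in range(1, len(data) - 1)
        if data[i - 1] < data[i] >= data[i + 1]
    ]


if __name__ == "__main__":
    trace = [0, 0, 1, 0, 5, 9, 5, 0, 1, 0, 0]  # made-up detector readings
    print(local_maxima(trace))          # peaks in the raw trace
    print(local_maxima(smooth(trace)))  # peaks after smoothing
```

Smoothing before peak finding is a common design choice because small noise spikes otherwise register as peaks; the trade-off is that a wider window also flattens genuine narrow peaks.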
Introduced in 1979, RS/1 (Figure 1) has been the market leader in scientific data analysis; BBN lays claim to the largest installed base of minicomputer and mainframe-based users.

A session on RS/1 might begin by importing experimental data into an RS/1 table from an ASCII or binary file. RS/1 prompts the user for table column headings and row names. One advantage of RS/1 is that, at each point, it provides a choice between command-driven and menu-driven operation, to suit users who are relatively skilled or unskilled in using particular RS/1 functions. Another important feature of RS/1 is its use of English-like commands that are easy to learn. The data in two table columns could be plotted against each other with the command MAKE GRAPH TEST FROM COLUMN 'A' VS COLUMN 'B', in which TEST is a graph name selected by the user. An EDIT command provides menus from which the user can add messages and explanatory axis labels to the graph. When the user types FIT FUNCTION and supplies RS/1 with a suggested type of equation, the program fits a least-squares curve to the data and returns goodness-of-fit information. A presentation copy of the results could then be created, including the graph (with the experimental points, the least-squares curve, and perhaps some error bars), a table of the data, and some explanatory text merged in from a word processor.

As mentioned above, RS/1 operates on IBM PC-XT, PC-AT, and 3270 computers and on DEC Pro 350/380 computers; a minimum of 512 Kbytes of memory and one 10-Mbyte hard disk are required. RS/1 has also been available for some time on larger computers, such as DEC PDPs and
VAXes and IBM mainframes. Prices range from $1900 for single-user systems to $85,000 for multiuser configurations. The price of RS/1 on IBM personal computers is $2000, with discounts possible for multiple copies.

Figure 2. Two screens from the Asyst software package

Asyst (Figure 2) was introduced in 1984 and has been growing rapidly in popularity ever since; Macmillan claims the largest installed base of microcomputer users relative to competitive products. At its simplest level, Asyst operates much like a Hewlett-Packard (HP) programmable calculator, providing basic mathematics, statistics, trigonometry, and other functions. Unlike an HP calculator, however, the Asyst number stack supports arrays as well as scalars. On a second level, Asyst provides a wide variety of data acquisition, data reduction, and plotting routines that are invoked through a library of commands. A few examples are: INTEGRATE.DATA, which computes the running integral of an array; LOCAL.MAXIMA, which finds local maxima in a spectrum, chromatogram,
or other data array; and SMOOTH, which smooths data with a Blackman window-type filter. New, user-defined commands can be established with "colon definitions," which are automatically compiled into machine language for fast execution. On a third level of complexity, Asyst commands can be linked together or combined with programming control structures (such as IF . . . THEN . . . ELSE) to form user-defined programs.

Asyst's excellent documentation not only teaches one how to use Asyst, but also takes the trouble to provide educational information on the theory behind some of the many mathematical and statistical functions available on the system.

Asyst works in conjunction with the IBM PC, PC-XT, PC-AT, Compaq, AT&T 6300, Zenith, and other IBM-compatible computers. The hardware configuration must include 384 Kbytes of RAM or more, two disk drives, an IBM color graphics video interface board (regular or enhanced version) or a Hercules graphics card, and an Intel 8087 or 80287 mathematical coprocessor. Asyst's data acquisition modules require no programming and are compatible with interface boards from a wide variety of vendors; the GPIB data acquisition module provides a connection to as many as 16 instruments on an IEEE-488 bus. Asyst prices range from $1695 for two modules (system/graphics/statistics and analysis), to $1995 for those two modules plus one of the two acquisition modules, to $2195 for the set of all four modules.

Figure 3. Labtech Notebook graphical screen output

Labtech Notebook (Figure 3) is a real-time data acquisition software system that requires no programming; it is entirely menu driven and easy to use. Labtech Notebook includes its own capability for real-time I/O and graphical display, curve fitting, fast Fourier transforms, and foreground-background operation. Foreground-background operation is defined by Laboratory Technologies as the capability to run a program in the foreground (as long as the foreground program doesn't use the computer's real-time clock) while Labtech Notebook is performing real-time data acquisition in the background. Some other software vendors define this term as the ability of different modules of the same program to operate at the same time.

Labtech Notebook performs real-time data acquisition and control, but its data analysis capabilities are supplied, for the most part, by RS/1, Lotus 1-2-3, or other data analysis programs, which have to be purchased separately. However, Laboratory Technologies is developing some modular data analysis add-ons, and the first of these, for chromatography applications, is just on the verge of being released.
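The foreground-background idea described above can be sketched in a few lines of Python. This is only an analogy under stated assumptions: a Python thread and queue stand in for whatever mechanism Labtech Notebook actually uses (which the article does not describe), and `fake_adc` is a hypothetical stand-in for an instrument reading, not a real driver call.

```python
import queue
import threading


def acquire_in_background(read_sample, n_samples, out_queue):
    """Background task: read samples one by one and queue them for later use."""
    for i in range(n_samples):
        out_queue.put((i, read_sample(i)))
    out_queue.put(None)  # sentinel: acquisition finished


def run_session(read_sample, n_samples, foreground_job):
    """Start acquisition in the background, run a foreground job, then collect the data."""
    samples = queue.Queue()
    worker = threading.Thread(
        target=acquire_in_background, args=(read_sample, n_samples, samples)
    )
    worker.start()

    foreground_result = foreground_job()  # runs while the worker is acquiring

    collected = []
    while True:
        item = samples.get()  # blocks until the next sample (or sentinel) arrives
        if item is None:
            break
        collected.append(item)
    worker.join()
    return foreground_result, collected


def fake_adc(i):
    """Hypothetical stand-in for an instrument reading (not a real driver call)."""
    return 0.5 * i


if __name__ == "__main__":
    result, data = run_session(fake_adc, 4, lambda: sum(k * k for k in range(1000)))
    print(result, data)
```

The queue decouples the two activities: the background task never waits for the foreground job, and the foreground job picks up the buffered samples whenever it is ready.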
A Labtech Notebook session might begin by acquiring data from an instrument; the data can be displayed graphically on the screen as they are being acquired, and they can be acquired and displayed on a number of channels simultaneously. A different display format for the next run could be created by, for example, changing the display size, the trace type, and the input signal scaling factor in a setup option menu. A SAVE/RECALL command could be used to save the current setup options, representing a particular type of experiment, or to recall a set of previously saved setup conditions. Experimental data can then be imported into RS/1 or Lotus 1-2-3 for data reduction, statistical analysis, and graphics presentation.

Labtech Notebook runs on the IBM PC, PC-XT, PC-AT, and compatibles. At least 256 Kbytes of RAM is needed. In addition, an IBM color graphics board is needed to take advantage of Labtech Notebook's real-time display feature. Labtech Notebook supports data acquisition and control hardware from virtually every major manufacturer of such equipment. The price of the software is $895, to which the cost of RS/1 or Lotus 1-2-3 must be added if they are not already available.

Other data acquisition and analysis packages include Appligration II (Dynamic Solutions) and Insight-1000 (Chesapeake Software). The Appligration II starter set ($2195) is an integrated hardware and software product for Apple II+ and Apple IIe microcomputers. The software provides data acquisition and control, peak detection routines, data manipulation (including differentiation, integration, peak height and area analysis, smoothing, and regression statistics),
graphics, and a BASIC-type programming language; the hardware consists of an extended memory card and a multifunction card with a serial data acquisition interface. Modules ($595 each) for specific applications (chromatography, continuous-flow colorimetry, spectroscopy, and thermal analysis) are added to the starter set to form totally turnkey systems.

Introduced for the first time at the Scientific Computing and Automation Conference in Atlantic City in May and scheduled to be shipped this fall, Insight-1000 is a menu-driven, DEC MicroVAX-based system for real-time data acquisition and display that allows automatic data transfer into RS/1 or SAS (SAS Institute) for data analysis. The system is priced at $20,000 and up.

TK!Solver (Lotus Development) was originally marketed by Software Arts, the company that developed the popular VisiCalc spreadsheet package, but it became a Lotus Development product when Lotus acquired Software Arts earlier this year. TK!Solver uses two modes to solve equations: Direct Solver, which works on simple equations with one unknown, and Iterative Solver, which handles sets of simultaneous linear and nonlinear equations with a modified Newton-Raphson procedure. Iterative Solver kicks in automatically when Direct Solver fails to produce a solution. TK!Solver also performs unit conversions (the user provides the conversion factors) and generates tables and plots of results. TK!Solver is available for IBM PCs and compatibles, the DEC Professional and Rainbow, the Wang Professional, and the Apple IIe.

A new equation solver, Formula/One (Alloy Computer Products), has recently appeared. Among its features are multiple regression analysis, curve fitting, graph plotting, and the ability to solve linear, nonlinear, simultaneous, and transcendental equations.
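For readers unfamiliar with the iterative approach that equation solvers of this kind rely on, the following Python sketch shows the textbook Newton-Raphson method for a set of simultaneous nonlinear equations. It is not TK!Solver's algorithm (the article says only that the product uses a "modified" procedure and gives no details); the function names, the finite-difference Jacobian, and the tolerances are assumptions chosen for this example.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-14:
            raise ValueError("singular Jacobian")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def newton_raphson(funcs, guess, tol=1e-10, max_iter=50, h=1e-7):
    """Find x such that every f(x) in funcs is zero.

    Each f takes the full list of unknowns. The Jacobian is estimated
    by forward differences with step h.
    """
    x = list(guess)
    n = len(x)
    for _ in range(max_iter):
        fx = [f(x) for f in funcs]
        if max(abs(v) for v in fx) < tol:
            return x
        jac = [[0.0] * n for _ in range(n)]
        for j in range(n):
            shifted = x[:]
            shifted[j] += h
            fh = [f(shifted) for f in funcs]
            for i in range(n):
                jac[i][j] = (fh[i] - fx[i]) / h
        step = solve_linear(jac, [-v for v in fx])
        x = [xi + si for xi, si in zip(x, step)]
    raise RuntimeError("Newton-Raphson did not converge")


if __name__ == "__main__":
    # Example system: x^2 + y^2 = 25 and x - y = 1
    equations = [
        lambda v: v[0] ** 2 + v[1] ** 2 - 25.0,
        lambda v: v[0] - v[1] - 1.0,
    ]
    print(newton_raphson(equations, [5.0, 1.0]))
```

As with any Newton-type method, the answer depends on the starting guess: the example system has a second solution at x = -3, y = -4, which a different guess could reach instead.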
Statistical packages

Although packages such as Asyst and RS/1 are optimized for scientific applications, commercially available statistical software products that can provide similar mathematical and graphical functions should also be considered for laboratory use. Four microcomputer-based statistics packages were recently reviewed (3): Crisp (Crunch Software Corp., San Francisco, Calif.), which requires minimal knowledge about statistics and provides the user with immediate feedback at each step of the analysis; SPSS/PC (SPSS Inc., Chicago, Ill.); Systat (Systat Inc.); and Statpro PC (Wadsworth Professional Software). Although Wadsworth Professional
Software went out of business in April, discussions were under way with several parties interested in acquiring the rights to Statpro.

In a talk on microcomputer chemometric techniques at the November 1984 Eastern Analytical Symposium, Stanley N. Deming of the University of Houston suggested some of the following statistics packages for consideration:

• A micro-mainframe interface makes it possible to use Statgraphics (Statistical Graphics Corp., Princeton, N.J.) in a number of ways: stand-alone on a mainframe or an IBM PC, on a mainframe using the IBM PC as a smart terminal (for information I/O to and from the mainframe), on a microcomputer while accessing mainframe data files, or simultaneously on both.
• Minitab (Minitab Inc., University Park, Pa.), available on mainframes, minis, and micros, is especially powerful for row and column operations on matrices.
• BMDP (BMDP Statistical Software Inc., Los Angeles, Calif.) is a comprehensive library of statistical programs that runs on the proprietary StatCat microcomputer or on the IBM PC.
• Microcomputer-based packages include Omnibase (Conceptual Software Inc.); Systat (Systat Inc., Evanston, Ill.); the Portable Statistician (Statware, Morton Grove, Ill.), which, at $99, represents a price break over many other programs; MicroTSP (Quantitative Micro Software); Medlog (Information Analysis Corp., Mountain View, Calif.), a clinical data management and analysis system; Stan (Statistical Consultants Inc., Lexington, Ky.); and NWA Statpak (Northwest Analytical, Portland, Ore.).
• Statistical packages for specific laboratory applications from Elsevier Scientific Software (Amsterdam) include Balance, which Deming described as an "almost-expert system to compare a series of measurements using parametric or nonparametric tests," and Cleopatra, which includes autocorrelation, curve fitting, and Fourier filtering functions.
• Deming's own company, Statistical Programs (Houston, Tex.), provides packages that use calculated t and F values to determine confidence levels and that optimize chemical reactions and processes based on a sequential simplex algorithm.
• Curve fitting is a common procedure in the chemical laboratory; programs that address this area in particular include Curve Fitter-PC (Interactive Microware Inc., State College, Pa.) for the IBM PC, and Perkin-Elmer's Graph 7000 package
for Perkin-Elmer 7000 series professional computers.

Software exhibitors at this year's Pittsburgh Conference included IMSL Inc. (Houston, Tex.), which offers a full range of mathematical and statistical programs and subprograms that run on everything from Cray supercomputers to IBM PCs, and Heyden & Son (Philadelphia, Pa.), which introduced Statworks, a statistics package that runs on the Macintosh.

Also announced at this year's Pittsburgh Conference was Ein*Sight (Infometrix Inc., Seattle, Wash.), an exploratory multivariate analysis system for the IBM PC and compatibles that makes it possible to analyze and display complex interrelationships between multiple variables and samples. The program, which uses Symphony for most input and output functions, is a subset of Arthur, a mainframe-based pattern recognition and multivariate data analysis system.

Expert systems

The expert system, a form of AI, represents a completely different type of programming, in which problems are solved by logical rules of inference rather than by calculation, as in conventional, algorithmic programming. The expert system manipulates symbols, rules, and knowledge rather than data. "Pieces of knowledge are encoded symbolically as facts and rules," explains Joe Karnicky, project leader for expert systems at Varian Associates. "Functions that perform logical inferences on these rules are then evaluated, so that the system can reach conclusions based on particular sets of inputs."

Karnicky is presently working on an expert system to help chromatographers design liquid chromatographic (LC) methods (4). However, a number of non-AI software packages are already commercially available to assist in LC solvent optimization. Is AI,
then, simply a new gimmick, or does it offer something truly new?

The answer is that AI is very different indeed. The optimization software that is available from a number of LC vendors involves algorithmic strategies, in which the program processes the input data in a fixed way and calculates optimal mobile phase conditions for the separation at hand; there is a fixed prescription for how to process the data. However, the AI system Karnicky is developing operates off a knowledge base, which affords a number of advantages: The knowledge-based AI system can handle uncertain, incomplete, and contradictory information and can deal with a wider range of situations than algorithmic optimization strategies.

"Another advantage is that the knowledge contained in the program is explicit," explains Karnicky. "One can find symbolic statements in the code that represent particular rules, such as: Reversed-phase columns should be used to separate phenols. The knowledge is explicitly stated, usually in the form of facts and rules, but not always, and it is separate from the inference machinery, the functions that process the knowledge. Because you can see the knowledge explicitly, it is easy to update the program. If later on you decide that this is not a good piece of knowledge to use, you can replace it. In a conventional program, the knowledge is hidden; it's implicit in the flow chart."

AI development efforts of importance to analytical chemistry are currently proceeding on a number of fronts:

• A presentation at this year's Pittsburgh Conference by Carla Wong of Lawrence Livermore National Laboratory (LLNL) on TQMSTUNE, an AI system designed to tune a totally computerized triple-quadrupole mass spectrometer, drew an overflow crowd.
• Former LLNL staffer Jack Frazer is working on an AI system that will design experimental sequences based on the results of preceding experiments (adaptive, self-optimizing experimentation).
• BBN is presently developing an AI-based statistical advisory package called RS/Expert for the science and engineering market.
• Award Software Inc. (Los Gatos, Calif.) is developing a set of AI modules that will solve problems in capillary gas chromatography.
• AI research is also proceeding apace at Hewlett-Packard, Perkin-Elmer, and IBM, in addition to Varian. A source at one of these companies predicted that commercial AI products of interest to analytical chemists would begin to emerge in "a couple of years, maybe sooner."
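Karnicky's point about keeping knowledge separate from the inference machinery can be illustrated with a very small Python sketch. It is not the Varian system: the rule format and the `forward_chain` function are assumptions made for this example, and only the first rule comes from the article; the second is a hypothetical placeholder added to show one rule feeding another.

```python
# Knowledge base: each rule is (conditions that must all hold, conclusion).
RULES = [
    # Rule quoted in the article: reversed-phase columns for phenols.
    (("analyte is a phenol",), "use a reversed-phase column"),
    # Hypothetical follow-on rule, included only to show chaining.
    (("use a reversed-phase column",), "start with a water-methanol mobile phase"),
]


def forward_chain(facts, rules):
    """Inference machinery: apply rules repeatedly until no new fact is derived.

    This function knows nothing about chromatography; all domain
    knowledge lives in the `rules` list passed to it.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and all(c in known for c in conditions):
                known.add(conclusion)
                changed = True
    return known


if __name__ == "__main__":
    for fact in sorted(forward_chain({"analyte is a phenol"}, RULES)):
        print(fact)
```

Replacing a piece of knowledge means editing one entry in `RULES`; the inference function is left untouched, which is the maintainability advantage described in the quotation above.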
Although AI systems could, theoretically, be designed with conventional programming languages, many AI researchers and programmers work in Lisp or Prolog, programming languages designed to deal with information that is coded symbolically instead of numerically. Special Lisp computers, known as Lisp machines, are sold by companies such as Xerox, Lisp Machine Inc., and Symbolics.

A dynamic market is also shaping up in Lisp-based AI programming tools that can reduce the amount of time it takes to create a new expert system. Examples include the following:

• KEE (Knowledge Engineering Environment) by IntelliCorp, ART (Automated Reasoning Tool) by Inference Corporation, Loops by Xerox, and S.1 by Teknowledge, all of which run on Lisp machines;
• RuleMaster (Radian), M.1 (Teknowledge), Personal Consultant (Texas Instruments), and perhaps another dozen or so commercial packages that have been designed to run on personal computers; and
• relatively inexpensive AI tools that have been developed at academic institutions such as Stanford and the University of Maryland.

The natural-language interface

One of the most important applications of AI in scientific computation may turn out to be its use in the development of software front ends or user interfaces. Why do we need a new type of software interface?

Consider Asyst, the data analysis, statistics, graphics, and data acquisition package discussed above. Asyst is a command-driven system, in which you must either know the command syntax and structure by heart or look them up in the manual or in the HELP files. There is nothing inherently wrong with a command-driven system; in fact, it's the fastest type of system to use once you've learned the syntax. Aye, but there's the rub, because programs such as Asyst demand a certain amount of learning time.
BBN has taken an interesting and effective approach to this problem: Its RS/1 system gives the user a choice between menu-driven operation and command-driven operation with English-like syntax. This type of system addresses part of the problem, but if you've ever tried using a sophisticated menu-driven product such as Lotus 1-2-3 without first reading the documentation or using the tutorial disks, you'll know that menus are not the last word in user interfacing. "The main limitation these days to use of computers is not the equipment or software per se," explains Nancy Woo
of Merck & Company, "but the time required for people to learn to use these systems effectively." That is why software engineers are turning to a concept known as natural-language interfacing. According to Art Rosenberg, also of Merck, "I see applications of AI as a front end to laboratory information management systems or data acquisition systems that will allow relatively free-form input by the user, approaching natural-language input. These systems will be built with AI tools."

One such tool, called a parser, takes apart a command such as "Let me see data on all the formaldehyde samples," figures out what its components are, and passes the interpretation over for processing by the computer. Without a parser at the front end, querying the data base for formaldehyde samples necessitates a highly specific syntax, the slightest deviation from which might result in an error message. "This discourages people from using computers," says Nancy Woo. "They don't know what the problem is; they just know they didn't do it right."

A number of companies are currently involved in natural-language work. For example, Artificial Intelligence Corporation's popular Intellect program (also marketed by IBM) helps users retrieve information from large mainframes without having to use the complex command structure required by most computer systems (5). Microrim's microcomputer-based Clout program (6) permits natural-language access to information on data bases such as Lotus 1-2-3, dBase II, and PPS:File. Texas Instruments offers NaturalLink, a natural-language front end for the MS-DOS operating system, the Dow Jones News Retrieval Service, dBase II, Multiplan, and a number of other applications programs. Carnegie Group recently announced availability of a flexible natural-language interface processing environment called Language Craft. In addition, RS/Expert, the experimental design, analysis, and graphics package being developed at BBN, will reportedly include some natural-language interfacing capability.

AI ideas for the future

Kevex Corporation markets a microanalysis system with an optional voice recognition feature that enables the system computer to understand and execute spoken commands. Combining this sort of voice recognition system with natural-language interpretive software might be a powerful user interface concept for future implementation.

A natural-language front end could also be used to provide educational feedback by translating natural-language requests into formal system command language and displaying the formal syntax on the terminal. Such a natural-language interface could teach the command language to the user in much the same way that a baby learns language from his parents. After learning some of the syntax through repeated exposure, the user could begin to use the command language directly. This sort of system would make it possible for people to immediately use unfamiliar software without needing to read the manual first.

AI tools might also be used one day to address what is certainly one of the most difficult problems in analytical laboratory management: routing a sample through a sequence of laboratory tests to obtain the information needed in the most efficient manner possible. This type of process is interactive in nature, because the decision as to whether or not to perform test B will often depend on the results of test A. If an expert system that helped the laboratory manager select the optimal test sequence for each sample could one day be integrated into a laboratory information management system, this would indeed represent a significant step in the direction of the totally automated laboratory.
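The dependence of test B on the results of test A can be illustrated with a minimal sketch in Python. Everything in it is hypothetical: the names `LabTest`, `route_sample`, `test_A`, and `test_B`, the sample IDs, the readings, and the 1.0 ppm threshold are invented for illustration and are not taken from any product mentioned here. The sketch covers only conditional sequencing; it does not attempt to choose an optimal test order, which is the harder problem an expert system would address.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Results = Dict[str, float]


@dataclass
class LabTest:
    """One step in a test plan (hypothetical structure)."""
    name: str
    run: Callable[[str], float]        # performs the test on a sample ID
    needed: Callable[[Results], bool]  # decides from earlier results whether to run


def route_sample(sample_id: str, plan: List[LabTest]) -> Results:
    """Walk the plan in order, running a test only if earlier results call for it."""
    results: Results = {}
    for step in plan:
        if step.needed(results):
            results[step.name] = step.run(sample_id)
    return results


# Invented readings standing in for real instrument output.
SCREEN_PPM = {"S-001": 0.2, "S-002": 4.7}
CONFIRM_PPM = {"S-002": 4.9}

plan = [
    # Test A: a quick screen that every sample receives.
    LabTest("test_A", run=lambda s: SCREEN_PPM[s], needed=lambda r: True),
    # Test B: a confirmatory test, run only if the screen exceeds 1.0 ppm (assumed cutoff).
    LabTest("test_B", run=lambda s: CONFIRM_PPM[s], needed=lambda r: r["test_A"] > 1.0),
]

print(route_sample("S-001", plan))  # only test_A is run
print(route_sample("S-002", plan))  # test_A, then test_B
```

In this sketch sample S-001 stops after the screen, while S-002 goes on to the confirmatory test. Each rule is an ordinary function of the results so far, so a longer plan could encode more elaborate dependencies in the same way.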
A source at one major scientific instrument and software company admitted that "We're working on something like that, but I can't talk about it."

References
(1) Anal. Chem. 1985, 57, 888-92 A.
(2) Woodruff, Hugh B. Anal. Chem. 1984, 56, 1314-20 A.
(3) Fridlund, Alan J. Infoworld 1985, 7(6), 42-50; Update and letters: 1985, 7(16), 50 and 1985, 7(20), 55.
(4) Karnicky, Joe. Anal. Chem. 1984, 56, 1312-14 A.
(5) Buell, Barbara. Bus. Week, April 15, 1985, p. 150 J.
(6) Foster, Edward. Personal Computing, 1985, 9(4), 62-69.