A/C INTERFACE

COMPUTER-ASSISTED STRUCTURE ELUCIDATION

Indirect database approaches and established systems

Spectral interpretation can be automated by three primary methods: library searching to match an unknown spectrum against spectra stored in a database; artificial intelligence and pattern recognition techniques to link spectral features with the chemical substructures responsible for them; and spectral simulation to select the most likely candidate structure from those generated. In Part 1 of this two-part series, which appeared in the December 1 issue, library search methods and data collections were examined. In Part 2 the focus shifts to artificial intelligence, pattern recognition, and spectral simulation.

Wendy A. Warr Wendy Warr & Associates 6 Berwick Court Holmes Chapel Cheshire, CW4 7HZ, England

In this continuation of our review of computer-assisted structure elucidation begun earlier (1), systems based on artificial intelligence (AI) (2) and those based on statistical techniques (3) will be considered together because both statistical and probability techniques may be used to generate the rules for expert systems. This approach has advantages over using only the spectroscopist expert in building a knowledge base and establishing rules.

Indirect database approaches

Many different pattern recognition techniques have been used in spectral interpretation. The self-training interpretative and retrieval system (STIRS), for example, uses a nearest-neighbor method. In their structure-oriented data bank system for the identification and interpretation of IR spectra (IDIOTS), Passlack and Bremser use a linear discriminant method (4). Supervised clustering has also been used.

Recently there has been much interest in using the computer-simulated associative memories and modeling tools known as neural networks in computer-assisted structure elucidation (5). One advantage these systems offer is that the rules relating

ANALYTICAL CHEMISTRY, VOL. 65, NO. 24, DECEMBER 15, 1993 · 1087 A


the predicted structural features to the spectral information input need not be specified; the network itself deduces the rules during the training process. In fields such as molecular spectroscopy, where the rules are scarce but the number of examples is large, neural networks may be superior to expert systems.

In an unpublished paper (August 1993), Bo Curry of Hewlett-Packard Laboratories compared nearest-neighbor methods with neural networks for the classification of mass spectra. The advantages of neural net classifiers are that reliable membership probabilities can be determined and classification is very fast. Training on a new database, however, is slow. A major problem is that the classes to be identified must be predetermined. If they are poorly selected, distinctive subclasses may be missed.

The advantages of the nearest-neighbor approach are that classes need not be predefined, and rare but distinct classes can be identified. The disadvantages are that reliable membership probabilities are not available, it is difficult to determine the absence of substructures, and the database search is relatively slow. A major problem is that the distance measured in spectral space may not correspond with the distance in "structure" space.

In addition to classifying mass spectral data (6, 7), neural networks have been used to interpret IR spectra (8-12) and to predict 13C NMR shifts (13-15).

PAIRS. PAIRS, a Program for the Analysis of Infrared Spectra (16, 17), uses CONCISE (Computer-Oriented Notation Concerning Infrared Spectral Evaluation), a language similar to English, for the rules. However, the program itself is written in FORTRAN rather than in languages such as LISP or PROLOG, which are designed for expert systems. Programming an expert system in a standard computer language such as FORTRAN is a prodigious task.
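As a rough illustration of what an IR interpretation rule does, the following Python sketch scores each substructure by the weighted fraction of its expected absorption regions that contain a peak in the recorded spectrum. It is not written in CONCISE and does not reproduce the PAIRS rule base: the rule table, region limits (approximate textbook values), weights, intensity threshold, and scoring formula are all assumptions made for illustration.

```python
# Hypothetical rule table: substructure -> list of (low_cm1, high_cm1, weight).
# Region limits are approximate textbook values, not rules taken from PAIRS.
RULES = {
    "carbonyl": [(1650.0, 1800.0, 1.0)],
    "carboxylic acid": [(1680.0, 1760.0, 0.6), (2500.0, 3300.0, 0.4)],
}


def region_has_peak(peaks, low, high, min_intensity=0.2):
    """True if any (wavenumber, intensity) peak lies inside [low, high]."""
    return any(low <= wn <= high and inten >= min_intensity for wn, inten in peaks)


def score_substructure(peaks, regions):
    """Weighted fraction of the required regions that are matched (0.0 to 1.0)."""
    total = sum(weight for _, _, weight in regions)
    matched = sum(
        weight for low, high, weight in regions if region_has_peak(peaks, low, high)
    )
    return matched / total


def interpret(peaks, rules=RULES):
    """Score every substructure in the rule table against one peak list."""
    return {
        name: round(score_substructure(peaks, regions), 2)
        for name, regions in rules.items()
    }


# Illustrative peak list: (wavenumber in cm-1, relative intensity).
spectrum = [(1715.0, 0.9), (1450.0, 0.3)]
scores = interpret(spectrum)
```

With this peak list the carbonyl rule is fully satisfied, while the carboxylic acid rule is only partly satisfied because no absorption falls in its second region. A real rule hierarchy would also weigh band shape and intensity, which this sketch ignores.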
The PAIRS interpreter uses a tree-structured hierarchy of rules that define absorption regions associated with particular substructures. A probability is assigned to a substructure on the basis of how well a recorded spectrum matches the required absorptions.

EXSPEC. EXSPEC (18) consists of three main parts: a rule generator, a spectrum interpreter, and a structure generator. Designed to interpret IR and mass spectral data, the system is written in PROLOG. It has an advanced user interface for a Macintosh computer, with pull-down menus and graphics displays.

DENDRAL. The well-known DENDRAL project started at Stanford University in the late 1960s. The early heuristic DENDRAL program used a very simple decision tree to interpret the mass spectra of acyclic monofunctional compounds. This tree-structured classification system could not be generalized to handle the mass spectra of more complicated structures. The meta-DENDRAL program took as input a general model of mass spectral processes, together with more specific constraints and a set of example structures and spectra. From this information, the program derived class-specific spectrum prediction rules that were more informative than the simple spectrum-structure correlation rules produced by other systems at that time through pattern recognition techniques. However, the DENDRAL rules were much more costly to develop.

To generate candidate structures (19), the Stanford team developed two programs: the constrained generator (CONGEN) and GENOA. The combinatorial process of generating all possible structures that can be assembled from various fragments, or substructures, is one that a computer can handle much faster and more exhaustively than a human could.

For CONGEN the spectra had to be interpreted by a spectroscopist to yield a set of distinct, nonoverlapping substructures that could then be treated as superatoms. The superatoms and residual atoms were combined by first placing the bonds among them and then expanding each superatom part.

GENOA could handle overlapping substructures. The program built structures by adding successive components while looking for possible overlaps with an existing partially assembled structure, as well as creating new bonds among constituent atoms of components.

Various extravagant claims for the success of the DENDRAL project have been made and refuted (20-22), and the project was discontinued some years ago.

CHEMICS. The system for Combined Handling of Elucidation Methods for Interpretable Chemical Structures (CHEMICS) is one of the longest established integrated systems for computer-assisted structure elucidation (23). It uses proton NMR, 13C NMR, and IR data and has a library of about 630 predefined small structural components, with one atom bearing free valence(s) by which the component may be attached to a partially assembled structure. Each component is characterized by spectral features that must be present in the spectra of all compounds containing that component.

In the interpretation process for an unknown, the component lists are filtered by composition constraints and by the exclusion of components associated with spectral features that are not in the spectrum of the unknown. All distinct combinations of the permitted components are passed on to the structure generator. Other constraints are needed to ensure that the structure generator does not yield very large numbers of candidate structures.

Although the CHEMICS structure generator (24) is intended primarily for automated structure elucidation, it does allow the user to specify known substructural components. The system has been extended for the use of 2D NMR information (25). Each carbon-carbon coupling revealed in a 2D NMR experiment identifies unambiguously a bond between specific component atoms. This sort of information is very useful in reducing the complexity of the structure generation process.

SESAMI. The Computer-Assisted Structure Elucidation (CASE) system, later improved as Systematic Elucidation of Structure Applying Machine Intelligence (SESAMI), is an AI approach in which the molecular formula and the spectral data for an unknown compound are used to generate a manageable number of plausible structures that are examined by a human expert for assignment of a unique structure (10, 26). Scientists thus are spared the most time-intensive step in the structure elucidation process but intervene at the stage when they can proficiently select the correct structure. Enhanced structure elucidation rather than automated structure elucidation is the goal.

In SESAMI, data for the unknown are interpreted by a module known as INTERPRET, which involves two routines: PRUNE and INFER. An exhaustive list of basic units of structure is input to PRUNE, which outputs a short list of compatible basic units of structure. INFER outputs structural inferences that will be used as constraints on the structure generation program. These structural inferences and the short list of basic structural units are input to a structure generation program called COCOA, which stands for constrained combination of atom-centered fragments (ACFs) (27), and possible structures for the unknown are output. The spectrum prediction component is called SIMULATE.

The predefined structural building blocks must be neither too large nor too small: The compromise was one-layered ACFs. The library holds about 5100 of these components (compared with 630 in CHEMICS). COCOA uses structure generation by reduction (27).

More recently, 2D NMR data were used in SESAMI (28). In the GENOA project the fragments from 2D NMR had to be derived manually and input to the GENOA program. In SESAMI the computer accepts signal connectivity information and generates fragments consistent with the data and their symmetry implications. One reason for moving from ASSEMBLE to COCOA was that the additional required substructures (e.g., from other spectroscopic methods) entered as constraints were used inefficiently by ASSEMBLE.

Usually 2D NMR information has been used as a complement to other techniques: It is used to distinguish between alternatives later in the structure elucidation process rather than to construct compatible structures very early in the process.
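The pruning idea described above for PRUNE (and, similarly, the component filtering in CHEMICS) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual PRUNE routine: the fragment names, element counts, 13C shift ranges, molecular formula, and observed shifts are all hypothetical values chosen for the example.

```python
from collections import Counter

# Hypothetical fragment library: name -> (element counts, expected 13C shift range in ppm).
# Names and ranges are illustrative, not taken from the SESAMI or CHEMICS libraries.
LIBRARY = {
    "CH3-(C)": (Counter(C=1, H=3), (5.0, 35.0)),
    "C=O (ketone)": (Counter(C=1, O=1), (195.0, 220.0)),
    "CH2-(O)": (Counter(C=1, H=2), (55.0, 80.0)),
    "C#N": (Counter(C=1, N=1), (110.0, 125.0)),
}


def fits_formula(unit_atoms, formula):
    """Composition constraint: the unit may not need more atoms than the formula has."""
    return all(formula.get(element, 0) >= count for element, count in unit_atoms.items())


def has_signal(shift_range, observed_shifts):
    """Spectral constraint: at least one observed signal lies in the expected range."""
    low, high = shift_range
    return any(low <= shift <= high for shift in observed_shifts)


def prune(library, formula, observed_shifts):
    """Return the short list of units compatible with the formula and the spectrum."""
    return sorted(
        name
        for name, (atoms, shift_range) in library.items()
        if fits_formula(atoms, formula) and has_signal(shift_range, observed_shifts)
    )


# Illustrative unknown: molecular formula C4H8O with four 13C signals.
formula = Counter(C=4, H=8, O=1)
observed = [7.9, 29.4, 36.9, 209.3]
kept = prune(LIBRARY, formula, observed)
```

Here the nitrile unit is rejected on composition (the formula has no nitrogen) and the CH2-(O) unit on spectral grounds (no signal in its range), leaving two units for a structure generator to combine.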
INFER2D and COCOA show promise in the prospective use of 2D NMR data in directly constructing compatible molecules even in the presence of molecular symmetry.

Until recently, SESAMI relied largely on proton, 13C, and 2D NMR data, but there has been a desire to add IR spectral interpretation to the system. Different teams have tried a variety of techniques for IR spectrum interpretation, including simple search and match; correlation tables; statistics; set theory; pattern recognition; and expert systems, both rule- and table-driven. Neural networks are among the most recent tools, and the SESAMI team has started to investigate them (8-10).

DARC. The DARC PLURIDATA system had several databases (including 13C NMR, mass spectra, and X-ray diffraction) on a French computer network (29). Spectra, bibliographic data, and chemical structures were stored. Retrieval by complete or partial spectrum, structure or substructure, or various text fields was possible. Users could input 13C NMR chemical shifts and find structural features that could be combined into candidate structures. Spectral simulation was also possible.


The DARC system Elucidation by Progressive Intersection of Ordered Substructures (EPIOS) was designed primarily for structure elucidation, unlike earlier systems more geared to prediction or assignment of chemical shifts (30). It uses DARC codes (31, 32) to represent the substructures. More recent work at the University of Paris includes chemical shift simulation as part of the Shift Evaluation for Resonating Carbons (SERC) system and the use of 2D NMR information (33).

SPEKTREN. The German Cancer Research Center has a relational database management system for 8500 13C NMR, 3000 IR, and 20,000 mass spectra in SPEKTREN-II (1989 figures) (34, 35). The system stores the full topology of structures as connection tables (including stereochemistry) as well as Hierarchically Ordered Spherical description of Environment (HOSE) codes (4, 36), which, in this system, allow the environment of a carbon atom to be described out to four spheres.

In 13C NMR spectral matching (37), the substructure codes of a certain number of best hits of a match are displayed in a histogram of abundance. Substructure codes that occur frequently should also be part of the structure of the unknown. The system can also be used to assign shifts, check database integrity, and predict a spectrum. For spectrum prediction, if a substructure code of a hypothetical structure is not completely represented in four spheres, it is reduced sphere by sphere until at least one representative is found in the database. This process broadens the range of retrieved shift values. Fuzzy set theory has also been applied (38).

SPEKTREN has a routine (39) that derives classification rules automatically from the fingerprint region of a set of IR spectra. The percentage of peaks belonging to a cluster within a set of spectra is used as a measure to train classification rules and define families of spectra.

CARBON. This expert system is designed for use on PCs to solve problems in 13C NMR spectroscopy (40-42). It is built around a database of 2500 assigned 13C NMR spectra and a knowledge base consisting of spectra-structure correlations, tables of data, mathematical formulas, and graph-theory procedures. Zupan's team has described four different forms of storing the knowledge: formulas, tables, libraries, and hierarchical trees. The spectra are preprocessed for hierarchical clustering, and hierarchical trees are generated for use in library and interpretative searches. CARBON also has its own structure generator (19).

MAPS. MS/MS has been used for more than 10 years, but no standard for collection of spectra exists and no database has been established. One problem with basic MS is that the products of all the fragmentation processes overlap in one spectrum. From MS/MS data it is possible to derive specific product ions, neutral losses, and precursor-to-product transitions. Enke's team (43) has developed an algorithm called Method for Analyzing Patterns in Spectra (MAPS) that automatically identifies the relationships between MS and MS/MS spectral features and substructures.
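The quantities that MS/MS data make available can be illustrated with simple arithmetic: for singly charged ions, the neutral loss of a transition is the precursor m/z minus the product m/z. The Python sketch below is not the MAPS algorithm; it only lists precursor-to-product transitions and labels a few common nominal-mass losses. The function name, the small loss table, and the example m/z values are assumptions made for illustration.

```python
# A few common nominal-mass neutral losses; a real system would use a far larger table.
COMMON_LOSSES = {17: "OH", 18: "H2O", 28: "CO", 44: "CO2"}


def transitions(precursor_mz, product_mzs):
    """List (precursor, product, neutral loss, label) for each product ion.

    Assumes singly charged ions and nominal (integer) masses, so the neutral
    loss is simply the difference between precursor and product m/z.
    """
    result = []
    for product in sorted(product_mzs, reverse=True):
        if product >= precursor_mz:
            continue  # a product ion cannot be heavier than its precursor
        loss = precursor_mz - product
        result.append(
            (precursor_mz, product, loss, COMMON_LOSSES.get(loss, "unassigned"))
        )
    return result


# Illustrative MS/MS data: one precursor ion and three product ions.
result = transitions(122, [77, 105, 94])
```

A rule-learning system could then treat each labeled loss or transition as a spectral feature and correlate its presence or absence with substructures in a training set.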
MAPS expresses these feature-substructure relationships in the form of production rules that can be used to help identify the presence or absence of substructures in unknown compounds. Inclusion and exclusion rules are generated automatically by using pattern recognition techniques. MAPS operates in two modes: learning and identification. Developed in InterLISP-D, it generates structures via GENOA.

Other systems. The multidimensional system SEAC, Structure Elucidation Aided by Computer, has been developed into SCANNET, a microcomputer system for 13C NMR, 1H NMR, IR, Raman, UV, and mass spectra (44, 45).

In EXPERTISE, an expert system for evaluating IR spectra, the rules for finding the substructure features are obtained by a pattern recognition procedure. The system uses superelements, superbonds, and superatoms. The structure generation algorithm links superatoms (46).

A more unusual recent application for use on Macintosh computers is SpecTool, a hypermedia toolkit for structure elucidation based on HyperCard (47, 48).

Various educational programs such as SpectraBook (for IBM) and SpectraDeck (for Macintosh), written by Paul Schatz of the University of Wisconsin-Madison, teach the principles of interpretation for IR, 1H NMR, 13C NMR, and mass spectra. The Journal of Chemical Education: Software specializes in distributing educational software.

13C NMR spectral prediction. 13C NMR spectra have the advantage that a signal is guaranteed in the spectrum for every carbon atom in the molecule. This one-to-one relationship makes 13C NMR databases particularly useful for deducing the structure of an unknown compound from its spectrum. Less reliably, the 13C NMR spectrum of a given structure can be predicted by using a spectral database. Systems for prediction of mass spectra and IR spectra are not well developed.

Computer-aided structure elucidation systems output large numbers of possible candidate structures, and spectral prediction is one tool that can be used to select the best candidates. Spectral prediction alone should not be used for excluding candidates, but it is useful for ranking structures.
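A minimal Python sketch can make both ideas concrete: predicting a shift from a sphere-coded environment with sphere-by-sphere back-off (as described for SPEKTREN), and using the predictions to rank, rather than exclude, candidates. Everything specific here is an assumption for illustration: the environment codes are simplified tuples rather than real HOSE code syntax, the reference table and shifts are invented, and the mean absolute deviation is just one plausible ranking measure.

```python
from statistics import mean

# Hypothetical reference table: environment code (one string per sphere) -> shifts in ppm.
# The codes are simplified stand-ins, not genuine HOSE codes.
TABLE = {
    ("CH3", "C"): [14.0, 22.0, 30.0],
    ("CH3", "C", "O"): [29.0, 31.0],
    ("C=O", "CC"): [205.0, 211.0],
}


def predict_shift(env, table, max_spheres=4):
    """Return (mean shift, spheres used), dropping outer spheres until a match is found."""
    for depth in range(min(len(env), max_spheres), 0, -1):
        shifts = table.get(env[:depth])
        if shifts:
            return mean(shifts), depth
    return None, 0


def deviation(envs, observed, table):
    """Mean absolute deviation between sorted predicted and sorted observed shifts."""
    predicted = [predict_shift(env, table)[0] for env in envs]
    if None in predicted or len(predicted) != len(observed):
        return float("inf")  # ranked last, but deliberately not excluded
    return mean(abs(p - o) for p, o in zip(sorted(predicted), sorted(observed)))


def rank_candidates(candidates, observed, table):
    """Order candidate names from best to worst agreement with the observed spectrum."""
    return sorted(candidates, key=lambda name: deviation(candidates[name], observed, table))


# Illustrative data: three observed signals and two candidate structures.
observed = [30.5, 30.5, 206.6]
candidates = {
    "B": [("CH3", "C", "N"), ("CH3", "C", "N"), ("C=O", "CC")],
    "A": [("CH3", "C", "O"), ("CH3", "C", "O"), ("C=O", "CC")],
}
ranking = rank_candidates(candidates, observed, TABLE)
```

For candidate B the three-sphere code is absent from the table, so the lookup falls back to two spheres and averages over a wider set of shifts, which is the broadening the text mentions. Returning the number of spheres used gives a crude indication of prediction quality.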
The 13C NMR spectrum of a candidate can be predicted by identifying the structural environment (preferably the stereochemical environment) of each carbon atom and looking up that environment in a highly detailed table relating substructures to chemical shift ranges. It is best if the method of substructure representation used allows the environment of a carbon atom to be described with details of all atoms out to a radius of at least four bonds, because shifts can be very sensitive to the relative positions of substituents on rings, for example. A five-bond substructural model means that a substituent at a para position can be taken into account. A good system will report a measure of the quality of the prediction and indicate the substructures used. As a general tool this approach is less than satisfactory because it suffers from the lack of a database sufficiently comprehensive in substructural content, especially if stereochemical information is considered.

The three most common approaches for predicting 13C NMR spectra are database retrieval methods (4, 36, 49), linear additivity relationships (50, 51), and empirical modeling techniques, latterly using neural networks (13, 14). Jurs' team has reported a combined approach in which database retrieval and empirical modeling methods are used (52).

Systems used on PCs

Although much research has been done in the field of spectral database systems, very few systems are commercially available. Some examples are described below. In addition, certain commercially available software packages in the fields of molecular modeling and computational chemistry allow prediction of spectra as a spin-off of semiempirical quantum mechanics, but these are beyond the scope of this article.

PC systems linked to substructure searching software. Sadtler's PC SearchMaster contains 160,000 IR spectra (about 80,000 with chemical structures) and 34,000 13C NMR spectra with structures searchable in a system under Microsoft Windows, which includes the Hampden Data Services' structure search engine (31). Full spectrum search is possible, and Sadtler's Hit Quality Index is used to list the hits in order of closeness of matching. Up to four IR spectra can be displayed in different colors on one screen for comparison, and the relevant chemical structure is displayed. Spectral subtraction is possible, and a difference spectrum can be displayed.

The Scientific Instruments Division of Hewlett Packard has added chemical structures to its G1034 MS ChemStation Windows-based GC/MS and IR instrument controller and analysis software. Users can create a "user library" of structures corresponding to spectra by transferring structures from MDL Information Systems' ISIS software. Five MS libraries (NIST/NIH/EPA; Wiley; Pfleger, Maurer, and Weber Drug Library; and two HP libraries) can be searched using PBM software. Three IR libraries of vapor-phase spectra are offered: the EPA spectra, a collection of 5000 spectra from Aldrich, and the Robertet Fragrance Library. The search software was developed by HP in conjunction with Galactic Industries Corp.

These systems go one step beyond the simple spectral database software that allows structure display

Figure 1. SpecInfo 13C NMR, IR, and mass spectra of 3,4-dihydro-1H-2-benzopyran. (Courtesy Chemical Concepts)

but has no structure search features. However, they are not truly structure elucidation systems.

Spectral libraries and structure display. The PC version of the NIST/EPA/NIH mass spectral database can be searched by CAS (Chemical Abstracts Service) registry number, chemical name, molecular formula or weight, up to 10 peaks with an intensity range for each, an input spectrum, neutral losses, and highest mass peak. Spectra can be displayed, and a utility is supplied for users to add their own spectra.

Wiley supplies the fifth edition (1992) of its mass spectral database on diskette or CD-ROM with structures. Also included is BenchTop/PBM software for spectral library searching and structure display.

CRC Press has issued Properties of Organic Compounds on CD-ROM. Searches of principal spectral peaks (MS, IR, Raman, UV, and NMR) are possible. Data from the NIST/EPA/NIH mass spectral database are included.

Sprouse Scientific markets a growing list of small FT-IR libraries, containing no older dispersive spectra and no redigitized spectra. The Quick-Search software allows users to create and search libraries of IR spectra.

In all four of these systems chemical structures can be displayed but not searched by substructure.

Interpretation of spectra. MassSpec (Trinity Software) is a graphics-based program for Macintosh computers or PCs that aids the interpretation of mass spectra. Structures are entered graphically, and MassSpec then generates the fragments that would be formed by breaking one, two, or three bonds. For any given mass number, MassSpec will highlight the fragments on the original structure in sequence.

Spire Software's MacSpec for Macintosh computers takes a graphically entered structure and allows users to assign fragment ion structures. They can then simulate the spectrum resulting from the list of fragments. Isotope cluster analysis is also featured.

Sadtler offers IR Mentor, a program to help users interpret their IR spectra. This product summarizes information on IR interpretation from several reference works.

Larger systems

The Chemical Information System (CIS). This system, which originated at the National Institutes of

Figure 2. SpecInfo substructure analysis of a doublet at 20 ppm, which may be caused by a C atom in a cyclopropane group. (Courtesy Chemical Concepts)

Health and the U.S. Environmental Protection Agency in the 1970s, offers a range of numeric and factual databases linked to a central index of chemical structures online and is now supported by Chemical Information Systems, a division of PSI International. The Structure and Nomenclature Search System (SANSS), which handles the chemical structure component, is based on search algorithms developed by Feldmann (31). Searches for specific structures can be carried out by name, molecular formula, and CAS registry number; searches for substructures can be carried out by name fragment, partial molecular formula, and query structure.

One database on the CIS is WMSSS, the Wiley Mass Spectral Search System, which offers 140,000 mass spectra for about 120,000 compounds, including the NIST/EPA/NIH spectra. The data can be searched on the basis of individual peak and intensity values as well as the complete spectrum (using Biemann or probability-based matching methods). Spectra may be displayed graphically or as tables of peak/intensity values.

SpecInfo. This structure-related spectroscopic database system for VAX/VMS computers, used in house or on line, was written by Bremser (36) at BASF starting in the 1970s. Development is now a cooperative effort of BASF and various academic teams (such as those at the Max Planck Institut für Kohlenforschung in Mülheim and ISAS, the Institute for Spectrochemistry and Applied Spectroscopy in Dortmund) with Chemical Concepts in Weinheim, which markets the system (53, 54). Not only is SpecInfo "multidimensional" (Figure 1), handling more than one type of spectral technique, but it also incorporates chemical structure handling software. Further program development is done with the Technical Universities of Munich and Vienna, Toyohashi University, Bergakademie Freiberg, and others. The project is supported by the German Federal Ministry of Research and Technology.

The following searches are possible: spectral identity, spectral similarity, presence of spectral lines (variable number of missing lines), IR bands, mass peaks and neutral loss reactions, full and partial structures, similar structures, systematic CA names or partial names, CA registry numbers, and ranges of molecular formulas and molecular weights.

Hit lists resulting from spectral similarity searches are analyzed statistically for structural information, and relevant partial structures can be extracted automatically; interpretative searching is possible (Figure 2). NMR spectra of arbitrary structures can be simulated. Coupling constants of arbitrary structures are estimated in a manner analogous to the chemical shift prediction. Spectral data can be displayed both graphically and as a listing.

The in-house system uses color graphic displays. SpecInfo is suitable for use on a Wide Area Network and with a wide variety of inexpensive terminals (albeit with consequent restrictions on the appeal of the user interface). There are about 105 users in ICI, in an essentially client-server environment (55).

The full topology of each chemical structure is stored as a connection table, but similarity searching is done by functional group encoding in a system based on Bremser's HOSE codes or Hierarchically Ordered Ring Description (HORD) codes (36). At the moment SpecInfo can use HOSE codes to only 4 bond levels, and in practice it is often only able to use a 3-bond substructural model because of a 96-bit limit to the HOSE codes. Data storage and retrieval are based on the relational database management system SYBASE, and various indexes support fast data retrieval. Individual users can be given different access privileges, and spectral collections in any desired combination can be made accessible to various users or kept private.

In the in-house system, structure queries are entered by naming various fragments and giving numbers for ring positions. The input is textual, but the structure is displayed graphically. MOLfile input and output (a standard for the MACCS system of MDL Information Systems) are supported, and ChemBase databases (also from MDL Information Systems), including structures and peak assignments, can be imported into SpecInfo. MACCS and ISIS links are planned.

It is important for companies and analytical chemists to be able to incorporate their own data into a commercially available database system.
Doing so increases both the size and the coverage of the database and introduces structures of specific local interest. Data entry methodology in SpecInfo is by no means ideal, but NMR peak lists can be imported from Bruker, Varian, or JEOL spectrometers, IR spectra in JCAMP-DX format, and mass spectra as EPA or JCAMP-DX files. Special routines are then used to check the format of the input spectra and compare NMR spectra with calculated ones to detect errors and wrong peak assignments. Data capture from instruments is important, but problems with standards hinder fast progress.

This month Chemical Concepts will launch the next version of SpecInfo, an X-Windows-driven version that will include stereochemistry and an extended HOSE code. Later versions may incorporate the results of research in progress, such as the integration of 3D coordinates and a proton NMR module being developed by BASF. In addition, BASF, Chemical Concepts, Toyohashi, and Sumitomo are using the structure generator in CHEMICS in an attempt to automate structure elucidation completely.

Varmuza's Exploratory Data Analysis of Spectra, EDAS (56), was recently linked to SpecInfo as a separate module to allow statistical analysis of SpecInfo mass spectra hit lists via pattern recognition techniques. At the University of Munich, Gasteiger's team is working on a mass spectral simulation tool. Cooperative development on the semiautomatic interpretation of IR spectra has been started by Otto at Bergakademie Freiberg, Gasteiger at the University of Munich, and Zupan in Ljubljana.

SpecInfo Online. An online system may not be ideal for organizations that handle many novel compounds and wish to integrate in-house and publicly available spectra.

However, the Speclnfo d a t a b a s e is available on line on STN I n t e r n a tional (54). Many CA registry numbers have already been assigned and eventually will be added for all the chemical structures. Beilstein registry numbers will be added for struct u r e s t h a t can be converted to the Beilstein format. Five file-specific software applications are available: SPECAL for estimating NMR spectroscopic i n f o r m a t i o n for a q u e r y structure, COUPCAL for estimating coupling constants for a query structure, CHESS for searching for chemical structures identical or similar to a query structure using HOSE and HORD codes, EDSPEC for formulati n g q u e r i e s , a n d G E T S P E C for searching for spectra. Structures are input on line by using a slightly modified version of the STN Structure command or off line in STN Express. Structure searching is done by u s i n g t h e RUN C H E S S command to execute a structure code similarity search. Various search options, such as structure identity and s t r u c t u r e functionality, a r e a v a i l able. The familiar STN Messenger s u b s t r u c t u r e search system is not used. T h e e d i t o r for s p e c t r a l q u e r i e s ( E D S P E C ) allows u s e r s to c r e a t e new queries or edit existing ones. It can handle spectra uploaded as peak

Figure 3. CSEARCH data sheet showing structure, and assignments. (Courtesy Bio-Rad Sadtler Division)

1092 A · ANALYTICAL CHEMISTRY, VOL. 65, NO. 24, DECEMBER 15, 1993

13

C NMR spectrum, peak list,


constants of arbitrary structures are estimated in a manner analogous to the chemical shift prediction. Spectral data can be displayed both graphically and as a listing.

The in-house system uses color graphic displays. SpecInfo is suitable for use on a Wide Area Network and with a wide variety of inexpensive terminals (albeit with consequent restrictions on the appeal of the user interface). There are about 105 users in ICI, in an essentially client-server environment (55).

The full topology of each chemical structure is stored as a connection table, but similarity searching is done by functional group encoding in a system based on Bremser's HOSE codes or Hierarchically Ordered Ring Description (HORD) codes (36). At the moment SpecInfo can use HOSE codes to only 4 bond levels, and in practice it is often only able to use a 3-bond substructural model because of a 96-bit limit to the HOSE codes. Data storage and retrieval are based on the relational database management system SYBASE, and various indexes support fast data retrieval. Individual users can be given different access privileges, and spectral collections in any desired combination can be made accessible to various users or kept private.

In the in-house system, structure queries are entered by naming various fragments and giving numbers for ring positions. The input is textual, but the structure is displayed graphically. MOLfile input and output (a standard for the MACCS system of MDL Information Systems) is supported, and ChemBase databases (also from MDL Information Systems), including structures and peak assignments, can be imported into SpecInfo. MACCS and ISIS links are planned.

It is important for companies and analytical chemists to be able to incorporate their own data into a commercially available database system. Doing so increases both the size and the coverage of the database and introduces structures of specific local interest. Data entry methodology in SpecInfo is by no means ideal, but NMR peak lists can be imported from Bruker, Varian, or JEOL spectrometers; IR spectra in JCAMP-DX format; and mass spectra as EPA or JCAMP-DX files. Special routines are then used to check the format of the input spectra and compare NMR spectra with calculated ones to detect errors and wrong peak assignments. Data capture from instruments is important, but problems with standards hinder fast progress.

This month Chemical Concepts will launch the next version of SpecInfo, an X-Windows-driven version that will include stereochemistry and an extended HOSE code. Later versions may incorporate the results of research in progress, such as the integration of 3D coordinates and a proton NMR module being developed by BASF. In addition, BASF, Chemical Concepts, Toyohashi, and Sumitomo are using the structure generator in CHEMICS in an attempt to automate structure elucidation completely. Varmuza's Exploratory Data Analysis of Spectra, EDAS (56), was recently linked to SpecInfo as a separate module to allow statistical analysis of SpecInfo mass spectra hit lists via pattern recognition techniques. At the University of Munich, Gasteiger's team is working on a mass spectral simulation tool. Cooperative development on the semiautomatic interpretation of IR spectra has been started by Otto at Bergakademie Freiberg, Gasteiger at the University of Munich, and Zupan in Ljubljana.

SpecInfo Online. An online system may not be ideal for organizations that handle many novel compounds and wish to integrate in-house and publicly available spectra.

However, the SpecInfo database is available on line on STN International (54). Many CA registry numbers have already been assigned and eventually will be added for all the chemical structures. Beilstein registry numbers will be added for structures that can be converted to the Beilstein format. Five file-specific software applications are available: SPECAL for estimating NMR spectroscopic information for a query structure, COUPCAL for estimating coupling constants for a query structure, CHESS for searching for chemical structures identical or similar to a query structure using HOSE and HORD codes, EDSPEC for formulating queries, and GETSPEC for searching for spectra.

Structures are input on line by using a slightly modified version of the STN Structure command or off line in STN Express. Structure searching is done by using the RUN CHESS command to execute a structure code similarity search. Various search options, such as structure identity and structure functionality, are available. The familiar STN Messenger substructure search system is not used.

The editor for spectral queries (EDSPEC) allows users to create new queries or edit existing ones. It can handle spectra uploaded as peak lists and spectra from the SpecInfo database or calculated by SPECAL. The most recent version of GETSPEC searches both ¹³C NMR spectra and mass spectra, the latter using SISCOM (Search for Identical and Similar Compounds) (57-59). It outputs accession numbers and spectrum sequence numbers with a similarity percentage. An STN answer L-number is created.

Figure 3. CSEARCH data sheet showing structure, ¹³C NMR spectrum, peak list, and assignments. (Courtesy Bio-Rad Sadtler Division)

1092 A · ANALYTICAL CHEMISTRY, VOL. 65, NO. 24, DECEMBER 15, 1993

MassLib. This set of programs was built at the Max Planck Institut für Kohlenforschung for the evaluation of low-resolution mass spectra (57-59). Although it is now being marketed by Chemical Concepts, Henneberg and co-workers are still writing software for MassLib. The program can be used on VAX/VMS or DEC/ULTRIX computers, and HP-UX and Sun/OS versions are being developed. The combined Wiley and NIST/EPA/NIH collections are available together with more than 15,900 spectra and structures (as of March 1993) from the Max Planck Institut für Kohlenforschung, 5100 spectra from Seibl at ETH Zurich, and an additional, more specialized collection. Users can build their own libraries by adding structures with the MassLib structure editor or via MOLfiles, or by entering spectra manually, from a spectrometer, or from a standard file (e.g., JCAMP).

Searches for CA registry numbers, CA names, molecular formulas, and similar structures are possible. Spectral similarity search (SISCOM), neutral loss similarity search, "target search" (the search for a known spectrum in an analysis), and single peak search are featured. SISCOM, the interpretative library search system, is based on five efficient ranking parameters; it can find structures related to the unknown even in cases where the spectra are visually dissimilar.
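As a rough picture of what a spectral similarity search with a percentage score involves, the sketch below ranks library spectra against an unknown by cosine similarity of their intensities. This is a generic stand-in and not SISCOM, whose five ranking parameters are not described here; the function names, the toy library, and the peak values are all hypothetical.

```python
import math

def cosine_similarity(spec_a, spec_b):
    """Cosine similarity (0..1) of two spectra given as {m/z: intensity} dicts."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_library(unknown, library):
    """Return (name, similarity in percent) pairs, best match first."""
    hits = [(name, 100.0 * cosine_similarity(unknown, spec))
            for name, spec in library.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Hypothetical low-resolution spectra (illustrative values only)
library = {
    "compound_A": {43: 100, 58: 40, 71: 10},
    "compound_B": {77: 100, 105: 80, 51: 30},
}
unknown = {43: 95, 58: 45, 71: 12}
hit_list = rank_library(unknown, library)
```

A single score of this kind cannot by itself relate spectra that look dissimilar, which is presumably why an interpretative system combines several ranking parameters.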
SISCOM can discriminate between methyl and ethyl esters that have almost indistinguishable spectra.

Structure similarity searches were introduced in 1991. A system of structural descriptors was developed from MS considerations. The goal is to extract structures that are relevant for the interpretation of an unknown spectrum and to allow standard database operations such as retrieving a class of compounds. Partial structure analysis of hit lists based on structure descriptors has been implemented, and grouping has been done according to structural parameters. Progress also has been made in automating structure proposals. The significant partial structures of a spectral hit list serve as targets for structure similarity search, giving a representation of all the available information relevant to the interpretation of an unknown spectrum. Various spectral manipulations (e.g., addition and subtraction of spectra) and hit list analysis options are also available.

CSEARCH. This system for handling ¹³C NMR spectra was developed by Robien and co-workers at the University of Vienna. They are working on new algorithms (60, 61) while Sadtler commercializes the system (62). Although the system currently handles only ¹³C NMR spectra, Sadtler introduced the display of IR spectra early in 1993 and plans to introduce a product in the near future for searching IR spectra.


CSEARCH runs under X-Windows on Silicon Graphics workstations, Sun SPARCstations, Sun 4, VAX, and VAXstations. Many people are also using X-terminals. The user-friendly color graphics interface is a particular advantage of CSEARCH. However, the laying out of menu items along three layers of a wall of "bricks" at the top of the screen and the appearance of other options at the bottom or the right is not a computer-industry standard.

With the tools provided users can read spectrometer-generated peak lists directly; search spectra for identical chemical shift patterns; request all functional groups for a chemical shift; request the chemical shift range for a functional group; search by either full structure or substructures; combine chemical shift with structure searches; retrieve data based on author, source, or properties; and estimate spectra from a structure proposal.

CSEARCH allows spectroscopists to create their own databases. Password protection at nine levels, automatic data checking, and assignment comparisons are featured. Data entry has been much improved with the addition of a new, albeit nonstandard and proprietary, molecule editor. One important advantage of CSEARCH is the point-and-click interface that provides an easy assignment method for customers to record unique interpretations into CSEARCH databases. Data can be imported in Varian (old and new), Bruker (old and new), and FELIX compatible forms. Robien's development version also includes NMRi, WinNMR, and JEOL interfaces.

Before beginning a search, users choose a database or databases. Multiple databases are searched as if they were a contiguous set. CSEARCH allows two large groups of databases, and access can be restricted to some of these.

Structure input is accomplished by the proprietary molecule editor or by Robien line notation (Line Note), SMD, JCAMP-CS, or MOLfile. Users can restrict their searches by ring size, molecular formula, homologous series, or molecular weight, or they can choose no restriction at all. The initial screen search uses three-atom fragments, molecular formula, and number of rings. Structures and hit lists can be stored. After a search users have four choices of display: structures, spectra, data sheet (Figure 3), or structures and spectra.

Moving the cursor to point to any carbon atom in the structure results in an unambiguous identification. A box is drawn around the corresponding carbon atom(s) in the assignment list. A prominent triangle points to the line(s) in the simulated ¹³C NMR spectrum that correspond(s) to the chemical shift position(s) of the selected carbon atom(s).

If a peak list is entered, the results of a similarity search are a spectrum of the exact compound if it is in the database and other spectra that are the nearest matches. Structures that correspond to these spectra can be displayed four to a screen. Stereochemical display is available in version 5.
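A peak-list similarity search of the kind just described can be pictured as pairing each query shift with a library shift inside a tolerance window and scoring the fraction of peaks paired. The sketch below is only an illustration under stated assumptions and is not the CSEARCH algorithm: the 1.0 ppm tolerance, the scoring formula, the function names, and the example shift values are all hypothetical.

```python
def match_score(query_shifts, ref_shifts, tol_ppm=1.0):
    """Greedy one-to-one matching of two shift lists; returns fraction matched (0..1)."""
    if not query_shifts and not ref_shifts:
        return 0.0
    query = sorted(query_shifts)
    ref = sorted(ref_shifts)
    i = j = matched = 0
    while i < len(query) and j < len(ref):
        diff = query[i] - ref[j]
        if abs(diff) <= tol_ppm:
            matched += 1  # pair these two peaks and move on in both lists
            i += 1
            j += 1
        elif diff > 0:
            j += 1  # reference peak lies too far below the query peak; skip it
        else:
            i += 1  # query peak has no partner in the reference list; skip it
    return matched / max(len(query), len(ref))

def similarity_search(query_shifts, database, top_n=3):
    """Return the top_n (entry name, score) pairs, best match first."""
    scored = [(name, match_score(query_shifts, shifts))
              for name, shifts in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]

# Hypothetical 13C shift lists in ppm (illustrative values only)
database = {
    "entry_1": [14.1, 22.7, 31.9, 62.9],
    "entry_2": [21.0, 125.3, 128.2, 129.0, 137.8],
    "entry_3": [14.0, 22.9, 32.0],
}
query = [14.2, 22.6, 31.8, 63.0]
hits = similarity_search(query, database)
```

Dividing by the longer of the two lists penalizes both missing and extra peaks, so an exact compound ranks above a fragment that merely shares some of its shifts.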
Like SpecInfo, CSEARCH uses Bremser's HOSE and HORD codes. When a structure is displayed, CSEARCH explains what substructures it used in finding/predicting that structure by highlighting them within the display of the complete structure. CSEARCH has the advantage of using HOSE codes to 5 bond levels.

The "homologous series" display or increment function is unique to CSEARCH: One spectrum is displayed above another, and a set of link lines indicates how a peak has moved in one spectrum compared with the other (Figure 4). The effect of a substituent on a chemical shift can thus be clearly illustrated. During spectral prediction, there is an option for detailed analysis of the database entries contributing to each estimated chemical shift. Histograms of chemical shifts allow selection of substructures by chemical shift range, enabling analysis of stereochemical, conformational, or solvent effects.

Figure 4. CSEARCH comparison of two spectra and structures showing substituent effect. (Courtesy Bio-Rad Sadtler Division)

In CSEARCH, Boolean logic can be used to link structure and peak searches. In SpecInfo, users intersect the hit lists from separate peak and structure searches, although they could define a macro to simplify a query to just one command.

Literature searching is possible with CSEARCH. It is available indirectly via STN for SpecInfo. However, it is not clear that most spectroscopists need the capability for literature searches within their spectral database systems.

Conclusion

There is a trend toward experiments that generate very large amounts of data. The continuing growth in the so-called hyphenated techniques such as GC/MS and GC/IR increases the demand for automated processing. MS/MS and 2D NMR experiments also generate large quantities of information for which automated interpretation is needed. Proton NMR is often the first technique to be used on an unknown compound, because only a very small sample is needed to produce useful data, but automatic interpretation of proton NMR spectra is in its infancy. Systems for UV-vis spectra have also appeared only recently.

Despite the importance of stereochemistry in determining the properties of compounds, most computer-assisted structure elucidation systems do not incorporate routines for the constrained generation of stereoisomers. The linking of Laboratory Information Management Systems (LIMS) with spectral database systems is also likely to be of interest in future. However, progress in automated spectral interpretation could well continue to be hindered by the lack of widely available software and quality databases.

The author is very grateful to Gerry Vander Stouw of Chemical Abstracts Service for finding literature references. This review has also been much improved by helpful comments from Andreas Barth of FIZ Karlsruhe, Kimito Funatsu of Toyohashi University of Technology, Neil Gray of the University of Wollongong, Angus Hearmon of ICI, Morton Munk of Arizona State University, Michael Penk of Chemical Concepts, Kazutoshi Tanabe of NIMCR, and Bruce Woods of Bio-Rad Sadtler Division.


References
(1) Warr, W. A. Anal. Chem. 1993, 65, 1045 A-1050 A.
(2) Hippe, Z. Artificial Intelligence in Chemistry: Structure Elucidation and Simulation of Organic Reactions; Elsevier: Amsterdam, 1991.
(3) Jurs, P. C. In Reviews in Computational Chemistry; Lipkowitz, K. B.; Boyd, D. B., Eds.; VCH: New York, 1990; pp. 169-212.
(4) Passlack, M.; Bremser, W. In Computer-Supported Spectroscopic Databases; Zupan, J., Ed.; Ellis Horwood: Chichester, England, 1986; pp. 92-117.
(5) Zupan, J.; Gasteiger, J. Neural Networks for Chemists; VCH: Weinheim, Germany, 1993.
(6) Lohninger, H.; Stancl, F. Fresenius J. Anal. Chem. 1992, 344(4-5), 186-89.
(7) Curry, B.; Rumelhart, D. E. Tetrahedron Comput. Methodol. 1990, 3(3/4), 213-37.
(8) Robb, E. W.; Munk, M. E. Mikrochim. Acta (Wien) 1990, I, 131-55.
(9) Munk, M. E.; Madison, M. S.; Robb, E. W. Mikrochim. Acta (Wien) 1991, II, 505-14.
(10) Munk, M. E.; Velu, V. K.; Madison, M. S.; Robb, E. W.; Badertscher, M.; Christie, B. D.; Razinger, M. In Recent Advances in Chemical Information II; Collier, H., Ed.; Royal Society of Chemistry: Cambridge, England, 1993; pp. 247-63.
(11) Meyer, M.; Weigelt, T. Anal. Chim. Acta 1992, 265, 183-90.
(12) Weigel, U-M.; Herges, R. J. Chem. Inf. Comput. Sci. 1992, 32, 723-31.
(13) Anker, L. S.; Jurs, P. C. Anal. Chem. 1992, 64, 1157-64.
(14) Ball, J. W.; Jurs, P. C. Anal. Chem. 1993, 65, 505-12.
(15) Kvasnicka, V.; Sklenak, S.; Pospichal, J. J. Chem. Inf. Comput. Sci. 1992, 32, 742-47.
(16) Ying, L. S.; Levine, S. P.; Tomellini, S. A.; Lowry, S. R. Anal. Chem. 1987, 59, 2197-2203.
(17) Wythoff, B.; Hong-Kui, X.; Levine, S. P.; Tomellini, S. A. J. Chem. Inf. Comput. Sci. 1991, 31, 392-99.
(18) Luinge, H-J.; van der Maas, J. H. Anal. Chim. Acta 1989, 223, 135-47.
(19) Bohanec, S.; Zupan, J. J. Chem. Inf. Comput. Sci. 1991, 31, 531-40.
(20) Buchanan, B. G.; Feigenbaum, E. A.; Lederberg, J. Chemom. Intell. Lab. Syst. 1988, 5, 33-35.
(21) Gray, N.A.B. Chemom. Intell. Lab. Syst. 1988, 5, 11-32.
(22) Gray, N.A.B. Chemom. Intell. Lab. Syst. 1988, 5, 37-38.
(23) Sasaki, S-I.; Kudo, Y. J. Chem. Inf. Comput. Sci. 1985, 25, 252-57.
(24) Funatsu, K.; Miyabayashi, N.; Sasaki, S-I. J. Chem. Inf. Comput. Sci. 1988, 28, 18-28.
(25) Funatsu, K.; Susuta, Y.; Sasaki, S. J. Chem. Inf. Comput. Sci. 1989, 29, 6-11.
(26) Munk, M. E.; Farkas, M.; Lipkus, A. H.; Christie, B. Mikrochim. Acta 1986, II, 199-215.
(27) Christie, B. D.; Munk, M. E. J. Chem. Inf. Comput. Sci. 1988, 28, 87-93.
(28) Christie, B. D.; Munk, M. E. J. Am. Chem. Soc. 1991, 113, 3750-57.
(29) Dubois, J. E.; Bonnet, J. C. Anal. Chim. Acta 1979, 112, 245-52.
(30) Dubois, J. E.; Carabedian, M.; Dagane, I. Anal. Chim. Acta 1984, 158, 213-33.
(31) Chemical Structure Systems; Ash, J. E.; Warr, W. A.; Willett, P., Eds.; Ellis Horwood: Chichester, England, 1991.
(32) Dubois, J. E.; Panaye, A.; Attias, R. J. Chem. Inf. Comput. Sci. 1987, 27, 74-82.
(33) Panaye, A.; Doucet, J-P.; Fan, B. T. J. Chem. Inf. Comput. Sci. 1993, 33, 258-65.
(34) Förster, T.; von der Lieth, C. W.; Köhler, I.; Opferkuch, H. J. Fresenius Z. Anal. Chem. 1987, 327, 71-72.
(35) von der Lieth, C. W.; Förster, T. GIT Fachz. Lab. 1991, 35, 1245-46; 1249-50; 1253.
(36) Bremser, W. Angew. Chem. Int. Ed. Eng. 1988, 27, 247-60.
(37) von der Lieth, C. W.; Seil, J.; Köhler, I.; Opferkuch, H. J. Magn. Reson. Chem. 1985, 23, 1048-55.
(38) Seil, J.; von der Lieth, C. W. Fresenius Z. Anal. Chem. 1989, 333, 767-68.
(39) Seil, J.; Köhler, I.; von der Lieth, C. W.; Opferkuch, H. J. Anal. Chim. Acta 1986, 188, 219-27.
(40) Razinger, M.; Zupan, J.; Novic, M. Mikrochim. Acta (Wien) 1986, II, 411-21.
(41) Zupan, J.; Novic, M.; Bohanec, S.; Razinger, M.; Lah, L.; Tusar, M.; Kosir, I. Anal. Chim. Acta 1987, 200, 333-45.
(42) Zupan, J.; Razinger, M.; Bohanec, S.; Novic, M.; Tusar, M.; Lah, L. Chemom. Intell. Lab. Syst. 1988, 4, 307-14.
(43) Enke, C. G.; Wade, A. P.; Palmer, P. T.; Hart, K. J. Anal. Chem. 1987, 59, 1363 A-1371 A.
(44) Debska, B. Anal. Chim. Acta 1992, 265, 201-09.
(45) Debska, B.; Hippe, Z. S. J. Mol. Struct. 1992, 267, 261-68.
(46) Blaffert, T. Anal. Chim. Acta 1992, 265, 243-57.
(47) Cadisch, M.; Farkas, M.; Clerc, J-T.; Pretsch, E. J. Chem. Inf. Comput. Sci. 1992, 32, 286-90.
(48) Cadisch, M.; Pretsch, E. Fresenius J. Anal. Chem. 1992, 344(4-5), 173-77.
(49) Small, G. W. J. Chem. Inf. Comput. Sci. 1992, 32, 279-85.
(50) Pretsch, E.; Fürst, A.; Badertscher, M.; Bürgin, R.; Munk, M. J. Chem. Inf. Comput. Sci. 1992, 32, 291-95.
(51) Tusar, M.; Tusar, L.; Bohanec, S.; Zupan, J. J. Chem. Inf. Comput. Sci. 1992, 32, 299-303.
(52) Jurs, P. C.; Ball, J. W.; Anker, L. S.; Friedman, T. L. J. Chem. Inf. Comput. Sci. 1992, 32, 272-78.
(53) Bremser, W.; Grzonka, M. Mikrochim. Acta (Wien) 1991, II, 483-91.
(54) Barth, A. J. Chem. Inf. Comput. Sci. 1993, 33, 52-58.
(55) Hearmon, R. A. Spectroscopy International 1991, 3(7), 14-18.
(56) Varmuza, K.; Werther, W.; Henneberg, D.; Weimann, B. Rapid Commun. Mass Spectrom. 1990, 4, 159-62.
(57) Domokos, L.; Henneberg, D.; Weimann, B. Anal. Chim. Acta 1983, 150, 37-44.
(58) Domokos, L.; Henneberg, D.; Weimann, B. Anal. Chim. Acta 1984, 165, 61-74.
(59) Domokos, L.; Henneberg, D.; Weimann, B. Anal. Chim. Acta 1984, 165, 75-86.
(60) Chen, L.; Robien, W. J. Chem. Inf. Comput. Sci. 1992, 32, 501-06.
(61) Chen, L.; Robien, W. J. Chem. Inf. Comput. Sci. 1992, 32, 507-10.
(62) Robien, W.; Woods, B.; Tzodikov, N. Spectroscopy International 1991, 3(5), 18-21.

Wendy A. Warr holds M.A. and D.Phil. degrees in chemistry from the University of Oxford (England). Before founding Wendy Warr & Associates in January 1992, she spent 20 years in the pharmaceutical industry. She is active in the Division of Chemical Information of the ACS and serves on the Advisory Board of ACS Software and the ACS Committee on Copyrights. An Associate Editor of the Journal of Chemical Information and Computer Sciences, she serves on several international conference committees.
