Table-driven procedure for infrared spectrum interpretation - Analytical

Table-Driven Procedure for Infrared Spectrum Interpretation

Mark O. Trulson¹ and Morton E. Munk*

Department of Chemistry, Arizona State University, Tempe, Arizona 85287

An infrared spectrum interpreter program based on a table-driven procedure is described. Eighteen commonly encountered carbonyl-containing groups were selected for program development and testing. The knowledge base includes only information about those regions of the spectrum considered to be diagnostic for each of the 18 classes. It was derived from an experimental data file of about 2000 infrared spectra compiled specifically for this study. The program requires peak position, intensity, and shape. Its performance was evaluated on a test set of 146 spectra.
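The abstract lists the three peak attributes the program requires. As a minimal sketch, assume three intensity levels (weak, medium, strong) and three shapes (sharp, average, broad), combined into a single code from 1 to 9 as the layout of Table II suggests. Under those assumptions, a peak record might look like the Python below. The names `Peak`, `peak_code`, `INTENSITIES`, and `SHAPES` are hypothetical and are not taken from the program.

```python
from dataclasses import dataclass

# ASSUMPTION: level names and the 1..9 numbering follow the layout suggested
# by Table II (rows = shape, columns = intensity; broad peaks -> 3, 6, 9).
INTENSITIES = ("weak", "medium", "strong")
SHAPES = ("sharp", "average", "broad")


def peak_code(intensity: str, shape: str) -> int:
    """Combine intensity and shape into one code from 1 to 9."""
    i = INTENSITIES.index(intensity)  # raises ValueError for an unknown level
    s = SHAPES.index(shape)
    return 3 * i + s + 1


@dataclass(frozen=True)
class Peak:
    position: float  # wavenumber, cm^-1
    intensity: str
    shape: str

    @property
    def code(self) -> int:
        return peak_code(self.intensity, self.shape)


if __name__ == "__main__":
    p = Peak(1715.0, "strong", "sharp")
    print(p.position, p.code)  # 1715.0 7
```

Packing intensity and shape into one small integer lets a knowledge table list acceptable peak types for a region as a simple set of codes.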

An important method for elucidating the structure of organic compounds depends on an analysis of their spectral and chemical properties. CASE (1), a linked set of computer programs, is being developed to assist the chemist in carrying out the three major components of the process: (a) interpretation, the reduction of the chemical and spectral properties of the unknown compound to their structural implications; (b) molecule assembly, the generation of molecular structures compatible with the inferred structural information; and (c) spectrum simulation and comparison, the ranking of the generated structures on the basis of the fit between predicted and observed spectral properties. An advanced molecule assembler, program ASSEMBLE (2), has been developed to accept the structural inferences drawn by the interpreter, program INTERPRET, on the basis of multispectral data. ASSEMBLE generates all molecular structures consistent with these inferences and any additional information provided by the chemist. Program SIMULATE (3), like INTERPRET, is at an early stage of development. This paper discusses the role of infrared spectrometry in the development of INTERPRET.

¹ Present address: Department of Chemistry, University of California, Berkeley, Berkeley, CA 94704.
0003-2700/83/0355-2137$01.50/0 © 1983 American Chemical Society

Three approaches to automated spectrum interpretation have received wide attention: pattern recognition (4-7), library search (6, 8-13), and artificial intelligence (14-18). More recently, hierarchical clustering techniques based on "average spectra" as representations of specific structure-spectra correlations have been studied (19, 20). Our initial effort in automated infrared spectrum interpretation resulted in program INFRARED, an interactive heuristic program designed for application to multifunctional compounds (21, 22). A revised version, PAIRS, uses the same knowledge base, but its interpretation rules are treated as "data" and written in an English-like language to make the rules easier to understand and, where necessary, to modify (14, 23, 24).

The present study departs from INFRARED/PAIRS in two important ways. The first change was prompted by the observation that reported group frequency ranges vary, in some cases markedly, throughout the literature of infrared spectrometry (25-30). For example, the carbonyl stretching frequency range for α,β-unsaturated aldehydes is reported by Bellamy (25) at 1675-1705 cm⁻¹, by Szymanski (26) at 1685-1715 cm⁻¹, and in the Colthup tables (27) at 1650-1725 cm⁻¹. Such variations led us to compile an infrared spectrum library from which a new knowledge base was extracted. The second departure concerns the network of Boolean expressions characteristic of INFRARED and PAIRS, which generally makes program revision no small task and, in addition, limits optimization of class discrimination based on patterns of peak distribution within diagnostic spectral regions. An approach based on a table-driven procedure (31) allowed us to address both of these problems.

A limited set of functional groups was selected for initial program development and evaluation: 18 carbonyl-containing functional groups (Table I). Each is designated and treated as a separate class; structurally, however, each belongs to one of six major groups of classes: carboxylic acid, ester, amide, aldehyde, ketone, and acid halide. All but the acid halide group contain more than one class.

Table I. Functional Group

major group        class
carboxylic acid    1. nonconjugated
                   2. conjugated
                   3. α-electronegatively substituted
ester              4. nonconjugated
                   5. conjugated
                   6. enol
                   7. acetate
                   8. enol acetate
                   9. chelated o-hydroxy
                   10. long-chain fatty acid
amide              11. primary
                   12. secondary
                   13. tertiary
aldehyde           14. nonconjugated
                   15. conjugated
ketone             16. nonconjugated
                   17. conjugated
acid halide        18. acid halide

Table II. Peak Intensity and Shape Code

                      intensity
shape           weak    medium    strong
sharp             1        4         7
average           2        5         8
broad             3        6         9

The classification scheme emphasizes a structure-based organization that parallels the thinking of the practicing chemist. It was strongly influenced by the present-day strengths and limitations of infrared spectrometry in inferring structural features, and it recognizes the need to avoid "overinterpretation" of infrared spectral data.
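The literature ranges quoted for α,β-unsaturated aldehydes can be compared directly. The Python sketch below is a minimal illustration, not the authors' code. It stores each source's range as data, in the spirit of a table-driven design, and reports which sources would accept a given peak and where all three ranges overlap. The names `RANGES`, `sources_accepting`, and `common_range` are hypothetical.

```python
from __future__ import annotations

# Carbonyl stretching ranges (cm^-1) for alpha,beta-unsaturated aldehydes,
# as quoted in the text (refs 25-27). Because the ranges are data, another
# source can be swapped in without changing the functions below.
RANGES = {
    "Bellamy": (1675, 1705),
    "Szymanski": (1685, 1715),
    "Colthup": (1650, 1725),
}


def sources_accepting(wavenumber: float) -> list[str]:
    """Return the sources whose range contains the given peak position."""
    return [name for name, (lo, hi) in RANGES.items() if lo <= wavenumber <= hi]


def common_range() -> tuple[float, float] | None:
    """Intersection of all reported ranges, or None if they do not overlap."""
    lo = max(r[0] for r in RANGES.values())
    hi = min(r[1] for r in RANGES.values())
    return (lo, hi) if lo <= hi else None


if __name__ == "__main__":
    print(sources_accepting(1680))  # ['Bellamy', 'Colthup']
    print(common_range())           # (1685, 1705)
```

A peak at 1680 cm⁻¹ falls inside the Bellamy and Colthup ranges but outside Szymanski's, and the three ranges share only 1685-1705 cm⁻¹. The same observed band can therefore support or fail to support the α,β-unsaturated aldehyde inference depending on which reference table supplied the rule.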
Unusual spectral properties can be used to advantage. Thus, long-chain fatty acid esters are included as a separate class because of a unique spectral feature: the coupling that usually reduces the expected strong absorption band around 1200 cm⁻¹ to a series of weak-to-medium peaks in the same region.

METHOD

Knowledge Base. The infrared spectrum interpreter described herein is designed to report the "presence" of each of the 18 classes with a confidence level ranging from 0 to 100.
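This section does not give the rule that converts the knowledge base into a 0 to 100 confidence. The following Python sketch is therefore only a hypothetical illustration of a table-driven lookup. Each class maps to rows of diagnostic regions, each row lists which intensity/shape codes (1 to 9) count as a match, and the score is the percentage of a class's regions that are matched. The region bounds, accepted codes, class keys, function names, and scoring rule are all assumptions and are not values from the paper's knowledge base.

```python
from __future__ import annotations

from dataclasses import dataclass

# HYPOTHETICAL knowledge-base rows: bounds and accepted codes are
# illustrative placeholders, NOT the paper's values. Codes assume a 1..9
# intensity/shape scheme (7 = strong sharp, 8 = strong average, etc.).


@dataclass(frozen=True)
class Region:
    low: float                      # lower bound, cm^-1
    high: float                     # upper bound, cm^-1
    accepted_codes: frozenset[int]  # peak codes that count as a match


KNOWLEDGE_TABLE = {
    "ketone, nonconjugated": [Region(1705, 1725, frozenset({7, 8}))],
    "aldehyde, nonconjugated": [
        Region(1720, 1740, frozenset({7, 8})),
        Region(2700, 2760, frozenset({1, 2, 4, 5})),
    ],
}


def region_matched(region: Region, peaks: list[tuple[float, int]]) -> bool:
    """True if any (position, code) peak lies in the region with an accepted code."""
    return any(
        region.low <= pos <= region.high and code in region.accepted_codes
        for pos, code in peaks
    )


def confidence(cls: str, peaks: list[tuple[float, int]]) -> int:
    """Toy score: percentage of the class's diagnostic regions that are matched."""
    regions = KNOWLEDGE_TABLE[cls]
    hits = sum(region_matched(r, peaks) for r in regions)
    return round(100 * hits / len(regions))


if __name__ == "__main__":
    spectrum = [(1730.0, 7), (2720.0, 4), (2950.0, 8)]
    for cls in KNOWLEDGE_TABLE:
        print(cls, confidence(cls, spectrum))
```

With the knowledge held in `KNOWLEDGE_TABLE`, revising a class means editing a row rather than rewriting a network of Boolean expressions. That network is the maintenance problem the introduction attributes to INFRARED and PAIRS.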






