Automated interpretation of infrared spectra with an instrument based minicomputer

Anal. Chem. 1981, 53, 2367-2369

higher detection limit, is probably due to surfactants still present in solution after cleanup. If lower limits are desired for a particular application, the extended cleanup procedure can be used with the TEA via direct injection. The additional steps do not provide sufficient cleanup to increase detection limits via the DPP method. Thus an analyst has available a choice of methods depending on levels. In either case preparation times are short; for whole blood an average time is approximately 10 min.



RECEIVED for review July 1, 1981. Accepted August 24, 1981. This investigation was supported, in part, by Grant No. CA18618, awarded by the National Cancer Institute, DHEW.

Automated Interpretation of Infrared Spectra with an Instrument Based Minicomputer

Sterling A. Tomellini,* David D. Saperstein,¹ James M. Stevenson, Graham M. Smith, and Hugh B. Woodruff
Merck Sharp & Dohme Research Laboratories, P.O. Box 2000, Rahway, New Jersey 07065

Paul F. Seelig
Automation & Control, Merck & Co., Inc., Rahway, New Jersey 07065

The interpretation of infrared data is usually a difficult task for all but the most experienced spectroscopists. Thus, a variety of methods have been devised to aid the scientist in evaluating infrared spectral data. For some compounds, namely, compounds represented in spectral library collections, computerized searching of a spectral library is the method of choice. Efficient searching algorithms have existed for many years, and a successful search results in the identification of the compound (1). Unfortunately, the spectrum for the unknown of interest is frequently not in the library; thus an immediate identification is not possible and an interpretation is required. Both pattern recognition (2-5) and rule-based (heuristic) algorithms (6-8) have been investigated to effect spectral interpretations. One program, PAIRS (program for the analysis of IR spectra) (8), was designed to imitate the approach a spectroscopist uses in interpreting spectra. This program has been tested on a variety of complex spectra with very encouraging results. The one major difficulty encountered in using PAIRS on a routine basis is supplying a digitized spectrum prior to the interpretation. This process is both time-consuming and error prone. This paper describes a solution to that difficulty. A modified version of PAIRS has been implemented on a Nicolet 1180 minicomputer with Diablo Model 44B dual disk storage. When this program is coupled with a fast peak picking algorithm, the result is that one computer controls the entire infrared experiment including acquisition, processing, and interpretation of data.

EQUIPMENT

A Nicolet 7199 FT-IR system including the Nicolet 1180 minicomputer with 40K of 20 bit word semiconductor memory was used for this work. Available mass storage was 4.5 megawords using a dual, moving head, disk system. Full use was made of existing Nicolet software, including a FORTRAN 77 based compiler. The FORTRAN compiler greatly reduced the number of coding changes required to move the interpreter from the IBM 370/168, used in previous work (8), to the Nicolet 1180. A Decitek paper tape reader was used to input the program into the minicomputer.

¹Current address: IBM Instruments Inc., 40 W. Brokaw Rd., San Jose, CA 95110.

DESCRIPTION OF PROGRAMS

Peak Picker. A critical component of an interpreter that requires peak position, intensity, and width information is an automatic peak picking routine. This routine stores the peak information in a disk file, which is subsequently accessed by the interpretation module. The peak picking program is approximately 16K words in length. The most intense peaks (maximum of 50) are determined by first derivative inflection and transferred to the disk file in a format compatible with the requirements of the interpreter.

Spectral information consists of peak position, intensity, and width data. Peak positions usually range between 4000 and 450 cm-1, although there is no fundamental restriction preventing peak locations outside this region from being used. Peak heights are assigned values between 1 and 10, with 10 being the strongest peak in the spectrum, the peak to which all other peaks are normalized. Peak widths can be sharp, average, or broad and are assigned 1, 2, or 3, respectively. Presently, the peak picker does not calculate width information but instead assigns an average value, 2, to all peaks. Since most peaks will be average, this minimizes the number of changes needed before interpretation.

Supplemental information, composed of sample state and empirical formula, is requested from the operator. The six possible sample states are oil, neat, NaCl, CCl4, CHCl3, and other. Sample state information is used by the peak picker to eliminate solvent peaks which would interfere with the interpretation. Empirical formula information is optional, but if included, the entire formula may be given or else certain atoms known to be missing from the compound may be excluded.

Interpreter. PAIRS was designed to aid the chemist in predicting the presence or absence of various functionalities using the infrared spectrum of an unknown compound. Functionalities that display their typical spectroscopic characteristics will be found with reasonable probability.
Other possibilities which may not actually be present will often be suggested. Of course, compounds may not have representative spectral characteristics for a certain functionality. The on-line interpreter described in this paper makes the use of PAIRS even easier. The components required for computerized on-line interpretation are shown in Figure 1. The main program, called INTERP, is coded in FORTRAN and treats the other modules (processed peaks and translated rules) as data. The rules are written in an English-like language called CONCISE (computer oriented notation concerning infrared spectral evaluation) (8). This language was created for easy comprehension and, when needed, modification of the interpretation rules by non-computer-oriented chemists. CONCISE consists of if-then-else logic and begin-done blocking of statements. The vocabulary is succinct, which allows for rapid user proficiency. Interpretation rules have currently been written for 169 different functionalities. The rules, in CONCISE, are translated into integer strings on a large computer. These integer strings are transferred to the minicomputer via punched paper tape and converted to unformatted integers for storage. INTERP requires 30K words of main memory and the interpretation rules fill 62K words of disk space.

0003-2700/81/0353-2367$01.25/0 © 1981 American Chemical Society

ANALYTICAL CHEMISTRY, VOL. 53, NO. 14, DECEMBER 1981

Figure 1. Components required for computerized on-line interpretation. [Diagram labels: CONCISE rules, IBM/370, Nicolet.]
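The peak picking step described under Peak Picker can be illustrated with a short sketch. This is not the Nicolet routine, whose code is not given here; it is a minimal Python illustration under stated assumptions: a peak is taken as a point where the first difference of the absorbance changes from positive to non-positive, intensities are scaled to 1-10 by rounding relative to the strongest peak, and every peak receives the average width code 2. The names `Peak` and `pick_peaks` are hypothetical, and solvent peak elimination is omitted.

```python
from typing import List, NamedTuple, Sequence


class Peak(NamedTuple):
    position: float  # wavenumber, cm-1
    intensity: int   # 1..10, strongest peak in the spectrum = 10
    width: int       # 1 sharp, 2 average, 3 broad


def pick_peaks(wavenumbers: Sequence[float],
               absorbance: Sequence[float],
               max_peaks: int = 50) -> List[Peak]:
    """Return up to max_peaks peaks, strongest first (illustrative only)."""
    maxima = []
    for i in range(1, len(absorbance) - 1):
        # Assumed criterion: first difference goes from rising to falling.
        rising = absorbance[i] - absorbance[i - 1] > 0
        falling = absorbance[i + 1] - absorbance[i] <= 0
        if rising and falling:
            maxima.append((absorbance[i], wavenumbers[i]))
    if not maxima:
        return []
    maxima.sort(reverse=True)          # strongest peak first
    strongest = maxima[0][0]
    if strongest <= 0:                 # expects a base line corrected spectrum
        return []
    peaks = []
    for height, position in maxima[:max_peaks]:
        # Assumed scaling: round half up, then clamp to the 1..10 range.
        scaled = int(10 * height / strongest + 0.5)
        scaled = max(1, min(10, scaled))
        peaks.append(Peak(position, scaled, 2))  # width 2 = average
    return peaks
```

Called on a base line corrected spectrum, `pick_peaks` returns records carrying the same three fields (position, relative intensity, width) that the interpreter reads from the disk file; the operator can then adjust widths or delete noise peaks before interpretation.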

Figure 2. Spectrum of 4-tert-butylcyclohexanol.

RESULTS AND DISCUSSION

Substantial changes were necessary to adapt the PAIRS program from a large 32 bit word computer to the 20 bit word, 40K Nicolet 1180. Changes in FORTRAN coding, rule packing, sort routines, data input and output routines, and general machine-dependent optimization were all necessary. High priority was placed on minimizing running time. For this reason the rules are stored as unformatted rather than formatted integers, since disk acquisition time is more rapid for unformatted data. This mode of storage is less efficient than others as far as space considerations are concerned, but disk storage is less of a problem than slow running time.

Approximately 6 min are necessary to go from digitized spectrum to interpreted results. The digitized spectrum is base line corrected, if necessary, followed by automatic peak selection. Supplemental data such as empirical formula and sample state are added at this time and transferred with peak information to a disk file. INTERP accesses this file and allows changes to be made including additions and deletions of peak data. The actual interpretation requires slightly over 1 min; thus, the time necessary to manipulate the spectrum, add additional information, and interpret the spectrum is roughly equal to the time necessary to obtain the spectrum.

The two examples given below were chosen to demonstrate the current status of the program. The first spectrum, 4-tert-butylcyclohexanol, was run as a KBr pellet while the second spectrum, propionitrile, was run as a neat liquid between KBr plates. Both were taken with 300 scans, 4 cm-1 resolution, and were entered into the interpreter without formula information.

Figure 2 shows the original spectrum of 4-tert-butylcyclohexanol. The base line was corrected by using existing Nicolet software which produced the spectrum in Figure 3. Base line correction is often necessary for peak normalization to be valid. The peak picker found the 25 largest peaks listed in Table I. Noting that the region from 3150 to 3550 cm-1 contains only one broad, noisy peak, the operator made the following changes: the width of peak (7) was changed from 2 (average) to 3 (broad) and noise peaks (8), (9), (10), (13), and (24) were eliminated. The interpretation results are listed in Table II. Methyl and alcohol functionalities were correctly predicted with highest probabilities. Also suggested as possible functionalities were thiocarbonyl and amines, though with much lower probabilities.

Figure 3. Base line corrected spectrum of 4-tert-butylcyclohexanol. [Abscissa: wavenumbers.]

Table I. The 25 Largest Peaks for 4-tert-Butylcyclohexanol as Determined by the Peak Picking Routine

peak no.   position, cm-1   rel intens   width
 1         2944             10           2
 2         2963              9           2
 3         1066              7           2
 4         2859              7           2
 5         1365              6           2
 6         2904              6           2
 7         3281              6           2
 8         3264              5           2
 9         3254              5           2
10         3227              5           2
11         1448              4           2
12         1469              4           2
13         3160              4           2
14         1477              4           2
15          960              3           2
16         1392              3           2
17         1336              3           2
18         1008              3           2
19         1186              3           2
20         1240              3           2
21         1309              3           2
22         1224              3           2
23         1303              3           2
24         3087              3           2
25          981              2           2

Table II. Interpretation Results for 4-tert-Butylcyclohexanol

no.   group name           probability
 1    methyl               0.70
 2    alcohol              0.68
 3    alcohol-tert-(*2*)   0.51
 4    alcohol-sec(*1*)     0.51
 5    thiocarbonyl         0.50
 6    methylene            0.50
 7    amine                0.45
 8    amine-secondary      0.45
 9    amine-tertiary       0.40
10    ether-unsaturated    0.36
11    ether                0.36
12    methyl-gemdi         0.35
13    sulfoxide            0.25
14    ether-epoxide        0.16

Another example, Figure 4, which is the spectrum of propionitrile, required no base line correction. The peaks are listed in Table III. Peaks (1), (6), (7), and (9) are noted to have sharp rather than average widths. These changes were made by the operator. The interpretation results are given in Table IV. Nitrile, nitrile saturated, and methyl are correctly suggested with high probabilities. Also suggested with high probability is the isocyanate functionality.

Figure 4. Spectrum of propionitrile. [Abscissa: wavenumbers.]

Table III. The 11 Largest Peaks for Propionitrile as Determined by the Peak Picking Routine

peak no.   position, cm-1   rel intens   width
 1         2246             10           2
 2         2996              8           2
 3         1461              7           2
 4         2950              6           2
 5         1431              5           2
 6         1074              4           2
 7          787              3           2
 8         2892              3           2
 9         1319              2           2
10         1386              1           2
11          546              1           2

Table IV. Interpretation Results for Propionitrile

no.   group name           probability
 1    nitrile              0.50
 2    nitrile-saturated    0.50
 3    isocyanate           0.40
 4    methyl               0.40
 5    ether                0.28
 6    ether-saturated      0.28
 7    sulfoxide            0.25
 8    acetylene            0.20
 9    acetylene-internal   0.20
10    aromatic             0.15
11    nitrile-unsaturated  0.10
12    diazo                0.10

It should be noted that the elimination of noise peaks and addition of width information often result in more accurate interpretations. Ready access, minimal operator intervention, and speed have allowed us to extend the use of the interpreter. High molecular weight natural products, organic adsorbates, precipitates from complex reaction mixtures, and multicomponent solutions have all been analyzed. The fact that PAIRS determines functionalities as opposed to matching a spectrum to an individual compound makes these kinds of applications possible. In these complex cases more operator intervention is required to eliminate spurious peaks picked due to noise, isolate spectral regions of interest, and adjust relative intensities due to major species or functionalities which are present in unusually high amounts (e.g., C-H bands in a high molecular weight aliphatic). Time that was formerly used to digitize spectra manually is now invested in interpreting more spectra and making the kinds of changes indicated above that allow complicated systems to be explored.

Both of the examples given clearly demonstrate that improvements in the peak picking routine to include peak width determinations and better noise rejection will enhance the usefulness of the interpreter. Work is in progress to effect these improvements.
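The operator edits reported for 4-tert-butylcyclohexanol (peak 7 widened to 3, noise peaks 8, 9, 10, 13, and 24 deleted) can be sketched as a small transformation of the peak file. The Python below is an illustration only: the dictionary layout and the function name `apply_operator_edits` are hypothetical and are not taken from INTERP. The peak positions are those of Table I, and every width starts at 2 because the peak picker assigns the average value to all peaks.

```python
from typing import Dict, Set, Tuple

# Peak positions (cm-1) in the order listed in Table I, peak no. 1 first.
POSITIONS = [2944, 2963, 1066, 2859, 1365, 2904, 3281, 3264, 3254, 3227,
             1448, 1469, 3160, 1477, 960, 1392, 1336, 1008, 1186, 1240,
             1309, 1224, 1303, 3087, 981]

# Hypothetical in-memory form of the peak file:
# peak number -> (position, width code); 1 sharp, 2 average, 3 broad.
table_i: Dict[int, Tuple[int, int]] = {
    number: (position, 2)
    for number, position in enumerate(POSITIONS, start=1)
}


def apply_operator_edits(peaks: Dict[int, Tuple[int, int]],
                         new_widths: Dict[int, int],
                         deleted: Set[int]) -> Dict[int, Tuple[int, int]]:
    """Return a copy of peaks with widths changed and noise peaks removed."""
    edited = {}
    for number, (position, width) in peaks.items():
        if number in deleted:
            continue  # operator judged this peak to be noise
        edited[number] = (position, new_widths.get(number, width))
    return edited


# The edits described in the text for 4-tert-butylcyclohexanol.
edited = apply_operator_edits(table_i,
                              new_widths={7: 3},
                              deleted={8, 9, 10, 13, 24})
```

After these edits 20 peaks remain, and the only entry left above 3000 cm-1 is the broad peak at 3281 cm-1, which matches the operator's reading of the 3150 to 3550 cm-1 region as a single broad, noisy band.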

ACKNOWLEDGMENT

The authors are indebted to George R. Smith for many helpful discussions and to the staff of the Nicolet Instrument Corp., especially Stephen Lowry and Donald Parker, for technical assistance.

LITERATURE CITED

(1) Hippe, Zdzislaw; Hippe, Rita Appl. Spectrosc. Rev. 1980, 16, 135-186.
(2) Varmuza, K. Anal. Chim. Acta 1980, 122, 227-240.
(3) Gribov, Lev A.; Elyashberg, Mikhail E. CRC Crit. Rev. Anal. Chem. 1979, 8, 111-220.
(4) Zupan, Jure Anal. Chim. Acta 1978, 103, 273-288.
(5) Gribov, L. A.; Elyashberg, M. E. J. Anal. Chem. USSR (Engl. Transl.) 1977, 32, 1609-1624.
(6) Visser, T.; van der Maas, J. H. Anal. Chim. Acta 1980, 122, 363-372.
(7) Leupold, Wolf-Rudiger; Domingo, Concepcion; Niggemann, Werner; Schrader, Bernhard Fresenius' Z. Anal. Chem. 1980, 303, 337-348.
(8) Woodruff, Hugh B.; Smith, Graham M. Anal. Chem. 1980, 52, 2321-2327.


RECEIVED for review June 30, 1981. Accepted September 8, 1981.