Computer Storage and Search System for Infrared Spectra Including Peak Width and Intensity
Elwin C. Penski, Daniel A. Padowski, and James B. Bouck
Physical Chemistry Branch, Chemical Research Division, Chemical Laboratory, Edgewood Arsenal, Aberdeen Proving Ground, Md. 21070

In recent years, many reports have been written on systems for searching spectral data files, particularly those collected by the American Society for Testing and Materials (ASTM). Methods for searching mass spectral files have been reported by many authors, among them Grotch (1); Wangen, Woodward, and Isenhour (2); and Hertz, Hites, and Biemann (3). A number of methods (4-6) for searching the ASTM infrared (IR) data file have been developed. Some workers, such as Horlick (7), have developed systems that compare spectra in more detail than peak-by-peak methods, but these methods have not been applied to searching large files. Codding and Horlick (8) have developed a small binary-coded file of 35 direct current arc emission spectra and a method to search it. Kowalski, Jurs, and Isenhour (9) have applied computerized learning machine methods to classify, rather than match, spectra. The studies listed above draw on foundations such as statistics, file research, computer logic, learning machines, artificial intelligence, cross-correlation techniques, information theory, graph theory, and pattern recognition. It has been found in this laboratory that, while the ASTM file of infrared spectra is by far the largest file available (10), its data storage and search system (11) has a number of drawbacks. The spectra file card format is not compatible with most card readers. The magnetic tape file is not compatible with some computers unless programs are written to rearrange the format. The intensity data stored with a spectrum have no gradation and, in addition, are ambiguous in that they are not assigned to a specific peak but to all peaks within a 1-micron range.
The search techniques are based mainly on the types of matching possible with a card sorter and, as a result, do not take into account variables such as differences between spectrometers and the spectroscopist's interpretation of spectra. Unless it is used with additional information, the ASTM system returns a large number of matches. The system does provide a means of narrowing these matches by specifying considerable chemical data for each spectrum, but the coding techniques for these chemical data are difficult to adapt to standard searching methods.

(1) S. L. Grotch, Anal. Chem., 45, 2 (1973).
(2) L. E. Wangen, W. S. Woodward, and T. L. Isenhour, Anal. Chem., 43, 1605 (1971).
(3) H. S. Hertz, R. A. Hites, and K. Biemann, Anal. Chem., 43, 681 (1971).
(4) D. H. Anderson and G. L. Covert, Anal. Chem., 39, 1288 (1967).
(5) D. S. Erley, Anal. Chem., 40, 894 (1968).
(6) R. W. Sebesta and G. G. Johnson, Jr., Anal. Chem., 44, 260 (1972).
(7) G. Horlick, Anal. Chem., 45, 319 (1973).
(8) E. Codding and G. Horlick, Appl. Spectrosc., 27, 366 (1973).
(9) B. R. Kowalski, P. C. Jurs, and T. L. Isenhour, Anal. Chem., 41, 1945 (1969).
(10) L. H. Gevantman, Anal. Chem., 44 (7), 30A (1972).

(11) "Codes and Instructions for Wyandotte-ASTM Punched Cards," American Society for Testing and Materials, Philadelphia, Pa., 1964.

The object of the study described in this report is the development of an IR spectral search system that has fewer limitations and is designed for up-to-date spectrometers and computers. In addition, the method described here should be more useful in devices for detecting pollutants, illicit materials, and military weapons and for determining their composition. At present, it is not applicable to large files of 100,000 spectra, because the large files now available lack the detailed spectral information used by this system. Once detailed files are developed, computer speeds have increased, and the system is optimized, it should be applicable to any type of search method, including use with conversational time-sharing terminals. Such searches should yield more definitive results than present methods.

EXPERIMENTAL
In the method described in this report, the positions of the largest peaks are stored on a card along with a code number for the intensity and shape of each peak. The card format and intensity codes are given in Tables I and II. Code numbers for the type of sample medium, cell, and spectrometer are stored as shown in Tables III, IV, and V, and each compound is given an arbitrary identification number on its card. It is assumed that a match between two sharp peaks is not equivalent to a match between a broad, weak peak and a sharp, strong peak, and that a match between two sharp, intense peaks is not equivalent to a match between two sharp, weak peaks. Table II gives the weights assigned to a match between like peaks, and Table VI gives the factors by which these weights are reduced for a match between unlike peaks. The weights in Tables II and VI are based solely on the authors' judgment.
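A card in the Table I layout can be decoded with a few lines of code. The sketch below is a minimal illustration, not the authors' program: it assumes the peak fields start at column 1 and repeat every five columns (three digits of wavelength × 10 or wavenumber × 10^-1, then one intensity code digit) up to columns 66-69, which gives 14 peak fields per card; the first field at columns 1-4 is inferred from that pattern. `SpectrumCard` and `parse_card` are hypothetical names, and blank peak fields are simply skipped.

```python
"""Sketch: decode one 80-column spectrum card in the Table I layout.

Assumptions: peak fields start at column 1 and repeat every 5 columns up to
66-69 (14 fields); blank peak fields are skipped; all names are hypothetical.
"""
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SpectrumCard:
    peaks: List[Tuple[float, int]] = field(default_factory=list)  # (position, intensity/shape code)
    sample_code: int = 0        # column 71, Table III
    cell_code: int = 0          # column 72, Table IV
    spectrometer_code: int = 0  # column 73, Table V
    unit_code: int = 0          # column 74: 0 = microns, 9 = cm^-1
    spectrum_id: str = ""       # columns 75-80


def parse_card(card: str) -> SpectrumCard:
    card = card.ljust(80)  # pad a short line to a full 80-column card image

    def col(first: int, last: int) -> str:
        """Return card columns first..last (1-indexed, inclusive)."""
        return card[first - 1:last]

    unit_code = int(col(74, 74))
    if unit_code not in (0, 9):
        raise ValueError(f"unknown unit code {unit_code}")

    peaks = []
    for start in range(1, 67, 5):  # fields at 1-4, 6-9, ..., 66-69
        digits = col(start, start + 2)
        if not digits.strip():
            continue  # unused peak field
        stored = int(digits)
        # Wavelengths (microns) are stored x 10, wavenumbers (cm^-1) x 10^-1.
        position = stored / 10 if unit_code == 0 else stored * 10.0
        peaks.append((position, int(col(start + 3, start + 3))))

    return SpectrumCard(
        peaks=peaks,
        sample_code=int(col(71, 71)),
        cell_code=int(col(72, 72)),
        spectrometer_code=int(col(73, 73)),
        unit_code=unit_code,
        spectrum_id=col(75, 80).strip(),
    )


if __name__ == "__main__":
    card = "0340 0605" + " " * 61 + "1010" + "000123"
    parsed = parse_card(card)
    print(parsed.peaks)        # [(3.4, 0), (6.0, 5)]
    print(parsed.spectrum_id)  # 000123
```

Keeping column arithmetic in one `col` helper mirrors the 1-indexed column numbers used in the tables, so each field can be checked directly against Table I.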

THEORY
The match sum, a measure of how good a match has been obtained, is calculated by summing terms built from several factors. The first factor assumes that the wavelength separation between two peaks reduces the probability that they match according to a function based on the normal distribution (12); two peaks at exactly the same wavelength have a distance factor of 1. The second and third factors are the weights taken from Tables II and VI, respectively. The match result is obtained by dividing the match sum between the unknown and a known spectrum by the match sum between the unknown and itself. The following equations are used to calculate the match sum and the match result, respectively, for a pair of spectra.








M = match sum
M_0 = match sum of the unknown with itself

(12) A. J. Duncan, "Quality Control and Industrial Statistics," Richard A. Irwin, Inc., Homewood, Ill., 1959.
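The match-sum and match-result calculation described in the THEORY section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact formulation: the distance factor is taken to be the Gaussian exp(-(X_i - Y_j)^2 / (2σ_i^2)) with σ_i = X_i/60, which equals 1 for coincident peaks as the text requires; each unknown peak i is assumed to contribute only through its best-scoring known peak j, so that R_i really is the maximum contribution of that peak; and M_r is assumed to be expressed as a percentage, 100·M/M_0. `LIKE_WEIGHT` and `REDUCTION` are hypothetical two-code stand-ins for Tables II and VI, which use codes 0 to 8.

```python
"""Sketch of the match sum M and match result M_r for two peak lists.

Assumptions (not the source's exact equations):
  * distance factor = exp(-(X_i - Y_j)**2 / (2 * sigma_i**2)), sigma_i = X_i / 60
  * each unknown peak i contributes only its best-scoring known peak j
  * M_r = 100 * M / M_0, where M_0 is the unknown matched against itself
"""
import math
from typing import Sequence, Tuple

Peak = Tuple[float, int]  # (wavelength, intensity/shape code)


def distance_factor(x: float, y: float) -> float:
    """Normal-distribution weighting: exactly 1.0 when the positions coincide."""
    sigma = x / 60.0
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def match_sum(unknown: Sequence[Peak], known: Sequence[Peak],
              like_weight: Sequence[float],
              reduction: Sequence[Sequence[float]]) -> float:
    """Sum over unknown peaks of the best distance * R_i * W_ij term."""
    total = 0.0
    for x, code_i in unknown:
        r_i = like_weight[code_i]  # R_i, weight of a like-peak match (Table II)
        best = 0.0
        for y, code_j in known:
            w_ij = reduction[code_i][code_j]  # W_ij, unlike-peak reduction (Table VI)
            best = max(best, distance_factor(x, y) * r_i * w_ij)
        total += best
    return total


def match_result(unknown: Sequence[Peak], known: Sequence[Peak],
                 like_weight: Sequence[float],
                 reduction: Sequence[Sequence[float]]) -> float:
    """M_r as a percentage of the unknown's self-match sum M_0."""
    m = match_sum(unknown, known, like_weight, reduction)
    m0 = match_sum(unknown, unknown, like_weight, reduction)
    return 100.0 * m / m0


# Hypothetical two-code weights; the real Tables II and VI use codes 0-8.
LIKE_WEIGHT = [1.0, 0.5]
REDUCTION = [[1.0, 0.3],
             [0.3, 1.0]]

if __name__ == "__main__":
    unknown = [(3.4, 0), (6.0, 1)]
    known = [(3.4, 0), (6.1, 1)]
    print(round(match_result(unknown, known, LIKE_WEIGHT, REDUCTION), 1))  # 86.9
```

With a reduction matrix whose diagonal is 1.0, the best-match rule makes M_0 simply the sum of R_i over the unknown's peaks. Summing over every pair of peaks is another plausible reading, but it would let closely spaced peaks contribute more than R_i.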

Table I. Spectrum Card Format Peak

Table IV. Cell Code and Range of Effectiveness






4 6-8


9 ... 66-68

Wavelength × 10 or wavenumber × 10^-1; Intensity code (Table II); Wavelength × 10 or wavenumber × 10^-1; Intensity code

69 71 72 73 74 75-80 a

Code (col. 72)

Type of cell window or pellet

Effective range, µ

0 1

CsI KBr NaCl KRS-5 Polyethylene CsBr CaF2

1.0-40.0 1.0-25.0 1.0-15.0 1.0-40.0 15.0-1000.0 1.0-35.0 1.0-9.0

No cell^a


2 3 4 5

6 7

Wavelength × 10 or wavenumber × 10^-1; Intensity code; Sample preparation (Table III); Cell (Table IV); Spectrometer (Table V); Wavelength or wavenumber unit code^a; Spectra identification


^a For plastic films.

Table V. Spectrometer Code and Range of Effectiveness

^a Code: 0 for microns and 9 for cm^-1.

Code (col. 73)




Sharp (50)^b Medium to broad (200)^b Very broad (>200) Sharp (40)^b Medium to broad (150)^b Very broad (>150) Sharp (30)^b Medium to broad

Strong Medium Medium Medium Weak Weak




Very broad



0.7 0.3 0.7 0.5 0.5


Table III. Type of Sample Preparation


Gas phase Liquid film Solid film KBr pellet CCl4 solution CHCl3 solution


CS2 solution


Nujol mull


Other mull or pellet Solutions and/or other

2 3 4


Ineffective ranges, µ


25.0-40.0 12.0-15.0 8.0-8.4, 12.0-15.0 4.2-4.8 6.2-7.2 23.0-28.0 3.4-3.6 6.8-7.4

M_r = match result
Y_j = wavelength of known spectral peak j
X_i = wavelength of unknown spectral peak i
σ_i = estimated standard deviation at the ith peak; values of σ_i were assumed to be X_i/60
W_ij = weight from Table VI for peak intensity and shape, for the ith peak of the unknown and the jth peak of the known
R_i = maximum contribution of the ith unknown peak, i.e., the weight of a match between like peaks (taken from Table II)

1.5-6.0 1.5-9.0 2.5-15.0 2.0-25.0




0 1


Table VI. Reduction Factor for Matches between Unlike Peaks

The peak transmissions for spectra with ca. 80% background transmission were coded as follows: 0-30%, strong; 30-60%, medium; >60%, weak. ^b Maximum band half width (cm^-1) for the designated shape.


2.0-15.0

2.5-40.0 2.5-25.0 10.0-25.0 15.0-34.3




Code (col. 71)

Effective range, µ

NaCl prism Grating Grating KBr prism CsBr prism NaCl prism LiF prism CaF2 prism Grating Prisms

2 3

Weight of like peak match


Strong Strong


Type of spectrometer

0 1

Table II. Intensity and Shape Coding System: Weights of Match

0 1

2 3 4 5 6 7 8








1.0 0.8 0.3 0.6 0.2 0.1 1.0 0.7 0.4 0.6 0.3 0.7 1.0 0.3 0.4 0.6 0.4 0.3 1.0 0.7 0.3 0.6 0.4 0.7 1.0 0.4 0.3 0.6 0.3 0.4 1.0 0.1 0.1 0.5 0.3 0.1 0.1 0.2 0.1 0.2 0.6 0.3 0.0 0.0 0.1 0.1 0.1 0.5

0.8 0.3 0.6 0.2 0.1 0.2


0.2 0.1 0.1 0.2 0.1 0.1 0.5 0.2 0.3 0.6 0.1 0.3 1.0 0.3 0.3 1.0 0.1 0.2


0.0 0.0 0.1

0.1 0.1 0.5 0.1 0.2 1.0
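The intensity rule in the Table II footnote (for spectra with ca. 80% background transmission) and the half-width limits for each shape can be combined into a small coding function. This is a hedged sketch: it assumes a band at exactly 30% or 60% transmission falls in the stronger class, and that codes 0 to 8 follow the order in which the nine intensity/shape categories are listed (strong sharp first, weak very broad last). The half-width limit for a weak, medium-to-broad band is not legible in the table, so `WEAK_BROAD_LIMIT` is a hypothetical placeholder; all function names are likewise hypothetical.

```python
"""Sketch: assign a Table II intensity/shape code from transmission and half width.

Assumptions: boundary values go to the stronger class; codes 0-8 follow the
listed order (strong sharp = 0 ... weak very broad = 8); WEAK_BROAD_LIMIT is
a hypothetical placeholder, not a value from the source.
"""

WEAK_BROAD_LIMIT = 100.0  # hypothetical half-width limit (cm^-1)

# Maximum half width (cm^-1) for "sharp" and for "medium to broad", per intensity class.
HALF_WIDTH_LIMITS = {
    "strong": (50.0, 200.0),
    "medium": (40.0, 150.0),
    "weak": (30.0, WEAK_BROAD_LIMIT),
}
INTENSITY_ORDER = ["strong", "medium", "weak"]


def intensity_class(transmission_pct: float) -> str:
    """Peak transmission for a spectrum with ca. 80% background transmission."""
    if transmission_pct <= 30.0:
        return "strong"
    if transmission_pct <= 60.0:
        return "medium"
    return "weak"


def shape_index(intensity: str, half_width: float) -> int:
    """0 = sharp, 1 = medium to broad, 2 = very broad."""
    sharp_max, broad_max = HALF_WIDTH_LIMITS[intensity]
    if half_width <= sharp_max:
        return 0
    if half_width <= broad_max:
        return 1
    return 2


def intensity_shape_code(transmission_pct: float, half_width: float) -> int:
    intensity = intensity_class(transmission_pct)
    return 3 * INTENSITY_ORDER.index(intensity) + shape_index(intensity, half_width)


if __name__ == "__main__":
    print(intensity_shape_code(20.0, 45.0))   # 0: strong, sharp
    print(intensity_shape_code(50.0, 100.0))  # 4: medium, medium to broad
```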

Table VII. Match of “Unknown” Spectrum (Diethylchlorophosphate, Liquid Film)


Diethylchlorophosphate (Liquid film) Diethylchlorophosphate (Liquid film) Triethylphosphate (Liquid film) Triethylphosphate (Liquid film)

Compared peaks


Match result





2.5-40 µ Grating NaCl prism NaCl prism 2.5-40 µ Grating


82.0 NaCl


70.3 KBr


64.2 CsI

Not all peaks in the two spectra are counted in the match sums if the spectra were run with a different medium, cell, or spectrometer. Tables III, IV, and V list the ineffective and effective ranges for the different media, cells, and spectrometers. Only those wavelengths that fall in the effective ranges of both spectra are compared. Matches between each pair of spectra may be run twice; the sec-
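The range restriction described above can be sketched as a filter applied to a peak list before the match sums are computed. In this minimal illustration, a spectrum's usable region is taken to be the overlap of its cell and spectrometer effective ranges (Tables IV and V) with its medium's ineffective ranges (Table III) cut out, and a peak is kept only if it is usable for both spectra. The function names and the range values in the example are illustrative assumptions, not a transcription of the tables.

```python
"""Sketch: keep only peaks that fall in the usable range of both spectra.

Assumption: usable = inside every effective range (cell, spectrometer) and
outside every ineffective range (sample medium). Example values are illustrative.
"""
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (low, high) in microns
Peak = Tuple[float, int]        # (wavelength in microns, intensity/shape code)
Ranges = Tuple[Sequence[Interval], Sequence[Interval]]  # (effective, ineffective)


def usable(x: float, effective: Sequence[Interval],
           ineffective: Sequence[Interval]) -> bool:
    inside_all = all(lo <= x <= hi for lo, hi in effective)
    blocked = any(lo <= x <= hi for lo, hi in ineffective)
    return inside_all and not blocked


def comparable(peaks: Sequence[Peak], spectrum_a: Ranges,
               spectrum_b: Ranges) -> List[Peak]:
    """Peaks whose wavelength is usable for both spectra being compared."""
    return [p for p in peaks
            if usable(p[0], *spectrum_a) and usable(p[0], *spectrum_b)]


if __name__ == "__main__":
    # Illustrative: unknown on a CsI cell with a 2.5-40 micron grating;
    # known on a NaCl cell with a prism, in a solvent blocking 12-15 microns.
    unknown_ranges = ([(1.0, 40.0), (2.5, 40.0)], [])
    known_ranges = ([(1.0, 15.0), (2.0, 15.0)], [(12.0, 15.0)])
    peaks = [(3.4, 0), (9.8, 0), (13.0, 3), (20.0, 3)]
    print(comparable(peaks, unknown_ranges, known_ranges))  # [(3.4, 0), (9.8, 0)]
```

Filtering both the unknown and the known peak lists this way, before computing the unknown-known and unknown-unknown match sums, keeps the match result from being penalized for bands one instrument or medium could not record.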

Table VIII. Search Using Tri-n-butylphosphate Spectrum (Liquid Film, CsI Cell, and Grating Spectrometer (2.5-40 µ)) as the “Unknown”

Compounds | Spectra type | Match result | Compared peaks
Tri-n-butylphosphate | Liquid film, CsI cell, grating (2.5-40 µ)
Tri-n-butylphosphate | Liquid film, KBr cell, grating (2.5-25 µ)
Tri-n-butylphosphate | Liquid film, KBr cell, prisms (2-25 µ)
Di-n-butyl n-butylphosphonate | Liquid film, KBr cell, grating (2.5-25 µ)








Table IX. Search Using Triphenylphosphate Spectrum (KBr Pellet and KBr Prism) as the “Unknown”

Compounds | Spectra type | Match result | Compared peaks
Triphenylphosphate | KBr pellet, KBr prism
Triphenylphosphate | KBr pellet, NaCl prism





