Pattern recognition techniques applied to the interpretation of infrared


Pattern Recognition Techniques Applied to the Interpretation of Infrared Spectra

D. R. Preuss and P. C. Jurs

Department of Chemistry, The Pennsylvania State University, University Park, Pa. 16802

Pattern recognition techniques can be usefully employed for the interpretation of chemical data. An investigation into the classification of infrared spectra is reported. A new training routine utilizing a new thickness parameter for the decision surface is introduced. The thickness is then used in developing a new, effective feature selection routine. This routine is successfully applied to a number of well-characterized synthetic infrared data sets, and plots of the resulting weight vectors are presented. The techniques are finally applied to three chemical classes (carboxylic acids, esters, and primary amines), and the resultant weight vectors are plotted and discussed.

The interpretation of infrared spectral data to be used in the classification and identification of unknown compounds depends to some extent on the theory which describes the vibrational motion of atoms in molecules, characterized by atomic masses and vibrational force constants. To an even greater extent, particularly in the study of complex organic molecules, the interpretation of infrared spectra depends upon empirical and semi-empirical rules which have been developed by analyzing the spectra of large numbers of compounds for which the structures have been previously determined. It is this semi-empirical method which closely parallels the pattern recognition technique.

Pattern recognition comprises the detection, perception, and recognition of invariant properties among sets of measurements on objects or events. The purpose of pattern recognition is generally to categorize a sample of observed data as a member of the class to which it belongs. This general approach has been applied to problems from a great number of diverse fields (1). There is now a growing literature reporting applications of pattern recognition to chemical problems (2-7).

(1) George Nagy, Proc. IEEE, 56, 836 (1968).
(2) T. L. Isenhour and P. C. Jurs, Anal. Chem., 43 (10), 20A (1971).
(3) B. R. Kowalski and C. F. Bender, J. Amer. Chem. Soc., 94, 5632 (1972).
(4) L. B. Sybrandt and S. P. Perone, Anal. Chem., 44, 2331 (1972).
(5) D. D. Tunnicliff and P. A. Wadsworth, Anal. Chem., 45, 12 (1973).
(6) Joseph Schechter and P. C. Jurs, Appl. Spectrosc., 27, 30 (1973).
(7) K.-L. Ting, R. C. T. Lee, G. W. A. Milne, M. Shapiro, and A. M. Guarino, Science, 180, 417 (1973).

520 • ANALYTICAL CHEMISTRY, VOL. 46, NO. 4, APRIL 1974

The pattern recognition method used in this study is a binary classification technique employing an error correction feedback algorithm for development. Information concerning the application of pattern recognition techniques to the interpretation of infrared spectral data has appeared previously (8, 9). The work by these authors has demonstrated that infrared data can in general be quite successfully treated by pattern recognition techniques.
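As a rough illustration of what a binary classifier trained by error correction feedback looks like, the following minimal Python sketch trains a linear decision surface on labeled patterns. It is not the training routine of this paper: the update shown is the generic fixed-increment (perceptron-style) correction, substituted here for illustration, and the function names, the `max_passes` parameter, and the toy two-descriptor patterns are all assumptions made for the example.

```python
def dot(weights, pattern):
    """Dot product of a weight vector with an augmented pattern."""
    return sum(w * x for w, x in zip(weights, pattern))


def train_binary_classifier(patterns, labels, max_passes=100):
    """Train a linear decision surface by error correction feedback.

    patterns: list of equal-length lists of floats (e.g., digitized spectra)
    labels:   list of +1 / -1 class memberships
    Returns (weights, converged).
    """
    # Append a constant 1.0 so the surface need not pass through the origin.
    augmented = [list(p) + [1.0] for p in patterns]
    weights = [0.0] * len(augmented[0])
    for _ in range(max_passes):
        errors = 0
        for x, y in zip(augmented, labels):
            if dot(weights, x) * y <= 0:  # wrong side of (or on) the surface
                # Feedback step: move the weights toward the correct class.
                weights = [w + y * xi for w, xi in zip(weights, x)]
                errors += 1
        if errors == 0:  # a full pass with no corrections
            return weights, True
    return weights, False


def classify(weights, pattern):
    """Return +1 or -1 according to the side of the decision surface."""
    return 1 if dot(weights, list(pattern) + [1.0]) > 0 else -1


# Hypothetical toy patterns with two descriptors each (not real spectra).
toy_patterns = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
toy_labels = [1, 1, -1, -1]
toy_weights, toy_converged = train_binary_classifier(toy_patterns, toy_labels)
```

The weight vector is changed only when a training pattern is misclassified, which is the sense in which the errors are fed back. For linearly separable training sets this kind of procedure stops after a pass with no corrections; otherwise the `max_passes` limit ends the loop.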

DATA SETS

For this study, two data sets were prepared. The first data set consisted of 500 infrared spectra of simple organic compounds which fit the general formula: C s 1 0 H 2 - 2 2 0 ~ 3 N@z. The first 500 solution infrared spectra listed in the Sadtler tables which satisfied this criterion were selected for the data set. Each spectrum was digitized at 0.1-micron intervals from 2.0 to 14.7 microns, giving a total of 128 descriptors. The transmittances were read as accurately as possible to the nearest per cent. If the strongest absorption in the spectrum was greater than 5% transmittance, i.e., the absorption was weaker than one which would give