Nuclear Magnetic Resonance Spectral Interpretation by Pattern Recognition

by B. R. Kowalski and C. A. Reilly*
Shell Development Company, Emeryville, California 94608 (Received November 30, 1970)
Publication costs assisted by the Shell Development Company
The Journal of Physical Chemistry, Vol. 75, No. 10, 1971

A pattern recognition method is applied to high-resolution nmr data for the purpose of detecting the presence of various molecular structural features. Pattern vectors, derived from calculated nmr spectral frequencies and intensities, comprise the training set used to calculate weight vectors for classification. The spectra are preprocessed with the autocorrelation function to remove the translational frequency variance produced by chemical shift variations from spectrum to spectrum. Truncation of the autocorrelation function, necessary to keep the pattern dimension relatively small, is possible because of a redundancy in the information. Weight vectors for ethyl, n-propyl, and isopropyl groups were trained by a regression procedure and tested on unknown spectra (not in the training set).

Spectral interpretation can be the most tedious and time consuming step in a spectroscopic experiment. Yet, if this arduous task is not properly done, valuable information may be lost. The analysis of a spectrum is usually accomplished via determination of intermediate parameters (i.e., chemical shifts and coupling constants in the case of nmr) which are then related to structural and/or elemental information. In order to free the chemist from this labor, a more direct path leading from spectral data to structural and/or elemental information is very desirable.

The transformation of N pieces of spectral information into M pieces of desired structural and/or chemical information can be thought of as mapping the experimental data in N space into the derived features in M space (N >> M). For example, a digitized spectrum is actually a series of experiments resulting in the relative intensities measured at N incremental units along some axis (time, frequency, mass, etc.). The spectrum can be thought of as a vector in N-dimensional space. There is a variety of ways of achieving such a mapping. Some of these ways actually correspond to an analysis of the spectrum on theoretical grounds (e.g., assignment of absorption frequencies in vibrational spectroscopy) while others are strictly empirical methods that use some criterion, such as minimum variance, for the mapping. The objective of most of these methods is the same: to reduce the large amount of information in the N space to a comparable amount in the smaller M space.

Probably the newest and most interesting approach to performing this task is the broad and incohesive discipline called pattern recognition. It is rather difficult to think of pattern recognition as a discipline when in reality it consists of a number of algorithms that have been used to solve problems from many diverse fields. Basically, it involves the analysis of unknown data by mathematical models trained on known data.
Although the largest amount of published research is found in the area of alphanumeric character recognition, a few of the other applications include particle tracking in cloud, bubble, and spark chambers, fingerprint analysis, speech analysis, weather prediction, medical diagnosis, and aerial and microphotographic processing. An excellent review by Nagy evaluates many of the algorithms used for pattern recognition and discusses several interesting applications.1 Molecular structural information has been obtained directly by applying a computerized learning machine to digitized spectra from mass spectrometry,2-5 infrared spectrometry,6 and a combination of both.7 The successful results of these studies and the work of others8 indicate that spectral interpretation can be performed on an empirical basis by a computer without imparting the characteristics of the spectroscopic method or rules for determining molecular structure.

(1) G. Nagy, Proc. IEEE, 56, 836 (1968).
(2) P. C. Jurs, B. R. Kowalski, and T. L. Isenhour, Anal. Chem., 41, 21 (1969).
(3) P. C. Jurs, B. R. Kowalski, T. L. Isenhour, and C. N. Reilley, ibid., 41, 690 (1969).
(4) B. R. Kowalski, P. C. Jurs, T. L. Isenhour, and C. N. Reilley, ibid., 41, 695 (1969).
(5) P. C. Jurs, B. R. Kowalski, T. L. Isenhour, and C. N. Reilley, ibid., submitted for publication.
(6) B. R. Kowalski, P. C. Jurs, T. L. Isenhour, and C. N. Reilley, ibid., 41, 1945 (1969).
(7) P. C. Jurs, B. R. Kowalski, T. L. Isenhour, and C. N. Reilley, ibid., 41, 1949 (1969).
(8) L. R. Crawford and J. D. Morrison, ibid., 41, 994 (1969).

One of the most useful tools for determining molecular structure is high resolution nuclear magnetic resonance (nmr). This paper will be concerned with proton magnetic resonance only, but the ideas developed should be transferable to the nmr spectra of other nuclei (such as 13C) as well. The block diagram in Figure 1 shows the present method and two proposed methods of determining molecular structural units from nmr.

Figure 1. Schematic procedure for nmr data acquisition and processing.

Path 1 is the conventional method that has been yielding important information for many years. The manual scan produces an analog spectrum which shows the relative intensities of the various proton groups as a function of frequency (usually measured with respect to some standard such as the TMS line). An experienced spectroscopist can analyze the analog spectrum for nmr parameters (chemical shifts and spin-spin coupling constants) for the various nuclei present. This analysis is relatively easy when the spectrum is close to first order and contains no overlapping multiplets. When either or both of these conditions are not met, the analysis is best done with the aid of a computer9 and can be tedious and time consuming to say the least. Specialized experiments (e.g., double resonance) can be performed, provided the necessary equipment is available. The fact is that all of the information is present in the analog spectrum (unless it is first order) and most of these experiments serve only to simplify the spectrum so that it can be analyzed more readily.

Path no. 2 shows the method of nmr data analysis that is proposed and partially tested in this paper. A sample is analyzed by having a computer or other device first collect a spectrum in digitized form. After preprocessing to convert the spectrum to a pattern vector, it is analyzed by pattern recognition techniques. Chemically meaningful information or structural parameters can be produced without ever determining a chemical shift or a coupling constant explicitly. Path no. 3 is a proposed method that is closely related to that of path no. 2. An elaboration of this proposed method will be made in the last section.

The Pattern Recognition Method. A primary objective of pattern recognition is to develop a mathematical pattern classifier that can be trained on a set of known patterns (training set) and used to classify other new, unknown patterns.
For spectrometric applications, this translates into using knowledge obtained from known spectra to analyze unknown spectra. This larger objective can be broken down into smaller objectives each presenting problems of a different nature. These subobjectives are: formation of pattern vectors, selection of a classification algorithm, training, and finally, testing and application. These topics and their accompanying problems will be discussed individually


in this paper with a definite slant towards an application to nmr spectrometry.

Pattern vector formation can be the most formidable problem in any application of pattern recognition. It is sometimes called preprocessing and can be defined as: given an amount of information on a large number of samples, what is the best possible vector representation of this information with respect to the algorithm chosen? In the case of nmr spectral analysis, the information is the digitized, continuous wave (cw) spectrum measured at a constant resolution and referenced to a constant frequency. Each such spectrum is called a spectrum vector. The samples are the actual chemical compounds that were used to obtain the spectrum vectors. The specific problem in the application of pattern recognition to nmr is: given a number of spectrum vectors representing digitized nmr spectra, what must be done to form pattern vectors so that an optimal amount of the information present in the spectra can be used by the pattern classifier to obtain structural information? In many cases, the unprocessed data (e.g., normalized mass spectra in the case of mass spectrometry) are adequate and no further pretreatment is necessary. Unfortunately, this is not the case in nmr, as will be discussed in the next section.

Another objective in a pattern recognition application is the selection of a pattern classification algorithm. The algorithm includes a training method and a classification scheme. There are several powerful algorithms to choose from,1 and new methods along with improvements on existing methods are being introduced at a rapid rate. Spectroscopic applications, however, because of the large amount of data that can be collected, impose certain constraints on the selection of an algorithm. For example, it may be necessary and practical to digitize hundreds of nmr spectra in order to obtain satisfactory performance for a given analysis.
For high-resolution spectra, the digitization procedure could produce as many as 10,000 intensities for each spectrum vector. This, of course, means that millions of numbers must be processed. If the algorithm is mathematically complicated, the application can be quite costly. Due to the large amount of data and the complexity of the pretreatment necessary for the applications in this study, a simple pattern classifier (linear discriminant function)10,11 and a relatively fast training algorithm (regression analysis)12 were used. It must be emphasized that the linear discriminant function is an extremely powerful pattern classifier and was chosen

(9) J. W. Emsley, J. Feeney, and L. H. Sutcliffe, "Progress in Nuclear Magnetic Resonance Spectroscopy," Vol. 1, Pergamon Press, New York, N. Y., 1966.
(10) N. J. Nilsson, "Learning Machines," McGraw-Hill, New York, N. Y., 1965.
(11) G. S. Sebestyen, "Decision-Making Processes in Pattern Recognition," Macmillan, New York, N. Y., 1962.
(12) A. Ralston and H. Wilf, "Mathematical Methods for Digital Computers," Wiley, New York, N. Y., 1966.


over possibly more powerful ones not only for economic reasons but also because it served as a vehicle to help solve the more serious problem of forming usable pattern vectors from the nmr spectrum vectors. The linear pattern classifier uses the dot product of the input pattern vector (y) and a trained weight vector (w) to produce a scalar output (s) as

s = w·y + c    (1)

where c is a scalar constant and is determined, along with w, during the training procedure. If s is a positive number for a particular y, we say that the classifier has generated an affirmative output. If s is negative or zero, the output is negative. Training consists of selecting a set of known pattern vectors called the training set and using a training algorithm to calculate a corresponding weight vector. The weight vector consists of one parameter (adjustable weighting factor) for each dimension of the training patterns. (It will henceforth be assumed that the constant in eq 1 is part of the pattern classifier and therefore part of the weight vector. In reality, it is calculated along with the weight vector by augmenting the pattern vectors, thereby increasing the pattern space by one dimension.)

As an example of training, it may be desired to calculate a weight vector that can be used by the pattern classifier to determine whether or not a compound (represented here by its pattern vector) contains an isopropyl group. The procedure would be first to select a number of representative compounds from the two groups (i.e., those containing and those not containing isopropyl groups). Then, the spectra obtained from these compounds are used to form the training set of pattern vectors. The training algorithm calculates a weight vector that can be used by the pattern classifier to determine whether or not unknown compounds contain the isopropyl group. A properly chosen training set could be used to train many weight vectors, each determining a different functional part of the molecule. Together, these weight vectors would aid immensely in the automatic computer interpretation of spectroscopic data. There are a number of training algorithms that could be used to obtain a weight vector.
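The classifier of eq 1 and the augmentation trick can be sketched in a few lines. This is a minimal illustration in modern Python/NumPy, not the original implementation; the weight values and function name are hypothetical.

```python
import numpy as np

def classify(pattern, weight_vector):
    """Linear pattern classifier of eq 1: s = w . y + c.

    The constant c is folded into the weight vector by augmenting each
    pattern with a constant component of 1, as described in the text.
    """
    augmented = np.append(pattern, 1.0)          # add one dimension for c
    s = float(np.dot(weight_vector, augmented))  # s = w . y + c
    return s > 0                                 # affirmative if s positive

# toy example: 3-dimensional pattern, 4-component augmented weight vector
w = np.array([0.5, -1.0, 0.25, 0.1])   # last component plays the role of c
y = np.array([2.0, 0.5, 1.0])
print(classify(y, w))                  # s = 1.0 - 0.5 + 0.25 + 0.1 = 0.85, affirmative
```

A zero or negative s would be reported as a negative output, exactly as the text defines it.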
Early applications to spectroscopic data analysis used a feedback algorithm2,3 and the method of least squares.4 In the feedback method a training pattern is presented to the classifier containing an arbitrarily initiated weight vector and the response is checked against the correct answer. If it is the desired response (a correct determination of the presence or absence of the sought-for structural unit in the molecule), no action is taken and the next pattern is presented. If an incorrect response is given, the weight vector is adjusted (a fraction of the training pattern is added or subtracted) to produce the correct response. This is continued in a cyclic process with the hope that all of the training set patterns can be

classified correctly. The feedback method is an acceptable training procedure provided that the size of the training set is not too large. At this point it should be mentioned that a particular weight vector is not unique and many factors must be considered in a discussion of the "best" weight vector for a specific determination. This discussion is beyond the scope of this paper.

The feedback algorithm can be expected to produce a weight vector rapidly, provided that the entire training set resides in the core memory of the computer. As the training set gets larger and overflows onto peripheral storage devices (i.e., drum, disk, tape) the advantage of this method disappears. Even when the data are stored on a high-speed drum, and accessed using true random accessing methods (calculation of a hardware address), the calculations can be prohibitively costly. This is mainly due to the multiple accessing required for complete training. Most practical applications of pattern recognition to spectroscopy will probably require training sets of spectral vectors exceeding the core storage limits of most computers. The data used in this study, for example, required more than a quarter of a million words of storage and were, therefore, stored permanently on a drum (Univac 1108 Fastrand). For this reason, it is desirable to use an algorithm that requires reading the training set only once, processing it one pattern at a time as it is brought into core from a peripheral device.

One method that satisfies this requirement is multiple regression analysis.12 The method starts by setting up the normal equations from the system of linear equations (one equation for every pattern in the training set) using eq 1. The desired responses (s) are input as +1 for a positive pattern (representing a compound containing the sought-for structure) and -1 for a negative pattern. The Gaussian elimination method is used to solve the normal equations.
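The regression training just described (one linear equation per pattern, desired responses of +1 and -1) can be sketched as follows. This is a hedged illustration: it uses NumPy's least-squares solver in place of Gaussian elimination on the normal equations, and the toy data and names are hypothetical, not the study's spectra.

```python
import numpy as np

def train_weight_vector(patterns, has_group):
    """Least-squares training: one equation of the form of eq 1 per
    training pattern, with desired response +1 if the compound contains
    the sought-for structure and -1 otherwise.  Patterns are augmented
    with a constant 1 so the constant c is trained as part of w.
    """
    X = np.hstack([patterns, np.ones((len(patterns), 1))])
    s = np.where(has_group, 1.0, -1.0)          # desired responses
    w, *_ = np.linalg.lstsq(X, s, rcond=None)   # solves the same problem
    return w                                    # as the normal equations

# toy training set: 2-dimensional patterns, class decided by component 0
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
labels = X[:, 0] > 0
w = train_weight_vector(X, labels)

# classify an unknown pattern exactly as in eq 1
unknown = np.array([1.5, -0.3])
s = float(np.dot(w, np.append(unknown, 1.0)))
print(s > 0)
```

Because the fitted responses cluster near +1 and -1, thresholding the scalar output at zero reproduces the two-class decision.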
Stepwise regression, used to calculate the weight vector for this study, makes use of intermediate regression equations and allows a one-at-a-time addition or removal of parameters to the solution. The parameter that is added at any step is the one that will reduce the standard error by the greatest amount. Parameters are removed from the equations if the removal will not increase the standard error by more than a specified amount. This procedure can be stopped at any time or it may be allowed to use all of the available parameters for the solution (the latter case corresponds to the complete multiple regression analysis). This procedure permits a careful examination of the importance and interrelation of the parameters. In the present study the procedure was stopped when the addition of a parameter did not lower the standard deviation by more than a specified amount proportional to the ratio of the error in the parameter to the absolute value of the parameter.

The final step in the development of a pattern recognition method is testing the weight vectors. This simply amounts to obtaining pattern vectors that are not present in the training set, presenting them to the pattern classifier, and checking the responses. The test vectors must represent typical patterns that might be expected to turn up in a practical application. If performance is not adequate, assuming that all of the other problems have been solved, the training set should be enlarged.

Problems in Pattern Formation from Nmr Spectra. The first experiments in this study attempted direct applications of pattern recognition to normalized spectrum vectors. Three hundred cw proton spectra (60 MHz) from 0 to 500 Hz relative to TMS were digitized manually at 2.5-Hz intervals to give pattern vectors of 200 dimensions each. This is an obvious starting point in a study of this type because it represents the minimum amount of preprocessing. The results of these experiments showed that it was possible to derive a weight vector that performed well on the training set but gave results on test patterns that were only slightly better than random.

The key to understanding the results of these early experiments is found in the nature of the representation of the information in an nmr spectrum. Usually the spectrum is analyzed in terms of chemical shifts and coupling constants. The positions of the multiplets on the frequency axis give information as to the types of protons in the compound. The detailed fine structure within a multiplet provides information about the locations of other nuclei in the molecule. Figure 2 shows two calculated ethyl spectra with the same line width (0.5 Hz) and the same coupling constant (7.0 Hz). Their spectra are similar because of the obvious pattern of two very similar multiplets. It is obvious to the eye that the spectra represent ethyl groups even though there are rather large differences in the corresponding chemical shifts.
The eye ignores the translational differences and minor differences in fine structure and easily identifies each pattern as consisting of a triplet and a quartet. The large translational differences make a direct application of pattern recognition to such cw nmr spectra unfeasible. The mathematical model used by the pattern classifier cannot tolerate large translational shifts in the patterns. This is not to say that chemical shifts are unimportant in the analysis of an nmr spectrum. They are, indeed, of paramount importance. However, they must be present in the pattern vector in a different representation. In other words, a transformation of the data (preprocessing) is necessary in order to map the translational information into a translationally invariant form while preserving the important multiplicity information.

These translational shifts in a spectrum are ordinarily produced by two different effects. The first effect is a solvent shift. Although it is a relatively small problem, molecules can absorb energy at different frequencies depending on the


Figure 2. Calculated nmr spectra of ethyl groups with different chemical shifts.

solvent used. Of greater importance are the frequency shifts produced by structural differences. The large shifts in the two spectra in Figure 2 could arise only because of the different environments of the groups. These shifts represent important information and, even though the spectral data must be transformed so as to make pattern recognition possible, the chemical shift information must be preserved.

Another problem that plagues an application to nmr (and actually to any kind of high resolution spectroscopy) is the sheer size of the spectrum vectors (i.e., volume of information contained, much of which is redundant). Consider the digital recording of a cw 60-MHz nmr spectrum over the region 0-500 Hz. If the instrument resolution is 0.5 Hz and a reasonable representation of line shape is necessary, about 10 readings should be recorded per 0.5-Hz increment. This amounts to a spectrum vector containing 10,000 values. In other words, the weight vector will need 10,000 parameters to separate the pattern vectors that, geometrically represented, are in a 10,000-dimensional hyperspace. Aside from the large amount of computer time necessary to calculate a weight vector (assuming such a calculation is possible) there exists an even larger problem. From an elementary knowledge of the theory of linear algebra, it is clear that perfect training results are guaranteed with a training set containing ≤ 10,000 patterns. There is serious doubt, however, that the resulting weight vector would perform acceptably on


height of a single resonance line and this is found by direct measurement of the plotted cw spectrum. It is essentially a measure of the ability of the instrument to identify two closely spaced lines. The second resolution, or the "digital resolution" (R), is the calibrated size of the stepping increment on the frequency axis during digitization. It is a measure of the faithfulness with which the digital spectrum represents the analog spectrum. Considerable information could be lost if digital

a 0.5-Hz resolution. It should be clear that R is an important quantity in any computer application and should be chosen carefully. This digital resolution is of particular importance to the transformation used by this study to eliminate the translational difficulties discussed in the last section. This transformation consists of autocorrelating the digitized nmr spectrum vector. The autocorrelation function A(x) of a continuous function F(f) is defined13 as

A(x) = ∫ F(f) F(f + x) df    (2)

where x in this case can take on any desired value. F(f) is the cw nmr spectrum, which is a function of frequency (f), and F(f + x) is the spectrum offset by x. The discrete form of this definition is

A(nR) = Σf F(f) F(f + nR),    n = 0, 1, 2, . . .    (3)

where R is the digital resolution discussed above. Keeping n positive produces only half of this symmetric function. Autocorrelation has an interesting effect on a digital function F(f) that is most easily demonstrated by a simple example. The top part of Figure 3 is a simplified AB stick spectrum. It is important to remember that although such multiplets can be found almost anywhere on the frequency axis, there is a very definite internal structure which is essentially constant and independent of the distance from a reference. The lower part of Figure 3 shows the positive part of the autocorrelation function for this AB spectrum. The most important point in this figure is the fact that autocorrelating the AB pattern will produce exactly the same pattern no matter where the center of gravity of the AB spectrum is found on the frequency axis. The values of the autocorrelation function can be readily found from definition 3 using the various values for nR (0, β, α, (α + β)) where nonzero intensities exist. For example, for nR = 0, A(nR) is simply the sum of the squares of the peak intensities in the spectrum.

Figure 3. Simplified AB stick spectrum (top) and the positive part of its autocorrelation function (bottom).

Calculation of the autocorrelation of a digitized spectrum can be quite a costly process. The number of multiplications alone is approximately N²/2 where N is the size of the spectrum. Fortunately, as will be explained later in this section, only a small part of the autocorrelation spectrum was needed for this study. However, where it is necessary to calculate the entire function, a faster procedure can be used. The procedure consists of taking the Fourier transform of the spectrum in order to obtain the complex waveform G(t), where

G(t) = ∫ F(f) e^(-i2πft) df    (4)

Next, G(t) is multiplied by its complex conjugate to form the power function |G(t)|² of the waveform. The power function is then inverse transformed as in (5) below to obtain the autocorrelation function

A(x) = ∫ |G(t)|² e^(i2πxt) dt    (5)

Thus, the autocorrelation function actually requires fewer mathematical operations and consequently less computer time than a straightforward autocorrelation procedure. The autocorrelation function provides a better pattern vector than does the original nmr spectrum vector. This is essentially due to the removal of translational variance from the data. A certain amount of information (phase) is indeed lost during the transform and this

(13) R. Bracewell, "The Fourier Transform and Its Applications," McGraw-Hill, New York, N. Y., 1965.


Figure 4. AB stick spectrum of Figure 3 with an additional distant singlet (top) and its autocorrelation function (bottom).

is consistent with the fact that the autocorrelation function does not have an inverse.

A few observations are in order before proceeding. (1) The value of A(nR) at n = 0 is used to normalize A(nR) at n = 1, 2, . . . This makes A(nR) of equivalent spectra recorded at different instrument gains the same. (2) The dimension of the pattern vector obtained is only one less than the original spectrum vector. (3) Line shape information is preserved.

The use of the total autocorrelation function does not solve the problem of vectors with a large number of dimensions. Observation no. 2 above states that the dimensionality is not significantly reduced. The interesting point, however, is that there is a certain redundancy of information that can be eliminated from the autocorrelation function. This is not the case for simple examples such as the one in Figure 3. However, it becomes clearer by examining a spectrum with a large difference in chemical shifts between two multiplets. Figure 4 shows the AB spectrum exactly as in Figure 3 with the addition of a single line with unit intensity at a distance γ from the origin. The broken axis is meant to show that this singlet is a large distance from the multiplet (γ >> 2α + β). The effect of the additional peak on the autocorrelation function is most interesting. The lower part (nR near 0) of the autocorrelation function is identical to that of Figure 3 while the upper part (nR near γ) is an exact reproduction of the original spectrum. Unfortunately, this upper part suffers from the same translational problems as found in the original spectrum. This suggests using only the lower part of the autocorrelation function as a pattern vector. The problem of too large a pattern vector would thereby be solved.

In practice, a 500-Hz spectrum is autocorrelated and only the lowest 25-Hz portion is used for pattern formation. The 25-Hz range was chosen as a compromise between two conflicting factors. The first factor was the necessity for keeping the pattern vector dimension below 250. Since the regression analysis program operates on the correlation matrix stored in core memory, the above limit was set by core storage limits. The second factor was the desire to maintain high resolution by setting the digital resolution, R, equal to 0.1 Hz. A larger or smaller value of R could conceivably produce better results but with an instrument resolution of 0.5 Hz (obtainable on most high resolution instruments) an R of 0.1 Hz was deemed necessary. These two limits allowed a range of 25 Hz of the autocorrelation function. This range actually contains a contribution from every peak in most typical nmr spectra.

This truncated autocorrelation function would seem to contain only coupling constant and multiplicity information but a closer examination shows that this is not necessarily so. Unless the spectrum is first order, some chemical shift information is actually retained in

the form of deviations of peak intensities from those observed in a first-order nmr spectrum. This representation is best described by considering two possible extreme cases in nmr spectra. One extreme is a spectrum that contains only widely spaced singlets. This condition can occur when proton couplings are too small to be detectable. The autocorrelation function is clearly of little value in this case. The other extreme is the pure first-order splitting patterns obtained when the chemical shift differences are much larger than the observable coupling constants. Only multiplicity and coupling constant information appears in the lower part of the autocorrelation function because all of the differences in chemical shifts are greater than 25 Hz. In between these two extremes, chemical shift information appears as extra lines or as deviations in peak intensities which come from distorted multiplet intensities in the original spectrum.

An example of the combined benefits is illustrated by comparing Figure 2 with Figure 5. For the application of pattern recognition, it is clear that an intolerable distortion of the pattern (actually in the form of important information) is seen in Figure 2 going from the top spectrum to the bottom one simply by changing the two chemical shifts. Figure 5 shows the lower portion (2.0-25.0 Hz) of the normalized autocorrelation function for each spectrum of Figure 2. (The lowest 2.0 Hz is not shown and in actual practice is not used because it is redundant except for A(0).) Two important features should be noted. (1) The translational differences between the two spectra are eliminated in the truncated autocorrelation functions. (2) The chemical shift information is not lost but can be seen as differences in the shapes of the autocorrelation functions.

The two subjects discussed in the previous section were (1) translational difficulties and (2) the large size of the spectrum vectors.
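The preprocessing described in the last two sections can be sketched end to end. This is a minimal illustration in modern Python/NumPy (not the original Univac implementation); the stick positions and helper names are hypothetical. It computes the discrete autocorrelation of eq 3, checks the translational invariance illustrated in Figure 3, verifies the Fourier route of eq 4 and 5, and forms the truncated, normalized 2.1-25.0 Hz pattern vector.

```python
import numpy as np

R = 0.1                  # digital resolution, Hz
LOW, HIGH = 2.1, 25.0    # retained window of the autocorrelation, Hz

def autocorr_direct(F, n_max):
    """Discrete autocorrelation of eq 3: A(nR) = sum_f F(f) F(f + nR)."""
    return np.array([np.dot(F[:len(F) - n], F[n:]) for n in range(n_max)])

def autocorr_fft(F):
    """Same function via eq 4 and 5: transform, power spectrum |G(t)|^2,
    inverse transform.  Zero-padding to 2N avoids circular wrap-around."""
    G = np.fft.rfft(F, n=2 * len(F))                # eq 4 (discrete analogue)
    return np.fft.irfft(np.abs(G) ** 2)[:len(F)]    # eq 5

def pattern_vector(F):
    """Normalize by A(0) for gain invariance (observation 1) and keep
    only the 2.1-25.0 Hz window: 230 points at R = 0.1 Hz."""
    A = autocorr_direct(F, round(HIGH / R) + 1)
    return A[round(LOW / R):] / A[0]

def stick(positions_hz, intensities, span_hz=500.0):
    """Hypothetical stick spectrum on a 0-500 Hz axis digitized at R."""
    F = np.zeros(round(span_hz / R))
    F[[round(p / R) for p in positions_hz]] = intensities
    return F

# the same four-line multiplet placed at two different chemical shifts
ab1 = stick([100.0, 107.0, 130.0, 137.0], [1, 3, 3, 1])
ab2 = stick([300.0, 307.0, 330.0, 337.0], [1, 3, 3, 1])

print(np.allclose(pattern_vector(ab1), pattern_vector(ab2)))       # identical patterns
print(np.allclose(autocorr_direct(ab1, 5000), autocorr_fft(ab1)))  # eq 3 agrees with eq 4/5
print(len(pattern_vector(ab1)))                                    # 230
```

The two pattern vectors come out identical even though the multiplets sit 200 Hz apart, which is exactly the translational invariance the truncated autocorrelation is meant to supply.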
Using the lower 2.0-25.0 Hz of the autocorrelation function as a pattern vector represents a significant step in solving these problems, as the experimental results will show.

Derivation of a Training Set. One important requirement of a training set for deriving weight vectors is that it must contain spectra from a representative distribution of compounds of practical interest. For many applications it would be a rather time consuming and expensive task to obtain such a set. Compounding this difficulty is the fact that, at this time, the authors know of no collection of high quality nmr spectra in a computer-compatible format (card, tape, etc.). Until such time as a large number of representative nmr spectra in digital form have been collected experimentally, such spectra can readily be simulated by means of any one of a number of well known computer programs.9 The stripped-down noniterative part of the Ferguson-Marquardt version of program NMRIT was incorporated into a larger program, MRCALC, used to calculate the training set for this study.

Figure 5. Lower portion (2.0-25.0 Hz) of autocorrelation functions for calculated ethyl group nmr spectra of Figure 2.

Program MRCALC needs the same information used by NMRIT but in the form of limits for the constants normally found in instrument-run spectra. The program can calculate a number of spectra that will span the range of those normally found in an application. For example, in order to calculate representative spectra of ethyl groups, the program would be given the following information: (1) CH3 chemical shifts that run between 50 and 150 Hz at intervals of 50 Hz (all frequencies in this paper refer to spectra run on a 60-MHz instrument); (2) CH2 chemical shifts that start at a few hertz above the CH3 shift and range to 275 Hz at intervals of 25 Hz; (3) the coupling constant between the two groups with possible values of 7.0, 7.5, and 8.0 Hz. Program MRCALC proceeds to calculate a number of stick spectra (63 in the case of the ethyl group) that form a good distribution of the ethyl spectral patterns. The output from MRCALC in this study consisted of punched cards containing the frequencies and intensities of the lines in the calculated spectrum. These cards form the input to program MRDATA, which in turn calculates the pattern vectors. The first step in the pattern formation process is autocorrelating the stick spectrum obtained from MRCALC. Since only 25 Hz is used in the pattern, the calculation was discontinued when nR = 25.0 (R = 0.1 Hz). Applying the autocorrelation function to the spectrum before the line shape is added amounts to a calculational savings of several orders of magnitude and has been justified by

experiment. A Gaussian line shape is next added to each line in the autocorrelation function of the stick spectrum. The half-width required was determined experimentally. The 230 intensities at every 0.1 Hz from 2.1 Hz to 25.0 Hz become part of the pattern. In order to introduce information on absolute chemical shifts into the pattern vectors, five moments, calculated over five 100-Hz frequency intervals (0-100 Hz, 100-200 Hz, etc.), were also added to the pattern vectors. These five intervals were chosen because they were equal and they roughly corresponded to frequency intervals of chemical interest. The moments M(f1,f2) were calculated over the interval f1 to f2 by

M(f1,f2) = [ Σ_{nR=f1..f2} I(nR) f(nR) ] / [ Σ_{nR=f1..f2} f(nR) ]    (6)

where the I(nR) array contained the spectral intensities relating directly to the frequency array f(nR). The intensities were normalized to a constant while f(nR) ranged from 0.0 to 500.0 Hz. This gave a total pattern dimension of 235, with the augmentation to add the constant making a 236-dimensional pattern space. Table I shows the actual structures and the number of spectra of these structures that were used for training. The nine different structural groups generated 634 patterns with each pattern having a total of 236 dimensions.
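A minimal sketch of the moment calculation of eq 6 (function and array names are illustrative, not from MRDATA; the denominator is taken literally from eq 6 as the sum of f(nR) over the interval, which is one plausible reading of the garbled original):

```python
import numpy as np

def interval_moment(intens, freqs, f1, f2):
    """Moment M(f1, f2) of eq 6: sum of I(nR)*f(nR) over the interval,
    divided by the sum of f(nR) over the same interval (a simplified
    sketch under the stated assumption, not the authors' code)."""
    mask = (freqs >= f1) & (freqs <= f2)
    return np.sum(intens[mask] * freqs[mask]) / np.sum(freqs[mask])

# Digitized spectrum on a 0.0-500.0 Hz grid at 0.1-Hz spacing.
freqs = np.arange(0.0, 500.0, 0.1)
intens = np.zeros_like(freqs)
intens[600] = 1.0   # a line at 60 Hz
intens[2100] = 2.0  # a line at 210 Hz

# Five moments over the five 100-Hz intervals, appended to the pattern.
moments = [interval_moment(intens, freqs, lo, lo + 100.0)
           for lo in range(0, 500, 100)]
print(moments)
```

The five values carry the absolute-shift information that the truncated autocorrelation function deliberately discards: intervals containing no intensity give zero, and populated intervals give small positive moments weighted by where in the interval the intensity sits.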

Table I: Structures for Data Base

    Structure         No. of spectra
    CH3-CH2-                63
    CH3-CH<                 63
    CH3-CH2-CH2-            63
    CH3-CH2-CH<             64
    (CH3)2-CH-              84
    -CH2-CH2-               45
    CH3-CH=CH-              84
    -CH2-CH=CH-             84
    >CH-CH=CH-              84

At first glance it would appear that a pattern would change drastically if any group were connected to another proton-containing group instead of to heteroatoms or carbons without protons. This is indeed the case in the original spectrum, as addition of a -CH- to the chain would give rise to a new splitting pattern in the spectrum. Fortunately, the lower part of the autocorrelation function which makes up the majority of the pattern still retains the pattern structure. This is seen by comparing the two ethyl patterns in Figure 5 with the pattern calculated from the spectrum of 1-nitropropane shown in Figure 6. They are quite similar even though the ethyl group in the n-propyl compound undergoes splitting by the second methylene. Figure 7 shows the pattern obtained by autocorrelating the spectrum of isopropyl cyanide which does not contain an ethyl group. The pattern is considerably different from the others shown in Figures 5 and 6. The most important

point demonstrated by a comparison of these spectra is that the spectrum of the basic molecular group is transformed to a pattern that dominates the autocorrelation function regardless of its environment but that the environmental information is still present to some extent. This makes the detection of the molecular group possible and allows for future determination of its environment.
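The way multiplet spacings dominate the lower autocorrelation function can be illustrated with a short sketch (hypothetical names; the shift and coupling values are illustrative choices within the training-set ranges): working directly from the stick spectrum's line list, the autocorrelation accumulates products of line intensities at each difference frequency, so peaks appear at multiples of J while the large chemical-shift difference falls above the 25-Hz cutoff and is truncated away.

```python
import numpy as np

def stick_autocorrelation(freqs, intens, r=0.1, max_hz=25.0):
    """Autocorrelation of a stick spectrum evaluated on a grid nR.

    Working from the line list instead of a densely digitized spectrum
    needs only len(freqs)**2 products, which is the source of the
    calculational savings mentioned in the text.
    """
    lags = np.arange(0.0, max_hz + r / 2, r)
    acf = np.zeros(len(lags))
    for i in range(len(freqs)):
        for j in range(len(freqs)):
            d = abs(freqs[i] - freqs[j])
            if d <= max_hz:
                acf[int(round(d / r))] += intens[i] * intens[j]
    return lags, acf

# A first-order ethyl pattern: CH3 triplet at 60 Hz, CH2 quartet at
# 210 Hz, J = 7.5 Hz (values within the ranges used for the training set).
j = 7.5
freqs = [60 - j, 60, 60 + j,
         210 - 1.5 * j, 210 - 0.5 * j, 210 + 0.5 * j, 210 + 1.5 * j]
intens = [1, 2, 1, 1, 3, 3, 1]
lags, acf = stick_autocorrelation(freqs, intens)
# Peaks accumulate at multiples of J (7.5, 15.0, 22.5 Hz); the 150-Hz
# shift difference lies above the 25-Hz cutoff and never appears.
print(acf[int(round(7.5 / 0.1))])  # → 38.0
```

Because every group containing the same multiplet contributes the same low-lag peaks, the group's signature survives in the pattern even when further couplings alter the full spectrum.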

Results and Discussion

Several weight vectors were calculated and tested during the course of this study. The results were generally excellent, indicating the effectiveness of the autocorrelation function for generating pattern vectors to describe spectral features in high-resolution nmr spectra. The weight vector trained to detect the presence of the n-propyl group, for example, easily detected the presence of the group in test spectra as long as the chemical shifts and coupling constants were within the limits used for calculating the training set. Table II shows the training results for four weight vectors. The first column indicates the structural group that the vector was trained to detect. The first ethyl vector was determined by placing only the 63 ethyl pattern vectors in the positive category (+1.0) and the remaining 571 pattern vectors in the negative category (-1.0). The strategy behind this decision was that the spectral pattern of an ethyl group is considerably different from an n-propyl spectral pattern. The relatively poor results obtained by testing the vector (shown later) prompted further investigation. Close examination of the numerical values found for the pattern vectors indicated that the characteristic ethyl structure was present in both the n-propyl and CH3CH2CH< pattern vectors. The second ethyl vector in the table was trained by defining ethyl, n-propyl, and CH3CH2CH< pattern vectors as members of the positive category. The two remaining vectors are self-explanatory. The second column contains the number of dimensions that were actually calculated by the stepwise regression procedure. For example, the first ethyl weight vector used only 41 dimensions out of the 236 available. Even with these small numbers of dimensions, each weight vector was able to correctly classify every pattern vector in the training set.
Being able to train a weight vector to classify 634 patterns using only 41 dimensions is indicative of a correct representation of the information in the pattern vectors. The “positive” and “negative” columns give the numerical accuracy for the classification of the training set into positive and negative categories. The first number is the average of all the responses after training (value of s in eq 2) of the members in that category. The number in parentheses is the standard deviation of this average for the category. The isopropyl weight vector is an ideal case in that the averages (0.94 and -1.00) are close to the desired responses of 1.0 and -1.0, respectively, and the
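The train-then-classify scheme can be sketched on synthetic data (plain least squares standing in for the paper's stepwise regression procedure; the dimensions and data are illustrative, not the 634 patterns of the study):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny synthetic stand-in for a training set: 20 pattern vectors of
# dimension 5, augmented with a constant 1 (the role of the 236th
# dimension in the paper's patterns).  Labels are the desired
# responses +1.0 / -1.0 for the two categories.
pos = rng.normal(loc=+1.0, scale=0.3, size=(10, 5))
neg = rng.normal(loc=-1.0, scale=0.3, size=(10, 5))
X = np.vstack([pos, neg])
X = np.hstack([X, np.ones((X.shape[0], 1))])  # augmentation
y = np.array([1.0] * 10 + [-1.0] * 10)

# Regression training: least-squares weight vector w minimizing
# ||X w - y||^2 (the stepwise procedure additionally selects which
# dimensions to use; that selection step is omitted here).
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Classification: response s = w . x, decision surface at s = 0.
s = X @ w
print(np.all(np.sign(s) == y))  # True: every training pattern classified
```

The per-category mean and standard deviation of s correspond to the “positive” and “negative” columns of Table II; well-separated distributions on either side of the midpoint 0.0 are what permit the high-confidence classification discussed below.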

Figure 6. Autocorrelation function obtained from the calculated nmr spectrum of 1-nitropropane at 60 MHz.

Figure 7. Autocorrelation function (2.1-25.0 Hz) obtained from the calculated nmr spectrum of isopropyl cyanide (60 MHz).

Table II: Training Results for Four Weight Vectors^a

    Weight vector       No. of parameters used    Positive       Negative
    (No. 1) CH3CH2-             41               0.60 (0.44)   -1.04 (0.26)
    (No. 2) CH3CH2-             41               1.30 (0.34)   -0.83 (0.26)
    CH3CH2CH2-                  44               0.84 (0.20)   -0.98 (0.18)
    (CH3)2CH-                   47               0.94 (0.18)   -1.00 (0.12)

    ^a 634 patterns in training set, 236 total dimensions available for each pattern vector.

standard deviations are small. Applying a 95% one-sided confidence limit on each of these results would allow positive responses down to 0.58 and negative

responses as large as -0.76. These distributions are far from overlapping (giving rise to incorrect responses) at high confidence levels. Since the cases tested gave correct responses using the midpoint (0.0) between the desired responses (+1.0 and -1.0) as a decision surface, this value was used for all further tests. Of considerably more interest than tabulating correct responses from several test cases were the attempts to use the trained weight vectors for classifications beyond reasonable limits. These studies provided valuable information on the limits to which the weight vector could be used and still provide useful information. This was done by asking the weight vectors to classify test patterns significantly different from any of those in the training set. It is difficult to imagine anything but random results from these weight vectors when used

to classify molecular types other than the nine contained in the training set. Table III shows the results of applying the four weight vectors of Table II to nine test cases. Incorrect responses (according to the training definition) are in parentheses. The most important point to note is that none of the test cases is in the training set. The first four test cases represent groups that are structurally the same as those contained in the training set. Correct responses are mandatory for


Table III: Application of Trained Weight Vectors to Unknown Test Cases

    Test pattern
    (1) CH3CH2Br
    (2) CH3CH2CH2NO2
    (3) (CH3)2CHCN
    (4) CH3CH2CH