discussed this in detail. Effects of desorbing the gases from the sample cylinder walls have been neglected. However, the technique described above is also applicable to very low-boiling gases such as methane if they occur in the sample at concentrations of ~20 ppm or greater. Other freeze-out techniques generally cannot do this, since an analysis is performed only after the impurities have been concentrated. There is also a transition region of compounds: if, say, 50 ppb of a compound is initially present but its vapor pressure at 80°K is high enough for a significant fraction of it to be pumped away, that impurity in turn becomes undetectable. The data and analyses presented in this paper were done manually. However, there is no reason a computer could not also perform these tasks. It would be relatively easy to design a program to group the amu's whose intensities increase and decrease simultaneously. Computer identification of these groups could then follow a program such as that described by Crawford and Morrison (9).

ACKNOWLEDGMENT

The author thanks J. E. Dennison of Western Electric Company for making the permeation tubes.

RECEIVED March 13, 1972. Accepted June 19, 1972.

(9) L. R. Crawford and J. D. Morrison, ANAL. CHEM., 43, 1790 (1971).

Application of a Complex-Valued Nonlinear Discriminant Function to Low-Resolution Mass Spectra

J. B. Justice, Jr., D. N. Anderson, T. L. Isenhour, and J. C. Marshall¹
Department of Chemistry, The University of North Carolina, Chapel Hill, N.C. 27514

PREVIOUS WORK has demonstrated the applicability of pattern recognition approaches to the analysis of chemical data (1-11). In that work, an iterative technique applied negative feedback to develop a linear weight vector, which was then used to predict molecular structure parameters from spectral data. However, a linear learning machine can converge only on linearly separable data, and chemical data are not always linearly related to the desired classifications. Nonlinear discriminant functions have been reported (12-14), but they usually require either excessive computation time or excessive storage when applied to high-dimensional data such as mass spectra. The complex-valued nonlinear discriminant function (CNDF) used in this work is based on a generalized Walsh transform, chosen for its proven ability in low-dimensional pattern recognition work (15). The nonlinear decision surface is easily modified as the data set is enlarged, and the number of computations increases only linearly with the size of the data set. Hence, very large data sets are no more difficult to train upon than small ones, apart from a proportionally longer run time. The CNDF is constructed in a single pass through the training set and requires storage of only one spectrum at a time.

¹ Present address, Department of Chemistry, Saint Olaf College, Northfield, Minn. 55057.

(1) P. C. Jurs, B. R. Kowalski, and T. L. Isenhour, ANAL. CHEM., 41, 21 (1969).
(2) P. C. Jurs, B. R. Kowalski, T. L. Isenhour, and C. N. Reilley, ibid., p 690.
(3) B. R. Kowalski, P. C. Jurs, T. L. Isenhour, and C. N. Reilley, ibid., p 695.
(4) P. C. Jurs, B. R. Kowalski, T. L. Isenhour, and C. N. Reilley, ibid., p 1945.
(5) Ibid., p 1949.
(6) L. E. Wangen and T. L. Isenhour, ibid., 42, 737 (1970).
(7) P. C. Jurs, B. R. Kowalski, T. L. Isenhour, and C. N. Reilley, ibid., p 1387.
(8) T. L. Isenhour and P. C. Jurs, ibid., 42 (10), 20A (1971).
(9) L. E. Wangen, N. M. Frew, T. L. Isenhour, and P. C. Jurs, Appl. Spectrosc., 25, 203 (1971).
(10) P. C. Jurs, ANAL. CHEM., 42, 1633 (1970).
(11) Ibid., 43, 22 (1971).
(12) G. S. Sebestyen, "Decision-Making Processes in Pattern Recognition," The Macmillan Co., New York, N.Y., 1962.
(13) N. J. Nilsson, "Learning Machines," McGraw-Hill Book Co., New York, N.Y., 1965.
(14) D. F. Specht, IEEE Trans. Electron. Comput., EC-16, 308 (1967).
(15) Y. Uesaka, IEEE Trans. Syst. Sci. Cybern., SMC-1, 194 (1971).

ANALYTICAL CHEMISTRY, VOL. 44, NO. 12, OCTOBER 1972, p 2087

DATA AND COMPUTATION

Computations were done on the Triangle Universities Computation Center (TUCC) IBM 370/165 using Fortran IV programs. The data set consisted of 630 low-resolution mass spectra taken from the American Petroleum Institute Research Project 44 tables. Compounds were in the range C1-10, O0-…; of these, 387 were CH compounds and 243 were CHON compounds. One hundred and nineteen dimensions (mass positions) of the mass spectra were used for training. Calculated weight vectors were stored on disk for later retrieval and use.

CONSTRUCTION OF THE DISCRIMINANT FUNCTION

The CNDF uses a generalized Walsh transform to construct the discriminant function. For a first-order generalized Walsh transform, in which the spectra are allowed 50 integer intensities ranging from 0 to 49, each dimension of the pattern (spectrum) is transformed by the relation

$$\varphi_i(x) = \ldots \qquad (1)$$

where I is the intensity at each mass position in the spectrum. Then Φ(x), the vector representing the transform of all intensities (dimensions) in the mass spectrum, is

$$\Phi(x) = \left(\varphi_1(x), \varphi_2(x), \ldots, \varphi_n(x)\right)^{\prime} \qquad (2)$$
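The right-hand side of Equation 1 is not recoverable in this copy, so the minimal sketch below assumes, for illustration only, that the first-order generalized Walsh transform maps an integer intensity I (0 to 49) to the unit complex number exp(2πiI/50). `PHI_TABLE` and `transform_spectrum` are hypothetical names. The point it shows is that only 50 transformed values ever exist, so Φ(x) of Equation 2 reduces to one table lookup per mass position.

```python
import numpy as np

N_LEVELS = 50  # spectra are allowed integer intensities 0..49

# ASSUMPTION: phi(I) = exp(2*pi*i*I/50). The exact form of Equation 1 is not
# legible in this copy; this is an illustrative stand-in, not the paper's formula.
PHI_TABLE = np.exp(2j * np.pi * np.arange(N_LEVELS) / N_LEVELS)

def transform_spectrum(intensities):
    """Return Phi(x) (Equation 2) for one spectrum of integer intensities 0..49."""
    idx = np.asarray(intensities, dtype=int)
    if idx.min() < 0 or idx.max() >= N_LEVELS:
        raise ValueError("intensities must be integers in 0..49")
    # One lookup per dimension; no exponentials are evaluated per spectrum.
    return PHI_TABLE[idx]

phi = transform_spectrum([0, 49, 25, 12])
```

For a 119-dimension spectrum, `transform_spectrum` returns a 119-element complex vector whose components all have unit magnitude under this assumed transform.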


Table I. Predictive Ability of CNDF

| Question | Cutoff^a | Positive category | Negative category | % in larger category | Theta | % predicted, no normalization | % predicted, sum normalization |
|---|---|---|---|---|---|---|---|
| Oxygen | 1 | 456 | 174 | 72.4 | -0.622 | 87.6 | 88.3 |
| | 2 | 544 | 86 | 86.4 | -0.326 | 90.0 | 90.0 |
| Carbonyl | 1 | 554 | 76 | 87.9 | -1.614 | 87.9 | 88.7 |
| Nitrogen | 1 | 549 | 81 | 87.1 | -1.111 | 88.3 | 90.3 |
| Amine | 1 | 572 | 58 | 90.8 | -1.422 | 91.4 | 92.9 |
| -C=C- | 1 | 389 | 241 | 61.8 | -0.429 | 80.0 | 82.5 |
| | 2 | 522 | 108 | 82.9 | -1.466 | 94.9 | 95.2 |
| | 3 | 555 | 75 | 88.1 | -2.014 | 98.3 | 98.3 |
| CnH2n | | 476 | 154 | 75.6 | -1.955 | 91.6 | 96.8 |
| CnH2n+2 | | 541 | 89 | 86.0 | -1.777 | 95.9 | 95.6 |
| Methyl | 1 | 87 | 543 | 82.8 | 1.022 | 87.5 | 88.7 |
| Ethyl | 1 | 341 | 287 | 54.5 | -0.148 | 71.4 | 77.1 |
| | 2 | 523 | 107 | 83.1 | -0.340 | 86.5 | 86.0 |
| Phenyl | 1 | 568 | 62 | 90.2 | -2.311 | 96.5 | 96.8 |
| Carbon | 5 | 105 | 525 | 83.4 | 0.458 | 91.6 | 92.1 |
| | 6 | 183 | 447 | 71.0 | 0.177 | 84.0 | 86.4 |
| | 7 | 279 | 351 | 55.7 | 0.088 | 77.3 | 85.2 |
| | 8 | 357 | 273 | 56.7 | 0.014 | 84.3 | 88.1 |
| | 9 | 446 | 184 | 70.8 | 0.192 | 86.7 | 85.4 |
| | 10 | 544 | 86 | 86.4 | 0.177 | 90.5 | 90.5 |
| Hydrogen | 9 | 135 | 495 | 78.6 | 0.311 | 85.6 | 87.8 |
| | 11 | 225 | 405 | 64.3 | 0.148 | 80.2 | 81.6 |
| | 13 | 317 | 313 | 50.3 | -0.888 | 78.9 | 77.8 |
| | 15 | 425 | 203 | 67.5 | -0.800 | 85.6 | 84.0 |
| | 17 | 501 | 129 | 79.5 | -0.666 | 87.8 | 87.0 |
| | 19 | 554 | 76 | 88.0 | -0.844 | 92.1 | 92.1 |

^a Positive category contains compounds whose number of functional groups is less than the cutoff.

A discriminant function using Φ(x) can be constructed having the form

$$F(x) = \theta + W^{*}\Phi(x) \qquad (3)$$

W is simply the vector sum of all Φ(x) of the training spectra, given by

$$W = W_A - W_B = \sum_{j=1}^{a} \Phi\left(x_j^{A}\right) - \sum_{k=1}^{b} \Phi\left(x_k^{B}\right) \qquad (4)$$

where a and b are the numbers of spectra in categories A and B, respectively, and W_A and W_B are the weight vector components resulting from vector summation of the transformed spectra of compounds in categories A and B, respectively. W* is the conjugate transpose of W, obtained by changing the sign of the imaginary part of each complex component. Since W is a vector, transposing it involves no actual operation but maintains uniformity with matrix notation. θ is a constant related to the relative sizes and variances of training set categories A and B. F(x) is then a complex number and is nonlinear with respect to the components of the mass spectrum. For prediction, compounds for which the real part of F(x) is positive are classified in one category, and those for which it is negative are put in the other.

Equation 3 shows that the discriminant function requires the calculation of Φ(x) and W*. Since the transformed spectral intensities can assume only the 50 values given by Equation 1, these values may be calculated once and stored for use in calculating the decision surface, rather than recalculated for each new spectrum, which greatly reduces computation time. W is calculated directly by Equation 4.

The CNDF for mass spectra interpretation is constructed by taking two classes of compounds and transforming the mass spectra of the compounds in each class according to Equation 1. Considering one of the training set classes to be positive and the other negative, the transformed spectra are summed algebraically to form a weight vector W, which may then be used with Equation 3 to predict the category of compounds not included in the training set. For example, if the negative class consists of compounds whose molecular formula is CnH2n and the positive class consists of all other compounds, a compound whose calculated F(x) has a negative real part is predicted to have the molecular formula CnH2n. In this manner, predictions were made on 630 compounds using the categories listed in Table I. The overall prediction percentage was determined by training on all 630 compounds and then, for each compound in turn, subtracting that compound's contribution from the weight vector, predicting its category, and adding its contribution back before moving to the next compound. All 630 compounds were predicted in this way, so the effective training set size was 629 compounds. Theta was determined by repeating the predictions with a range of increments about zero added to the product W*Φ(x); the per cent predicted correctly was plotted vs. the increment, and the maximum of the curve was taken as the optimum theta. Table I lists the optimum theta for each question.

RESULTS AND DISCUSSION
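The training and leave-one-out procedure described above can be sketched in NumPy as follows. This is an illustration under the same assumed form of Equation 1 (exp(2πiI/50)), not the authors' Fortran IV program, and every function name is hypothetical. Category A is labeled +1 and category B -1, so the signed sum gives W = W_A - W_B (Equation 4), and a compound is assigned to A when the real part of F(x) is positive.

```python
import numpy as np

N_LEVELS = 50
# ASSUMPTION: phi(I) = exp(2*pi*i*I/50); Equation 1 is not legible in this copy.
PHI_TABLE = np.exp(2j * np.pi * np.arange(N_LEVELS) / N_LEVELS)

def transform(spectra):
    """Phi(x) for each row of an integer intensity matrix (compounds x dimensions)."""
    return PHI_TABLE[np.asarray(spectra, dtype=int)]

def train_weight_vector(phis, labels):
    """Equation 4 in one pass: add Phi for label +1 (A), subtract it for -1 (B)."""
    return (np.asarray(labels)[:, None] * phis).sum(axis=0)

def discriminant(W, phi, theta=0.0):
    """Equation 3: F(x) = theta + W* Phi(x). np.vdot conjugates its first argument."""
    return theta + np.vdot(W, phi)

def leave_one_out_scores(phis, labels):
    """Re[W* Phi(x_k)] for every compound k, with k's own contribution removed."""
    labels = np.asarray(labels)
    W = train_weight_vector(phis, labels)
    scores = np.empty(len(labels))
    for k in range(len(labels)):
        # Subtracting into a new array leaves W intact, which is equivalent to
        # subtracting compound k and then adding it back in.
        W_k = W - labels[k] * phis[k]
        scores[k] = discriminant(W_k, phis[k]).real
    return scores

def best_theta(scores, labels, candidates):
    """Scan trial offsets added to Re[W* Phi(x)]; keep the one with best accuracy."""
    labels = np.asarray(labels)
    accuracy = [np.mean(np.where(scores + t > 0, 1, -1) == labels) for t in candidates]
    i = int(np.argmax(accuracy))
    return candidates[i], accuracy[i]
```

Removing `labels[k] * phis[k]` from W scores compound k as if the CNDF had been trained on the other 629 compounds, without re-summing the training set. The same property lets W be updated when spectra are added to or dropped from the data set.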

Table I shows the results of predictions based on the transformation of the original spectra and on sum-normalized transformed spectra. The sum normalization consisted of setting the sum of peak intensities for each compound equal to 100 and recalculating each peak accordingly; any intensity greater than 49 was set equal to 49. The improvement in prediction resulting from setting the sum of intensities equal to 100 is interpreted as being due to the influence of individual compounds on the weight vector.

Table I also includes the per cent of total compounds in the larger category for each question. Since one could predict at a level equal to this percentage simply by always predicting that a compound belongs in the larger category, this value is a guide to whether the discriminant function has learned anything about the categories in question. The greater the difference between the prediction percentage and the per cent in the larger category, the better the discriminant function differentiates the two classes of compounds. Thus, while the CNDF was not able to improve much on prediction of the carbonyl functional group (87.9 vs. 88.7%), it was significantly better on questions such as the detection and number of double bonds.

The per cent predicted correctly for a given question can be used as an indication of the confidence to be placed in an individual prediction. For example, a prediction on the presence of the phenyl group (96.8%) is more likely to be right than a prediction on the presence of nitrogen (90.3%). This can be refined further. Figure 1 plots the distributions of two classes, CnH2n compounds vs. non-CnH2n compounds, as a function of F(x). One would have the least confidence in a prediction based on a value falling very close to the decision surface; the greater the distance between F(x) and the decision surface, the more confidence one has in the prediction. This is particularly useful when making a variety of predictions on individual compounds. Conflicting predictions can be resolved by using the prediction that has the greatest probability of being correct, i.e., the prediction having the greatest distance between F(x) and the decision surface. As the distribution shows, this does not guarantee a correct decision.

[Figure 1: number of compounds vs. F(x), distance from decision surface]
Figure 1. Distribution of 630 compounds as a function of distance from decision surface. Decision surface constructed for distinguishing CnH2n compounds from non-CnH2n compounds

The CNDF was also tested on the real, imaginary, and phase parts of Fourier-transformed mass spectra. The results were comparable except for the phase part, which gave prediction percentages 1-8% lower on most questions. However, the phase prediction percentage was 92.2 for nitrogen and 95.1 for amines, a significant improvement. The improvement on the nitrogen questions seems understandable because of the "odd mass" effect of nitrogen, which causes peaks from fragments containing nitrogen to be out of phase with peaks of non-nitrogen compounds.

By using the CNDF, the size of the training set is limited only by the size of the data set. Also, the CNDF can be revised easily, without referring back to previously included spectra, as new compounds are added to the data set or old ones are eliminated. These features, plus the ability to separate linearly inseparable data, are sufficient reasons to warrant further investigation of the CNDF.

RECEIVED for review February 17, 1972. Accepted May 15, 1972. The financial support of the National Science Foundation is gratefully acknowledged.
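Three further steps from the Results and Discussion can be sketched in the same style: sum normalization of a spectrum, the larger-category baseline used to judge whether the CNDF has learned anything, and resolving conflicting predictions by distance from the decision surface. The rounding rule in `sum_normalize` and the use of |Re F(x)| as the distance are assumptions, and all function names are hypothetical.

```python
import numpy as np

def sum_normalize(intensities, total=100.0, max_level=49):
    """Scale peaks so they sum to `total`, then set anything above 49 to 49.
    ASSUMPTION: values are rounded to the nearest integer so they can index a
    50-level transform table; the paper does not state the rounding rule."""
    x = np.asarray(intensities, dtype=float)
    s = x.sum()
    if s <= 0:
        raise ValueError("spectrum has no positive intensity")
    scaled = np.rint(x * total / s)
    return np.minimum(scaled, max_level).astype(int)

def larger_category_baseline(n_positive, n_negative):
    """Per cent correct obtainable by always predicting the larger category."""
    return 100.0 * max(n_positive, n_negative) / (n_positive + n_negative)

def resolve_conflict(predictions):
    """Given {question: F(x)} for one compound, return the question whose
    prediction lies farthest from its decision surface, taken here as |Re F(x)|."""
    return max(predictions, key=lambda q: abs(predictions[q].real))
```

For the carbonyl question in Table I, `larger_category_baseline(554, 76)` gives about 87.9%, the level that the sum-normalized CNDF (88.7%) barely exceeds.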

CORRECTION

Simple Equation for Linearization of Data in Differential Scanning Calorimetric Purity Determinations

In this paper by David L. Sondack [ANAL. CHEM., 44, 888 (1972)], an error appeared in Equation 4 as published. The correct equation is
