Multiclass linear classifier for spectral interpretation (pattern recognition)

In the gas-flow type cryostat, the pressure is atmospheric, and the temperature can be measured equivalently in the sample, at the sample holder, or in the chamber; it was controllable in all cases with a precision of ±1 °C. The spectra measured at the same temperature were reproducible within ±2% in intensity.

RESULTS AND DISCUSSION Figure 4 shows spectra taken while using the gas-flow cryostat. The increasingly good resolution is due to the cold effect (the spectroscopic interpretation will be published later). In Figure 5, the best result obtained with the heat-conducting cryostat can be seen. In this case, the copper transfer block and non-illuminated parts of the sample were kept at the experimental temperature of -190 °C. The measured spectrum thus corresponds to a spectrum at about -50 °C. The figures demonstrate clearly that the gas-flow technique is the more effective. With this technique, the temperature can be set in less than 10 seconds per °C. The conductive cryostat does not work satisfactorily if the contact between heat transfer block and sample is imperfect or if the sample material is of poor thermal conductivity. Generally neither of these conditions is fulfilled. It can be concluded that the spectroscopic data obtained at temperatures cryostated with heat-conducting devices are of questionable reliability. Reports in the literature of small temperature effects may therefore be attributable to poor construction of the cryostat rather than to temperature insensitivity of the material.

ACKNOWLEDGMENT The authors wish to thank J. Balla and I. Peter for helpful discussions and Mrs. B. Szabó and B. Bencze for the useful technical assistance.

Received for review February 1, 1973. Accepted August 17, 1973.

Multiclass Linear Classifier for Spectral Interpretation (Pattern Recognition)

C. F. Bender
Lawrence Livermore Laboratory, University of California, Livermore, Calif. 94550

B. R. Kowalski
Department of Chemistry, University of Washington, Seattle, Wash. 98195

Linear classifiers were originally designed to construct decision hyperplanes for binary (yes/no) decisions (1). A linear classifier was the first pattern recognition technique applied to the interpretation of spectroscopic data (2). With this technique, a number of binary decisions were used to predict structural information directly from low-resolution mass spectra. Although the problem is multiclass in nature, a complicated use of the linear classifier gave very encouraging results. Difficulties arise in trying to define multiclass linear machines (1); hence, a considerable effort has been spent on improving the linear, binary classifier (3). Recently we used a multiclass technique, the K-Nearest Neighbor Rule (4), for the interpretation of NMR spectra. For this particular application, the multiclass technique out-performed the technique using linear classifiers as outlined above. The K-Nearest Neighbor Rule is expensive to use and not suited to small laboratory computers. This problem has been partially solved by reducing the number of variables (5, 6), but the K-Nearest Neighbor Rule is still best suited to computers with a large amount of fast storage.

(1) N. J. Nilsson, "Learning Machines," McGraw-Hill, New York, N.Y., 1965.
(2) P. C. Jurs, B. R. Kowalski, and T. L. Isenhour, Anal. Chem., 41, 21 (1969).
(3) L. E. Wangen, N. M. Frew, and T. L. Isenhour, Anal. Chem., 43, 845 (1971).
(4) B. R. Kowalski and C. F. Bender, Anal. Chem., 44, 1405 (1972).
(5) C. F. Bender and B. R. Kowalski, Anal. Chem., 45, 590 (1973).
(6) C. F. Bender, H. D. Shepherd, and B. R. Kowalski, Anal. Chem., 45, 617 (1973).


The purpose of this note is to present an efficient multiclass linear classifier which can be easily applied, yet retains the power and simplicity of the binary classifier. This is accomplished by compromising linear separability for interclass separability.

DEFINITIONS AND METHOD A pattern space, $X$, is a collection of patterns, $X_p$,

$$X = \begin{pmatrix} X_{11} & X_{12} & \cdots & X_{1P} \\ X_{21} & X_{22} & \cdots & X_{2P} \\ \vdots & \vdots & & \vdots \\ X_{M1} & X_{M2} & \cdots & X_{MP} \end{pmatrix} \qquad (1)$$

where $X_{ip}$ is the $i$th variable of the $p$th pattern. In spectral analysis, each spectrum is a pattern and the variables are related to the spectral intensities and positions of the peaks. A class is a collection of patterns in which all members have a common feature. Pattern recognition methods are used to extract this feature from the variables. Binary decisions are made using a linear classifier by noting the value of the dot product of a weight vector, $W$, and a pattern $X_p$,

$$S_p = X_p^{+} W \qquad (2)$$

Here $+$ denotes matrix transpose. If $S_p$ is greater than some number, $S_0$, the decision is yes, and if $S_p$ is less

ANALYTICAL CHEMISTRY, VOL. 46, NO. 2, FEBRUARY 1974

than or equal to $S_0$, the “machine” response is no. Often the dimensionality of $W$ is increased by one and a row of 1’s is added to the pattern space, thereby allowing $S_0$ to be set to zero. Of the many possible methods for determining the weight vector (7), only two have appeared in the chemical literature: negative feedback (2) and least squares (8).

Except for one recent study (9), only limited success has been attained in extending the binary classifier to handle the multiclass case. The difficulty usually arises in requiring each class to be linearly separable from all other classes. Figure 1 shows an example in which this is possible (linear classifiers in two dimensions generate lines); line I separates region A from regions B and C, line II separates region B from region C, while line III separates region C from regions A and B. Figure 2 shows a case in which the regions are not linearly separable using the above definition; region B cannot be separated from regions A and C by using one straight line.

The technique presented in this paper does not require the notion of linear separability but does require interclass separability. Binary decisions are used to separate each class from each of the other classes. If there are $n$ classes, then $n(n - 1)/2$ weight vectors are calculated. Counting the positive votes yields $n - 1$ for the proper class and fewer for all others. This can be easily seen in the first example (Figure 1), by using line I as the A-C separator, line II as the A-B separator, and line III as the B-C separator. Notice that now the regions in the second example (Figure 2) can also be correctly classified using line I as the A-B separator, line II as the A-C separator, and line III as the B-C separator. Clearly, a pattern in class A would be classified “A” by separator I, “A” by separator II, and “B” by separator III. Hence, the overall classification would be correctly determined by majority vote as class A (3 - 1 = 2).
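As an illustration of how a single pairwise weight vector might be found, the following is a minimal Python sketch of an error-correction feedback loop. It is a simplified fixed-increment (perceptron-style) update, not necessarily the exact negative feedback procedure of reference (2); the function names, the +1/-1 label encoding, the toy patterns, and the `max_passes` limit are all assumptions made for this example. Each pattern is augmented with a constant 1 so that the threshold $S_0$ can be taken as zero, as described above.

```python
def decide(weights, pattern):
    """Binary decision: +1 ("yes") if the dot product exceeds zero, else -1 ("no")."""
    augmented = list(pattern) + [1.0]  # appended 1 lets the threshold be zero
    score = sum(w * x for w, x in zip(weights, augmented))
    return 1 if score > 0.0 else -1


def train_feedback(patterns, labels, max_passes=100):
    """Fixed-increment error-correction training (a simplified stand-in)."""
    weights = [0.0] * (len(patterns[0]) + 1)
    for _ in range(max_passes):
        errors = 0
        for pattern, label in zip(patterns, labels):
            if decide(weights, pattern) != label:
                augmented = list(pattern) + [1.0]
                # Move the weight vector toward (or away from) the missed pattern.
                weights = [w + label * x for w, x in zip(weights, augmented)]
                errors += 1
        if errors == 0:  # every training pattern is classified correctly
            break
    return weights


# Hypothetical two-variable patterns for one pair of classes.
toy_patterns = [(2.0, 1.0), (3.0, 2.0), (-1.0, -1.0), (-2.0, 0.5)]
toy_labels = [1, 1, -1, -1]
toy_weights = train_feedback(toy_patterns, toy_labels)
```

For data that are not separable the loop simply stops after `max_passes`, which is one reason a least squares solution may be preferred in practice.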
The linear separation problem is further complicated when attempts are made to interpret chemical information because the classes cannot be totally separated. In fact, the pattern space of hydrocarbons having increasing carbon number has a remarkable resemblance to the second example. Nonseparability does not create a great difficulty for the present method since a majority rule can be invoked. There is one drawback, however; such techniques can lead to “null” classifications (i.e., ties).
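The pairwise voting scheme and its "null" outcome can be sketched in a few lines of Python. This is only an illustrative sketch: the three weight vectors below are hypothetical values chosen by hand for a two-variable toy problem (they are not taken from the paper), and the names `PAIR_WEIGHTS`, `binary_decision`, and `classify` are invented for the example. Each pattern is augmented with a constant 1 so that the decision threshold is zero.

```python
from collections import Counter
from itertools import combinations

CLASSES = ("A", "B", "C")

# One weight vector per pair of classes: n(n - 1)/2 = 3 for n = 3.
# Hypothetical hand-picked values; a positive score votes for the first class.
PAIR_WEIGHTS = {
    ("A", "B"): (1.0, 0.0, 0.0),
    ("A", "C"): (-1.0, -1.0, 1.0),
    ("B", "C"): (0.0, 1.0, 0.0),
}


def binary_decision(weights, pattern):
    """Return True ("yes") when the dot product with the augmented pattern exceeds zero."""
    augmented = tuple(pattern) + (1.0,)
    return sum(w * x for w, x in zip(weights, augmented)) > 0.0


def classify(pattern):
    """Majority vote over all pairwise separators; None marks a null (tied) result."""
    votes = Counter()
    for first, second in combinations(CLASSES, 2):
        yes = binary_decision(PAIR_WEIGHTS[(first, second)], pattern)
        votes[first if yes else second] += 1
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the leading classes
    return ranked[0][0]
```

With these assumed weights, a pattern such as `(2.0, -3.0)` collects two votes for class A, whereas `(1.0, 1.0)` receives one vote for each class and is therefore returned as a null classification.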

EXPERIMENTAL The data used were the same as those in the optimum variable study (5) and consisted of low-resolution mass spectra for hydrocarbons containing six, seven, and eight carbon atoms. The “unmeasurable” common property to be extracted was the carbon number. For each class, 40 spectra were used in the training set and 10 for evaluation. There were a total of 120 spectra in the training set and 30 spectra in the evaluation set. As in similar studies, moments were used to represent each spectrum. For each nonzero intensity, the mass to charge ratio ($u_i$) and the square root of the intensity ($w_i$) were employed to calculate the ten moments used.

$$X = A \sum_i w_i u_i \qquad (3)$$

$$X = B \sum_i w_i v_i \qquad (5)$$

(7) G. Nagy, Proc. IEEE, 56, 836 (1968).
(8) B. R. Kowalski, P. C. Jurs, T. L. Isenhour, and C. N. Reilley, Anal. Chem., 41, 695 (1969).
(9) N. M. Frew, L. E. Wangen, and T. L. Isenhour, Pattern Recognition, 3, 281 (1971).

Figure 1. Linearly separable regions (A, B, C) in two dimensions

Figure 2. Regions (A, B, C) which are not linearly separable

where

The merits of such a representation have been discussed elsewhere (6). Three preprocessing techniques were applied to the moments data. By using more than one representation of the spectra, the effectiveness of the multiclass technique can be evaluated in greater depth. The first preprocessing technique was autoscaling (10); here each variable is scaled to have unit variance and the average value of the variable is shifted to zero. By multiplying each autoscaled variable by a “class-separating” weight (10), the second set of preprocessed data was generated. In this case the variances of more important (in a class-separating sense) variables are increased, while less important variances are decreased. Finally, a recently developed (5) linear transformation of the variables was used. The variables defined by this technique are optimum for separating classes and have proved to be most effective for the interpretation of mass spectral data.
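Autoscaling as described above (zero mean and unit variance for each variable) can be illustrated with a short Python sketch. Several details are assumptions made for the example rather than statements from the paper: patterns are stored as rows, the sample (n - 1) standard deviation is used, and the function name `autoscale` is invented. A variable with zero variance would cause a division by zero and would need to be dropped beforehand.

```python
from statistics import mean, stdev


def autoscale(patterns):
    """Shift each variable to zero mean and scale it to unit (sample) variance.

    `patterns` is a list of rows, one row per pattern, one column per variable.
    """
    columns = list(zip(*patterns))
    means = [mean(col) for col in columns]
    spreads = [stdev(col) for col in columns]  # assumes no constant variable
    return [
        [(value - m) / s for value, m, s in zip(row, means, spreads)]
        for row in patterns
    ]


# Hypothetical data: three patterns described by two variables on different scales.
raw = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
scaled = autoscale(raw)
```

The weighted autoscale variant would then multiply each scaled column by its class-separating weight; how those weights are obtained is given in reference (10) and is not reproduced here.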

RESULTS AND DISCUSSION Table I presents the results of applying three classification techniques to the three sets of preprocessed mass spectral data described in the last section. The first classification method was the linear classifier (I) where the weight vector was found by a least squares procedure (a) and also by the feedback procedure (b). The first number represents the classification performance (per cent correct) obtained for the training set and the second number is the evaluation set performance. Patterns in the evaluation set were not used to train the classifiers and are considered as true unknowns. The poor performance for the linear classifier is due to the nonlinear separability of the data used in this study. Also, since the least squares

(10) B. R. Kowalski and C. F. Bender, J. Amer. Chem. Soc., 94, 5832 (1972).

Table I. Per Cent Correct Classifications (Training Set/Evaluation Set)

                                          Preprocessing technique
Classification method          Autoscale   Weighted autoscale   Optimum linear transformation
I. Linear classifier
   a) Least-squares            83/83       83/83                83/83
   b) Negative feedback        56/83       77/70                74/87
II. 3-Nearest Neighbor         80/60       89/80                92/97
III. Multiclass classifier
   a) Least-squares            91/87       91/87                91/87
   b) Negative feedback        94/90       93/90                78/90

method is invariant to all linear transformations, the results are the same for the three preprocessed sets of mass spectral data. The second classification method used in this study was the K-Nearest Neighbor Classification Rule (4) with K equal to three. This method is a multiclass method that does not depend upon linear separability. Hence, classification performance is improved in the last two sets of preprocessed data. The attributes and limitations of this method can be found in the chemical literature (4). The results of the multiclass classifier (III) introduced in this paper are also found in Table I. Here again, the least squares procedure (a) and the error correction feedback procedure (b) were used to calculate the necessary weight vectors. The multiclass procedure performed very well. The overall performance indicates that the least squares procedure for calculating the weight vector is best. Again, note that least squares solutions are unique and are invariant to all linear transformations of the data. These attributes recommend the least squares multiclass procedure for applications which involve more than two classes. The method is at least as effective as other linear classifiers and comparable in accuracy to the more expensive K-Nearest Neighbor Rule.

Received for review February 26, 1973. Accepted August 27, 1973. Work performed under the auspices of the U. S. Atomic Energy Commission.

Identification of Heroin and Its Diluents by Chemical Ionization Mass Spectroscopy

Jew-Ming Chao, Richard Saferstein, and John Manura
New Jersey State Police, Forensic Science Bureau, West Trenton, N.J. 08625

Forensic laboratories currently use a variety of techniques to identify illicit seizures of heroin (diacetylmorphine). These methods include color and microcrystal tests, absorption spectrophotometry, thin-layer and gas chromatography (1), as well as electron impact (EI) mass spectroscopy (2). However, none of the above techniques by itself combines the speed, accuracy, and sensitivity necessary for an identification of heroin and its organic diluents. The possible presence of numerous organic components in an illicit heroin mixture will almost always preclude the examination of the powder directly in the EI mass spectrometer and therefore necessitates interfacing the mass spectrometer to a gas chromatograph. Increasingly, forensic laboratories are being required to identify all the components of an illicit drug mixture. This analysis may provide investigating authorities with valuable intelligence information regarding the illicit material's synthesis and origin. The application of chemical ionization (CI) mass spectroscopy to drug identification has recently been reported (3-7). This technique has now been utilized as a rapid

(1) E. G. C. Clarke, "Isolation and Identification of Drugs," Pharmaceutical Press, London, 1969.
(2) G. R. Nakamura, T. T. Noguchi, D. Jackson, and D. Banks, Anal. Chem., 44, 408 (1972).
(3) G. W. A. Milne, H. M. Fales, and T. Axenrod, Anal. Chem., 43, 1815 (1971).
(4) H. M. Fales, G. W. A. Milne, and T. Axenrod, Anal. Chem., 42, 1432 (1970).
(5) D. F. Hunt and J. F. Ryan, Anal. Chem., 44, 1306 (1972).
(6) R. L. Foltz, M. W. Couch, M. Geer, K. N. Scott, and C. M. Williams, Biochem. Med., 6, 294 (1972).
(7) R. Saferstein and J. Chao, J. Ass. Offic. Anal. Chem., 56, 1234 (1973).


and sensitive means of identifying heroin and its common diluents. The procedure requires no sample preparation or prior chromatographic treatment, and its sensitivity permits a direct and rapid identification of microgram quantities of illicit heroin preparations.

EXPERIMENTAL Apparatus. A Du Pont 21-490 single focusing mass spectrophotometer equipped with a dual EI/CI source was used. The instrument has a resolution of 600 with 10% valley, a 90° magnetic sector, and is equipped with differential pumping. The reagent gas was isobutane (99.9% pure). The source was operated at a pressure of 0.5-1 Torr and at a temperature of 200 ± 10 °C. The ionizing voltage was set at 300 eV in the CI mode.

Procedure. Approximately a microgram of the illicit powder was added to a capillary tube. The tube was introduced by the direct probe of the mass spectrometer and the probe temperature was raised to 200 °C. Scans were taken at a rate of 10 sec/decade after 1 and 2 minutes.

RESULTS AND DISCUSSION The application of CI mass spectroscopy to forensic identification lies in the ability of the operator to control the complexity of the spectra that are generated through the choice of the CI reagent gas. The ionization process can occur through charge transfer or proton transfer, depending on the nature of the reagent gas. The former results in spectra resembling those of conventional EI spectroscopy; the latter produces spectra that are generally less complex. As the present study has as its objective the identification of heroin in the presence of its diluents, isobutane was the reagent gas of choice. This gas has previously been demonstrated as having yielded the least
