pattern recognition

the reaction mechanisms of solid phase pyrolysis are poorly understood, even for comparatively simple substances (15), the net effect of a low pressure environment on the course and reproducibility of the pyrolysis process is hard to predict. Nevertheless, any effect that would be noticeable at all is probably beneficial, a speculation borne out by the high degree of reproducibility encountered in this experiment. All pyrolysis products that are volatile enough to pass through the expansion chamber, and that are subsequently ionized in the ion source, will contribute to the final mass spectrum at the appropriate m/e values. With Py-GLC, this situation is different because of the chemical selection imposed by the column, which may exclude specific chemical classes (e.g., free acids or free amines) from contributing to the obtained fingerprint. In a separate experiment, the existence of such classes of bacterial pyrolysis products was established by high resolution field ionization mass spectrometry in cooperation with a group at the University of Bonn, Germany (16). The advantage of high analysis speed needs little comment. Thirty seconds after initiation of the pyrolysis reaction, the complete spectrum is recorded in digital form in the signal averager memory. If provisions are made for fast computer readout of this memory, and the sample introduction system is modified so as to permit the introduction of several samples at once, the effective analysis time should be less than 1 minute. This is about 30-60 times as fast as can be accomplished by Py-GLC, taking into account that the use of high resolution columns and wide range temperature programming will generally be necessary for complex biological samples.
Because of the stable and linear mass scale of the quadrupole spectrometer, which is not plagued by problems analogous to column deterioration in Py-GLC or different performance of replacement columns, the ease of data processing is striking in comparison with Py-GLC. Moreover, the subject of computer coding, filing, retrieving, and matching of mass spectra has received widespread attention in the literature during the past few years (17), and Py-MS may certainly benefit from the activities in this field. If Py-MS thus appears to be a promising method for fingerprinting of complex biological samples, it seems nevertheless regrettable that virtually all information on the chemical composition of the sample, in principle obtainable from the pyrolysis process (18), is lost in the electron impact ionization and fragmentation processes occurring in the ion source of the quadrupole. The use of ionization techniques causing negligible fragmentation, such as field ionization (19), chemical ionization (20), or low-voltage electron-impact ionization (21), seems to be the obvious experimental approach to this problem. Apart from the earlier mentioned experiments with high resolution field ionization mass spectrometry (16), we are at present engaged in low-voltage electron-impact ionization studies, which also encompass higher mass ranges than the study reported here (22).

ACKNOWLEDGMENT

The authors thank J. Kistemaker and A. J. H. Boerboom for their invaluable support and advice. They also gratefully acknowledge the expert assistance of M. Hoogervorst, W. J. Barsingerhorn, and R. Heubers with the design and construction of the pyrolysis mass spectrometry system.

RECEIVED for review July 17, 1972. Accepted October 16, 1972. This research is part of a project sponsored by the Organization for Fundamental Research on Matter (FOM) and the Dutch Ministry of Health.

(15) J. Q. Walker and C. J. Wolf, J. Chromatogr. Sci., 8, 513 (1970).
(16) H. R. Schulten, H. D. Beckey, H. L. C. Meuzelaar, and A. J. H. Boerboom, ANAL. CHEM., 45, 191 (1973).
(17) H. S. Hertz, R. A. Hites, and K. Biemann, ANAL. CHEM., 43, 681 (1971).
(18) W. Simon, P. Kriemler, J. A. Vollmin, and H. Steiner, J. Gas Chromatogr., 5, 53 (1967).
(19) H. D. Beckey, “Field Ionization Mass Spectrometry,” Pergamon Press, Oxford, and Akademie Verlag, Berlin, 1971.
(20) M. S. B. Munson, ANAL. CHEM., 43 (13), 28A (1971).
(21) F. H. Field and H. S. Hastings, ibid., 28, 1248 (1956).
(22) H. L. C. Meuzelaar, M. A. Posthumus, P. G. Kistemaker, and J. Kistemaker, ANAL. CHEM., in press.

Construction of Optimum Variables for Spectral Interpretation (Pattern Recognition)

C. F. Bender and B. R. Kowalski¹
Lawrence Livermore Laboratory, Livermore, Calif. 94550

PATTERN RECOGNITION techniques have recently been used to extract chemical information from spectroscopic data (1). Basically the problem is the following: Using spectra where the sought-for property is known, construct a rule for “recognizing” the property (or classification). The adopted rule can then be applied for predictive purposes on unknown spectra. Although pattern recognition techniques have proved to be quite effective for this type of problem, a few difficulties still remain. The most pressing problem is related to the fact that for most classification problems, standard representation of the spectral information does not lead to linearly separable subspaces. This can cause erroneous predictions from linear learning machines (2). In some cases, clusters do not even exist and hence a nearest neighbor technique (3) will also yield misleading classifications. This problem can be minimized by improving the classification rule or changing the representation of the information.

¹ Present address, Department of Chemistry, Colorado State University, Fort Collins, Colo.

(1) T. L. Isenhour and P. C. Jurs, ANAL. CHEM., 43 (10), 20A (1971); also see references within.
(2) N. J. Nilsson, “Learning Machines,” McGraw-Hill, New York, N.Y., 1965.
(3) B. R. Kowalski and C. F. Bender, ANAL. CHEM., 44, 1405 (1972).

ANALYTICAL CHEMISTRY, VOL. 45, NO. 3, MARCH 1973

Studies in both areas have appeared in the chemical literature. Wangen et al. (4) presented an improved linear classifier which utilizes a no-classification region. Classifications were greatly enhanced for interpretation of mass spectral data. Jurs (5) and Wangen et al. (6) have recently applied the Fourier transform to mass spectral data; the results of both studies were encouraging. Our contraction of mass spectra to ten variables (7) also showed improved classification performance. The latter three studies were concerned with changes of variables independent of the classification information. The purpose of this note is to introduce a technique that defines optimum variables for a particular type of classification. In the mass spectrum of a compound, there are certain measurements which can be used to define the spectrum. Often the intensities at each m/e value are used as the measurements. These measurements constitute the variables of the problem. A compact representation for the list of variables associated with the pth spectrum, $X_p$, is called a pattern, and is written as follows:

$$X_p = \begin{pmatrix} x_{1p} \\ x_{2p} \\ \vdots \end{pmatrix} \qquad (1)$$

Here $x_{ip}$ is the ith variable of the pth mass spectrum. The complete pattern space, $X$, is defined as

$$X = (X_1, X_2, \ldots)$$

Figure 1. Two-dimensional representation of original hydrocarbon data (separate plot symbols mark the C6, C7, and C8 classes)

j ) , the class separators are calculated as follows. A sub-pattern space, Y, is defined which includes only patterns in class i or class j . An outcome column matrix G i , is then defined for

v

T

a

T

T

vv r,

a

v T

Figure 2. Two-dimensional representation of transformed hydrocarbon data (C6-., C7-@, c8-A) each pattern in Y, i.e., Gij

=

-1 for Y, in class i +1 for

(4)

Y, in classj

The class-separating variable is defined as

$$X_p' = X_p^{T} W \qquad (5)$$

where p now includes all patterns in $X$. The transformation, $W$, is given by

$$W = (YY^{T})^{-1} Y G_{ij} \qquad (6)$$

where $-1$ denotes the inverse of the matrix $(YY^{T})$. This is the least squares solution for the overdetermined set of linear equations

$$Y^{T} W = G_{ij} \qquad (7)$$

The above process is repeated for all pairs. For the n-class problem, n(n - 1)/2 alternative variables will be defined. Finally, to weight each new variable equally, the new pattern space is autoscaled (10).

Table I. Classification Performance

                        10 dimensions    3 dimensions
1-Nearest Neighbor
  Training set              102/120         113/120
  Evaluation set             23/30           28/30
3-Nearest Neighbor
  Training set               96/120         110/120
  Evaluation set             18/30           29/30

(4) L. E. Wangen, N. W. Frew, and T. L. Isenhour, ANAL. CHEM., 43, 845 (1971).
(5) P. C. Jurs, ibid., p 1812.
(6) L. E. Wangen, N. W. Frew, T. L. Isenhour, and P. C. Jurs, Appl. Spectrosc., 25, 203 (1971).
(7) C. F. Bender and B. R. Kowalski, ANAL. CHEM., 45, in press.
(8) H. Margenau and G. M. Murphy, “The Mathematics of Physics and Chemistry,” D. Van Nostrand, New York, N.Y., 1964.
(9) G. S. Sebestyen, “Decision-Making Processes in Pattern Recognition,” Macmillan, New York, N.Y., 1962.

DATA

To test the technique, three classes of hydrocarbon spectra were used; the compounds within each class contained six, seven, and eight carbon atoms (denoted C6, C7, and C8 in the figures), respectively. The spectra were selected randomly from our computerized low-resolution mass spectra library. For each class, forty spectra were used in the training set and ten were used for the evaluation set. Rather than use the intensities for each m/e value as the variables, moments were used. The recognition accuracy of similar representations has been discussed elsewhere (7). For each non-zero intensity, the m/e ($v_i$) and the square root of the intensity ($W_i$) were used to calculate the moments. Ten variables were defined for each spectrum.

$$X_1 = A \sum_i W_i v_i \qquad (8)$$

$$X_k = A \sum_i W_i (v_i - X_1)^k, \quad k = 2, \ldots, 5 \qquad (9)$$

where $A$ is a normalizing factor.
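The moment variables described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the normalizing factor is taken to be A = 1/ΣW_i, which the text does not confirm, the function name `spectral_moments` and the toy peak list are hypothetical, and only the five moments X_1 to X_5 are computed (the note defines ten variables per spectrum, but the remaining five are not specified here).

```python
import math

def spectral_moments(mz, intensity, max_order=5):
    """Return [X_1, ..., X_max_order] for one spectrum.

    v_i is the m/e value and W_i the square root of the intensity;
    peaks with zero intensity are skipped, as in the text.
    """
    peaks = [(v, math.sqrt(i)) for v, i in zip(mz, intensity) if i > 0]
    a = 1.0 / sum(w for _, w in peaks)  # ASSUMPTION: A = 1 / sum of the weights W_i
    x1 = a * sum(w * v for v, w in peaks)  # weighted mean m/e
    higher = [a * sum(w * (v - x1) ** k for v, w in peaks)
              for k in range(2, max_order + 1)]  # moments about X_1
    return [x1] + higher

# Hypothetical peak list, not taken from the spectral library used in the note.
toy = spectral_moments([41, 43, 57, 71, 85], [40.0, 100.0, 80.0, 0.0, 10.0])
```

With this normalization X_1 is a weighted mean m/e, so it always lies within the mass range of the non-zero peaks, and the even-order moments are non-negative.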

This alternative representation slightly degrades the performance of supervised classification (no more than 5% loss), but since the purpose of this study was to develop a method for optimum use of variables, such a loss was considered tolerable. Figure 1 shows a two-dimensional representation of the ten-dimensional space, using the original data.

RESULTS AND CONCLUSIONS

The ten-dimensional space was reduced to three dimensions by the above-mentioned transformation. Figure 2 shows a two-dimensional representation of the new pattern space. Clearly some of the overlap between classes has been eliminated. Since the data do not appear to be linearly separable, the k-nearest neighbor classification (3) was used for evaluation of the transformation. Table I gives a comparison of the ten- and three-dimensional k-nearest neighbor classification. For this study, k = 1 and k = 3 classifications were made. The results are most encouraging for two reasons: first, the performance of the classifier was greatly enhanced (at least 9% higher) and, second, a great reduction in the number of variables has been attained. The latter feature may have many implications for designing spectral information retrieval systems. Although the example given here concerns mass spectra, the same technique has proved useful in numerous other pattern recognition applications in our Laboratory.
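The whole procedure (one least-squares class separator per pair of classes, autoscaling, then k-nearest neighbor evaluation) can be sketched as follows. This is a minimal illustration, not the authors' program. The toy two-variable data, all function names, and the small linear solver are hypothetical. Autoscaling is assumed to mean zero mean and unit standard deviation per variable, since the note only cites (10) for it. No intercept term is fitted, because the expression for W given in the note contains none.

```python
import math
from collections import Counter
from itertools import combinations

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def pair_weights(patterns, labels, ci, cj):
    """W = (Y Y^T)^-1 Y G for one class pair, with G = -1 (ci) or +1 (cj)."""
    sub = [(x, -1.0 if lab == ci else 1.0)
           for x, lab in zip(patterns, labels) if lab in (ci, cj)]
    n = len(patterns[0])
    yyt = [[sum(x[r] * x[c] for x, _ in sub) for c in range(n)] for r in range(n)]
    yg = [sum(x[r] * g for x, g in sub) for r in range(n)]
    return solve(yyt, yg)

def project(patterns, weights):
    """One new variable X_p^T W per class pair, for every pattern."""
    return [[sum(xi * wi for xi, wi in zip(x, w)) for w in weights]
            for x in patterns]

def scale_params(rows):
    """Column means and sample standard deviations (ASSUMED autoscaling)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - mu) ** 2 for v in c) / (len(c) - 1))
           for c, mu in zip(cols, means)]
    return means, sds

def autoscale(rows, means, sds):
    return [[(v - mu) / s for v, mu, s in zip(r, means, sds)] for r in rows]

def knn_predict(train, train_labels, x, k):
    """Majority vote among the k training patterns nearest to x (Euclidean)."""
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], x))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data with two variables; not the hydrocarbon spectra.
train = [[1.0, 0.1], [1.2, 0.0], [0.9, -0.1], [1.1, 0.2],
         [0.0, 1.0], [0.1, 1.2], [-0.1, 0.9], [0.2, 1.1],
         [-1.0, -1.0], [-1.2, -0.9], [-0.9, -1.1], [-1.1, -1.2]]
train_labels = ["C6"] * 4 + ["C7"] * 4 + ["C8"] * 4
evaluation = [[1.0, 0.05], [0.05, 1.0], [-1.0, -1.05]]

classes = sorted(set(train_labels))
weights = [pair_weights(train, train_labels, ci, cj)
           for ci, cj in combinations(classes, 2)]  # n(n - 1)/2 separators
means, sds = scale_params(project(train, weights))
new_train = autoscale(project(train, weights), means, sds)
new_eval = autoscale(project(evaluation, weights), means, sds)
predicted = [knn_predict(new_train, train_labels, x, k=3) for x in new_eval]
```

The scaling parameters are estimated from the training set and reused for the evaluation patterns, so that both sets are expressed in the same coordinates before distances are computed. That choice is a design decision of this sketch, not something stated in the note.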


(10) B. R. Kowalski and C. F. Bender, J. Amer. Chem. SOC.,94, 5632 (1972).

RECEIVED for review June 12, 1972. Accepted November 10, 1972. Work performed under the auspices of the U.S. Atomic Energy Commission.

Microdetermination of Volatile Organics by Galvanic Coulometry

Alphonso Anusiem¹ and Paul A. Hersch
Laboratory for Biophysical Chemistry, Department of Chemistry, University of Minnesota, Minneapolis, Minn. 55455

MOST ORGANIC GASES and vapors carried by an inert gas stream can be determined by adding a constant proportion of oxygen to the stream, passing the stream through a hot tube for complete combustion, and determining the oxygen left over in the effluent. The galvanic-coulometric monitor for traces of oxygen described by one of us (1, 2) can be put to use advantageously in this context. This contribution shall show how the principle can be extended to hydrocarbons and other slightly soluble species in an aqueous sample.

¹ Present address, Chemistry Department, University of Ibadan, Nigeria.

(1) P. A. Hersch in “Advances in Analytical Chemistry and Instrumentation,” C. N. Reilley, Ed., Vol. III, Interscience, New York, N.Y., 1964, p 183.
(2) P. A. Hersch in “Lectures on Gas Chromatography 1966,” L. R. Mattick and H. A. Szymanski, Ed., Plenum Press, New York, N.Y., 1967, p 149.

The gas train we used comprised: (1) a cylinder of nitrogen, not necessarily highest purity grade; (2) a strong capillary flow restrictor as a flow stabilizer; (3) a gas wash bottle with water for humidification; (4) an electrolytic source for oxygen; (5) a bubbler with a septum to receive sample from a syringe; (6) a combustion chamber housing a heatable refractory; (7) a galvanic-coulometric sensor for oxygen; and (8) a flowmeter (see Figure 1). The electrolyte in (4) is aqueous KOH (e.g., 5 N); the anode is a thin wire of nickel held vertically and barely touching the surface of the electrolyte; the cathode is of cadmium-impregnated porous nickel carried by nickel screen, as used in alkaline storage batteries. The electrolyte is powered by