
Classification of Mass Spectra Using Adaptive Digital Learning Networks

T. J. Stonham and I. Aleksander¹
The Electronics Laboratories, The University, Canterbury, Kent, England

M. Camp, W. T. Pike,² and M. A. Shaw
Unilever Research, Port Sunlight, Wirral, Cheshire, England

¹ Present address: Department of Electrical Engineering, Brunel University, Kingston Lane, Uxbridge, Middlesex, England.
² Present address: Proprietary Perfumes Ltd., Kennington Road, Ashford, Kent, England.

Digital learning networks of adaptive logic elements have been applied to the problem of automatic routine identification of mass spectral data according to the functional groups present. The technique, which is an embodiment of the n-tuple method of pattern recognition, is not limited solely to the classification of linearly separable data, and offers a saving in computer time and storage requirements over the discriminant analysis approach. The structure and mode of operation of the learning nets is discussed, and results are given for three classification experiments. Finally, the separabilities of the 28 groups employed in the multicategory classification are considered, thereby enabling a comparison to be made between the digital learning net approach and the spectroscopist's interpretation.

The interpretation of scientific data and, in particular, chemical data has traditionally been based on theoretical analysis, and involves the detection of explicit relationships developed from previous experimentation and logically constructed models based on one's knowledge of chemistry. The advent of the computer made possible the development of large libraries of classified data, such that interpretation of "unknown" data is reduced to a library search, and successful recognition requires that the data are unique and have previously been classified. In recent years, however, much attention has been paid to empirical methods of data interpretation, where it is assumed that a relationship exists between the data and their defined classification, and pattern recognition techniques have been applied to the interpretation of these data. Much work has been carried out on applying learning machine methods, as described by Nilsson (1), to spectral identification, particularly to the identification of mass spectra (2, 3). A limitation of the learning machine approach is that the data must be linearly separable, and it has been shown (4) that spectral data exhibit linear inseparability to a considerable extent. A pattern classifier using piecewise-linear discriminant functions, which gives improved performance with mass spectra, has been described (4). However, this is achieved only at the expense of computing time, as the complexity of the training and testing procedures is increased.

This paper reports on the application of digital learning networks (5) (DLNs) to the interpretation of mass spectral data. This approach is less dependent on linear separability of the data than other methods of classification, since the functions a DLN can perform depend on the network topology, which can easily be modified or changed. Although a DLN in a particular configuration cannot perform all possible linearly inseparable functions, interleaved areas of recognition in pattern space are permitted. The DLN approach can be implemented in hardware, using commercially available memory elements, or simulated on a digital computer. Despite having to carry out a parallel processing operation in a serial manner, the computer simulation still requires less computer time and storage than other learning methods based on discriminant analysis.

Figure 1. Correspondence of principal terminals of SLAM and RAM elements (the terminals of the RAM are shown in parentheses)

DIGITAL LEARNING NETWORKS
Based on the Bledsoe and Browning (6) n-tuple method of pattern recognition, the digital learning network approach involves the computation of joint probabilities of occurrence of randomly chosen subsets of binary pattern elements derived from the mass spectra (7). The inherent generalization ability of the technique enables both recognition and prediction to be carried out. DLNs were first introduced in 1968 (8) and comprise an interconnection of adaptive logic units. An adaptive logic unit is essentially a memory element and, at the outset of this work, it was taken for granted that memory elements would become available in integrated-circuit form. Initially, no such device was available, and a unit called the SLAM (Stored Logic Adaptive Microcircuit) was developed (9). However, this device has now been superseded by the commercially available Random Access Memory (RAM), which can now be regarded as the basic element of a DLN.

The principal terminals of a memory element (Figure 1) are the inputs (addresses in a RAM) and the output (data-out terminal), and the latter performs a function of the former. In the case of a 4-input memory element, the device performs the following Boolean function:

$$f = \bar{x}_1\bar{x}_2\bar{x}_3\bar{x}_4\,\phi_0 + \bar{x}_1\bar{x}_2\bar{x}_3 x_4\,\phi_1 + \dots + x_1 x_2 x_3 x_4\,\phi_{15}$$

where $x_1, x_2, x_3$, and $x_4$ are the binary inputs to the element; $\phi = (\phi_0, \phi_1, \ldots, \phi_{15})$ is a binary vector representing the store contents of the memory element; $+$ is the logic operation OR; and $\bar{x}$ (NOT $x$) is the INVERT operation, requiring $x = 0$. The inputs to the memory elements must be binary; therefore, in any pattern recognition application, the data (in this case the mass spectra) must undergo some preprocessing to produce binary patterns of fixed format.

In practice, the memory elements are used in networks. Figure 2 shows a single-layer network of 128 elements. The system is "trained" by the application of a pattern sample to the inputs of each memory element and the transmission of the desired output to the teach (data-in) terminal. A signal at the teach clock (read/write) terminal enables the teach information to be stored in the memory element. When the response of an element is sought, the teach clock is not operated, and an input address accesses the appropriate store location, giving a data output.

Figure 2. A single-layer digital learning network

The value of n (determining the n-tuple sampling of the patterns) used in these investigations was 4, i.e., the pattern samples are 4-tuples. In computer simulations of DLNs, n can be varied (an advantage over a hardware classifier, where n must be fixed); RAMs with up to 12-bit addresses are currently available. Apart from the physical problem of increasing storage with n (the storage capacity of each memory element is 2^n bits), the generalization ability of the net and the training requirements are also related to the pattern sample size. For very small values of n, there is a tendency to overgeneralize, although the network performs well with small training data sets; for large values of n, less generalization occurs, but the training sets must be representative of their class to a greater extent. Investigations into the effect of the sample size on the recognition performance have shown that n has an optimum value (10). While it is not claimed that n = 4 is optimal for mass spectral recognition, this value gives satisfactory performance. If n is taken to its limits, the n-tuple method becomes analogous to other pattern recognition techniques. In the case of n = 1, one is performing template matching, where the template is a superimposition of all the training patterns; for n equal to the total number of bits in the patterns, the method becomes a form of library search.

In the experiments to be described, the input patterns derived from mass spectra are 256 bits in size, and the connection data comprise a 2-to-1 mapping involving 512 connections to the input space. Initially, the mapping is random because, in the first instance, no preferential mapping is assumed. There are in excess of $10^{1000}$ different mappings and, therefore, exhaustive comparisons are impossible.
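For concreteness, the train and response operations described above can be simulated in a few lines. The following Python sketch is ours, not the authors' program: it assumes n = 4, 256-bit input patterns, and a random 2-to-1 connection mapping, with each memory element modeled as the set of addressed locations that have been taught a 1.

```python
import random

class DigitalLearningNet:
    """Sketch of a single-layer digital learning network (n-tuple method).
    Assumes 256-bit patterns, n = 4, and a random 2-to-1 mapping
    (512 connections feeding 128 four-input memory elements)."""

    def __init__(self, n_bits=256, n=4, seed=0):
        rng = random.Random(seed)
        connections = [rng.randrange(n_bits) for _ in range(2 * n_bits)]
        # group the 512 connections into 128 n-tuples (one per element)
        self.tuples = [connections[i:i + n] for i in range(0, len(connections), n)]
        # each element's store: the set of n-tuple addresses taught a 1
        self.stores = [set() for _ in self.tuples]

    def _addresses(self, pattern):
        for store, tup in zip(self.stores, self.tuples):
            yield store, tuple(pattern[i] for i in tup)

    def train(self, pattern):
        # teach phase: write a 1 at every location addressed by the pattern
        for store, address in self._addresses(pattern):
            store.add(address)

    def respond(self, pattern):
        # response phase: count elements whose addressed location holds a 1
        return sum(address in store for store, address in self._addresses(pattern))
```

Training is a single sequential pass over the training patterns, and the response is an integer between 0 and 128, which is what makes the majority decision discussed later cheap to compute.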

PREPROCESSING OF MASS SPECTRA
Since the data input to the learning networks must be binary, some preprocessing of the mass spectral data is necessary. A simple direct coding of mass spectra involves applying a threshold to the intensity at each integer mass value, with the pattern origin (defined as the top left-hand corner of a pattern) corresponding to the zero mass point. The pattern is then made up to a standard size (256 bits in this case) by inserting zeros after the intensity for the highest mass value if this is less than 256. The spectrum must be truncated if mass values greater than 256 are encountered. With this method of coding, emphasis will be placed on the characteristic fragment ions which are independent of the molecular weight of the compound (e.g., m/e 74 in the mass spectra of fatty acid methyl esters), since they remain at a fixed point on the net.

An alternative coding is to apply the intensity threshold in the same way but make the pattern origin the molecular weight of the compound. Thus, emphasis will be placed on the characteristic neutral losses (e.g., M - 31 in fatty acid methyl esters), since the appropriate fragment ions will now occur at the same point in the input pattern to the nets.

One drawback of a direct coding with intensity threshold is that the intensity information is considerably reduced. In order to give greater emphasis to the intensity data, the two following methods were employed. In the third method, the binary pattern was divided into sixteen 16-bit words, each word consisting of two parts giving information as to the amplitude and position of a peak within the spectrum. The most significant peaks can be selected from the spectrum (up to 16 peaks), and more intensity information can be incorporated into the patterns. The numerical information is binary Gray-coded; this has an advantage over the simple binary codes insofar as the Hamming distance between consecutive numbers is always 1, and, as the binary structures of the numbers are regarded by the networks as patterns, peaks of similar dimensions will give rise to similar patterns. (The Hamming distance is numerically equal to the number of differing bits in two binary patterns.)

In the fourth method of coding, the reduced mass spectrum (ion series) was used. This is a standard form of representation for spectral data when employing file or library search routines (11). In these calculations, m/e values less than 26 and m/e equal to 28, 32, 40, and 44 are excluded because, especially in GC-MS work, intensities at these mass numbers are very much affected by instrumental background. The lists start with the m/e = 29 series and end with the m/e = 42 series. The reduced spectra are calculated by performing the ion series summations using the equation

$$S_m = 100\,\frac{\sum_{n} I_{28+m+14n}}{\sum_{j} I_j}$$

where m = 1, ..., 14; n = 0, 1, 2, ..., covering the whole spectrum; $I_j$ is the relative intensity at mass j; and $S_m$ is the % contribution of ion series m to the total ion intensity. This coding is particularly appropriate to classes of compounds where CH₂ is the basic repeating unit. For example, series of compounds of the form R₁(CH₂)ₙR₂ give rise to similar reduced spectra. Gray-coding of the numerical data was again employed.
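To make the codings concrete, here is a hedged Python sketch. The helper names are ours; the spectrum is assumed to be a dict of integer mass to relative intensity (% of base peak), and the ion-series formula is the reconstruction given above.

```python
def direct_code(spectrum, threshold=1.0, size=256):
    # Method 1: threshold the intensity at each integer mass; origin at
    # m/e = 0, pad with zeros to 256 bits, truncate above m/e = 256
    bits = [0] * size
    for mass, intensity in spectrum.items():
        if mass < size and intensity > threshold:
            bits[mass] = 1
    return bits

def gray_code(value):
    # binary-reflected Gray code: consecutive integers differ in one bit
    return value ^ (value >> 1)

def ion_series(spectrum):
    # Method 4: percent contribution S_m of ion series m = 1..14,
    # excluding m/e < 26 and m/e = 28, 32, 40, 44 (instrument background)
    usable = {j: i for j, i in spectrum.items()
              if j >= 26 and j not in (28, 32, 40, 44)}
    total = sum(usable.values())
    return [100.0 * sum(i for j, i in usable.items()
                        if j >= 28 + m and (j - 28 - m) % 14 == 0) / total
            for m in range(1, 15)]
```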

NATURE OF THE CLASSIFICATION
In the DLN approach to data interpretation, it has been the aim to classify mass spectra according to the functional groups present, since this is often the first objective of the spectroscopist when faced with the plethora of data resulting from a GC-MS analysis.

Pattern classification with DLNs can be divided into two main operations: (i) the training phase and (ii) the response phase. In the training phase, reference is made to a known data set in order to set up the logic which will enable the classification to be effected in the response phase.

It has been stated that each memory element of a DLN can be taught to associate information (0 or 1) with the n-tuple of pattern it sees at its inputs. The teaching involves the storing of this information within the memory element at a location addressed by the n-tuple. Taking the learning net as a whole, the teach information input during training and, subsequently, output during the response phase can be regarded either as a pattern vector or interpreted numerically, according to the mode of training. In the former case, feedback can be incorporated into the system by use of a suitable mapping between the output and the input of the network. It was felt desirable, however, to maintain the learning system in as simple a form as possible while investigating the applicability of DLNs to the spectral problem. Thus, the following mode of operation was adopted.

A learning net is trained on a specific class of mass spectra by teaching each element of the network to output a 1 for all pattern n-tuples encountered in the training. (Initially, all stores are set to zero.) The connection mapping remains fixed throughout the classification; therefore, the locations on the binary patterns (derived from the mass spectra) sampled by each element do not change. In the case of n equal to 4, there are 16 possible 4-tuples which can be seen by each element (0000-1111). If a data set is being used in which the existence of some characteristic features is postulated, one would expect a limited set of n-tuples to occur at the input of each element during training (12). If there is no common characteristic within the data, there is equal probability of each possible n-tuple occurring, and all the stores will eventually be set to 1 if a sufficient number of training patterns are available.

In the response phase, the pattern to be classified is input to the learning net in an identical manner, and the stored data in the memory elements are now accessed. Therefore, if the n-tuples input to each element address locations which have previously been addressed at any time during the training phase, a 1 is output; otherwise the element output is 0. A measure of the response of the whole net to a pattern is obtained by an arithmetical summation of the outputs of all the elements in the net. The aim is to obtain a strong (i.e., numerically large) response from a trained network for patterns to be classified with the training set, to the exclusion of all other patterns.

The training of a digital learning net as outlined above is a straightforward process compared with the determination of an optimum hyperplane in the discriminant analysis approach (2), since the response of the net to previous training patterns is not affected as training progresses. It is simply required that each training pattern is processed sequentially. A digital learning net can be trained on any set of binary patterns. The subsequent response to input patterns is determined by the generalization properties of the net, based on the training set; therefore, meaningful results will ensue only if training data are used in which there is some common property or characteristic information. Thus, the method is based on the assumption that characteristic information of chemical class exists, perhaps somewhat redundantly, in a mass spectrum and is reflected throughout the whole spectrum.

DECISION CRITERION
In order to complete the classification, some decision operation must be applied to the response obtained at the output of each learning net. A threshold can be set whereby all patterns exceeding the threshold are classified with the training group, thereby effecting a dichotomy of the data. The optimum value of this threshold cannot be easily determined, as it varies with the input patterns, the input threshold, and the amount of training. The latter is directly related to the size of the training data set and the extent to which it is representative of its pattern class. A majority decision is, therefore, employed, whereby one DLN is required for each chemical category and the classification is made according to the net to which a spectral pattern gives the greatest response.

The memory elements of a digital learning network are universal logic units where any desired function of the inputs and output can be set up during training. If a threshold decision is employed at the output of the net, a linear decision in n-tuple pattern space is performed, where each dimension represents n dimensions in the input pattern space. In this case, the linearly inseparable functions which can be performed are determined by the n-tuple sampling of the original pattern. If, however, a majority decision is taken in conjunction with the response of other trained networks, the classification is then dependent on the frequency of occurrence of common n-tuples of pattern in the training and response data, and the pattern hyperspace is partitioned according to the proximity of points to specific areas determined by the training data. A decision of this kind is not limited to linear separability.

The ability of a DLN to perform linearly inseparable functions is illustrated by way of the following example, which also demonstrates the principal operations of a DLN. Consider the following Boolean function (a group of digitized mass spectra can be expressed as a Boolean function where each variable represents a mass value in the original spectrum):

$$f = \bar{x}y + xz$$

This is a linearly inseparable function which cannot be performed with a single discriminant surface. Two hyperplanes are required to partition pattern hyperspace, as shown in Figure 3.

Figure 3. Piecewise-linear partition of pattern space

To perform this function with a digital learning net, the values of the binary variables of the Boolean function are represented as two pattern classes, namely, those which make f = 0 and those which make f = 1 (Table I).

Table I. Pattern Classes for the Function f = x̄y + xz

f = 0: 000, 001, 100, 110
f = 1: 010, 011, 101, 111

Let the digital learning net comprise 3 two-input memory elements and the connection mapping be xy, yz, xz. On training the net to output 1's for the f = 1 class, the store contents would be:

Memory element   Mapping   Store contents
1                xy        0111
2                yz        0111
3                xz        1101

On testing, the response for the f = 0 class would be:

Pattern   Response
000       1
001       2
100       1
110       2

The training patterns (f = 1 class) give the maximum response of 3; therefore, an output threshold of 2 would enable the function to be performed. In this example, the n-tuple pattern space is linearly separable. However, a majority decision can be demonstrated by training a second network on the f = 0 class. The responses are summarized in Table II, and it can be seen that the majority decision produces the desired function.

Table II. Table of Responses for Function f = x̄y + xz Implemented with 2 DLNs Using a Majority Decision

Pattern   Response of          Response of          Majority
          f = 1 trained net    f = 0 trained net    decision f
000       1                    3                    0
001       2                    3                    0
010       3                    2                    1
011       3                    1                    1
100       1                    3                    0
101       3                    2                    1
110       2                    3                    0
111       3                    1                    1

In the above example, the desired performance can be achieved either by a majority decision or by applying an output threshold which, in this case, can be readily determined. In a practical situation, the complexity of the problem is increased by several orders of magnitude (the terms of the Boolean function representing the group of digitized mass spectra comprise 256 variables, as opposed to 3 in the example). Under these circumstances, one would always use a majority decision.
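The example is small enough to verify by machine. A minimal sketch (our code, following the example's mapping xy, yz, xz) trains one net per class and reproduces the majority decisions of Table II:

```python
def train(patterns, mapping):
    # one store of taught-1 addresses per 2-input memory element
    stores = [set() for _ in mapping]
    for p in patterns:
        for store, (i, j) in zip(stores, mapping):
            store.add((p[i], p[j]))
    return stores

def respond(stores, mapping, p):
    return sum((p[i], p[j]) in store for store, (i, j) in zip(stores, mapping))

mapping = [(0, 1), (1, 2), (0, 2)]                   # xy, yz, xz
f1 = [(0, 1, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]    # f = 1 class
f0 = [(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 1, 0)]    # f = 0 class
net1, net0 = train(f1, mapping), train(f0, mapping)
for p in f0 + f1:
    f = int(respond(net1, mapping, p) > respond(net0, mapping, p))
    print(p, f)   # prints 0 for the f = 0 class, 1 for the f = 1 class
```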

CLASSIFICATION OF MASS SPECTRAL DATA
The performance of a system of digital learning nets applied to the classification of mass spectral data was assessed in three experiments. The mass spectral data were taken from the Mass Spectrometry Data Centre (MSDC) library of standard spectra.

20 Ketone Data Set. Initially, a limited data set of 10 aromatic and 10 aliphatic ketones was used, and it was required to classify the data into aromatic and aliphatic groups. This experiment was a feasibility study to assess the applicability of the method. The learning system initially comprised 2 single-layer networks, each having 128 four-input memory elements. The spectra were preprocessed into 256-bit patterns, and the same random 2-to-1 connection mapping was used for each network. One net was trained on aliphatic ketones and the other on aromatic ketones, and a majority decision was made on the responses. The results for input thresholds of 1, 2, 3, and 4% of maximum intensity, with direct coding of the spectrum and pattern

origin at m/e = 0, are given in Figure 4.

Figure 4. Histograms of classification vs. training, using the "majority decision" criterion. Coding, full spectrum with threshold; origin at m/e = 0

The data set can be correctly partitioned without being trained on every pattern of each group. The system appears to be able to correctly classify (a) those patterns which it has seen during training (recognition) and (b) those patterns in the data set which were not used for training (prediction). One sees that, on training with the first four patterns of each group, all 20 spectra can be correctly classified, 12 of which have not been utilized during the training procedure.

Some experiments were carried out to gain some insight into the effects of system parameters on the performance of the system. Figure 5 summarizes the occurrence of classification errors, using the preprocessing techniques discussed in the section on digital learning networks, and the fall-off in classification with increasing input threshold is shown in Figure 6.

Figure 5. Histograms of numbers of errors vs. training, for various preprocessing techniques, using the "majority decision" criterion. (a) Full spectrum coding with threshold at 0 and origin at m/e = 0. (b) Full spectrum coding with threshold at 0 and origin at m/e = m.w. (c) Intensity and position of 12 most significant peaks, Gray-coded. (d) Ion spectrum, Gray-coded

Figure 6. Fall-off in classification with increasing threshold

The preliminary inferences of these experiments are that the direct coding of the spectra with pattern origin at m/e = 0 gives a better performance than the other methods examined, and that low input thresholds (approximately 1% of base peak) are more desirable for an initial classification. A 1% threshold was found to be optimal in other identification routines (13).

100 Ketone Data Set. The data set was expanded to 100 spectra (58 aromatic and 42 aliphatic ketones) and, under the above conditions, it was found that successful classification of 96 out of 100 could be obtained with 3 training patterns from each group (7), and 100% classification was achieved after training on 30 patterns. These results support the premise that spectra falling into well-defined chemical groups exhibit characteristic information which exists throughout the spectrum and can be detected by

Table III. The 28 Chemical Groups Which Constitute the Data Set

Index No.   Name of group                  No. of spectra
1           Methyl esters                  29
2           Methyl ketones                 11
3           Carboxylic acids               11
4           Ethyl esters                   12
5           Higher esters, n = 4, 5, 6     13
6           Normal alcohols                33
7           Aldehydes                      8
8           Higher ketones                 10
9           Secondary alcohols             29
10          Substituted alcohols           14
11          Diesters                       14
12          Substituted keto acids         8
13          1-Phenyl alkanes               10
14          Terpenes                       18
15          n-Phenyl alkanes (n > 1)       12
16          Aliphatic amines               22
17          Mercaptans                     13
18          Sulfides                       12
19          Straight-chain alkenes         14
20          Alkanes                        34
21          Nitriles                       7
22          Alkynes                        24
23          Substituted pyrazines          6
24          Substituted phenols            19
25          Furans                         8
26          Pyrroles                       9
27          Thiophenes                     27
28          Aromatic esters                13

random n-tuple sampling. In order to illustrate that the behavior of the networks is not due to a favorable connection mapping, four other random mappings were used, and the limits of the classification for the five connection maps are shown in Figure 7. It can be inferred that the results are representative of the performance in conjunction with any random mapping. However, the system can be improved by judicious modifications to the connection mappings, which reduce the redundant processing in the networks (14). The optimization reduces the random aspect of the connection maps.

Figure 7. Variation in classification of "Ketone 100" with differing random connection mappings. Summary of results using 5 different connection mappings. Threshold = 0. Coding, full spectrum with origin at m/e = 0

Multicategory Classification. The final data set comprised 440 spectra belonging to the 28 chemical groups listed in Table III. A parallel hardware system for this classification would require 28 DLNs. However, in the computer simulation, a serial model was used whereby a single net was trained on each group individually and the response of the whole data set was obtained and stored. The decision operation was then implemented on the stored response data. Classification results are given for 40 and 80% training (Tables IV and V). In the latter case, the optimum training set (12) for each group was employed to give training group responses of 128. This is the maximum response for the size of net used. The optimum training sequences ensure that the training data are fully representative of the chemical group defined by the available experimental data and allow a minimum response level to be specified for all members of the training group.

Table IV. Classification of 28 Groups Using Optimum Training Sequence to Give Training Group Response ≥ 120. Classification 409 out of 440

Group       No. of training   No. of correct    No. of
Index No.   patterns          classifications   errors
1           9                 27                2
2           5                 6                 5
3           9                 11                ..
4           9                 12                ..
5           9                 13                ..
6           7                 29                4
7           5                 8                 ..
8           6                 9                 1
9           6                 28                1
10          6                 11                3
11          11                14                ..
12          7                 7                 1
13          5                 10                ..
14          5                 18                ..
15          6                 12                ..
16          5                 22                ..
17          6                 12                1
18          7                 12                ..
19          4                 14                ..
20          6                 23                11
21          4                 6                 1
22          6                 24                ..
23          3                 6                 ..
24          7                 19                ..
25          5                 7                 1
26          5                 9                 ..
27          9                 27                ..
28          7                 13                ..

Table V. Classification of 28 Groups Using Optimum Training Sequence to Give Training Group Response of 128. Classification 438 out of 440

Group       No. of training   No. of correct    Errors (number;
Index No.   patterns          classifications   assigned group)
1           27                29                ..
2           10                11                ..
3           11                11                ..
4           12                12                ..
5           13                13                ..
6           25                31                2; 9
7           8                 8                 ..
8           10                10                ..
9           18                29                ..
10          13                14                ..
11          14                14                ..
12          8                 8                 ..
13          9                 10                ..
14          12                18                ..
15          11                12                ..
16          16                22                ..
17          12                13                ..
18          11                12                ..
19          12                14                ..
20          20                34                ..
21          7                 7                 ..
22          13                24                ..
23          5                 6                 ..
24          15                19                ..
25          …                 8                 ..
26          8                 9                 ..
27          20                27                ..
28          12                13                ..

It can be seen from Table V that the only confusion with 80% training arises between primary and secondary alcohols (groups 6 and 9), where 2 spectra give maximum response to both the group 6 and the group 9 trained nets. In order to reduce errors, any pattern which gives a maximum response to more than one net can remain unclassified.
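As an illustration of the decision operation, here is a short sketch of ours, reusing the DigitalLearningNet class sketched earlier; the tie rule implements the leave-unclassified option just described.

```python
def classify(nets, pattern):
    # nets: dict mapping group name -> trained DigitalLearningNet
    responses = {name: net.respond(pattern) for name, net in nets.items()}
    best = max(responses.values())
    winners = [name for name, r in responses.items() if r == best]
    # assign to the unique strongest net; leave ties unclassified
    return winners[0] if len(winners) == 1 else None
```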

However, in this classification, it was found that, by reducing the amount of training of group 9, the two spectra in question could be correctly classified, giving zero error over the whole data set.

SEPARABILITY OF CHEMICAL CLASSES BY DIGITAL LEARNING NET CLASSIFICATION OF MASS SPECTRA
It is obvious that the DLN approach to the identification of mass spectra is fundamentally different from the spectroscopist's interpretive methods. The former method has been employed to achieve a basic classification, while the latter is a highly developed technique enabling precise identifications to be made (15). A measure of the reliability of a classification with DLNs can be inferred from the difference between the mean response of the spectra to be classified with the training group and those of other groups. As each mean response has a distribution of individual responses associated with it, the probability of a misclassification decreases with increasing separation between different groups. Data have been compiled (16) on the separabilities of the 28 groups in the classification experiment summarized in Table V, where the range of responses encountered in the "no" classification (i.e., those patterns to be classified as not belonging to the training group) can vary from 20-128. The responses obtained from a trained net depend on the Hamming distance of the test patterns from the training set (17) and, for each net, the response groups can be placed in an order of similarity with respect to the training group. For purposes of comparison, the following mean response levels were chosen to assess group similarities: (i) response

greater than 120 indicates a high degree of similarity with the training group, and (ii) responses of 110-120 show a lesser degree of similarity. Groups having response means in the above ranges are listed in Table VI. Groups having a response mean of less than 110 are easily distinguished from the training group and are not considered here. The mean response is shown in parentheses. In the cases where the training group is specified as being well resolved, the response means of all the groups, excluding the training group, were less than 110. The training group response mean was always 128.

Table VI. Chemical Group Similarities Obtained by Classification of Mass Spectra by a System of DLNs

Training group: groups with similar mean responses
1. Methyl esters: Tertiary alcohols (114), Secondary alcohols (112), Higher ketones (111), Aliphatic amines (111)
2. Methyl ketones: Alkanes (120), Higher ketones (118), Secondary alcohols (116), Amines (116)
3. Carboxylic acids: Higher esters (118), Methyl esters (115)
4. Ethyl esters: Alkanes (120), Methyl esters (114), Higher esters (112)
5. Higher esters: Ethyl esters (119), Secondary alcohols (118), Amines (118), Higher ketones (116), Alkanes (115), Methyl esters (112), Methyl ketones (112)
6. Primary alcohols: Secondary alcohols (123), Tertiary alcohols (118), Aldehydes (118), Alkenes (117), Methyl ketones (115), Amines (113), Alkanes (113), Higher ketones (113)
7. Aldehydes: Primary alcohols (112), Secondary alcohols (111)
8. Higher ketones: Amines (119), Alkanes (119), Secondary alcohols (117), Tertiary alcohols (113), Nitriles (113), Methyl ketones (113)
9. Secondary alcohols: Tertiary alcohols (121), Amines (118), Alkanes (114), Higher ketones (112), Methyl ketones (111)
10. Tertiary alcohols: Secondary alcohols (122), Amines (117), Alkanes (111)
11. Diesters: Tertiary alcohols (117), Higher ketones (116), Aldehydes (115), Methyl ketones (114), Secondary alcohols (114), Methyl esters (112)
12. Substituted keto acids: Well resolved
13. 1-Phenyl alkanes: Well resolved
14. Terpenes: Well resolved
15. n-Phenyl alkanes (n > 1): 1-Phenyl alkanes (115)
16. Aliphatic amines: Secondary alcohols (119), Tertiary alcohols (117), Higher ketones (112), Alkanes (112), Nitriles (111)
17. Mercaptans: Sulfides (112)
18. Sulfides: Mercaptans (120)
19. Alkenes: Alkanes (113), Nitriles (112)
20. Alkanes: Alkenes (117), Methyl ketones (113), Higher ketones (113), Nitriles (113), Amines (112), Secondary alcohols (111)
21. Nitriles: Furans (113), Alkynes (112), Alkenes (111), Pyrazines (111)
22. Alkynes: Nitriles (121), Substituted pyrazines (120), Alkenes (117)
23. Pyrazines: Well resolved
24. Substituted phenols: Terpenes (117), Pyrroles (112)
25. Furans: Nitriles (120), Substituted pyrazines (119), Alkynes (117), Pyrroles (117)
26. Pyrroles: Substituted pyrazines (118), Alkynes (112), Furans (111)
27. Thiophenes: Well resolved
28. Aromatic esters: Substituted phenols (113), Methyl esters

One of the salient features of the results summarized in Table VI is that a similarity is revealed between classes with a common functional group. This can be seen in particular for the esters, alcohols, and ketones. Thus, the possibility of redefining the classification groups arises, thereby developing a multi-level classifying scheme. The preliminary classification can group together classes where common peak positions prevail, e.g., sulfides and mercaptans. These preliminary groups can then be separated in a secondary classification where optimization and varying input thresholds can be employed. Limiting the preliminary classification overcomes a practical constraint, for single-layer multicategory classifiers cannot be expanded indefinitely, despite the fact that no apparent fall-off in performance occurred with the 28 categories. A preliminary classification would allow further investigations to be carried out to determine structural information without encountering the difficulties which can

arise due to the existence of isomers, when tests pertinent to specific chemical groups are used.
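The separability measure itself is straightforward to compute from stored responses. A hedged sketch (our naming, reusing the earlier DigitalLearningNet sketch): the mean response of each trained net to each chemical group gives the entries tabulated in Table VI.

```python
def mean_response_matrix(nets, groups):
    # nets: dict group name -> trained net; groups: dict group name -> patterns
    # entry [t][g]: mean response of the net trained on t to the spectra of g
    # (>120 ~ high similarity, 110-120 ~ lesser similarity, <110 ~ well separated)
    return {t: {g: sum(net.respond(p) for p in ps) / len(ps)
                for g, ps in groups.items()}
            for t, net in nets.items()}
```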

CONCLUSIONS
It has been clearly demonstrated that DLNs can be used to classify compounds with a high rate of success from their mass spectra, even when a compound is a complete unknown. However, the best predictive ability is obtained by using all known spectra as the training group. Work in progress concerns the extension of the technique to larger sets of mass spectral data and to the classification of infrared spectra. Indeed, recent results (18) have been obtained with a data set comprising 42 chemical groups, and recognition figures in excess of 99% have been obtained.

ACKNOWLEDGMENT
The authors wish to thank the Directors of Unilever Ltd. for permission to publish this work.

LITERATURE CITED
(1) N. J. Nilsson, "Learning Machines," McGraw-Hill, New York, 1965.
(2) T. L. Isenhour and P. C. Jurs, Anal. Chem., 41, 21 (1969) (and references cited therein).
(3) A. G. Baker, M. Camp, E. Huntington, W. T. Pike, and M. A. Shaw, "Recent Analytical Developments in the Petroleum Industry," Institute of Petroleum, 1974.
(4) N. M. Frew, Ph.D. Thesis, University of Washington, Seattle, Wash., 1971.
(5) I. Aleksander, "Microcircuit Learning Computers," Mills & Boon, London, 1971.
(6) W. V. Bledsoe and I. Browning, "Pattern Recognition and Reading by Machine," Proc. Eastern Joint Computer Conf., 225 (1959).


(7) T. J. Stonham, I. Aleksander, M. Camp, M. A. Shaw, and W. T. Pike, Electron. Lett., 9, 391 (1973).
(8) I. Aleksander and R. C. Albrow, Comput. J., 11, 65 (1968).
(9) R. C. Albrow, Electron. Commun., 3, 6 (1967).
(10) J. R. Ullmann, IEEE Trans. Comput., 18, 1135 (1969).
(11) L. R. Crawford and J. D. Morrison, Anal. Chem., 40, 1469 (1968).
(12) T. J. Stonham and M. A. Shaw, "Pattern Recognition," in press.
(13) S. L. Grotch, Anal. Chem., 42, 1214 (1970).
(14) T. J. Stonham, I. Aleksander, and M. A. Shaw, Electron. Lett., 10, 301 (1974).
(15) F. W. McLafferty, "Interpretation of Mass Spectra," W. A. Benjamin, Reading, Mass., 1973.

(16) T. J. Stonham, Internal Research Report, University of Kent, March 1974.
(17) I. Aleksander, Electron. Lett., 6, 134 (1970).
(18) T. J. Stonham, Internal Research Report, University of Kent, October 1974.

RECEIVED for review March 28, 1975. Accepted May 19, 1975. The financial contribution to the support of this work made by the United Kingdom Science Research Council is acknowledged.
