Hadamard transform and spectral analysis by pattern recognition

Nearest neighbor rule in weighting measurements for pattern recognition ... Mass spectral pattern recognition via techniques of mathematical programmi...
2 downloads 0 Views 578KB Size
The Hadamard Transform and Spectral Analysis by Pattern Recognition B. R. Kowalskil Department of Chemistry, Colorado State University, Fort Collins, Colo. 80527

C. F. Bender General Chemistry Division, Lawrence Livermore Laboratory, Livermore, Calif. 94550

Walsh functions are periodic square waves and form a complete orthogonal set. They are analogous to Fourier harmonics in that they resemble infinitely clipped sine and cosine functions. The Hadamard transform, which can be based on Walsh functions, is introduced as an alternate representation of spectral information. To study the usefulness of the Hadamard transform as a preprocessing method prior to analysis of mass spectral data by pattern recognition, mass spectra are transformed to the sequency ordered Hadamard domain and then converted to Hadamard energy spectra. A definite improvement is noted. Also, other possible uses of the Hadamard transform to spectral analysis are discussed.

Spectral analysis ( I ) is a subject that concerns scientists and engineers in many disciplines. An engineer, for example, may want to study the output of a turbo-generator using spectral analysis to establish its transfer function. A medical researcher may use similar spectral analysis methods to detect significant signals in an electrocardiogram. Several tools useful in the chemical sciences generate spectra as output. The spectral domain can be energy, time, distance, or several other measurable quantities. As emphasis in analytical chemistry has shifted to instrumentation, chemists have turned to applied mathematics and engineering for the necessary methods included in spectral analysis. Fourier analysis ( 2 ) has received a considerable amount of attention in the recent chemical literature. Horlick (3) has shown how the Fourier transform can be used for spectral smoothing, differentiation, resolution enhancement, and other important spectral analysis tasks. The Fast Fourier Transform ( 4 ) (FFT) has provided a quantum jump in the usefulness of these techniques. In addition, advanced computer engineering and the. FFT have combined to revolutionize infrared spectrometry ( 5 ) and nuclear magnetic resonance spectrgmetry (6). In these cases, the spectral information is measured in the time domain and then transformed uia the FFT to the frequency domain. The process can lead to orders of magnitude increases in sensitivity uia the multiplexing advantage and spectral averaging. 1 Present address, D e p a r t m e n t of Chemistry, U n i v e r s i t y of Washington, Seattle, Wash. 98195

(1) G. M. Jenkins and D. G. Watts, "Spectral Analysis and Its Applications," Holden-Day, San Francisco, Calif., 1969. (2) R. Bracewell, "The Fourier Transform and Its Applications," McGraw-Hill, New York, N.Y., 1965. (3) G. Horlick, Anal. Chern.. 44, 943 (1972). (4) G-AE Subcommittee on Measurement Concepts, "What is the Fast Fourier Transform?", / E € € Trans. Audio Electroacoustics. AU-15 (Z), 45 (1967). (5) M. J. D.Low, Ana/. Chern.. 41 (e), 97A (1969). (6) T . C. Farrar. Anal. Chem.. 42 (4), 109A (1970)

2234

In 1969, Jurs et al. (7) used the linear learning machine (8) to determine molecular formulas directly from low resolution mass spectra. Since then, infrared spectra (9), NMR spectra ( I O ) , gamma-ray spectra ( I I ) , and polarographic waveforms (12) have been analyzed by the classification methods of pattern recognition. Included in these studies were preprocessing steps that transformed the original spectral information representation to another domain and in some cases improved the classification results. Jurs (23) and Wangen et al. (14) have used the Fourier transform for the analysis of mass spectra and Kowalski and Reilly ( I O ) improved NMR spectral analysis by employing the autocorrelation transform before analysis by pattern recognition. Several other pattern recognition methods have also been reported in the chemical literature (25). This paper introduces the Hadamard Transform (16-19) as an alternative preprocessing step before spectral analysis by pattern recognition. Mass spectral data are used for demonstration; however, continued research in this area will no doubt broaden the applications in chemistry. The Hadamard transform is based on discrete Walsh functions which are analogous to the harmonic sines and cosines of Fourier analysis in many ways. The main difference lies in the discrete nature of the Walsh functions as opposed to the continuous nature of trigonometric basis functions.

THE HADAMARD DOMAIN Figure 1 shows the first five functions in each of the Fourier series and the Walsh series. The analogy is quite apparent. The Walsh functions look like infinitely clipped Fourier functions. The Fourier functions are continuous sines and cosines, and the Walsh functions are squarewave functions forming a complete orthogonal set and sampled a t discrete intervals. Now, a Hadamard matrix is P. C. Jurs, B. R. Kowalski, and T. L. Isenhour, Anal. Chern.. 41, 21 (1969). N. J. Nilsson, "Learning Machines," McGraw-Hill, New York, N.Y., 1965. B. R. Kowalski, P. C. Jurs, T. L. Isenhour and C. N. Reilley, Anal. Chem., 41, 1945 (1969). B. R. Kowalski and C. A. Reilly, J . Phys. Chern., 75, 1402 (1971). L. E. Wangen, N. H. Frew, and T . L. Isenhour, Anal. Chern.. 43, 845 (1971). L. B. Sybrant and S. P. Perone, Anal. Chern.. 43, 382 (1971). P. C. Jurs,Ana/. Chern.. 43, 1812 (1971). L. E. Wangen, N. M. Frew, T . L. Isenhour, and P. C. Jurs, Appl. Spectrosc., 25, 203 (1971). B. R. Kowalski and C. F. Bender. J. Arner. Chem. Soc.. 94, 5632 (1972). N. Ahmed, K. R. Rao and A. L. Abdussattas, I € € € Trans. Audio Electroacoustics. AU-19, 225 (1971). N. J. A. Sloane, T. Fine, P. G. Phillips, and M. Harwit, Appl. Opt., 8, 2103 (1969). S. W. Golomb, Ed., "Digital Communications with Space Applications." Prentice-Hall. Enalewood Cliffs, N.J., 1964. (19) W.W. Peterson, "Errorkorrecting Codes," The MIT Press, Carnbridge, Mass , 1961.

A N A L Y T I C A L C H E M I S T R Y , VOL. 45, NO. 13, N O V E M B E R 1973

based on discretely sampled Walsh functions and can be generated several ways including recursively in the form.

From Equation 1, it seems evident that Hadamard matrices are 2n x 2n matrices. However, it has been demonstrated (20, 21) that Hadamard matrices exist for all n divisible by four. The Hadamard matrix of order three is:

[H(3)1

=Ir

1 1 1 -1 1 1 1 -1 1 1 1 -1 1 1 1 -1

1 1 -1 -1 1 1 -1 -1

1 -1 -1 1 1 -1 -1 1

1 1 1 1 -1 -1 -1 -1

1 1 1 -1 1 -1 1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 1 -1 1 1 1 1 -1

r

1 1 1 1 -1 -1 -1 -1 -1 -1 1 1 1 1 -1 -1 1 -1 -1 1 -1 1 1 -1 -1 1 -1 1 1 -1 1 -1

I

WALSH FUNCTIONS

Figure 1. The first five Fourier a n d Walsh functions

which can be generated using Equations 1 and 2 . If the rows of this matrix are plotted as points on equal intervals, and adjacent points connected in a square wave manner, the first row would look like the first Walsh function in Figure 1 but the comparison ends there. Row five would look like the second function, row seven like the third, row three like the fourth and row four like the last Walsh function in Figure 1. The different ordering surfaces another similarity between Fourier and Walsh functions: frequency. One method of ordering Fourier functions by increasing frequency is to compare the number of zero crossings in a given interval. The higher the frequency, the greater the number of zero crossings. This same method can be used to order the functions generated by the rows of the matrix in Equation 3. If this is done, the matrix can be reordered and the first five rows would then correspond to the five Walsh functions in Figure 1. The reordering of Equation 3 produces

1 1 1 1 1 1 1 1 1 1 -1 -1 1 1 -1 -1 1 -1 -1 1 1 -1 -1 1 1 -1 1 -1 .1 -1 1 -1

r

I

FOUR I ER H A R M O N ICs

(4)

which is said to be ordered by increasing sequency. The matrix in Equation 4 can be used to transform a waveform, sampled at eight discrete equal intervals, to the sequency-ordered Hadamard domain. Obviously, much larger transform matrices are used for practical application. The Hadamard transform of a spectrum sampled at, for example, 128 equal intervals (2" = 128, n = 7 ) amounts to the simple matrix multiplication,

(20) R. Thoene and S. W. Golomb, "JPL Space Programs Summary," Vol. I V , No. 37-40, p 207. (21) L. D. Baumert. "JPL Space Programs Summary," Vol. I V , No. 3743, ~ 3 1 1 .

where X is the original spectrum and Y is the information in the Hadamard domain. X and Y are column matrices whose elements are the spectral intensities measured at equal intervals. If [H(7)] has been sequency ordered, Y is analogous to the familiar frequency ordered Fourier representation of X. Thus, any spectrum or waveform can be transformed to the Hadamard domain using the generalized operation for the Hadamard transform,

where

K

=

2n

Here, K is the length of the column rector, n is a positive integer, and [H(n)] may or may not be sequency ordered depending on the goal of the analysis. The spectrum dimension is increased by adding zeros to make Equation 7 hold. This is similar to 'the preprocessing done before applying the FFT. Some additional points have surfaced by the ever increasing amount of research on the Hadamard transform. (The Hadamard transform has also been called the BIFORE transform from BInary Fourier REpresentation.) First, there is no need to store [H(n)] in computer memory. Also, the n.2 operations implied by Equation 6 can be reduced to n log2 n arithmetic operations. These two advances come from the use of matrix factoring or matrix partitioning algorithms used to develop the Fast Hadamard Transform (FHT) (16). Discussion of matrix factorization is beyond the scope of this paper, but the techniques are quite similar to those used to develop the Fast Fourier Transform. Another advantage of the FHT is that the operations are limited to additions and subtractions because the elements of [H(n)] are exclusively positive and negative unity. The original spectrum can be recovered uniquely using the inverse transform as

X

= [H(n)]Y

(8)

There are many useful analogies between Fourier spectral analysis and Hadamard spectral analysis. In fact, it has recently been recognized that it is ". . . no longer necessary to design communications equipment around a concept of trigonometric-based functions. A possible so-

A N A L Y T I C A L C H E M I S T R Y , VOL. 45, NO. 13, N O V E M B E R 1973

*

2235

fact that the Hadamard matrix can have not only the Kronecker form generated using Equations 1 and 2, but also a cyclic notation (24) allowing rather unique engineering advantages. Also, it should be obvious from the existence of various forms of notation (Cyclic, Kronecker, etc.) that Walsh and Hadamard are not totally synonymous. Of the many useful similarities between Fourier analysis and Hadamard analysis that have not been exploited in chemical research, the Hadamard sequency energy spectrum (25) (HSES) is of direct importance to the results presented in the next section. Once the sequency-ordered Hadamard transform has been applied to a discretely sampled spectrum with even intervals, the HSES can be generated. Using Y from Equation 6, we can identify each element using another analogy to Fourier analysis. Instead of sin and cos, sal and cal are used to identify the elements at various sequencies. The first element has sequency zero and is written Y(cal, O). The second and third elements are Y(ca1, 1) and Y(sa1, l), respectively. The cal and sal pairs continue until the last element which is Y(sa1, $ K ) , thereby accounting for all K elements. The elements of the HSES can now be defined as,

P ( o ) = [Y(cal,o)]*

P(i)= [Y(cal,i)I2+ [Y(sal,i)]'

(9) (10)

where 1

0 < i < ~ K

(11)

and

P($K)

=

[Y(~~I,;K)]'

This is a direct analogy of the discrete Fourier transform energy spectrum of real input which can be determined as the sum of the squares of the sine and cosine components at the same frequency. Unlike the discrete Fourier energy spectrum, the HSES is not invariant to circular shifts of the input. When this is a problem (not here), a shift operator (25) is applied as for correlation and convolution using the Hadamard transform. Concepts developed in this section are applied to mass spectral data in the next section.

EXPERIMENTAL

Figure 2. L o w resolution m a s s spectra C) CsHs

of a ) C6H8, b ) C7H8, a n d

phistication now is neatly netted with Walsh functions" (22). Since many advances in spectral analysis that are useful to chemistry have come from communications theory, the above quotation may well point to a change in chemical instrumentation design as well. In fact, a Hadamard Transform Spectrometer is currently available commercially (23). This instrument takes advantage of the (22) t i F. Harrnuth. / E € € Spectrum, November 1969, p 82 (23) J. A. Decker, Jr., Anal. Chem.. 44, ( 2 ) , 127A (1972). 2236

The low resolution mass spectra of 298 randomly selected hydrocarbons (26) containing 6, 7 , or 8 carbons are used as a data base for this study. The spectra amount to intensities (normalized to the highest peak) measured at integer masses from 1 to 116 and are augmented with zeros to give K equal t o 128 and hence n equal to 7 from Equation 7. The spectra in the original mass domain constitute the first data set herein called MASS. Then, each spectrum is transformed to the Hadamard domain (sequency ordered) and the collection is called HADA. The HSES was obtained for each spectrum in HADA. Each HSES has 65 el1) and the entire collection is called SES. Figures ements ($K 2-4 show the mass spectrum, Hadamard spectrum, and HSES for three hydrocarbons containing six, seven, and eight carbons, respectively. Upon close inspection, the Hadamard domain is seen as a n energy spreading technique, again similar to the Fourier domain, in that there is a non-zero intensity a t each of the 128

+

(24) M. Hall, Jr., "Combinational Theory," Blaisdell, Waltham, Mass., 1967. (25) F. R. Ohnsorg. Applications of Walsh functions, 1971 Proceedings, Naval Research Laboratory, Washington, D.C., p 55. (26) Mass Spectral Data File, United Kingdom Atomic Energy Authority, Aldermaston, Reading, RG7-4PR. United Kingdom.

A N A L Y T I C A L C H E M I S T R Y , VOL. 45, NO. 13, N O V E M B E R 1973

11,

3

‘7 “6

C

‘6

H6

I

Figure 3. Hadamard transformed mass spectra for a ) CsH8, b ) C7H8,

and C) C8H8

Figure 4. Hadamard Sequency Energy Spectra for a ) C6H8, b ) C7H8,

positions. The HSES domain has a similar property and contains important information quite analogous to the Fourier power spectrum. The Hadamard domain does not seem to provide useful information for visual analysis by the spectroscopist but may provide a useful preprocessing tool for analysis by pattern recognition methods. The main objective of the next section is to discuss the results of applying a pattern recognition method, the K-Nearest Neighbor Classification Rule (27) to the three data sets and to determine the usefulness of the Hadamard transform as a prepro(27) B. R . Kowalski and C. F. Bender, Anal. Chem., 44, 1405 (1972)

and C) C8H8

cessing tool for mass spectral analysis. Other possible uses of Hadamard transform will be discussed in the last section. All programs were written in Fortran IV and calculations were done on the Colorado State University CDC 6400 computer.

RESULTS AND DISCUSSION Table I shows the classification performance for MASS, HADA, and SES obtained by applying the 3-Nearest Neighbor Rule to each set. The Leave-One-Out procedure is also used (27). Each spectrum, one a t a time, is consid-

A N A L Y T I C A L C H E M I S T R Y , VOL. 45, NO. 13, N O V E M B E R 1973

2237

Table I. Three-Nearest Neighbor Classification Results Dataset

No. of v ariables

Carbon no.

NO. correct

Out of

% correct

MASS

128

6 7 8

HADA

128

73 81 108 262 76 85 119 280 75 87 118 280

79 91 128 298 79 91 128 298 79 91 128 298

92.4 89.0 84.4 87.9 96.2 93.4 93.0 94.0 94.9 95.6 92.2 94.0

Total: 6 7 8

Total: S ES

65

6 7 8

Total:

Table II. Dimensionality Reduction by Deleting the Lowest Intensities Data set No. of variables YO correct (/298) MASS

HADA

S ES

128 64 32 16 128 64 32 16 65 64 32 16

87.9 80.2 40.6 26.2 94.0 93.0 84.6 73.8 94.0 94.0 87.6 80.2

ered as an unknown and all the remaining spectra considered as knowns. The unknown is classified as a six, seven, or eight carbon containing molecule by a majority vote of the three nearest neighbors (known) in the euclidean hyperspace. The reader should refer to the literature (27) for a more detailed description of the pattern recognition methods used in this paper. The results in Table I show an improvement in classification performance when the mass spectral data is transformed to the Hadamard domain. There is also an improvement in the SES results which is magnified when one considers its lower dimensionality (65 instead of 128). Since the HADA spectra consist of the interleaved cal and sal pairs, the information is redundant and the improved performance of HSES with only 65 dimensions is most likely a reflection of this fact. As the Fourier transform was used as a preprocessing tool prior to analysis by a pattern recognition method, so, too, the Hadamard transform has been found to yield improved results. The relative merits of the two domains are probably dependent on each particular application. However, the ease of implementation (addition and subtraction rather than multiplication and division) recommends the Hadamard transform to some degree. Also, the discrete nature of the Walsh functions, upon which the Hadamard transform is based, better fits the discretely sampled waveforms and spectra normally input to digital computers in chemical laboratories. The sequency ordering in the Hadamard and HSES domain suggest the possibility of selectively reducing the dimensionality of the pattern recognition application while monitoring the classification performance. Dimensionality reduction studies have previously yielded important information (28) and also serve to reduce the computation necessary for final analysis. Tables I1 and I11 show the results of selectively reducing the dimensionality of the three data domains before application of the 3-Nearest Neighbor Rule. The number of variables is the total remaining after one-half are eliminated by either of two methods. SES has only 65 variables so only one variable is eliminated on the first pass. This is done to make the comparison exact. The first method of elimination (Table 11) consists of setting the lowest intensities to zero until the desired number remains. The second method (Table 111) eliminates the intensities at the high and low ends of each spectrum. This amounts to the

CONCLUSION In addition to the concepts applied to mass spectral data in this paper, researchers are continuing to discover new uses for the Hadamard transform and are proving that more analogies to Fourier analysis hold true. It should be understood that Hadamard transform theory is a self-consistent theory and does not a t all depend on the analogies to Fourier theory. However, for ease of explanation, the analogies have been used here because Fourier theory is probably well known to readers of this paper. Schmidt (29) has written a collection of Hadamard analysis (called Walsh analysis in the paper) computer programs in the APL language. The APL package includes a forward and reverse FHT, a sequency ordering routine, a two-dimensional FHT for picture processing, plotting routines, and several other useful programs. Although the programs are not used in the work presented here, it is believed that the package will find use in future application to chemical data.

(28) P. C. Jurs, B. R. Kowalski, T. L. Isenhour, and C. N. Reilley, A n a / . Chem., 41, 690 (1969).

(29) R. 0. Schmidt, Applications of Walsh Functions, 1971 Proceedings, Naval Research Laboratory, Washington, D.C., p 68.

2238

Table Ill. Dimensionality Reduction by Deleting High and Low Sequences Data set

No. of variables

% correct (1298)

MASS

128 64 32 16 128 64 32 16 65 64 32 16

87.9 78.9 46.0 19.1 94.0 89.9 78.2 61.4 94.0 94.0 81.2 64.4

HADA

S ES

highest and lowest masses for MASS and the highest and lowest sequencies for HADA and SES. Although severe degradation of the results is seen in some cases, SES results are not changed much; and in one case 80.2% of the patterns are classified correctly using only 16 variables. The results in Table I1 are somewhat better than in Table 111. Since the nearest neighbor method involves a considerable amount of computation, dimensionality reduction provides significant savings.

A N A L Y T I C A L C H E M I S T R Y , VOL. 45, NO. 1 3 , NOVEMBER 1973

Gibbs and Pichler (30) have made a very important connection between the Fourier power spectrum and the Hadamard (again called Walsh) power spectrum. With the discovery of their important link, the two domains can be used somewhat interchangeably and thus increase the usefulness of the Hadamard transform. The familiar Shift Theorem of Fourier theory has been proved for Hadamard theory (25), fast cyclic convolution and correlation algorithm have been developed (31), and several applications have been successfully completed (30) J. E. Gibbs and F. R . Pichler, ibid.. p 51. (31) D. A . Pitassi, ibid., p 130.

(32-34). These and other developments promise to make the Hadamard transform a useful tool for chemical data analysis.

Received for review March 26, 1973. Accepted May 31, 1973. Paper presented at the 1973 Pittsburgh Conference on Analytical Chemistry and Spectroscopy, Cleveland, Ohio, March 1973. Work supported in part by the National Science Foundation. GP-36578X. (32) I . A. Davidson, ibid.. p 1 7 7 . (33) S. J. Campanella and G. S. Robinson, ibid.. p 199 (34) J. W. Carl and M. Kabrisky, ibid.. p 203.

Deconvolution Method for Identification of Peaks in Digitized Spectra Geert Brouwer and J. A. J. Jansen Philips

Research Laboratories, Eindhoven, The Netherlands

An intermediate step in the automatic evaluation of complex spectra is the determination of the positions and areas of the individual lines. The influence of the background is reduced by fitting the first derivative of the known profile to the first derivative of the spectrum. This method ultimately results in one set of deconvolution coefficients for the determination of the positions of the lines and another set for obtaining the areas. I n a specific case Gaussian profiles are assumed. To a certain degree, doublets are resolved. A 30,000-channel spectrum is processed in 1 minute. The method introduces a bandwidth limitation that has an influence on the signal-tonoise ratio and resolution. A comparison is made to other methods.

Spectra of varying complexity appear in several areas of analytical chemistry and computer programs are used for line identification. In this paper, attention is restricted to spectra where the line shape is reproducible, because it is determined by a stable instrument. This is the case with prism and grating spectrographs of moderate resolution, gamma-ray spectrographs and, to a lesser extent, mass spectrographs. Moreover, the line profiles can be approximated quite well by Gaussian curves with half-widths that vary only slowly with the position of the line in the spectrum. This knowledge is exploited in our peak-searching program. The data reduction program starts with a spectrum where the quantitative information obeys the superposition principle. Thus, in a photographic spectrum, the measured transmittances must first be converted to intensities. In the absence of background, the lines in a spectrum can be identified by a least-squared fit of the standard line profile, of which both position and amplitude are varied. The introduction of background leads to erroneous positions and areas. If the assumption is made that the slope of the background is constant in the neighborhood of the line that is processed, the influence of back-

ground can be eliminated in the following way: the first derivative of the standard profile is fitted to the first derivative of the measured spectrum, as is shown in the next section. The difference between both methods can be explained by their transfer characteristics: besides the common high-frequency limitation, the latter has an additional low-frequency cut-off that rejects low-frequency noise or background. Any data reduction program introduces some bandwidth limitation in the information flow and this affects both resolution and signal-to-noise ratio. In digital computer programs, the measurements are available in sampled form, and in this paper a deconvolution technique is developed for line identification. It yields two sets of deconvolution coefficients, one for the determination of positions, the other for the areas. The linear operations: differentiation and least-squares fitting need not be performed sequentially but can be compacted into one operation ( I ) . This enhancement of speed is desirable because the data reduction is performed simultaneously with the reading of the information to avoid the necessity of a vast computer memory space. Overlapping peaks are resolved by the deconvolution method presented in this paper, as far as this is permitted by the resolution of the spectrograph and the data reduction program. The sampling of the spectrum may have an influence on the final results. Special attention is paid to the width of the plate reader slit that is used in connection with photographic spectra. A special section is devoted to a comparison of the various data reduction methods that have been published. INSTRUMENTAL The plate reader employed in our laboratory was built by Witmcmr and Van Gool ( 2 ) . It scans photographic plates u p to 25 cm in length and measures the transmittance automatically a t a rate of 0.5 cm/sec. A built-in measuring system generates trigger puls(1) A. Savitzky. and M. J. E. Goiay, Anal. Chem.. 36,1627 (1964). (2) A . W . Witmer and G . H . Van Gool, Colloquium Spectroscopicum lnternationaie XVI, Heidelberg, preprints, 1 9 7 1 , p 254.

A N A L Y T I C A L C H E M I S T R Y , VOL. 45, NO. 13, NOVEMBER 1973

2239