Low capacity codes for mass spectra based on statistical moments

or tetraphenylporphyrin cation, as appropriate, in the solvent system to be used, (2) dilute the appropriate solution to be used for the porphyrin determination 1:1 with diglyme and record the fluorescence spectrum, (3) add a known volume of a stock solution of ZnTPP in diglyme (for example, 1/10 of the total volume of the sample), and (4) use the intensity increase and the analytical curve for the standard stock solution to calculate the concentration of porphyrin in the sample. For the determination of protoporphyrin IX in the ethanol extract of the procedure recommended by Hanna et al. (1), a sample calculation is:

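[Sample calculation, garbled in the source. Recoverable terms: the ZnTPP and PRP intensity ratio for the standard solutions, a factor of 2 for the dilution, a factor of 1.1 for the dilution with standard, the ratio I_PRP (sample) / I_ZnTPP (internal standard), and [ZnTPP]/11. Result: [PRP] = 5.7 × 10⁻⁷ M (32 µg/100 mL).]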

The ratio for the standard solutions need be determined only once for the instrument and can then be applied for these determinations. The term [ZnTPP]/11 is the concentration of ZnTPP in the standard stock solution divided by the dilution of the standard as it is added to the sample solution. For the example calculated above, the emission intensities at 650 nm with excitation at 425 nm were used. The results of the internal standard method give protoporphyrin concentrations accurately with a standard deviation of the determination of 2.0%.

In acidic solutions, the ZnTPP used as an internal standard is instantaneously converted to the tetraphenylporphyrin dication. This species has an excitation maximum at 445 nm and an emission maximum at 687 nm. The emission intensity is linear (slope = 1.00, standard deviation = 1%) over the range of 5 X lo4 to lo4 M. In using this species as an internal standard, the excitation wavelength is changed on recording the fluorescence spectra of the standard solution with and without internal standard. The emission intensity of the standard at 687 nm is used in the calculation. This method is applicable to coproporphyrin and uroporphyrin determination by the method of Sobel et al. (10) and protoporphyrin in ethyl acetate/acetic acid or acidified acetone extracts (5).

LITERATURE CITED

(1) T. L. Hanna, D. N. Dietzler, C. H. Smith, S. Gupta, and H. S. Zarkowsky, Clin. Chem. (Winston-Salem, N.C.), 22, 161 (1976).
(2) S. Granick, S. Sassa, J. L. Granick, R. D. Levere, and A. Kappas, Proc. Natl. Acad. Sci. U.S.A., 69, 2381 (1972).
(3) L. P. Kammholz, L. G. Thatcher, F. M. Blodgett, and T. A. Good, Pediatrics, 50, 625 (1972).
(4) S. Piomelli, J. Lab. Clin. Med., 81, 932 (1973).
(5) J. J. Chisholm, C. W. Hastings, and D. K. K. Cheung, Biochem. Med., 9, 113 (1974).
(6) A. A. Lamola, M. Joselow, and T. Yamane, Clin. Chem. (Winston-Salem, N.C.), 21, 93 (1975).
(7) A. A. Lamola and T. Yamane, Science, 186, 936 (1974).
(8) L. Heilmeyer, "Disturbances in Heme Synthesis", C. C Thomas Publishing Co., New York, N.Y., 1963.
(9) A. Goldberg and C. Rimington, "Diseases of Porphyrin Metabolism", C. C Thomas Publishing Co., Springfield, Ill., 1962.
(10) C. Sobel, C. Cano, and R. E. Thiers, Clin. Chem. (Winston-Salem, N.C.), 20, 1397 (1974).
(11) J. J. Chisholm, Jr. and D. H. Brown, Clin. Chem. (Winston-Salem, N.C.), 21, 1669 (1975).
(12) A. D. Adler, F. R. Longo, J. D. Finarelli, J. Goldmacher, J. Assour, and L. Korsakoff, J. Org. Chem., 32, 476 (1967).
(13) A. D. Adler, F. R. Longo, F. Kampas, and J. Kim, J. Inorg. Nucl. Chem., 32, 2443 (1970).
(14) J. R. Miller and G. D. Dorough, J. Am. Chem. Soc., 74, 3977 (1952).
(15) D. D. Perrin, W. L. F. Armarego, and D. R. Perrin, "Purification of Laboratory Chemicals", Pergamon Press, London, 1966.
(16) D. J. Quimby and F. R. Longo, J. Am. Chem. Soc., 97, 5111 (1975).
(17) P. G. Seybold and M. Gouterman, J. Mol. Spectrosc., 31, 1 (1964).


RECEIVED for review December 10, 1976. Accepted May 17, 1977. Research was carried out as part of a study of phosphorimetric analysis of porphyrins in blood and urine, supported by the National Institute of Environmental Health Sciences Grant No. ES-00987-01.

Low Capacity Codes for Mass Spectra Based on Statistical Moments

C. P. Anese and J. A. Richards*

Department of Electrical Engineering, James Cook University of North Queensland, Townsville, Queensland 4811, Australia

Attention has been given recently to the development of low capacity codes for the mass spectrum owing to the high degree of redundancy present in this form of analytical data. This work has been directed principally towards low capacity storage of spectral libraries and, to a lesser extent, to low capacity transmission of spectral information over communications channels. Wangen et al. (1) and Grotch (2-4) have investigated the usefulness of spectra coded by noting simply the absence or presence of a peak below or above a predetermined slicing level, resulting in a spectral code numerically equal in bits (binary digits) to the mass range considered significant. Crawford and Morrison (5) and Knock et al. (6) have considered the properties of a code in which spectra are represented by a number of their strongest peaks. In a variant of this, other investigators (6, 7) have discussed coding schemes in which the N most intense peaks in consecutive 14 amu spectral windows are adopted as the signature of the spectrum. By correlating mass positions, Wangen, Woodward, and Isenhour (1) produced a further spectral code, with significantly reduced redundancy, leading to both 80- and 48-bit code lengths; these gave acceptable performance when tested in library searching procedures for spectral identification.

In this paper, a code based upon statistical moments of the spectrum is proposed and discussed. While retaining uniqueness comparable to that of the above codes, its length (i.e., bits/spectrum) is significantly smaller, recommending its use in situations demanding low capacity.

ANALYTICAL CHEMISTRY, VOL. 49, NO. 9, AUGUST 1977

SPECTRAL SIGNATURES BASED UPON STATISTICS

Since the mass spectrum is a discrete probability distribution, it is natural to consider the significance of moments in describing its essential features. Indeed, for distributions of this type it is possible to exhibit all properties of the spectrum in terms of its moments, or functions of its moments. A collection of low order moments is adequate to define the gross features of a distribution, although very high order moments may be necessary to define details. Since the primary concern of this investigation is the preservation of the uniqueness of a spectrum after coding, and not the reproduction of the original distribution of peak intensities, a set of low order moments can be considered sufficient as a discriminating spectral signature. This is supported by results to follow.

If the mean of the spectrum is defined, in the usual way, as

$$\mu = \frac{\sum_i f_i x_i}{\sum_i f_i}$$

where the $x_i$ are mass positions and the $f_i$ are the corresponding intensities, then the kth central moment is defined by

$$m_k = \frac{\sum_i (x_i - \mu)^k f_i}{\sum_i f_i}$$

Rather than use moments directly, spectral statistics can be generated. In addition to the mean, only three are of interest here, these being the standard deviation, $\sigma = \sqrt{m_2}$; the skewness, $s = m_3/\sigma^3$; and the kurtosis, $k = m_4/\sigma^4$. It is appropriate to remark that Bender et al. (8) have also adopted moments, in their case to precondition mass spectral data in pattern classification approaches to recognition.

Table I. Observed Ranges of Statistics in 1500 Spectra, Bit Allocations, and Subsequent Ranges and Precisions of Statistics Allowable

                      Observed range         Bits         Allowable range
Statistic             Min      Max           allocated    Min      Max     Increment
Mean                  23       425           9            1        512     1
Standard deviation    8        326           9            1        512     1
Skewness              -5.4     7.5           8            -12.7    12.8    0.1
Kurtosis              4        1.3 × 10⁷     8            0        2550    10
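The definitions above translate directly into a few lines of code. The following is a minimal sketch in Python, not the authors' program: the function name `spectral_statistics` and the toy three-peak spectrum are assumptions made for illustration, and the standard deviation is taken as the square root of the second central moment.

```python
import math


def spectral_statistics(masses, intensities):
    """Return (mean, standard deviation, skewness, kurtosis) of a spectrum.

    masses      -- peak positions x_i
    intensities -- corresponding peak intensities f_i
    (Hypothetical helper; the names are not taken from the paper.)
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("spectrum has no intensity")

    # Mean: intensity-weighted average mass position.
    mean = sum(x * f for x, f in zip(masses, intensities)) / total

    def central_moment(order):
        # m_k = sum_i (x_i - mean)^k f_i / sum_i f_i
        return sum((x - mean) ** order * f
                   for x, f in zip(masses, intensities)) / total

    sigma = math.sqrt(central_moment(2))
    if sigma == 0:
        raise ValueError("single-peak spectrum: skewness and kurtosis undefined")

    skewness = central_moment(3) / sigma ** 3
    kurtosis = central_moment(4) / sigma ** 4
    return mean, sigma, skewness, kurtosis


# Toy symmetric spectrum (illustrative values only).
mean, sigma, skewness, kurtosis = spectral_statistics([1, 2, 3], [1, 2, 1])
print(mean, sigma, skewness, kurtosis)
```

For the symmetric toy spectrum the mean is 2 and the skewness vanishes, which is a quick sanity check on the sign conventions.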

EXPERIMENTAL

A set of 1500 spectra from the MSDC collection was chosen to evaluate the effectiveness of using mean, standard deviation, skewness, and kurtosis as a spectral code. To establish ranges of significance, the data set was searched for maximum and minimum values of each statistic. From this information and from an arbitrarily but sensibly chosen precision, the number of bits necessary to represent the complete range and precision of a particular statistic was chosen. This is summarized in Table I, where it will be observed that the maximum value of kurtosis is very high. As a result, this was restricted to an upper limit, with higher values truncated to the limit. By allocating 9 bits each to mean and standard deviation, and 8 bits each to skewness and kurtosis, a 34-bit spectral code results. The 1500 spectra in the data set were so encoded and searched for subsequent matching pairs. Only 8 were found, representing 0.53% of the complete set.

In practice it would be desirable to have a code length of 32 bits since this is a common computer word length and is an integral multiple of minicomputer and microprocessor word lengths. Therefore, the range over which mean and standard deviation are specified was lowered by allocating only 8 bits to each of those parameters also. This permitted values only up to 256 to be used in those parameters (again higher values were truncated to that limit). In the data set only 4% of means and 1% of standard deviations exceeded 256, so that the effect of such a restriction is not severe. When the 1500 spectra were scanned again, only 9 matching pairs were found (0.60%).

These results should be compared to those given by other coding procedures. Such a comparison (based upon results given by previous investigators) is depicted in Table II, where the success of the 32-bit spectral statistics code in delineating spectra is seen to be better than all others, apart from one-bit encoding using a 1% slicing level. That code, however, is four times longer than the statistics code and yet is only marginally better in delineation.

It can be shown (9) that the maximum accuracy in peak heights necessary to generate the 32-bit statistics code, to the precisions implicit in the bit allocation, is 0.3%, and that on the average only a 1% precision in peak height is required. These are within the capabilities of present magnetic and, in some cases, quadrupole mass spectrometers.
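The 32-bit encoding described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the helper names `quantize` and `encode_statistics`, the round-to-nearest quantization, and the order in which the four 8-bit fields are packed into the word are all assumptions; the ranges and increments are those of Table I, with mean and standard deviation limited to 256.

```python
import math


def quantize(value, minimum, step, bits=8):
    """Map a statistic to an integer level, truncating to the allowed range."""
    level = math.floor((value - minimum) / step + 0.5)  # round to nearest level
    return max(0, min(level, 2 ** bits - 1))            # clamp out-of-range values


def encode_statistics(mean, sigma, skewness, kurtosis):
    """Pack the four statistics into one 32-bit integer (8 bits each).

    Assumed field order, most significant byte first:
    mean, standard deviation, skewness, kurtosis.
    """
    fields = (
        quantize(mean, 1, 1),          # 1 .. 256 in steps of 1
        quantize(sigma, 1, 1),         # 1 .. 256 in steps of 1
        quantize(skewness, -12.7, 0.1),  # -12.7 .. 12.8 in steps of 0.1
        quantize(kurtosis, 0, 10),     # 0 .. 2550 in steps of 10
    )
    code = 0
    for level in fields:
        code = (code << 8) | level
    return code


# Statistics of the unknown in the library searching example.
code = encode_statistics(50, 17, 0.3, 220)
print(f"{code:#010x}")  # prints 0x31108216
```

Values beyond the allowed range, such as the largest observed kurtosis, simply saturate at the top level, which mirrors the truncation described in the text.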

Table II. Comparison of Properties of Spectral Codes

                                                 Code length,   Mass range,   No. of matching groups
Code                                             bits           amu           expressed as % of the total sample
Statistics μ, σ, s, k                            32             Entire        0.60%
One peak in each 14-amu window (4)               128            6-453         0.83%
One-bit encoding (2)                             128            13-140        0.46%
  (slicing level = 1% of base peak)
One-bit encoding (1)                             352                          0.83%
  (slicing level = 0.1% of base peak)
One-bit encoding with correlation of
mass positions (1)
  (80 channels)                                  80                           1.23%
  (48 channels)                                  48                           3.11%

NEAREST NEIGHBOR LIBRARY SEARCHING

For a coding procedure to be viable, it is necessary that it permit successful spectral identification in the face of experimental and related errors. A technique which enhances the usefulness of codes in this situation is that in which the k nearest neighbors to the unknown are searched for (1, 3). This method utilizes a measure of the difference between unknown and library coded spectra, with the k nearest neighbors to the unknown being selected from the library if their difference measures are the k smallest.

Since four spectral statistics contribute to the code adopted herein, the relative significance of each was determined in order to provide weighting constants for the nearest neighbor algorithm. Significance was determined on the basis of the success of each statistic individually in distinguishing spectra in the data set. The percentages of matching spectra in such a test were found to be in the ratios 1:2:5:5 for mean:standard deviation:skewness:kurtosis. Therefore weights of 1, 0.5, 0.2, and 0.2 were adopted in library searching.

The search algorithm required computing the relative difference in each statistic for the unknown and a library spectrum. These differences were weighted according to the above prescription and summed to produce a measure of mismatch. The procedure was repeated for each library spectrum and, e.g., the five spectra corresponding to the five smallest values of the mismatch measure or index were selected as the five nearest neighbors and listed in order of increasing mismatch index. A typical result is shown in Table III, where it will be seen that the nearest neighbors are isomers and homologues.

In producing this, each statistic for a library of 1000 spectra was stored to one order of magnitude greater precision than the unknown, principally to permit the nearest neighbors to be listed in order of likelihood. Otherwise a number of zero mismatch indices (exact matches) were obtained, permitting no ordering. While this implies an additional 16 bits per spectrum for each library entry, it does illustrate the performance of library searching based upon the four statistics when experimental rounding or errors are present, or differences in precision are available, since the unknown was retained in its 32-bit format. With the library spectra also in 32-bit form, the unknown was still listed as one of the first 5 nearest neighbors, illustrating that entire 32-bit coding (library and unknowns) should generally be useful in library searching procedures of this type.

The performance of the search routine was still creditable when further larger errors were introduced into the unknown. For example, when the standard deviation of C6H14O was altered by 15% and the kurtosis by 10%, the correct compound was cited as the second nearest neighbor. It is also worth noting that, when weighting factors were not used, the correct compound was suggested as the third most likely neighbor.

Finally, library searching using the statistics code performs well in terms of delineation from isomers and homologues. In some cases where the first neighbor listed was not the correct compound, the unknown was in the top two or three; the compounds ranked higher were almost always its isomers, with extremely similar mismatch indices. While one-bit encoding is generally not successful in separating isomers (1), the code used in this investigation was found to have a high success rate in doing so. Thirty-seven of the forty compounds used as unknowns were correctly listed as the first neighbor, even though isomers existed in the library.

Table III. Example of Nearest Neighbor Library Searching Based upon Statistics Code

Unknown (C6H14O): Mean = 50, Standard deviation = 17, Skewness = 0.3, Kurtosis = 220

The 5 nearest neighbors are:

Compound    Mismatch index    Mean    Std dev    Skewness    Kurtosis
C,H,,O      0.04              49.5    17.0       0.26        220
C,H,,O      0.09              50.3    17.9       0.26        188
C,H,,O      0.14              51.6    17.8       0.41        214
C,H,,O      0.18              53.1    18.4       0.40        212
C,H,,O      0.18              47.3    15.6       0.37        258
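The weighted search can be sketched in a few lines of Python. This is a hedged illustration, not the authors' routine: the names `mismatch_index` and `nearest_neighbors` and the library labels `entry 1` to `entry 5` are hypothetical, and the relative difference is assumed to be taken with respect to the unknown's value, a detail the text does not spell out. The statistics are those of the Table III example.

```python
# Weights for mean, standard deviation, skewness, kurtosis (from the text).
WEIGHTS = (1.0, 0.5, 0.2, 0.2)


def mismatch_index(unknown, candidate, weights=WEIGHTS):
    """Weighted sum of relative differences between two sets of statistics.

    Assumption: each difference is taken relative to the unknown's value,
    so a statistic of exactly zero in the unknown would need special handling.
    """
    return sum(w * abs(u - c) / abs(u)
               for w, u, c in zip(weights, unknown, candidate))


def nearest_neighbors(unknown, library, k=5):
    """Return the k library entries with the smallest mismatch index."""
    scored = [(mismatch_index(unknown, stats), name)
              for name, stats in library.items()]
    return sorted(scored)[:k]


# Unknown and library statistics: (mean, std dev, skewness, kurtosis).
unknown = (50, 17, 0.3, 220)
library = {
    "entry 1": (49.5, 17.0, 0.26, 220),
    "entry 2": (50.3, 17.9, 0.26, 188),
    "entry 3": (51.6, 17.8, 0.41, 214),
    "entry 4": (53.1, 18.4, 0.40, 212),
    "entry 5": (47.3, 15.6, 0.37, 258),
}

for index, name in nearest_neighbors(unknown, library):
    print(f"{name}: {index:.2f}")
```

With the rounded values printed in the table, this sketch gives about 0.04 and 0.09 for the first two entries, in line with the tabulated mismatch indices; the later entries can differ slightly in the second decimal place, presumably because the published indices were computed from unrounded statistics.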

CONCLUSIONS

In devising coding techniques to reduce redundancy, it is important to ensure that the essential information of a spectrum is not lost; i.e., it is important that the degree of uniqueness of encoded spectra be similar to that of full spectra. Certainly the results of this study imply that little spectral information is sacrificed in the coding method used even though the ultimate code length is only 32 bits. An accompanying advantage in choosing moments or moment-derived statistics is that errors in the determination of peak intensities will not be amplified in the code. Rather the accuracy of the moments will be of comparable order to that of the spectrum. On the other hand, errors in peak position could propagate larger moment errors, for which reason the technique is more suited to low resolution spectra in which all peaks are constrained to occur at integral mass numbers.

Clearly other codes based upon spectral statistics could be invented. For example, it is possible to construct a reliable code based upon just mean and kurtosis, resulting in a greatly reduced code word length (20 bits) with reasonable discrimination (~1%). However, the accuracy required in each of those parameters requires peak height precisions which are not practicable. Allocation of 8 bits to each of the four statistics is a compromise which does not place undue demands on the performance of a mass spectrometer and yet which yields very satisfactory results.

ACKNOWLEDGMENT

The authors thank A. Griffiths of the Department of Electrical Engineering, James Cook University of North Queensland, for useful discussions on some aspects of this work.

LITERATURE CITED

(1) L. E. Wangen, W. S. Woodward, and T. L. Isenhour, Anal. Chem., 43, 1605-1614 (1971).
(2) S. L. Grotch, Anal. Chem., 42, 1214-1222 (1970).
(3) S. L. Grotch, Anal. Chem., 43, 1362-1370 (1971).
(4) S. L. Grotch, Anal. Chem., 45, 2-6 (1973).
(5) L. Crawford and J. Morrison, Anal. Chem., 40, 1464-1469 (1968).
(6) B. A. Knock, I. C. Smith, D. E. Wright, R. G. Ridley, and W. Kelly, Anal. Chem., 42, 1516-1520 (1970).
(7) H. Hertz, R. Hites, and K. Biemann, Anal. Chem., 43, 681-691 (1971).
(8) C. F. Bender, H. D. Shepherd, and B. R. Kowalski, Anal. Chem., 45, 617-618 (1973).
(9) C. P. Anese, B.E. Thesis, James Cook University of North Queensland, Townsville, Australia, 1976.

RECEIVED for review December 30, 1976. Accepted April 19, 1977.

Teflon Powders for Near Infrared Spectra of Hydrides

Tryggvi Emilsson and Vakula S. Srinivasan*

Department of Chemistry, Bowling Green State University, Bowling Green, Ohio 43403

Obtaining spectra in the near infrared region (1 µm to 3.5 µm) is usually difficult because of interference from water (OH stretch). One resorts to potassium bromide pellets with protective windows or heavy water solutions under very dry conditions. We would like to report here the use of Teflon (DuPont) as a pressable matrix for the spectra in this region. Teflon powders have been used for making strongly water repelling electrodes with great integrity and mechanical stability (1). These powders are now available commercially, and they can be used very easily for mixing and molding. We used both Teflon 7A and 7C, obtained from E. I. du Pont de Nemours and Company, Wilmington, Del., in our experiments.


The spectra were run on a Beckman Acta IV and a Perkin-Elmer Model 337. The pellets were made by weighing 100 mg of Teflon and pressing in a die of 3/8-inch diameter at a pressure of 12 tons, under vacuum. In Figure 1(a), the background spectrum of a pellet is shown. One could see that the material is transparent in the near IR region, with increasing scattering as the wavelength decreased. In Figures 1(b) and 1(c) are shown the spectra of lithium aluminum hydride and sodium cyanoborohydride with the pellets made with very small quantities of each. The metal-hydrogen stretch (2) is dominant over the background. The stability of the pellets to moisture was clearly demonstrated