Anal. Chem. 1986, 58, 2439-2442
Pyrolysis-Mass Spectrometry Methodology Applied to Southeast Asian Environmental Samples for Differentiating Digested and Undigested Pollens
Stephan J. DeLuca and Kent J. Voorhees*
Department of Chemistry and Geochemistry, Colorado School of Mines, Golden, Colorado 80401
Emory W. Sarver
U.S. Army Chemical Research and Development Center, Aberdeen Proving Grounds, Maryland 21010
Pyrolysis-mass spectrometry with multivariate statistics has been used to classify southeast Asian environmental samples against a collection of standards (i.e., pollen, bee feces, honey, and beeswax). Discriminant analysis was the most successful supervised statistical procedure employed. By use of blind feces and pollen standards, a classification success rate of 95% was obtained. Unsupervised statistical methods were not totally successful in classifying unknowns; however, information on the chemical characteristics was obtained.
The origin of the so-called "yellow rain" occurrences in southeast Asia has been extensively debated. It has been reported that trichothecene mycotoxins were identified in suspected yellow rain samples as well as in biological samples from exposure victims. Mirocha and co-workers (1, 2) conducted gas chromatography/mass spectrometry analyses on leaves, grain, soil, a yellowish powder, blood, urine, and body tissues for Fusarium toxins. T-2 and HT-2 toxins were found in the biological samples, while T-2 toxin, diacetoxyscirpenol, deoxynivalenol, nivalenol, and zearalenone were found in the other samples. Levels of the toxins ranged from 1 ppm to about 100 ppm. Since these specific mycotoxins have not been shown to appear naturally on the substances investigated, the authors suggested that the substances were manufactured and distributed by man.
A high pollen content has been reported in some yellow samples collected in southeast Asia (3, 4). It was suggested that pollens might be used as a support or carrier for distribution of chemical agents. In recent reports, Nowicke and Meselson (5) and Seeley et al. (6) carried out in-depth palynological microscopy of samples allegedly associated with yellow rain attacks. In all samples studied, pollen of the varieties normally found in the southeast Asian area was identified. In addition, a number of different pollen varieties were found in a given sample, and no two samples showed the same pollen composition. These variations were consistent with a palynological study conducted on bee feces collected in India. The authors concluded that the suspected yellow rain samples that they had examined were probably feces from honey bees. Microscopy, however, is unable to directly determine whether the pollen grains have passed through a bee's digestive tract. Since pollen grains are digested as they pass through a bee, a chemical analysis should be able to directly distinguish between undigested and digested pollen grains.
A specific analysis has not been reported for any of the samples associated with "yellow rain" that would differentiate between digested and undigested pollens. Pyrolysis-mass spectrometry (Py-MS) has been extensively used for the characterization of biological and other polymeric materials (7-13). When combined with pattern recognition techniques, the procedure has been successfully used to determine compositional information for a number of different types of materials. For example, Windig and Meuzelaar (12), using Py-MS and factor analysis with graphical rotation, determined the biopolymer composition of a number of samples varying from water-soluble starches to whole grass leaves, and MacCarthy et al. (13) have recently shown that these methods can distinguish between humic acid and fulvic acid samples.
The purpose of this work was twofold: (1) to evaluate the ability of pattern recognition procedures with pyrolysis-mass spectrometry data to distinguish bulk chemical differences between bee feces and pollen samples and (2) to evaluate the ability of Py-MS/pattern recognition to distinguish between undigested pollens and fecal pollen grains.

Table I. Samples Used in the Study

sample no.   description
1-6(a)       bee fecal material
7-14         various pollen samples
15-25        southeast Asian samples
26-27        beeswax
28-29        honey
30-31        water-washed pollen

(a) Due to the extremely small amount of sample no. 6 provided, reproducible spectra could not be obtained. This sample was therefore eliminated from the data analysis.
EXPERIMENTAL SECTION
Samples. Pyrolysis-mass spectrometry was performed on a set of 31 different samples. Eleven samples were provided by the Chemical Research and Development Center (CRDC) as unknowns, i.e., no sample history or identification was given prior to analysis. The problem was to determine whether these samples were pollens, bee feces, or some other unrelated material. To provide a set of known samples with which to compare the unknowns, 20 samples of different origins, including bee feces, pollens, beeswax, and honey, were obtained from various sources. Table I summarizes sources and compositions (where known) of the samples. The set of pollen samples included different species and different collection methods. In order to determine if a simple extraction process could modify the pollens extensively, one pollen sample was extracted with dimethylformamide and two samples were washed with water. The feces samples were collected both under controlled conditions and in the environment. Furthermore, for some of the samples, the bees were fed nonpollen, high-protein diets (Beltsville diet). The pollen and feces suites thus represent an extremely broad variation of sample type.
Pyrolysis-Mass Spectrometry. The details of the Py-MS technique have been extensively discussed elsewhere (7, 8). For this study an Extrel SpectrEL quadrupole mass spectrometer interfaced with a Curie-point pyrolyzer was used. Low-energy electron ionization (15 eV) was used with a scan range of 45-245 amu and a scan speed of 1200 amu/s. The pyrolysis temperature was 610 °C with a rise time of 100 ms. Thirty spectra were
0003-2700/86/0358-2439$01.50/0 © 1986 American Chemical Society
ANALYTICAL CHEMISTRY, VOL. 58, NO. 12, OCTOBER 1986
summed together to produce each Py-MS spectrum. Each sample was suspended in methanol to an approximate concentration of 10 mg/mL and then applied dropwise to the Curie-point wires. Triplicate analyses were accomplished by running three sets of the 31 samples. For each set the order in which the samples were run was randomized. The samples were mounted and run in groups of six (and one group of seven), allowing approximately 30 min of pumping time between groups. A background sample (i.e., an uncoated wire) was run at the beginning of each group.
Data Analysis. Each Py-MS spectrum was collected as a set of raw intensities. The intensities were normalized by dividing each intensity by the sum of the intensities over the mass range m/z 72-160. This process minimizes the effects of background fluctuations, which are most severe outside of this range. Multivariate statistical calculations were carried out by using the ARTHUR (14) program subroutines on a DEC 1091 computer. Prior to multivariate statistical analyses, the data were autoscaled: for each feature (i.e., normalized intensity at a given m/z) in each spectrum, the mean of that feature over all spectra was subtracted and the result divided by the corresponding standard deviation. The multivariate statistical analyses performed included unsupervised learning procedures (i.e., principal components factor analysis with graphical rotations and nonlinear mapping) as well as supervised learning procedures (i.e., SIMCA and multilinear least-squares regression) (14-17). Prior to the generation of nonlinear maps the scaled data were orthogonalized by ARTHUR subroutines KAPRIN and KATRAN. This procedure ensures that the distances calculated in subroutine DISTAN are truly Euclidean.
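The normalization and autoscaling steps described above can be illustrated with a minimal sketch. It is not the ARTHUR code: the function names `normalize_spectra` and `autoscale`, the inclusive treatment of the m/z 72-160 window, the use of the sample (n - 1) standard deviation, and the synthetic intensities are all assumptions made for illustration.

```python
import numpy as np

def normalize_spectra(intensities, mz, lo=72, hi=160):
    """Divide every intensity in a spectrum by the summed intensity over m/z lo..hi.

    The window is treated as inclusive at both ends (an assumption).
    """
    intensities = np.asarray(intensities, dtype=float)
    mz = np.asarray(mz)
    window = (mz >= lo) & (mz <= hi)
    totals = intensities[:, window].sum(axis=1, keepdims=True)
    return intensities / totals

def autoscale(features):
    """Subtract each feature's mean over all spectra and divide by its standard deviation.

    The n - 1 form of the standard deviation is assumed here.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0, ddof=1)
    return (features - mean) / std, mean, std

# Synthetic stand-in for raw Py-MS data: 6 spectra scanned over m/z 45-245.
rng = np.random.default_rng(0)
mz = np.arange(45, 246)
raw = rng.uniform(1.0, 100.0, size=(6, mz.size))

normalized = normalize_spectra(raw, mz)
scaled, mean, std = autoscale(normalized)
```

Returning `mean` and `std` alongside the scaled data makes it possible to reuse the same scaling parameters on spectra that were not part of the original set.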
The maximum number of eigenvectors (these are the "factors" of factor analysis) was extracted (i.e., n - 1, where n is the number of spectra, since this was less than the number of m/z values); however, only the first ten factors were used in the distance calculations. Using more than ten factors did not improve separation of the various groups. These ten factors accounted for 72.5% of the total variance. During the same run, Karhunen-Loeve (K-L) projections were generated for the first ten factors. These projections allow visualization of the scores of each sample on each factor, two factors at a time. Factor rotations and associated factor spectra were generated. In factor spectra, the intensity of each peak is given by the loading of that m/z value on the factor multiplied by the standard deviation, calculated during autoscaling, for that m/z.
For the supervised learning routines, training and test/evaluation sets were specified by using ARTHUR subroutine CHDATA. The training set was then autoscaled, and the scaling parameters were applied to the test set. The SIMCA analyses were performed by using subroutines SIPRIN, SIUTIL, and SICLASS. One-, two-, and three-component models were tested on all training sets. Discriminant analysis was carried out with subroutine LEDISC, which is a multilinear least-squares regression routine designed for categorized data. Discriminant models were built from both the total nonorthogonalized data set and a factor-analyzed data set containing the first ten factors. Comparable results were obtained with both data sets.
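For the SIMCA step, a minimal sketch may help: one principal component model is fitted per training class, and each test spectrum is judged by its residual distance to that model, expressed in units of the spread of the training-set distances (two such units is the cutoff used in the Results). This is not a reproduction of SIPRIN, SIUTIL, or SICLASS. The function names, the root-mean-square definition of the spread, and the synthetic data are assumptions.

```python
import numpy as np

def residual_distance(samples, center, axes):
    """Length of the part of each sample that the component model does not explain."""
    centered = np.atleast_2d(samples) - center
    explained = centered @ axes.T @ axes
    return np.linalg.norm(centered - explained, axis=1)

def fit_class_model(train, n_components):
    """Fit a principal component model to one (already autoscaled) training class."""
    train = np.asarray(train, dtype=float)
    center = train.mean(axis=0)
    # Rows of vt are the principal axes of the centered training data.
    _, _, vt = np.linalg.svd(train - center, full_matrices=False)
    axes = vt[:n_components]
    # Spread of the training members about the model, taken here as the
    # root mean square of their residual distances (an assumed choice).
    scale = np.sqrt(np.mean(residual_distance(train, center, axes) ** 2))
    return center, axes, scale

def distance_in_std(samples, model):
    """Residual distance of each sample in units of the training-set spread."""
    center, axes, scale = model
    return residual_distance(samples, center, axes) / scale

# Synthetic training class: points scattered along one direction in 5 dimensions.
rng = np.random.default_rng(1)
direction = np.array([1.0, 2.0, 0.0, -1.0, 0.5])
train = rng.normal(size=(12, 1)) * direction + rng.normal(scale=0.05, size=(12, 5))
model = fit_class_model(train, n_components=1)

inside = model[0] + 1.5 * model[1][0]                        # lies on the fitted axis
outside = model[0] + np.array([0.0, 0.0, 3.0, 0.0, 0.0])     # far from the fitted axis
distances = distance_in_std(np.vstack([inside, outside]), model)
member = distances <= 2.0   # two-unit cutoff, as applied in the Results
```

Because each class is modeled on its own, a sample can fall inside several class models at once or inside none of them, which is the behavior reported for the feces and pollen training sets.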
Figure 1. Normalized pyrolysis-mass spectra of selected samples (horizontal axis: m/z).
RESULTS AND DISCUSSION
Figure 1 shows normalized Py-MS spectra for several samples. Visual inspection of the spectra gives some idea of the variability of pyrolysis products among the samples. The complexity of the spectra, however, makes it impossible to classify the samples visually. Each spectrum can be represented as a point in n-dimensional space, where n is the number of observed m/z values. A method for reducing this space to two dimensions, allowing visual inspection, is nonlinear mapping (NLM). Figure 2 shows the nonlinear map of the entire data set. The apices of each triangle represent replicate sample spectra. The NLM illustrates some general trends in the data. The wax and honey samples appear to be quite different from the majority of the known pollen and feces samples. The feces samples tend to fall closer to the wax samples than do the pollens. This is consistent with observations that bee feces contains a large (20-40%) amount of wax. Unknown sample
Figure 2. Nonlinear map of total data set.
21 appears to have a large wax component and is quite different from the rest of the unknowns. The NLM shown in
Figure 3. Nonlinear map of pollen and feces samples (including samples 30 and 31).

Table II. Results of Discriminant Analysis on the Test Set Members(a)

sample no.   class    av wt   sample information
15           feces    0.61    authentic feces
16           pollen   0.60    authentic pollen
17           pollen   0.64    pollen grains found/unknown origin
18           feces    0.74    authentic feces
19           feces    0.71    unknown origin
20           feces    0.62    unknown origin
22           pollen   0.71    authentic pollen
23           feces    0.85    no pollen grains found
24           feces    0.70    no pollen grains found
25           pollen   0.58    pollen grains may be present/unknown origin
30           pollen   0.75    known pollen
31           pollen   0.89    known pollen

(a) The second column gives the calculated class (either feces or pollen) determined by the average weight (these weights are regression coefficients, i.e., weight of feces class + weight of pollen class = 1) for the three replicates. The fourth column gives information on the samples that was determined by the CRDC and given to us after our analyses were complete (exception: samples 30 and 31 were known water-washed pollens used as an evaluation set).
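The class weights in Table II can be read as outputs of a least-squares regression on class indicators, averaged over the three replicates of a sample. The sketch below illustrates that idea only. It does not reproduce subroutine LEDISC, and the function names, the intercept term, the two-feature synthetic data, and the replicate values are all assumptions.

```python
import numpy as np

def fit_discriminant(train, labels, classes):
    """Least-squares regression of 0/1 class indicators on autoscaled features."""
    train = np.asarray(train, dtype=float)
    # Scaling parameters come from the training set only.
    mean = train.mean(axis=0)
    std = train.std(axis=0, ddof=1)
    design = np.column_stack([np.ones(len(train)), (train - mean) / std])
    targets = np.array([[1.0 if lab == c else 0.0 for c in classes] for lab in labels])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return mean, std, coef

def class_weights(samples, model):
    """Apply the training-set scaling to new samples and return one weight per class."""
    mean, std, coef = model
    samples = np.atleast_2d(np.asarray(samples, dtype=float))
    design = np.column_stack([np.ones(len(samples)), (samples - mean) / std])
    return design @ coef

# Synthetic training set: two well-separated groups in two features.
rng = np.random.default_rng(2)
classes = ["feces", "pollen"]
train = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(10, 2)),   # stand-in for feces spectra
    rng.normal(loc=3.0, scale=0.5, size=(10, 2)),   # stand-in for pollen spectra
])
labels = ["feces"] * 10 + ["pollen"] * 10
model = fit_discriminant(train, labels, classes)

# Three replicates of one hypothetical test sample.
replicates = np.array([[3.1, 2.9], [2.8, 3.2], [3.0, 3.0]])
weights = class_weights(replicates, model)   # one row per replicate
average = weights.mean(axis=0)               # average weight per class
assigned = classes[int(np.argmax(average))]
```

With an intercept in the model and a full-rank design matrix, the two class weights of any sample sum to 1, which matches the property stated in the table footnote.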
correctly classified. The last column of Table II contains information on the unknown samples which was provided after the analyses had been completed. Four of the samples were actually authentic pollens (16 and 22) or feces (15 and 18), which provided a blind evaluation set. The discriminant analysis correctly classified samples 16, 22, 30, and 31 as pollens and samples 15 and 18 as feces. Samples 17, 19, 20, 21, 23, 24, and 25 were true unknown environmental samples on which some analyses had been conducted by the army laboratories. Samples 17 and 25 were classified by discriminant analysis as pollen. The nonlinear map and K-L plots showed that the classification of these samples as pollen was consistent with the unsupervised learning results. Similar agreement between discriminant analysis and unsupervised learning was obtained for samples 19, 20, 23, and 24. These samples, classified as feces, clearly plotted in an area of the nonlinear map and factor 1 vs. 2 K-L plot dominated by various feces samples.
Since a portion of bees' diets consists of pollen, it is not surprising that the feces and pollen samples are chemically quite similar, as complete digestion does not occur. The digestive process probably extracts specific classes of compounds such as proteins (6). With this information it was reasoned that the Py-MS spectra of feces samples would show lower intensities of protein pyrolysis products than pollen samples. Large peaks at m/z 48 and 117 are typical in Py-MS spectra of protein samples (8). A plot was generated in which the samples were plotted as a function of their normalized m/z 48 vs. 117 intensities (Figure 5). With two exceptions, the pollens show higher protein content than the feces. Sample 3, which shows a high m/z 48, is a feces sample from bees that were fed a controlled diet which contained no pollen but high protein and thus shows a high protein content.
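The bivariate comparison described above amounts to picking two columns out of the normalized spectra. A minimal sketch follows, in which the function name `marker_intensities` and every intensity value are invented for illustration and carry no experimental meaning.

```python
import numpy as np

def marker_intensities(normalized, mz, markers=(48, 117)):
    """Return the normalized intensities at the requested m/z values, one row per sample."""
    normalized = np.asarray(normalized, dtype=float)
    mz = np.asarray(mz)
    columns = [int(np.flatnonzero(mz == m)[0]) for m in markers]
    return normalized[:, columns]

# Invented normalized spectra over m/z 45-245: two "pollen" and two "feces" rows.
mz = np.arange(45, 246)
spectra = np.full((4, mz.size), 0.001)
labels = np.array(["pollen", "pollen", "feces", "feces"])
i48 = int(np.flatnonzero(mz == 48)[0])
i117 = int(np.flatnonzero(mz == 117)[0])
spectra[:, i48] = [0.020, 0.018, 0.008, 0.006]
spectra[:, i117] = [0.015, 0.017, 0.007, 0.005]

points = marker_intensities(spectra, mz)          # columns: m/z 48, m/z 117
pollen_mean = points[labels == "pollen"].mean(axis=0)
feces_mean = points[labels == "feces"].mean(axis=0)
```

Each row of `points` is one coordinate pair for a plot of m/z 48 against m/z 117, so samples with a higher content of protein pyrolysis products lie further from the origin.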
Sample 10 is a pollen that shows a low protein content; however, its water-washed analogue shows a relatively high protein content. We cannot account for this observation. Unknown sample 22 shows a high protein content, indicating that it is either a pollen or a feces from a bee that was fed a high-protein diet. The remainder of the unknown samples have intermediate protein values.

Figure 5. Bivariate plot of total data set on m/z 48 vs. m/z 117 (axes: RI at m/z 48 and RI at m/z 117).

Unsupervised learning procedures alone were not completely successful in classification of the unknown samples. SIMCA supervised learning procedures using the known pollen and feces samples as separate training sets were used to determine the similarity of the unknowns to the training sets. In the SIMCA routines a principal component model is established for the training set. The distances of the training set members to the model are calculated, and a standard deviation of these distances is determined. The test set members can then be classified as either belonging to the training set or not, depending on their distances (in standard deviations) to the model. A distance of over two standard deviations is considered outside of the model. Table III presents the results of the SIMCA analysis for the unknown samples. This table shows the distances of each sample in the test set in relation to each of the training sets based on a two standard deviation cutoff point. Six of the samples fall outside both the feces and pollen training sets, including sample 21, which was shown by factor analysis to have a high wax content. Except for sample 20, samples that fall below two standard deviations with one training set also fall below two standard deviations with the other training set. In addition, many known feces samples fell within the pollen training set and vice versa. These results are probably due to the chemical similarities of the pollen and the feces samples.
The SIMCA method did not provide the necessary information for classifying the samples. For instance, the SIMCA model for the feces set is based on the chemical components (actually the pyrolysis products) that are common in each of the various feces samples. These components may also be common to the pollen samples, and thus pollens are misclassified as feces. In contrast, discriminant analysis directly compares the two classes to build a discriminant model. In this study, discriminant analysis was clearly superior to SIMCA analysis for differentiating feces from pollen samples. It may be possible, however, to build SIMCA models using cross-validation of components which would give satisfactory results.

Table III. Distance(a) of Unknown Samples to the Pollen and Feces Training Sets

sample no.   av distance to pollen training set (std dev)   av distance to feces training set (std dev)
15           1.9                                            1.6
16           3.8                                            3.5
17           1.5                                            1.5
18           5.4                                            5.2
19           6.6                                            5.9
20           2.6                                            1.7
21           8.1                                            8.3
22           1.3                                            1.3
23           3.0                                            2.7
24           3.3                                            2.9
25           1.9                                            2.0

(a) Distance is given in number of standard deviations, averaged over the three replicates.

CONCLUSIONS
Both supervised and unsupervised learning procedures were necessary to analyze the Py-MS data. Unsupervised learning (i.e., factor analysis and NLM) showed that the Py-MS data contain enough chemical information to distinguish bee feces from pollens. In addition, samples that were grossly different from feces and pollen (i.e., waxes and honeys) were identified through unsupervised learning. Sample 21 was shown by factor analysis to be mostly wax, in concurrence with data from the army. Discriminant analysis successfully classified all authentic feces and pollen samples, while 95% of the training set members were successfully classified. The results of the analyses of the unknowns (suspected "yellow rain" samples) showed that three samples were feces and two were pollens. Comparison of the discriminant analysis results for the five unknowns with the unsupervised learning results showed excellent correspondence. With pollen and feces samples as training sets, the SIMCA analysis failed to distinguish between feces and pollen samples and incorrectly classified two authentic samples (no. 16 and 18) as belonging to neither the pollen nor the feces training set. This study illustrates that pattern recognition procedures can distinguish between undigested pollen and fecal pollen samples based on their pyrolysis-mass spectra with confidence levels approaching 95%.
Registry No. H2O, 7732-18-5.
LITERATURE CITED
(1) Watson, S. A.; Mirocha, C. J.; Hayes, A. W. Fundam. Appl. Toxicol. 1984, 4, 700-717.
(2) Mirocha, C. J.; Pawlosky, R. A.; Chatterjee, K.; Watson, S.; Hayes, W. J. Assoc. Off. Anal. Chem. 1983, 66, 1485-1499.
(3) Marshall, E. Science 1983, 221, 242-244.
(4) Ashton, P. S.; Meselson, M.; Robinson, J.; Perry, P.; Seeley, T. D. Science 1983, 222, 366-368.
(5) Nowicke, J. W.; Meselson, M. Nature (London) 1984, 309, 205-206.
(6) Seeley, T. D.; Nowicke, J. W.; Meselson, M.; Guillemin, J.; Akratanakul, P. Sci. Am. 1985, 253, 128-137.
(7) Analytical Pyrolysis: Techniques and Applications; Voorhees, K. J., Ed.; Butterworths: London, 1984.
(8) Meuzelaar, H. L. C.; Haverkamp, J.; Hileman, F. D. Pyrolysis of Recent and Fossil Biomaterials: Compendium and Atlas; Elsevier: Amsterdam, 1982.
(9) Irwin, W. J. Analytical Pyrolysis, A Comprehensive Guide; Marcel Dekker: New York, 1982.
(10) Tsao, R.; Voorhees, K. J. Anal. Chem. 1984, 56, 1339-1343.
(11) Voorhees, K. J.; Tsao, R. Anal. Chem. 1985, 57, 1630-1636.
(12) Windig, W.; Meuzelaar, H. L. C. Anal. Chem. 1984, 56, 2297-2302.
(13) MacCarthy, P.; DeLuca, S. J.; Voorhees, K. J.; Thurman, E. M.; Malcolm, R. L. Geochim. Cosmochim. Acta 1985, 49, 2091-2096.
(14) Harper, A.; Duewer, D. L.; Kowalski, B. R.; Fasching, J. L. In Chemometrics: Theory and Applications; Kowalski, B. R., Ed.; American Chemical Society: Washington, DC, 1977; ACS Symp. Ser. 52, pp 14-52.
(15) Rummel, R. J. Applied Factor Analysis; Northwestern University Press: Evanston, IL, 1970.
(16) Cattell, R. B. Factor Analysis; Harper and Brothers: New York, 1976.
(17) Blomquist, G.; Johansson, E.; Soderstrom, B.; Wold, S. J. Chromatogr. 1979, 7, 19-25.
RECEIVED for review February 12, 1986. Accepted June 1, 1986.