
Anal. Chem. 1997, 69, 4249-4255

Two-Dimensional Fourier Compression

Chunsheng Cai and Peter de B. Harrington*

Center for Intelligent Chemical Instrumentation, Department of Chemistry, Clippinger Laboratories, Ohio University, Athens, Ohio 45701-2979

Dennis M. Davis

Chemical and Biological Detection Research Team, U.S. Army Edgewood Research Development and Engineering Center, Aberdeen Proving Ground, Maryland 21010-5432

A two-dimensional Fourier compression method has been developed as a tool for portable sensors. Ion mobility spectrometry (IMS) yields an advanced chemical sensor for monitoring trace quantities of compounds in air. Two-dimensional Fourier compression can increase the compression efficiency without compromising the quality of compressed data. A criterion for the automatic determination of the compression efficiency or cutoff frequency has been developed and evaluated with IMS data. IMS data were compressed by 97% without significant loss of information.

Compression of analytical data was an active area of research in the 1970s and 1980s. This research was driven by the accumulation of large collections of digital data, as well as the limited storage and processing capabilities of laboratory computers. In the 1990s, computer technology developed so that laboratory computers could store and process larger quantities of data generated by analytical instrumentation. The importance of data compression has resurged due to two new trends in analytical chemistry. The first is the development of miniaturized instruments, for which the size of the instrument may be smaller than the size of the desktop or notebook computer. Such instruments may be equipped with onboard processors with limited storage capability for data collection and processing. The second trend is the use of sensors for monitoring. These instruments continually generate data that may exceed the capacity of computer storage devices. In many cases, a complete temporal record of the monitoring event is desired. Compression methods may record all the data generated by the sensor and provide the added advantage of discarding noise from the data. Ion mobility spectrometry (IMS) is an area where data compression is important.
IMS yields small portable instruments that may detect part-per-billion concentrations of volatile compounds in air.1,2 These hand-held instruments are capable of generating data for several days, which would overwhelm most portable recording devices. IMS is a powerful but inexpensive technique for monitoring gas-phase samples. These sensors offer high sensitivity, low detection limits, and fast response. The instruments used in this work generate spectra at a rate of 40 Hz. A monitoring period of several hours could generate millions of spectra. One approach is to signal-average the spectra, but important temporal information would be lost. Therefore, compression methods that facilitate the handling of large collections of data are beneficial.

(1) Hill, H. H.; Siems, W. F.; St. Louis, R. H.; McMinn, D. G. Anal. Chem. 1990, 62, 1201A-1209A.
(2) Eiceman, G. A.; Karpas, Z. Ion Mobility Spectrometry; CRC Press: Boca Raton, FL, 1994.

S0003-2700(97)00458-7 CCC: $14.00 © 1997 American Chemical Society

The Fourier transform (FT) is widely used for signal processing, such as convolution, correlation, smoothing, etc.3 FT methods are also useful for data compression.4-6 For this paper, the native domains are drift time and sample acquisition time. The transform domain is drift frequency and sample frequency. The time domain will henceforth refer to the native domain for the data, and frequency will refer to the Fourier domain.

Efficiency and efficacy are two figures of merit for compression. Compression efficiency is determined by a cutoff frequency that specifies the number of transformed points that will be stored. Compression efficacy is a measure of the similarity of the regenerated spectra compared to the spectra before compression. A tradeoff exists between compression efficiency and efficacy such that if the data are overcompressed, relevant information may be lost. After the data are transformed, a reduced number (Nc) of frequency domain points are stored. In the literature, Nc is referred to as the cutoff point and is the key parameter in optimizing the compression. In all the compression methods, only the positive frequencies are used, because the negative frequency domain information is redundant.

Various methods for smoothing and compression applications have been described in the literature for the determination of the cutoff point. Bush proposed a method that used an iterative calculation of standard deviation of the higher frequency points.4 The last portion (e.g., 15/16) of the frequency domain points was used to calculate the standard deviation. The calculation was repeated, each time adding a single adjoining lower frequency point.
This procedure was repeated until a significant increase in the standard deviation was encountered (i.e., 5%). The point where the standard deviation increased by more than 5% was chosen as the cutoff (Nc). Maldacker et al. used a simpler method to determine the cutoff point.5 In their method, the cutoff point (Nc) was the lowest frequency for which the absolute value was less than 0.1% of the maximum absolute value. This method was later modified by inclusion of a tolerance factor.6 The alternative use of a power spectrum was also suggested.7 In this method, the cutoff point was selected where the power spectrum exceeded the noise power by a factor of 5. The noise power was defined as the maximum value in the 50 highest frequency points.

(3) Press, W. H.; Teukolsky, S. A.; Vetterling, W. T.; Flannery, B. P. Numerical Recipes in C; Cambridge University Press: Cambridge, U.K., 1992; Chapters 12 and 13.
(4) Bush, C. A. Anal. Chem. 1974, 46, 890-895.
(5) Maldacker, T. A.; Davis, J. E.; Rogers, L. B. Anal. Chem. 1974, 46, 637-642.
(6) Binkley, D. P.; Dessy, R. E. Anal. Chem. 1980, 52, 1335-1344.

Analytical Chemistry, Vol. 69, No. 20, October 15, 1997

All of the above methods involve arbitrary factors. The equivalent width criterion, proposed by Lam and Isenhour, avoids the problem of arbitrary factors.8 The equivalent width in the frequency domain was calculated from the width of the narrowest peak in the time domain spectrum. This width was used for determining Nc. This method works well for spectra of uniform peak shape. However, the method is biased in favor of retaining noise for low signal-to-noise ratios. Another method to determine Nc that was suggested by Chau and Tam used a least-squares fit of the frequency domain spectrum to a fourth-degree polynomial.9 The first derivative of the polynomial was calculated with respect to frequency, and the first minimum was assigned as the cutoff point.

It is difficult to select any single method for the determination of Nc, because different data sets have characteristics that may be suitable for a specific method. The problem of determination of the frequency domain cutoff point (Nc) is more complex for multidimensional compression. Therefore, a simple and rapid calculation that can be applied to large sets of data is desirable.

IMS data have many features that make them suitable for Fourier compression. The FT of a single IMS spectrum can be satisfactorily represented by the low-frequency coefficients. This trait is due to the relatively wide Gaussian bands that occur in spectra obtained from low-resolution instruments. Furthermore, for most portable ion mobility spectrometers, the instrument response is relatively slow with respect to changes in analyte concentration.
The slow changes arise due to the introduction of the sample across a membrane located in the inlet of the instrument. In addition, chemical adsorption is used to remove analytes from the IMS instrument. The removal of the analyte from within the instrument is a relatively slow process. When the instrument is removed from the sample, a slow decay of the signal may be observed. This slow and continuous response makes Fourier compression on the sample acquisition time dimension attractive.

In this study, a two-dimensional FT method has been developed to compress the IMS data. Two-dimensional Fourier compression should increase the compression efficiency without compromising the quality of compressed spectra. For example, if each dimension is compressed by 80% (i.e., 20% of the points are retained), compressing both dimensions together retains 0.2 × 0.2 = 4% of the points, which is a 96% compression. Thus, one-dimensional FT compression on either the drift time (i.e., resolution element axis for IMS) or the sample acquisition time alone is less effective.

For this work, compression efficacy is measured by the regeneration error. Removal of the high-frequency noise components by compression will contribute to the regeneration error, because the regenerated spectrum will differ from the original spectrum. Therefore, this work has a conservative bias and higher compression efficiencies may be attainable.

A modified method of selecting the cutoff point (Nc) that was suitable for multidimensional compression was devised. This method is based on the standard deviation method described earlier.4 For comparison, a second criterion for the cutoff point used the fraction of the largest frequency domain component as an alternative method. For data processing methods, such as principal component analysis or partial least-squares calibration, the sample acquisition dimension may be transformed back into the time domain dimension. The other dimension would remain in the compressed frequency domain. Due to the linear properties of the FT, either dimension may be uncompressed separately.

(7) Kirmse, D. W.; Westerberg, A. W. Anal. Chem. 1971, 43, 1035-1039.
(8) Lam, R. B.; Isenhour, T. L. Anal. Chem. 1981, 53, 1179-1182.
(9) Chau, F. T.; Tam, K. Y. Comput. Chem. 1994, 18, 13-20.

THEORY

For data sampled at evenly spaced intervals in time, the discrete Fourier transform can be performed rapidly by the fast Fourier transform (FFT). The discrete Fourier transform is

$$H_n = \sum_{k=0}^{N-1} h_k \, e^{2\pi i k n/N} \qquad (1)$$

for which h and H are discrete data in the time and frequency domains, respectively.3 Most FFT algorithms require that the number of components (N) be a power of 2 for each object. This requirement is accomplished by appending zero values to the object. Typically, the two domains have the same number of components. The inverse Fourier transform (IFT) is

$$h_k = \frac{1}{N} \sum_{n=0}^{N-1} H_n \, e^{-2\pi i k n/N} \qquad (2)$$
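As a numerical check of the sign and normalization conventions in eqs 1 and 2, the following minimal sketch evaluates both sums directly (an O(N²) loop rather than an FFT) and confirms that the round trip returns the original data. The function names `dft` and `idft` and the 8-point test signal are hypothetical and are not taken from the paper.

```python
import cmath
import math

def dft(h):
    """Eq 1: H_n = sum over k of h_k * exp(+2*pi*i*k*n/N)."""
    n_pts = len(h)
    return [sum(h[k] * cmath.exp(2j * math.pi * k * n / n_pts) for k in range(n_pts))
            for n in range(n_pts)]

def idft(H):
    """Eq 2: h_k = (1/N) * sum over n of H_n * exp(-2*pi*i*k*n/N)."""
    n_pts = len(H)
    return [sum(H[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts)) / n_pts
            for k in range(n_pts)]

# Hypothetical 8-point "spectrum" used only to exercise the two equations.
signal = [0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0, 0.0]
spectrum = dft(signal)
recovered = idft(spectrum)

# Largest difference between the original and the round-trip data.
max_error = max(abs(a - b) for a, b in zip(signal, recovered))
# The zero-frequency component is the sum of all time domain intensities.
zero_frequency_matches_sum = abs(spectrum[0] - sum(signal)) < 1e-9
```

The positive exponent in the forward transform follows the convention of eq 1; many libraries use the opposite sign, which swaps positive and negative frequencies but does not change the round-trip result.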

The original time domain data will be regenerated when all the frequency components are used. For experimental data, many of the high-frequency points characterize noise. This part of the frequency domain can be removed without loss of relevant information. The compression is achieved by storing the reduced number of frequency points, which is the principle of Fourier compression. In order to regenerate the time domain data, zeros are appended on the reduced Fourier coefficients, so the regenerated data will be the same size. The data are regenerated with the IFT (eq 2). One may also reduce the number of data points and compress the time domain spectrum by performing the IFT without the zero-filling step. The two-dimensional FT equation is

$$H(n_1,n_2) = \sum_{k_2=0}^{N_2-1} \sum_{k_1=0}^{N_1-1} h(k_1,k_2) \, e^{2\pi i k_2 n_2/N_2} \, e^{2\pi i k_1 n_1/N_1} \qquad (3)$$
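The truncate, zero-fill, and invert sequence described before eq 3 can be sketched in one dimension as follows. This is a minimal illustration under stated assumptions, not the authors' program: the names `compress` and `regenerate`, the synthetic 64-point Gaussian peak, and the cutoff of 12 points are all hypothetical, and the transforms are direct evaluations of eqs 1 and 2 rather than an FFT.

```python
import cmath
import math

def dft(h):
    """Eq 1, evaluated directly."""
    n_pts = len(h)
    return [sum(h[k] * cmath.exp(2j * math.pi * k * n / n_pts) for k in range(n_pts))
            for n in range(n_pts)]

def idft(H):
    """Eq 2, evaluated directly."""
    n_pts = len(H)
    return [sum(H[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts)) / n_pts
            for k in range(n_pts)]

def compress(h, nc):
    """Transform h and store only the nc lowest non-negative frequencies."""
    return dft(h)[:nc]

def regenerate(kept, n_pts):
    """Zero-fill the stored coefficients to n_pts points and invert (eq 2)."""
    if len(kept) > n_pts // 2:
        raise ValueError("this sketch assumes nc <= n_pts / 2")
    full = [0j] * n_pts
    for n, value in enumerate(kept):
        full[n] = value
        if n > 0:
            # Negative frequencies of real data are redundant: they are the
            # complex conjugates of the positive ones, so rebuild them here.
            full[n_pts - n] = value.conjugate()
    return [v.real for v in idft(full)]

# Synthetic broad Gaussian band (assumed shape, not measured IMS data).
n_pts, nc = 64, 12
peak = [100.0 * math.exp(-0.5 * ((k - 32) / 4.0) ** 2) for k in range(n_pts)]

kept = compress(peak, nc)
restored = regenerate(kept, n_pts)
max_error = max(abs(a - b) for a, b in zip(peak, restored))
fraction_discarded = 1.0 - nc / n_pts   # counted in frequency points
```

Because the band is wide relative to the sampling interval, its transform is concentrated in the lowest frequencies and the regenerated peak is very close to the original. Note that the stored coefficients in this sketch are complex; the reflection procedure described below makes them real, so that each stored point is a single number.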

The multidimensional transform can be obtained by two Fourier transforms applied sequentially to the rows and the columns of the data matrix. The order of transformation does not matter, because the two-dimensional transform in eq 3 is separable, so the row and column transforms commute. The first FT was applied on the drift time dimension (i.e., resolution element) as the spectra were read. The frequency domain compressed spectra were stored as rows of the data matrix. The computational procedure that was used is given in Figure 1. The spectra were reflected about the time origin, so that the frequency domain spectrum was composed of real numbers. This method did not provide any benefits other than simplifying the computation. The spectra were increased to a length of the next largest power of 2. Reflecting the spectrum in this way is equivalent to performing the discrete cosine transform. For example, the 1300 points in panel A were extended to 2048 points. The extended spectrum was wrapped around to make it an even function (symmetric with respect to the midpoint), as shown in panel B. The additional points between the spectrum and its reflection were filled with the intensity from the largest drift time value (i.e., last point collected). This procedure reduced sidebands in the regenerated spectra. After the FT, the frequency domain data are given in panel C. Because the frequency domain spectrum is symmetric, only the positive frequencies of the spectrum (first 2048 points) are needed to regenerate the spectrum in panel B. The values in the high-frequency region characterize noise in the spectra and can be eliminated to accomplish the compression. For the FT of the sample acquisition dimension, the procedure was followed in the same way, except each column of the data matrix was transformed instead of each row.

Figure 1. General description of the Fourier transform algorithm: (A) IMS spectrum, which has 1300 points; (B) spectrum padded with the last point to the length of the next power of 2 (2048 points) and wrapped around to 4096 points; (C) real part of the Fourier-transformed spectrum with the zero-frequency component omitted; (D) first 1300 points of the positive frequencies.

EXPERIMENTAL SECTION

The IMS spectrometer was a handheld device, the Chemical Agent Monitor (Model CAM 482-301N; Graseby Ionics Ltd., Watford, Herts, U.K.), and was used with a single modification. The reagent chemistry was based on water rather than acetone. The modification removed the reagent gas source from the recirculating gas system of the instrument. Spectra were collected with an IBM-compatible 486 PC using WASP software, version 1.35 (Graseby Ionics). The data were acquired at an 80 kHz acquisition rate, and the drift time range was 1-17 ms.

House air was dried using a Whatman Model 76-02 air drying tower, and the air pressure was maintained at 138 kPa with a regulator. Subsequent to the Whatman dryer, two additional drying towers (4 cm i.d. × 17 cm length) were connected in series. The drying towers were packed with 2-3 cm layers of grade 44 indicating silica gel, 3-8 mesh size, a 1:1 mix of 13X and 5A molecular sieves, and Pyrex Fiberglas glass wool. The towers were used to scrub the dried air of residual organic compounds. The dry air flow rate was regulated with a Cole-Parmer (part number FM064-62) flowmeter that maintained a 3 L/min air flow to the proximal end of a Plexiglas sampling compartment or box. The box had a removable lid and was constructed with dimensions of 35 × 12 × 12 cm (length, height, width). Metal screws rather than adhesives were used to assemble the box so that contamination from adhesive solvents could be avoided. The sample port for the ion mobility spectrometer was a 6 cm high × 8.5 cm wide opening at the distal end of the sample compartment.

The samples were placed in 4.0 mL glass vials that were filled three-fourths full with organic solvents. The vials were placed in the Plexiglas sample compartment approximately 14 cm from the inlet nozzle of the ion mobility spectrometer. Three organic compounds, p-xylene (99.6%, Fisher Scientific, Fair Lawn, NJ, Lot No. 952739), dichloromethane (99.5%, Spectrum, Gardena, CA, Lot No. EF109), and 1-pentanol (99.3%, Fisher Scientific, Lot No. 933539), were used to generate the data and evaluate the compression method. The samples were placed in the sample compartment and removed as the IMS spectrometer measured the air in the compartment. These experiments were designed to simulate a monitoring event for which samples may be exposed to the instrument for short periods.

Five data sets were collected. Data sets 1-4 were all collected in positive ion mode, and data set 5 was collected in negative ion mode. The first data set was obtained from 1-pentanol. The vial was placed in the sample compartment at scan number 10 and was removed at scan number 60. The data acquisition software averaged 10 spectral scans for each spectrum that was stored. Each spectrum was composed of 1300 data points, and 170 spectra were collected. The entire data set is given in Figure 2 with an elapsed time of approximately 6 min. This set was used as an example to evaluate the multidimensional compression.

Figure 2. Data set 1: 170 positive ion spectra of 1-pentanol. Each spectrum is an average of 10 IMS scans.

Data set 2 was acquired from 1-pentanol as well, except that the data acquisition software did not signal-average. Therefore, the signal-to-noise ratio was lower for this data set, because each spectrum was a single IMS scan. For this experiment, the vial of 1-pentanol was added to the sample compartment four times. Data set 3 was collected with a vial of p-xylene that was rapidly added and removed from the sample compartment. The goal was to evaluate the most rapid change in sample concentration on the compression of the sample acquisition dimension. This data set was composed of 1000 single-scan spectra. The data set was collected in a 6 min period.

Data sets 4 and 5 were respectively composed of positive and negative ion spectra obtained from dichloromethane. For these data sets each spectrum was an average of 10 scans. These data were not acquired from the sample compartment but from a vapor generator that has been described in detail elsewhere.10 The positive spectra were acquired at 5 vapor-phase concentrations (ppm/no. of spectra): 50/10, 110/50, 860/38, 1150/50, and 6420/40. This data set was composed of 388 spectra. The negative ion spectra were collected at 3 different vapor-phase concentrations (ppm/no. of spectra): 130/100, 1300/100, and 6420/200. Four hundred spectra comprised data set 5.

The data processing programs were written in C/C++; the FFT code was from ref 11.
These programs were compiled in Watcom Version 11.0 (Sybase Inc., Canada) and run on a Pentium Pro 200 MHz computer equipped with 64 MB of RAM, which operated under MS-Windows NT 4.0. The figures were generated with Axum 5.0A for Windows (Mathsoft Inc.).

(10) Harrington, P. B.; Reese, E. S.; Rauch, P. J.; Hu, L.; Davis, D. M. Appl. Spectrosc. 1997, 51, 808-816.
(11) Dobbe, J. G. G. Dr. Dobb's J. Software Tools Professional Programmer 1995, February, 125-133.

RESULTS AND DISCUSSION

IMS separates ions based on volume to charge ratios, so that larger ions will be observed at longer drift times. When no analytes are present, a peak is observed that is due to the reactant ions. The reactant ions are typically protonated water clusters, when air is used as the reagent for the atmospheric pressure chemical ionization (APCI) process. This peak may be observed at 5.85 ms and is the only peak at the early scan numbers, because the sample has not yet been introduced to the compartment. When 1-pentanol was introduced into the sample compartment, a 1-pentanol peak increased rapidly at 7.38 ms. Peaks that are apparently dimer and trimer ions occur at 8.93 and 10.78 ms, respectively. The reactant ion decreased with the increase of the analyte peaks, which was caused by conservation of charge. When 1-pentanol was removed from the sample compartment, the oligomer ions decreased at different rates. The trimer decreased the fastest, and the monomer decreased the slowest.

For the large number of spectra in the data sets, the cutoff points for each spectrum should be relatively similar. When this case arises, a global cutoff point may be assigned for all the spectra, and each spectrum will be compressed to the same degree. For the methods described previously, the level of computation may prevent the rapid determination of a global cutoff point. An advantage of reflecting the spectra, so that only the real coefficients are calculated, is that the determination of the cutoff point is simpler because complex numbers are avoided. The frequency domain spectra are symmetric; thus, all the negative frequencies can be discarded and the number of points used for the determination of the cutoff is N/2.

Two methods were used for selecting the cutoff point. The first is referred to as the maximal value method and is the procedure described by Maldacker et al.5 The other method is referred to as the standard deviation method, which is a modification of Bush's method.4 This method calculates the standard deviation from the intensities that correspond to the upper half of the positive frequencies.
The standard deviation was calculated only once for each spectrum, and this value was used to determine the cutoff frequency.

For the maximal value method, the maximal value Imax of the frequency spectrum was first determined. The intensities were compared as a function of frequency from highest to lowest frequency until a point was found for which the absolute value was larger than a fraction m of Imax. The next largest frequency was selected as the cutoff. All frequency components that corresponded to the cutoff frequency and higher were discarded as part of the compression. The parameter m was studied at values that ranged between 10⁻⁴ and 3.0 × 10⁻³.

For the standard deviation method, the frequency components that were larger than the median frequency were used to estimate the standard deviation in panel D of Figure 1. The frequency domain spectrum was searched from the median frequency to low frequency for an intensity that was larger than a factor n of the standard deviation plus the mean. The parameter n was varied from 5 to 50. As with the maximal value method, the next largest frequency point was used as the cutoff.

The maximal value and standard deviation methods are complementary. The former looks for a point in the frequency spectrum whose magnitude exceeds a fraction of the total integrated signal. The zero-frequency component typically has the largest magnitude, because it represents the sum of all the intensities in the spectrum. The latter method seeks to find a point that occurs above the noise level, as measured by the high-frequency components of the spectra. Therefore, the maximal value method utilizes signal, and the standard deviation method utilizes noise.
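The two cutoff criteria described above can be sketched as follows for a list of real, positive-frequency coefficients. This is a hedged illustration rather than the authors' code: the function names and the synthetic coefficient list are hypothetical, and the text does not state whether signed or absolute intensities are compared in the standard deviation method, so the sketch assumes the absolute deviation from the noise mean.

```python
import math

def cutoff_maximal_value(coeffs, m=1e-3):
    """Scan from the highest frequency down; stop at the first coefficient
    whose absolute value exceeds m * Imax. Return the number of points kept."""
    i_max = max(abs(c) for c in coeffs)
    for i in range(len(coeffs) - 1, -1, -1):
        if abs(coeffs[i]) > m * i_max:
            return i + 1          # the next higher frequency is the cutoff
    return 0

def cutoff_standard_deviation(coeffs, n=25):
    """Estimate noise from the upper half of the frequencies, then scan from
    the median frequency down for a point more than n standard deviations
    from the noise mean (assumption: absolute deviation is compared)."""
    half = len(coeffs) // 2
    noise = coeffs[half:]
    mean = sum(noise) / len(noise)
    std = math.sqrt(sum((c - mean) ** 2 for c in noise) / len(noise))
    for i in range(half - 1, -1, -1):
        if abs(coeffs[i] - mean) > n * std:
            return i + 1
    return 0

# Synthetic spectrum: a decaying signal plus small alternating "noise".
# These numbers are invented for illustration only.
coeffs = [1000.0 * math.exp(-i / 4.0) + 0.01 * (-1) ** i for i in range(256)]

nc_max = cutoff_maximal_value(coeffs, m=1e-3)
nc_std = cutoff_standard_deviation(coeffs, n=25)
global_nc = max(nc_max, nc_std)   # a conservative choice keeps the larger cutoff
```

Both functions need only one pass over the coefficients, which is consistent with the stated goal of a simple and rapid calculation. For a whole data set, the same functions could be applied to every spectrum and the largest result used as a global cutoff.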

Figure 3. Relationships of the compression efficiency (Nc) and efficacy (σd) as a function of parameters m for the maximal value method and n for the standard deviation method.

Two figures of merit were used for compression efficiency and efficacy. The compression ratio (Rc) measures efficiency and is defined as

$$R_c = 1 - \frac{p' \times q'}{p \times q} \qquad (4)$$

for which p is the number of spectra, and q is the number of points of a spectrum for the original data, and p′ and q′ are the numbers of reduced points after compression. Compression efficacy is characterized by the regeneration error. The regeneration error for entire sets of data was measured by the root mean square difference (σd), which is defined as

$$\sigma_d = \sqrt{\frac{\sum_{i=1}^{p} \sum_{j=1}^{q} (x_{ij} - y_{ij})^2}{p \times q}} \qquad (5)$$

for which xij and yij are respectively elements of the original and regenerated data matrices that are composed of p spectra of q points. This figure of merit is biased in favor of retaining noise in the regenerated spectra. Therefore, σd is an effective way to estimate the accuracy of the regenerated data but ignores the benefits of the low-pass filtering that are afforded by Fourier compression.

A single IMS spectrum that had four peaks approximately equal in height was used to evaluate the parameters m and n for the two cutoff methods. The results are given in Figure 3. Nc decreased when m or n increased, so that fewer points were retained. The regeneration error σd increased when Nc decreased, as one would expect. By using the maximal value method, no significant differences in Nc were observed when m ranged from 5 × 10⁻⁴ to 2.5 × 10⁻³. The values that were reported in the literature were typically 10⁻³, which was in the middle of the range that was studied.5 Therefore, this value for m was used for evaluating the compression of all the spectra in the data set, because it yielded both efficient and effective compression. For the second cutoff method, the results were similar. When n varied from 5 to 50, no significant difference occurred in Nc. Regeneration errors (σd) of less than 10 were deemed acceptable and resulted in relative errors of less than 1% between the original and regenerated spectra. Figure 4 gives the original and regenerated spectra obtained from a compression with n equal to 25 and m equal to 10⁻³. The differences typically occur along the baseline. Peak areas and heights were not significantly affected. So far, there was no difference between these two cutoff methods, and both yielded acceptable results. The values of m or n may vary due to the inherent tradeoff between compression efficiency and regeneration accuracy. For IMS data, values of n equal to 25 and m equal to 10⁻³ appear to be good choices.

Figure 4. Comparison of an original spectrum and the regenerated spectrum. The spectrum was compressed from 1300 points to 168 real coefficients before regeneration.

For an entire data set of spectra, a single cutoff point may be applied that universally compresses all the spectra. Figure 5 gives the Fourier spectra of 170 IMS scans. The zero-frequency component represents the total area for each spectrum, and all the spectra in the data set have the same area. This characteristic trait of IMS is due to charge conservation and constant current generated by the ionization source of the instrument. For the 170 spectra in data set 1, a cutoff point was determined using the two criteria presented earlier. The figures of merit are given in Figure 6. These results were obtained with a value for m of 10⁻³ and for n of 25. Both methods gave similar cutoff values (Nc) for the spectra in the data set.
The cutoff points varied from 166 to 203 within this data set. From the figure, the compression efficiency increased (lower Nc) when the 1-pentanol vapors were present in the sample compartment (i.e., spectral scans 10-60). The largest cutoff was used as a conservative global parameter to compress all the spectra to the same degree. The Fourier compression along the drift time dimension achieved a compression ratio of 84%, by reducing the number of points in each spectrum from 1300 to 203. After Fourier compression was applied, 203 points were retained for the drift frequency spectra in Figure 5.

Figure 5. Data set 1 plotted with respect to drift frequency for 170 spectra.

Figure 6. Cutoff points plotted with respect to the 170 spectra obtained using m equal to 10⁻³ in the upper figure and n equal to 25 in the lower figure.

The time dimension was transformed by following the same procedure used for the drift time dimension to generate the multidimensional compression. A difference was obtained for the two cutoff criteria. The maximal value method did not work well for the two-dimensional compression. The maximal values were obtained from the zero-frequency components. Along the sample acquisition dimension, the intensities of these components varied significantly, because each component is the sum of the drift frequency intensities. These components were all positive and large for the zero drift frequency. For other drift frequencies the components varied between positive and negative values. The zero-frequency component for both dimensions could be used for the maximal value method. However, new values for parameter m would be required.

Figure 7. Cutoff points for the two-dimensional Fourier compression. The cutoff points (Nc) are indicated as dots.

Figure 8. Compression efficiency (Rc) and efficacy (σd) with respect to parameter n.

The standard deviation method performed well for the two-dimensional transformed data when the parameter n was 25. The cutoff values for data set 1 are given in Figure 7. The standard deviation cutoff method worked well, because the noise was uniformly distributed throughout the higher drift and sample frequencies. From Figure 7, the cutoff point of 23 was selected to compress the data on the sample frequency dimension. The total compression ratio (Rc) of the two-dimensional Fourier compression was 98%. In Figure 7, the distribution of cutoff points does not have a rectangular shape (e.g., 23 × 203). The distribution of cutoff points has a triangular shape, which indicates different cutoffs for different frequencies are appropriate. Therefore, the number of stored points can be approximately halved when the larger scan frequencies are compressed to a reduced number of drift frequency points. Figure 8 gives the compression ratio (Rc) and the regeneration error (σd) with respect to parameter n. The same n was applied for the determination of the cutoff (Nc) of both dimensions. When n was larger than 15, compression ratios exceeding 97% were obtained. If the Fourier coefficients were kept in triangular form instead of the rectangular form, the compression ratio would exceed 98.5%. Although n is an arbitrary factor, a value of 25 was found to be appropriate for all the IMS data sets that were studied.
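The compression ratios quoted above for data set 1 follow from eq 4 and can be reproduced with the short sketch below, which also implements eq 5 for completeness. The function names are hypothetical. The triangular case assumes that a triangle of coefficients holds exactly half the points of the 23 × 203 rectangle, which is an approximation made here for illustration and not a figure given in the text.

```python
import math

def compression_ratio(p, q, p_reduced, q_reduced):
    """Eq 4: fraction of the original p x q points that is discarded."""
    return 1.0 - (p_reduced * q_reduced) / (p * q)

def rms_difference(original, regenerated):
    """Eq 5: root mean square difference between two p x q matrices."""
    p, q = len(original), len(original[0])
    total = sum((x - y) ** 2
                for row_x, row_y in zip(original, regenerated)
                for x, y in zip(row_x, row_y))
    return math.sqrt(total / (p * q))

# Data set 1: 170 spectra of 1300 points, cutoffs of 203 (drift) and 23 (sample).
rc_drift_only = compression_ratio(170, 1300, 170, 203)
rc_rectangle = compression_ratio(170, 1300, 23, 203)
# Assumption: triangular storage keeps half of the rectangular block.
rc_triangle = 1.0 - 0.5 * (23 * 203) / (170 * 1300)

print(f"{100 * rc_drift_only:.1f}%  {100 * rc_rectangle:.1f}%  {100 * rc_triangle:.1f}%")
# prints: 84.4%  97.9%  98.9%
```

Under the half-rectangle assumption the triangular form gives about 98.9%, which is consistent with the statement that the ratio would exceed 98.5%.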

Table 1. Results of Two-Dimensional FT Compression on Different IMS Data Sets (n Fixed to 25)

data set               original size (p × q)   reduced size (p′ × q′)   Rc (%)   σd (mV)
1. 1-pentanol          170 × 1300              23 × 203                 97.9     14.49
2. 1-pentanol          501 × 1300              14 × 193                 99.6     52.49
3. p-xylene            1000 × 1300             48 × 193                 99.3     34.53
4. dichloromethane     388 × 1300              8 × 186                  99.7     16.26
5. dichloromethane     400 × 1300              14 × 202                 99.4     18.81

For the five different IMS data sets, the figures of merit are given in Table 1. The compression ratios are generally satisfactory. Two data sets that were obtained in single-spectrum mode had relatively larger regeneration errors (σd). The noise levels for these two sets were 3 times larger than those composed of spectra acquired with signal averaging. Fourier compression performed smoothing and removed the higher frequency components of the noise besides compressing the data. The parameter n can be changed, if greater regeneration accuracy or compression efficiency is required. For data set 1 and Figure 8, one may observe that only a marginal decrease in compression efficiency is obtained when n is reduced to 10. However, there is a significant decrease in regeneration error (σd). Typical peak heights in the IMS data range between 3000 and 100 mV. A peak at 100 mV is close to the limit of detection for these instruments. Therefore, the regeneration error is considered good for most applications. The two-dimensional Fourier compression cannot be applied directly as an online method. The sample acquisition time dimension cannot be transformed until the monitoring event has concluded. The drift time dimension can be transformed as the spectra are collected. If the number of cutoff points varies among spectra, Nc could be stored with each spectrum. The sample acquisition dimension can be transformed at the end of the monitoring event. For longer monitoring periods, the sample acquisition dimension can be transformed in segments. The discrete cosine transform (DCT) may be an alternative algorithm to FFT for online compression. DCT may furnish greater compression efficiency and efficacy. However, DCT does carry a larger computational overhead compared to the FFT.12 The regeneration of the data is accomplished by the IFT. The effect of apodization was also evaluated on the regenerated spectra. 
There are several apodization methods that are recommended by choosing different functions, such as square (i.e., none), trapezoidal, and exponential functions. Trapezoidal apodization was evaluated by multiplying the last 25% of the retained frequency components by a linear function with a negative slope and an x-axis intercept of zero. Apodization did not improve the regenerated spectra or have a significant effect on the regeneration error. By compression of data in both time and spectral dimensions, large compression efficiencies may be achieved without loss of information. However, apodization may be required when the data are highly compressed or discontinuous.

Although the Fourier compression method is lossy, like most image and video compression technologies such as JPEG, the discarded data are high-frequency noise. Lossy compression methods are more efficient than lossless methods such as the Lempel-Ziv series13 and Huffman14 algorithms as used in the popular ZIP file compression software. The compression ratio for data set 1 using WinZIP (Version 6.2, Nico Mak Computing, Inc.) was 36%. It should be noted that the Fourier-compressed data might be further compressed with the lossless algorithms.

(12) Blinn, J. F. IEEE Comput. Graphics Appl. 1993, 13(7), 78-83.
(13) Ziv, J.; Lempel, A. IEEE Trans. Inf. Theory 1977, 23, 337-342.
(14) Huffman, D. A. Proc. IRE 1952, 40, 1098-1101.

CONCLUSIONS

A two-dimensional Fourier transform compression method has been introduced for the first time for analytical data. This method is important when one considers the demands on data storage that are presented by continuous monitoring. However, such techniques may also be important for high-resolution separation methods coupled to multichannel detectors and for wireless transmission of experimental data. The two-dimensional Fourier transform is well-suited to IMS due to its continuous response with respect to the sample acquisition time and drift time dimensions. Future work will examine wavelet and cosine transforms, as well as development of real-time algorithms. The standard deviation method has been modified so that it can be used to determine the cutoff point (Nc) for the two dimensions. The reduced Fourier coefficients can be processed directly, uncompressed separately, or regenerated to time domain spectra.
For the data regeneration, apodization was not necessary and was somewhat detrimental with respect to compression.

ACKNOWLEDGMENT

We thank the U.S. Army ERDEC-CBD for financial support (Grant No. DAAM01-95-C-0042). Lijuan Hu is thanked for her oral presentation of this work at the 1997 Pittsburgh Conference, Atlanta, GA.

Received for review May 5, 1997. Accepted July 31, 1997.

AC970458S

Abstract published in Advance ACS Abstracts, September 15, 1997.