SIMPLISMA and ALS Applied to Two-Way Nonlinear Wavelet

Mar 22, 2005 - Wavelet Compressed Ion Mobility Spectra of Chemical Warfare Agent Simulants. Libo Cao,†,§ Peter de B. Harrington,*,† and Jundong ...
Anal. Chem. 2005, 77, 2575-2586

SIMPLISMA and ALS Applied to Two-Way Nonlinear Wavelet Compressed Ion Mobility Spectra of Chemical Warfare Agent Simulants

Libo Cao,†,§ Peter de B. Harrington,*,† and Jundong Liu‡

Center for Intelligent Chemical Instrumentation, Department of Chemistry and Biochemistry, Ohio University, Athens, Ohio 45701-2979, and Russ College of Engineering and Technology, Ohio University School of Electrical Engineering and Computer Science, Athens, Ohio 45701-2979

Ion mobility spectrometry is a rapid scanning measurement method for which compression methods that facilitate the handling of large collections of data are beneficial. Peak distortion in ion mobility spectra reconstructed from linear wavelet compression is problematic in that artifact peaks may cause false positive alarms. Peak shifting also may cause false alarms if target peaks shift out of, or interfering peaks shift into, detection windows. Nonlinear wavelet compression (NLWC) preserves peak shape and can lessen the degree of distortion, shifting, and artifact peaks in the reconstructed spectra. NLWC was applied to achieve high compression and high fidelity in the reconstructed spectra. Another benefit is that NLWC improves signal-to-noise ratios, so the models built from compressed data are improved. By compressing both the drift time order and the spectrum acquisition order, greater compression may be achieved. A two-way nonlinear wavelet compression algorithm that incorporates alternating least squares (2W-NLWC-ALS) was devised by applying ALS to partially reconstructed wavelet coefficients generated from two-way NLWC. The number of components in a data set can be determined automatically using an automated version of SIMPLISMA (ASIMPLISMA). The small ALS models are saved as the final compressed data and can be used to reconstruct the entire data set efficiently without maintaining the compressed wavelet coefficient matrix of the original data set. The 2W-NLWC-ALS algorithm provides greater compression than regular wavelet compression, as well as interpretable models. Using this method, large volumes of data can be acquired and easily evaluated through a simple compressed model. A compression ratio of 510 ppm, a root-mean-square error (ERMS) of 6.3 mV (full-scale signal is usually 1 V or larger), and a relative root-mean-square error (RERMS) of 1.62% were achieved for data sets collected with a chemical agent monitor (CAM).
A compression ratio of 46 ppm, ERMS of 9.2 mV, and RERMS of 0.42% were achieved for data sets collected with an ITEMISER instrument. The 2W-NLWC-ALS algorithm is an efficient compression method that provides the benefits of a simple model.

* Corresponding author. E-mail: [email protected]. Phone: (740) 517-8458. Fax: (740) 593-0148.
† Center for Intelligent Chemical Instrumentation.
§ Present address: Metara, Inc., 1225 East Argues, Sunnyvale, CA 94085.
‡ Russ College of Engineering and Technology.
10.1021/ac0486286 CCC: $30.25 Published on Web 03/22/2005

© 2005 American Chemical Society

Ion mobility spectrometry (IMS) has been broadly applied to the detection of trace levels of explosives,1,2 bacteria,3,4 pesticides,5 and chemical warfare agents.6-8 The IMS signal arises from ion formation and characterization in a drift tube maintained at ambient pressure. IMS instrumentation can be miniaturized and affords portable devices for on-site chemical measurement. Handheld IMS instruments can detect volatile compounds in air at parts-per-billion concentrations. Other advantages of IMS are its fast response and high sensitivity. Reviews of IMS can be found elsewhere.9-12 In many instances, the rapid scanning advantage is not fully utilized because of the large volumes of spectra that can be accumulated. Onboard processors used in such instruments may have limited storage capacity. IMS sensors may acquire spectra at a rate of 40 Hz, and 5 h of collection may result in ∼1 GB of data. If spectra are collected continually, the storage capacity of the processing system may be exceeded. For wireless communication applications, the bandwidth limit may restrict the rate at which data can be conveyed, thereby necessitating the use of data modeling and data compression.13

(1) Ewing, R. G.; Atkinson, D. A.; Eiceman, G. A.; Ewing, G. J. Talanta 2001, 54, 515-529.
(2) Buxton, T. L.; Harrington, P. D. Appl. Spectrosc. 2003, 57, 223-232.
(3) Strachan, N. J. C.; Nicholson, F. J.; Ogden, I. D. Anal. Chim. Acta 1995, 313, 63-67.
(4) Vinopal, R. T.; Jadamec, J. R.; deFur, P.; Demars, A. L.; Jakubielski, S.; Green, C.; Anderson, C. P.; Dugas, J. E.; DeBono, R. F. Anal. Chim. Acta 2002, 457, 83-95.
(5) Tuovinen, K.; Paakkanen, H.; Hanninen, O. Anal. Chim. Acta 2000, 404, 7-17.
(6) Eiceman, G. A. Abstr. Pap. Am. Chem. Soc. 2002, 224, 211-ANYL.
(7) Steiner, W. E.; Clowers, B. H.; Matz, L. M.; Siems, W. F.; Hill, H. H. Anal. Chem. 2002, 74, 4343-4352.
(8) Asbury, G. R.; Wu, C.; Siems, W. F.; Hill, H. H. Anal. Chim. Acta 2000, 404, 273-283.
(9) Li, F.; Xie, Z. Y.; Schmidt, H.; Sielemann, S.; Baumbach, J. I. Spectrosc. Spectral Anal. 2002, 22, 1025-1029.
(10) Eiceman, G. A. TrAC, Trends Anal. Chem. 2002, 21, 259-275.
(11) Hill, H. H.; Simpson, G. Field Anal. Chem. Technol. 1997, 1, 119-134.
(12) Eiceman, G. A.; Karpas, Z. Ion Mobility Spectrometry; CRC Press: Boca Raton, FL, 1994.
(13) Cao, L.; Harrington, P. B.; Harden, C. S.; McHugh, V. M.; Thomas, M. A. Anal. Chem. 2004, 76, 1069-1077.

Analytical Chemistry, Vol. 77, No. 8, April 15, 2005 2575

Multivariate curve resolution (MCR) methods model the transient responses of data.14,15 Most methods decompose the data set into spectra and concentration profiles.15,16 Large data sets place demands on MCR calculations that the computational system may not be able to meet. Data compression prior to MCR can dramatically reduce the storage requirements and concomitantly improve signal-to-noise ratios, while still allowing the spectra and concentration profiles to be modeled. Data compression has been applied to IMS and has reduced the data size without losing important chemical information. Over the past decades, a variety of linear transforms have been developed for compression. The discrete Fourier transform (DFT),19 discrete cosine transform (DCT),20 and discrete wavelet transform21 are among the most popular ones used for chemical data. B-spline methods were described as good candidates for data compression of FT-IR spectra.17,18 For sensor data such as those furnished by IMS, the data set usually can be represented as an image with one axis corresponding to spectral resolution elements (i.e., drift time) and the other to spectral acquisition time. These data are amenable to two-way compression. Two-way DFT-based methods were devised and applied to IMS data in both the drift and sample acquisition times.22-24 Two-way compression, that is, compressing both the drift time and spectral acquisition time orders, was shown to yield large gains in compression efficiency for sensor data. In previous work, drift time and spectral acquisition time were referred to as dimensions and the compression as two-dimensional. However, ion mobility spectra can contain 1000 drift time resolution elements, or dimensions. Therefore, in this paper, the number of dimensions will refer to the number of resolution elements and spectra, and order will refer to the two measurements of drift and spectral acquisition times. The DCT can be regarded as a discrete-time version of the Fourier cosine series.
The DCT is currently the standard encoding scheme for JPEG still image compression, which has found tremendous applications in various areas of entertainment, engineering, and science.25 Despite the popularity and advantages of DFT- and DCT-based compression schemes, there are inherent drawbacks associated with these two approaches. The DFT is not commonly used in image compression algorithms because its energy concentration properties are poor compared to those of other transforms; another degrading effect is that undesired high-frequency components appear at the boundaries of the transformed block. In DCT-based image compression, the input image needs to be blocked to reduce the computational complexity. Blocking ignores the correlation across block boundaries, which reduces compression and causes noticeable and annoying blocking artifacts. Another drawback of FT-based compression is that it performs poorly for nonperiodic functions because of the nonlocality of the basis functions.

(14) Harrington, P. B.; Rauch, P.; Cai, C. Anal. Chem. 2001, 73, 3247-3256.
(15) Jiang, J. H.; Ozaki, Y. Appl. Spectrosc. Rev. 2002, 37, 321-345.
(16) Wang, J. H.; Hopke, P. K.; Hancewicz, T. M.; Zhang, S. L. L. Anal. Chim. Acta 2003, 476, 93-109.
(17) Alsberg, B. K.; Nodland, E.; Kvalheim, O. M. J. Chemom. 1994, 8, 127-145.
(18) Alsberg, B. K.; Kvalheim, O. M. J. Chemom. 1993, 7, 61-73.
(19) Cai, C.; Harrington, P. B.; Davis, D. M. Anal. Chem. 1997, 69, 4249-4255.
(20) Ahmed, N.; Natarajan, T.; Rao, K. R. IEEE Trans. Comput. 1974, C-23, 90-93.
(21) Vetterli, M.; Kovacevic, J. Wavelets and Subband Coding; Prentice Hall: Englewood Cliffs, NJ, 1995.
(22) Harrington, P. B.; Isenhour, T. L. Appl. Spectrosc. 1987, 41, 1298-1302.
(23) Urbas, A. A.; Harrington, P. B. Anal. Chim. Acta 2001, 446, 393-412.
(24) Chen, G. X.; Harrington, P. B. Anal. Chim. Acta 2003, 490, 59-69.
(25) Wallace, G. K. Commun. ACM 1991, 34, 30-44.

Wavelet transformation (WT) has emerged as a better solution for overcoming the above shortcomings. Compared to the FT, the WT offers several advantages, including a faster, simpler implementation and basis functions with compact support, whereas the Fourier basis functions extend to infinity. Wavelet compression (WC) can be achieved by transforming the signal into the wavelet domain and retaining a reduced number of coefficients; significant size reductions have been achieved using this method. Reviews and tutorials regarding WTs are available in the chemistry literature.26,27 A distinction should be made between linear and nonlinear wavelet compression methods. Compression that is achieved by retaining low-frequency (low-resolution) components is referred to as linear. Linear compression acts as a fixed low-pass filter of the data, and high-frequency components are discarded during compression. The compression can be achieved by linear algebraic operations. For nonlinear compression, wavelet coefficients that satisfy some intensity criterion are retained along with their positions. The simplest approach is to use a garrote threshold that sets coefficients whose magnitudes do not exceed a critical value to zero. This operation is nonlinear, and the compression acts as a variable band-pass filter. In this paper, denoising refers to spectra that are reconstructed from nonlinearly compressed data, and smoothing refers to spectra that are reconstructed from linearly compressed data. Linear WC applied to IMS is useful when all peaks have the same shape. However, peak distortion in ion mobility spectra reconstructed from WC is problematic in that artifact peaks can result in false positive alarms. These artifacts can be detrimental when peak detection algorithms are used. Peak shifting also can result in false negative alarms when a peak shifts out of a detection window.
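The distinction between linear and nonlinear wavelet compression can be illustrated with a minimal NumPy sketch. It assumes an orthonormal Haar filter (a simple stand-in for the daublet, coiflet, and symmlet filters used in this work), a signal length that is a power of 2, and a synthetic one-peak spectrum; all function names are hypothetical. Linear compression keeps the k coarsest coefficients at fixed positions, whereas nonlinear compression keeps the k largest-magnitude coefficients together with a vector of their positions.

```python
import numpy as np


def haar_forward(x):
    """Multilevel orthonormal Haar transform of a length-2**J signal.

    Output layout: [coarsest approximation, coarsest detail, ..., finest details].
    """
    c = np.asarray(x, dtype=float).copy()
    n = c.size
    while n > 1:
        even, odd = c[0:n:2].copy(), c[1:n:2].copy()
        c[: n // 2] = (even + odd) / np.sqrt(2.0)    # low-pass (approximation)
        c[n // 2 : n] = (even - odd) / np.sqrt(2.0)  # high-pass (detail)
        n //= 2
    return c


def haar_inverse(c):
    """Invert haar_forward level by level, from coarse to fine."""
    x = np.asarray(c, dtype=float).copy()
    n = 1
    while n < x.size:
        approx, detail = x[:n].copy(), x[n : 2 * n].copy()
        x[0 : 2 * n : 2] = (approx + detail) / np.sqrt(2.0)
        x[1 : 2 * n : 2] = (approx - detail) / np.sqrt(2.0)
        n *= 2
    return x


def linear_compress(x, k):
    """Linear WC: keep the k coarsest (low-resolution) coefficients."""
    return haar_forward(x)[:k].copy()


def linear_reconstruct(kept, n):
    c = np.zeros(n)
    c[: kept.size] = kept
    return haar_inverse(c)


def nonlinear_compress(x, k):
    """Nonlinear WC: keep the k largest-magnitude coefficients and their positions."""
    c = haar_forward(x)
    positions = np.sort(np.argsort(np.abs(c))[::-1][:k])
    return c[positions], positions


def nonlinear_reconstruct(values, positions, n):
    c = np.zeros(n)
    c[positions] = values
    return haar_inverse(c)


# Synthetic drift-time spectrum: one narrow Gaussian peak plus a little noise.
rng = np.random.default_rng(0)
t = np.arange(1024)
spectrum = np.exp(-0.5 * ((t - 300) / 4.0) ** 2) + 0.01 * rng.standard_normal(t.size)

k = 32
lin = linear_reconstruct(linear_compress(spectrum, k), spectrum.size)
values, positions = nonlinear_compress(spectrum, k)
nonlin = nonlinear_reconstruct(values, positions, spectrum.size)

print("linear    RMS error:", np.sqrt(np.mean((spectrum - lin) ** 2)))
print("nonlinear RMS error:", np.sqrt(np.mean((spectrum - nonlin) ** 2)))
```

Because this transform is orthonormal, the reconstruction error equals the norm of the discarded coefficients, so keeping the largest magnitudes can never do worse than keeping a fixed set of the same size; the positions vector is the storage cost of that flexibility.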
To solve the problems of peak distortion and peak shifting, a nonlinear wavelet compression (NLWC) algorithm was devised to avoid peak distortion and eliminate artifacts in the reconstructed spectra.13 For NLWC, two key components are furnished for each spectrum. The first component is a reduced set of wavelet coefficients. The second component is a vector that stores the locations of those wavelet coefficients. The algorithm works by storing the locations and values of the wavelet coefficients that exceed an intensity threshold. NLWC is especially useful for high compression rates, high-fidelity reconstruction, and denoising of the reconstructed spectra. NLWC allows the removal of the artifacts caused by linear wavelet compression (LWC) while achieving similar or improved compression with respect to linear wavelet methods. The models built from compressed data are improved because NLWC improves signal-to-noise ratios. Using this method, large volumes of data can be acquired and easily evaluated.13,28 SIMPLe-to-use interactive self-modeling mixture analysis (SIMPLISMA) is a self-modeling curve resolution method that has been demonstrated as an effective tool for enhancing IMS measurements.29 Alternating least squares (ALS)30 is another chemometric modeling method that can be used on the NLWC compressed data. The alternating regression procedure applied to NLWC data is constrained, so that concentrations and spectra do not contain negative values. The general objective of SIMPLISMA and ALS is to decompose a large data matrix into two simpler matrices, a matrix of concentration profiles and a matrix of component spectra. The simpler models can efficiently reconstruct the entire measurement while leaving the data set compressed. Modeling as a form of compression can decrease the data size to between 6 and 60 ppm of the original size. The number of components in the model is determined automatically based on the complexity of the data. Therefore, large volumes of data can easily be acquired and evaluated. In addition, compression improves signal-to-noise ratios so that models built from compressed data are improved. ALS applied to 2W-NLWC proved to be beneficial for data from the chemical agent monitor (CAM) and the ITEMISER, which are two different IMS instruments. NLWC is advantageous compared to LWC because it is resistant to shifts of the peak maxima (i.e., the maxima shift by less than 0.02 ms). Smaller root-mean-square errors (ERMS) were obtained from NLWC than from LWC. The procedure of building ALS models from 2W-NLWC compressed data will be called 2W-NLWC-ALS. The 2W-NLWC-ALS procedure simplifies the data and extracts the principal sources of variance in the spectra and their change with concentration. Efficiency and efficacy are two criteria for evaluating compression. Compression efficiency is determined by the ratio of the size of the compressed data to that of the uncompressed data. Compression efficacy is a measurement of the similarity of the reconstructed spectra to the unprocessed spectra. There is a tradeoff between these two criteria. For example, if the data are overcompressed, high efficiency is obtained but relevant information may be lost, which represents low efficacy. In this paper, both compression efficiency and efficacy were evaluated.

(26) Alsberg, B. K.; Woodward, A. M.; Kell, D. B. Chemom. Intell. Lab. Syst. 1997, 37, 215-239.
(27) Leung, A.; Chau, F.; Gao, J. Chemom. Intell. Lab. Syst. 1998, 43, 165-184.
(28) Cao, L.; Harrington, P. B.; Liu, C. Anal. Chem. 2004, 76, 2850-2868.
(29) Windig, W.; Guilment, J. Anal. Chem. 1991, 63, 1425-1432.
(30) Hamilton, J. C.; Gemperline, P. J. J. Chemom. 1990, 4, 1-13.

THEORY

IMS.
IMS has been widely used in screening explosives at airports,1,11 detecting chemical agents for the military,7 and monitoring stack gas emissions in industry.10 In practice, a vapor sample is introduced into the reaction region of a drift tube, where neutral molecules of the vapor are ionized, and the resultant ions are injected into the drift region for mobility analysis. The drift velocity of the ions is related to the electric field by eq 1,

$$v_d = K(T)\,E \qquad (1)$$

for which K(T) is the mobility that is determined from the drift velocity (vd) attained by ions in a weak electric field (E) of the drift region at atmospheric pressure. Usually, electric fields of 200 V/cm and mobilities of 1-2 cm2 V-1 s-1 result in ion speeds of 200-1000 cm/s or drift times of 5-20 ms in 4-20-cm drift regions at ambient pressure.12 The coefficient K(T) is a function of temperature and can be measured at a given temperature from physical measurements as given in eq 2,

$$K(T) = \frac{v_d}{E} = \frac{L/t_d}{V/L} = \frac{L^2}{t_d V} \qquad (2)$$

With a potential difference V across a drift region of length L, an ion with an arrival time td has the mobility K(T). However, K(T) is usually converted to a reduced mobility, K0, as given in eq 3,

$$K_0 = K(T)\,\frac{P}{760}\,\frac{273.16}{T} = \frac{L^2}{t_d V}\,\frac{P}{760}\,\frac{273.16}{T} \qquad (3)$$

for which P is the drift gas pressure in Torr and T is the temperature in K. From eq 3, at any given temperature and pressure, the reduced mobility K0 of an unknown ion may be obtained from the ratio of its td value to that of a reference ion obtained under identical experimental conditions. Equation 4 gives the method for calculating the reduced mobility of an unknown ion from its drift time by using a standard ion.

$$\frac{K_0(\mathrm{unknown})}{K_0(\mathrm{standard})} = \frac{t_d(\mathrm{standard})}{t_d(\mathrm{unknown})} \qquad (4)$$

IMS spectra usually have the reactant ion peak (RIP) as a background component. The atmospheric pressure chemical ionization charge-transfer reactions between the RIP and the analytes are intricate.

(31) Windig, W. Chemom. Intell. Lab. Syst. 1997, 36, 3-16.

SIMPLISMA. SIMPLISMA is a pure variable selection method, which is based on the principle that the intensity of a pure variable gives an estimate of the concentration of its associated component. The data are modeled as in eq 5,

$$D = C S^T + E \qquad (5)$$

for which the original data matrix D (m × n) has m rows of spectra and n columns that correspond to the drift time measurements. D is decomposed into the product of a matrix of concentration profiles (C) and a matrix of spectra (S) that best fits the experimental data. The spectra of the pure components are contained in the columns of S. The residual error is represented by E. SIMPLISMA estimates C with pure variables, which are selected by their purity,

$$p_{ij} = w_{ij} \times \frac{\sigma_j}{\mu_j + \alpha} \qquad (6)$$

for which pij is the purity of candidate variable j when the ith pure variable is selected. The wij is the weight that characterizes the linear independence of the jth candidate variable with respect to the i - 1 previously selected pure variables. The weight is a determinant-based function that corrects for correlation with the previously selected pure variables. The details of this algorithm are described elsewhere.29,31 Mean µj and standard deviation σj are obtained for the jth candidate variable. A constant offset α dampens the influence of columns with no signal content. The intensities of the variable with the highest purity are used as the initial estimate of the first concentration profile c1. The procedure continues until a matrix of concentration profiles C is obtained. Then S is calculated by

$$S = D^T C\,(C^T C)^{-1} \qquad (7)$$

The spectra are normalized to unit vector length. Then the concentration matrix is calculated from the normalized S by eq 8,

$$C = D S\,(S^T S)^{-1} \qquad (8)$$

for which C comprises the concentration profiles obtained from the normalized spectra. The data set can be reconstructed by

$$\hat{D} = C S^T \qquad (9)$$

If the proper number of components has been determined and the data follow a linear model, D̂ and D should differ only by the indeterminate error.

ASIMPLISMA. For ALS to be used as a compression method, an automated approach to determining the number of components is required. For compression, overestimating the number of components is preferred to avoid losing important signals. SIMPLISMA was modified to determine the correct number of components automatically and was evaluated with six sets of IMS data. Specifically, ASIMPLISMA retains the components whose signal lies above a noise threshold. The procedure uses a region of the IMS spectrum that is usually free of signal. This region may be at drift times early or late in the ion mobility measurement. The same region (e.g., 1-4 ms) used for measuring the noise is often used to estimate and correct the baseline of the ion mobility spectrum. The noise is the root mean square of the baseline-corrected intensities for all spectra in the signal-free drift time region. The noise is calculated by

$$\mathrm{noise} = \sqrt{\frac{\sum_{i=1}^{m_1}\sum_{j=1}^{n_1} d_{ij}^2}{m_1 n_1}} \qquad (10)$$

for which dij is a baseline-corrected element in a region devoid of signal. The m1 and n1 are the dimensions of the matrix used to calculate the noise. The value of the noise is used as the damping offset α in ASIMPLISMA. After each component j is calculated in ASIMPLISMA as the concentration profile cj, the signal value is calculated as

$$\mathrm{sig}_j = \sqrt{\frac{\sum_{i=1}^{m} c_{ij}^2}{m n}} \qquad (11)$$

The m and n are the time and drift time dimensions of the original data set. The signal is divided by the number of drift time measurements n because the concentration profiles are integrated intensities across the spectra. When the signal is greater than twice the measured noise, the component is retained as significant.

ALS. ALS has been widely used in MCR of chemical data.15,16 An initial estimate of the concentration profiles (C) is obtained from the data matrix by using SIMPLISMA. An iterative method is used for which the convergence criterion is the sum of the squared residual error obtained from the difference between the data matrix and its reconstruction. The iteration comprises two steps. In the first step, the spectra are estimated by least squares,

$$S^T = C^{+} D + E \qquad (12)$$

for which C^+ is the pseudoinverse of matrix C and S^T is the transpose of the current estimate of the spectra. In our algorithm, two estimates are obtained. The first uses a classical least-squares solution, after which all values in the spectra that are below zero are set to zero. In the second calculation, the spectra are obtained by a nonnegativity-constrained least-squares fit that is implemented through the MATLAB function lsqnonneg. The spectra are normalized to unit length. In the second step, a new estimate of the concentration profiles is obtained by least squares,

$$C = D\,(S^T)^{+} + E \qquad (13)$$

for which (S^T)^+ is the pseudoinverse of the transpose of the pure spectra matrix S and C is the current estimate of the concentration profiles. This step furnishes two estimates of C using the procedures described for the first step. The pair of estimates (S and C) that furnishes the lowest residual error is used for the subsequent iteration. The iteration is repeated until the relative residual error (Err) converges. The relative residual error is obtained as

$$\mathrm{Err} = \sqrt{\frac{\sum_{i=1}^{m}\sum_{j=1}^{n}\left(d_{ij} - \sum_{k=1}^{n_c} c_{ik}\, s_{jk}\right)^2}{\sum_{i=1}^{m}\sum_{j=1}^{n} d_{ij}^2}} \qquad (14)$$

for which dij is an element of the data matrix, which is m × n. The element cik is an element of C, and sjk is an element of S. The total number of components is represented as nc.

Two-Way-NLWC-ALS. ASIMPLISMA is applied to the original data to determine the number of components in the data set before the 2W-NLWC-ALS algorithm is implemented. Figure 1 gives the schematic diagram of the implementation of the 2W-NLWC-ALS algorithm. The NLWC algorithm has been described elsewhere.13 There are seven steps involved in this algorithm.
(1) A wavelet transformation is applied to the rows of the data set D (m × n) to furnish wavelet coefficients Wc (m × n).
(2) A wavelet transformation is applied to the columns of the matrix Wc (m × n) obtained from step 1. The data matrix is thereby transformed to a two-way (2W) wavelet coefficient matrix, Wc × Wc.
(3) All the coefficients in the matrix are sorted from highest to lowest magnitude, and this sorted array is used to define a threshold value. The number of coefficients (ncoeff) that will be saved from the original data set is the product of the expected compression ratio and the number of points (m × n) in the original data set. The threshold value is defined as the coefficient magnitude in the sorted list with index number equal to ncoeff. Coefficients whose magnitudes are at or above the threshold value are saved as a vector Wcc, and the remainder are discarded. The positions of the saved coefficients are stored in a position control vector so that the data may be reconstructed.
(4) Using the compressed coefficient vector Wcc and the position control vector, a partially reconstructed wavelet coefficient matrix Dpr (mc × nc) of reduced size is generated.
(5) The SIMPLISMA algorithm is applied to the partially reconstructed coefficient matrix Dpr (mc × nc). With the estimated total number of components represented as nc, the coefficient matrix is decomposed into C0 (mc × nc) and S0 (nc × nc). C0 (mc × nc) is used as the initial estimate of the concentration profiles for the ALS calculation. Cc (mc × nc) and Sc (nc × nc) are the outputs of ALS and represent the low-resolution concentration profiles and spectra. Note that Cc and Sc are saved as the compressed data.
(6) The inverse WT is applied to Cc (mc × nc) and Sc (nc × nc) to reconstruct C (m × nc) and S (n × nc) as high-resolution concentration profiles and spectra. All negative values in C and S are set to zero.
(7) The data set can be reconstructed from the products of either the high- or low-resolution concentration profiles and spectra, as in eq 9.

The combinations of mc and nc used in the algorithm were determined by the authors through a series of evaluations of the ERMS. The optimal values of mc and nc should provide the best ERMS value after reconstruction together with a reasonable compression ratio.

Figure 1. Schematic diagram of the implementation principle of the 2W-NLWC-ALS algorithm.
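Step 5 combines the SIMPLISMA purity of eq 6 with the ALS iteration of eqs 12-14. The Python sketch below shows one way to implement that combination on a small synthetic two-component matrix; it is not the authors' MATLAB code. Assumptions: `scipy.optimize.nnls` plays the role of MATLAB's lsqnonneg; the SIMPLISMA weight is taken as the determinant of the correlation-around-origin matrix of length-scaled data, restricted to the selected variables plus the candidate; the offset `alpha` is set to 5% of the largest column mean rather than to the measured noise used by ASIMPLISMA; the number of components is fixed at 2; all four (S, C) combinations are scored with eq 14 and the lowest is kept, which is our reading of the text. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def simplisma_pure_variables(D, n_comp, alpha):
    """Select n_comp pure variables (columns of D) using the purity of eq 6."""
    mu, sigma = D.mean(axis=0), D.std(axis=0)
    length = np.sqrt(mu**2 + (sigma + alpha) ** 2)
    Dl = D / length                      # length-scaled data
    R = Dl.T @ Dl / D.shape[0]           # correlation-around-origin matrix (n x n)
    pure = []
    for _ in range(n_comp):
        # Weight w_ij: determinant of R restricted to the selected variables plus j.
        w = np.array([np.linalg.det(R[np.ix_(pure + [j], pure + [j])])
                      for j in range(D.shape[1])])
        purity = w * sigma / (mu + alpha)
        pure.append(int(np.argmax(purity)))
    return pure


def relative_residual(D, C, S):
    """Eq 14: relative residual of the reconstruction C S^T."""
    return np.sqrt(np.sum((D - C @ S.T) ** 2) / np.sum(D**2))


def unit_columns(S):
    norms = np.linalg.norm(S, axis=0)
    return S / np.where(norms > 0, norms, 1.0)


def spectra_estimates(D, C):
    """Eq 12: clipped classical LS and column-wise NNLS estimates of S (n x k)."""
    S_ls = np.clip((np.linalg.pinv(C) @ D).T, 0.0, None)
    S_nn = np.array([nnls(C, D[:, j])[0] for j in range(D.shape[1])])
    return [unit_columns(S_ls), unit_columns(S_nn)]


def concentration_estimates(D, S):
    """Eq 13: clipped classical LS and row-wise NNLS estimates of C (m x k)."""
    C_ls = np.clip(D @ np.linalg.pinv(S.T), 0.0, None)
    C_nn = np.array([nnls(S, D[i, :])[0] for i in range(D.shape[0])])
    return [C_ls, C_nn]


def als(D, C0, max_iter=100, tol=1e-9):
    """Alternate eqs 12 and 13 until the relative residual (eq 14) stops changing."""
    C, prev_err = np.clip(C0, 0.0, None), np.inf
    for _ in range(max_iter):
        candidates = [(relative_residual(D, Cn, Sn), Cn, Sn)
                      for Sn in spectra_estimates(D, C)
                      for Cn in concentration_estimates(D, Sn)]
        err, C, S = min(candidates, key=lambda cand: cand[0])
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return C, S, err


# Synthetic data: m = 40 spectra (acquisition time) x n = 100 drift time points.
rng = np.random.default_rng(0)
t, x = np.arange(40)[:, None], np.arange(100)[:, None]
C_true = np.hstack([np.exp(-0.5 * ((t - 12) / 4.0) ** 2),
                    np.exp(-0.5 * ((t - 25) / 5.0) ** 2)])
S_true = np.hstack([np.exp(-0.5 * ((x - 30) / 3.0) ** 2),
                    np.exp(-0.5 * ((x - 60) / 3.0) ** 2)])
D = C_true @ S_true.T + 0.001 * rng.standard_normal((40, 100))

alpha = 0.05 * D.mean(axis=0).max()      # assumed offset (ASIMPLISMA uses the noise)
pure = simplisma_pure_variables(D, n_comp=2, alpha=alpha)
C, S, err = als(D, C0=D[:, pure])        # pure-variable intensities as initial C
print("pure variables:", pure, "relative residual:", err)
```

In the algorithm described above, D would be the partially reconstructed matrix Dpr from step 4, and the number of components would come from ASIMPLISMA instead of being fixed in advance.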

Evaluation of Reconstruction Efficiency and Efficacy. The compression efficiency is evaluated with the compression ratio (CR),

$$\mathrm{CR} = \frac{m_c \times n_c + n_c \times n_c}{m \times n} \times 100\% \qquad (15)$$

The CR is the total number of elements in the low-resolution concentration profiles Cc and spectra Sc, expressed as a percentage of the number of elements in the original data set. The matrices Cc and Sc are obtained in step 5 (Figure 1). To evaluate the efficacy of the reconstruction, the root-mean-square error (ERMS) of the reconstructed data set with respect to the original data set was used, as defined in eq 16,

$$E_{\mathrm{RMS}} = \sqrt{\frac{\sum_{i=1}^{m}\sum_{j=1}^{n}\left(d_{ij} - c_i s_j^T\right)^2}{mn}} \qquad (16)$$

for which dij is the element in the ith row and jth column of the experimental data matrix, which is m × n. The ci is row i of C, and sj is row j of S. The number of spectra in each data set is m, and the number of points in each spectrum is n. The relative root-mean-square error (RERMS), the ratio of the ERMS value to the maximum intensity (Imax) in the data set, is also used to evaluate the efficacy of the reconstruction.

$$\mathrm{RE}_{\mathrm{RMS}} = E_{\mathrm{RMS}} / I_{\max} \qquad (17)$$
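Equations 15-17 translate into a few lines of NumPy. In this sketch, `n_comp` is our name for the number of ALS components: the text writes nc both for this count and for the reduced drift time size, and we read eq 15 as counting the elements of Cc (mc × components) and Sc (nc × components). Imax is taken as the maximum of D, and the sizes in the usage line are hypothetical.

```python
import numpy as np


def compression_ratio_ppm(m, n, mc, nc, n_comp):
    """Eq 15 in ppm: elements stored in Cc (mc x n_comp) and Sc (nc x n_comp)
    relative to the m x n elements of the original data set."""
    return (mc + nc) * n_comp * 1e6 / (m * n)


def erms(D, C, S):
    """Eq 16: root-mean-square difference between D and the reconstruction C S^T."""
    return np.sqrt(np.mean((D - C @ S.T) ** 2))


def rerms(D, C, S):
    """Eq 17: ERMS divided by the maximum intensity Imax of the data set."""
    return erms(D, C, S) / np.max(D)


# Hypothetical sizes: a 2000 x 1500 data set stored as a 100 x 3 Cc and a 300 x 3 Sc.
print(compression_ratio_ppm(m=2000, n=1500, mc=100, nc=300, n_comp=3))  # prints 400.0
```

Multiplying CR (in ppm) by 10^-4 gives the percentage form used in eq 15.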

Another error metric was used for spectra that have high signal-to-noise ratios. Because NLWC denoises spectra during compression, part of the reconstruction error can be attributed to noise removal. The spectra obtained from ALS models were compared between the uncompressed and compressed data sets, because the magnitude of noise in these spectra was attenuated. The relative root-mean-square error between the ALS spectra (SERMS) from the uncompressed and reconstructed data sets is defined as

$$\mathrm{SE}_{\mathrm{RMS}} = \sqrt{\frac{\sum_{i=1}^{m}\sum_{j=1}^{n}\left(s_{ij} - \hat{s}_{ij}\right)^2}{\sum_{i=1}^{m}\sum_{j=1}^{n} s_{ij}^2}} \qquad (18)$$

for which n is the number of points in each spectrum and m is the number of spectra extracted from the data set after ALS is applied to it. The value of the jth point in the ith spectrum extracted from the uncompressed data set is sij, and ŝij is the value of the jth point in the ith high-resolution spectrum obtained by applying 2W-NLWC-ALS to the original data set. When the concentration profiles are relatively smooth, the reconstruction error of the concentration profiles can be evaluated by the relative root-mean-square error between the ALS concentration profiles (CERMS) from the compressed and uncompressed data,

$$\mathrm{CE}_{\mathrm{RMS}} = \sqrt{\frac{\sum_{i=1}^{n}\sum_{j=1}^{m}\left(c_{ij} - \hat{c}_{ij}\right)^2}{\sum_{i=1}^{n}\sum_{j=1}^{m} c_{ij}^2}} \qquad (19)$$

for which m is the number of points in each concentration profile ci, and n is the number of components in the ALS model. The value of the jth point in the ith concentration profile extracted from the uncompressed data set is cij, and ĉij is the value of the jth point in the ith high-resolution concentration profile obtained by applying 2W-NLWC-ALS to the original data set.

EXPERIMENTAL SECTION

Two different ion mobility spectrometers were used to collect two types of data sets that have different noise levels. The first instrument was a handheld chemical agent monitor (type 482301N, Graseby Ionics, Watford, Herts, U.K.). This instrument was operated in positive ion mode. The acetone dopant was removed from the instrument, so that ionization was based on water chemistry. The instrument was interfaced with a single-processor PII 200 MHz/64 MB RAM computer through a data acquisition board, type AT-MIO-16X (National Instruments, TX). The operating system was Microsoft Windows 98 Second Edition (Redmond, WA). Three chemical warfare agent simulants were used to collect the CAM data sets. The simulants were dimethyl methylphosphonate (DMMP; 97%, Lancaster Synthesis Inc., Windham, NH, Lot No. P07087 212052-3), triethyl phosphate (TEPO; 99.8%, Aldrich Chemical Co. Inc., Milwaukee, WI, Lot No. 12502MI), and dipropylene glycol methyl ether (DPM; 99+%, Aldrich Chemical Co. Inc., Lot No. 10613TA). The data sets were acquired with a home-built VI (National Instruments) program. Three data sets were collected with this instrument. Each data set contained TEPO, DMMP, and DPM. The IMS waveform was sampled at 80 kHz, and a single spectrum comprised 1500 points. For each data set, ∼2000 spectra were saved. The second instrument was an ion trap mobility spectrometer (ITMS), the ITEMISER contraband detection and identification system (Ion Track Instruments, Inc., Wilmington, MA). Positive mode was used with an ammonia reagent ion. The ITMS system was interfaced to a laptop computer (Pentium III 850 MHz and 384 MB memory) via a card (type DAQCard-AI-16XE-50). The operating system was Microsoft Windows XP Pro with Service Pack 1. The IMS waveform was sampled at 80 kHz. DMMP and DPM were sampled for each data set. Three data sets were used in this paper for data compression and analysis. A single spectrum comprised exactly 1500 points. For each data set, approximately 15 000 spectra were collected. For both instruments, home-made virtual instrument (VI) programs were implemented in LabVIEW 6.1 (National Instruments) to acquire data from the IMS instruments. Samples were not diluted before data collection. Each data set was collected by placing a 5-mL open vial that contained 2 mL of pure sample at a distance of 1 cm from the inlet of the CAM for 1-3 s, and the next sample was not introduced until the instrumental response returned to the blank signal (i.e., baseline response). The data in the drift time range of 0-0.5 ms were omitted to eliminate the gating pulse from the evaluations. The data processing was implemented on a PC with an AMD XP 2200+ processor and 512 MB RAM. The operating system was Windows XP Professional (Service Pack 1).
MATLAB 6.5 R13 (The MathWorks, Inc., Natick, MA) was used to perform the computational evaluations and the wavelet modeling. The binary data collected from the VI were read by MATLAB and represented in matrix format, with each row comprising an IMS spectrum. The 2W-NLWC-ALS algorithm and the reconstruction routine were written in MATLAB. The WaveLab 802 toolbox was used for all wavelet operations in this paper.32 Twenty-two wavelet filters from three families were investigated: 10 daublets, 5 coiflets, and 7 symmlets. To speed up the exhaustive evaluation, the data sets were reduced in size. These reduced data sets, referred to below as the smaller data sets, were used to determine the number of components in each data set and to evaluate the effect of the wavelet filter type on compression. The CAM data sets were decimated by saving every 10th spectrum in the data acquisition dimension and every 10th point in the drift time dimension. The ITEMISER data sets were decimated by saving every 50th spectrum in the data acquisition dimension and every 10th point in the drift time dimension. The wavelet filters were optimized by a grid search. The reduced ITEMISER data sets were also used for reconstruction of the original data sets, whereas the original-size CAM data sets were used for reconstruction. Table 1 gives the dimensions of the original and smaller CAM and ITEMISER data sets. After the wavelet filters were selected with the small data sets, the full-size CAM and ITEMISER data sets were compressed and reconstructed with the optimal wavelet filters for evaluation.

(32) Donoho, D.; Duncan, M. R.; Huo, X. WAVELAB 802 for Matlab5.x. http://www.stat.stanford.edu/∼wavelab/ (accessed Sep 2003).

Figure 2. (a) TEPO-DMMP-DPM data set comprising 2073 spectra with 1500 points/spectrum collected in positive mode. The gating pulse has been removed. (b) Reconstructed spectra of the TEPO-DMMP-DPM data set using the 2W-NLWC-ALS algorithm at a compression ratio of 640 ppm. The daublet 8 wavelet filter was used for both the drift time order and the data acquisition order. The compressed wavelet coefficients retained 100 points in the data acquisition (time) order and 300 points in the spectral (drift time) dimension.

Table 1. Data Set Collection Information for the Three CAM Data Sets and Three ITEMISER Data Sets

data set      data set dimension           small data set dimension     collection time (s)
              (scan number × drift time)   (scan number × drift time)
CAM 1         2055 × 1500                  205 × 1024                   139.34
CAM 2         2073 × 1500                  207 × 1024                   142.20
CAM 3         2111 × 1500                  211 × 1024                   143.41
ITEMISER 1    15006 × 1500                 300 × 1024                   746.23
ITEMISER 2    15010 × 1500                 300 × 1024                   746.40
ITEMISER 3    15015 × 1500                 300 × 1024                   746.58
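The exhaustive filter selection described above amounts to a grid over every pairing of a row filter with a column filter. The sketch below is an illustration in plain Python, not the authors' MATLAB code. The filter list is an assumption chosen only to match the stated family counts (10 daublets taken as even lengths 2-20, coiflets 1-5, symmlets 4-10), and error_fn is a hypothetical placeholder for the compress-and-reconstruct step that returns an ERMS.

```python
from itertools import product

# Hypothetical filter labels matching the stated counts: 10 daublets (assumed
# even lengths 2-20), 5 coiflets (orders 1-5), and 7 symmlets (orders 4-10).
FILTERS = (
    [f"daublet {n}" for n in range(2, 21, 2)]
    + [f"coiflet {n}" for n in range(1, 6)]
    + [f"symmlet {n}" for n in range(4, 11)]
)


def grid_search(error_fn, filters=FILTERS):
    """Try every (row filter, column filter) pair and keep the lowest error.

    error_fn(row_filter, column_filter) is a caller-supplied, hypothetical
    function that compresses and reconstructs a reduced data set and returns
    its ERMS. Returns (minimum ERMS, row filter, column filter).
    """
    best = None
    for row_f, col_f in product(filters, repeat=2):
        err = error_fn(row_f, col_f)
        if best is None or err < best[0]:
            best = (err, row_f, col_f)
    return best
```

With 22 filters the loop evaluates 22 × 22 = 484 combinations, the same count used for Figures 6 and 7. Running it on the reduced data sets keeps each of the 484 compress-and-reconstruct calls cheap.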

RESULTS AND DISCUSSION

Two ion mobility spectrometers, the CAM and the ITEMISER, were used to collect two types of data sets with different noise levels. Three chemical warfare agent simulants (TEPO, DMMP, and DPM) were used for the data sets collected with the CAM. The DMMP dimer peak was used as the reference for calculating the reduced mobility of each compound, because this peak has a relatively constant reduced mobility under different temperature conditions.33 Because the CAM drift tube was at room temperature, the room-temperature value of 1.43 cm2 V-1 s-1 was used as the reference mobility for the CAM data sets. For the ITEMISER data sets, a DMMP dimer reduced mobility of 1.40 cm2 V-1 s-1 was used because it corresponds to the drift tube temperature of 200 °C. The reduced mobilities of the ions from the other compounds were calculated from these high- and low-temperature reference values. Figure 2a gives a surface plot of the TEPO, DMMP, and DPM data set comprising 2073 spectra with 1500 points per spectrum collected in positive mode with the CAM. This figure shows that TEPO was sampled by the CAM first, followed by DMMP and then DPM. The RIP intensity decreased significantly whenever a new analyte was introduced to the system. ALS was used to model the data set given in Figure 2a. The ASIMPLISMA algorithm was used to obtain the initial concentration profiles for ALS. The baseline and noise values were calculated from the drift time range of 1-4 ms, and the maximum number of components was set to 25. For the three CAM data sets, the estimated number of components was 4 (RIP, TEPO, DMMP, and DPM). For the three ITEMISER data sets, the estimated number of components was 3 (RIP, DMMP, and DPM). For both instruments, the estimated number of components was consistent across the three replicate data sets. Figure 3a gives the ALS spectra for the CAM data set. TEPO gave two peaks, a monomer at 7.96 ms (K0 of 1.56 cm2 V-1 s-1) and a dimer at 10.86 ms (K0 of 1.14 cm2 V-1 s-1). DMMP gave a monomer peak at 6.85 ms (K0 of 1.81 cm2 V-1 s-1) and a dimer peak at 8.68 ms (K0 of 1.43 cm2 V-1 s-1). DPM peaks were found at 7.69 ms (K0 of 1.61 cm2 V-1 s-1) for the monomer and 10.00 ms (K0 of 1.24 cm2 V-1 s-1) for the dimer. Figure 3b gives the concentration profiles for the TEPO-DMMP-DPM data set. The compounds were introduced to the CAM in the order TEPO, DMMP, and DPM, and each compound was sampled only after the instrument had returned to baseline response from the previous compound. Note that, at a time of 12.69 min, there is a significant decrease in the RIP. This was caused by vibration of the cable, so that a single zero-signal spectrum was obtained at that time point. This spurious signal was easily detected using ALS.

(33) Ewing, R. G.; Eiceman, G. A.; Harden, C. S.; Stone, J. A. Int. J. Mass Spectrom., submitted for publication.

Analytical Chemistry, Vol. 77, No. 8, April 15, 2005

Figure 3. (a) ALS spectra for the TEPO-DMMP-DPM data set comprising 2073 spectra with 1500 points/spectrum collected in positive mode with the CAM. (b) ALS concentration profiles for the TEPO-DMMP-DPM CAM data set. (c) ALS spectra for the TEPO-DMMP-DPM CAM data set reconstructed using the 2W-NLWC-ALS algorithm at a compression ratio of 640 ppm. The daublet 8 wavelet filter was used for both the drift time order and the spectral acquisition order. (d) ALS concentration profiles for the reconstructed TEPO-DMMP-DPM CAM data set.
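ALS, as used here, factors the data matrix D (one spectrum per row) into concentration profiles C and spectra S so that D ≈ C S^T. The sketch below is a generic alternating least squares in plain Python that enforces nonnegativity by clipping. It is not the authors' MATLAB implementation, and the clipping scheme is an assumption. The starting C, which the paper obtains from ASIMPLISMA, is supplied by the caller. In the toy run it is deliberately a scaled copy of the true profiles so that the result is deterministic. All names and values in the toy run are hypothetical.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Matrix product of nested lists a (m x k) and b (k x n)."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a X = b for square a by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    m = [list(row) + list(rhs) for row, rhs in zip(a, b)]
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(m[r][p]))
        m[p], m[piv] = m[piv], m[p]
        pv = m[p][p]
        m[p] = [v / pv for v in m[p]]
        for r in range(k):
            if r != p:
                f = m[r][p]
                m[r] = [v - f * w for v, w in zip(m[r], m[p])]
    return [row[k:] for row in m]


def clip(a):
    """Replace negative entries by zero (simple nonnegativity constraint)."""
    return [[max(v, 0.0) for v in row] for row in a]


def als(d, c, n_iter=50):
    """Alternate least-squares updates for D (scans x points) ~ C S^T.

    c is the initial concentration-profile matrix (scans x k).
    Returns (C, S) with C of size scans x k and S of size points x k.
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    for _ in range(n_iter):
        ct = transpose(c)
        # S^T = (C^T C)^-1 C^T D, then clip negatives
        st = clip(solve(matmul(ct, c), matmul(ct, d)))
        s = transpose(st)
        # C^T = (S^T S)^-1 S^T D^T, then clip negatives
        c = transpose(clip(solve(matmul(st, s), matmul(st, transpose(d)))))
    return c, transpose(st)


# Toy check on a noise-free rank-2 matrix (hypothetical values, not IMS data).
C_TRUE = [[1, 0], [2, 1], [3, 2], [2, 3], [1, 2], [0, 1]]
S_TRUE = [[1, 0], [3, 1], [1, 4], [0, 2]]
D = matmul(C_TRUE, transpose(S_TRUE))
C_HAT, S_HAT = als(D, [[2 * v for v in row] for row in C_TRUE])
D_HAT = matmul(C_HAT, transpose(S_HAT))
```

In the toy run, C_HAT comes back as twice C_TRUE and S_HAT as half S_TRUE. ALS factors are determined only up to such scaling, so the product C S^T is the quantity to compare with the data.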

The data sets collected with the ITEMISER have lower noise than those collected with the CAM. TEPO did not give any signal in the ITEMISER because of competitive charge suppression by the ammonia dopant, so only DMMP and DPM were used for the data sets collected with the ITEMISER. Figure 4a gives the ITEMISER DPM-DMMP data set comprising 15 000 spectra with 1500 points per spectrum collected in positive mode. The reduced mobility values for both DPM and DMMP were greater than for the CAM because of the higher drift tube temperature in the ITEMISER. DMMP was introduced to the system before DPM. ALS was applied to the original ITEMISER data set in Figure 4a. Figure 5a gives the ALS spectra for this data set in a drift time window of 3-11 ms. The DPM monomer peak occurred at a drift time of 6.58 ms (K0 of 1.60 cm2 V-1 s-1). The DMMP monomer peak appeared at a drift time of 5.64 ms (K0 of 1.87 cm2 V-1 s-1) and the dimer at 7.52 ms (K0 of 1.40 cm2 V-1 s-1). Figure 5b gives the concentration profiles for the data set in Figure 4a. The smooth appearance of the integrated intensity for each compound reflects the lower noise level of the ITEMISER. The concentration profiles confirm that DMMP was introduced to the system before DPM; the DPM concentration began to increase after 590 s.

Two-factor analysis of variance (ANOVA) was applied to both the CAM and ITEMISER data sets to evaluate how the reconstruction error changes with the wavelet filters. For Table 4, 15 combinations of wavelet filters were randomly chosen from the 22 wavelet filters for each dimension, and each combination was used for reconstruction. The ERMS values from the 15 data sets reconstructed by 2W-NLWC-ALS with randomly selected wavelet filter combinations comprised the rows, and the 3 independent CAM measurements comprised the columns, of a data set that was subjected to ANOVA. ANOVA indicated that there were no significant differences among the reconstruction errors from the CAM data sets and that the differences among wavelet filters were insignificant.

Figure 4. (a) DPM-DMMP data set comprising 15 006 spectra with 1500 points/spectrum collected in positive mode with the ITEMISER. The gating pulse has been removed. A smaller data set comprising 300 spectra with 1024 points/spectrum, obtained by extracting every 50th spectrum from the original data set and truncating the final points, was used to plot this graph. (b) Reconstructed spectra of the DPM-DMMP data set from the 2W-NLWC-ALS algorithm at a compression ratio of 62 ppm. The daublet 14 wavelet filter was used for both the drift time dimension and the data acquisition dimension. The saved compressed spectra Sc form a matrix of 200 by 4, and the saved compressed concentration profiles Cc form a matrix of 150 by 4.

Figure 5. (a) ALS spectra for the DPM-DMMP data set comprising 15 006 spectra with 1500 points/spectrum collected in positive mode with the ITEMISER. The gating pulse has been removed. (b) ALS concentration profiles for the DPM-DMMP data set. (c) ALS spectra for the ITEMISER data set reconstructed using the 2W-NLWC-ALS algorithm at a compression ratio of 62 ppm. The daublet 14 wavelet filter was used for both the drift time dimension and the data acquisition dimension. (d) ALS concentration profiles for the reconstructed ITEMISER data set.
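The ANOVA described above is a two-factor design without replication: 15 filter combinations crossed with 3 data sets, one ERMS per cell. That design gives the degrees of freedom 14, 2, 28, and 44 listed in Tables 4 and 5. The following is a minimal plain-Python sketch of the sums of squares and F ratios under that assumption. The function names are illustrative. The Prob > F column would additionally need an F-distribution survival function (for example scipy.stats.f.sf), which is omitted here to stay within the standard library.

```python
def two_way_anova(table):
    """Two-factor ANOVA without replication.

    table[i][j] is the ERMS for wavelet filter combination i and data set j.
    Returns (SS, df, MS, F) for each effect, (SS, df, MS) for error,
    and (SS, df) for the total.
    """
    a, b = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (a * b)
    row_means = [sum(row) / b for row in table]
    col_means = [sum(table[i][j] for i in range(a)) / a for j in range(b)]

    ss_filters = b * sum((m - grand) ** 2 for m in row_means)
    ss_sets = a * sum((m - grand) ** 2 for m in col_means)
    # Residual: what is left after removing the row (filter) and column (data set) effects
    ss_error = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_total = sum((x - grand) ** 2 for row in table for x in row)

    df_filters, df_sets = a - 1, b - 1
    df_error = df_filters * df_sets
    ms_filters, ms_sets = ss_filters / df_filters, ss_sets / df_sets
    ms_error = ss_error / df_error
    return {
        "wavelet filters": (ss_filters, df_filters, ms_filters, ms_filters / ms_error),
        "data sets": (ss_sets, df_sets, ms_sets, ms_sets / ms_error),
        "error": (ss_error, df_error, ms_error),
        "total": (ss_total, a * b - 1),
    }


def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F statistic from sums of squares, e.g. to re-check a published ANOVA table."""
    return (ss_effect / df_effect) / (ss_error / df_error)
```

As a consistency check, f_ratio(0.1624, 14, 0.31317, 28) rounds to the F value of 1.04 reported for the ITEMISER wavelet filters in Table 5.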

Table 2. Comparison of the Drift Time of Each Peak of All the Components in the CAM and ITEMISER Data Sets Using ALS Spectra(a)

component       uncompr CAM   reconstr CAM   uncompr ITEMISER   reconstr ITEMISER
                data (ms)     data (ms)      data (ms)          data (ms)
RIP             5.97          5.97           4.33               4.33
TEPO monomer    7.96          7.95           n/a(b)             n/a
TEPO dimer      10.85         10.85          n/a                n/a
DMMP monomer    6.85          6.84           5.64               5.64
DMMP dimer      8.68          8.68           7.52               7.52
DPM monomer     7.69          7.69           6.58               6.58
DPM dimer       10.00         9.98           n/a                n/a

(a) The values reported are (drift time/maximum peak intensity). The uncompressed CAM data set is shown in Figure 2a and the uncompressed ITEMISER data set in Figure 4a; the reconstructed CAM data set is shown in Figure 2b and the reconstructed ITEMISER data set in Figure 4b. The CAM data set was reconstructed from a matrix of spectra of size 300 by 4 and a matrix of concentration profiles of size 100 by 4. The ITEMISER data set was reconstructed from a matrix of spectra of size 200 by 3 and a matrix of concentration profiles of size 150 by 3. (b) n/a, not available.
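When an analyte ion and the reference ion are measured in the same drift tube under the same field, temperature, and pressure, reduced mobility is inversely proportional to drift time. Under that assumption, K0 = K0(std) × t(std) / t. The K0 values in Table 3 can therefore be recomputed from the Table 2 drift times with the DMMP dimer as the single standard. A minimal sketch; the function and variable names are ours:

```python
def reduced_mobility(t_ms, t_std_ms, k0_std):
    """K0 of an ion from its drift time, calibrated against one reference ion.

    Assumes both ions were measured in the same drift tube under the same
    field, temperature, and pressure, so K0 is inversely proportional to
    drift time.
    """
    return k0_std * t_std_ms / t_ms


# Uncompressed drift times (ms) from Table 2.
CAM_DRIFT_MS = {
    "TEPO monomer": 7.96, "TEPO dimer": 10.85,
    "DMMP monomer": 6.85, "DMMP dimer": 8.68,
    "DPM monomer": 7.69, "DPM dimer": 10.00,
}
ITEMISER_DRIFT_MS = {"DMMP monomer": 5.64, "DMMP dimer": 7.52, "DPM monomer": 6.58}

# DMMP dimer references: 1.43 cm2 V-1 s-1 (room temperature, CAM)
# and 1.40 cm2 V-1 s-1 (about 200 °C, ITEMISER).
cam_k0 = {ion: round(reduced_mobility(t, 8.68, 1.43), 2) for ion, t in CAM_DRIFT_MS.items()}
itemiser_k0 = {ion: round(reduced_mobility(t, 7.52, 1.40), 2)
               for ion, t in ITEMISER_DRIFT_MS.items()}
```

The rounded values agree with the uncompressed columns of Table 3. Because a 0.01 ms shift in a peak near 7-11 ms changes K0 by well under 0.005 cm2 V-1 s-1, the small drift time shifts seen after reconstruction do not change the two-decimal K0 values.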

Table 3. Comparison of the Reduced Mobility (K0) for All Peaks in the CAM and ITEMISER Data Sets Using ALS Spectra(a)

component       uncompr CAM     reconstr CAM    uncompr ITEMISER   reconstr ITEMISER
                (cm2 V-1 s-1)   (cm2 V-1 s-1)   (cm2 V-1 s-1)      (cm2 V-1 s-1)
TEPO monomer    1.56            1.56            n/a(b)             n/a
TEPO dimer      1.14            1.14            n/a                n/a
DMMP monomer    1.81            1.81            1.87               1.87
DMMP dimer      1.43 (std)      1.43 (std)      1.40 (std)         1.40 (std)
DPM monomer     1.61            1.61            1.60               1.60
DPM dimer       1.24            1.24            n/a                n/a

(a) The uncompressed CAM data set is shown in Figure 2a and the uncompressed ITEMISER data set in Figure 4a; the reconstructed CAM data set is shown in Figure 2b and the reconstructed ITEMISER data set in Figure 4b. The K0 values were calculated from the drift times provided in Table 2. (b) n/a, not available.

Table 4. ANOVA Tables for 2W-NLWC-ALS of Three CAM Data Sets with 15 Wavelet Filter Combinations for Each Data Set(a)

source            SS        df   MS             F      Prob > F
wavelet filters   0.0003    14   2.13 × 10-5    0.66   0.7876
data sets         0.00009   2    4.43 × 10-5    1.38   0.2676
error             0.0009    28   3.21 × 10-5
total             0.00129   44

no.   row filter    column filter
1     symmlet 6     coiflet 2
2     daublet 16    coiflet 5
3     coiflet 2     daublet 20
4     coiflet 5     coiflet 4
5     symmlet 5     symmlet 9
6     coiflet 1     symmlet 7
7     daublet 8     symmlet 9
8     daublet 12    daublet 12
9     symmlet 7     symmlet 4
10    symmlet 7     daublet 10
11    daublet 14    coiflet 4
12    daublet 12    daublet 20
13    daublet 4     symmlet 9
14    coiflet 3     daublet 18
15    coiflet 1     daublet 16

(a) Each column is the ERMS value generated after reconstruction using the 2W-NLWC-ALS algorithm with a random filter combination applied to row compression and column compression. Each row is a different CAM data set.

Table 5. ANOVA Tables for 2W-NLWC-ALS of Three ITEMISER Data Sets with 15 Different Wavelet Filter Combinations for Each Data Set(a)

source            SS        df   MS        F      Prob > F
wavelet filters   0.1624    14   0.0116    1.04   0.4484
data sets         0.00293   2    0.00147   0.13   0.8776
error             0.31317   28   0.01118
total             0.4785    44

no.   row filter    column filter
1     daublet 6     daublet 20
2     symmlet 4     symmlet 7
3     daublet 12    daublet 12
4     symmlet 7     daublet 10
5     symmlet 5     symmlet 8
6     daublet 10    daublet 12
7     daublet 4     daublet 4
8     coiflet 4     daublet 10
9     symmlet 6     daublet 8
10    daublet 8     symmlet 9
11    daublet 20    daublet 16
12    daublet 14    daublet 16
13    daublet 18    coiflet 3
14    daublet 20    symmlet 7
15    symmlet 8     daublet 12

(a) Each column is the ERMS value generated after reconstruction using the 2W-NLWC-ALS algorithm with a random filter combination applied to row compression and column compression. Each row is a different ITEMISER data set.

ANOVA was applied to the reconstruction errors from the ITEMISER data in Table 5. The results are similar: no significant differences were found among the 3 ITEMISER data sets or among the 15 wavelet filter combinations. To study how the choice of wavelet affects compression efficacy, the reconstruction errors obtained by compressing the data sets uniformly with different wavelets were compared. During this evaluation, the reduced data sets described in Table 1 were used to decrease computation time. The 22 wavelet filters were applied to row and column compression in all possible combinations, which resulted in 484 wavelet filter combinations. Several compression levels for the drift time order were evaluated, and 484 values of both ERMS and SERMS were obtained for each compression level. The minimum values for each compression level are given in Figures 6 and 7.

Figure 1 gives the schematic of the 2W-NLWC-ALS algorithm. After the algorithm is applied, the saved compressed data are the low-resolution concentration profiles Cc and the low-resolution spectra Sc in step 5. The number of points saved in Cc and Sc that efficiently represent the original data depends on the number of wavelet coefficients (wcc) saved in step 3. To decide the number of points that should be saved in Cc and Sc, the minimum ERMS values were plotted as a function of the number of compressed points per spectrum for the three CAM data sets and the three ITEMISER data sets. For the three CAM data sets, the ERMS increased significantly when the spectra were compressed below 250 points, and the ALS spectra at this level of compression contained small artifacts. When more than 300 points per spectrum were retained, no artifacts were observed in the ALS spectra; thus, the optimum number of points per spectrum for the CAM data is 300. With each CAM spectrum compressed to 300 points, daublet 8 was the optimum wavelet filter for both row and column compression. For the ITEMISER data sets, 200 points/spectrum were found to be optimal for compression, and daublet 14 was the best filter for both row and column compression. The minimum SERMS, plotted in Figure 7 as a function of the number of compressed points per spectrum, confirmed the optimum compression scheme found in Figure 6 using the ALS spectra of the reconstruction.

Figure 6. Minimum ERMS evaluation from all 484 wavelet filter combinations applied to the drift time order and the spectral acquisition order using the 2W-NLWC-ALS method. Three data sets were from the CAM on TEPO-DMMP-DPM, and three were from the ITEMISER on DPM-DMMP.

Figure 7. Minimum SERMS evaluation from all 484 wavelet filter combinations applied to the drift time order and the spectrum acquisition order using the 2W-NLWC-ALS algorithm. Three data sets were collected from the CAM on TEPO-DMMP-DPM, and three from the ITEMISER on DPM-DMMP.

Figure 2b gives the reconstructed spectra of TEPO-DMMP-DPM from the CAM after application of the 2W-NLWC-ALS algorithm. The daublet 8 filter was applied to both row and column wavelet compression. The original size of the data set was 2073 by 1500. The saved low-resolution spectra Sc form a matrix of 300 by 4, whose four columns are the ALS spectra for RIP, TEPO, DMMP, and DPM, respectively; each 1500-point spectrum in the original data set was compressed to 300 points. The low-resolution concentration profiles Cc form a matrix of 100 by 4; the original 2073 spectra in the data acquisition time dimension were compressed to 100 spectra. The compression ratio was 510 ppm. The ERMS is 0.0063 (6.3 mV), and the RERMS is 1.62%. Figure 3c gives the spectra obtained by applying ALS to the reconstructed spectra in Figure 2b. No ringing effects are visible for any of the peaks. The DMMP dimer drift time was 8.68 ms, and its reduced mobility of 1.43 cm2 V-1 s-1 was used as the standard to calculate the K0 values of the other compounds. The drift times of the RIP (5.97 ms), TEPO dimer (10.85 ms), DMMP dimer (8.68 ms), and DPM monomer (7.69 ms) were unchanged after reconstruction. The drift times of the TEPO monomer and the DMMP monomer were both shifted by 0.01 ms, and that of the DPM dimer by 0.02 ms (Table 2). These shifts are so small that all the compounds retained the same reduced mobility after reconstruction. One of the benefits of NLWC is its ability to denoise the spectra. Note that, in Figure 3b, the concentration profiles from the original data set are very noisy. At the time of 12 s, there is a significant decrease in RIP concentration, caused by vibration of the cable that produced a zero-signal spectrum at that time point. The corresponding concentration profiles for the reconstructed data set in Figure 3d have this artifact removed, which demonstrates the denoising property of NLWC. The concentration profiles in Figure 3d are also smoother than those in Figure 3b. Figure 4b gives the reconstructed spectra of DMMP-DPM from the ITEMISER after application of the 2W-NLWC-ALS algorithm.
The daublet 14 filter was applied to both rows and columns for wavelet compression. The original size of the data set is 15 000 by 1500. The saved low-resolution spectra Sc form a matrix of 200 by 3, whose three columns are the ALS spectra for RIP, DMMP, and DPM, respectively; each 1500-point spectrum in the original data set was compressed to 200 points. The low-resolution concentration profiles Cc form a matrix of 150 by 3; the original 15 000 spectra collected in the data acquisition time dimension were compressed to 150 spectra. The compression ratio is 46 ppm. The ERMS for the reconstruction of this data set is 0.0092 (9.2 mV), and the RERMS is 0.42%. To evaluate the fidelity of the reconstructed data set, ALS was applied to it. Figure 5c gives the ALS spectra of the reconstructed data set in Figure 4b. No peak shifting occurred after reconstruction. The DMMP dimer was at 7.52 ms, and its reduced mobility of 1.40 cm2 V-1 s-1 was used as the reference value to calculate the reduced mobilities of the other compounds. The drift times of the DMMP monomer and the DPM monomer were unchanged after reconstruction, and peak intensity attenuation was insignificant (less than 0.09%). All characteristic ions retained the same reduced mobility after reconstruction as in the uncompressed spectra. The drift times and maximum peak intensities from ALS of the original and reconstructed data sets are compared in Table 2, and the corresponding reduced mobility values are compared in Table 3.
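The reported compression ratios can be checked from the stored matrix sizes. This assumes that the ratio counts only the entries of Cc and Sc relative to the raw data matrix; the formula is not stated explicitly here, so treat it as an assumption. With the sizes above, this gives about 514.6 ppm for the CAM data set (reported as 510 ppm) and about 46.6 ppm for the ITEMISER data set (reported as 46 ppm). The sketch also shows the final product D_hat = C S^T and the ERMS. In 2W-NLWC-ALS the saved Cc and Sc are low resolution, and they would first be expanded to full resolution by the inverse wavelet transform; that step is not shown. The function names are illustrative.

```python
import math


def compression_ratio_ppm(n_scans, n_points, cc_rows, sc_rows, n_components):
    """Stored entries of Cc (cc_rows x k) and Sc (sc_rows x k) in ppm of the raw matrix."""
    stored = (cc_rows + sc_rows) * n_components
    return 1e6 * stored / (n_scans * n_points)


def reconstruct(c, s):
    """D_hat = C S^T for C (scans x k) and S (points x k), given as nested lists."""
    return [[sum(c_k * s_k for c_k, s_k in zip(ci, sj)) for sj in s] for ci in c]


def erms(d, d_hat):
    """Root-mean-square error between the original and reconstructed matrices."""
    sq = [(x - y) ** 2 for row, row_hat in zip(d, d_hat) for x, y in zip(row, row_hat)]
    return math.sqrt(sum(sq) / len(sq))


# Sizes from the text: CAM 2073 x 1500 with Cc 100 x 4 and Sc 300 x 4;
# ITEMISER 15 006 x 1500 with Cc 150 x 3 and Sc 200 x 3.
cam_ppm = compression_ratio_ppm(2073, 1500, cc_rows=100, sc_rows=300, n_components=4)
itemiser_ppm = compression_ratio_ppm(15006, 1500, cc_rows=150, sc_rows=200, n_components=3)
```

Because only about (rows of Cc + rows of Sc) × k numbers are kept, the ratio shrinks as the number of scans grows. This is why the much longer ITEMISER acquisition reaches a ratio roughly ten times smaller than the CAM data.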

The ALS concentration profiles obtained from the reconstructed ITEMISER data set are plotted in Figure 5d. Compared with Figure 5b, the ALS concentration profiles calculated from the reconstructed data set are smoother (lower noise) than those from the original data set. The CERMS is 0.0088. Table 2 compares the drift times of all the components in the CAM and ITEMISER data sets using ALS spectra. The reconstructed data sets were the product of the reconstructed high-resolution spectra and concentration profiles. For the CAM data sets, the peak positions of the RIP, TEPO dimer, DMMP dimer, and DPM monomer were the same in the compressed and original data. The peak positions of the TEPO monomer and the DMMP monomer decreased by 0.01 ms, and that of the DPM dimer by 0.02 ms. For the ITEMISER data set, there is no shift in drift time after reconstruction. Reduced mobility is reported instead of drift time because it is independent of many experimental factors. The reduced mobilities were calculated with respect to the DMMP dimer standard. Table 3 gives the reduced mobility for both the uncompressed spectra and the spectra reconstructed with the 2W-NLWC-ALS algorithm. The DMMP dimer reduced mobility of 1.43 cm2 V-1 s-1 at room temperature was used as the standard for the CAM data sets. For the ITEMISER data sets, with the drift tube at around 200 °C, the value of 1.40 cm2 V-1 s-1 was used for the DMMP dimer as the standard to calibrate the other compounds.

CONCLUSIONS

A two-way nonlinear wavelet compression method combined with ALS (2W-NLWC-ALS) was developed and applied to IMS data. In this algorithm, ALS serves not only as a modeling method but also as a compression method. The smaller ALS model is saved as the compressed data, so higher compression ratios are obtained along with a model of the original data set. A compression ratio of 510 ppm, root-mean-square error (ERMS) of 6.3 mV, and relative root-mean-square error (RERMS) of 1.62% were achieved for the data sets collected by the CAM. A compression ratio of 46 ppm, ERMS of 9.2 mV, and RERMS of 0.42% were achieved for the data sets collected by the ITEMISER. NLWC avoids the peak distortion and artifacts caused by linear wavelet compression while achieving similar or improved results. The ALS models of the reconstructed and original spectra give practically indistinguishable spectra and concentration profiles. The reduced mobilities of the characteristic ions from each compound were unaltered by compression and reconstruction on both instruments.

ACKNOWLEDGMENT

The U.S. Army ECBC is thanked for funding this project and donating the CAMs. The Research Corporation is thanked for the Research Opportunity Award. GE Ion Track is thanked for the donation and support of the Itemizer. Mariela Ochoa, Preshious Rearden, Ping Chen, Matt Rainsberg, and George Bota are thanked for their helpful comments and suggestions.

Received for review September 16, 2004. Accepted February 4, 2005. AC0486286