Anal. Chem. 1997, 69, 78-90

Application of Wavelet Transforms to Experimental Spectra: Smoothing, Denoising, and Data Set Compression

V. J. Barclay* and R. F. Bonner
PE Sciex, 71 Four Valley Drive, Concord, Ontario, Canada L4K 4V8

I. P. Hamilton
Department of Chemistry, Wilfrid Laurier University, Waterloo, Ontario, Canada N2L 3C5

Various methods have been proposed for smoothing and denoising data sets, but a distinction is seldom made between the two procedures. Here, we distinguish between them in the signal domain and its transformed domain. Smoothing removes components (of the transformed signal) occurring in the high end of the transformed domain regardless of amplitude. Denoising removes small-amplitude components occurring in the transformed domain regardless of position. Methods for smoothing and denoising are presented which depend on the recently developed discrete wavelet (DW) transform technique. The DW smoothing and denoising methods are filtering techniques that are applied to the transformed data set prior to back-transforming it to the signal domain. These DW techniques are compared with other familiar methods of smoothing (Savitzky-Golay) and denoising through filtering (Fourier transform). Preparatory to improving experimental data sets, synthetic data sets, composed of ideal functions to which a known amount of noise has been added, are examined. The filter cutoffs are systematically explored, and DW techniques are shown to be highly successful for data sets with great dynamic range. In the minority of cases, smoothing and denoising are nearly interchangeable. It is shown that DW smoothing compresses with the most predictable results, whereas DW denoising compresses with minimal distortion of the signal.

Experimental spectra are often complicated by noise, which may be due to interfering physical or chemical processes, imperfections in the experimental apparatus, or any of a number of causes which result in random, spurious fluctuations of the signal received at the detector. By definition, noise is instantaneously irreproducible, although it may often be characterized on average if it obeys Gaussian or Poisson distribution statistics, for example.
In the present work, an interfering signal of another compound is not regarded as noise; nor are drift, baseline, signal pickup, or "cyclic noise", such as that used in the broader definition of Erickson et al.,1 included. (1) Erickson, C. L.; Lysaght, M. J.; Callis, J. B. Anal. Chem. 1992, 64, 1155A-1163A.

78 Analytical Chemistry, Vol. 69, No. 1, January 1, 1997

Although much recent work has focused on the design of digital filters for smoothing and denoising,2 a clear distinction between smoothing and denoising is seldom made. For simplicity, the signal domain and its transformed domain are described in terms of the time/frequency paradigm; later it is shown that the model is applicable to signals recorded in dimensions other than time. It is also assumed for simplicity that the time step, ∆t, is held constant, which reflects most common analytical situations.

An observed intensity I(t) defined in the time domain is the sum of the true signal s(t) and noise n(t). The true signal varies over time, ∆s(t)/∆t, and the noise causes a random spurious fluctuation, ∆n(t)/∆t, at each time step. On average, the noise causes the observed intensity to fluctuate from point to point more than the true signal, and the overall quantity ∑|∆I(t)/∆t| is larger than for the true signal alone. The effect of noise can be mitigated by minimizing these spurious fluctuations in the observed signal, i.e., by smoothing.

Smoothing is here defined as an activity that operates on the observed signal [s(t) + n(t)] to minimize the fluctuations ∑|∆I(t)/∆t|. Without knowing the true signal s(t), yet assuming that its time variation ∑|∆s(t)/∆t| is smaller than the fluctuations due to noise, smoothing attempts to approximate the true signal by reducing the fluctuation in the observed signal. In contrast to smoothing, denoising is here defined as an activity that has as its explicit goal the removal of n(t), regardless of its effect on |∆I(t)/∆t|. In fact, the removal of n(t) might lead to an increase in the pointwise "jumps" of s(t), and thus an increase in |∆I(t)/∆t| for some values of t, but in general, the overall quantity ∑|∆I(t)/∆t| will decrease as it approaches the true value ∑|∆s(t)/∆t|. In the context of filtering a transformed signal, the distinction between smoothing and denoising becomes clearer.
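The fluctuation measure ∑|∆I(t)/∆t| can be made concrete with a short sketch. The Gaussian test signal, 5% noise level, and unit time grid below are illustrative assumptions (loosely modeled on the paper's Figure 1a, not its actual data):

```python
import numpy as np

rng = np.random.default_rng(0)

t = np.linspace(-250, 250, 501)          # unit time step, Delta t = 1
s = np.exp(-(t / 50.0) ** 2)             # true signal s(t): unit-height Gaussian
n = 0.05 * rng.standard_normal(t.size)   # 5% random noise n(t)
I = s + n                                # observed intensity I(t) = s(t) + n(t)

def total_fluctuation(y, dt=1.0):
    """Sum of point-to-point jumps, i.e., sum |Delta y / Delta t|."""
    return np.abs(np.diff(y) / dt).sum()

# Noise inflates the point-to-point fluctuation of the observed signal,
# which is exactly what smoothing tries to undo.
assert total_fluctuation(I) > total_fluctuation(s)
```

For the smooth Gaussian alone, the sum is just its total variation (about 2 for a unit-height peak that rises and falls once); the noise term adds a jump of order the noise amplitude at nearly every point.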
Whereas noise is distributed across the frequency domain, the signal is localized to parts of the frequency domain. Smoothing removes high-frequency components of the transformed signal regardless of amplitude, whereas denoising removes small-amplitude components of the transformed signal regardless of frequency. This is a strong definition of denoising, since it includes all frequencies; most cases of so-called denoising are in fact partial denoising, since they can remove noise only from a certain frequency region. For the data sets of interest here, |sav(t)| ≥ |nav(t)|, and in practice, (2) See, for example: Bialkowski, S. E. Anal. Chem. 1988, 60, 355A-361A, 403A-413A.

© 1996 American Chemical Society

smoothing and denoising may give similar results if noise is insignificant at low frequencies, where the signal typically has high amplitude. (If the low-frequency noise is significant, as in the case of baseline, a different strategy is required.3) Smoothing cannot properly claim to remove the noise, however, only to reduce the signal "choppiness" induced by the presence of noise.

A distinction can be drawn between two types of filter. A "direct filter" acts on the transformed signal in the frequency domain. The observed signal is transformed from the time to the frequency domain by an orthogonal transform [such as the Fourier transform (FT)], filtered, and transformed back by the inverse transform. The filter removes the high-frequency or low-amplitude components, as discussed below. An "indirect filter" smooths or denoises the observed signal in the time domain; no transformation to the frequency domain is required. Numerous methods exist for this type of filtering. Windowed smoothing algorithms such as Savitzky-Golay4 or Kalman averaging1 are often employed. The median filter technique originally described by Tukey5 has recently been shown to work well for broad features with narrow noise spikes.6 If the peak shape function is well-known, a nonlinear least-squares multivariate fit to a functional form such as a progression of localized Gaussians7 can be made.

In the following, smoothing and denoising algorithms using both direct and indirect filter methods are compared. A special focus is on smoothing and denoising algorithms based on the discrete wavelet (DW) transform. The application of wavelet transforms to the problems of noisy and incomplete data has been put on a sound statistical basis by Donoho and co-workers.8 A brief introduction to wavelet basis functions and a worked example involving a simple data set are given in Appendix A.
The decomposition of a data set using wavelet transform techniques has recently been discussed in highly readable form in this journal.9 For a general introduction to wavelet decomposition methods, the reader is referred to several recent monographs.10-13 The DW smoothing and denoising algorithms are compared to two representative existing methods, and the comparison extends over synthetic and experimental data sets. The DW smoothing and denoising algorithms lead to a representation of the signal that uses fewer basis functions than the raw signal, which translates to greater efficiency in storage and transmission. The extent of compression is also discussed in this report. Specific results are reported from (i) a Savitzky-Golay smoothing algorithm, (ii) a Fourier transform filtering procedure, (iii) wavelet smoothing, and (iv) wavelet denoising. These four (3) Barclay, V. J.; Bonner, R. F.; Hamilton, I. P.; Thibault, P., in preparation. (4) Savitzky, A.; Golay, M. J. E. Anal. Chem. 1964, 36, 1627-1639. (5) Tukey, J. W. Exploratory Data Analysis; Addison-Wesley: Reading, MA, 1977. (6) Stone, D. C. Can. J. Chem. 1995, 73, 1573-1581. (7) Ivaldi, J. C.; Tracy, D.; Barnard, T. W.; Slavin, W. Spectrochim. Acta 1992, 47B, 1361-1371. (8) Donoho, D. L.; Johnstone, I. M. Biometrika 1994, 81, 425-455. Donoho, D. L.; Johnstone, I. M. C. R. Acad. Sci. Paris, Ser. A 1994, 319, 1317-1322. Donoho, D. L.; Johnstone, I. M.; Kerkyacharian, G.; Picard, D. J. R. Stat. Soc., Ser. B 1995, 57, 301-369. Donoho, D. L.; Buckheit, J. In Wavelets in Statistics; Antoniadis, A., Oppenheim, G., Eds.; Springer-Verlag: New York, 1995. (9) Walczak, B.; van den Bogaert, B.; Massart, D. L. Anal. Chem. 1996, 68, 1742-1747. Note that their convention for numbering of details is opposite to that employed here. (10) Daubechies, I. Ten Lectures on Wavelets; SIAM: Philadelphia, PA, 1992. (11) Meyer, Y. Wavelets: Algorithms and Applications; SIAM: Philadelphia, PA, 1993. (12) Chui, C. K. An Introduction to Wavelets; Academic Press: New York, 1992. (13) Kaiser, G. A Friendly Guide to Wavelets; Birkhäuser: Boston, MA, 1994.

Figure 1. Three data sets used for denoising and smoothing comparisons: (a) Simple Gaussian with and without 5% random noise added. (b) Asymmetric triangular peaks with varying widths and amplitudes, with and without 3% random noise. (c) Electrospray spectrum of thyroid extract.

methods are applied to synthetic data in the time domain and real data, consisting of mass spectra obtained using electrospray techniques in the mass-to-charge domain and a liquid chromatogram. Reference 8 describes more sophisticated filtering techniques applied to other types of data.

METHODS

The Savitzky-Golay (SG) smoothing algorithm is an example of a moving average algorithm for which the polynomial coefficients have been chosen to give the best least-squares fit to the points within a window.4,14 It is an indirect filter, because it is carried out in the time domain rather than the frequency domain. It is related to a broad range of other digital smoothing polynomials,15 but the SG form appears to be the most widely used in analytical chemistry. In the present work, the observed signal, such as that shown in Figure 1a, is processed within a moving window of two different sizes (n = 7 or 25) by a polynomial of quadratic or quartic order (m = 2 or 4) to give results such as those shown in Figure 2a and b. The specific SG algorithm used was made available through commercial data analysis software.16 Panels c and d show difference functions for the processed data and will be discussed in more detail below.

The Fourier filtering algorithm used is outlined pictorially in Figure 3. The signal in Figure 1a is Fourier-transformed, and the logarithm of its power spectrum is calculated as shown in Figure 3a. Simple functions may be fitted to portions of the power spectrum: log{|S(f)|²} for the signal region (low frequency, in this example) and log{|N(f)|²} for the noise region. In Figure 3a, two such functions are shown for S(f): a linear ("typical") (14) Bromba, M. U. A.; Ziegler, H. Anal. Chem. 1981, 53, 1583-1586. (15) Ziegler, H. Appl. Spectrosc. 1981, 35, 88-92. (16) Igor Pro 2.0, WaveMetrics, Oswego, OR, 1994.


Figure 2. Results from Savitzky-Golay smoothing technique applied to Figure 1a data set: (a) SG-smoothed signals of polynomial order m = 2. (b) Same as (a), except that m = 4. (c) Difference functions for panel a, with added noise on bottom. (d) Difference functions for panel b, with added noise on bottom.
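The moving-window least-squares idea behind the SG filter can be sketched as follows. This is a minimal reimplementation for illustration only (the paper used the Igor Pro implementation); the replicate-padding at the edges is an assumption of this sketch:

```python
import numpy as np

def sg_smooth(y, m=2, n=7):
    """Moving least-squares polynomial smoother (the Savitzky-Golay idea):
    fit an order-m polynomial to the n points centred on each point and
    keep the fitted value at the centre.  n must be odd, with n > m."""
    half = n // 2
    x = np.arange(-half, half + 1)
    # Design matrix for one window; row 0 of its pseudo-inverse gives the
    # fitted value at x = 0, i.e., the classic SG convolution weights.
    A = np.vander(x, m + 1, increasing=True)
    weights = np.linalg.pinv(A)[0]
    ypad = np.pad(np.asarray(y, float), half, mode="edge")  # edge padding
    # Correlate the padded signal with the centre-point weights.
    return np.convolve(ypad, weights[::-1], mode="valid")
```

A quick sanity check of the window property: a quadratic signal is reproduced exactly (away from the padded edges) by an m = 2 fit, and a constant is preserved everywhere, since the weights sum to one.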

choice and a carefully fitted Gaussian ("best case"). The latter was chosen by fitting to the noiseless spectrum in order to get the best possible representation of the transformed signal. An optimal (Wiener) filter function that gives greater weight to low-frequency components and goes smoothly to zero at high frequency is constructed; the sigmoidal filter function F(f) = S(f)/[S(f) + N(f)] is shown for the two choices of S(f) in Figure 3b. Other filter designs, such as the Parzen filter, are possible, but a comparison of their properties is beyond the scope of the present work; refs 17 and 18 discuss filter design extensively. The Fourier-transformed data are multiplied by F(f) and back-transformed to give a signal that has high-frequency components removed, as shown in panel c. The form of the FT algorithm used was coded in C, based in part on the algorithm in ref 19; an improved variant,20 which does not require a data vector of size 2^J, is also available through commercial data analysis software.

(17) Haykin, S. Modern Filters; Macmillan: New York, 1989. (18) Hamming, R. W. Digital Filters, 3rd ed.; Prentice-Hall: Englewood Cliffs, NJ, 1989. (19) Press, W. H.; Teukolsky, S. A.; Vetterling, W. T.; Flannery, B. P. Numerical Recipes in C: The Art of Scientific Computing, 2nd ed.; Cambridge University Press: New York, 1992. (20) Programs for Digital Signal Processing; IEEE Press: Piscataway, NJ, 1979. (See section 1.4 on mixed radix fast Fourier transforms.)

Figure 3. Fourier transform noise filtering: (a) Logarithm of power spectrum, created by squaring the real and imaginary components of the transformed vector. The lines representing signal and noise contributions for a linear approximation to signal (offset, solid line) and the Gaussian approximation to signal (dashed line) are also shown. (b) Wiener filter function created from linear and Gaussian approximations in (a). (c) FT-filtered Gaussian functions. (d) Difference functions for the raw noisy data (bottom), the linear filtered data (offset of 0.2), and the Gaussian filtered data (offset of 0.4).

The wavelet transform is analogous to the Fourier transform. Both are basis transformations, i.e., expressions of the signal in terms of basis functions other than the original coordinates. The Fourier basis set consists of sine and cosine functions; the wavelet basis consists of small "waves" of the type depicted in Appendix A. Both transforms are invertible, and both sets of basis functions are orthogonal. However, the Fourier basis functions are completely localized in the frequency domain and have infinite extent (as sines and cosines) in the time domain, whereas the wavelet basis functions are dually localized (i.e., have a finite extent) in both the time and frequency domains. The transforms are usually expressed in fast, linear, numerical forms as the fast Fourier transform and the discrete wavelet transform. The dual localization of wavelets means that large classes of functions and operators are sparse when transformed to the wavelet domain. This sparseness means that computationally fast matrix operations are possible,10 once a wavelet transform has been carried out, and significant compression can be achieved. The form of the DW algorithm used was coded in C, based in part on the published form in ref 19; it may also be available through commercial data analysis software.21

Whereas sine and cosine are the basis functions of the Fourier transform, there are an infinite number of "waves" which can be used in the wavelet transform. These basis functions determine how compact,22 and how smooth, the dual localizations are. Typically, the more compact a wavelet basis function is, the less smooth it is. The orthogonal basis functions are obtained from a single function (the "mother function") by shifts that act on the t-variable and dilations that act on the f-variable. Appendix A compares linear spline, cubic spline, and Daubechies forms (21) See, for example: WavBox by C. Taswell or WaveLab by D. Donoho; both appear to utilize Matlab. WaveMetrics has introduced this operation to Igor Pro 3. (22) We distinguish "compact", referring to the extent along the x-axis, from "compressed", referring to the ability to describe a signal with fewer basis functions than it requires in the raw form.

of wavelet basis sets and shows a variety of coefficient choices for the latter. In this work, the Daubechies 12-coefficient wavelets are used.10 Appendix A shows selected elements from this basis set and illustrates the reconstitution of a signal from a linear combination of elements. The Daubechies form was found to be highly successful in terms of speed, compression, and dynamic response. To carry out a DW transform, first a square and sparse wavelet coefficient matrix of order N must be constructed; N must be a power of 2 (N = 2^J, where J is an integer). The transform is then applied to the N-element data vector in a hierarchical fashion, i.e., first to N members of the vector, then to N/2, then to N/4, and so forth. At each stage, every other point is omitted. This forces the sum ∑|∆I(t)/∆t| to decrease, and hence each stage is smoother. As more interleaved points are omitted, the signal appears to become less well-resolved. The schematic algorithm and its resulting effect on the resolution of the spectra are shown in Figures 2 and 3 of ref 9. The output of the DW transform consists of all the "detail" components (i.e., difference vectors, where the difference calculated is that between successive stages) that were accumulated as the algorithm operated from the top down. The first detail is at the highest resolution and consists of 2^(J-1) points; the second detail consists of 2^(J-2) points and is at lower resolution, etc. As with the Fourier transform, the DW transform is somewhat affected by boundary conditions. The last few wavelet coefficients at each level of the hierarchical transform are affected by points from both ends of the data vector. In practice, end effects are not a problem because the data sets are padded with "neutral" points as described below to make their size 2^J. Discrete wavelet transforms are often used to reduce the size of the data set. There are two general methods for removing elements from the DW-transformed signal.
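The hierarchical (pyramid) scheme can be sketched with the simplest wavelet, the Haar basis, standing in for the Daubechies 12-coefficient basis actually used in the paper. The features being illustrated are the padding to N = 2^J, the halving at each stage, and the detail sizes 2^(J-1), 2^(J-2), ...; the toy data vector is an assumption:

```python
import numpy as np

def haar_dwt(y):
    """Pyramid (hierarchical) transform with the Haar wavelet.  Each stage
    splits the current smooth vector into a half-length smooth vector and
    a half-length detail vector; details[0] is D1 (highest resolution)."""
    r2 = np.sqrt(2.0)
    smooth, details = np.asarray(y, float), []
    while smooth.size > 1:
        a, b = smooth[0::2], smooth[1::2]
        details.append((a - b) / r2)   # detail: scaled pairwise differences
        smooth = (a + b) / r2          # next, coarser smooth vector
    return smooth, details

def haar_idwt(smooth, details):
    """Invert the pyramid, coarsest stage first."""
    r2 = np.sqrt(2.0)
    s = np.asarray(smooth, float)
    for d in reversed(details):
        out = np.empty(2 * s.size)
        out[0::2], out[1::2] = (s + d) / r2, (s - d) / r2
        s = out
    return s

# Pad a short data vector with "neutral" (edge-replicated) points to N = 2^J.
y = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0])
J = int(np.ceil(np.log2(y.size)))
ypad = np.pad(y, (0, 2 ** J - y.size), mode="edge")   # length 8

smooth, details = haar_dwt(ypad)
assert len(details) == J                      # D1 ... DJ
assert details[0].size == 2 ** (J - 1)        # first detail: 2^(J-1) points
assert np.allclose(haar_idwt(smooth, details), ypad)  # exact reconstruction
```

The transform is orthogonal, so the back-transform reconstructs the padded vector exactly when no coefficients are discarded.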
Elements may be discarded based on either their position or their amplitude in the transformed vector; both approaches are compared in this paper.

RESULTS AND DISCUSSION

Comparison of the methods considered in this paper is carried out on four data sets: two synthetic in the time domain, one experimental in the m/z domain, and one that closely simulates the latter. Figure 1a shows the first synthetic data set, a single Gaussian peak with (bottom) and without (top) 5% random noise added. This peak was constructed to be similar to a well-studied test case of Savitzky and Golay.4 Figure 1b shows the second synthetic data set: a series of triangular peaks with (bottom trace) and without (top trace) 3% random noise added. The triangles are asymmetric, with the left side having twice the slope of the right. This data set was constructed to incorporate a wide range of peak widths and two extremes in peak amplitude, features that may vary widely in experimental spectra. Figure 1c is the experimental data set: an electrospray mass spectrum of a thyroid extract which contains a considerable amount of noise. (The simulated data for Figure 1c will be introduced below.) In the following discussion, results are compared for different techniques and parameters for both synthetic cases by calculating "difference functions", which are equal to the processed noisy function minus the function without noise (top traces of Figure 1a and b). The root mean square (rms) deviation for each difference function is also calculated. For perfect denoising, the difference function is zero, and its rms deviation is zero. For perfect smoothing, the difference function is flat but not necessarily zero, due to the difference in the two approaches noted in the introduction.

Single Gaussian Peak. One of the simplest possible model data sets is the single noisy Gaussian in Figure 1a. The noise distribution is random but can lead to "apparent" features such as baseline "blips" at t ≈ -150 and 140. First, Savitzky-Golay smoothing4 of order m and size n is examined, as shown in Figure 2. As discussed under Methods, the coefficients of an mth-order polynomial are chosen to give a least-squares fit to data points in a moving window of width n. Here four pairs of m and n values are considered. Results for m = 2 with n = 7 and 25 are shown in Figure 2a, while those for m = 4 with n = 7 and 25 are shown in Figure 2b. Their corresponding difference functions are shown in Figure 2c and d; results for n = 7 are offset by 0.2, and those for n = 25 are offset by 0.4. In both figures, the bottom trace is the noise that had been added to the Gaussian. The rms of the noise is 0.048, and the rms deviations of the SG(m,n)-processed data are 0.029 (2, 7), 0.018 (2, 25), 0.037 (4, 7), and 0.019 (4, 25). There are anomalously large fluctuations at t = -150 and 140. It may be seen that a lower-order polynomial and a larger window size result in a smoother function. The choice m = 2 and n = 25 is most successful at smoothing this signal, with its rms of 0.018; the anomalous fluctuation at t = -150 has been partially removed, and the fluctuation at t = 140 has been almost entirely removed due to the window averaging. However, this choice of (m,n) significantly underestimates the height of the Gaussian peak, as seen by the large dip at t = 0 in Figure 2c. Thus, m = 4 and n = 25, with only a slightly higher rms of 0.019, appears to be the better choice for smoothing this particular signal.

Fourier transform filtering of the single noisy Gaussian is shown in Figure 3.
As described under Methods, the power spectrum of the data set is calculated, and its logarithm is shown in Figure 3a. Also shown in Figure 3a are three lines, chosen to model the logarithm of the power spectrum of (i) the signal part (steep line, which is offset for purposes of display, and dashed curve) and (ii) the noise part (horizontal line). The dashed curve was constructed as the "best possible case" of FT filtering, by fitting the signal of the noiseless spectrum. The equation describing it is

S(f) = S(2πN/∆t) = -15.7 + 12.8{exp[-0.003N²] + exp[-0.003(N - 512)²]}   (1)

From these, the smoothly varying filter functions shown in Figure 3b are constructed, with the dashed line corresponding to the "best case" signal model. The Fourier-transformed data set is then multiplied by the filter functions and back-transformed to obtain the filtered data sets shown in Figure 3c. The "best case" FT filtering does indeed improve the signal, but it may be seen that noise remains. The difference functions for Figure 3c are shown in the top traces of Figure 3d. The bottom trace is the noise which had been added to the Gaussian. It may be seen from the difference functions that there are no systematic errors in the reconstruction, showing that the division of the power spectrum into signal part and noise part is valid in this case. The largest deviation is for the anomalous fluctuation at t = -150. The rms deviations of the difference functions are 0.022 and 0.020, respectively. From Figure 3b it may be seen that the filter function is essentially zero for most frequencies and that, for practical


Figure 4. Discrete wavelet transform smoothing: (a) DW-transformed vector with span of details 1, 2, and 3 shown. (b) Filtered DWT vector, with 3 details removed (8:1 compression). (c) Back-transform of (b), to give DW-smoothed spectrum. Bottom trace is 4:1 compression, for which D1 and D2 were removed, and top trace is 8:1 compression (D1, D2, and D3 removed). (d) Difference functions for raw noisy data (bottom) and the smoothed data sets shown in (c) (offset by 0.3 and 0.6).

Figure 5. Discrete wavelet transform denoising: (a) DW-transformed vector with denoising limits of 9% shown. (b) Filtered DWT vector from (a). (c) DW-denoised spectrum. Bottom trace, 6% filter; top trace, 9% filter. (d) Difference functions for the raw noisy data (bottom) and the denoised data (offsets of 0.3 and 0.6). The 9% cutoff led to 9/512 compression.

purposes, compression of about 5:1 can be achieved, or 10:1 for the "best case".

Wavelet smoothing for the single noisy Gaussian peak is shown in Figure 4. In wavelet smoothing, elements of the transformed vector are removed depending on their position. Figure 4a shows the wavelet-transformed data set. The portion of the data set to be eliminated is indicated as all elements under the arrows that show the extent of details 1, 2, and 3. In this case, the three highest resolution details (D1, D2, D3), comprising 7/8 of the elements of the data set, are discarded for 8:1 compression and consequent smoothing. (The reason for this choice of compression is discussed later in connection with other cutoff limits.) The remaining elements, shown in Figure 4b, are then back-transformed to obtain the smooth signal shown in the top trace of Figure 4c. The difference function shown in the top trace of Figure 4d has an rms deviation of 0.019. For comparison, in Figure 4c and d, results are also shown for the nonoptimum case of 4:1 compression, which has an rms deviation of 0.026.

Wavelet denoising of the single noisy Gaussian peak is depicted in Figure 5. The wavelet transform of this data set is shown in Figure 5a; this is identical to Figure 4a, except for the portion of the data set to be eliminated, as indicated by the coefficients between the horizontal lines. In wavelet denoising, elements of the transformed vector are removed depending on their magnitude. Here, a magnitude of 9% of the maximum coefficient, Imax, in Figure 5a is chosen for reasons discussed below. The horizontal lines in Figure 5a bracket the elements of the transformed vector which have intensity less than 9% of the maximum coefficient. These elements were discarded, giving the filtered vector shown in Figure 5b. This vector was then back-transformed to obtain the denoised signal shown as the top trace in Figure 5c. The difference function in the top trace of Figure 5d shows that the denoising procedure is largely successful, with an rms deviation of 0.017. For comparison, in Figure 5c and d, results are also shown for the nonoptimum case of a 6% Imax cutoff, which has an rms deviation of 0.024. In assessing these results relative to those of the FT procedure, it should be noted that for DW denoising there is also a massive amount of compression (9/512 for the 9% cutoff; 19/512 for 6%). For the optimal cutoff, the rms deviation of the difference function in Figure 5d, 0.017, is smaller than that of the difference function for FT filtering shown in Figure 3d and somewhat smaller than that for DW smoothing in Figure 4d. The lower rms deviation indicates that the filtered noisy function is in closer agreement with the ideal (noiseless) function. Thus, for this data set, it is concluded that DW denoising does a better job of recovering the original signal than either the FT- or DW-smoothing technique.

Multiple Triangular Peaks. Experimental spectra may have multiple peaks with varying widths and amplitudes. The data set in Figure 1b was constructed with these features in mind. It has tall peaks of varying width and four short (20%) narrow peaks at t = 165, 301, 787, and 803. Results for SG smoothing of this data set are shown in Figure 6. In Figure 6a, SG-smoothed traces are shown for m = 2 with n = 7 and 25. In Figure 6b, traces are shown for m = 4 with n = 7 and 25. The corresponding difference functions are shown in Figure 6c and d, respectively. In Figure 6a, for tall peaks, choosing n = 25 (i.e., a large window) is more successful than n = 7 (i.e., a small window) in smoothing the broad peaks between


Figure 6. Savitzky-Golay smoothing for data set from Figure 1b. Panels are otherwise as described in Figure 2.

t = 400 and 800. However, n = 25 increases the breadth of the sharper peaks for t < 400. Use of a higher-order polynomial (larger m), as in Figure 6b, improves the representation of the sharper peaks, but smoothing of the broad peaks is worse. As the top traces in Figure 6a and b show, SG smoothing using the larger window is so poor that the short narrow peaks virtually disappear, and there is dramatic broadening of the tall narrow peaks. Using the smaller window, the short peaks are well-described, but much of the noise remains.

Results for FT filtering of the multiple asymmetric triangle peaks are shown in Figure 7. Figure 7a shows the logarithm of the power spectrum with the noise region modeled by the horizontal line. The choice of line to model the signal region is problematic, so two estimates of the signal line, with the same slope but different intercepts, are shown as solid (A) and dashed (B) lines in Figure 7a. Using the signal and noise lines, the filter functions shown in Figure 7b were constructed. The processed data sets shown in Figure 7c were then obtained. It may be seen that Fourier filtering gives a good representation of the broad peaks but an increasingly poor representation of the narrower ones, which lose about half their intensity. Figure 7d shows this more clearly through the difference functions for the FT-filtered data (top and middle) compared to the added noise (bottom). Note that the choice of the "B" line permits a broader band of frequencies. Thus, the narrow peaks are better preserved, but the widest peak is less smooth for (B) than for (A). Qualitatively, the reason for the poor representation of the narrow peaks is clear: they occur in a high-frequency region of the signal which coincides with the high-frequency region of the noise. Noise removal, by filtering out high frequencies, causes these components to be lost, and including them would mean including more of the noise.
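The Wiener construction F(f) = S(f)/[S(f) + N(f)] can be sketched as follows. The Gaussian signal model and the flat noise floor below are assumptions standing in for the fitted lines of Figures 3a and 7a, and the test peak is synthetic, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(512)
signal = np.exp(-((t - 256) / 20.0) ** 2)        # clean Gaussian peak
noisy = signal + 0.05 * rng.standard_normal(512)

# Crude power-spectrum models (assumptions for this sketch): signal
# power concentrated at low frequency, flat noise floor everywhere.
f = np.fft.rfftfreq(512)                         # cycles per sample
S = 200.0 * np.exp(-(f / 0.02) ** 2)             # |S(f)|^2 model
N = np.full_like(f, 0.6)                         # |N(f)|^2 model

F = S / (S + N)                                  # Wiener filter, 0 < F < 1
filtered = np.fft.irfft(np.fft.rfft(noisy) * F, n=512)

# The filtered trace lies closer to the true signal in the rms sense,
# because most frequency bins carry only noise and are suppressed.
rms = lambda x: float(np.sqrt(np.mean(x ** 2)))
assert rms(filtered - signal) < rms(noisy - signal)
```

The filter passes the low-frequency bins where the modeled signal power dominates and rolls smoothly to zero where the noise floor dominates, which is the same trade-off that costs the narrow (high-frequency) peaks their intensity in Figure 7.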
Results for wavelet smoothing of the multiple asymmetric triangle peaks are shown in Figure 8. Figure 8a shows the

Figure 7. Fourier transform noise filtering for data set in Figure 1b. (a) Logarithm of power spectrum. The lines representing signal and noise contributions are shown. An alternate approximation for signal is shown as a dashed line. (b) Wiener filter function created from line approximations shown in (a). (c) FT-filtered multiple peak spectrum for two choices of filter. (d) Difference functions for the raw noisy data (bottom) and two sets of filtered data. Note that the wider filter retains more of the original signal, and more of the original noise.

coefficients of details in the transformed signal. In Figure 8b, the first detail, i.e., all elements of the transformed vector with N > 512, has been set to zero. The reason for this choice of cutoff is discussed below. The back-transform of this modified vector is the processed signal shown as the top trace in Figure 8c. For comparison, results are also shown for the nonoptimum case of 4:1 compression. Figure 8d shows the difference functions for the DW-smoothed data sets (top trace, 2:1; middle trace, 4:1). Note the significant loss of amplitude for the narrowest of the tall peaks for 2:1 smoothing and especially for 4:1 smoothing. This is comparable to the FT and SG losses of amplitude. However, the broad peaks are quite well-represented, and the small narrow peaks are also observed.

Results for wavelet denoising of this data set are shown in Figure 9, beginning with panel a, which is the same wavelet-transformed vector of the signal as for wavelet smoothing. In Figure 9b, those coefficients with magnitude less than 2% of the largest coefficient of the transformed vector, Imax, are set equal to zero. The reason for this choice of cutoff is discussed below. Because of the small cutoff, Figure 9b resembles Figure 9a but in fact represents a substantial compression: 80/1024. Back-transforming the filtered transformed vector gives the processed data set shown as the top trace in Figure 9c. Note that even the narrowest of the tall peaks is well-represented, and the four narrow small-amplitude peaks are all clearly visible. The difference function for the DW-denoised data is shown in Figure 9d (top trace). For comparison, results for the nonoptimum case of a 1% Imax cutoff are also shown in panels c and d. A lower cutoff means more elements are retained; here the compression is 187/1024.
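Amplitude thresholding at a percentage of the largest coefficient can be sketched as follows. A Haar basis stands in for the Daubechies-12 basis, a synthetic Gaussian stands in for the Figure 1b data, and the 9% cutoff mirrors the single-Gaussian example rather than the 2% used here; all are illustrative assumptions:

```python
import numpy as np

def haar_fwd(y):
    """Haar pyramid transform: return (final smooth vector, detail list)."""
    s, details = np.asarray(y, float), []
    while s.size > 1:
        a, b = s[0::2], s[1::2]
        details.append((a - b) / np.sqrt(2.0))
        s = (a + b) / np.sqrt(2.0)
    return s, details

def haar_inv(s, details):
    """Invert the Haar pyramid, coarsest stage first."""
    for d in reversed(details):
        nxt = np.empty(2 * s.size)
        nxt[0::2], nxt[1::2] = (s + d) / np.sqrt(2.0), (s - d) / np.sqrt(2.0)
        s = nxt
    return s

def dw_denoise(y, cutoff):
    """Discard every detail coefficient whose magnitude is below
    cutoff * Imax (Imax = largest coefficient magnitude); return the
    back-transformed trace and the number of retained elements."""
    s, details = haar_fwd(y)
    imax = max(np.abs(s).max(), max(np.abs(d).max() for d in details))
    thresh = cutoff * imax
    clipped = [np.where(np.abs(d) >= thresh, d, 0.0) for d in details]
    kept = s.size + sum(int((np.abs(d) >= thresh).sum()) for d in details)
    return haar_inv(s, clipped), kept

rng = np.random.default_rng(2)
t = np.arange(512)                                  # N = 512 = 2^9
clean = np.exp(-((t - 256) / 25.0) ** 2)
noisy = clean + 0.05 * rng.standard_normal(512)

denoised, kept = dw_denoise(noisy, cutoff=0.09)
rms = lambda x: float(np.sqrt(np.mean(x ** 2)))
assert rms(denoised - clean) < rms(noisy - clean)   # noise is reduced
assert kept < noisy.size                            # compressed representation
```

Because the transform is orthogonal, Gaussian noise spreads evenly over all coefficients at a small, roughly uniform amplitude, while the signal concentrates in a few large coefficients; the amplitude cutoff therefore removes noise at all scales at once, which is why the narrow peaks survive here but not under frequency-based filtering.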


Figure 8. DW-smoothing technique applied to multiple peak data set shown in Figure 1b: (a) DW-transformed vector. (b) Filtered DW vector, with 1 detail removed for smoothing based on 2:1 compression. (c) Bottom, back-transform of (b); top, back-transform of smoothing based on 4:1 compression (offset by 1.0). (d) Difference functions for (c) offset by 2.0 and 1.0. The original noise added is shown in the bottom trace.

The difference functions show that although noise remains in this data set, there is no systematic error.

Choice of Cutoffs: Synthetic Data. Where to place the DW denoising cutoff, or what amount of compression to use when DW smoothing, requires systematic study with synthetic data for which the noise-free spectrum is known. We are not aware of such an investigation for wavelet smoothing, but for wavelet denoising Donoho and co-workers have drawn a distinction between wavelet filtering using a “hard” threshold and a “soft” threshold.8 The latter depends on the size of the data set and the standard deviation of the noise inherent in the signal. In the present work, it was desirable to maintain a single yet flexible parameter that did not depend on knowing the noise level a priori, so the denoising threshold, or cutoff, was scaled relative to the magnitude of the largest wavelet coefficient.

The rationale for the choice of limits in the DW smoothing and DW denoising of the first two cases (synthetic data) is straightforward. Since the data sets consisted of a pure function plus (known) added noise, it was possible to construct a difference function, which is the smoothed or denoised data minus the pure function. For perfect data treatment and noise removal, the rms deviation of the difference function would be zero. Thus, the optimum compression or cutoff is that which gives the smallest rms deviation. The rms deviations of the two synthetic data sets are listed in Table 1, for DW smoothing using a series of compressions, and Table 2, for DW denoising using a series of cutoff values.
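The rms-minimization procedure can be expressed generically: scan a set of filter settings and keep the one whose output lies closest to the known pure function. The sketch below is our own illustration, with a simple moving-average filter standing in for any of the smoothing or denoising methods; the function names are assumptions, not the paper's code.

```python
import math
import random

def rms(a, b):
    """rms deviation between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def moving_average(x, w):
    """Toy smoother: centered window of (up to) w points; w = 1 is the identity."""
    out = []
    for i in range(len(x)):
        win = x[max(0, i - w // 2): i + w // 2 + 1]
        out.append(sum(win) / len(win))
    return out

def best_setting(noisy, pure, settings, process):
    """Return the setting minimizing the rms deviation of the
    difference function (processed minus pure)."""
    return min(settings, key=lambda s: rms(process(noisy, s), pure))

# Synthetic example: a Gaussian peak plus reproducible Gaussian noise.
random.seed(1)
pure = [math.exp(-((t - 32) / 8.0) ** 2) for t in range(64)]
noisy = [p + random.gauss(0.0, 0.05) for p in pure]
w_opt = best_setting(noisy, pure, [1, 3, 5, 7, 9, 11], moving_average)
```

Because the window of width 1 leaves the data untouched, the selected setting can never do worse than no filtering at all; the same scan applied with a DW filter and a grid of cutoffs reproduces the procedure behind Tables 1 and 2.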

Figure 9. Discrete wavelet transform denoising of the multiple peaks shown in Figure 1b: (a) DW-transformed vector. (b) Filtered DW vector, with cutoff set at 2% of the maximum coefficient of the transformed vector. (c) DW-denoised signal. Top trace, 2% cutoff; bottom trace, 1% cutoff. (d) Difference functions for (c) with offsets of 2 and 1. The original noise added is shown in the bottom trace.

The number of coefficients eliminated is related to the number of the detail. In Table 1, which is for wavelet smoothing, the number of details removed (shown in columns 1 and 5) is related to the size of the transformed vector. In the wavelet decomposition of a data set of size 2^j, the first (and highest-frequency) detail contains 2^(j-1) elements; the kth detail contains 2^(j-k) elements. Thus, for example, the removal of 3 details from the single Gaussian data set of size 2^9 (= 512) results in 448 elements being removed (2^8 + 2^7 + 2^6), leaving only 2^6 (= 64) elements in the transformed vector. The number of elements remaining in each vector is shown in columns 2 and 6 of Table 1. The resulting rms deviation of the difference function is shown in columns 3 and 7. It can be seen that the optimum smoothing involves compression of 8:1 for the single Gaussian peak and 2:1 for the multiple triangular peaks.

In Table 2, which is for denoising, the percentage cutoff referred to in columns 1 and 5 is the fraction of the maximum coefficient in the wavelet-transformed vector. Values in the transformed vector that are below the cutoff are set to zero prior to back-transforming to the time domain. The number of elements remaining in each vector is shown in columns 2 and 6 of Table 2. The resulting rms deviation of the difference function is shown in columns 3 and 7. It may be seen that the minimum rms deviation occurs for a 9% cutoff for the single Gaussian peak and a 2% cutoff for the multiple triangular peaks. Smaller cutoff values result in decreased compression and greater rms error in each case. The smaller cutoff for the latter case is due to two factors: the smaller amount of noise and the larger variety of peak widths. Cutoffs larger than the optimum value result in more noise being removed but also in significant peak shape degradation.
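The detail-size arithmetic in this paragraph can be checked directly; a worked check for the 512-point single Gaussian case:

```python
# Detail k of a 2**j-point decomposition holds 2**(j - k) elements.
j = 9                                            # 2**9 = 512-point data set
removed = sum(2 ** (j - k) for k in (1, 2, 3))   # first three details
remaining = 2 ** j - removed
print(removed, remaining)                        # 448 64
```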

Table 1. Wavelet Smoothing: Dependence of rms Deviation and Compression on the Number of Details Removed

                  single Gaussian peak + 5% noise                     multiple triangular peaks + 3% noise

no. of details   no./512               % of original    no. of details   no./1024              % of original
removed          remaining    rms      size             removed          remaining    rms      size
0                512          0.048    100              0                1024         0.030    100
1                256          0.035    50.2             1                512          0.026    50.1
2                128          0.026    25.2             2                256          0.041    25.1
3                64           0.019    12.7             3                128          0.053    12.6
4                32           0.021    6.4
5                16           0.088    3.3
6                8            0.135    1.8

Table 2. Wavelet Denoising: Dependence of rms Deviation and Compression on the Choice of Cutoff

                  single Gaussian peak + 5% noise                     multiple triangular peaks + 3% noise

% cutoff^a       no./512               % of original    % cutoff^a       no./1024              % of original
                 remaining    rms      size                              remaining    rms      size
0                512          0.048    100              0                1024         0.030    100
5                31           0.030    12.1             0.8              260          0.025    50.8
6                19           0.024    7.4              0.9              224          0.023    43.8
7                14           0.021    5.5              1                187          0.022    36.5
8                11           0.020    4.3              2                80           0.021    15.6
9                9            0.017    3.5              3                63           0.026    12.3
10               8            0.021    3.1              4                59           0.028    11.5
11               7            0.024    2.7              5                52           0.034    10.2

^a Cutoff threshold is set relative to the size of the maximum coefficient in the transformed vector.

Compression Achievable. Both DW smoothing and denoising lead to data set compression, but a distinct difference between the two treatments becomes apparent when the achievable compression is calculated. For the purposes of this discussion, NR is defined as the number of elements retained in the transformed vector and NW as the number thrown away. The sum of these numbers is the total size of the data set that was transformed, i.e., 2^j. For DW smoothing, in which the filter selects according to the wavelet detail, the compression achievable is (NR + 1)/(NR + NW). This is because each detail in the wavelet decomposition occupies a strictly defined portion of the data set, as shown in Figure 4a; the only “overhead” incurred is one additional number specifying how many details were retained. This formula was used to calculate the “% of original size” in Table 1. For DW denoising, in which the filter selects according to magnitude, the compression achievable at first glance may appear to be NR/(NR + NW). However, the magnitude of a component of the transformed vector is not predictable from its location, so an extra vector, giving the positions of the nonzero transformed elements, must be employed. This location vector means that an extra NR numbers must be stored. Thus, the true compression achievable is 2NR/(NR + NW), and on this basis the “% of original size” columns in Table 2 have been calculated.

Further compression may be achieved using the wavelet packet transform.13 During the wavelet transform, the frequency domain is divided logarithmically by octaves, using scaling to sample different frequency bands. The wavelet packet transform further divides the wavelet subspace so that repeating wavelets are used to achieve higher frequency resolution. This is a promising route to further compression but is beyond the scope of the present work.
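The two bookkeeping formulas can be written down directly, and the entries in the tables' "% of original size" columns follow from them (the function names are ours):

```python
def smoothing_fraction(n_retained, n_total):
    """DW smoothing: (NR + 1)/(NR + NW); the +1 stores how many
    details were kept."""
    return (n_retained + 1) / n_total

def denoising_fraction(n_retained, n_total):
    """DW denoising: 2*NR/(NR + NW); the factor of 2 pays for the
    vector of retained positions."""
    return 2 * n_retained / n_total

# Reproduce two table entries: 64/512 smoothing -> 12.7% (Table 1),
# 9/512 denoising -> 3.5% (Table 2).
print(round(100 * smoothing_fraction(64, 512), 1))   # 12.7
print(round(100 * denoising_fraction(9, 512), 1))    # 3.5
```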

Choice of Cutoffs: Experimental Data. The central issue for the third case considered, or indeed for any experimentally recorded data set, is how to determine the limits (compression for smoothing or cutoffs for denoising) without a priori knowledge of the noiseless signal. For example, discrimination between fine structure in a recorded signal and the oscillation of high-frequency noise is highly problematic unless one knows in advance that the apparent fine structure is meaningless. Thus, the level of compression for DW smoothing, or the frequencies to allow in FT filtering, is difficult to set without a clear knowledge of the characteristics (peak shape, resolution, type of noise, etc.) of a “model” data set for the type of experiment being recorded. The technique of DW denoising is less likely to remove meaningful fine structure because of the remarkable ability of wavelets to preserve both low- and high-resolution aspects of a data set, provided they have relatively large coefficients.

How can the DW filters be optimized for an experimental data set? Three approaches are identifiable: (i) visual inspection of the raw vs the processed data set; (ii) examination of the difference function between the raw and processed data sets, to see whether the noise distribution follows a characteristic pattern; (iii) creation of a data set that simulates the experimental data in peak shape, intensity, step size, and distribution. In the third approach, noise with a distribution characteristic and S/N ratio comparable to the real data would be added to the simulated signal, and the optimization could then be carried out as presented for the synthetic data sets in this work. The empirical approach to determining the extent of smoothing or denoising is the most widespread but is critically dependent on the analyst. It requires a careful visual inspection of representative peaks in the data set, based on experience and knowledge, and is not quantifiable.


Table 3. Parameters for Function Used To Simulate the Experimental Spectrum in Figure 1c^a

peak label, i   charge state^b   xi       Bi × 10^-6   βi       Ci × 10^-6   γi
1               15               631.5    0.85         -0.01    1.7          -0.1
2               14               675.2    1.7          -0.005   5.1          -0.08
3               13               726.0    3.4          -0.005   10.2         -0.06
4               12               787.7    5.1          -0.005   10.2         -0.05
5               11               859.3    3.4          -0.005   8.5          -0.04
6               10               945.1    1.7          -0.005   3.4          -0.04
7               9                1050.2   1.7          -0.005   0.85         -0.04

^a Function consists of a sum of Gaussians; see eq 2 in text. ^b See ref 23.
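For readers who wish to regenerate the simulated spectrum, eq 2 with these parameters can be sketched as follows. Two caveats: the minus signs printed in eq 2 combined with the negative tabulated βi and γi would make the exponents grow, so this sketch reads the tabulated negative values as already carrying the sign (i.e., it computes exp{βi(x − xi)^2}); and the triplet offsets κ′ and κ′′ are not given numerically in this excerpt, so they default to zero here. This is our illustration, not the authors' code.

```python
import math

# Parameters from Table 3 (A, B, and C tabulated in units of 1e6).
x_centers = [631.5, 675.2, 726.0, 787.7, 859.3, 945.1, 1050.2]
B     = [0.85, 1.7, 3.4, 5.1, 3.4, 1.7, 1.7]
beta  = [-0.01, -0.005, -0.005, -0.005, -0.005, -0.005, -0.005]
C     = [1.7, 5.1, 10.2, 10.2, 8.5, 3.4, 0.85]
gamma = [-0.1, -0.08, -0.06, -0.05, -0.04, -0.04, -0.04]
A, x_a, alpha = 3.4, 800.0, 4e-5   # broad baseline term

def y(x, kappa1=0.0, kappa2=0.0):
    """Eq 2 in units of 1e6 counts; kappa1/kappa2 stand in for the
    unspecified triplet offsets kappa' and kappa''."""
    total = A * math.exp(-alpha * (x - x_a) ** 2)
    for xi, b, bt, c, g in zip(x_centers, B, beta, C, gamma):
        total += b * math.exp(bt * (x - xi) ** 2)          # broad B term
        total += (c * math.exp(g * (x - xi) ** 2)          # C-term triplet
                  + 0.3 * c * math.exp(10 * g * (x - xi - kappa1) ** 2)
                  + 0.1 * c * math.exp(30 * g * (x - xi - kappa2) ** 2))
    return total
```

Evaluating y on a 2^13 = 8192-point grid over 600-1100 m/z would give a noiseless trace comparable to the middle trace of Figure 10, to which noise is then added.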

Figure 10. Raw experimental spectrum (bottom trace), simulated spectrum without noise (middle trace), and simulated spectrum with noise (top trace). The simulated function is given by eq 2 and Table 3 of the text.

The second and third approaches depend on certain presumptions about the noise in the data set. In some cases, statistical techniques can be applied to multiple images or repeated experiments in order to determine the approximate amount of white noise present, or indeed whether the distribution is white noise at all. Statistical techniques can also be applied to determine the most probable signal, which, although not perfectly noise-free, would give some indication of how the ideal signal should appear. These are beyond the scope of the present work, which assumed a white noise distribution. The second and third approaches to filter optimization require extra calculation, with the third approach being the most involved. However, it is the most rigorous and the most quantifiable, and once it has been carried out for a certain type of data set and the limits have been determined, it is not necessary to repeat the exercise. In the following section, an in-depth example of filter optimization for the electrospray spectrum is presented.

Experimental Spectrum. The application of filter techniques to experimental data sets with significant noise is the ultimate test of their utility. Typically, as is the case in the experimental data set considered here (Figure 1c), a noise-free version does not exist, so difference functions and their rms deviations cannot be calculated from the “pure” data set as in the previous cases. This can create problems in establishing the limits of smoothing and denoising, as discussed above. To illustrate how DW filter techniques can be successfully applied to an experimental spectrum, an example of simulating an observed spectrum is first presented: a functional form is devised, noise is added, the noise-added spectrum is processed, and then the rms deviation between


Figure 11. Simulation of experimental spectrum and the results of applying DW filters: Without noise (sim); with Gaussian noise added (Gsim); with optimum denoising (D 0.2%); with optimum smoothing compression (S 8). Results are shown for (a) the overall spectrum, (b) the most intense charge state in the spectrum, and (c) two charge states of small intensity. The optimum DW criteria were determined by minimizing the rms deviation from the noise-free spectrum (sim).

the noiseless model and its processed form is used as an indicator for the choice of cutoffs. The optimum cutoffs for the simulated spectrum are then used to process the raw experimental spectrum. The differences between the processed and experimental spectra are then calculated and inspected for uniformity of noise removal.

Each of the seven peak clusters in Figure 1c owes its shape to at least three components, and the series of peak clusters results from different multiple charges being acquired by the protein of weight ∼9400 amu; the experimental technique is similar to that used in ref 23. Through a process of trial and error, the 600-1100 m/z region of this spectrum was simulated as a sum of Gaussians approximating a very broad baseline term (A term), broad multiply-charged peaks (B terms), and triplet peaks superimposed on the broad multiply-charged peaks (C terms):

y(m/z) = y(x) = A exp{-α(x - xa)^2} + Σ(i=1..7) Bi exp{-βi(x - xi)^2} + Σ(i=1..7) [Ci exp{-γi(x - xi)^2} + Ci′ exp{-γi′(x - xi - κ′)^2} + Ci′′ exp{-γi′′(x - xi - κ′′)^2}]   (2)

where the peak locations xi and parameter values for the B and C

(23) Covey, T. R.; Bonner, R. F.; Shushan, B. I.; Henion, J. Rapid Commun. Mass Spectrom. 1988, 2, 249-255.

Figure 12. Experimental electrospray spectrum and the results of DW filter processing: Panels show expansion of (a) two lowest intensity charge states, (b) highest intensity charge state, and (c) medium-intensity charge state. Top two traces, DW denoising with cutoffs of 0.2 and 0.1%; middle trace, raw spectrum; bottom two traces, DW smoothing with compression factors of 4:1 and 8:1.

terms are summarized in Table 3. The baseline term is located at xa = 800; the preexponential factors are A = 3.4 × 10^6, C′ = 0.3C, and C′′ = 0.1C; and the parameters in the exponential arguments are α = 4 × 10^-5, γ′ = 10γ, and γ′′ = 30γ. The original and simulated spectra are shown in Figure 10. To the noiseless simulated spectrum (middle trace) was added random spurious noise until the effect mimicked the experimental spectrum, as shown by comparison of the top and bottom traces of Figure 10. The rms deviation of the noise-added simulated spectrum is 1.6 × 10^6.

Optimization of the filtering criteria was then carried out as for the simpler data sets. First, DW denoising was performed on the noise-added simulated spectrum for a variety of cutoffs. The rms deviations were calculated, and the optimum cutoff was determined to be 0.2%. This led to retention of only 166/8192 coefficients, giving a back-transformed signal with an rms deviation of 6 × 10^5. Next, DW smoothing was performed for a variety of compression factors. The optimum smoothing was achieved using 8:1 compression, which retains 1024/8192 coefficients, giving a back-transformed signal with an rms deviation also of 6 × 10^5.

The results of the optimization procedure on the model spectrum are presented in Figure 11. Panel a shows the whole spectrum, panel b shows an expansion of the dominant peak (charge state 12), and panel c shows an expanded portion of two small peaks (charge states 14 and 15). Overall, the denoised spectrum appears cleaner than the smoothed spectrum. This is due to the very small number of coefficients retained in the DW-denoised case (166) compared to over 6 times that number in the DW-smoothed case (1024). For all charge states, the triplet peaks appear; the exceptions are the lowest-intensity peaks, at

Figure 13. Comparison of methods applied to the experimental spectrum shown in Figure 1c. Each panel except (b) represents the difference between Figure 1c and the processed spectrum: (a) SG smoothing (4, 25); (b) logarithm of the power spectrum and the signal and noise line approximations used in constructing a well-tailored FT filter; (c) FT-filtered data using the optimum filter shown in (b); (d) DW smoothing, with 4:1 compression (removal of first two details); (e) DW denoising, with cutoffs of 0.1% (836/8192 elements) and 0.2% (314/8192 elements). The offsets are 0.7 × 10^6 and -0.5 × 10^6, respectively.

620 m/z (expanded in panel c) and 1040 m/z (not expanded). The denoising cutoff may be set too high, leading to the exclusion of some high-resolution basis functions with low-magnitude components; in addition to the missing peaks of very low intensity, spurious small peaks occur at 650 and 690 m/z. Overall, however, there is little distortion in the processed simulated spectrum for either filter.

These “optimal” (for the simulation) cutoff criteria were applied to the raw spectrum and, to verify their suitability, slightly less restrictive criteria were also examined. Although the simulated spectrum captured the main characteristics of the raw spectrum well, it appears the simulation could be further improved. Magnification of the most intense “C” peaks in the raw spectrum shows that each sharp peak is actually a set of very narrow peaks, overlapping except for the top 10% or so of their height. Figure 12 shows, as the middle trace in each panel, a portion of the original spectrum. The two traces above it use the “optimal” denoising cutoff of 0.2% and the “close to optimal” cutoff of 0.1%. The two traces below it use the “optimal” smoothing compression factor of 8:1 and the “close to optimal” factor of 4:1. Inspection of the “S 8” and “D 0.2%” traces in panel a shows that some of the resolution of the very narrow low-intensity peaks is lost at the so-called optimal settings. At higher intensities, this trend is mitigated for


Figure 14. Sample application of discrete wavelet filtering to a liquid chromatography/mass spectrometry data set. For comparison, in each case the original data appear as the lower trace: (a) the total ion chromatogram (TIC) trace. The DW vector (not shown) had its elements with magnitude less than 0.5% of the maximum element removed; (b) the mass spectrum for a cut in the TIC at its maximum, t = 2.54 min. After a DW transform (not shown), all elements with magnitude less than 2% of the maximum were removed. Panels c and d are magnifications of (a) and (b); the offsets of the DW-modified data sets have been changed to 1.0 × 10^7 and 1.0 × 10^6, respectively.

denoising, as shown in panels b and c, but is still present for smoothing. Somewhat less stringent cutoff criteria in the DW denoising and smoothing of the raw spectrum were thus chosen.

Figure 13 shows difference plots for SG (4, 25) smoothing, Fourier filtering, DW smoothing (4:1 compression), and DW denoising (0.1 and 0.2% of Imax), in panels a, c, d, and e, respectively. Special care was taken to closely tailor the curve for the signal function of the log power spectrum in the Fourier filtering procedure; this is shown in Figure 13b. The results of the Fourier filtering, shown in Figure 13c, are noticeably better than those for SG, shown in Figure 13a, but there remain large deviations at the positions of the sharp, high-amplitude peaks. The DW smoothing difference function in Figure 13d shows some deviation at the positions of the high-amplitude peaks, but less than for SG. Its difference function bears remarkable similarity to the FT-filtered difference function, which is not surprising since both FT and DW smoothing discard the high-frequency components of the transformed data set. In contrast, DW denoising results in very uniform difference functions, for both the 0.1 and 0.2% cutoff limits, as shown in Figure 13e. There are no large deviations at the positions of the sharp high-amplitude peaks, yet the compressions achieved are 836/8192 and 314/8192, respectively. This is in accord with the earlier evidence that DW denoising is the most successful and efficient method of preserving the essential features of a data set.

Figure 15. Sample wavelets using different numbers of coefficients: 8, 10, 12, 14, and 20, with offsets of 200, 100, 0, -100, and -200, respectively. (The 10-coefficient Daubechies line is dashed for clarity.) Samples of 13-coefficient basis functions defined through cubic spline and linear spline techniques are also shown, at offsets of -300 and -400, respectively.

The electrospray example shows the construction of a simulated spectrum to establish the DW cutoff criteria. This is admittedly time-consuming, but the inferences (4:1 compression for smoothing or 0.2% of Imax for denoising) can be used on all similar spectra. Other issues also come into play in deciding whether DW techniques should be applied: for example, the CPU requirements of processing (no greater than for FT) and the requirement for vector spaces 2^j in size. Lastly, it should be noted that wavelet denoising will not replace indirect methods such as SG (ref 4) or Kalman filtering (ref 1), because these can be performed in real time while the DW methods must transform the entire data set as a complete block.

Smoothing vs Denoising. The relative compactness of the basis functions at each level of detail gives insight into the merits of DW smoothing vs DW denoising. In the case of smoothing, the first detail (and possibly the second, third, and so on) is discarded, regardless of the magnitude of individual components. This means that the most compact wavelets are simply not available for recreating the signal. In the case of denoising, however, if the magnitude of the coefficients indicates a significant contribution, components from a low detail, i.e., highly compact wavelets, will be retained, and this makes a significant difference in the type of basis function available to describe the signal.

To smooth or to denoise? It may appear that denoising, i.e., selecting only those wavelet components with high magnitude, is always the more sensible choice. This advantage is somewhat offset by the “memory penalty”: the position of each of the large coefficients must be retained, rather than just a single number for the level of compression, as is the case for smoothing. The choice of whether to smooth or denoise depends on whether the expected peak widths contain true fine structure; this in turn is related to the relative size of the sampling interval.
One case where both smoothing and denoising worked well is the single noisy Gaussian peak already examined. The sampling

Figure 16. From left to right and from top to bottom: the DW transform of a noisy Gaussian as shown in Figure 1a, with all components except the largest 10 removed (as in Figure 5b); the denoised Gaussian formed by the inverse transform of the filtered DW{peak} (as in Figure 5c); and, in the following panels, the individual wavelet basis function associated with each of the 10 coefficients. To reconstitute the DW-denoised peak in the first panel, each of the 10 wavelet functions must be multiplied by its coefficient shown in the first panel and then summed. All x-axes, except where noted, are in units of t.

rate is much greater than the Nyquist rate required by the signal,24 a reflection of the breadth of the “true” features in the signal. The DW-transformed vector, as shown in Figure 4a, had very low intensity in the lowest detail. The high-N components of the transformed vector related only to the noise that had been added. Discarding one, two, or three details meant that only very low-intensity contributions were removed, to give the filtered vector shown in Figure 4b. Alternatively, the low-intensity contributions could be removed explicitly, by denoising, with a cutoff set appropriately as in Figure 5a, to give the filtered vector shown in Figure 5b. Since the denoising technique incurs the memory penalty, however, it makes most sense to employ a smoothing algorithm. The compression in this case can be startlingly efficient, since

(24) Brigham, E. O. The Fast Fourier Transform; Prentice-Hall: Englewood Cliffs, NJ, 1974.

the noise and signal regimes are so well separated. Thus, only the very high details (very low-resolution wavelets) must be retained.

When the sampling rate is comparable to the Nyquist rate, denoising is recommended. A “spiky” data set, whose peak widths are comparable to the recorded step size, presents extra problems because the width of the noise fluctuations is also on the order of the step size. This type of situation is reflected in the t < 400 portion of Figure 1b. High-intensity coefficients occur throughout the transformed vector, even in the first detail, as can be seen in Figure 8a.

LC/MS Data Set. To exemplify these recommended treatments, Figure 14 presents the optimum DW results for a typical liquid chromatography/mass spectrometry data set. The bottom trace of Figure 14a shows the total ion chromatogram (TIC), which consists of apparently broad peaks with small noise. The peak


shapes are reminiscent of the Gaussian peak in the synthetic data of Figure 1a, and hence both DW smoothing and denoising were expected to be successful. In this case, however, DW denoising was found to be necessary; fine structure in the peaks between 2 and 3 min was lost with even the minimal amount of smoothing (2:1 compression; not shown). The DW-denoised data set (cutoff conservatively set at 0.5% of Imax) is shown as the upper trace; the compression achieved was 73/256.

At t = 2.54 min, the eluent from the chromatographic column gives rise to the largest peak in the TIC. When this eluent is analyzed in a quadrupole mass spectrometer with electrospray ionization, the spiky and sparse signal shown in the bottom trace of Figure 14b is obtained. As the discussion in the previous section indicates, DW denoising is unequivocally recommended for this type of data. The DW-denoised signal is shown as the upper trace in Figure 14b. The cutoff for filtering was set high, at 2% of Imax, leading to a compression ratio of better than 10:1. Figure 14c and d show magnified portions of these data sets and the denoised results.

CONCLUSION

A distinction is drawn between smoothing and denoising, according to the method of filtering elements in the transformed vector. The discrete wavelet transform is shown to be useful in smoothing and denoising synthetic data sets as well as simulated and observed electrospray mass spectra and liquid chromatograms. Two simple approaches for filtering the transformed vector have been systematically investigated. The DW results have been compared to results obtained using the Savitzky-Golay smoothing algorithm and the Fourier transform filtering algorithm. Wavelet filtering algorithms have been shown to smooth and denoise data to an extent superior to the other methods, with the added advantage that they can compress data sets by a factor of 2-10.
For a complicated experimental data set, an in-depth example of establishing DW filter cutoff criteria using a simulated spectrum was shown. In general, DW denoising, which is a magnitude-based filter, gave better results than DW smoothing, which is position-based. The drawbacks of DW smoothing were apparent when the peak width of the fine structure was on the order of the sampling interval.

ACKNOWLEDGMENT

We thank Dr. Tom Covey of PE Sciex for providing the thyroid extract spectrum. V.J.B. thanks NSERC (Canada) for the award of an Industrial Research Fellowship. I.P.H. thanks NSERC (Canada) and Wilfrid Laurier University for generous support.



APPENDIX A

This Appendix presents, in Figure 15, a small selection of the available wavelet functions. Some Daubechies wavelets constructed using different numbers of coefficients are shown; the low-coefficient functions are comparatively choppy. Also shown are wavelets whose 13-coefficient forms were determined using linear (SL) and cubic (SC) splines. The 12-coefficient Daubechies form was chosen for the present work, leading to the transformed vectors shown in Figures 4a and 5a.

This Appendix also presents, in Figure 16, the discrete wavelet transform decomposition for the simplest case in this report, shown in ideal and “noise-added” form as two traces in Figure 1a. After filtering of the transformed vector components according to magnitude (the denoising algorithm), only 10 elements remained, occurring at N = {2, 4, 5, 6, 8, 10, 11, 12, 23, 47}, as seen in Figure 5b. In Figure 16, the filtered function, its filtered transform, and the individual wavelet basis function associated with each of these 10 coefficients are shown. To reconstitute the DW-denoised peak seen in Figure 5c, each of the 10 wavelets was multiplied by its coefficient and then summed.

It is instructive to consider the contributions of the details to the DW-filtered function. The vector of transformed coefficients shown in Figure 4a contains the first detail, D1 (N = 257-512), the second detail, D2 (N = 129-256), and the third detail, D3 (N = 65-128). All their components are of negligible magnitude, so 8:1 compression is easily achieved by simply removing these components. In the transformed vector, the first major coefficient is at N = 47, lying in the middle of the D4 (N = 33-64) detail. Another contribution occurs at N = 23, in the middle of the D5 (N = 17-32) detail. As noted previously, the first, or lowest, detail is at the highest resolution, and as the number of the detail increases, the resolution decreases.
This is shown in Figure 16, where comparison of the D4 and D5 components shows that the former is sharper and more compact (by a factor of 2) than the latter. As elements from the lower-resolution details D6, D7, and D8 are included, the overall character of the signal (a single central peak) is asserted.
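The reconstitution procedure (multiply each retained basis function by its coefficient, then sum) is simply the linearity of the inverse transform, and it is easy to verify numerically. The sketch below uses a Haar inverse transform and an arbitrary toy set of retained coefficients in place of the paper's Daubechies transform and the 10 coefficients of Figure 5b.

```python
def haar_idwt(c):
    """Inverse orthonormal Haar DWT (illustrative stand-in for the
    paper's 12-coefficient Daubechies inverse transform)."""
    x, k = c[:1], 1
    while k < len(c):
        nxt = []
        for s, d in zip(x, c[k:2 * k]):
            nxt += [(s + d) / 2 ** 0.5, (s - d) / 2 ** 0.5]
        x, k = nxt, 2 * k
    return x

n = 16
coeffs = [0.0] * n
coeffs[0], coeffs[1], coeffs[5], coeffs[11] = 12.0, -8.0, 3.0, 1.5  # toy retained set

# Direct back-transform of the filtered vector ...
direct = haar_idwt(coeffs)

# ... equals the coefficient-weighted sum of individual basis functions,
# each obtained by inverse-transforming a unit vector.
summed = [0.0] * n
for k, ck in enumerate(coeffs):
    if ck:
        basis = haar_idwt([1.0 if i == k else 0.0 for i in range(n)])
        summed = [s + ck * b for s, b in zip(summed, basis)]

assert max(abs(a - b) for a, b in zip(direct, summed)) < 1e-9
```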

Received for review June 27, 1996. Accepted October 16, 1996.

AC960638M

Abstract published in Advance ACS Abstracts, December 1, 1996.