
Anal. Chem. 1992, 64, 2057-2066

Near-Optimal Smoothing Using a Maximum Entropy Criterion

Robert J. Larivee† and Steven D. Brown*

Department of Chemistry and Biochemistry, University of Delaware, Newark, Delaware 19716

Digital filters have become very common for smoothing and filtering of noisy chemical data. Many papers have appeared on the use of filters for optimizing signal-to-noise or for optimizing signal-shape retention during the process of noise removal. In contrast, little work has been aimed at optimizing the filter to retain as much of the information present in the data while removing as much of the noise as possible. This paper reports a simple approach for optimizing digital filters of several types. The optimization criterion used here is based on the informational entropy of the smoothed signal. It is demonstrated here that, for many forms of data, a filter optimized by maximizing the informational entropy, but without knowledge of the true signal shape, performs as well as an equivalent, optimal filter optimized with the aid of prior knowledge of the signal shape. Filters optimized by maximizing the informational entropy have applications where accurate reconstruction must be accompanied by significant increases in signal-to-noise ratios of noisy, but unknown, data. The maximum entropy optimization process is demonstrated on several different digital filters.

INTRODUCTION

Digital filters have become common for smoothing and filtering of noisy data.1-3 When a digital filter is applied to a noisy signal during data collection, real-time noise reduction, or filtering, results. When a digital "filter" is applied after data collection, smoothing of the noisy data is achieved. In either case, a well-designed digital filter or smoother acts as an ideal notch filter, allowing only the signal frequencies to pass, while attenuating all other frequencies in a predefined manner.4,5 Because digital filtering and smoothing is usually done in software rather than in hard-wired circuits, ad hoc filtering schemes and modifications which selectively attenuate various frequencies are easily accomplished. This flexibility, along with the ease of implementation of these filters, has made digital filtering popular in analytical chemistry, as indicated by the increase of articles on this subject.6-10 There are a number of problems associated with digital filtering, however. The greatest difficulty with the use of digital filtering and smoothing methods lies in setting those adjustable parameters necessary to accomplish optimal noise reduction for a given data set.

* Corresponding author.

† Present address: Department of Chemistry, Frostburg State University, Frostburg, MD 21632.
(1) Willson, P. D.; Edwards, T. H. Appl. Spectrosc. Rev. 1976, 12, 1.
(2) Kaiser, J. F.; Reed, W. A. Rev. Sci. Instrum. 1977, 48, 1447.
(3) Horlick, G. Anal. Chem. 1972, 44, 943.
(4) Bialkowski, S. E. Anal. Chem. 1988, 60, 365A.
(5) Bialkowski, S. E. Anal. Chem. 1988, 60, 403A.
(6) Aubanel, E. E.; Myland, J. C.; Oldham, K. B.; Zoski, C. G. J. Electroanal. Chem. Interfacial Electrochem. 1985, 184, 239.
(7) Jagannathan, S.; Patel, R. C. Anal. Chem. 1986, 58, 421.
(8) Betty, K. R.; Horlick, G. Appl. Spectrosc. 1976, 30, 23.
(9) Hayes, J. W.; Glover, D. W.; Smith, D. E.; Overton, M. W. Anal. Chem. 1973, 45, 277.
(10) Myland, J. C.; Oldham, K. B.; Guoyi, Z. Anal. Chem. 1988, 60, 1610.
Often, these parameters are

associated with the convolution function which determines the frequencies that are attenuated. The range of frequencies passed by an optimal smoother is not fixed; it must be selected to match, to the extent possible, the noisy signal to be smoothed. Optimization of the set of adjustable parameters controlling the smoother passband is not easily accomplished, however, especially when the nature of the signal is unknown. If the wrong values are used, it is possible to distort the data inadvertently in attempts to reduce the noise content.1 Previous work has considered this dilemma in terms of "hard" smoothing, where noise is significantly attenuated, but at the cost of signal shape, and "soft" smoothing, where noise is removed only to the extent that all of the possible details of the signal are preserved.11-13 There is very little literature in chemistry describing which of the many smoother types is best suited for removal of noise superimposed on a particular signal shape, or on the optimization of a particular filter type. Comparisons of different filtering or smoothing methods of the same noisy data are also rare. Voightman and Winefordner compared several common analog filters (ones that can be realized in digital form as well) for use on spectroscopic data,14 but they did not try to optimize the filtering process.

In this paper, an approach based on the Shannon information entropy is presented that addresses some of the problems associated with selecting the "best" parameters for digital filtering and smoothing. By maximization of the information entropy of a noisy response, it is possible to obtain a set of adjustable parameters necessary to ensure minimal loss of signal information in a smoothing step, even when the signal shape that is sought from the smoothing is unknown. When signals in Gaussian noise are examined, the smoothing obtained by application of this method is very close to that obtained with least-squares fitting of the noisy signal, using prior knowledge of the signal shape. The entropy-based optimization method is general, and it is demonstrated here in connection with three smoother algorithms.

THEORY

General. The notation used in this paper is as follows: vectors are represented by bold capitalized letters, R; the individual scalar components of the vector are shown as noncapitalized letters with a subscript, r_i; and scalars not associated with a vector are distinguished as letters with no subscript, n. Functions are denoted by parentheses around the independent variable, f(t), and the Fourier transforms of these functions are written with a capitalized letter, F(Ω). All real data obtained by physical measurements can be broken down into two components: the noise component and the signal component4,5

X = S + R    (1)

where X is the raw data vector, S the signal vector, and R the noise vector.
(11) Bromba, M. U. A.; Ziegler, H. Anal. Chem. 1983, 55, 1299.
(12) Bromba, M. U. A.; Ziegler, H. Anal. Chem. 1979, 51, 1760.
(13) Bromba, M. U. A.; Ziegler, H. Anal. Chem. 1984, 56, 2052.
(14) Voightman, E. G.; Winefordner, J. D. Rev. Sci. Instrum. 1986, 57, 957.
For the work reported here, the noise vector


can be considered to be a sample of a random Gaussian distribution with a mean of zero. The assumption that the noise and signal components are additive and uncorrelated is reasonable for most chemical applications and is used in this paper.

Digital filtering and smoothing techniques can be described for a time series X as a convolution of the b discrete, time-domain data x(τ) with a weighting function h(t - τ) such that

ŝ(t) = Σ_τ x(τ) h(t - τ)    (2)

where ŝ(t) is the smoothed signal, an estimate of the true signal s(t), and where τ is the lag associated with the convolution operation. Real data x(t) may be taken to be a discrete time series in the sampled independent variable t

t = t0 + nΔ    (3)

where n is the cardinal number of the measurement, Δ the interval between recorded samples, and where t0 is the initial value of the time series defined by eq 3. This time series can be set up in terms of time or any other suitable independent variable, so long as it meets the requirements above. Equation 2 can also be expressed as

ŝ(t) = x(t) * h(t)    (4)

where * indicates the convolution operation. Equation 4 can be brought from the time domain into the frequency domain, using the Fourier transform, to obtain

Ŝ(Ω) = X(Ω)H(Ω)    (5)

where the weighting function H(Ω) can be represented as the Fourier transform of h

where b is the number of points in the system being examined. The weighting function in eq 6 describes the weighting of each point X(Ω) in the Fourier domain representation of S. Ideally, those frequencies which represent pure noise will receive a zero weight value, and those which represent only signal will receive a weight value of one. Determination of this weighting function is the critical step in filtering and smoothing.

The relationship between eqs 4, 5, and 6 is important in understanding how filtering and smoothing routines operate. Generally the systematic changes attributed to the signal are confined to a few low frequencies, while many common forms of noise cover a relatively wide range of frequencies. If this relationship of noise and signal holds, it is easily seen that the number of frequencies required to describe a simple, broadband signal to an acceptable degree of accuracy is also relatively small, usually consisting of frequency components from the lower frequency region. The number of frequencies needed to adequately define the signal by the Fourier series defined in eqs 4-6 is determined by the Nyquist sampling theorem.16 Note that this determination of the frequencies necessary to describe a signal adequately can be considered equivalent to the sampling of the signal, in that either undersampling or underrepresentation (with fewer frequencies represented than those required by the Nyquist theorem) leads to aliasing and distortion of the true signal shape.
(15) Brigham, O. E. The Fast Fourier Transform; Prentice-Hall, Inc.: Englewood Cliffs, NJ, 1974.
(16) Bracewell, R. N. The Fourier Transform and Its Applications; McGraw-Hill: New York, NY, 1965.
(17) Brereton, R. G. Chemom. Intell. Lab. Syst. 1986, 1, 17.
If the signal shape is known and the noise is zero-mean and white (or can be made white by application of a whitening

filter), determining the best weighting function is straightforward. The optimal weighting function, in the least-squares sense, for a known signal is the matched smoother, where, for real data, the weighting function is identical to the frequency-domain representation of the signal.4,16,18 When the signal shape is unknown, however, a matched smoother cannot be used. In these cases, determining the best weighting function for smoothing or filtering is far from straightforward, and nearly-optimal smoothers are attractive, if they can be found. Any smoother other than the matched smoother, even when properly sized, cannot provide better signal-to-noise ratio enhancement in the least-squares sense, however.

Polynomial Smoothing. Probably the most familiar digital smoother used when the signal shape is not known prior to filtering or smoothing is the well-known polynomial smoother first described in the chemical literature by Savitzky and Golay.18-22 In this method, a portion of the signal containing 2N + 1 points is fitted to a Gram polynomial of the kth order, P_k^N(n), where

defines the terms in the polynomial and where the weighting coefficient for each term in the polynomial is given by

The (x)^(j) notation used in these equations is defined as

(x)^(j) = x(x - 1)(x - 2) ... (x - j + 1)    (9)

These polynomials form a set of orthogonal functions on the finite number of points in the 2N + 1 point "window" of data under examination. Generally, the center point of the window is then set to the estimated value obtained from the polynomial model obtained from eq 7. Other points in the window can be used, but there are some compelling reasons to avoid them.23 The window is moved to the next point and the process is repeated. This procedure is repeated until all points in the data vector are smoothed. In effect, this method approximates a complex signal shape with a simpler, polynomial function over the points included in the smoother window. The width of the window used for the local approximation and the order of the polynomial allow some latitude in matching the frequency characteristics of the smoother and the signal. In eq 6, the nth element of the weight for polynomial smoothing, h_n, is

where m is the degree of the polynomial being fitted to the noisy data. The weight defined by eq 10 can be modified by changing either the window size or the degree of the polynomial. To determine how window size and number of terms in the series defined by eq 10 affect the different frequency components present in the noisy data, it is most convenient to consider the smoother transfer function. By changing the variable in eq 6 such that θ is equal to πω/ω_s, with ω_s the point where Nyquist folding occurs, a plot of H(θ) versus θ can be produced.
(18) Savitzky, A.; Golay, M. J. E. Anal. Chem. 1964, 36, 1627; 1972, 44, 1906.
(19) Madden, H. H. Anal. Chem. 1978, 50, 1383.
(20) Enke, C. G.; Nieman, T. A. Anal. Chem. 1976, 48, 705A.
(21) Willson, P. D.; Polo, S. R. J. Opt. Soc. Am. 1981, 71, 599.
(22) Ernst, R. R. Rev. Sci. Instrum. 1965, 36, 1689.
(23) Dougherty, T. P.; Wentzell, P. D.; Crouch, S. R. Anal. Chem. 1987, 59, 367.
This plot describes the transfer function for the


smoothing operation. Transfer functions are related to weighting functions in that they both describe how frequency components of a noisy data sequence are attenuated. They are generally defined in units of radians s-1. These transfer functions have been evaluated for the various different parameters used in polynomial smoothing; a detailed accounting of them has been given by Willson and Edwards.1 Figure 1 presents some examples of these transfer functions. By changing the window size and the number of terms used in the polynomial equation, the transfer function can be modified to determine which frequencies will be allowed through the digital filter or smoother. As the number of points in the window increases and the number of terms used in the polynomial equation decreases, the higher frequencies are attenuated more strongly. As with the rectangular weighting function, if the attenuation is too severe, and frequencies below the Nyquist limit are attenuated, the higher frequency components of the signal are removed, and the signal becomes distorted through aliasing. If the frequencies above the Nyquist limit are not attenuated properly, noise in these frequencies is not removed. Bromba and Ziegler11-13 have discussed how various modifications to the transfer function can affect the resulting signal passed by a polynomial filter. They have shown that the best results occur for polynomial filtering when the true signal shape is known, or can be guessed, so that the order and window size of the polynomial filter can be set optimally. They offered no method of determining these parameters for an unknown signal, however.

Figure 1. (a) The transfer functions for a third-order polynomial smoother with window sizes of 5 (-), 11 (- -), and 21 (···) points. (b) The transfer functions for a fifth-order polynomial with windows of 11 (-), 15 (- -), and 21 (···) points.

Adaptive Smoothing. The second smoother algorithm of interest here is the adaptive smoothing method reported by Kawata and Minami.25 This method is similar to the first-order polynomial (running average) smoother. The average of a number of adjacent points in a window is calculated, and the center point of the window receives this average value as an estimate of the true signal value. The window is shifted by one point and the process is repeated. The term h_n in eq 6 for a running average is

which has a transfer function

The mean and variance of the data within the smoother window are calculated. This "local variance" estimate, σ²_x, is then compared to the expected variance of the noise over the entire data set

where σ²_n is the overall (global) variance of the noise in the data. This calculated ratio α(ν) is then used to determine how much smoothing the center point in the rectangular window receives

ŝ(ν) = α(ν)[x(ν) - ā(ν)] + ā(ν)    (14)

where ā(ν) is the average value of the data in the window. When the local variance is greater than or equal to the global estimate of the variance of the noise in the data, α(ν) is assigned a value of zero and the center point of the window is set to the calculated average. In this case, the smoothing provided by the adaptive smoother is equivalent to that given by a polynomial smoother of order 1, and extensive smoothing results. As the local variance becomes much less than the variance of the noise, α(ν) becomes close to one, and no smoothing occurs. Thus the degree of smoothing provided "adapts" to the local noise variance. It is easy to generalize the treatment of Kawata and Minami to higher-order polynomial filters, too. In this way, polynomial filters that adapt to local noise variance conditions can be produced using the approach outlined in eqs 11-14 above. Only the adaptive, polynomial smoother of degree one will be discussed here because it offers the greatest range of smoothing for data with nonuniform noise, a situation which occurs, among other cases, when data collected in the time domain are presented in the frequency domain. The transfer functions for adaptive running-average smoothing are similar to those of a polynomial smoother of degree one, but offset on the y-axis by α(ν), the ratio of the local variance to the global variance. As with polynomial smoothing, the size of the window determines how much of the higher frequencies are attenuated. In this smoother algorithm, window size is not quite as important in controlling signal distortion as in the nonadaptive polynomial smoother, due to the adaptive nature of the algorithm, but if the window size is too large or too small, very little smoothing is achieved. The ratio of variances raises or lowers the transfer function on the y-axis, thus attenuating or passing frequencies to a greater or lesser extent, respectively. Thus, guesses for the estimated variances have a significant effect on the attenuation of higher frequencies. Overly optimistic estimates (where the estimated variance is too small) cause oversmoothing where the local signal variance exceeds the global estimate, according to eq 13 above. Overly pessimistic estimates cause little smoothing to occur, on the other hand. For any noisy signal, there should be optimal estimates of window size and global signal variance. With these parameters, a smoother transfer function can be


created that will attenuate mainly noise. Kawata and Minami suggested no method for determining the optimal estimates of signal variance or smoother window size in their derivation of this smoother.

Fourier Smoothing. The simplest smoother to implement is based on truncation of the noisy signal in the Fourier domain. The direct application of eq 5, with a rectangular weighting function H(Ω), removes frequencies above some lower limit but preserves all frequency components below that limit.3 The rectangular truncation can lead to spurious oscillations (Gibbs' effect) in the back-transformed data, especially if the data are not properly preprocessed, and it is common to modify the truncation so that attenuation of frequencies near the cutoff point is more gradual.24 A trapezoidal weighting function is quite commonly substituted in these cases, but other windowing functions can also be used. When noise and signal are well-separated in the Fourier domain, finding the proper point for truncation is simple.3 However, when the noise and signal components share frequencies, finding the optimal truncation point is not straightforward. The same tradeoff mentioned with the other two smoothers, signal preservation versus noise attenuation, must be balanced in the optimal placement of the truncation.

In all three of the methods discussed above, optimal values for those parameters that control the transfer function shape are not easily accessible from the equations describing the smoothing in either the time or the Fourier domain. Commonly, these values can only be guessed by the experimenter. The empirical nature of their selection makes smoothing techniques extremely difficult to use in an optimal manner and makes comparison of the different methods challenging. One difficulty in optimizing the smoothing or filtering produced by these methods is the problem of determining whether the change in shape of the smoothed signal produced by the signal processing is due to noise reduction or to signal distortion. In many cases, the change in the data produced by the signal processing is due to a change in both signal and noise. For most types of noise, there are two distinct regions in the frequency-space representation of the data: one which, in effect, contains only noise and the other region which contains both noise and signal. Attenuation of frequencies in the region containing only noise results in decreased noise and no change in the signal shape because no frequency components of the signal are altered. When the transfer function of the smoother attenuates frequencies in the region where there is a significant amount of signal, however, the noise decreases, but the signal is distorted as well. It is possible to decide on the relative contributions of signal distortion and noise reduction by making use of Rayleigh's theorem:15,16 Because the Fourier transform is a linear transform, moments of a function are preserved across the transformation, and the integral of the square modulus of a function is equal to the integral of the square modulus of its transform

∫ |f(t)|² dt = ∫ |F(Ω)|² dΩ    (15)

This theorem can be modified to show that, for a given set of discrete, band-limited data in the frequency domain, the inverse Fourier transform can be obtained by

(16)

(24) Brown, S. D. In Practical Guide to Chemometrics; Haswell, S. J., Ed.; Dekker: New York, NY, 1992; Chapter 8.

such that any integer value for b from 1 to N/2 will give a vector Ŝ that will have a constant area T

T = Σ_{i=1}^{N} ŝ_i    (17)

where N is the number of points in the vector and where ŝ_i is the magnitude of the ith point in the vector Ŝ. This relationship can be used to distinguish noise removal and signal distortion. When frequency components from the signal are removed, the intensity of the signal must decrease, and the signal must broaden to preserve the area T. This relation is exactly analogous to the effect of aliasing brought about by undersampling of a signal, as both are based on the representation of a wave form by a Fourier series.15 Truncation or undersampling both broaden signals. Any empirical method which detects either the decrease in signal intensity or the increase in signal broadening resulting from filtering can therefore be used to improve the quality of the filtering or smoothing. The one used here is based on the Shannon information entropy.

Entropy. Entropy is a term which was first used to describe the amount of "disorder" in a system. Those states in a system which contain higher levels of entropy are favored over those which have less.26 In a statistical sense these states are more probable. Informational entropy is defined mathematically in the discrete form as follows: A random experiment is performed that has n possible results. This experiment is performed j times, so that, assuming independence, there are N = n^j outcomes possible. The frequency f_i of each different outcome is

f_i = N_i/j    for i = 1, ..., n    (18)

where i is one possible outcome of the experiment and where N_i is the number of times that outcome i occurred. The vector F = [f_1, f_2, ..., f_n] is a discrete probability distribution function.27,28 The entropy of this distribution is defined as

H = -Σ_{i=1}^{n} f_i log f_i    (19)

The base of the logarithm in eq 19 is arbitrary, but it is conventional to use base 2 logarithms, giving the entropy the units of bits. By defining the information entropy H associated with data vector X as

H = -Σ_i x_i log x_i    (20)

where x_i is the value of the ith point in the data vector scaled by the vector norm ||X||, an informational entropy for a spectrum can be calculated. When Gaussian noise with a mean of zero is randomly distributed on the noise-free data vector, the uncertainty associated with the data vector increases. It is convenient to think of this addition of noise as having the effect of broadening the confidence region in which the true, noise-free signal will fall.29 The information entropy will decrease with the addition of noise, however, a direct reflection of decreased smoothness in the data.30 Many possible "true" signals might be used to describe the noise-free part of the noisy composite signal.
(25) Kawata, S.; Minami, S. Appl. Spectrosc. 1984, 38, 49.
(26) Schrödinger, E. Statistical Thermodynamics; Cambridge University Press: Cambridge, England, 1948.
(27) Jaynes, E. T. Proc. IEEE 1982, 70, 939.
(28) Jaynes, E. T. Annu. Rev. Phys. Chem. 1980, 31, 579.
(29) Hurvich, C. M. Technometrics 1986, 28, 259.
(30) Shannon, C. E. The Mathematical Theory of Communication; University of Illinois Press: Urbana, IL, 1949.
By calculating the information


entropy of a trial smoothed data vector obtained as a function of one or more adjustable parameters in the smoothing algorithm, a multivariate entropy function can be defined on the parameter space. Maximizing the information entropy of a smoothed representation of X, as given by eq 20, yields the most probable smoothed version of S. In maximizing the entropy, the amount of noise contained in the smoothed version of X is, in effect, minimized subject to a constraint. The constraint is imposed by the nature of the unknown, noise-free signal itself, for which a minimal number of frequencies is required by the Nyquist criterion. The constraint can be specified in terms of the χ² statistic, where the fit of the smoothed signal Ŝ to the noisy data X is evaluated by

χ² = Σ_{i=1}^{n} (ŝ_i - x_i)²/σ_i²    (21)

The estimated noise at point i is given by σ_i². The χ² distribution thus sets a restricted area of the confidence band of possible solutions wherein a maximum in informational entropy is sought.31 Therefore, by maximizing the information entropy in this region, the most probable signal consistent with the unknown signal shape, but containing less noise than the original data, is selected from the many possible "true" signals. By the theorem of entropy concentration, it can be shown that many other highly probable solutions which are also consistent with the data lie very close to the one with maximum entropy.32
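As a concrete illustration of eq 20 and the selection it drives, the sketch below computes the informational entropy of a trial smoothed vector and picks the smoothing parameter that maximizes it. This is not the authors' code; it is a minimal Python sketch that assumes Euclidean-norm scaling for ||X|| and uses an exhaustive search in place of the simplex optimizer described in the Experimental Section; the helper names (information_entropy, entropy_optimal_parameter) are illustrative only.

    import numpy as np

    def information_entropy(x):
        # Shannon entropy (eq 20) of a data vector scaled by its norm ||X||;
        # the base-2 logarithm gives the entropy in bits.
        p = np.abs(x) / np.linalg.norm(x)
        p = p[p > 0]                      # zero elements contribute nothing
        return -np.sum(p * np.log2(p))

    def entropy_optimal_parameter(noisy, smoother, candidates):
        # Pick the smoothing parameter whose smoothed output has maximum entropy;
        # smoother(noisy, c) is any smoothing routine with one adjustable parameter.
        entropies = [information_entropy(smoother(noisy, c)) for c in candidates]
        return candidates[int(np.argmax(entropies))]

Note that nothing in this selection step consults the true, noise-free signal; only the noisy data and the candidate smoothed versions are used.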

EXPERIMENTAL SECTION

To compare the different methods and test the entropy maximization method in a systematic manner, a well-defined system is necessary. As a first approximation to experimental data, computer-generated Gaussian and Lorentzian peak shapes were used. These synthetic peaks described experimentally observed spectral shapes fairly well, as most spectral peak shapes are either Lorentzian or Gaussian, while a few spectral peak shapes fall between that of a Lorentzian and that of a Gaussian.1 By characterizing the smoothing of both shapes, a basis for estimating how well the method will perform on different types of experimental data can be established. Gaussian peaks were generated from the relation

s = A exp[-4(ln 2)t²/W²]    (22)

and Lorentzian peaks from the relation

s = A/(1 + 4t²/W²)    (23)

where A is the peak height, t is time or the x-axis of the plot, and W is the full width at half-height (fwhh) of the Gaussian or Lorentzian peak in the same units as the x-axis. Each of the synthetic peaks contained 200 points. The Gaussian noise added to the synthetic data was generated using either a C language random number generator supplied with the UNIX BSD 4.3 operating system or a uniform random number generator in MATLAB. To ensure a Gaussian distribution of noise, five random numbers were generated between -1 and 1, and their average was calculated. This number was used as a single noise value. A noise vector was generated by repeating this process for each element in the vector. The noise vector was scaled to the desired root mean square value and then added to the known signal vector to generate the noisy data vector to be smoothed.
(31) Skilling, J.; Gull, S. F. In Maximum Entropy and Bayesian Methods in Inverse Problems; Smith, C. R., Grandy, W. T., Eds.; Reidel: Dordrecht, 1986.
(32) Jaynes, E. T. In Maximum Entropy and Bayesian Methods in Inverse Problems; Smith, C. R., Grandy, W. T., Eds.; Reidel: Dordrecht, 1985.
The signal to noise ratio was calculated by


dividing the maximum peak height of the signal vector by the root mean square of the noise contained in the noise vector.

The problem of distinguishing signal broadening versus noise reduction can be eliminated by detecting changes in either the intensity of the peaks or the broadening of the smoothed peaks. To keep the peak intensity constant throughout the procedure, so as to measure peak broadening at constant peak height, the maximum peak height after smoothing was set to a constant value of 0.1 unit. Every smoothed vector was normalized in this manner.

The smoothing was also evaluated using experimental UV-visible spectra. Solutions of nickel nitrate, cobalt nitrate, and copper nitrate in a 0.1 M nitric acid solution were made up using reagent-grade chemicals. Their spectra were obtained with a Hewlett-Packard 8452 diode array spectrometer with 2-nm resolution. The concentration of these solutions was adjusted to give peak absorbance readings near 1.0 absorbance unit. Gaussian noise was added to these spectra using the same method described above. The smoothing techniques were applied, and optimal smoothing parameters were determined by using least-squares fitting of the unaltered spectra and by location of the maximum of the entropy function.

For calculating the coefficients used in the polynomial (Savitzky-Golay) smoothing algorithm, the pseudoinverse of the Vandermonde matrix was used.33 These coefficients were then used to determine the smoothed values for each point in the data vector. With this method, the points prior to the first and after the last smoothing window could also be smoothed if desired. Results reported here were obtained with midpoint smoothers only. The informational entropy was calculated according to eq 20. The negative of the entropy was minimized with the aid of a simplex optimization program. Least-squares fitting of the known, noise-free peak to the smoothed peak was used to evaluate the smoothing. Programs were written in the Pascal and MATLAB languages. Calculations were performed in the Pascal programming language on a Celerity 1200 minicomputer under the Unix BSD 4.3 operating system and in the MATLAB language on an Apple Macintosh SE with 68020/68882 accelerator board and on a Sun SPARC 1+ workstation.
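The synthetic test data described above can be reproduced along the following lines. This is a hypothetical sketch, not the original Pascal/MATLAB code: the peak is centered in the 200-point vector by assumption, and the noise is built, as in the text, by averaging five uniform deviates per point and rescaling to the desired root-mean-square value.

    import numpy as np

    def gaussian_peak(n_points=200, height=1.0, fwhh=40.0):
        # Gaussian peak per eq 22; centering in the vector is an assumption.
        t = np.arange(n_points) - n_points / 2.0
        return height * np.exp(-4.0 * np.log(2.0) * t**2 / fwhh**2)

    def lorentzian_peak(n_points=200, height=1.0, fwhh=40.0):
        # Lorentzian peak per eq 23.
        t = np.arange(n_points) - n_points / 2.0
        return height / (1.0 + 4.0 * t**2 / fwhh**2)

    def gaussian_like_noise(n_points, rms, rng=None):
        # Approximately Gaussian noise: each value is the mean of five uniform
        # deviates on [-1, 1]; the vector is then rescaled to the requested RMS.
        rng = np.random.default_rng() if rng is None else rng
        raw = rng.uniform(-1.0, 1.0, size=(n_points, 5)).mean(axis=1)
        return raw * (rms / np.sqrt(np.mean(raw**2)))

    # A noisy Gaussian with a maximum S/N of 10 (peak height / noise RMS).
    signal = gaussian_peak(fwhh=40.0)
    noisy = signal + gaussian_like_noise(signal.size, rms=signal.max() / 10.0)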

RESULTS AND DISCUSSION

Several sets of synthetic data with known signal shapes were generated, and noise was added to provide known responses which could be used to test the accuracy and precision of the smoothers generated by the entropy maximization method. Gaussian and Lorentzian peaks of 40, 20, 10, and 5 points at full width at half-height (fwhh) were used to evaluate the smoothing. For all these peaks, zero-mean Gaussian noise was added to produce data with a specified maximum signal-to-noise (S/N) ratio. A series of 10 different noisy data sets, containing the same signal vector, but different noise vectors, was examined for each trial. To provide a check on the extent of the smoothing possible given the noise, the exact signal shape was used to determine the best (least-squares) smoothing possible with each smoother applied to the noisy data vector. Results from the smoothing were evaluated by fitting the smoothed result to the true signal. With the true signal shape, the sum of the squared error (SSE) between the data that had been smoothed, Ŝ, and the true, noise-free data, S, can be calculated directly

SSE = Σ_{i=1}^{N} (ŝ_i - s_i)²    (24)

By plotting the sum of squared error obtained from eq 24 as a function of the smoothing parameters, a function which exhibits a minimum is obtained. Here, the parameters which produce smoothed data with a minimum sum of squared fit error are taken as the “true” parameters. These parameters are optimal in a least-squares sense because they minimize a sum of squared error terms between the true and the smoothed data. (33) Bialkowski, S.E. Anal. Chem. 1989,61, 1308.
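For the simulated peaks, this least-squares benchmark can be expressed as in the sketch below: a rectangular Fourier-domain smoother with a single window parameter, the SSE of eq 24 against the known, noise-free signal, and an exhaustive search for the window that minimizes it. The function names, and the choice to count the retained frequencies from DC upward, are assumptions made for illustration rather than the authors' implementation.

    import numpy as np

    def fourier_rectangular_smooth(noisy, window):
        # Rectangular Fourier-domain smoother: keep the first `window` frequency
        # components (counting from DC) and zero all higher frequencies.
        spectrum = np.fft.rfft(noisy)
        spectrum[window:] = 0.0
        return np.fft.irfft(spectrum, n=noisy.size)

    def sse(smoothed, true_signal):
        # Sum of squared error between smoothed and true data (eq 24).
        return float(np.sum((smoothed - true_signal) ** 2))

    def least_squares_window(noisy, true_signal, candidates):
        # Benchmark selection: the window minimizing SSE against the known,
        # noise-free signal (only possible when the true signal is available).
        errors = [sse(fourier_rectangular_smooth(noisy, w), true_signal)
                  for w in candidates]
        return candidates[int(np.argmin(errors))]

The entropy-based selection sketched earlier can use the same smoother and the same candidate windows, but it never consults the true signal.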


Table I. Comparison of Best Smoothing Parameters for a Gaussian Peak Containing Different Noise Vectors

              least-squares method              entropy method
            window size   SSE value of fit    window size   SSE value of fit
                 6           0.143 74              6           0.143 74
                10           0.144 02             10           0.144 02
                 6           0.144 00              4           0.144 05
                 6           0.144 17              4           0.144 21
                 5           0.143 89              4           0.143 96
                 7           0.143 93              7           0.143 93
                 9           0.143 71              5           0.143 74
                 7           0.143 77              7           0.143 77
                 7           0.143 96              7           0.143 96
                 6           0.143 86              6           0.143 86
      av       6.9           0.143 90            6.0           0.143 92
      SD       1.5           0.000 14            1.9           0.000 15

Ten noisy Gaussians with fwhh of 40 pts and maximum S/N ratio of 10 were used as input. Fourier smoothing was done using a rectangular smoothing window.

The optimal smoothing parameters were obtained for the entropy maximization and the least-squares minimization. With these values, any apparent difference in the optimized smoothing parameters obtained from the two methods could be compared in terms of the quality of smoothing. Optimizing smoothing parameters by entropy maximization does not always result in the same values as those calculated from the least-squares minimization, however. Table I summarizes results for smoothing of a series of 10 different, noisy signals made from the same signal vector, but different noise vectors. Here, the smoothing window of a rectangular, Fourier-domain smoother was to be optimized. The noise vectors all had approximately the same root mean square noise value (0.1) and mean (~0). The variation in the optimal windows found by the entropy method and the least-squares method results from variations in the noise vector and the effect of the noise on the two approaches to optimization. Noise acts in a well-defined manner on average, but it may not act in a well-defined manner for any one specific case. Therefore, to properly account for the effect of the noise, an average of 10 runs was taken, and the average optimal parameter set is given along with its standard deviations. These statistical indicators describe the random effects of the noise much more reliably than a one-case example.
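A comparison of the kind summarized in Table I can be scripted as below. The sketch reuses the hypothetical helpers from the earlier sketches (gaussian_peak, gaussian_like_noise, fourier_rectangular_smooth, sse, entropy_optimal_parameter, least_squares_window) and omits the constant-peak-height normalization used in the paper, so absolute SSE values will not reproduce the table; it only illustrates the averaging over repeated noise realizations.

    import numpy as np

    def compare_methods(n_trials=10, fwhh=40.0, max_snr=10.0,
                        candidates=tuple(range(1, 51))):
        signal = gaussian_peak(fwhh=fwhh)
        rows = []
        for _ in range(n_trials):
            noisy = signal + gaussian_like_noise(signal.size,
                                                 rms=signal.max() / max_snr)
            w_lsq = least_squares_window(noisy, signal, candidates)
            w_ent = entropy_optimal_parameter(noisy, fourier_rectangular_smooth,
                                              candidates)
            rows.append((w_lsq, sse(fourier_rectangular_smooth(noisy, w_lsq), signal),
                         w_ent, sse(fourier_rectangular_smooth(noisy, w_ent), signal)))
        rows = np.array(rows, dtype=float)
        # Column order: LSQ window, LSQ SSE, entropy window, entropy SSE.
        return rows.mean(axis=0), rows.std(axis=0, ddof=1)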

eters will make little difference in the smoothed data and little difference in the sum of squared error (SSE) values obtained from fitting the smoothed data to the true signal, for example. If, however, the minima in the least-squares function and maxima in the entropy function are sharp, even small differences in the smoothingparameters, as determined from the two methods, will have a pronounced impact on the quality of the smoothing. To evaluate the significance of differences in seta of optimal parameters, SSE values are calculated by fitting the optimal, smoothed data produced from both functions to the true signal. In Table I, the average minimum SSE value produced by fitting data obtained from the smoothing using prior knowledge of the signal (the leastsquares method) was 0.14390. The average SSE value calculated by fitting the signal obtained from the information entropy-optimized smoothing (the entropy method) to the noisy data was 0.143 92. The difference of only O.OO0 02 in the average minimum SSE values for the two methods clearly indicates that the entropy-based selection of smoothing window, on average, produces smoothing resulta which are essentially identical to those obtained from least-squares fitting of data using prior knowledge of the true signal shape. Unlike the least-squares smoothing method, however, the entropy-based selection of smoothing parameters presumes no prior knowledge of signal or noise. Polynomial Smoothing. One set of studies evaluated the effectiveness of polynomial smoothing. Considerable work on polynomial smoothers has been reported by Willson and Edwards' and by others,15-22 but no previous work has appeared on the optimization of these smoothers for a particular signal. It was of interest to determine the success with which the entropy could be used to optimize the two adjustable parametersin the polynomial smoother: smoothing window size and polynomial order. Both can be varied to generate a series of smoothed data, for which an entropy surface can be generated, and a maximum located. This was done for Gaussianpeaks with varying peak widths and signalto-noise ratios of 10. An exhaustivesearch was used to locate the optimal smoothingparameters for polynomial smoothing of Gaussian data at low (maximum S/N = 10) and medium (maximum S/N = 100)signal-to-noise ratios. Average resulta from smoothing of 10noisy data each are summarized in Table 11. While a correlation exists between the true optima for least-squares smoothing and those selected by entropy maximization, there is scatter in the optimal polynomial order and smoother window selected by maximizing the informational entropy. This result can be explained by considering the nature of the transfer function for polynomial smoothers. Because changes in smoother window and order cause relatively minor changes in the smoother transfer function, as compared to changing the length of the rectangular window transfer function for a Fourier domain smoother, the twodimensional entropy surface for polynomial smoothing should not be as steeply pitched to the single, obvious minimum seen in plota of the SSE value as a function of the size of a

Table II. Comparison of Best Smoothing Parameters for Gaussian Peaks, Variable-Order Polynomial Smoothing

pts fwhh; optimum window (LSQ method, SD, entropy method, SD); optimum order (LSQ method, SD, entropy method, SD); sum of the squared error value (LSQ method, SD, entropy method, SD)

7.4 2.4 0.8 2.7 2.2 3.7 44.6 0.08 0.02 0.09 0.03 S/N 40 44.0 0.21 0.14 10.9 4.3 2.2 3.6 1.8 10 20 38.0 11.1 37.4 0.15 0.04 10 26.8 36.8 11.7 0.5 0.5 10.2 5.1 3.1 5.1 3.0 0.19 0.05 5 13.4 4.9 23.0 10.2 3.4 2.1 5.3 2.9 0.34 0.06 0.6 0.3 S/N 40 37.0 10.0 48.8 0.6 3.2 1.0 0.0 0.4 0.0008 0.0003 0.0159 0.0016 0.0119 0.005 3.4 6.2 1.1 2.2 0.4 0.0014 0.0004 100 20 38.2 5.2 41.0 10 0.0029 0.0007 0.0142 0.006 22.0 6.4 7.3 6.3 2.1 5.5 2.1 28.0 5 12.6 3.0 0.017 0.007 14.4 4.0 5.8 1.5 5.1 2.1 0.0051 0.0009

Polynomial order varied from 2 to 9. Window size varied from 2*(order) - 1 to 128. LSQ = least-squares method; SD = standard deviation.


Table III. Comparison of Best Smoothing Parameters for Gaussian Peaks

pts fwhh

LSQmethod

40 20 10

11.3 18.8 25.6 37.3

5 40 20 10

5 40 20 10

5 40 20 10

5 a

optimum window sum of the square error value SD entropymethod SD LSQmethod SD entropymethod Polynomial Smoothing, Third Order 1.4 19.7 5.1 0.016 00 0.000 17 0.016 33 1.6 22.6 3.0 0.03300 0.000 20 0.03306 3.7 33.7 4.7 0.069 36 0.000 19 0.069 42 7.6 39.4 7.8 0.14379 0.000 11 0.143 80

6.3 8.9 15.7 24.3

1.8 1.0 1.8 1.7

6.6 8.8 14.6 20.4

5.1 7.9 14.6 24.3

1.2 1.3 1.4 3.0

4.5 7.2 13.6 18.6

13.2 17.8 18.2 15.6

3.9 6.3 1.9 2.7

16.1 21.6 20.4 20.0

Fourier Smoothing, Rectangular 2.0 0.14382 1.1 0.06930 2.0 0.033 10 5.1 0.01592

SD

difference in SSE

0.000 26 0.000 17 0.000 20 0.000 12

0.000 33 0.000 06 0.000 06 0.00001

Window 0.00008 0.00015 0.00014 0.000 14

0.143 83 0.069 31 0.033 13 0.016 05

0.000 08

0.000 17 0.000 16 0.000 26

0.000 01 0.000 01 0.000 03 0.000 13

Fourier Smoothing, Trapezoid Window 1.4 0.143 83 0.00009 1.6 0.069 23 0.000 18 2.4 0.032 93 0.000 15 5.8 0.01593 0.00023

0.14384 0.069 25 0.032 96 0.016 05

0.000 10 0.000 18 0.000 14 0.000 36

0.000 01 0.000 02 0.00003 0.000 12

0.143 97 0.069 60 0.032 93 0.015 54

0.000 08 0.000 16 0.000 22 0.000 32

0.000 00 0.000 01 0.000 02 0.000 03

Adaptive Smoothing 5.7 0.14397 5.0 0.069 59 3.8 0.03291 4.5 0.015 51

0.00008 0.000 15 0.000 21 0.00031

The maximum signal-to-noise ratio for all studies reported in this table was 10.

Table IV. Comparison of Best Smoothing Parameters for Lorentzian Peaks

optimum window; sum of the squared error value; SD, entropy method, SD, LSQ method, SD, entropy method

Polynomial Smoothing, Third Order 2.8 24.8 4.0 0.016 85 0.000 13 0.017 05 0.035 20 2.6 26.2 4.6 0.035 10 0.000 11 9.3 23.9 11.5 0.157 65 0.000 07 0.157 65 6.4 9.3 8.6 0.150 91 0.000 06 0.150 94

SD

in SSE

0.00021 0.00009 0.00007 0.00008

0.00020 0.00010 0.00000 0.00003

6.2 12.0 11.7 15.0

Fourier Smoothing, Rectangular Window 10.1 0.15093 0.00009 9.9 0.157 69 0.00012 3.0 0.035 11 0.000 10 2.3 0.01680 0.00021

0.151 14 0.157 82 0.035 21 0.016 89

0.00006 0.00014 0.00018 0.00021

0.00021 0.00013 0.00010 0.00009

1.5 7.7 2.5 1.4

23.7 11.9 8.0 11.3

Fourier Smoothing, Trapezoid Window 14.1 0.15093 0.00009 8.1 0.15769 0.000 12 1.9 0.035 10 0.000 11 3.8 0.016 78 0.00021

0.15094 0.157 70 0.035 22 0.016 97

0.00009 0.00012 0.00019 0.00033

0.00001 0.00001 0.00012 0.00019

2.1 2.8 2.1 1.0

7.2 5.1 6.2 5.6

0.151 05 0.157 84 0.035 95 0.017 52

0.00006 0.00006 0.00009 0.00013

0.00000 0.00000 0.00001 0.00000

ptafwhh

LSQmethod

40 20 10 5

13.8 20.0 21.4 19.7

40 20 10 5

9.3 12.7 15.7 19.0

2.2 6.5 2.0 1.1

40 20 10 5

8.0 13.3 12.6 18.0

40 20 10 5

5.8 5.1 7.6 6.4

a

Adaptive Smoothing 4.7 0.151 05 2.8 0.15784 2.3 0.035 94 2.5 0.017 52

0.00006 0.00006 0.00009 0.000 12

difference

The maximum signal-to-noise ratio for all studies reported in this table was 10.

rectangular smoother window. Instead, a broad, shallow entropy maximum exists. While, as expected, one combination of smoother window and polynomial order was found to give the best smoothing, several other combinations of smoothing window and polynomial order often gave smoothed signals with entropies that were only a little smaller, especially when narrow peaks with high noise levels were smoothed.

To reduce the scatter in the selection of optimal smoothing parameters, a more restricted set of studies was carried out. In these, a polynomial smoother with fixed order was used to process the data, and the entropy was used to optimize only the smoothing window. The order of the polynomial smoother determines the moments of the data that pass unaffected by the smoothing. For example, a third-order smoother will preserve the zeroth through the third moments of a noisy signal (the area, the mean, the variance, and the kurtosis) while a first-order smoother preserves only the zeroth and first moments (area and mean) of the signal.13 For this reason, smoothers of second degree and higher were

evaluated. The entropy was calculated for the smoothed data, as discussed above, and the plot of entropy versus smoothing window size was used to locate the optimal smoothing window size. Only results from application of the third-order polynomial smoother are reported here, as other smoothers behaved similarly.

Results from the application of the third-order polynomial smoother applied to noisy Gaussian signals of varying peak widths are given in Table III. This table is divided into three vertical sections. The column in the first section identifies the true signal shape and peak half-width. The middle columns of these tables compare the average optimum window obtained by the matched smoothing (least-squares) method and the entropy maximization method. The standard deviations for these optimal window widths are also given in this section. Ten trials were taken to determine the mean and standard deviation for each of the extrema located. The right-hand columns compare the averages of the sum of the squared error obtained at the optima. The standard deviation of these



Figure 2. Least-squares and Shannon entropy functions for Fourier smoothing. (a) The SSE value determined as a function of the size of the smoothing window in the Fourier domain. (b) The information entropy determined as a function of the size of the smoothing window in the Fourier domain. (c) The noisy simulated data used to calculate these functions. The channel number indicates the ordinal number of the data. (d) The estimated results compared to the true signal using SSE optimization. (e) The estimated results compared to the true signal using information entropy optimization.

SSE values are also given in this section. The last column in this section gives the difference between the two averaged sums of the squared error values obtained by the two different methods. The results indicate that, on average, entropy-optimized smoothing identifies the same window size as that selected with prior knowledge of the signal. Note that the window size increases as the peak width of the Gaussian peak decreases. This is consistent with the fact that narrower peaks contain higher frequency components (and more non-zero signal frequencies in the Fourier domain) than broad peaks.

The third-order polynomial smoother was also applied to noisy Lorentzian peaks with various fwhh values. The results are summarized in Table IV. In Table IV there is the same

inverse correlation between the number of points used in the best smoothing window and the fwhh of the peak. The smaller the number of points at the fwhh for the peak, the larger the number of points necessary in the window used by the polynomial smoother to obtain optimal smoothing.1 For the Lorentzian peaks, however, entropy-based smoothing gave more variable estimates for the optimal window size, and the entropy optimization of the polynomial smoother was not as accurate as for the Gaussian peaks. The lower quality of the results was found to result from the method used for normalizing the smoothed peaks to constant maximum intensity for comparisons of smoothing results, however, and not from any inherent limitation in smoothing or in optimizing


the smoother. Taking the point of maximum peak height of a smoothed Lorentzian as the scaling factor means that any error in the point determining the peak maximum translates to error in scaling the entire peak, which shows in the SSE of the fit to the true peak. By using a different normalizing procedure, where several points are averaged to find the peak maximum, noisy Lorentzian peaks can be smoothed very accurately. That normalization procedure is not as well suited for Gaussian peaks as the single-point normalization, however.

Fourier Smoothing. The rectangular Fourier-domain smoother has only one adjustable parameter, the smoothing window size. Smoothing was investigated as a function of that parameter. Fourier smoothing was performed following methods previously reported:3 the Fourier transform of the data vector was obtained, then both the real and imaginary parts of the transformed data were multiplied by a rectangular weighting function, and finally the inverse Fourier transform was taken. The rectangular function was constructed so that frequency components were passed unchanged (where the rectangular function was one) or were attenuated to zero (where the rectangular function was zero). To reduce the noise in a systematic fashion, the number of points set to a value of one in the rectangular function was increased from one to the total number of points in the data vector. The entropy of these smoothed vectors was calculated and plotted as a function of the number of points used in the rectangular portion of the weighting function. The SSE values from fitting the smoothed vectors to the true peaks were also calculated and plotted as discussed above. The optimal window sizes obtained from the least-squares and entropy functions were then compared. Typical plots of the SSE and entropy as functions of the smoothing window size are given in Figure 2. The agreement was found to be very good.

Two different types of weighting functions have been used with Fourier-domain smoothing by others.3 One is the rectangular window discussed above. Entropy maximization was evaluated using Fourier smoothing with a rectangular window applied to the same peaks as those used for the polynomial smoothing. Tables III and IV list the results of smoothing the noisy Gaussian and Lorentzian peaks. In general, Fourier-domain smoothing gave slightly better results than those obtained from polynomial smoothing. There again was a correlation between the width of the peak and the number of points necessary for optimal smoothing in the rectangular portion of the weighting function: the wider the peak, the smaller the number of points necessary. Again, the entropy optimization method proved to be more successful with Gaussian peaks than with Lorentzian peaks because the same normalization method was used.

The other common weighting function recommended for Fourier-domain smoothing has a trapezoidal shape.3 As Tables III and IV show, for the noisy Gaussian and Lorentzian peaks examined here, a simple trapezoidal function gave results similar to those observed with the rectangular window. These results can be explained by examining Table I more closely. The Gaussian signal can be described exactly using only the first 10 frequencies, and a very reasonable description can be obtained using only 6 frequency components. Therefore, any weighting function that does not attenuate the first 6-10 frequencies will give very similar results, so long as most of the higher frequency noise is attenuated. The aliasing expected from truncation3 is not a problem if signal frequencies have zero intensity already. Aliasing only occurs where non-zero signal frequencies are truncated, a situation that is relatively easily avoided in these data sets. By finding the optimal rectangular function, the use of other types of weighting functions is not necessary to obtain satisfactory results in smoothing these simple peak shapes.

Adaptive Polynomial Smoothing. Finally, the entropy optimization method was also used to examine the optimal window size for use with the adaptive, running-average
The aliasing expected from truncation3 is not a problem if signal frequencies have zero intensity already. Aliasing only occurs where non-zero signal frequencies are truncated, a situation that is relatively easily avoided in these data sets. By finding the optimal rectangular function, the use of other types of weighing functions is not necessary to obtain satisfactory results in smoothing these simple peak shapes. Adaptive Polynomial Smoothing. Finally, the entropy optimization method was also used to examine the optimal window size for use with the adaptive, running-average

a

2065

XI03

5,

4.5

I

-

43.5

f

-

32.5 2 1.5

I

f

ta I

4.01J50

400

450

500

550

650

600

700

750

800

850

wavelength (nm)

Flgure 3. (a) The spectrum of copper(I1) nltrate measured in dilute sdutlon (-), withsmoothed spectrum (- -), and a normakeds p e d ” measured at high signal to noise ratio k).Smoothing was optimized with Shannon entropy. The normalized spectrum has been offset for clarity. (b) The spectrum of nickeI(1I) nitate measured in dilute solutkn (-), with smoothed spectruin (---), and a normalized spectrum measured at high signal to noise ratio k.).Smoothlng was optimized with Shannon entropy. The normalized spectrum has been offset for clarity. (c) The spectrum of cobah(1I) nltrate measuredIn dilute dutlon (-), with smoothed spectrum (---), and a normailzed specbum measuredat hlgh slgnal to nolse ratto k).Smoothhg waa opthnlzed with Shannon entropy. The normallzed spectrum has been offset for Clarlty.

-

smoothing method of Minami and Kawata.a This method for adaptive smoothing also has two adjustable parameters: the smoothing window size and the estimate of the global signal variance. While both parameters can be varied, a detailed study of the effects of changing only the smoothing

2008

ANALYTICAL CHEMISTRY, VOL. 64, NO. 18, SEPTEMBER 15, 1992

Table V. Comparison of Best Smoothing Parameters for Metals metal

LSQ method

co

36.6 57.0 20.0

cu Ni

optimum window sum of the square error value SD entropy method SD LSQ method SD entropymethod Polynomial Smoothing, Third Order 9.5 47.5 13.7 0.073 06 0.OOO 08 0.073 07 9.1 4.3

56.8 26.8

1.5 2.1 2.1

5.1 4.0 14.5

9.4 10.0

0.122 46 0.066 75

0.OOO 05 0.OOO 04

SD

difference in SSE

0.122 46 0.066 76

O.OOOO8 O.OOOO5 o.OOOo4

O.OOOO1 O.OOOO0 o.OOOo1

0.073 09 0.122 43 0.066 76

O.OOOO7 O.OOOO5 O.OOOO3

O.OOOO2 O.OOOO0 O.OOOO1

Fourier Smoothing, Rectangular Window

co

cu Ni

7.0 4.4 1.1

1.5 1.6 9.1

window size is reported here. The approach taken in these studies was identical to that used for the polynomial smoothing studies. As in those studies, smoothing of a Gaussian peak by adaptive smoothers with varying smoothingwindow size and varying estimates of the global noise variance was performed. The estimates of the global noise variance were permitted to range from one-tenth of the true noise level to 10 times the true noise level. Tables I11 and IV give the results for adaptive smoothing of Gaussian and Lorentzianpeaks, respectively. This method produced essentially equivalent results from smoothing the Lorentzian peaks and the Gaussian peaks. The sum of the squared error values for the various peak were slightly lower than those obtained in the Fourier smoothingand polynomial smoothing, but the small difference in smoothing falls well within the confidence intervals set by the fitting. This similarity of performance indicates that all three smoothing methods give similar results, once their smoothingparameters are optimized. Comparing the Smoothing of Noisy, Visible Spectra. As an additional check of the adequacy of smoothing by entropy-selected smoothing windows used with these smoother algorithms, noisy, experimental UV-visible spectra were also smoothed. The spectra of nickel(II), copper(II), and cobalt(111, obtained by measurement of dilute solutions of metal nitrate in 0.1 M HN03, are shown in Figure 3. These spectra provide a somewhat representative sample of UV-visible spectra found in the literature: the spectrum of copper(I1) is very simple,while that of cobalt(I1) is a little more complex, and nickel(I1) the most complex, with two significant peaks with different peak widths. Each was smoothed by several of the methodsdiscwed above, but only the results of Fourierdomain smoothing are shown. As Figure 3 shows,the Fourierdomain smoothing is successful, but small changes in the metal spectra with concentration occur as a result of shifta in the hydrolysis equilibria, even in 0.1 M HN03.N To permit an objective evaluation of the smoothing in the absence of concentration-dependent hydrolysis, noise of a form similar to that observed in the dilute spectra shown in Figure 3 was added to noise-free spectra of the three metals in 0.1 M HN03. As before, sets of noisy spectra were generated by creating random sets of noise vectors. The sets of noisy spectra were then smoothed by using both Fourier smoothing and thirdorder polynomial smoothing methods,as s u m "* d in Table V. The results, on average, are similar for both smoothing techniques. Fourier smoothingwith a rectangular weighting function gave slightly more consistent results, as shown by the smallerstandard deviation for the extrema obtained. This is not surprising, consideringthat all of the smoothing methods are very closely related. The Fourier smoothing might be expected to perform better because no high-frequencynoise is passed above the cutoff frequency, while the polynomial smoother and adaptive smoother pass some small amounts of high-frequency noise, as a comparison of transfer functions shows. (34) Hartley, F.R.;Burgees, C.; Alcock, R. Solution Equilibria; Hal-

stead Press: New

York, NY,1980.

0.073 07 0.122 43 0.066 75

0.OOO 06 0.OOO 05 0.OOO 03

CONCLUSIONS All of the smoothingmethods discussed here remove noise by attenuating the high frequencies. The critical aspect for these methodslies in setting those parameters which eliminate noise frequencies. To improve the polynomial smoothing method, more information concerningthose frequencieswhich contain both noise and signal must be obtained. As shown above, it is possible to =fine-tune" the shape of the transfer function somewhat by simultaneously optimizing the smoothing window and the degree of the smoothing polynomial. With the adaptive smoother, the additional optimization involves selecting a suitable value for the global noise variance, the parameter that turns the adaptive smoother off and on as noisy regions are encountered. As with the single-parameter optimizations reported above, all multiparameter optimizations can be done by exhaustive searches so that the nature of the extremum is identified, but other methods for locating optima (e.g., a simple search or simulated annealing of the entropy as a function of the adjustable smoothing parameters) could be used to speed routine optimizations. While the entropy optimization method is general, the optimum obtained is specific to filtering or smoothing for a particular combination of smoother, noise, and signal. It is not possible to optimize the fiitering or smoothing for a set of unknown signals in advance because the entropy constraint-the shape of the noisy signal-is lacking. Unless the noisy signal shape is known (or can be approximated) in advance, it is necessary to reoptimize the fiitering or smoothing parameters for each signal and noise combination. Finally, it should be noted that the entropy optimization method reported here can be applied to data other than spectra and smoothers other than the ones selected for discussion here. The maximum entropy solution is the optimal solution when the data are corrupted by noise due to undersampling27 but is not optimal for removing white noise overlapped with data at all frequencies. Because truncation of noise-containing data amounts to an undersampling of those data, when there are two regions in frequency space, one with data and noise, and the other with only noise, the entropy-optimizedsmoothing should be nearly optimal. Given the overlap of noise and signal, however, it should be apparent that the smoothing,even when optimized, is not a panacea for carelessdata acquisition;even with optimal smoothing, some noise remains imbedded in the smoothed result.

ACKNOWLEDGMENT This work was supported by the Division of Chemical Sciences, Office of Basic Energy Research, of the U.S. Department of Energy, under Grant DE-FG02-86ER13542.

RECEIVED for review February 12, 1992. Accepted June 15, 1992.