Generalized digital smoothing filters made easy by matrix calculations

Generalized digital smoothing filters made easy by matrix calculations. Stephen E. Bialkowski. Anal. Chem. , 1989, 61 (11), pp 1308–1310. DOI: 10.10...
1 downloads 0 Views 393KB Size
1308

Anal. Chern. 1989, 67, 1308-1310

(3) Schulze, G.; Frenzel, W. Anal. Chim. Acta 1984, 759,95. Wehmeyer, K. R.; Wightman, R. M. Anal. Chem. 1985, 5 7 , 1989. Ciszkowska, M.; Stojek, 2. J . Nectroanal. Chem. 1985, 191, 101. Baranski, A. S.;Quon, H. Anal. Chem. 1986, 58, 407. Golas, J.; Osteryoung, J. Anal. Chim. Acta 1988, 787, 211. Howell, J. 0.: Kuhr, W. G.; Ensman, R . E.; Wightman, R. M. J . Electroanal. Chem. 1988, 209, 77. (9) Golas, J.: Galus, Z.; Osteryoung, J. Anal. Chem. 1987, 59,389. (10) Baranski, A. S.Anal. Chem. 1987, 59,662. (11) Sottery, J. P.; Anderson, C. W. Anal. Chem. 1987, 59, 140. (12) Li. L. J.; Fleischmann. M.; Peter, L. M. Nectrochim. Acta 1987, 32, (4) (5) (6) (7) (8)

1585. (13) Stojek, 2.; Osteryoung, J. Anal. Chem. 1988, 6 0 , 131. (14) Pons, J. W.; Daschbach, J.; Pons, S.; Fieischmann. M. J . Nectroanal, Chem. 1980, 239, 427. (15) Lange’s Handbook oichemisfry, 13th ed.; Dean, J. A , , Ed.;McGrawHill: New York, 1987; pp 1-13.

(16) Aoki. K.; Osteryoung, J. J . Electroanal. Chem. 1984, 760. 335. (17) Hassan, M. Z.; Unterecker, D. F.; Bruckenstein. S. J . Electroanal. Chem. 1973, 4 2 , 161. (18) Metals in Mercury; Solubility Data Series; Hirayama, C., Gaius, Z.,Guminski, C., Eds.; Pergamon Press: London, 1987. (19) Stojek, 2.; Zublik, Z. J . Nectroanal. Chem. 1975, 6 0 , 349. (20) Levart, E.; d’Ange d’Orsay, E. P. J . Nectroanal. Chem. 1966, 12, 277.

RECEIVED for review October 17, 1988. Accepted March 10, 1989. This work was supported in part by the U S . Office of Naval Research and by the Grant 01.17.04.01 from the Polish government.

Generalized Digital Smoothing Filters Made Easy by Matrix Calculations Stephen E. Bialkowski Department of Chemistry and Biochemistry, U t a h State University, Logan, U t a h 84322-0300 Reported herein is a simple procedure for the determination of the impulse-response filter function used for data smoothing by the convolution process ( I ) . This procedure is based on the use on matrix techniques used in regression analysis and included in many software packages or algorithms found in many texts on numerical analysis (2-4). It is a general technique and so may be used to determine the filter function coefficients for Savitsky-Golay type power series polynomials ( 5 ) ,finite Fourier series, and almost any other basis functions. The essence for the least-squares smoothing filters made popular by the work of Savitsky and Golay is regression of the raw data within a finite and “moving” interval, with some function or set of functions which the user chooses as being appropriate for modeling the particular data ( 1 , 5 , 6). In the analysis by Savits!cy and Golay, a power series in the abscissa dimension is used as the model function and the regression analysis is performed analytically in order to obtain the filter coefficients. These filter coefficients make up what is known as the impulse response of the filter ( I ) . The latter is convoluted with the raw data to obtain the smoothed data estimate at the middle point of the sample interval. In addition to the Savitsky-Golay middle point estimators, there have been reports of filters that may be used to estimate other than the central point. In particular, Proctor and Sherwood (8) used orthogonal polynomial filters which allowed estimation of any data within the sample interval, and Leach et al. used the least-squares method to determine the coefficients for an initial point filter (9). As a consequence of the rather tedious task of obtaining the filter coefficients in analytical form, one normally determines only those coefficients that are the estimators for either the middle or end points of the odd smoothing interval. Subsequently, filter smoothing of those points that are close to either the start or the end of the raw data sequence has historically presented a problem. Solutions to this end point problem have ranged from their elimination from the smoothed data set to partial smoothing by simple averaging (6, 7 ) . This loss or partial filtering of data is not necessary. In general, regression analysis can be used to determine the coefficients of the basis set functions which best fit, in the least-squares sense, the raw data over the entire smoothing interval. As an example, consider the power series polynomial given by x [ k ] = a. + a,h a2k2 ... + a,km (la)

+

+

0003-2700/89/0361-1308$01 50/0

or equivalently m ,,.

x [ k ] = CUjk’ j=O

where x [ k ] is the raw data sequence, k is the sequence index, and the a, are the coefficients of the mth order power series to be determined by regression. Equation 1 is actually a single element of a more general problem of estimating each x[k] within the current data interval. The least-squares solution for the coefficients which describe the x [ k ] data over this interval is well-known (2, 4). Cast in the form of a matrix equation, the least-squares solution for raw data with constant variance may be written (4) minllx - Valla

(24

with the simple solution

x = Va

(2b)

where x is a column vector of length n, a is the length m + 1 column vector of coefficients, and V is the n by m + 1matrix of basis functions arranged in columns. With the power series basis set, V is the Vandermonde matrix with elements ui, = i1-l. The minimum p2-norm expression in eq 2a is equivalent to finding those coefficients that minimize the errors in the least-squares sense. Although there are fast algorithms for determining the coefficients of Vandermonde systems (4),there is information in the end result of using the pseudoinverse solution that may be used to determine relative variances. The solution for the coefficients vector in eq 2 obtained by using the pseudoinverse of V is V+x = a (3) where V+ = (VTV)-’VT,and superscript T indicates the transpose. The determination of a describes the data in terms of the power series coefficients, but it does not provide the data estimate, that is the smoothed data. To obtain the estimate, the coefficients are used to reconstruct the data over the interval by multiplication by the basis set matrix f = (VV+)x

(4) where 2 is the estimate of x within the span of the basis set and is the same vector length as the raw data vector. We herein call VV+ an estimation matrix since it estimates each point within the sample interval. 6 1989 American Chemlcal Society

ANALYTICAL CHEMISTRY, VOL. 61, NO. 11, JUNE 1, 1989

The estimation matrix obtained by the pseudoinverse in eq 4 is in fact equivalent to that which would be obtained by using orthogonal polynomials. T o see this, consider the QR factorization, V = Q R ( 3 , 4 ) . The orthonormal factorization is such that Q is an n by m + 1column matrix of orthonormal vectors and R is an m 1 by m 1 upper triangular factorization matrix of linear combination weights. The weights in R are those required to reconstruct the original basis from the orthonormal one. Substituting Q R for V, eq 4 may be written as

+

+

2 = Q(RR+)QTx

and thus it = Q Q T x

(54

Clearly, the estimation matrix obtained with the original basis set and that obtained if an orthonormal basis is used are equivalent. In fact, the orthonormal basis is often easy to predict. For example, the Gram-Schmidt QR factorization of a power series basis results in the Legendre polynomials (3). There are other points to be made here. First, each vector in QQT or VV+ is normalized to the extent that the sum over the vector elements is unity. Second, the fact that eq 5d is not a function of the factorization matrix implies that any polynomial basis of the same order will result in an equivalent estimate of the data. This is true because the span of all polynomial basis of the same order is the same (3). Finally, the orthogonal basis is much easier to perform numerical calculations with. That is, the computation of QQT will in general have fewer errors than that of VV+. Since both estimations matrices give the same results, it is recommended that the orthogonal basis be used to avoid computation errors. Starting with either a polynomial series or an orthogonal basis, the estimation matrix is a symmetric, n by n matrix that when multiplied by the data vector, x, results in the estimate of all x, within the interval. Thus each row (or equivalently each column, since it is symmetric) of the estimation matrix is itself a filter impulse-response function that may be used to estimate the corresponding point in the smoothing interval. Since the sum over the individual elements of each impulse-response vector is unity, the DC gain of each is one. For an odd sample interval, the middle vector is the usual Savitsky-Golay middle point smoothing filter. The elements of this middle vector are in fact the filter coefficients. After the matrix computations involved in the determination of the estimation matrix, this vector may be extracted for convolution with the raw data in the usual fashion. But in addition to the middle point impulse-response filter vector, the matrix also contains the impulse-response filter vectors for estimation of all points within the filter interval. These other vectors may be used to obtain estimates of the data at any point along the raw data sequence. For example, the first vector is the “initial point” least-squares polynomial filter used in the analysis of rate data (9, I O ) . The derivatives of the raw data can be also determined by using the matrix techniques as long as the derivative can be expressed as a linear combination of the original basis set elements. For the power series in eq l a , the derivative of the measurement data, x [ k ] ,with respect to the abscissa index is

xTk] = 0

this equation that the derivative is a linear combination of the original basis set. The difference is that the original coefficients are shifted by one place and there is an additional factor that is multiplied by each term in the series. The shifting of the coefficients and the multiplication can be represented as a matrix. First, recall that the coefficients are determined by multiplication of the raw data by the pseudoinverse V+, as given in eq 2. Coefficient shifting and scaling is performed by subsequent multiplication by a matrix that is all zero except for the first upper off-diagonal. The finite first upper off-diagonal quality of this matrix will perform the coefficient shifting. The elements of this off-diagonal are the multiplication factors indicated by eq 6. Inspection of eq 6 reveals that these elements are equal to the row number. Thus the m 1 by m 1 matrix which “maps” the coefficient for estimation of the derivative is

+

and since QTQ = I

+ ul + 2a2k + 3a3k2+ ... + rnu,km-l

(6)

where the prime indicates the derivative. It is apparent from

1309

+

.=lo

0

0

2

0

0 3

: :I

0

(7)

Finally, an n by n matrix which is made up of row filter impulse-response vector functions is calculated by multiplication by the original basis set matrix. The resulting matrix equation for derivative estimation is it’ = (VDV+)x

(8)

And as with the data estimation matrix discussed above, the rows of this matrix can also be selected so as to estimate the derivative at the beginning and end of the data (9). Although the end data can be smoothed with a function of the same order as that of the central data, the noise rejection is not equivalent ( I O ) . The variance in the output is related to the filter vector elements by u:(output)/u2(input)

=

xui:

(9)

I

where uij are the individual elements of the j t h filter vector. In term of the estimation matrix, the output variance ratio is given by the diagonal elements of the matrix formed from the matrix-matrix transpose product. Thus for the signal estimate, the variance ratios are the diagonal elements diag{(VV+)(VV+)TJ = diag(VV+J

(loa)

while for the derivative estimate, the variance ratios are diag{(VDV+)(VDV+)T)= diag(VD(VTV)-’DTVT) ( l o b ) Equation 10 facilitates the ratio calculations, and in particular, eq 10a gives the ratios as the diagonal elements of the estimation matrix itself. Figure 1 illustrates smoothing filter variance ratios as a function of both the order of the polynomial and the particular element in the smoothing interval. An odd smoothing interval equal to 25 data points was chosen in this case. Notice that the variance ratio for the middle point, on the right hand side of the figure, has a simple dependence on the order of the polynomial. In fact, the variance ratio of the middle point smoothing vector is just the inverse degrees of freedom. Since middle point smoothing does not depend on the odd powers in the power series basis (5,6,I O ) , this variance ratio is simply 1/(N - m / 2 ) ,where m / 2 is the integer portion of the fraction. Another point illustrated in Figure 1 is that all variance ratios consistently increase with increasing order of the basis. Of course, matters such as the frequency response must also be taken into account in choosing the order of the basis ( I O ) . The last point to be noticed in this figure is that the even order polynomial filters have minimum variance ratios which are not at the midpoint. If the frequency response of these “off

Anal. Chem. 1989, 67, 1310-1312

1310 l.0y

a e

0. 6

set function that is to be used to smooth the raw data. Next, the sample interval is specified and the basis set matrix, such as V, is calculated. The pseudoinverse is multiplied by the basis matrix to result in the matrix of rowwise impulse-response vector functions. For an odd smoothing interval of n = 2 N + 1, the first N 1 row vectors are used to estimate the values of the first N 1points. The interval of estimation is the first 2 N 1 raw data. These are not moving averages. Next, the middle, or ( N + 1)th row is convoluted with the middle raw data in a moving average algorithm. This convolution is performed up to the last point able to be estimated with this vector, N index units prior to the end of the data. The last N points are smoothed by using the last N row vectors in the matrix. As with the first data, the same interval is used for all N estimations. The effect of smoothing by using this type of filter is illustrated in Figure 2. The raw data was obtained by adding synthetic white noise to points described by a second-order polynomial. Although there are only 100 data points shown, a third-order filter with a smoothing interval length of 25 was used. Notice in particular the initial and final points of the smoothed data. Multipass smoothing can be performed by either successive application of the algorithm or by self-convolution of the impulse-response vector functions prior to a single-pass smooth (7). The latter is only useful for estimation of the data, not derivatives.

+ +

+

0. 4

0" c

0 . 0 i - i - T T - T 1 2 1 1 1 9 9 B

,

7

I

I

I

I

6

E

4

3

OFFSET

,

2

,

1

I

0

INDEX

Flgure 1 . Filter output variance ratio as a function of the location of

the estimate within the smoothing interval (offset index on horizontal axis) and the order of the polynomial (separate curves). The relative variance is symmetrical about the middle index point notated here as zero offset Thus only the first 13 out of a total of offset points are shown. The polynomial order ranges from 1, the bottom curve, to 7 at the top. The variance ratio lines do not cross but do coincide at certain points.

O.sP

I

I

i W

0 3

+ z

ACKNOWLEDGMENT The author in indebted to one reviewer for pointing out that ref 8-10 report methods that are similar to those of this report. LITERATURE CITED

0 4

I 4

z

-0.4

~

0

I

I

I

25

50

75

I

100

SEQUENCE INDEX Figure 2. Output of a third-order poiynomial fitter of wklth 25 smoothing 100 synthetic raw data points. Raw data are shown as points and the

(1) Bialkowski, S. E. Anal. Chem. 1988, 6 0 , 355A-361A. (2) Bevington, P. R . Data Reduction and Error Analysis for the Physical Sciences: McGraw-HIII: New York, 1969. (3) Johnson, L. W.; Riess, R. D. Numerical Analysis. 2nd ed.; AddisonWesley Publishing: Reading, MA, 1982. (4) Golub, G. H.; VanLoan, C. F. Matrix Computations; Johns Hopkins University Press: Baltimore. MD, 1983. (5) Savitsky, A,; Golay, M. J. Anal. Chem. 1964, 3 6 , 1627-1639. (6) Enke, C. G.; Nieman, T. A. Anal. Chem. 1976, 4 8 , 705A-712A. (7) Bromba, M. U. A.; Zlegler. H. Anal. Chem. 1983, 55, 1299-1302. (8) Proctor, A,; Sherwood, P. M. Anal. Chem. 1980, 52, 2315-2321. (9) Leach, R . A,; Carter, C. A,; Harris, J. M. Anal. Chem. 1984, 5 6 , 2304-2307. (10) Wentzell, P. D.; Doherty, T. P.; Crouch, S. R. Anal. Chem. 1987, 5 9 , 367-371.

smoothed data as a continuous line. center" filters is adequate for the particular data, then better signal to noise ratios will be obtained by using these. The algorithm for single pass smoothing using the matrix computation scheme is apparent. One first selects the basis

RECEIVED for review July 15, 1988. Resubmitted January 24, 1989. Accepted March 9, 1989. This work was supported in part by CHE-8520050 awarded by the National Science Foundation.

Continuous Flow Fast Atom Bombardment Mass Spectrometry Using a Modified Electron Impact Source Colin S. Creaser* and Susan Crosland Srhool of Chemical Sciences, University of East Anglia, Norwich NR4 7TJ, U.K

Continuous flow fast atom bombardment mass spectrometry (FAB-MS) (1-3) has been the subject of considerable interest recently because the range of solvents that may be used as matrix materials at room temperature is extended to include more volatile liquids, so enabling the technique to be used as a high-performance liquid chromatography (HPLC) interface ( 3 ) . The continuous flow of solvent may also overcorne sonie of' the quantitation problems experienced in 0003-2700/89/0361-13 lO$O 1.50/0

conventional FAB-MS (1) by reducing the effect of highly surface-active components of complex samples that concentrate at the liquid surface and dominate the spectrum. In order to use FAB for the qualitative and quantitative analysis of organophosphorus compounds (4-6), we required a continuous flow system that could be used regularly without causing undue disturbance to the normal operation of the spectrometer. In existing commercial instruments changing 1989 American Chemical Society