Relationship between Digital Filtering and Multivariate Regression in

Dec 15, 1992 - Relationship between Digital Filtering and Multivariate Regression in Quantitative Analysis. Anal. Chem. , 1992, 64 (24), pp 1155A–11...
0 downloads 0 Views 12MB Size
REPORT

Relationship between

Digital Filtering and M u l t i v a r i a t e Regression in Quantitative Analysis Chris L. Erickson, Michael J. Lysaght1, and James B. Callis Center for Process Analytical Chemistry Department of Chemistry, BG-10 University of Washington Seattle, WA 98195

Analytical chemistry can be defined as the science dealing with the identification and quantitation of a sample's physical and chemical properties (state variables). The analytical variable most commonly probed is chemical concentration. An obvious way to determine chemical concentrations is to isolate all analytes and then determine their masses. Unfortunately, this type of analysis procedure can be time-consuming, laborious, or even impossible to perform. Therefore a great deal of effort in analytical chemistry has been expended on the development of a u t o m a t e d electronic instrumentation capable of measuring analytical state variables by some indirect means. Generally this involves performing a n e x p e r i m e n t on t h e s a m p l e in which a physical or a chemical stimulus is applied to the system. The an-

1 Current address: Department of Chemistry, U.S. Air Force Academy, USAFA, CO 80840

0003-2700/92/0364-1155A/$03.00/0 © 1992 American Chemical Society

alytical response to this perturbation is then measured. Unfortunately, the analytical m e a s u r e m e n t t h u s obtained is not a direct measure of the state variable(s) of interest, for the following reasons: the measurement process itself disturbs the system; the measurement scheme is not perfectly selective, and therefore interferences can contaminate the signal a r i s i n g from t h e a n a l y t e ; random noise in t h e system corrupts t h e measurement; or a theoretical relationship must be employed to obtain the state variable(s) of interest from the signal. Such a relationship may not be known in advance, or it may be complex. As a result, analytical m e a s u r e m e n t s are generally processed in some way to remove interferences, reduce noise, and extract the information related to the desired state variables. The common approach to developing an instrument for quantitative analysis is univariate and linear in nature (i.e., the sensing means is designed to be as selective as possible so that only a single measurement is needed to estimate the concentration of the analyte of interest). This minimizes postexperimental signal processing. An example of such a system would be an ion-selective electrode. Unfortunately, the perfectly selective

sensing system has yet to be develo p e d , b e c a u s e all m e t h o d s a r e plagued to some degree by interferences. Obviously, interferences can be avoided or minimized by adequate sample cleanup, but this approach generally requires time-consuming manual operations. An alternative approach is to use multichannel i n s t r u m e n t s t h a t incorporate multiple sensors, each of which is characterized by its partial selectivity toward a particular analyte. The r e s p o n s e t h e n becomes multivariate in nature, and a pattern t h a t allows correction for interferences and drift, and even permits simultaneous multicomponent analysis, can be developed. An example of an analytical method based on this approach is t h e n e a r - I R spectroscopic analysis of grain for protein, starch, and moisture content (1). Over t h e p a s t decade two a p proaches have emerged for processing multivariate m e a s u r e m e n t s contaminated by noise and interferences: digital filtering and multivariate regression. Bialkowski has re*· viewed t h e applicability of digital filter theory for this purpose (2, 3), whereas Beebe and Kowalski (4) as well as others (5, 6) have discussed the merits of multivariate regression-based calibration and predic-

ANALYTICAL CHEMISTRY, VOL. 64, NO. 24, DECEMBER 15, 1992 · 1155 A

REPORT tion. Because these seemingly dis­ parate approaches, which use different terminologies and m a t h e ­ matical notations, possess the identi­ cal goal of optimally deriving state variables from analytical measure­ ments, the question naturally arises as to what relationship exists be­ tween the two. Accordingly, the pur­ pose of this REPORT is to quantita­ tively explain and compare digital filtering and multivariate regression. Before proceeding, we will outline the scope of this article and introduce a contemporary problem in chemical analysis, which will later serve as an experimental means of illustrating and comparing digital filtering and regression methodologies. As regards scope, the techniques discussed here apply primarily to quantitation of d e t e r m i n i s t i c v a r i a b l e s . Also, we limit the discussion of digital filter­ ing to finite impulse response-type filters, and the discussion of multi­ variate regression to classical leastsquares and principal components regression. R e a d e r s i n t e r e s t e d in studying other interesting connec­ tions between digital filter theory and multivariate regression should consult References 7—11. Although we will use the independent variable time in our equations, other domains such as wavelength, voltage, a n d space can be substituted as long as the domains meet the minimal stipu­ lations described later. The chemical analysis problem we will examine is the quantitation of the o-, m-, and /«-xylene isomers in xylene mixtures. Traditionally, GC has been used to analyze xylene (12). This technique, however, has certain d i s a d v a n t a g e s : s o m e w h a t lengthy analysis times are required, carrier gas is consumed, columns deterio­ rate, and the sample generally is not preserved. Short-wavelength n e a r - I R spec­ troscopy appears to be an excellent alternative method for xylene analy­ sis because analyses can be done rap­ idly; no materials are consumed; the instrumentation (a photodiode array spectrograph) is relatively inexpen­ sive and highly reliable; and nonin­ vasive, automated analysis is possi­ ble (13). However, as can be seen in Figure 1, the near-IR spectra of the three xylene isomers are very simi­ lar; thus, a measured spectrum of a xylene mixture must be extensively processed to recover the concentra­ tion estimates of the isomers. Impulse response filter A time-dependent measurement x(t) can be represented as the sum of the

pure analyte signal s(t) and the noise v(t) associated with that signal x(t) = s(t) + v(t)

(1)

The magnitude of s(t) is influenced by the state variable initiating the sig­ nal. Noise can be defined as any dis­ turbance in the measurement t h a t obscures observance of the pure sig­ n a l . T h e n o i s e m a y be r a n d o m (white) and/or cyclic in nature. Cyclic noise, s o m e t i m e s called cyclostationary or periodic noise, is de­ fined as noise that, if present, re­ peats itself consistently through each e x p e r i m e n t a l cycle. T h e p r i m a r y source of cyclic noise encountered in the xylene experiment is spectral in­ terference. In spectroscopically quan­ tifying o-xylene, for example, the m-xylene and /"-xylene "signals" in­ terfere with the o-xylene signal each time the experiment is done, and therefore they constitute cyclic noise (2, 14). A filter generally can be designed to extract s(t) from x(t), reducing both

random and cyclic noise to acceptable levels. Ideally, this filter will also de­ rive from s(t) the related property of interest. For a shift-invariant linear system, the time-dependent filtering process can be mathematically ex­ pressed as the convolution of the im­ pulse response filter h(f) with the in­ put measurement x(t) (15, 16) y(t) = x(t) * h{t) = J χ(τ) h(t-τ) dt

(2)

The * symbol signifies convolution, τ is the delay variable or the time variable of integration, and y(t) is the filter output, which we will later re­ late to desired state variables. The function h(t) is called the impulse re­ sponse function of the system be­ cause, given an impulse input (delta function), the output is described by h(t). For the usual analytical experi­ m e n t , f u r t h e r s i m p l i f i c a t i o n s of Equation 2 can be made. First, inte­ gration need only be done from τ equals 0 to t if the system is one­ sided (15, 16), that is, if both x(t) and h(t) and thus y(t) only have physical meaning for t > 0. Second, digital data acquisition makes the discrete form of Equation 2 a better descrip­ tion of the experiment. Equation 3 depicts this discrete convolution, which uses a digital impulse r e ­ sponse function h[n] to filter the dig­ ital input measurement x[n] to yield the estimated signal y[n] η

yW = 2 *[*] h[n - k]

(3)

k=0

Figure 1. Pure near-IR spectra of o-xylene, m-xylene, and p-xylene isomers.

Figure 2. Measurement/filtering scheme. In the measurement process, the pure signals s,-s3, which are induced by the state variables ρ,-ftj, are summed with random and cyclic noise v, and vc to give the measurement x. Filters π,-Λ3 extract from x, via convolution, the three state variable estimates p-,-p3. The filter outputs / ι - / 3 a r e directly related to p-,-p3.

1156 A · ANALYTICAL CHEMISTRY, VOL. 64, NO. 24, DECEMBER 15, 1992

Square brackets denote discrete functions; parentheses are reserved for continuous functions (15). Con­ tinuous time variables t and τ have been replaced with incremental time variables η and k, respectively. The variable t, which has units of time, and », which has no dimension, are related by a proportionality factor equal to the time between each dis­ crete increment. The variables τ and k are similarly related. Because of this proportionality, η and k will con­ tinue to be referred to as "time" vari­ ables. Figure 2 illustrates the measure­ ment/filtering model described thus far. In Figure 2, three state variables or properties p give rise to three sig­ nals (e.g., three chemical concentra­ tions give rise to three instrumental signals). In the measurement pro­ cess, symbolized by the summation, the pure signals are added to the random and the cyclic noise to gener­ ate the measurement. In addition to the cyclic noise labeled vc, the signals

hinder quantitation of one another b e c a u s e of t h e i r o v e r l a p p i n g r e ­ sponses and therefore constitute cy­ clic noise with respect to each other. The lower half of Figure 2 depicts the filtering process in which three filters, whose design and implemen­ tation will be discussed below, ex­ tract the three state variables from t h e m e a s u r e m e n t . By convolving each filter with the measurement, an output y is obtained. For some exper­ iments, obtaining the entire output function y[«] with a high signal-tonoise ratio (S/N) is the primary goal. This may be the case, for example, in qualitative analysis, where the en­ tire filtered output spectrum or chromatogram is used to identify distinc­ tive features. However, this article focuses on quantitative analysis in which specific analytical properties such as analyte concentrations are derived from measurements. A key issue, which will be treated later, is the manner by which a property esti­ mate $,which is a scalar, can be opti­ mally derived from the filtered out­ p u t y[n] obtained in E q u a t i o n 3, which is a sequence or a vector. We have introduced a filter func­ tion t h a t can transform a n input function to an output function that contains the desired state variable information. In the next sections we explain how a scalar property is esti­ mated from a filter output vector. Systems in which the noise is exclu­ sively random are examined first, followed by a more general and ana­ lytically relevant case in which both random and cyclic noise exist. Matched filtering Matched filtering is designed to opti­ mally estimate a state variable in the case where the signal arises solely from that state variable, the signal is linearly related to the state variable amplitude, and the noise is random (2, 14, 17, 18). The impulse response filter can be shown as h[n] = cs[-n]

(4)

where c is a constant and s[-n] repre­ sents the time-reversed signal un­ adulterated by noise. The filter is said to be matched because it equals the time-reversed pure signal uncorrupted by noise; thus, when h[n] or s[-«] is convolved with x[n], a crosscorrelation is effected between s[n] and x[n] y[n] = x[n] * h[n] =x[n] * cs[—n\ = c s[«] * χ[«] (5)

representation of cross-correlation η

y[n] = c 2 m x[n + k\ (6) *=o We will justify the above form of the impulse response filter and show how it leads to a simplification of Equations 3 and 6, changing a convo­ lution or a cross-correlation, which technically is a vector/matrix multi­ plication, to a simple dot product of vectors. The optimal estimate of an analyt­ ical s t a t e variable can be derived from the filtered output if the quan­ titative experiment is deterministic, which means that each experiment takes place over an identical incre­ mental range (of time, frequency, or wavelength). For such experiments the deterministic signal consistently repeats itself s[A] = s[mJV+£]

Ν

y[N] = c^s[k]s[k]

+

k=0 Ν

c^s[k]

v[N + k]

(8)

t=0

which is obtained by s u b s t i t u t i n g Equations 1 and 7 into Equation 6. Equation 8 is essentially the sum of two dot products. The first dot prod­ uct is between the signal and itself, which results in a positive squared estimate of the true signal. Because the signal and the random noise are orthogonal, the second dot product approaches zero as Ν becomes large. Thus it becomes apparent why the maximum cross-correlation occurs at η = Ν: At this time the cross-correla­ tion function approaches a noise-free squared estimate of the pure signal. Any segment of the random noise function should be orthogonal to the signal; therefore, this second dot product approaches zero regardless of the incremental time offset inside the brackets. To simplify Equation 8, v[k] is substituted for v[N + k] to give S

N

yw=c 2 iM m+c χ sm vm = k=Q

The * symbol signifies cross-correla­ tion. Equation 6 shows the discrete

(7)

Here Ν is the experimental cycle time, and the whole number m is the cycle number. The noise, meanwhile, varies randomly. For deterministic e x p e r i m e n t s t h e cross-correlation function in E q u a t i o n 6 reaches a maximum value when η = Ν (17). This cross-correlation maximum y[N] (a scalar) is described by the equa­ tion

k=0 Ν

c^s[k]x[k]=p k=0

(9)

B e c a u s e t h e c r o s s - c o r r e l a t i o n is maximum when η = Ν, the signal es­ timate, which results from the single cross-correlation a t η = Ν, h a s a maximum S/N. This cross-correla­ tion maximumy[N] provides the opti­ mum property estimate p (19); the entire cross-correlation summation for all values of y[n] need not be solved. At this stage it is important to re­ member t h a t although the previous derivations were performed in the time domain, analogous representa­ tions can be formulated in any perti­ nent domain. This is made possible by the minimal restrictions imposed, which require the system to be lin­ ear, shift-invariant, one-sided, and deterministic. For systems t h a t obey these constraints, complicated convo­ lutions and cross-correlations (Equations 3 and 6) can be simplified to dot products. These c o n s t r a i n t s m a y t a k e on slightly different meanings under dif­ ferent domains. The xylene example considers spectral data in the wave­ length domain. The one-sidedness argument is satisfied in this domain because the signal at any wavelength is independent of the signal at an­ other wavelength, and the spectrum always starts at some initial wave­ length. The deterministic condition is equivalent to requiring that spec­ t r a be t a k e n over identical wave­ length regions, which is typically the case in quantitative analysis. When s[n\ and therefore h[n\ are not precisely known beforehand, the matched filter h[n] can be experimen­ tally derived by ensemble-averaging numerous m e a s u r e m e n t s (17). Be­ cause we have assumed that noise is random in this case, the noise cancels upon ensemble-averaging while the signals add coherently. If a sufficient number of measurements are aver­ aged, h[n] will closely resemble the time-reversed pure signal and the convolution of x[n] and h[ti], or the cross-correlation of s[n] and x[n], will r e n d e r the optimal property e s t i ­ mate. Kalman innovation filtering Matched filtering is inadequate if cy­ clic noise exists in addition to the random noise (i.e., if the v(t) term in Equation 1 includes both cyclic and r a n d o m components). Recall t h a t chemical interferences constitute a form of cyclic noise. A Kalman inno­ vation filter (KIF) has been recom­ mended for state variable estimation in this case (2, 14, 17). The KIF should not be confused with the Kal­ man filter, an infinite impulse re-

ANALYTICAL CHEMISTRY, VOL. 64, NO. 24, DECEMBER 15, 1992 · 1157 A

REPORT sponse filter (3, 8, 20). The KIF re­ moves the effects of interfering noise by c r e a t i n g a n i m p u l s e r e s p o n s e function that is independent of, and orthogonal to, all cyclic noise compo­ nents. Thus, when the impulse re­ sponse filter operates on the mea­ s u r e m e n t , i t c a n c e l s t h e cyclic interferences while accurately esti­ mating the signal. To create a KIF, the pure signal and the cyclic noise components must be known. The first step in designing a KIF for one signal component is to "whit­ en" x[n], which now possesses r a n ­ dom and cyclic noise, thus creating an innovation »[»]. The function i[n] is similar to x[n] in t h a t it contains s e p a r a t e signal and noise compo­ nents isin] and v[n] i[«] = is [n] + v[n] (10) but i\n\ is different from x[n] in t h a t the signal and noise parts of i[n] are orthogonal to each other. In other words, the effect of "whitening" is to make the signal component orthogo­ nal to the cyclic interfering noise, so t h a t the interferences are removed from x[n] upon application of the fil­ t e r . The t e r m " w h i t e n i n g " a r i s e s from the fact t h a t because cyclic noise is eliminated, only white noise r e m a i n s . The a d v a n t a g e s of t h i s "whitening" process will soon become apparent. Once the noise components of i[n] have been made random with respect to the signal component, the innova­ tion simply becomes an extension of t h e m a t c h e d filter. F o r i[n] t h e matched KIF is h[n] = cis[-n] (11) where c again is a constant. The function h[n] is merely t h e t i m e reversed part of i'[«] t h a t "matches" the desired signal yet is orthogonal to the noise (14, 17). Therefore, when cyclic noise exists, Equation 5 be­ comes yO] = x[n] * c iB [-«] = ciB [«]**[»]

In Equation 14 a large correlation exists between ia and s, whereas the cross-correlation or dot product be­ tween the orthogonal functions i s and ν approaches zero as Ν goes to infin­ ity. Recall t h a t v\n\ in this situation is b o t h r a n d o m a n d cyclic noise. Hence, the state variable p is opti­ mally e s t i m a t e d from t h e signal, w h e r e a s b o t h r a n d o m a n d cyclic noise are largely disregarded. However, as with matched filter­ ing, the individual signal being esti­ mated in the presence of cyclic and random noise must contain all the information necessary to compute p. For example, if concentration were the property being estimated, a spec­ trum, chromatogram, or voltammogram of each pure analyte and inter­ fering species would be required to formulate the KIF. These pure sig­ nals independently contain all the in­ formation necessary to d e t e r m i n e analyte concentrations. If, however, the desired state variable were gaso­ line octane n u m b e r (21) or some other complex property, matched fil­ ters and KIFs could not be used to directly estimate the property. Gaso­ line octane n u m b e r depends, in a complex m a n n e r , on several p u r e

(12)

An argument similar to that devel­ oped for t h e m a t c h e d filter case above can be made t h a t the best property estimate occurs aXn = N, re­ sulting in

p=xis

Ν

(13) k=0

The rationale for creating the orthog­ onal i n n o v a t i o n becomes evident when Equation 1 is substituted into Equation 13 to give Ν

P = c^is[k]s[k] k=0

+ is[k]v[k]

(14)

chemical components (e.g., branched and aromatic hydrocarbons or addi­ tives). Without knowing what a pure octane number signal looks like, and which interfering species are present and how they influence octane num­ b e r , a K I F c a n n o t be p r e p a r e d . Quantitative analysis of such proper­ ties necessitates the use of calibra­ tion-based regression methods, which will be discussed later. Experimental formation of the KIF is straightforward if the pure analyte and interfering species' signal fea­ tures are known and if they interact linearly. In this situation the analyte signal s[n] is made orthogonal to the cyclic noise v[n], yielding iJM. The G r a m - S c h m i d t algorithm is gener­ ally used (2, 22). Although most de­ scriptions of the KIF are limited to the case of a signal in the presence of one type of interfering species, the KIF can handle multiple known sig­ nals and interfering species. To do so, an innovation filter is re­ quired for each property being quan­ tified. Each component's KIF must be orthogonal to all cyclic noise com­ ponents, which include interfering component signals and systematic cyclic noise. This m e a n s t h a t t h e G r a m - S c h m i d t orthogonalization process must be repeated for each component. In other words, simply forming one orthogonal basis set, as most orthogonalization algorithms do, is not sufficient; each basis vector (KIF) must be mutually orthogonal to all other interferences. In addition to this constraint, each basis vector must be properly scaled (17). In dealing with multicomponent mixtures or systems wherein several properties are desired, it is convenient to expand the summations of discrete sequences (which are equivalent to vector dot products) to matrix/vector multiplications. In the following nota­ tion, vectors and matrices are repre­ sented by boldface lowercase and bold­ face uppercase symbols, respectively. By lumping c into is, Equation 13, which shows how the KIF extracts state variables from measurements, can be written as

Figure 3. KIFs for (a) o-xylene, (b) m-xylene, and (c) p-xylene.

1158 A · ANALYTICAL CHEMISTRY, VOL. 64, NO. 24, DECEMBER 15, 1992

(15)

Here, χ is the measurement vector with dimensions of one by N; is is the KIF, which is Ν by one in size; and p is the scalar property estimated by the KIF. Equation 15 can be expanded for multicomponent systems by incorpo­ rating an orthogonal filter for each component property to be estimated. Accordingly, P=xIs

(16)

where the columns of the Ia matrix contain the KIF of each component; / s has dimensions Ν by the number of components; and ρ becomes the p r o p e r t y vector, which i s one by number of components in size. At this stage it is instructive to re­ turn to the xylene analysis example introduced earlier. Recall t h a t only slight shifts in the wavelength and the relative intensity of the two pri­ mary peaks discriminate the pure spectra. Because the spectra are so closely correlated (because each xy­ lene isomer acts as cyclic noise when quantifying the other isomers), matched filtering is not able to accu­ rately estimate the isomer concen­ trations. Because the KIF can extract information when both random and cyclic noise are present, quantitation of each xylene isomer in the multicomponent mixture is possible.

xylene mixture, the KIF would fail miserably. This assumption might naively be made because the m-xy­ lene and ^-xylene spectra are so sim­ ilar. Table I indicates that the SEP values for the KIF using only a n o-xylene and w-xylene model are un­ acceptable, particularly for w-xylene. Hence, the KIF is useful only when all the pure signals are known. Unfortunately, exact aspects of the signal and the cyclic noise are fre­ quently unknown. Under these cir­ cumstances multivariate regression provides a means for deriving the fil­ ter functions. In the following sec­ tions we will explore classical and in­ verse l e a s t - s q u a r e s regression a s well as the relationship between re­ gression and the filtering concepts developed thus far.

KIFs were calculated for the three xylene isomers by orthogonalizing and properly scaling each pure com­ ponent spectrum with respect to the others. These filters are shown in Figure 3. The KIFs were used to quantify the xylene isomer concen­ t r a t i o n s of 20 different m i x t u r e s whose spectra are shown in Figure 4. To evaluate the efficacy of filters in general, a standard error of predic­ tion (SEP) is often calculated as

The classical l e a s t - s q u a r e s (CLS) model for a multicomponent system assumes t h a t a m e a s u r e m e n t is made up of linearly independent sig­ nals, each multiplied by a factor rep­ resenting the degree to which t h a t signal contributes to the overall mea­ s u r e m e n t . T h e CLS model, also known as the K-matrix model (23), may be written as

S E P =

y^_—

Classical least squares

x=pS

(18)

(17)

where g is the number of samples; a is the sample index; pa represents the true state variable (isomer concen­ tration) in the