REPORT

Relationship between Digital Filtering and Multivariate Regression in Quantitative Analysis

Chris L. Erickson, Michael J. Lysaght*, and James B. Callis
Center for Process Analytical Chemistry
Department of Chemistry, BG-10
University of Washington
Seattle, WA 98195

*Current address: Department of Chemistry, U.S. Air Force Academy, USAFA, CO 80840

0003-2700/92/0364-1155A/$03.00/0 © 1992 American Chemical Society

Analytical chemistry can be defined as the science dealing with the identification and quantitation of a sample's physical and chemical properties (state variables). The analytical variable most commonly probed is chemical concentration. An obvious way to determine chemical concentrations is to isolate all analytes and then determine their masses. Unfortunately, this type of analysis procedure can be time-consuming, laborious, or even impossible to perform. Therefore a great deal of effort in analytical chemistry has been expended on the development of automated electronic instrumentation capable of measuring analytical state variables by some indirect means. Generally this involves performing an experiment on the sample in which a physical or a chemical stimulus is applied to the system. The analytical response to this perturbation is then measured. Unfortunately, the analytical measurement thus obtained is not a direct measure of the state variable(s) of interest, for the following reasons: the measurement process itself disturbs the system; the measurement scheme is not perfectly selective, and therefore interferences can contaminate the signal arising from the analyte; random noise in the system corrupts the measurement; or a theoretical relationship must be employed to obtain the state variable(s) of interest from the signal. Such a relationship may not be known in advance, or it may be complex. As a result, analytical measurements are generally processed in some way to remove interferences, reduce noise, and extract the information related to the desired state variables.

The common approach to developing an instrument for quantitative analysis is univariate and linear in nature (i.e., the sensing means is designed to be as selective as possible so that only a single measurement is needed to estimate the concentration of the analyte of interest). This minimizes postexperimental signal processing. An example of such a system would be an ion-selective electrode. Unfortunately, the perfectly selective

sensing system has yet to be developed, because all methods are plagued to some degree by interferences. Obviously, interferences can be avoided or minimized by adequate sample cleanup, but this approach generally requires time-consuming manual operations. An alternative approach is to use multichannel instruments that incorporate multiple sensors, each of which is characterized by its partial selectivity toward a particular analyte. The response then becomes multivariate in nature, and a pattern that allows correction for interferences and drift, and even permits simultaneous multicomponent analysis, can be developed. An example of an analytical method based on this approach is the near-IR spectroscopic analysis of grain for protein, starch, and moisture content (1). Over the past decade two approaches have emerged for processing multivariate measurements contaminated by noise and interferences: digital filtering and multivariate regression. Bialkowski has reviewed the applicability of digital filter theory for this purpose (2, 3), whereas Beebe and Kowalski (4) as well as others (5, 6) have discussed the merits of multivariate regression-based calibration and prediction.

ANALYTICAL CHEMISTRY, VOL. 64, NO. 24, DECEMBER 15, 1992 • 1155 A

Because these seemingly disparate approaches, which use different terminologies and mathematical notations, possess the identical goal of optimally deriving state variables from analytical measurements, the question naturally arises as to what relationship exists between the two. Accordingly, the purpose of this REPORT is to quantitatively explain and compare digital filtering and multivariate regression.

Before proceeding, we will outline the scope of this article and introduce a contemporary problem in chemical analysis, which will later serve as an experimental means of illustrating and comparing digital filtering and regression methodologies. As regards scope, the techniques discussed here apply primarily to quantitation of deterministic variables. Also, we limit the discussion of digital filtering to finite impulse response-type filters, and the discussion of multivariate regression to classical least-squares and principal components regression. Readers interested in studying other interesting connections between digital filter theory and multivariate regression should consult References 7-11. Although we will use the independent variable time in our equations, other domains such as wavelength, voltage, and space can be substituted as long as the domains meet the minimal stipulations described later.

The chemical analysis problem we will examine is the quantitation of the o-, m-, and p-xylene isomers in xylene mixtures. Traditionally, GC has been used to analyze xylene (12). This technique, however, has certain disadvantages: somewhat lengthy analysis times are required, carrier gas is consumed, columns deteriorate, and the sample generally is not preserved.
Short-wavelength near-IR spectroscopy appears to be an excellent alternative method for xylene analysis because analyses can be done rapidly; no materials are consumed; the instrumentation (a photodiode array spectrograph) is relatively inexpensive and highly reliable; and noninvasive, automated analysis is possible (13). However, as can be seen in Figure 1, the near-IR spectra of the three xylene isomers are very similar; thus, a measured spectrum of a xylene mixture must be extensively processed to recover the concentration estimates of the isomers.

The impulse response filter

A time-dependent measurement x(t) can be represented as the sum of the pure analyte signal s(t) and the noise u(t) associated with that signal

x(t) = s(t) + u(t)   (1)

The magnitude of s(t) is influenced by the state variable initiating the signal. Noise can be defined as any disturbance in the measurement that obscures observance of the pure signal. The noise may be random (white) and/or cyclic in nature. Cyclic noise, sometimes called cyclostationary or periodic noise, is defined as noise that, if present, repeats itself consistently through each experimental cycle. The primary source of cyclic noise encountered in the xylene experiment is spectral interference. In spectroscopically quantifying o-xylene, for example, the m-xylene and p-xylene "signals" interfere with the o-xylene signal each time the experiment is done, and therefore they constitute cyclic noise (2, 14).

A filter generally can be designed to extract s(t) from x(t), reducing both random and cyclic noise to acceptable levels. Ideally, this filter will also derive from s(t) the related property of interest. For a shift-invariant linear system, the time-dependent filtering process can be mathematically expressed as the convolution of the impulse response filter h(t) with the input measurement x(t) (15, 16)

y(t) = x(t) * h(t) = ∫ x(τ) h(t − τ) dτ   (2)

The * symbol signifies convolution, τ is the delay variable or the time variable of integration, and y(t) is the filter output, which we will later relate to desired state variables. The function h(t) is called the impulse response function of the system because, given an impulse input (delta function), the output is described by h(t). For the usual analytical experiment, further simplifications of Equation 2 can be made. First, integration need only be done from τ equals 0 to t if the system is one-sided (15, 16), that is, if both x(t) and h(t) and thus y(t) only have physical meaning for t ≥ 0. Second, digital data acquisition makes the discrete form of Equation 2 a better description of the experiment. Equation 3 depicts this discrete convolution, which uses a digital impulse response function h[n] to filter the digital input measurement x[n] to yield the estimated signal y[n]



y[n] = Σ_{k=0}^{n} x[k] h[n − k]   (3)

Figure 1. Pure near-IR spectra of o-xylene, m-xylene, and p-xylene isomers.

Figure 2. Measurement/filtering scheme. In the measurement process, the pure signals s1-s3, which are induced by the state variables p1-p3, are summed with random and cyclic noise to give the measurement x. Filters h1-h3 extract the three state variable estimates from x via convolution. The filter outputs y1-y3 are directly related to p̂1-p̂3.
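Equation 3 is easy to verify numerically. The following sketch uses NumPy and hypothetical short sequences (not data from this article) to compute the one-sided discrete convolution:

```python
import numpy as np

# Hypothetical discrete input and impulse response (not data from the article)
x = np.array([1.0, 2.0, 3.0])   # measurement x[n]
h = np.array([0.5, 0.5])        # impulse response h[n]

# Discrete convolution y[n] = sum_k x[k] h[n - k] for a one-sided system
y = np.convolve(x, h)
print(y)  # [0.5 1.5 2.5 1.5]
```

The same result follows from writing out the sum term by term, e.g. y[1] = x[0] h[1] + x[1] h[0] = 0.5 + 1.0 = 1.5.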

Square brackets denote discrete functions; parentheses are reserved for continuous functions (15). Continuous time variables t and τ have been replaced with incremental time variables n and k, respectively. The variable t, which has units of time, and n, which has no dimension, are related by a proportionality factor equal to the time between each discrete increment. The variables τ and k are similarly related. Because of this proportionality, n and k will continue to be referred to as "time" variables.

Figure 2 illustrates the measurement/filtering model described thus far. In Figure 2, three state variables or properties p give rise to three signals (e.g., three chemical concentrations give rise to three instrumental signals). In the measurement process, symbolized by the summation, the pure signals are added to the random and the cyclic noise to generate the measurement. In addition to the cyclic noise, the signals hinder quantitation of one another because of their overlapping responses and therefore constitute cyclic noise with respect to each other. The lower half of Figure 2 depicts the filtering process in which three filters, whose design and implementation will be discussed below, extract the three state variables from the measurement. By convolving each filter with the measurement, an output y is obtained.

For some experiments, obtaining the entire output function y[n] with a high signal-to-noise ratio (S/N) is the primary goal. This may be the case, for example, in qualitative analysis, where the entire filtered output spectrum or chromatogram is used to identify distinctive features. However, this article focuses on quantitative analysis in which specific analytical properties such as analyte concentrations are derived from measurements. A key issue, which will be treated later, is the manner by which a property estimate p̂, which is a scalar, can be optimally derived from the filtered output y[n] obtained in Equation 3, which is a sequence or a vector. We have introduced a filter function that can transform an input function to an output function that contains the desired state variable information. In the next sections we explain how a scalar property is estimated from a filter output vector. Systems in which the noise is exclusively random are examined first, followed by a more general and analytically relevant case in which both random and cyclic noise exist.

Matched filtering

Matched filtering is designed to optimally estimate a state variable in the case where the signal arises solely from that state variable, the signal is linearly related to the state variable amplitude, and the noise is random (2, 14, 17, 18). The impulse response filter can be shown as

h[n] = c s[−n]   (4)

where c is a constant and s[−n] represents the time-reversed signal unadulterated by noise. The filter is said to be matched because it equals the time-reversed pure signal uncorrupted by noise; thus, when h[n] or s[−n] is convolved with x[n], a cross-correlation is effected between s[n] and x[n]

y[n] = x[n] * h[n] = x[n] * c s[−n] = c s[n] ⋆ x[n]   (5)

The ⋆ symbol signifies cross-correlation. Equation 6 shows the discrete representation of cross-correlation

y[n] = c Σ_{k=0}^{n} s[k] x[n + k]   (6)

We will justify the above form of the impulse response filter and show how it leads to a simplification of Equations 3 and 6, changing a convolution or a cross-correlation, which technically is a vector/matrix multiplication, to a simple dot product of vectors. The optimal estimate of an analytical state variable can be derived from the filtered output if the quantitative experiment is deterministic, which means that each experiment takes place over an identical incremental range (of time, frequency, or wavelength). For such experiments the deterministic signal consistently repeats itself

s[k] = s[mN + k]   (7)

Here N is the experimental cycle time, and the whole number m is the cycle number. The noise, meanwhile, varies randomly. For deterministic experiments the cross-correlation function in Equation 6 reaches a maximum value when n = N (17). This cross-correlation maximum y[N] (a scalar) is described by the equation

y[N] = c Σ_{k=0}^{N} s[k] s[k] + c Σ_{k=0}^{N} s[k] u[N + k]   (8)

which is obtained by substituting Equations 1 and 7 into Equation 6. Equation 8 is essentially the sum of two dot products. The first dot product is between the signal and itself, which results in a positive squared estimate of the true signal. Because the signal and the random noise are orthogonal, the second dot product approaches zero as N becomes large. Thus it becomes apparent why the maximum cross-correlation occurs at n = N: At this time the cross-correlation function approaches a noise-free squared estimate of the pure signal. Any segment of the random noise function should be orthogonal to the signal; therefore, this second dot product approaches zero regardless of the incremental time offset inside the brackets. To simplify Equation 8, u[k] is substituted for u[N + k] to give

y[N] = c Σ_{k=0}^{N} s[k] s[k] + c Σ_{k=0}^{N} s[k] u[k] = c Σ_{k=0}^{N} s[k] x[k] = p̂   (9)
Because the cross-correlation is maximum when n = N, the signal estimate, which results from the single cross-correlation at n = N, has a maximum S/N. This cross-correlation maximum y[N] provides the optimum property estimate p̂ (19); the entire cross-correlation summation for all values of y[n] need not be solved. At this stage it is important to remember that although the previous derivations were performed in the time domain, analogous representations can be formulated in any pertinent domain. This is made possible by the minimal restrictions imposed, which require the system to be linear, shift-invariant, one-sided, and deterministic. For systems that obey these constraints, complicated convolutions and cross-correlations (Equations 3 and 6) can be simplified to dot products. These constraints may take on slightly different meanings under different domains. The xylene example considers spectral data in the wavelength domain. The one-sidedness argument is satisfied in this domain because the signal at any wavelength is independent of the signal at another wavelength, and the spectrum always starts at some initial wavelength. The deterministic condition is equivalent to requiring that spectra be taken over identical wavelength regions, which is typically the case in quantitative analysis.

When s[n] and therefore h[n] are not precisely known beforehand, the matched filter h[n] can be experimentally derived by ensemble-averaging numerous measurements (17). Because we have assumed that noise is random in this case, the noise cancels upon ensemble-averaging while the signals add coherently. If a sufficient number of measurements are averaged, h[n] will closely resemble the time-reversed pure signal and the convolution of x[n] and h[n], or the cross-correlation of s[n] and x[n], will render the optimal property estimate.
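As a numerical illustration of Equations 4-9, the matched-filter estimate reduces to a single dot product of the pure signal with the measurement. This sketch uses a hypothetical Gaussian signal and synthetic white noise (invented for illustration, not the article's data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = np.arange(100)

# Hypothetical pure signal s[n] (Gaussian peak) scaled by the state variable p
p_true = 2.0
s = np.exp(-0.5 * ((n - 50) / 8.0) ** 2)
x = p_true * s + rng.normal(0.0, 0.05, n.size)  # measurement with random noise

# Matched filter: h[n] = c s[-n]; choosing c = 1 / (s . s) makes the
# cross-correlation maximum y[N] an unbiased estimate of p (Equation 9)
c = 1.0 / np.dot(s, s)
p_hat = c * np.dot(s, x)
print(round(p_hat, 2))  # close to 2.0
```

The noise term (the second dot product in Equation 8) nearly cancels because the random noise is essentially orthogonal to the signal.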
Kalman innovation filtering

Matched filtering is inadequate if cyclic noise exists in addition to the random noise (i.e., if the u(t) term in Equation 1 includes both cyclic and random components). Recall that chemical interferences constitute a form of cyclic noise. A Kalman innovation filter (KIF) has been recommended for state variable estimation in this case (2, 14, 17). The KIF should not be confused with the Kalman filter, an infinite impulse response filter (3, 8, 20). The KIF removes the effects of interfering noise by creating an impulse response function that is independent of, and orthogonal to, all cyclic noise components. Thus, when the impulse response filter operates on the measurement, it cancels the cyclic interferences while accurately estimating the signal. To create a KIF, the pure signal and the cyclic noise components must be known.

The first step in designing a KIF for one signal component is to "whiten" x[n], which now possesses random and cyclic noise, thus creating an innovation i[n]. The function i[n] is similar to x[n] in that it contains separate signal and noise components i_s[n] and u[n]

i[n] = i_s[n] + u[n]   (10)

but i[n] is different from x[n] in that the signal and noise parts of i[n] are orthogonal to each other. In other words, the effect of "whitening" is to make the signal component orthogonal to the cyclic interfering noise, so that the interferences are removed from x[n] upon application of the filter. The term "whitening" arises from the fact that because cyclic noise is eliminated, only white noise remains. The advantages of this "whitening" process will soon become apparent. Once the noise components of i[n] have been made random with respect to the signal component, the innovation simply becomes an extension of the matched filter. For i[n] the matched KIF is

h[n] = c i_s[−n]   (11)

where c again is a constant. The function h[n] is merely the time-reversed part of i[n] that "matches" the desired signal yet is orthogonal to the noise (14, 17). Therefore, when cyclic noise exists, Equation 5 becomes

y[n] = x[n] * c i_s[−n] = c i_s[n] ⋆ x[n]   (12)

An argument similar to that developed for the matched filter case above can be made that the best property estimate occurs at n = N, resulting in

p̂ = c Σ_{k=0}^{N} i_s[k] x[k]   (13)

The rationale for creating the orthogonal innovation becomes evident when Equation 1 is substituted into Equation 13 to give

p̂ = c Σ_{k=0}^{N} i_s[k] s[k] + c Σ_{k=0}^{N} i_s[k] u[k]   (14)

In Equation 14 a large correlation exists between i_s and s, whereas the cross-correlation or dot product between the orthogonal functions i_s and u approaches zero as N goes to infinity. Recall that u[n] in this situation is both random and cyclic noise. Hence, the state variable p is optimally estimated from the signal, whereas both random and cyclic noise are largely disregarded. However, as with matched filtering, the individual signal being estimated in the presence of cyclic and random noise must contain all the information necessary to compute p̂. For example, if concentration were the property being estimated, a spectrum, chromatogram, or voltammogram of each pure analyte and interfering species would be required to formulate the KIF. These pure signals independently contain all the information necessary to determine analyte concentrations. If, however, the desired state variable were gasoline octane number (21) or some other complex property, matched filters and KIFs could not be used to directly estimate the property. Gasoline octane number depends, in a complex manner, on several pure chemical components (e.g., branched and aromatic hydrocarbons or additives). Without knowing what a pure octane number signal looks like, and which interfering species are present and how they influence octane number, a KIF cannot be prepared. Quantitative analysis of such properties necessitates the use of calibration-based regression methods, which will be discussed later.

Experimental formation of the KIF is straightforward if the pure analyte and interfering species' signal features are known and if they interact linearly. In this situation the analyte signal s[n] is made orthogonal to the cyclic noise u[n], yielding i_s[n]. The Gram-Schmidt algorithm is generally used (2, 22). Although most descriptions of the KIF are limited to the case of a signal in the presence of one type of interfering species, the KIF can handle multiple known signals and interfering species. To do so, an innovation filter is required for each property being quantified. Each component's KIF must be orthogonal to all cyclic noise components, which include interfering component signals and systematic cyclic noise. This means that the Gram-Schmidt orthogonalization process must be repeated for each component. In other words, simply forming one orthogonal basis set, as most orthogonalization algorithms do, is not sufficient; each basis vector (KIF) must be mutually orthogonal to all other interferences. In addition to this constraint, each basis vector must be properly scaled (17).

In dealing with multicomponent mixtures or systems wherein several properties are desired, it is convenient to expand the summations of discrete sequences (which are equivalent to vector dot products) to matrix/vector multiplications. In the following notation, vectors and matrices are represented by boldface lowercase and boldface uppercase symbols, respectively. By lumping c into i_s, Equation 13, which shows how the KIF extracts state variables from measurements, can be written as

p̂ = x i_s   (15)

Figure 3. KIFs for (a) o-xylene, (b) m-xylene, and (c) p-xylene.

Here, x is the measurement vector with dimensions of one by N; i_s is the KIF, which is N by one in size; and p̂ is the scalar property estimated by the KIF. Equation 15 can be expanded for multicomponent systems by incorporating an orthogonal filter for each component property to be estimated. Accordingly,

p̂ = x I_s   (16)

where the columns of the I_s matrix contain the KIF of each component; I_s has dimensions N by the number of components; and p̂ becomes the property vector, which is one by the number of components in size.

At this stage it is instructive to return to the xylene analysis example introduced earlier. Recall that only slight shifts in the wavelength and the relative intensity of the two primary peaks discriminate the pure spectra. Because the spectra are so closely correlated (because each xylene isomer acts as cyclic noise when quantifying the other isomers), matched filtering is not able to accurately estimate the isomer concentrations. Because the KIF can extract information when both random and cyclic noise are present, quantitation of each xylene isomer in the multicomponent mixture is possible. KIFs were calculated for the three xylene isomers by orthogonalizing and properly scaling each pure component spectrum with respect to the others. These filters are shown in Figure 3. The KIFs were used to quantify the xylene isomer concentrations of 20 different mixtures whose spectra are shown in Figure 4. To evaluate the efficacy of filters in general, a standard error of prediction (SEP) is often calculated as

SEP = [ Σ_{a=1}^{q} (p_a − p̂_a)² / q ]^{1/2}   (17)

where q is the number of samples; a is the sample index; p_a represents the true state variable (isomer concentration) in the ath sample, which is determined by a reference method; and p̂_a signifies the state variable predicted by the KIF. A SEP for each isomer concentration, calculated using the KIF, is listed in Table I. As Equation 17 shows, the SEP is similar to a standard deviation. It is a root-mean-square error estimate of how well a filter estimates a state variable. In Table I, therefore, the numbers reported are absolute error estimates that reflect the amount of uncertainty associated with predicted xylene isomer concentrations. Because the xylene concentrations are measured in volume percent, these SEPs also have the units of volume percent.

A major disadvantage of the KIF via Gram-Schmidt orthogonalization is that the analyte and the interfering species' signals must be exactly known to compute I_s. For example, if we supposed that only two components were present in the xylene mixture, the KIF would fail miserably. This assumption might naively be made because the m-xylene and p-xylene spectra are so similar. Table I indicates that the SEP values for the KIF using only an o-xylene and m-xylene model are unacceptable, particularly for m-xylene. Hence, the KIF is useful only when all the pure signals are known. Unfortunately, exact aspects of the signal and the cyclic noise are frequently unknown. Under these circumstances multivariate regression provides a means for deriving the filter functions. In the following sections we will explore classical and inverse least-squares regression as well as the relationship between regression and the filtering concepts developed thus far.
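The Gram-Schmidt construction of a KIF (Equations 10-15) and the SEP of Equation 17 can be sketched for a hypothetical two-component system; the Gaussian "signals" and concentrations below are invented for illustration and are not the xylene spectra:

```python
import numpy as np

n = np.arange(100)
# Hypothetical overlapping pure-component signals (not the xylene spectra)
s1 = np.exp(-0.5 * ((n - 40) / 6.0) ** 2)   # analyte signal
s2 = np.exp(-0.5 * ((n - 55) / 6.0) ** 2)   # interferent (cyclic noise)

# Gram-Schmidt step: make the analyte signal orthogonal to the interferent,
# then scale so that i_s . s1 = 1
i_s = s1 - (np.dot(s1, s2) / np.dot(s2, s2)) * s2
i_s /= np.dot(i_s, s1)

# Predict the analyte level in a few synthetic mixtures, then compute the
# SEP of Equation 17 against the known values
rng = np.random.default_rng(0)
p_true = np.array([0.2, 0.5, 0.8])
p_hat = np.array([np.dot(i_s, p * s1 + 0.7 * s2 + rng.normal(0, 0.01, n.size))
                  for p in p_true])
sep = np.sqrt(np.mean((p_true - p_hat) ** 2))
print(np.round(p_hat, 2), round(sep, 4))  # p_hat close to p_true; small SEP
```

Because i_s is exactly orthogonal to s2, the interferent's contribution (0.7 here) cancels in every dot product, which is precisely the whitening effect described above.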

Classical least squares

The classical least-squares (CLS) model for a multicomponent system assumes that a measurement is made up of linearly independent signals, each multiplied by a factor representing the degree to which that signal contributes to the overall measurement. The CLS model, also known as the K-matrix model (23), may be written as

x = p S   (18)

S contains the independent untainted signals of each component and the cyclic noise, and it has dimensions of number of components by number of time increments (or whatever measurement domain is being employed). The vectors x and p were defined in Equations 15 and 16, respectively. When signal and noise characteristics are known (when S is known), the desired properties of an unknown sample can be determined by solving Equation 18 for the least-squares estimate of p

p̂ = x S+   (19)

S+ represents the pseudo-inverse of S and in this case equals S^T(SS^T)^−1, where S^T denotes the transpose of S and (SS^T)^−1 signifies the inverse of SS^T. The pseudo-inverse of a matrix is often derived by performing a singular value decomposition (SVD) on the matrix to be inverted (24). This decomposition and its effects will be described in the following section. Although the CLS and the KIF approaches use different algorithms to produce S+ and I_s, the columns of S+ are equivalent to the mutually orthogonal signal(s) and the cyclic noise that make up the innovation filters I_s (25). Multiplication of S by S+ (SS+) ideally yields the identity matrix, as does SI_s. This is the same as saying that the jth column of S+ or I_s is orthogonal to all the rows of S (i.e., each signal and cyclic noise component) except for the jth row of S, which is the signal whose state variable is being estimated. Because of the virtual equivalence of the CLS and the KIF methods, it is not surprising that their prediction abilities are identical. Table I shows the SEP values for each xylene isomer derived via CLS.

Table I. Comparison of standard error of prediction for the xylene isomers computed using the KIF, the KIF two-component model, CLS, and PCR.

Figure 4. Near-IR spectra of xylene mixtures.

CLS can be used to fit linear combinations of pure component signals to measurements (i.e., curve fitting). For example, from one of the xylene mixture spectra shown in Figure 4, it is possible to estimate how much of the measurement arises from o-xylene, m-xylene, and p-xylene. The unknown spectral measurement can be decomposed into its pure component spectra if they are known. S+ is calculated from the pure spectra S, then multiplied by the unknown spectrum x to give p̂, the state variables (concentrations) of each isomer (Equation 19). Figure 5a shows an example of a mixed xylene measurement. Underneath the measurement lie the pure spectra multiplied by their appropriate concentrations, which were derived from Equation 19. The sum of the three weighted spectra approximates the measurement. Figure 5b shows the difference between the overall measurement and the sum of the three properly weighted pure spectra. Residual analysis reveals whether random noise is the sole discrepancy between the measurement and the model, as it seems to be here, or whether some other component (perhaps cyclic noise not modeled) contributes to the measurement.

A disadvantage of CLS, as with the KIF, is that it is limited in the types of properties it can estimate. Because the CLS model assumes that pure signals are multiplied by separable state variables to give a measurement, each state variable must originate from one and only one pure component signal. Properties influenced by several components in an unknown manner cannot be estimated. Another drawback to which we have already alluded is that CLS, like the KIF and matched filtering, requires knowledge of and access to the pure signals as well as the cyclic noise before the filters can be generated. Frequently, however, the exact aspects of the signal and cyclic noise are not known. Principal components regression (PCR), an inverse least-squares method, is introduced in the next section.
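Before moving on, the CLS prediction of Equation 19 can be sketched numerically. Here three invented Gaussian "pure spectra" stand in for the xylene components; the concentrations and noise level are likewise hypothetical:

```python
import numpy as np

n = np.arange(100)
# Rows of S are hypothetical pure-component signals (3 components x 100 points)
S = np.vstack([np.exp(-0.5 * ((n - mu) / 7.0) ** 2) for mu in (30, 50, 70)])

rng = np.random.default_rng(1)
p_true = np.array([0.2, 0.5, 0.3])            # true state variables
x = p_true @ S + rng.normal(0, 0.01, n.size)  # measurement: x = p S + noise

# Equation 19: p_hat = x S+, with S+ = S^T (S S^T)^-1 (NumPy's pinv)
p_hat = x @ np.linalg.pinv(S)
print(np.round(p_hat, 2))  # approximately [0.2 0.5 0.3]
```

Subtracting `p_hat @ S` from `x` would give the residual of Figure 5b: ideally pure random noise.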
We will show that PCR and similar techniques provide a means for deriving filters when signal and noise characteristics are inaccessible, and that PCR can predict complex state variables that are related to the pure chemical components in some way not known beforehand.

Inverse least-squares regression

Inverse least-squares regression models such as PCR assume that a regression vector b maps a measurement to a scalar property

p̂ = x b   (20)

Before Equation 20 can be used to estimate state variables, b must be derived through calibration, a process whereby measurements of several chemical mixtures containing varying amounts of analyte and cyclic noise components are acquired. These measurements (x row vectors) form a measurement matrix X. In addition to experimentally measuring X, the state variable of interest for each individual measurement x must be independently measured by an accurate reference method. These property data form the vector p. The model for calibrating a single state variable is

p = X b   (21)

where p has dimensions of number of samples by one, X has dimensions of number of samples by number of time increments (or whatever domain is used), and b has dimensions of number of time increments by one. If several state variables are of interest, separate regression vectors can be determined for each. In this case, b and p become matrices instead of vectors. Given the calibration model in Equation 21, b can be determined by

b = X+ p   (22)

Figure 5. Curve-fitting the pure component xylene spectra to a measurement via CLS. (a) Mixed xylene measurement (top) and pure spectra (bottom) multiplied by their appropriately estimated concentrations. Ideally, the three isomer spectra should add up to the mixture spectrum. (b) Difference between the overall measurement and the sum of the three properly weighted pure spectra. Ideally, residuals are random.


where X and p are measured experimentally. Derivation of X+ is what sets PCR apart from other inverse least-squares regression methods such as partial least-squares regression (4, 6). In PCR the first step in determining X+ is to perform an SVD (24, 26) on X, which results in

X = U Σ V^T   (23)

The purpose of decomposing X into U, Σ, and V^T is twofold. First, SVD yields column-orthogonal (U and V) and diagonal (Σ) matrices; therefore, inversion of U, Σ, and V is a stable, well-conditioned operation. Thus

X+ = (U Σ V^T)+ = V Σ^−1 U^T   (24)

In addition to making a stable inversion possible, SVD provides a means of reducing noise. At the heart of SVD and its noise reduction capabilities lies the concept of factors, or principal components. A factor of X is a linear combination of the original sample or time variables that span X (4). Being consistent with the matrix dimensions assigned thus far, we see that the columns of U contain orthogonal factors that span the property variance of X, whereas the columns of V constitute an orthogonal basis spanning the time variance of X. Along the diagonal of Σ are weights associated with these factors. The weights represent the degree to which each factor contributes to the overall variance of X. The utility of SVD stems from the fact that the principal components are formulated in a way that the first principal component spans as much variance as possible (i.e., the first element of Σ is as large as possible, and so on with the second, third, etc.). By analyzing the magnitude of the elements along the diagonal of Σ and, more importantly, the structure (or lack of structure) of the principal components, one can determine how many factors model signal or cyclic noise and how many model random noise. If it can be determined, for example, that the first three factors model the measurement space adequately, the remaining factors can be deleted. This reduces the number of columns in U and V as well as the number of rows and columns in Σ. This reduction of the dimensionality of U, Σ, and V precedes the inversion shown in Equation 24. Because the higher order principal components represent noise in the system, their removal reduces the overall noise, enhances the prediction ability of PCR, and reduces the chance of overfitting the data to noise (but also adds bias to the model). Factor analysis, therefore, can be thought of as an information reduction process. Of course, one hopes that only the noise information is reduced, and not the signal.

Equipped with the noise-reduced X⁺ and the experimental p, we can derive b from Equation 22. The regression vector b represents the optimal way to filter (or multiply) an input measurement vector z (i.e., a row of X) such that a desired scalar property p can be estimated when random and/or cyclic noise are present and when one does not know the signal and noise characteristics beforehand. Connecting the notations of filter theory and multivariate regression, b is equivalent to the KIF, the part of z[n] or z that matches the signal yet is simultaneously orthogonal to the noise and other signal components. A single cross-correlation summation at n = N of the KIF with z[n], or the dot product of b with z, produces the optimal property estimate for the single component.

To illustrate how a calibration-based PCR experiment generates regression vectors and then predicts properties by using these regression vectors, consider the xylene example. The spectra of the 20 xylene mixtures in Figure 4 form the calibration measurement matrix X. The p vector for each xylene isomer is equivalent to the volume percent concentration of that isomer in each mixture. The SVD of X, followed by analysis of the principal components, reveals that three linearly independent components span X. This is not surprising, because the mixtures are composed of three isomers. More than 99% of the variance in X is spanned by these three factors. Additional factors derived from the matrix decomposition are assumed to model random noise in the system; therefore, these factors are removed. The dimension-reduced matrices U, Σ, and V are inverted and reconstructed to give X⁺ (Equation 24), and X⁺ is multiplied by each isomer's p to give a regression vector for each isomer. These regression vectors are shown in Figure 6.
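The calibration-based PCR recipe (decompose X, truncate the factors, pseudoinvert, and regress against p) can be sketched in pure Python. This is an illustration, not the authors' code: the "spectra," the concentrations, and the power-iteration routine standing in for a full SVD are all invented assumptions, and the data are built exactly rank two so that truncation at k = 2 is lossless.

```python
# Sketch of calibration-based PCR: eigenanalysis of X^T X (power
# iteration with deflation as a stand-in for SVD), rank-k truncation,
# and the regression vector b = X+ p used as a property filter.

def dot(a, b): return sum(x * y for x, y in zip(a, b))

def mat_vec(A, v): return [dot(row, v) for row in A]

def top_eigen(S, k, iters=500):
    """Top-k eigenpairs of a small symmetric matrix by power iteration."""
    n = len(S)
    S = [row[:] for row in S]
    pairs = []
    for m in range(k):
        v = [1.0 / (i + 1 + m) for i in range(n)]   # arbitrary start vector
        for _ in range(iters):
            w = mat_vec(S, v)
            norm = dot(w, w) ** 0.5
            v = [x / norm for x in w]
        lam = dot(v, mat_vec(S, v))
        pairs.append((lam, v))
        for i in range(n):                  # deflate: S <- S - lam v v^T
            for j in range(n):
                S[i][j] -= lam * v[i] * v[j]
    return pairs

# Hypothetical calibration set: rows of X are mixture spectra; p holds
# the analyte concentration in each mixture.
s1, s2 = [1.0, 0.2, 0.0], [0.1, 1.0, 0.3]           # invented pure spectra
concs = [(1.0, 0.2), (0.5, 1.0), (0.8, 0.6), (0.2, 0.9)]
X = [[c1 * a + c2 * b for a, b in zip(s1, s2)] for c1, c2 in concs]
p = [c1 for c1, _ in concs]

n = len(X[0])
XtX = [[sum(X[a][i] * X[a][j] for a in range(len(X))) for j in range(n)]
       for i in range(n)]

# Keep k = 2 factors (two-component data) and accumulate b = X+ p
b = [0.0] * n
for lam, v in top_eigen(XtX, k=2):
    s = lam ** 0.5                      # singular value
    u = [dot(row, v) / s for row in X]  # left factor: u = X v / s
    coef = dot(u, p) / s                # (u . p) / s
    b = [bi + coef * vi for bi, vi in zip(b, v)]

# Predicting an unseen mixture is just the filter output b . z
z = [0.3 * a + 0.7 * bb for a, bb in zip(s1, s2)]
print(round(dot(b, z), 6))   # analyte concentration, close to 0.3
```

Note that b ends up matching the analyte's pure signal while being orthogonal to the interferent, which is exactly the filter-theory reading of the regression vector in the text.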
To quantify the efficacy of calibration-based PCR, leave-one-out cross-validation was performed (6). In cross-validation a regression vector for each isomer is calculated with one calibration sample's spectrum and concentration value omitted from X and p, respectively; each resulting b is then used to predict the concentration of the sample omitted from p, using the spectrum of the sample omitted from X. Cross-validation tests the method's ability to predict properties of samples not specifically included in the calibration set. The SEP generated from cross-validation can be calculated from Equation 17 by defining p̂ₐ as the concentration predicted from the cross-validation experiment with the ath sample omitted from the calibration set. SEP values for each xylene isomer derived using the PCR calibration model are listed in Table I. As can be seen, the PCR performance is a bit inferior to that of CLS or the KIF. However, in designing the filter, PCR requires knowledge of only the component(s) of interest to accurately quantify the analyte's concentration. If a two-component model were assumed in the xylene example, PCR (unlike CLS or the KIF) would still accurately predict the o-xylene and m-xylene isomer concentrations as long as the model spanned the variance of the p-xylene isomer in the data.

A visual comparison of the PCR vectors with those of the KIF (see Figures 3 and 6) reveals that both methods generate similar filters. The calibration-based PCR filters appear to possess more noise than the KIFs, and the SEP values for the KIF are, in general, better than those of PCR. This apparent experimental superiority of the KIF over PCR may stem from two causes. First, from a numerical analysis standpoint, some experimental calibration designs are statistically superior to others. It is desirable to choose an experimental design that poses the calibration equation in the best-conditioned manner. It can be shown that the KIF and CLS methods incorporate the best possible experimental design (i.e., they use pure known signals to derive the filters). Experimental designs based on calibration (methods that attempt to span a large measurement space by using many different mixtures) are inherently more poorly conditioned.

Figure 6. Regression vectors calculated via PCR for o-xylene, m-xylene, and p-xylene.

The second factor favoring the KIF and CLS is related to experimental uncertainty. For the xylene analysis, the experimental uncertainty in each p for the KIF is extremely low because the concentrations are pure. However, to experimentally determine each p via calibration, each isomer concentration in each mixture must be measured either by preparing the volumetric mixtures or by using some other independent reference method. Obviously, because calibration-based methods incorporate more measurement uncertainty into the p vectors, they will induce more error in the predicted state variable. For the above reasons, when the pure signals are well known and the system is well understood, the KIF or CLS using pure signals should provide improved prediction ability over that of calibration-based methods.

PCR can estimate a vast range of properties that the KIF and CLS cannot evaluate. The inverse model allows properties to depend on any chemical component contributing to the measurement. For example, inverse regression methods have been coupled with near-IR spectroscopy to predict complex state variables such as hydroxide ion concentration (27), intrinsic viscosity of polymer blends (28), and gasoline octane number (21). These properties cannot be directly determined from the pure spectral signals, for various reasons. The problem with determining the hydroxide ion concentration is that it is impossible to directly measure the pure spectrum of hydroxide in solution because of the interfering presence of water. Spectroscopic analysis of polymer viscosity is hindered by the fact that it is not obvious how polymer viscosities are related to polymer spectra. The difficulty with spectroscopic octane number determination is understanding the correlation between octane number and the numerous signals of the chemical components.
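The leave-one-out cross-validation used to compute the SEP can be sketched as follows. This is a minimal illustration, not the authors' code: to keep it short, the "calibration" is a one-parameter least-squares fit p ≈ k·x rather than full PCR, the absorbance/concentration data are invented, and the SEP is taken here as the root-mean-square of the prediction errors (the article's Equation 17 may normalize differently).

```python
# Leave-one-out cross-validation: rebuild the model A times, each time
# omitting one sample, and predict the omitted sample with that model.

def fit_slope(xs, ps):
    """Least-squares slope for the no-intercept model p = k * x."""
    return sum(x * p for x, p in zip(xs, ps)) / sum(x * x for x in xs)

def loo_sep(xs, ps):
    errs = []
    for a in range(len(xs)):
        xs_cal = xs[:a] + xs[a + 1:]   # omit sample a from calibration
        ps_cal = ps[:a] + ps[a + 1:]
        k = fit_slope(xs_cal, ps_cal)  # model built without sample a
        errs.append(ps[a] - k * xs[a]) # predict the omitted sample
    return (sum(e * e for e in errs) / len(errs)) ** 0.5

# Hypothetical absorbances and concentrations (true slope near 2)
xs = [0.10, 0.25, 0.40, 0.55, 0.70]
ps = [0.21, 0.49, 0.82, 1.09, 1.41]
print(round(loo_sep(xs, ps), 4))
```

Because each prediction comes from a model that never saw the predicted sample, the resulting SEP estimates performance on future unknowns rather than fit quality on the calibration set.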
ANALYTICAL CHEMISTRY, VOL. 64, NO. 24, DECEMBER 15, 1992 · 1161 A

To perform octane number analysis spectroscopically using CLS, one would need to independently determine the concentrations of the hundreds of pure components that make up gasoline and then see how these components were correlated with the octane number. Spectroscopic octane number analysis by PCR requires only one independent reference measurement (the octane number) for each gasoline sample. Much work, however, should be done to verify that the calibration step uses a valid experimental design. Ideally, each chemical component should vary independently over a broad range within the calibration model. PCR, then, can determine how many principal components are necessary to span the measurement space and accurately predict the octane number. When done definitively, this experimental design phase can be nearly as laborious as the CLS method.

We should point out that preprocessing data using mean-centering (4, 13) or derivative (1, 13) methods often results in improved SEP values. In particular, for xylene quantitation using PCR, the SEP for all isomers can be reduced to below 0.50% if one takes the second derivative of the xylene measurements prior to principal components analysis and regression. The second derivative removes irreproducible instrumental baseline offsets and slopes from the spectra. Because our purpose here was to explain and critically compare digital filtering and multivariate regression, rather than to estimate xylene concentrations with minimal error, we have omitted further discussion of data preprocessing.
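Why the second derivative removes baseline offsets and slopes can be demonstrated with discrete second differences, a simple stand-in for the smoothed derivative filters used in practice. The sketch below is an illustration with invented spectrum values, not the processing used in the article.

```python
# A baseline offset (constant) and slope (linear in channel index) both
# vanish under a discrete second difference, leaving the signal's
# curvature intact.

def second_diff(y):
    return [y[i - 1] - 2 * y[i] + y[i + 1] for i in range(1, len(y) - 1)]

spectrum = [0.0, 0.3, 0.9, 1.0, 0.6, 0.2, 0.1]   # invented band shape
offset, slope = 0.75, 0.05                       # irreproducible baseline
shifted = [v + offset + slope * i for i, v in enumerate(spectrum)]

d2_clean = second_diff(spectrum)
d2_shifted = second_diff(shifted)
assert all(abs(a - b) < 1e-12 for a, b in zip(d2_clean, d2_shifted))
print("baseline offset and slope removed by the second difference")
```

The algebra is immediate: for y[i] + c + s·i, the combination y[i-1] - 2y[i] + y[i+1] cancels both c and s exactly, so only curvature that varies sample to sample survives into the calibration.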

Summary
We have described the similarities and differences between finite impulse response digital filtering and multivariate regression as they pertain to quantitative property estimation. These techniques formulate a filter that operates on an input measurement to give a desired state variable estimate as an output. Bialkowski has shown that selection of the correct filter is based on one's knowledge of the signal and noise characteristics (2). He showed that if the signal and the noise are precisely known, either matched filtering (random noise only) or the KIF (both random and cyclic noise) methodologies optimally filter the data. Although this is true, caution should be used when following this logic. If something in the system is not perfectly understood and modeled when the filter is constructed, errors may result in property estimation. Furthermore, these filtering methods are not useful for the case commonly encountered in quantitative analysis in which a complete understanding of the system is not at hand.

We have shown that multivariate regression provides a powerful recipe for designing finite impulse response filters that accurately extract properties from data contaminated with both random and cyclic noise. PCR uses a statistically designed calibration experiment to create property extraction filters and therefore does not require a full understanding of the signals and noise a priori. The calibration step of PCR requires only that an independent reference method determine the properties of the component(s) of interest, but great care must be taken to ensure that the calibration model spans the variance of the other components in the data. Unlike the KIF and CLS, inverse regression techniques can predict properties that depend on multiple components in the system. Xylene concentrations, properties that depend only on the individual component signals, were estimated to compare the methods. The KIF, CLS, and PCR methods were all capable of estimating concentrations with a few percent error. However, were we to quantify some other property that depended on multiple signal components, the KIF and CLS would have failed. Therefore, calibration-based inverse regression methods offer improved methods of filter design when the signal and noise characteristics of a system are not totally known and when complex properties are being estimated.

The authors acknowledge the Center for Process Analytical Chemistry and the National Science Foundation for financial support. M. B. Seasholtz, B. M. Wise, N. L. Ricker, and the anonymous reviewers are thanked for their valuable critiques of this article.

References
(1) Williams, P. C.; Norris, K. H. Near-Infrared Technology in the Agricultural and Food Industries; American Association of Cereal Chemists: St. Paul, MN, 1987.
(2) Bialkowski, S. E. Anal. Chem. 1988, 60, 355 A-361 A.
(3) Bialkowski, S. E. Anal. Chem. 1988, 60, 403 A-413 A.
(4) Beebe, K. R.; Kowalski, B. R. Anal. Chem. 1987, 59, 1007 A-1017 A.
(5) Martens, H.; Naes, T. Multivariate Calibration; Wiley: New York, 1989.
(6) Kowalski, B. R.; Seasholtz, M. B. J. Chemom. 1991, 5, 129-45.
(7) Wise, B. M.; Ricker, N. L.; Veltkamp, D. F.; Kowalski, B. R. Process Control Qual. 1990, 1, 41-51.
(8) Brown, R. G. Introduction to Random Signal Analysis and Kalman Filtering; Wiley: New York, 1983.
(9) Brown, S. D. Chemom. Intell. Lab. Syst. 1991, 10, 87-105.
(10) Vandyke, S. J.; Wentzell, P. D. Anal. Chem. 1991, 63, 2512-19.
(11) Small, G. W.; Harms, A. C.; Kroutil, R. T.; Ditillo, J. T.; Loerop, W. R. Anal. Chem. 1990, 62, 1768-77.
(12) 1989 Annual Book of ASTM Standards; American Society for Testing and Materials: Philadelphia, PA, 1989.
(13) Lysaght, M. J. Ph.D. Dissertation, University of Washington, Seattle, 1991.
(14) Papoulis, A. Probability, Random Variables, and Stochastic Processes, 2nd ed.; McGraw-Hill: New York, 1984.
(15) Jackson, L. B. Signals, Systems, and Transforms; Addison-Wesley: Reading, MA, 1991.
(16) Papoulis, A. The Fourier Integral and Its Applications; McGraw-Hill: New York, 1962.
(17) Bialkowski, S. E. Rev. Sci. Instrum. 1987, 58, 687-95.
(18) Dyer, S. A.; Hardin, D. S. Appl. Spectrosc. 1985, 39, 655-62.
(19) Bialkowski, S. E. Appl. Spectrosc. 1988, 42, 807-11.
(20) Candy, J. V. Signal Processing: The Model-Based Approach; McGraw-Hill: New York, 1986.
(21) Kelly, J. J.; Barlow, C. H.; Jinguji, T. M.; Callis, J. B. Anal. Chem. 1989, 61, 313-20.
(22) Anton, H. Elementary Linear Algebra, 4th ed.; Wiley: New York, 1984.
(23) Brown, C. W.; Lynch, P. F.; Obremski, R. J.; Lavery, D. S. Anal. Chem. 1982, 54, 1472-79.
(24) Golub, G. H.; Van Loan, C. F. Matrix Computations, 2nd ed.; Johns Hopkins University Press: Baltimore, 1989.
(25) Brown, C. W.; Obremski, R. J.; Anderson, P. Appl. Spectrosc. 1986, 40, 784-87.
(26) Wold, S. Chemom. Intell. Lab. Syst. 1987, 2, 37-52.
(27) Phelan, M. K.; Barlow, C. H.; Kelly, J. J.; Jinguji, T. M.; Callis, J. B. Anal. Chem. 1989, 61, 1419-24.
(28) Zhu, C.; Hieftje, G. M. Appl. Spectrosc. 1992, 46, 69-72.

Chris L. Erickson (left) is pursuing his Ph.D. in analytical chemistry at the University of Washington. He earned his B.S. and M.S. degrees in 1987 and 1989, respectively, from Utah State University. His research interests include chemical analysis using visible and near-IR interferometry, digital filtering, multivariate analysis, and photothermal spectroscopy.

James B. Callis (right) is professor of chemistry and adjunct professor of bioengineering at the University of Washington. He received his B.S. degree in 1965 from the University of California at Davis and his Ph.D. in physical chemistry in 1970 from the University of Washington. His research focuses primarily on improving instrumentation for optical spectroscopy, including studies in phosphorescence, near-IR spectroscopy, imaging, and noninvasive reaction monitoring.

Michael J. Lysaght is assistant professor of chemistry at the U.S. Air Force Academy. He received his B.S. degree from George Mason University (VA) in 1979 and his Ph.D. from the University of Washington in 1991. His research interests focus on instrumentation, fundamentals, and applications of near-IR spectroscopy.
