Anal. Chem. 1988, 60, 847-852
Background Correction for Fluorescence Detection in Thin-Layer Chromatography Using Factor Analysis and the Adaptive Kalman Filter
David D. Gerow and Sarah C. Rutan*
Department of Chemistry, Box 2006, Virginia Commonwealth University, Richmond, Virginia 23284-0001
A method is proposed that corrects for variable background signals. This technique has been applied to the analysis of polyaromatic hydrocarbons using high-performance thin-layer chromatography with fluorescence detection. The fluorescent background from thin-layer plates was found to be highly variable, and simple background subtraction yielded imprecise and inaccurate concentration estimates. The method proposed here is based on the assumption that variable background signals can be modeled by the abstract spectra obtained from factor analysis of several background spectra, and the appropriate weighting factors can then be calculated by using the adaptive Kalman filter. It was found that the best models were obtained by selecting spectra from random locations across the thin-layer plate, as opposed to spectra from a blank lane or spectra adjacent to the analyte zone. This approach gave concentration estimates with improved accuracy and precision, in most cases, when compared to simple subtraction.
Background correction is a necessary step in many analytical procedures before reliable quantitative results can be obtained. In particular, the fluorescent background signals arising from high-performance thin-layer chromatographic (TLC) plates are large and not very reproducible. In the analysis of polyaromatic hydrocarbons (PAH’s) using TLC and fluorescence spectroscopy, the variability of this background signal can degrade the detection limits for the analysis and may cause mathematical curve resolution approaches, used to enhance the selectivity, to give erroneous results. If the errors that arise from the variability of the TLC substrate or the sample matrix can be eliminated, the limits of detection for these compounds should be lowered, and the selectivity of the analysis will be enhanced. Here, a technique based on factor analysis and the adaptive Kalman filter is described that alleviates the difficulties caused by variable background responses.
Several approaches for compensation of unknown background contributions have been described in the literature. Statham has described a method based on a “top-hat” digital filter for X-ray spectrometry (1). This approach works well for the background responses in X-ray spectrometry, which are approximately linear; however, it would not be able to compensate for the nonlinear, variable fluorescence background responses observed from TLC plates. Liu and Koenig have described a background correction algorithm for infrared spectroscopy based on linear and quadratic models for the base line (2). Osten and Kowalski have developed methods that permit unknown background contributions to be detected and corrected (3). These methods are related to self-modeling curve resolution approaches and are based on the assumption of minimal background contribution or on the assumption that the background response is similar to the component responses. These methods are therefore restricted to background signals that meet these assumptions. Wieboldt and Hanna have recently described a method based on the Gram-Schmidt orthogonalization technique for Fourier transform infrared detection in gas chromatography and supercritical fluid chromatography (4). This method assumes that Fourier transform vectors representative of the base-line variations can be found throughout the chromatogram; these vectors are used to form the Gram-Schmidt basis set. The method yields a level base line but does not remove the background components from the infrared spectra. Gemperline and co-workers have described a method based on factor analysis, in which the background component is assumed to be present in a calibration data set (5). This method can correct for background components that are present in both the calibration and unknown data sets and is related to the approach described by Wieboldt and Hanna. Lorber and co-workers have proposed two methods for background correction in inductively coupled plasma (ICP) emission spectrometry (6, 7). One method is based on the assumption that the background spectrum is constant, while allowing for variations in signal intensity (6). This method is not appropriate for the multicomponent background responses observed from TLC plates. A second method proposed by Lorber uses a factor analysis algorithm to describe the variations in the ICP background caused by drift, changes in plasma parameters, and interfering components (7). Five factors were required to describe these variations. This approach is similar to the one described here; however, Lorber’s approach requires accurate sensitivities to be obtained in a matrix-free environment, a restriction that does not apply to the method described here.
The Kalman filter is a recursive, linear least-squares digital filtering algorithm that was originally developed for engineering applications by Kalman in 1960 (8). Recently, several applications of this algorithm in analytical chemistry have been reported (9, 10), including multicomponent curve resolution. The adaptive modification of the filter allows accurate parameter estimates to be obtained despite the presence of some types of model errors (11, 12). A method for removal of background contributions from fluorescence responses measured in situ from TLC plates, based on an adaptive Kalman filter, has recently been proposed (13). This approach combines derivative spectrometric methods with an adaptive Kalman filter and permits variable background responses to be removed from fluorescence spectra. The derivative method decreases the low-frequency variability in the spectrum, and the adaptive Kalman filter determines the relative contribution of the background signal to the fluorescence spectrum. This approach has been shown to work well for removal of background contributions from fluorescence spectra (13) and TLC chromatograms (14) of PAH compounds, and from TLC-fluorescence data obtained from o-phthaldialdehyde derivatives of amino acids (15).
© 1988 American Chemical Society
ANALYTICAL CHEMISTRY, VOL. 60, NO. 9, MAY 1, 1988
A complementary approach to the derivative-adaptive Kalman filter method is described in this paper. Here, a representative set of background fluorescence spectra is obtained from various blank locations on a high-performance TLC plate. These spectra are subjected to principal components analysis (PCA), as described by Malinowski (16), to determine the number of factors that contribute to the background responses. The abstract factors are then used to reconstruct the background contribution to a measured analyte response, with the adaptive Kalman filter determining the appropriate contribution of each abstract background factor. Once this background response has been subtracted, reliable qualitative and quantitative information can be obtained for the analytes of interest.
THEORY
Factor Analysis. Factor analysis is a mathematical technique that has been used in several areas of chemistry for determining trends in matrices of data. The mathematical formulation used here is based on the methods described in the book by Malinowski and Howery (16). The first step in the application of this technique is to choose representative background spectra from various locations on the TLC plate. These spectra can be taken from a blank lane or chosen from an analyte lane before and after the analyte zone. The selected spectra are arranged as column vectors in a data matrix D
$$D = [B_1\ B_2\ \cdots\ B_n] \qquad (1)$$
where n is the number of background spectra. The data matrix is multiplied by its transpose to form the covariance matrix Z
$$Z = D^{T} D \qquad (2)$$
The nonnormalized covariance matrix is used in this case since the magnitude of the error in the data matrix is assumed constant. This also has the effect of increasing the weighting of data points with the highest magnitude. The covariance matrix is diagonalized by using a matrix eigenanalysis routine. This step produces the set of abstract eigenvalues, Λ, and their associated eigenvectors, C
$$C Z C^{-1} = \Lambda \qquad (3)$$
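Equations 1-3, together with the projection onto the primary eigenvectors given in eq 4, can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' C program: it assumes the background spectra are already stacked as the columns of an (r × n) array, it uses `numpy.linalg.eigh` in place of the eigenanalysis subroutine mentioned in the Experimental Section, and the function name `abstract_background_factors` is hypothetical. Because Z is symmetric, its eigenvectors are orthonormal, so C^-1 = C^T in eq 3.

```python
import numpy as np


def abstract_background_factors(D: np.ndarray, n_factors: int):
    """Sketch of eqs 1-4 for an (r x n) matrix D whose columns are background spectra.

    Returns the eigenvalues of Z (largest first) and the abstract factor matrix
    M (r x n_factors), whose columns span the systematic background variation.
    """
    Z = D.T @ D                               # eq 2: nonnormalized covariance matrix (n x n)
    eigvals, eigvecs = np.linalg.eigh(Z)      # symmetric eigenanalysis, ascending order
    order = np.argsort(eigvals)[::-1]         # reorder so the largest eigenvalues come first
    eigvals = eigvals[order]
    C = eigvecs[:, order].T                   # rows of C are eigenvectors: C Z C^T = diag(eigvals), eq 3
    C_primary = C[:n_factors]                 # keep primary eigenvectors only (the dagger in eq 4)
    M = D @ C_primary.T                       # eq 4: abstract spectral factors as columns
    return eigvals, M


if __name__ == "__main__":
    # Hypothetical example: 8 background spectra built from two smooth shapes.
    wl = np.arange(350.0, 551.0)              # 350-550 nm at 1-nm steps
    b1 = np.exp(-((wl - 400.0) / 30.0) ** 2)
    b2 = np.exp(-((wl - 480.0) / 40.0) ** 2)
    rng = np.random.default_rng(0)
    D = np.column_stack([b1, b2]) @ rng.uniform(0.5, 2.0, size=(2, 8))
    eigvals, M = abstract_background_factors(D, n_factors=2)
    print(eigvals[:4])
    print(M.shape)
```

In the noiseless example only two eigenvalues are far from zero, and every background column can be rebuilt as a linear combination of the two columns of M.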
These abstract eigenvectors can be used to recreate or reproduce the original data matrix. In this case, however, they will be used to generate a set of abstract spectral factors which characterize the shape of the background fluorescence signal. These factors are produced by multiplying the primary eigenvector matrix by the original data matrix
$$M^{\dagger} = D\,C^{\dagger T} \qquad (4)$$
where the dagger (†) indicates that the eigenvectors due to the random noise components have been omitted. The resulting matrix, M, contains as column vectors the factors contributing to the systematic variability in the background fluorescence spectra. Choosing the proper number of factors necessary to characterize the features in the background spectra can be a very important and difficult task. Many methods have been proposed to solve this problem, including calculating the standard deviation of the secondary eigenvector matrix and defining various fit quality functions (16). Several approaches for determining the dimensionality of the data matrix are examined here; the methods employed are described in the Experimental Section.

Table I. Algorithm Equations for the Square Root Kalman Filter
System model
$$X(k) = F(k,k-1)\,X(k-1) + w(k) \qquad (5)$$
Measurement model
$$z(k) = H^{T}(k)\,X(k) + v(k) \qquad (6)$$
State estimate extrapolation
$$X(k|k-1) = F(k,k-1)\,X(k-1|k-1) \qquad (7)$$
Square root covariance extrapolation
$$S(k|k-1) = F(k,k-1)\,S(k-1|k-1) + Q(k) \qquad (8)$$
Adaptive measurement variance
$$R(k) = \frac{1}{m}\sum_{j=0}^{m-1} \nu(k-j)\,\nu(k-j) - \left[H^{T}(k)\,S(k|k-1)\right]^{2} \qquad (9)$$
where
$$\nu(k) = z(k) - H^{T}(k)\,X(k|k-1) \qquad (10)$$
Kalman gain
$$K(k) = a\,S(k|k-1)\,G(k) \qquad (11)$$
where
$$G(k) = S^{T}(k|k-1)\,H(k) \qquad (12)$$
$$1/a = G^{T}(k)\,G(k) + R(k) \qquad (13)$$
$$d = 1/\{1 + [a\,R(k)]^{1/2}\} \qquad (14)$$
State estimate update
$$X(k|k) = X(k|k-1) + K(k)\,[z(k) - H^{T}(k)\,X(k|k-1)] \qquad (15)$$
Square root covariance update
$$S(k|k) = S(k|k-1) - a\,d\,S(k|k-1)\,G(k)\,G^{T}(k) \qquad (16)$$
Definitions
n: number of parameters or components
k: index indicating the wavelength or number of the most recent measurement
m: smoothing window for innovations sequence
ν: innovations sequence
X: state vector (n × 1)
F: state transition matrix (n × n)
w: system noise (n × 1)
Q: system noise covariance (n × n)
z: measurement (scalar)
H^T: measurement function vector (1 × n)
v: measurement noise (scalar)
R: measurement noise variance (scalar)
P: error covariance (n × n)
S: square root of the error covariance (n × n)
K: Kalman gain (n × 1)
a, d: weighting factors
I: identity matrix (n × n)
0: zero matrix (n × n)
Notation
A(k|j): kth estimate for A based on j measurements

Adaptive Kalman Filter. The model and algorithm equations for the adaptive Kalman filter are given in Table I. Equations 5 and 6 are the model equations, where X is the state vector and consists of the parameters to be estimated. In this case, these parameters are the concentrations of the fluorescent analytes or the weighting factors for the abstract background components. Equation 5 describes the time dependence of these parameters; in this case the parameters are time invariant. Equation 6 describes how the measured fluorescence intensity at some wavelength k depends on these parameters. Equations 7 through 16 are the algorithm equations for the Potter-Schmidt version of the adaptive Kalman filter (12, 17). These equations, in conjunction with the following model information, are used to obtain accurate estimates for the parameters from fluorescence measurements.
$$X(k) = \begin{bmatrix} C_A \\ C_B \end{bmatrix} \quad\text{or}\quad X(k) = \begin{bmatrix} C_{\mathrm{back1}} \\ C_{\mathrm{back2}} \end{bmatrix} \qquad (17)$$
$$F(k,k-1) = I \qquad (18)$$
$$H^{T}(k) = [f_A(k)\ \ f_B(k)] \quad\text{or}\quad H^{T}(k) = [M_1(k)\ \ M_2(k)] \qquad (19)$$
$$z(k) = \text{fluorescence intensity measurement at wavelength } k \qquad (20)$$
$$R(k) = \text{variance of the noise in the fluorescence measurement, or eq 9 in Table I} \qquad (22)$$
The first definition for X is for the resolution of overlapped fluorescence responses, where C_A and C_B represent the concentrations of fluorophores A and B, respectively. In this case, the measurement function vector, H^T, is composed of the fluorescence sensitivity factors for these compounds, f_A(k) and f_B(k), at the kth excitation-emission wavelength pair. When the adaptive algorithm is used to determine the correct weightings for the abstract background components generated from the factor analysis procedure, the elements of X are the weighting factors, C_back1 and C_back2. The elements of the measurement function vector are then the intensities of the abstract spectral vectors at the kth wavelength, M_1(k) and M_2(k). For the algorithm calculations, the square root covariance matrix (S) is calculated at the start from the initial guess for the covariance matrix (P) using the Cholesky decomposition algorithm, as described by Kaminski et al. (17). The adaptive algorithm is implemented by checking the value of the innovations sequence (eq 10) and calculating an appropriate measurement variance from this value according to eq 9. If the calculated measurement variance is greater than the variance of the noise in the fluorescence measurement, this new value for R is used to calculate the Kalman gain (eqs 11 through 14). In effect, this assigns a lower weighting to those measurements that are not consistent with the chosen model at the kth wavelength pair. Previous studies have found that the adaptive filter yields optimal parameter estimates when the final covariance estimates for the parameters are minimized (11, 12). The simplex optimization algorithm can be used to determine the initial guesses for the parameters and their associated covariances that yield these best estimates (12, 13).
The method works well, provided that there exists a wavelength range or ranges for which the model is accurate. The approach used in these studies combines the factor analysis and adaptive Kalman filter methods. The abstract background vectors, obtained as described in the previous section, are used as the model for the adaptive filter. The analyte response(s) do not need to be included in the model, as long as the wavelength range of the measurements extends beyond the range where the analyte(s) fluoresce. This is the criterion for successful implementation of the adaptive filter, as described above.
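A compact sketch of the square-root adaptive filter of Table I, applied as described above: only the abstract background factors are in the model, and an analyte band occupies part of the scanned range. Several choices here are assumptions, not details from the source: F = I and Q = 0, so the extrapolation steps (eqs 7 and 8) leave X and S unchanged; the window m and the nominal noise variance are fixed inputs; the initial X and P are fixed guesses rather than simplex-optimized; and the function name `adaptive_sqrt_kalman` is hypothetical. The adaptive rule follows the text: the eq 9 variance replaces the nominal noise variance only when it is larger.

```python
import numpy as np


def adaptive_sqrt_kalman(z, H, x0, P0, r_noise, window=5):
    """Potter-type square-root adaptive Kalman filter (sketch of Table I).

    z: (K,) measurements, one per wavelength
    H: (K, n) rows are the measurement function vectors H^T(k)
    x0, P0: initial state estimate (n,) and covariance (n, n)
    r_noise: nominal variance of the measurement noise
    window: smoothing window m for the innovations sequence
    """
    x = np.asarray(x0, dtype=float).copy()
    S = np.linalg.cholesky(np.asarray(P0, dtype=float))  # P = S S^T
    innovations = []
    for k in range(len(z)):
        h = np.asarray(H[k], dtype=float)
        # Eqs 7 and 8 with F = I and Q = 0: X(k|k-1) = X(k-1|k-1), S(k|k-1) = S(k-1|k-1).
        nu = z[k] - h @ x                       # eq 10: innovation
        innovations.append(nu)
        recent = np.asarray(innovations[-window:])
        G = S.T @ h                             # eq 12
        r_adapt = np.mean(recent ** 2) - G @ G  # eq 9
        R = max(r_noise, r_adapt)               # adaptive R only if larger than the noise variance
        a = 1.0 / (G @ G + R)                   # eq 13
        d = 1.0 / (1.0 + np.sqrt(a * R))        # eq 14
        SG = S @ G
        x = x + a * SG * nu                     # eqs 11 and 15, with K = a S G
        S = S - a * d * np.outer(SG, G)         # eq 16
    return x, S @ S.T


if __name__ == "__main__":
    # Hypothetical data: two smooth abstract background factors plus an analyte band near 430 nm.
    wl = np.arange(350.0, 551.0)
    M = np.column_stack([np.exp(-((wl - 360.0) / 80.0) ** 2),
                         np.exp(-((wl - 520.0) / 70.0) ** 2)])
    analyte = 0.6 * np.exp(-((wl - 430.0) / 8.0) ** 2)
    z = M @ np.array([1.5, 0.8]) + analyte
    w_hat, P = adaptive_sqrt_kalman(z, M, x0=[0.0, 0.0], P0=10.0 * np.eye(2), r_noise=1e-6)
    corrected = z - M @ w_hat                   # background-subtracted spectrum
    print(w_hat, corrected.max())
```

Because the background-only wavelengths on either side of the band dominate the estimate, the inflated R values inside the band keep the analyte from biasing the background weights. Passing sensitivity spectra instead of abstract factors as H gives the two-component resolution case of eq 17.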
EXPERIMENTAL SECTION
Experimental Methods. The PAH compounds were of at least 98% purity and were obtained from Sigma Chemical Co. Standard mixtures were separated on reversed-phase C-18 TLC plates (Whatman) using an 80% methanol/water mobile phase. Sample aliquots of 1.00 μL were applied by using a Desaga PS01 TLC spotting device. Fluorescence spectra were obtained by use of a Farrand MK-2 spectrofluorometer equipped with a TLC scanning attachment. The instrument was controlled by a Compupro 8/16-D computer described in earlier work (11). Data collection programs, written in the C programming language, have also been described previously (13, 15). Fluorescence spectra were obtained at 1.0-mm intervals along a development lane on the TLC plate and stored on floppy disk in binary format. The excitation wavelength was 250 nm, the emission monochromator was scanned from 350 to 550 nm, and data were collected at 1-nm intervals. Twenty emission spectra were taken along each analyte zone.
Data Processing Programs. The factor analysis programs were written in the C language, using a subroutine obtained from Wiley Publishing Co. to perform the eigenanalysis step. Once the abstract spectral factors are calculated, the proper number of factors to be used in characterizing the background of an unknown analyte spectrum must be determined. One method of determining the number of primary factors is to plot the log of the eigenvalue vs the eigenvalue number; the slope of this plot approaches zero when the proper number of factors is reached. Another method is to calculate the standard deviation of the secondary eigenvector matrix according to the following formula:
$$(\mathrm{RSD})^{2} = \sum_{j=n+1}^{c} \frac{\lambda_j}{r\,(c-n)} \qquad (23)$$
where n is the number of primary factors, c is the total number of spectra, and r is the number of wavelengths. This value is calculated for each possible number of primary eigenvalues and compared to the magnitude of the random noise in the measurements. The correct number of factors is chosen when the residual standard deviation equals the random noise component in the data. This method therefore requires knowledge of the magnitude of the noise component for an accurate determination of the size of the factor space. Once the set of primary abstract factors has been determined, the simplex-optimized adaptive Kalman filter is used to calculate the weighting factors for each of the abstract background components contributing to any given fluorescence spectrum. The Kalman filter, the adaptive Kalman filter, and the simplex-optimized Kalman filter, all written in Pascal, have been described previously (13). Simulated spectra were generated by adding fluorescence emission spectra collected in solution to actual background measurements obtained from blank TLC plates at the same wavelengths. Spectra of mixtures were simulated by adding various fractions of the measured responses to the background spectra. The relative amounts of each component in the simulated spectra were quantified by applying the factor analysis-Kalman filter background subtraction method to each mixture and pure-component spectrum; the Kalman filter was then used to estimate the relative amounts of the original spectra added.
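Equation 23 and the noise-matching rule can be sketched as follows. This is a hedged illustration: it assumes the noise standard deviation is known, and it reads "equal to the noise" as "no larger than the noise times a tolerance factor"; the tolerance of 1.1 is an arbitrary assumption, not a value from the source, and both function names are hypothetical.

```python
import numpy as np


def residual_standard_deviation(D: np.ndarray) -> np.ndarray:
    """RSD of eq 23 for n = 1 .. c-1 primary factors; D is (r x c) with spectra as columns."""
    r, c = D.shape
    eigvals = np.linalg.eigvalsh(D.T @ D)[::-1]   # eigenvalues of Z, largest first
    eigvals = np.clip(eigvals, 0.0, None)         # remove tiny negative round-off values
    # eigvals[n:] are the secondary eigenvalues lambda_{n+1} .. lambda_c (1-based in eq 23)
    return np.array([np.sqrt(eigvals[n:].sum() / (r * (c - n))) for n in range(1, c)])


def choose_factor_count(D: np.ndarray, noise_sd: float, tolerance: float = 1.1) -> int:
    """Smallest n whose RSD is no larger than the (assumed known) noise level."""
    rsd = residual_standard_deviation(D)
    for n, value in enumerate(rsd, start=1):
        if value <= tolerance * noise_sd:
            return n
    return D.shape[1]  # no n matched: every factor is treated as primary
```

Plotting `np.log(eigvals)` against the eigenvalue index from the same decomposition gives the log-eigenvalue method mentioned above; both criteria use only the eigenvalues of Z.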
RESULTS AND DISCUSSION
Several procedures were employed to characterize the performance of the proposed method. Calibration curves for benzo[ghi]perylene were generated by using traditional and factor analysis-adaptive Kalman filtering (FA-AKF) background subtraction approaches. The ability of the adaptive filter to determine the correct weightings for the abstract background factors was investigated for computer-generated combinations of TLC background and PAH fluorescence spectra. The reliability of the results from this method was compared to techniques employing simple background subtraction. Abstract factors were generated initially from a data set containing the known background responses; subsequent experiments were carried out using abstract factors from a background data set which was assumed (but not known) to describe the background contributions. Finally, a mixture containing benzo[a]pyrene and perylene, which coelute, was analyzed, and the concentrations were determined after using the FA-AKF background subtraction procedure.
Several methods for determining the number of factors contributing to a given data set, as described in the Experimental Section, were evaluated. The results in many cases were ambiguous. Typically, two or three factors appeared to be necessary to describe the background variations completely. An example of the background contribution to a benz[a]anthracene spectrum, estimated using one and two factors, is shown in Figure 1. Spectrum a shows the response observed
Figure 1. (a) Emission spectrum of benz[a]anthracene observed on TLC plate; (b) background contribution estimated by using two abstract factors; (c) background contribution estimated by using one abstract factor. (Plot axes: signal vs wavelength, nm.)
Figure 3. Calibration spectra for benzo[ghi]perylene: (A) raw spectral data; (B) spectra after simple background subtraction; (C) spectra after FA-AKF background subtraction. (Plot axes: signal vs wavelength, nm.)
Table II. Results for Calibration for Benzo[ghi]perylene
method        slope^a (… M^-1)   intercept^a
FA-AKF        3.91 (0.17)        0.012 (0.030)
subtraction   4.05 (0.20)        0.004 (0.035)
raw data      4.05 (0.20)        0.376 (0.035)
^a Standard deviations are in parentheses.
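The slopes, intercepts, and standard deviations in Table II come from linear regression of the 420-nm intensity on concentration. The sketch below assumes ordinary unweighted least squares with the textbook standard-deviation formulas; the source does not state the regression details, and the function name `linear_calibration` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def linear_calibration(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float]:
    """Unweighted least-squares line y = b0 + b1 * x.

    Returns (slope, slope_sd, intercept, intercept_sd); the standard deviations
    use the residual variance with n - 2 degrees of freedom.
    """
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least three (x, y) pairs of equal length")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))                  # residual standard deviation
    slope_sd = s / math.sqrt(sxx)
    intercept_sd = s * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))
    return slope, slope_sd, intercept, intercept_sd
```

For example, the made-up points x = [0, 1, 2, 3], y = [0.1, 1.9, 4.1, 5.9] (not data from this study) give a slope of 1.96 and an intercept of 0.06. Comparing the residual variances of two such fits with an F test, as in the discussion of Table II, would additionally need F critical values, for example from `scipy.stats.f.ppf`.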
over the simple subtraction results. The calibration parameters for each of the three approaches were calculated by plotting the intensity of the fluorescence at 420 nm vs the concentration of the benzo[ghi]perylene standards. The linear regression results are shown in Table II. The relative errors in the slope and intercept estimates are smaller for the data fit using the FA-AKF approach, although the difference was not significant at the 90% confidence level (F test). This may be because the background is more reproducible at 420 nm than in the wavelength range 350-380 nm.
Simulated Mixtures. This technique was applied to the quantification of synthetic mixture spectra. For the first set of simulated spectra, the background component in the mixture was a member of the background data set used to calculate the abstract spectral factors. The results for three mixtures are listed in Table III, together with the results obtained for simple subtraction of a background selected at random. These results show that reliable quantification of the analyte concentrations can be obtained, provided that the background contribution to the mixture is adequately described in the calculation of the abstract factors.

Table III. Accuracy of Concentration Estimates-Known Background
                                        FA-AKF^a               simple subtraction
mixture  component           rel concn  concn    % error^b     concn    % error^b
1        pyrene              0.3333     0.3342    0.27         0.2799   -16
         benz[a]anthracene   0.3333     0.3193   -4.2          0.3799    14
2        pyrene              0.3333     0.3332   -0.01         0.3192   -4.2
         benzo[a]pyrene      0.3333     0.3303   -0.90         0.3861    16
3        pyrene              0.2500     0.2500    0.00         0.2040   -18
         benz[a]anthracene   0.2500     0.2500    0.00         0.2900    16
         benzo[a]pyrene      0.5000     0.5000    0.00         0.5057    1.1
^a Results for factor analysis-adaptive Kalman filter method, where the background contribution to the mixture response was a member of the data set used to generate the abstract factors. ^b Percent deviation from known concentration.

Table IV. Variance of Fits for Background Component^a
columns: background set 1^b (2 components, 3 components); background set 2^c (2 components, 3 components); rows: mixtures 1-6
background set 1: 3.15 × 10^-4, 1.40 × 10^…, 1.75 × 10^…, 6.05 × 10^…, 9.15 × 10^…, 5.03 × 10^…, 2.24 × 10^…, 1.18 × 10^…, 3.72 × 10^…, 9.89 × 10^…, 5.47 × 10^…, 1.31 × 10^…
background set 2, 2 components: 5.81 × 10^…, 3.24 × 10^…, 4.74 × 10^…, 4.32 × 10^…, 1.70 × 10^…, 2.16 × 10^…
background set 2, 3 components: 1.46 × 10^…, 1.21 × 10^…, 3.97 × 10^…, 2.80 × 10^…, 1.23 × 10^…, 2.78 × 10^…
^a Results for synthetic data set, where the background contribution was not known to be an element of the set used to form the abstract background spectra. Variances are the sum of the squares of the residuals from a comparison of the estimated background to the background used in generating the mixture spectra. Mixture compositions are given in Table V. ^b Background abstract spectra generated from background data set taken from various locations across the TLC plate. ^c Background abstract spectra generated from background data set taken from blank TLC lane.

Table V. Accuracy of Concentration Estimates-Unknown Background
columns: mixture; component; rel concn; FA-AKF^a concn, % error^b; simple subtraction concn, RSD^c, % error^b
BAP^d 0.200 0.201 0.50 -9.0 0.182 6.0 0.201 0.50 3.5 0.200 PER^e 0.207 1.9 2 BAP -1.0 0.200 0.198 7.5 0.215 4.2 -5.1 0.0949 17 PER 0.100 0.117 28 3 BAP 2.0 0.200 0.204 -1.5 0.197 13 0.050 100 -23 0.0326 0.0384 -35 PER 0.0976 4 BAP 0.100 0.0937 9.0 -2.4 -6.3 2.5 0.207 0.200 0.205 2.9 PER 3.5 -2.8 0.0486 17 0.0534 5 BAP 0.050 6.8 0.40 0.500 0.8 0.502 0.498 -0.40 PER -2.2 0.502 6 BAP 0.500 0.489 2.0 0.40 -8.0 -0.2 0.0460 0.050 0.0499 6.0 PER 1
^a Results for factor analysis-adaptive Kalman filter method, where the background contribution to the mixture response was not a member of the data set used to generate the abstract factors. ^b Percent deviation from known concentration. ^c Percent relative standard deviation for the average of the concentration estimates for four randomly selected background signals. ^d Benzo[a]pyrene. ^e Perylene.

the PAH concentrations. The combination technique was able to provide more accurate concentration estimates than simple subtraction of a background spectrum.

Table VI. Accuracy of Concentration Estimates-TLC Analysis
columns: compound; concn, M; simple subtraction concn, M, % error; FA-AKF concn, M, % error
benzo[a]pyrene   4.75 × 10^…   5.12 × 10^…
perylene         8.99 × 10^…   1.17 × 10^…
4.4   9.6   4.96 × 10^…   17   9.69 × 10^…   7.8

The assumption that the observed background variations can be described by the abstract factors may not always be valid. A second simulated data set was therefore constructed by adding together known combinations of background spectra and analyte responses; in this case, however, the background contribution was not an element of any of the background data sets employed. Several background data sets were compiled, and the corresponding eigenvectors were calculated. Background data sets generated from emission spectra taken immediately before and after the analyte zones failed to give reliable abstract factor representations. The best results were obtained from a background data set constructed from spectra measured at a variety of locations across the TLC plate, although in some cases abstract factors calculated from measurements made on a blank TLC lane provided an adequate description of the background variations. The reliability of each of these approaches was evaluated for the second simulated data set by comparing the actual background contribution directly with the background estimated by the factor analysis-adaptive Kalman filter procedure. The variances of the fits for several simulated mixtures, calculated from the sum of the squares of the differences between the actual and estimated backgrounds, are given in Table IV, together with a comparison of the fit qualities obtained by using two and three abstract background factors. In general, the use of a larger number of factors gave a lower variance of fit. The concentration estimates obtained for the second simulated data set are shown in Table V. Results for simple subtraction are also given, where the mean concentration and the corresponding relative standard deviation are shown for simple subtraction of four randomly selected background responses. In most cases, the factor analysis-adaptive Kalman filter approach yields improved concentration estimates, although some of the simple subtraction results demonstrate quite high accuracies. Both approaches yielded inaccurate results for a minor component, such as perylene in mixture 3. The precisions for the two approaches were not directly comparable in these studies. While the standard deviations and relative standard deviations were calculated for the simple subtraction results, the variances estimated by the Kalman filter were optimistically small (relative standard deviations