Anal. Chem. 1982, 54, 2344-2347
Factor Analysis for Real-Time Gas Chromatography/Fourier Transform Infrared Spectrometric Chromatogram Reconstructions
P. M. Owens,1 R. B. Lam,2 and T. L. Isenhour*
Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27514
A GC/FTIR chromatogram reconstruction procedure which factor-analyzes interferometric data is presented. An improved Gram-Schmidt reconstruction technique that uses the modified Gram-Schmidt algorithm to optimize basis vector selection is described. Results of a comparison between the factor analysis method and the Gram-Schmidt reconstruction methods are reported. The factor analysis reconstruction method exhibits a chromatographic sensitivity equivalent to optimum Gram-Schmidt reconstruction sensitivity with a significant reduction in computation time. The computational efficiency of the factor analysis reconstruction method allows its use for real-time GC/FTIR chromatogram reconstructions; this makes possible a significant sensitivity improvement over current real-time reconstruction procedures.
During the past decade, gas chromatography/Fourier transform infrared spectrometry (GC/FTIR) has become recognized as a powerful technique for complex mixture analysis. A typical GC/FTIR experiment results in the collection of several hundred interferograms which must be analyzed to determine which were produced by the GC effluents. This necessitates reconstruction of the gas chromatogram by one of two typical methods. The first reconstruction method was originally proposed by de Haseth and Isenhour (1) and involves use of the Gram-Schmidt orthogonalization procedure to calculate an orthogonal basis set of reference interferograms in which no effluents are present. The chromatogram reconstruction intensity is the orthogonal distance of each collected interferogram from the background basis set. The second reconstruction method requires calculating the absorbance spectrum of each interferogram and using integrated absorbances over a specific frequency range as the GC metric.

A comparison of these two GC/FTIR reconstruction methods can best be done by evaluating chromatographic sensitivity and computational efficiency. Several comparisons of GC/FTIR reconstruction methods have been reported (2-4). In all cases, Gram-Schmidt reconstructions were superior in terms of higher chromatographic signal to noise ratios. However, there still appears to be some disagreement concerning optimum interferogram segment selection for Gram-Schmidt reconstructions.

From a computational viewpoint, the GC/FTIR data collection rate of one interferogram or more per second requires efficient methods for computing real-time reconstructions. The development of microprocessors capable of calculating a 4096 point fast Fourier transform (FFT) in 0.2 s has made possible real-time integrated absorbance reconstructions. On the other hand, the Gram-Schmidt calculation is a lengthy one, especially when utilizing the 30 basis vectors required for optimum sensitivity (2). Real-time Gram-Schmidt reconstructions are only possible with current FTIR computer hardware by decreasing the number of basis vectors to less than 10. Using fewer basis vectors lowers chromatographic sensitivity and largely offsets the advantage of the Gram-Schmidt reconstruction method. Optimization of basis vector selection has recently been shown to improve Gram-Schmidt chromatographic sensitivity when only a few basis vectors are employed (5). However, it appears that optimum Gram-Schmidt sensitivity (and thus the lowest detection limit) is possible only for off-line reconstructions.

This paper presents an improved GC/FTIR reconstruction method in which background interferograms are factor-analyzed to form an orthonormal basis set. As in the Gram-Schmidt method, reconstruction values are computed as the orthogonal distances of each interferogram from this background basis set. The factor analysis reconstruction provides the sensitivity advantage of optimum Gram-Schmidt reconstructions while requiring fewer basis vectors; thus the factor analysis reconstruction is computationally more efficient. A comparison of factor analysis reconstructions with Gram-Schmidt reconstructions and with modified Gram-Schmidt reconstructions employing optimum basis vector selection is also reported.

1 Present address: Department of Chemistry, USMA, West Point, NY 10996.
2 Present address: Foxboro Analytical, 140 Water St., POB 5449, Norwalk, CT 06856.
THEORY Gram-Schmidt Reconstruction. A complete discussion of the Gram-Schmidt reconstruction method has been previously reported (1, 2). In this technique a basis set of interferograms (30 for best results) is collected prior to sample injection. From each of these reference interferograms a 100-point segment beginning 60 points past the light burst is selected. Each 100-point segment is treated as a vector. With the Gram-Schmidt orthogonalization procedure, an orthonormal basis is calculated from the initial set of reference vectors. Reconstruction calculations for each interferogram are then performed by using the same 100-point interferogram section. The Gram-Schmidt reconstruction intensity is the orthogonal distance of each sample vector from the basis set. This distance (eq 1) is calculated by first determining the length of the sample vector’s projection onto the basis set and then performing a Euclidean distance calculation to obtain the GC intensity
$$\text{GC intensity} = \left[(\mathbf{I}\cdot\mathbf{I}) - (\mathbf{I}\cdot\mathbf{B}_1)^2 - (\mathbf{I}\cdot\mathbf{B}_2)^2 - \cdots\right]^{1/2} \qquad (1)$$
where $\mathbf{I}$ is the 100-point sample interferogram segment and $\mathbf{B}_i$ is the ith orthonormal basis vector.

Modified Gram-Schmidt Reconstruction. de Haseth (5) has recently proposed a Gram-Schmidt vector selection procedure in which a new basis vector is rejected if its orthogonal distance from the existing basis set is too small. An extension of this process is possible with the modified Gram-Schmidt orthogonalization procedure (6) and a pivoting strategy. The modified Gram-Schmidt algorithm (eq 2) is a rearrangement of the Gram-Schmidt calculation
0003-2700/82/0354-2344$01.25/00 1982 American Chemical Society
ANALYTICAL CHEMISTRY, VOL. 54, NO. 13, NOVEMBER 1982
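$$\mathbf{V}_k \leftarrow \mathbf{V}_k - (\mathbf{V}_k\cdot\mathbf{B}_i)\,\mathbf{B}_i, \qquad k = i+1,\ i+2,\ \ldots,\ N \qquad (2)$$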
where $\mathbf{V}_k$ is a subsequent reference vector, $\mathbf{B}_i$ is the newly calculated basis vector, and N is the total number of reference vectors. As each basis vector ($\mathbf{B}_i$) is calculated, the modified Gram-Schmidt algorithm subtracts, from each subsequent reference vector, its projection onto the new basis vector. A pivoting strategy selects the largest remaining reference vector as the next basis vector. In this way, a larger amount of data variance can be represented with fewer basis vectors.

Factor Analysis Reconstruction. A comprehensive treatment concerning the theory and chemical applicability of factor analysis has recently been published (7). Factor analysis is most commonly used to reduce a set of experimental data into a number of underlying significant factors, a linear combination of which will regenerate the original data. For example, in a GC/FTIR experiment, factor analysis can be applied to the set of infrared absorbance spectra corresponding to an unresolved GC peak, since the measured spectra are linear combinations of the individual compound absorbance spectra. In this case, factor analysis provides an indication of the number of factors (compounds) contributing to the unresolved peak.

Principal component analysis (PCA) is a factor analysis procedure which produces a data set's significant underlying factors in decreasing order of importance. The PCA method requires an eigenanalysis of the data set's covariance matrix, with the eigenvectors representing the subspace generated by the underlying factors and the eigenvalues indicating the relative importance of each eigenvector. The covariance matrix is obtained by either premultiplying or postmultiplying the data matrix by its transpose. Each successive eigenvector (ordered by the eigenvalues) accounts for a maximum of the variance in the original data set. For a two-variable set of data, a PCA's first eigenvector is the same line as that obtained from a linear least-squares fit of the data.
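The basis selection of eq 2 and the orthogonal distance of eq 1 can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' Fortran and assembly implementation: the function names `mgs_pivoted_basis` and `gc_intensity`, the tolerance `tol`, and the synthetic reference data are all hypothetical.

```python
import numpy as np

def mgs_pivoted_basis(refs, n_basis, tol=1e-12):
    """Modified Gram-Schmidt with pivoting (hypothetical helper).

    refs: array whose rows are reference vectors (e.g. N x 100).
    Returns up to n_basis orthonormal row vectors.
    """
    V = np.array(refs, dtype=float)          # working copy of the reference vectors
    basis = []
    for _ in range(min(n_basis, V.shape[0])):
        norms = np.linalg.norm(V, axis=1)
        k = int(np.argmax(norms))            # pivot: largest remaining reference vector
        if norms[k] <= tol:                  # nothing left to span (tol is an assumption)
            break
        b = V[k] / norms[k]                  # new orthonormal basis vector B_i
        basis.append(b)
        V -= np.outer(V @ b, b)              # eq 2: subtract each vector's projection onto B_i
    return np.array(basis)

def gc_intensity(segment, basis):
    """Eq 1: orthogonal distance of a segment from the span of an orthonormal basis."""
    proj = basis @ segment                   # the dot products I.B_i
    return float(np.sqrt(max(segment @ segment - proj @ proj, 0.0)))

# Synthetic stand-in for 30 reference segments of 100 points (not real interferograms).
rng = np.random.default_rng(0)
patterns = rng.normal(size=(3, 100))
refs = rng.normal(size=(30, 3)) @ patterns + 0.01 * rng.normal(size=(30, 100))
B = mgs_pivoted_basis(refs, n_basis=5)

in_span = 2.0 * B[0] + 3.0 * B[1]            # lies entirely inside the basis subspace
u = rng.normal(size=100)
u -= B.T @ (B @ u)                           # keep only the part orthogonal to the basis
u /= np.linalg.norm(u)

print(gc_intensity(in_span, B))              # close to 0
print(gc_intensity(in_span + 4.0 * u, B))    # close to 4
```

Because the projection onto each new basis vector is removed from every remaining reference vector at once, the row norms of `V` always measure what the current basis has not yet described, which is what makes the pivoting choice a single `argmax`.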
For GC/FTIR chromatogram reconstructions, the Gram-Schmidt basis vectors describe a reference interferogram subspace as a linear combination of basis vectors. Factor analysis can be used as an alternative technique to describe this reference subspace. Factor analyzing a set of reference interferograms using the PCA method yields, for any desired number of basis vectors, the statistically best possible set of factors (eigenvectors) describing the reference interferogram subspace. Since calculation of each eigenvector utilizes information from the entire set of reference interferograms, only a few eigenvectors are needed to describe the reference subspace. However, in Gram-Schmidt reconstructions, each basis vector represents only one interferogram and a separate basis vector is required for each reference interferogram. Thus, the reference interferogram subspace can be better described using fewer basis vectors when the original set of reference interferograms is factor analyzed. Additionally, the calculated eigenvectors are orthonormal and they can simply replace the Gram-Schmidt basis vectors in computing GC/FTIR chromatogram reconstruction intensities. Once the eigenvectors have been calculated, the factor analysis reconstruction procedure is identical with the Gram-Schmidt reconstruction method (eq 1). Thus, only minor modifications to currently available Gram-Schmidt routines are required for implementation of factor analysis reconstruction.

EXPERIMENTAL SECTION

Instrumentation. Several sets of interferograms from two instruments were used in all analyses. Their compositions are listed in Table I. Data set 01 was obtained on magnetic tape from L. V. Azarraga of the Environmental Protection Agency, Athens, GA. These data were collected with a Digilab FTS-14 infrared spectrometer through a light pipe with an inner diameter of 2 mm and a length of 30 cm (0.94 mL internal volume) using a mercury cadmium telluride (MCT) detector. The remaining data were collected with a GC/FTIR system consisting of a Hewlett-Packard 402B gas chromatograph interfaced to a Digilab FTS-14 infrared spectrometer. Data were collected on magnetic tape. A gold-coated light pipe of inner diameter 1.5 mm and length of 54 cm (0.95 mL internal volume) and an MCT detector were used. Programs were written in Fortran and assembly language and all computations were performed with a 32K Nova 3/12 minicomputer.

Table I. Mixture Compositions of Data Sets

data set   peak no.   compound                    amt injected, μg
01         1          bis(2-chloroethyl) ether    0.4
01         2          acetophenone                0.4
01         3          methyl salicylate           0.4
01         4          2,3,5-trimethylphenol       0.4
01         5          acenaphthene                0.4
01         6          2,4,6-trimethylphenol       0.4
02         1          pentyl propionate           0.9
02         2          pentyl propionate           1.6
02         3          pentyl propionate           4.6
02         4          pentyl propionate           9.2
02         5          pentyl propionate           18.4
02         6          pentyl propionate           46.1
02         7          pentyl propionate           92.1

Procedure. Factor Analysis Calculations. The data matrix to be factor analyzed was obtained from a set of background interferograms. Each row consisted of 100 points of an interferogram beginning at a displacement of 60 points past the light burst. The different rows of the data matrix corresponded to the different reference interferograms. The covariance matrix was calculated by postmultiplying the data matrix by its transpose
$$\mathbf{C} = \mathbf{D}\mathbf{D}^{T} \qquad (3)$$
where $\mathbf{C}$ is the covariance matrix of dimension N by N, $\mathbf{D}$ is the data matrix with N rows of 100 columns, and N is the number of reference interferograms used to form the data matrix. Eigenanalysis of the covariance matrix was then performed utilizing the TRED2 and IMTQL2 Fortran subroutines of the EISPACK program package (8). The TRED2 subroutine utilizes the Householder method to reduce the covariance matrix to a tridiagonal form. The IMTQL2 subroutine uses an implicit QL algorithm to calculate the eigenvalues and eigenvectors of the tridiagonal matrix. Program listings are available in ref 8. The final desired basis vectors were obtained by postmultiplying the transpose of the computed eigenvector matrix by the data matrix
$$\mathbf{B} = \mathbf{E}^{T}\mathbf{D} \qquad (4)$$
where $\mathbf{E}$ is the column eigenvector matrix of dimension N by N, $\mathbf{D}$ is the data matrix N by 100, $\mathbf{B}$ is the row basis vector matrix N by 100, and N is the number of reference interferograms used to form the data matrix. This step was necessary since 100-point segments from each interferogram were used in reconstruction calculations, thus requiring 100 dimensional basis vectors. An alternative procedure would have been to form a 100 by 100 covariance matrix in eq 3 by premultiplying the data matrix by its transpose. An eigenanalysis of this larger covariance matrix would have yielded the N by 100 basis vector matrix ($\mathbf{B}$) obtained in eq 4. However, the excessive computation time required for the higher order matrix eigenanalysis made this alternative unsatisfactory. Additionally, the multiplications in eq 4 were necessary only for the small number of eigenvectors needed to describe the reference interferogram subspace. If all of the eigenvectors were kept, the factor analysis reconstruction procedure would be identical with the Gram-Schmidt method. However, as the eigenvectors were generated in decreasing order of importance, the less significant eigenvectors described only noise present in the experimental data, while the first several eigenvectors described the background interferogram subspace. Thus, with the same set of reference interferograms, fewer eigenvectors than Gram-Schmidt basis vectors were required for GC/FTIR chromatogram reconstructions.

Signal to Noise Ratios. In order to make a valid comparison among the reconstruction techniques, signal to noise ratios for all chromatographic peaks were calculated with eq 5 (5), where $I_p$ is the GC intensity of the effluent peak, $N_p$ is the number of peak points, $I_b$ is the intensity of the base line, and $N_b$ is the number of base line points used. The GC intensities used in eq 5 were calculated in the following manner. A linear least-squares fit to the base line across each peak was determined and subtracted from the original reconstruction values to yield the GC intensities. This procedure removed base line slope from the signal to noise calculations and forced base line intensities to oscillate about zero. Signal to noise calculations used the two or three most intense points for peak intensities and approximately 40 points to describe the base line adjacent to each peak. For a specific peak, the same interferograms were used in calculating signal to noise ratios for all reconstruction methods.

Figure 1. Chromatogram reconstructions of data set 01: (A) 5 basis vector Gram-Schmidt reconstruction, (B) 5 basis vector modified Gram-Schmidt reconstruction, (C) 5 eigenvector factor analysis reconstruction, and (D) 30 basis vector Gram-Schmidt reconstruction. Peak at interferogram 125 is noise.

Table II. Comparison of Chromatographic Signal to Noise Ratios for the Different Reconstruction Methods

                      Gram-Schmidt,      modified Gram-Schmidt,   factor analysis,   factor analysis,
data set   peak no.   30 basis vectors   5 basis vectors          5 eigenvectors     10 eigenvectors
01         1          21.6               13.3                     17.4               27.5
01         2          66.9               28.4                     51.8               62.5
01         3          35.9               25.1                     33.2               37.0
01         4          17.1               15.4                     23.4               20.6
01         5          6.3                4.2                      7.3                7.1
01         6          13.1               11.5                     16.5               16.8
02         1          9.5                2.2                      6.7                8.0
02         2          15.7               4.6                      11.7               20.8
02         3          54.7               13.2                     40.4               60.9
02         4          119.9              57.2                     93.8               98.9
02         5          209.2              78.6                     163.4              210.4
02         6          432.4              174.0                    350.1              466.0
02         7          721.2              340.9                    557.1              661.7
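The factor analysis basis calculation of eqs 3 and 4, followed by the reconstruction of eq 1, can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the procedure as run on the Nova 3/12: `numpy.linalg.eigh` stands in for the EISPACK TRED2 and IMTQL2 routines, the function names and the synthetic data are hypothetical, and the explicit normalization step is added here because each row of $\mathbf{E}^{T}\mathbf{D}$ has a length equal to the square root of its eigenvalue.

```python
import numpy as np

def factor_analysis_basis(D, n_vectors):
    """Orthonormal basis from the leading eigenvectors of D D^T (hypothetical helper).

    D: N x 100 matrix whose rows are reference interferogram segments.
    """
    D = np.asarray(D, dtype=float)
    C = D @ D.T                                    # eq 3: N x N matrix
    evals, E = np.linalg.eigh(C)                   # symmetric eigenanalysis, ascending eigenvalues
    keep = np.argsort(evals)[::-1][:n_vectors]     # indices of the largest eigenvalues
    B = E[:, keep].T @ D                           # eq 4, only for the retained eigenvectors
    B /= np.linalg.norm(B, axis=1, keepdims=True)  # assumed step: scale each row to unit length
    return B

def gc_intensity(segment, basis):
    """Eq 1: orthogonal distance of a segment from the span of an orthonormal basis."""
    proj = basis @ segment
    return float(np.sqrt(max(segment @ segment - proj @ proj, 0.0)))

# Synthetic background: 30 reference segments built from 3 patterns plus noise.
rng = np.random.default_rng(1)
t = np.arange(100)
patterns = np.array([np.sin(2 * np.pi * f * t / 100) for f in (1, 2, 3)])
D = rng.normal(size=(30, 3)) @ patterns + 0.05 * rng.normal(size=(30, 100))
B = factor_analysis_basis(D, n_vectors=5)

# A new background-like segment, and the same segment with an added "effluent" pattern.
background = rng.normal(size=3) @ patterns + 0.05 * rng.normal(size=100)
effluent = np.sin(2 * np.pi * 7 * t / 100)

print(gc_intensity(background, B))             # small: mostly residual noise
print(gc_intensity(background + effluent, B))  # much larger: signal outside the background subspace
```

Working with the N by N matrix keeps the eigenanalysis small (30 by 30 here instead of 100 by 100), and eq 4 is evaluated only for the retained eigenvectors, which mirrors the cost argument made in the text.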
RESULTS AND DISCUSSION

A comparison between the factor analysis reconstruction method and Gram-Schmidt reconstruction procedures was carried out by analyzing chromatographic sensitivity and computational efficiency. Figure 1 depicts reconstructions of data set 01 using each of the reconstruction procedures. Data set 01 was obtained from an injection containing 400 ng each of six different compounds (Table I). The modified Gram-Schmidt reconstruction (Figure 1B) using 5 basis vectors exhibits greater chromatographic sensitivity than the accepted Gram-Schmidt 5 basis vector reconstruction (Figure 1A). A 5 eigenvector factor analysis reconstruction (Figure 1C) results in a much better reconstruction than the previous two and is comparable to a 30 basis vector Gram-Schmidt calculation (Figure 1D).

Table II lists chromatographic signal to noise ratios obtained from the different reconstruction methods for data set 01 and data set 02. For data set 01 the 5 eigenvector reconstruction shows signal to noise ratios equivalent to those from a 30 basis vector Gram-Schmidt calculation, while the 10 eigenvector reconstruction exhibits generally greater chromatographic sensitivity. Data set 02 was obtained from a series of pentyl propionate injections with different concentrations. This allowed a comparison of the reconstruction techniques for the concentration range normally encountered in GC/FTIR analyses. In data set 02 the 10 eigenvector reconstruction was equivalent to a 30 basis vector Gram-Schmidt reconstruction. For both data sets, the 5 eigenvector reconstruction always exhibited far greater sensitivity than did the 5 basis vector modified Gram-Schmidt procedure.

Computational efficiencies among the reconstruction methods were also compared. Table III compares the computational requirements and execution times for basis set calculations using different numbers of reference interferograms. Gram-Schmidt execution times were obtained by using Digilab GC/FTIR reconstruction software while factor analysis calculations utilized the EISPACK algorithms described previously. Factor analysis basis set computations typically required 20-50% longer than the corresponding Gram-Schmidt calculation. In view of the number of multiplications required by each algorithm, these execution times were reasonable.

Table III. Operations Count and Execution Time Comparisons of Basis Set Calculations

no. of interferograms        multiplications required           Nova 3/12 execution times, s
used in basis calculation    factor analysis   Gram-Schmidt     factor analysis   Gram-Schmidt
10                           17490             11000            23                17
15                           33735             24000            40                33
20                           56980             42000            50 
25                           88725             65000            90                65
30                           130470            93000            130               90
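The figures in Table III can be checked with a few lines of arithmetic. The sketch below computes how much longer the factor analysis basis calculation ran than the Gram-Schmidt one, and tests one possible accounting of the Gram-Schmidt multiplication count. That accounting (N(N-1)/2 dot products, N(N-1)/2 projection subtractions, N norm calculations, and N normalizations, each costing 100 multiplications) is an assumption made here; it reproduces the tabulated values but is not stated in the text.

```python
# Values transcribed from Table III.
n_refs = [10, 15, 20, 25, 30]
gs_mults = [11000, 24000, 42000, 65000, 93000]
fa_time_s = [23, 40, 65, 90, 130]
gs_time_s = [17, 33, 50, 65, 90]

def gs_multiplications(n, m=100):
    """Assumed count for orthonormalizing n vectors of m points each."""
    dot_products = n * (n - 1) // 2 * m      # one dot product per pair of vectors
    subtractions = n * (n - 1) // 2 * m      # scaling the basis vector before subtracting it
    norms = n * m                            # squared length of each vector
    normalizations = n * m                   # scaling each vector to unit length
    return dot_products + subtractions + norms + normalizations

# Extra time needed by factor analysis, in percent of the Gram-Schmidt time.
extra_percent = [round(100 * (fa / gs - 1)) for fa, gs in zip(fa_time_s, gs_time_s)]

print([gs_multiplications(n) for n in n_refs])
print(extra_percent)
```

Under this accounting the Gram-Schmidt count reduces to 100 N (N + 1), and the computed time differences fall inside the 20-50% range quoted in the text.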
However, since basis set calculations are normally accomplished prior to sample injections, Table III only shows that factor analyzing a set of interferograms does not require an excessive amount of time. The best metric to measure the computational efficiency of a reconstruction algorithm is to determine the number of operations required for a single interferogram reconstruction after the basis set has been formed. Table IV lists the number of required operations for Gram-Schmidt reconstructions with different numbers of basis vectors. As the number of reference interferograms increases, the Gram-Schmidt reconstruction requires an excessive number of calculations since a separate basis vector is required for each reference interferogram. However, since the factor analysis reconstruction procedure utilizes information from the entire set of reference interferograms to calculate each eigenvector, only 5 to 10 eigenvectors (calculated from a set of 30 reference interferograms) are required for reconstructions. A 5 eigenvector reconstruction requires the same number of operations as a 5 basis vector Gram-Schmidt reconstruction. For optimum Gram-Schmidt sensitivity (using 30 basis vectors), the Gram-Schmidt reconstruction requires 5 times more operations than does a 5 eigenvector reconstruction. Compared to a 10 eigenvector reconstruction, the Gram-Schmidt calculation is still 3 times longer.

Table IV. Arithmetic Operations Required for Calculating the Gram-Schmidt Reconstruction Intensity of a Single Interferogram Using Different Numbers of Basis Vectors

no. of interferograms        multiplications   additions
used in basis calculation    required          required
5                            606 (a)           599 (a)
10                           1111 (b)          1099 (b)
15                           1616              1599
20                           2121              2099
25                           2626              2599
30                           3131              3099

(a) Number of operations required for a 5 eigenvector reconstruction. (b) Number of operations required for a 10 eigenvector reconstruction.

The factor analysis chromatogram reconstruction method matches the best chromatographic sensitivity of the Gram-Schmidt reconstruction procedure while allowing a 3- to 5-fold decrease in reconstruction computation time. This method's computational efficiency enhances its utility as a real-time chromatographic reconstruction procedure. Factor analysis chromatographic sensitivity represents a significant sensitivity improvement over present real-time reconstruction procedures. Research is currently being directed toward optimization of the factor analysis technique for different instruments.
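The 3- to 5-fold figure follows directly from the operation counts in Table IV. The sketch below rebuilds those counts from eq 1 for a 100-point segment and forms the ratios. The accounting is an assumption made for illustration: in particular, counting the final square root as one multiplication is a guess that happens to reproduce the tabulated values.

```python
# Values transcribed from Table IV: basis vectors -> (multiplications, additions).
table_iv = {5: (606, 599), 10: (1111, 1099), 15: (1616, 1599),
            20: (2121, 2099), 25: (2626, 2599), 30: (3131, 3099)}

def reconstruction_ops(n_vectors, m=100):
    """Assumed operation count for eq 1 with n_vectors basis vectors of m points."""
    mults = m                      # I.I
    mults += n_vectors * m         # one dot product I.B_i per basis vector
    mults += n_vectors             # squaring each dot product
    mults += 1                     # square root, counted as one multiplication (assumption)
    adds = m - 1                   # summing the terms of I.I
    adds += n_vectors * (m - 1)    # summing the terms of each I.B_i
    adds += n_vectors              # subtracting each squared projection
    return mults, adds

gs30_mults = table_iv[30][0]
ratio_vs_5 = gs30_mults / table_iv[5][0]      # 30 basis vectors vs 5 eigenvectors
ratio_vs_10 = gs30_mults / table_iv[10][0]    # 30 basis vectors vs 10 eigenvectors

print(round(ratio_vs_5, 2), round(ratio_vs_10, 2))
```

The cost of a single reconstruction grows linearly with the number of basis vectors, so cutting the basis from 30 vectors to 5 or 10 gives ratios of about 5.2 and 2.8, in line with the factors of 5 and 3 quoted above.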
ACKNOWLEDGMENT The authors wish to thank Leo V. Azarraga of the Environmental Protection Agency, Athens, GA, and Dan T. Sparks of the University of North Carolina, Chapel Hill, NC, for supplying the GC/FTIR data.
LITERATURE CITED
(1) de Haseth, J. A.; Isenhour, T. L. Anal. Chem. 1977, 49, 1977-1981.
(2) Hanna, D. A.; Hangac, G.; Hohne, B. A.; Small, G. W.; Wieboldt, R. C.; Isenhour, T. L. J. Chromatogr. Sci. 1979, 17, 423-427.
(3) White, R. L.; Giss, G. N.; Brissey, G. M.; Wilkins, C. L. Anal. Chem. 1981, 53, 1778-1782.
(4) Sparks, D. T.; Lam, R. B.; Isenhour, T. L. "Quantitative GC/FTIR Using Integrated Gram-Schmidt Intensities"; presented at the 1982 Pittsburgh Conference and Exposition on Analytical Chemistry and Applied Spectroscopy, March 1982.
(5) de Haseth, J. A.; Leclerc, D. F. "Interferogram Data Processing in FTIR"; presented at the 1982 Pittsburgh Conference and Exposition on Analytical Chemistry and Applied Spectroscopy, March 1982.
(6) Rice, J. R. Math. Comp. 1966, 20, 325-328.
(7) Malinowski, E. R.; Howery, D. G. "Factor Analysis in Chemistry"; Wiley: New York, 1980.
(8) Smith, B. T.; Boyle, J. M.; Garbow, B. S.; Ikebe, Y.; Klema, V. C.; Moler, C. B. "Matrix Eigensystem Routines-EISPACK Guide"; Lecture Notes in Computer Science; Springer-Verlag: West Berlin, 1974.
RECEIVED for review April 28, 1982. Accepted August 9, 1982. This work was supported by National Science Foundation Grant No. CHE 8026747 and by the Department of the Army.
Anal. Chem. 1982, 54, 2347-2351
Robotic Sample Preparation Station
Grover D. Owens* and Rodney J. Eckstein
Miami Valley Laboratories, The Procter and Gamble Company, Cincinnati, Ohio 45247
A prototype robotic sample preparation station has been constructed from a microcomputer, a mechanical arm, a pipet/buret, a solvent dispenser, a pH meter, and an electronic balance. This system can perform automated weighing of solids and liquids, pH determinations, dilutions, and dissolutions. These simple capabilities have been combined to allow the total automation of a pH titration from calibration of the pH meter and preparation of a standard solution and titrant to selection of the end point and final report generation.
Recent advances in microelectronics (1) have ushered in a new age of productivity in analytical chemistry. Automated instrumentation and powerful microcomputers are commonplace in the modern laboratory. These devices are in the process of revolutionizing the way analytical data are acquired and processed (2). However, an important link is still missing in the chain of the totally automated laboratory. Automation, to many, means only the computerization of data handling and acquisition and does not include tasks such as the automatic movement of samples from point to point around the laboratory, automatic dilution, and automatic weighing (3). Some of these functions can be performed by continuous or segmented flow analyzers (4). However, these devices are not able to perform certain common laboratory functions such as weighing of powdered solids. Furthermore, these devices often require that procedures be modified to meet the needs of the analyzer. Methods are not performed in the same fashion as a human analyst would perform them. The missing link between the power of the microcomputer and the mechanical world has recently been forged with the development of sophisticated, flexible mechanical arms (5). These robots have received little exposure in the analytical