
Anal. Chem. 2003, 75, 4499-4507

Band-Target Entropy Minimization. A Robust Algorithm for Pure Component Spectral Recovery. Application to Complex Randomized Mixtures of Six Components

Effendi Widjaja, Chuanzhao Li, Wee Chew, and Marc Garland*

Department of Chemical and Environmental Engineering, 4 Engineering Drive 4, National University of Singapore, Singapore 119260

A newly developed self-modeling curve resolution method, band-target entropy minimization (BTEM), is described. This method starts with the data decomposition of a set of spectroscopic mixture data using singular value decomposition. It is followed by the transformation of the orthonormal basis vectors/loading vectors into individual pure component spectra one at a time. The transformation is based in part on some seminal ideas borrowed from information entropy theory with the desire to maximize the simplicity of the recovered pure component spectrum. Thus, the proper estimate is obtained via minimization of the proposed information entropy function or via minimization of derivative and area of the spectral estimate. Nonnegativity constraints are also imposed on the recovered pure component spectral estimate and its corresponding concentrations. As its name suggests, in this method, one targets a spectral feature readily observed in loading vectors to retain, and then combinations of the loading vectors are searched to achieve the global minimum value of an appropriate objective function. The major advantage of this method is its one spectrum at a time approach and its capability of recovering minor components having low spectroscopic signals. To illustrate the application of BTEM, spectral resolution was performed on FT-IR measurements of very highly overlapping mixture spectra containing six organic species with a two-component background interference (air). BTEM estimates were also compared with the estimates obtained using other self-modeling curve resolution techniques, i.e., SIMPLISMA, IPCA, OPA-ALS, and SIMPLISMA-ALS.

A number of self-modeling curve resolution (SMCR) approaches have been developed and widely used in the past three decades. The main purpose of SMCR is to reconstruct the pure component spectra of the observable components from spectroscopic measurements of multicomponent mixtures alone.
As the name suggests, SMCR does not require any spectral libraries or (in the ideal case) any other a priori information, although many SMCR techniques do require a statistical estimate of the number of observable species present. The primary premise is the validity of a bilinear model underlying the spectroscopic data.

SMCR development started with the elucidation of a two-component mixture system by Lawton and Sylvestre.¹ This method was based on principal component analysis and nonnegativity constraints on the spectral estimates and their corresponding concentrations. However, the approach was limited to a two-component system. Later, this method was extended by Ohta² to a three-component system using a Monte Carlo approach to search for the boundaries fulfilling the Lawton and Sylvestre criteria. Borgen and Kowalski³ and Borgen et al.⁴ also extended this SMCR technique to resolve a three-component system.

In the next phase, Kawata and co-workers⁵⁻⁷ used these nonnegativity constraints plus a minimization of information entropy, a concept developed by Shannon,⁸ to obtain rough but reasonably correct pure component spectral estimates. However, their investigation was also limited to two- or three-component systems. Although this method is in principle very general, in practice many problems arise. For example, when the number of contributing components increases, the dimension of the decision variables increases according to the square of the number of components. Hence, finding the true objective function minimum corresponding to the true pure component spectra is difficult, particularly if the number of components is larger than three.⁹ Other similarly spirited methods were being developed at about the same time, as can be seen in the work of Neal¹⁰ for resolving pure component spectra of fluorescence mixture data, Brown and Harper¹¹ for resolving mass spectral data, and Banerjee and Li¹² for resolving spectra using a spectral derivative minimization criterion. In addition, such a nonlinear constrained approach can also be seen in the spectral resolutions performed by Meister¹³ and Volkov.¹⁴ All of the above-mentioned methods can be classified as nonlinear constrained optimization techniques.

In the next development phase, many new resolution techniques were developed. These techniques were based on identification of pure variables, local rank maps, and iterative least squares. Pure variable approaches were seen in the KSFA,¹⁵ SIMPLISMA,¹⁶ and IPCA¹⁷ methods, whereas local rank map approaches were seen in EFA,¹⁸ WFA,¹⁹ HELP,²⁰ OPR,²¹ etc. Iterative least-squares methods could be seen in the ITTFA,²² MCR,²³ and PMF²⁴ techniques.

Although many methods have been proposed and implemented, some major difficulties in spectral resolution remain, in particular for (a) minor components having weak signals, (b) components having a high degree of spectral overlap, and (c) component spectra having spectral nonlinearities. The latter two problems are of particular concern because they are ubiquitous in the chemical sciences/molecular spectroscopies. Therefore, a significantly new approach to overcome these shortcomings is needed. Progress in this area opens a host of new opportunities (in previously intractable areas), such as meaningful in situ spectroscopic investigations of new and very complex reaction systems where no a priori information exists.

In the past few years, our group has reexamined the use of information entropy criteria for spectral reconstruction. For example, Zeng and Garland²⁵ revised Sasaki's method, using fourth-order derivatives instead of second-order spectral derivatives as part of the entropy function, to enhance resolution in systems with highly overlapping features. Better approximations were clearly seen.

* Corresponding author. Telephone: (65) 6874 6617. Facsimile: (65) 6779 1936; E-mail: [email protected].
10.1021/ac0263622 CCC: $25.00 © 2003 American Chemical Society. Published on Web 07/15/2003.
Analytical Chemistry, Vol. 75, No. 17, September 1, 2003 4499

(1) Lawton, L. H.; Sylvestre, E. A. Technometrics 1971, 13, 617-633.
(2) Ohta, N. Anal. Chem. 1973, 45, 553-557.
(3) Borgen, O. S.; Kowalski, B. R. Anal. Chim. Acta 1985, 174, 1-26.
(4) Borgen, O. S.; Davidsen, N.; Zhu, M. Y.; Oeyen, N. Mikrochim. Acta 1987, 2, 63-73.
(5) Sasaki, K.; Kawata, S.; Minami, S. Appl. Opt. 1983, 22, 3599-3603.
(6) Sasaki, K.; Kawata, S.; Minami, S. Appl. Opt. 1984, 23, 1955-1959.
(7) Kawata, S.; Sasaki, K.; Minami, S.; Komeda, H. Appl. Spectrosc. 1985, 39, 610-614.
(8) Shannon, C. E. Bell Syst. Tech. J. 1948, 27, 379-423.
(9) Larivee, R. J. Development of minimum entropy techniques for resolving multicomponent systems in chemistry. Ph.D. Thesis, University of Delaware, 1989.
(10) Neal, S. L.; Davidson, E. R.; Warner, I. M. Anal. Chem. 1990, 62, 658-664.
Later, Pan et al.²⁶ employed piecewise-continuous variance-weighted spectral measurements combined with entropy minimization to overcome the problem of spectral windows having significantly different variance, and subsequently, good pure component spectral estimates were obtained. However, Zeng and Pan could only resolve two- or three-component mixture systems. In the next phase, a combined approach of entropy minimization, simulated annealing, and spectral dissimilarity was applied to resolve seven pure component spectra having a high degree of spectral overlap using synthetic data²⁷ and six pure component spectra having a high degree of spectral overlap using real in situ reaction data.²⁸ The resolution results showed that although good first estimates could be obtained, some spectral overresolutions were clearly seen. The conclusion of these studies was that a one-spectrum-at-a-time resolution algorithm would be important to overcome the enormous computation needed as well as the problem of overresolution when multiple spectra are simultaneously reconstructed.

These studies have led to the development of a novel SMCR method, the band-target entropy minimization (BTEM) algorithm. BTEM has been successfully used to reconstruct the pure component spectra of reagents, products, and transient intermediates from FT-IR reaction data of organometallic and homogeneous catalytic reactions.²⁹⁻³² A particularly noteworthy application of BTEM is the identification of the long-sought and elusive metal carbonyl hydride HRh(CO)4, whose contribution to the total experimental signal intensity was only 0.14%.³³ These studies benefited from the fact that a very large number of spectra were taken (hundreds or even thousands) and that the composition space was rather flat: all of the components (with the exception of the solvent) were either highly or moderately dilute in all experiments. BTEM has recently been applied to real experimental reactive systems with more than 12 species. At this point, it is worth repeating that BTEM does not require any a priori information whatsoever: no libraries and no statistical estimates of the number of observable species present. In summary, entropy minimization techniques in general, and BTEM in particular, search for the simplest (irreducible) underlying patterns in the spectroscopic observations.

(11) Brown, S. D.; Harper, A. M. In Computer-Enhanced Analytical Spectroscopy; Wilkins, C. L., Ed.; Plenum Press: New York, 1993; Vol. 4, pp 135-163.
(12) Banerjee, S.; Li, D. Y. Appl. Spectrosc. 1991, 45, 1047-1049.
(13) Meister, A. Anal. Chim. Acta 1984, 161, 149-161.
(14) Volkov, V. V. Appl. Spectrosc. 1996, 50, 320-326.
(15) Malinowski, E. R. Anal. Chim. Acta 1982, 134, 129-137.
(16) Windig, W.; Guilment, J. Anal. Chem. 1991, 63, 1425-1432.
(17) Bu, D. S.; Brown, C. W. Appl. Spectrosc. 2000, 54, 1214-1221.
(18) Maeder, M. Anal. Chem. 1987, 59, 527-530.
(19) Liang, Y. Z.; Kvalheim, O. M. Anal. Chim. Acta 1994, 292, 5-15.
(20) Kvalheim, O. M.; Liang, Y. Z. Anal. Chem. 1992, 64, 936-946.
(21) Shen, H. L.; Manne, R.; Xu, Q. S.; Chen, D. Z.; Liang, Y. Z. Chemom. Intell. Lab. Syst. 1999, 45, 171-176.
(22) Gemperline, P. J. J. Chem. Inf. Comput. Sci. 1984, 24, 206-212.
(23) Tauler, R.; Kowalski, B. R.; Flemming, S. Anal. Chem. 1993, 65, 2040-2047.
(24) Xie, Y. L.; Hopke, P. K.; Paatero, P. J. Chemom. 1998, 12, 357-364.
(25) Zeng, Y.; Garland, M. Anal. Chim. Acta 1998, 359, 303-310.
(26) Pan, Y.; Susithra, L.; Garland, M. J. Chemom. 2000, 14, 63-77.
(27) Widjaja, E.; Garland, M. J. Comput. Chem. 2002, 23, 911-919.
The potential usefulness of entropy minimization for more general pattern recognition has been noted elsewhere by Watanabe.³⁴ Entropy minimization is associated with the principle of simplicity.³⁵ Since previous studies have relied on very large data sets with dilute components, in the present contribution, the BTEM algorithm is applied to a small data set in which none of the components can be considered dilute. Smaller data sets yield smaller sets of basis vectors, so less refined spectral estimates can be expected. Also, if none of the components are dilute, then substantial spectral changes (nonlinearities) might arise. Accordingly, the BTEM technique was applied to a randomized "solvent" mixture system. The performance of BTEM for pure component spectral recovery from complex FT-IR spectroscopy measurement data is presented and discussed.

(28) Chen, L.; Chew, W.; Garland, M. Appl. Spectrosc. 2003, 57, 491-498.
(29) Widjaja, E.; Li, C. Z.; Garland, M. Organometallics 2002, 21, 1991-1997.
(30) Chew, W.; Widjaja, E.; Garland, M. Organometallics 2002, 21, 1982-1990.
(31) Li, C. Z.; Widjaja, E.; Garland, M. J. Catal. 2003, 213, 126-134.
(32) Li, C. Z.; Widjaja, E.; Garland, M. J. Am. Chem. Soc. 2003, 125, 5540-5548.
(33) Li, C. Z.; Widjaja, E.; Chew, W.; Garland, M. Angew. Chem., Int. Ed. 2002, 41, 3785-3789.
(34) Watanabe, S. Pattern Recognit. 1981, 13, 381-387.
(35) Kapur, J. N. Maximum-Entropy Models in Science and Engineering; Wiley: New York, 1993.
(36) Garland, M.; Visser, E.; Terwiesch, P.; Rippin, D. W. T. Anal. Chim. Acta 1997, 351, 337-358.

THEORY

The BTEM algorithm is initiated by vector-space decomposition of the observed data into orthonormal basis vectors. These basis vectors are then inspected visually to identify significant spectral features of interest. Interactively, the user can select an interesting spectral feature to retain during spectral reconstruction. In brief, the BTEM algorithm forces the retention of the selected spectral feature and at the same time reconstructs the entire associated pure component spectrum based on an entropy minimization approach.

The details of the BTEM algorithm are as follows. First, a matrix of observation data, that is, absorbance data, is consolidated into $A_{k\times\nu}$, where $k$ denotes the number of mixture spectra and $\nu$ denotes the number of data channels associated with the spectroscopic measurements. Starting from the bilinear form of the Lambert-Beer-Bouguer law (LBBL), the absorbance matrix is considered as the product of a concentration matrix $C_{k\times s}$ (which incorporates the path length $l$) and an absorptivity matrix $a_{s\times\nu}$. Next, the absorbance data matrix $A_{k\times\nu}$ is decomposed by singular value decomposition (SVD) (eq 1), yielding the abstract orthonormal matrices $U_{k\times k}$ and $V^{T}_{\nu\times\nu}$ with the singular value matrix $\Sigma_{k\times\nu}$. Furthermore, $A_{k\times\nu}$ can be approximated by eq 2, where $s$ is the number of species recovered and $z$ is the number of right singular vectors used for spectral reconstruction. Note that $(T_{s\times z})^{-1}$ is the generalized inverse of a rectangular transformation matrix, $\hat{a}_{s\times\nu}$ is the matrix of averaged pure component expectations for $s$ species, $\hat{C}_{k\times s}$ is its corresponding expectation for concentration calculated from eq 3, and $\varepsilon_{k\times\nu}$ is a combination of experimental error and spectral nonlinearities. From the description of model error used, it is implicit that the LBBL is only locally valid,³⁶ and from the transformation used in eq 2 where $z > s$, the presence of spectral nonlinearities is explicitly highlighted. If the LBBL were globally valid, the system would be linear, and only $s$ vectors would contain sufficient meaningful chemical/spectroscopic information.

$$A_{k\times\nu} = C_{k\times s}\,a_{s\times\nu} + \varepsilon_{k\times\nu} = U_{k\times k}\,\Sigma_{k\times\nu}\,V^{T}_{\nu\times\nu} \tag{1}$$

$$A_{k\times\nu} \approx \hat{C}_{k\times s}\,\hat{a}_{s\times\nu} = U_{k\times s}\,\Sigma_{s\times z}\,(T_{s\times z})^{-1}\,T_{s\times z}\,V^{T}_{z\times\nu}, \qquad k \ge z \ge s \tag{2}$$

$$\hat{C}_{k\times s} = U_{k\times s}\,\Sigma_{s\times z}\,(T_{s\times z})^{-1} = A_{k\times\nu}\,\hat{a}^{T}_{\nu\times s}\,(\hat{a}_{s\times\nu}\,\hat{a}^{T}_{\nu\times s})^{-1}, \qquad z \ge s \tag{3}$$

The one-spectrum-at-a-time reconstruction via BTEM is performed by the sequential projection of $z$ right singular vectors onto one-dimensional subspaces. Thus, each single vector associated with each pure component spectral estimate is sequentially calculated. The expectation for each spectrum $\hat{a}_{1\times\nu}$ is then given by eq 4, with the corresponding expectation for concentration $\hat{C}_{k\times 1}$ given by eq 5. In other words, the essence of BTEM, as found in eq 4, is the optimal combination of a large number of basis vectors in such a way as to obtain a single new spectral estimate that is very simple in terms of information entropy and retains the targeted feature. The proper determination of the elements in the transformation matrix $T_{1\times z}$ achieves this goal.

$$\hat{a}_{1\times\nu} = T_{1\times z}\,V^{T}_{z\times\nu} \tag{4}$$

$$\hat{C}_{k\times 1} = A_{k\times\nu}\,\hat{a}^{T}_{\nu\times 1}\,(\hat{a}_{1\times\nu}\,\hat{a}^{T}_{\nu\times 1})^{-1} \tag{5}$$

The objective function for optimizing the elements of $T_{1\times z}$ in the BTEM algorithm is shown in eq 6. The first term on the right-hand side is the Shannon entropy function based on the first derivative of the absorbance estimate $\hat{a}_{1\times\nu}$ (eq 7). The second term is a penalty function (eq 8) for ensuring (i) nonnegativity of each pure component spectral estimate $\hat{a}_{1\times\nu}$ and of each estimated component concentration $\hat{C}_{k\times 1}$, and (ii) that a reasonable maximum spectral absorbance $\hat{a}^{\max}_{1\times\nu}$ is obtained. Together with the penalty function are three sets of associated scalar parameters: (i) $\gamma_a$, $\gamma_c$, and $\gamma_{\max}$ are penalty coefficients for the constraints defined by eqs 11, 12, and 13; (ii) $\lambda_1 = 10^{-3}$ and $\lambda_2 = 10^{-2}$ are bounds for the absorptivity constraint defined in eq 11; and (iii) $R$, defined in eq 13, sets the maximum absorbance of the resolved pure spectrum in relation to the target band peak absorbance $\hat{a}^{\text{target}}_{1\times\nu}$, which is normalized to be identically 1.0.

$$\min G = -\sum_{\nu} h_{\nu} \ln h_{\nu} + P(\hat{a}_{1\times\nu}, \hat{C}_{k\times 1}, \hat{a}^{\max}_{1\times\nu}) \tag{6}$$

where

$$h_{\nu} = \frac{\left| d\hat{a}_{\nu}/d\nu \right|}{\sum_{\nu} \left| d\hat{a}_{\nu}/d\nu \right|} \tag{7}$$

$$P(\hat{a}_{1\times\nu}, \hat{C}_{k\times 1}, \hat{a}^{\max}_{1\times\nu}) = \gamma_a F_1(\hat{a}_{\nu}) + \gamma_c F_2(\hat{C}_{k}) + \gamma_{\max} \tag{8}$$

$$F_1(\hat{a}_{\nu}) = \sum_{\nu} (\hat{a}_{\nu})^{2} \qquad \forall\, \hat{a}_{\nu} < 0 \tag{9}$$

$$F_2(\hat{C}_{k}) = \sum_{k} (\hat{C}_{k})^{2} \qquad \forall\, \hat{C}_{k} < 0 \tag{10}$$

$$\gamma_a = \begin{cases} 0 & F_1(\hat{a}_{\nu}) < \lambda_1 \\ 10 & \lambda_1 \le F_1(\hat{a}_{\nu}) < \lambda_2 \\ 10^{4} & F_1(\hat{a}_{\nu}) \ge \lambda_2 \end{cases} \tag{11}$$

$$\gamma_c = 10^{3} \qquad \forall\, F_2(\hat{C}_{k}) \tag{12}$$

$$\gamma_{\max} = \begin{cases} 10^{4} & \hat{a}^{\max}_{1\times\nu} > R \\ 0 & \hat{a}^{\max}_{1\times\nu} \le R \end{cases} \tag{13}$$

Simple but useful extensions of the approach are possible. In addition to the objective function based on information entropy minimization alone, some novel objective functions can be utilized, with the ultimate aim of simplifying the resolved bands of the pure component spectra. These new objective functions include the minimization of the summation of first-, second-, or fourth-order derivatives of the absorbance estimate $\hat{a}_{1\times\nu}$, with or without additional minimization of the integrated area of the absorbance estimate $\hat{a}_{1\times\nu}$ in order to prevent the overresolution of the pure component spectral estimate.

$$\min F_{\text{obj}} = \sum_{\nu} \left| ds_{\nu} \right| + \sum_{\nu} \left| \hat{a}_{\nu} \right| + P(\hat{a}_{1\times\nu}, \hat{C}_{k\times 1}, \hat{a}^{\max}_{1\times\nu}) \tag{14}$$

where

$$ds_{\nu} = d^{n}\hat{a}_{\nu}/d\nu^{n}; \qquad n = 1, 2, \text{ or } 4 \tag{15}$$
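As a minimal illustration of eqs 6-15, the following Python sketch evaluates both objective functions for a candidate spectral estimate. It is not the authors' MATLAB implementation: the forward-difference derivative, the unit channel spacing, the treatment of a perfectly flat estimate, and all function names are assumptions introduced here.

```python
import math


def diff(a):
    """Forward difference of a sampled spectrum (unit channel spacing assumed)."""
    return [a[i + 1] - a[i] for i in range(len(a) - 1)]


def entropy_term(a_hat):
    """First term of eq 6: Shannon entropy of h (eq 7), built from |d a_hat / d nu|."""
    d = [abs(x) for x in diff(a_hat)]
    total = sum(d)
    if total == 0.0:
        return 0.0  # assumption: a perfectly flat estimate contributes no entropy
    return -sum((x / total) * math.log(x / total) for x in d if x > 0.0)


def penalty(a_hat, c_hat, r_max, lam1=1e-3, lam2=1e-2):
    """Penalty P of eq 8 with the coefficients of eqs 11-13."""
    f1 = sum(x * x for x in a_hat if x < 0.0)  # eq 9
    f2 = sum(x * x for x in c_hat if x < 0.0)  # eq 10
    if f1 < lam1:
        gamma_a = 0.0
    elif f1 < lam2:
        gamma_a = 10.0
    else:
        gamma_a = 1e4
    gamma_c = 1e3  # eq 12
    gamma_max = 1e4 if max(a_hat) > r_max else 0.0  # eq 13
    return gamma_a * f1 + gamma_c * f2 + gamma_max


def objective_entropy(a_hat, c_hat, r_max):
    """G of eq 6."""
    return entropy_term(a_hat) + penalty(a_hat, c_hat, r_max)


def objective_derivative_area(a_hat, c_hat, r_max, n=1):
    """F_obj of eq 14, with an n-th order difference standing in for eq 15."""
    d = list(a_hat)
    for _ in range(n):
        d = diff(d)
    return (sum(abs(x) for x in d) + sum(abs(x) for x in a_hat)
            + penalty(a_hat, c_hat, r_max))


# Toy estimates (hypothetical): one band versus two bands on five channels.
one_band = [0.0, 0.0, 1.0, 0.0, 0.0]
two_bands = [0.0, 1.0, 0.0, 1.0, 0.0]
conc = [0.2, 0.5, 0.3]
g_one = objective_entropy(one_band, conc, r_max=5.0)
g_two = objective_entropy(two_bands, conc, r_max=5.0)
```

In this toy case the single-band estimate gives the lower value of G, which is the sense in which the entropy criterion favors simpler spectra. A real search would vary the elements of the transformation vector with a global optimizer, which is outside the scope of this sketch.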

Figure 1. First eight raw experimental randomized mixture spectra.

In this latter objective function, the spectral derivative and integrated area-based minimization no longer uses a truly proper probability distribution type of information entropy. However, the "BTEM" term is still employed, because the purpose of the newly proposed function (eq 14) is still to maximize the simplicity of the pure component spectral features and to localize the band signals. In other words, the basic spirit behind this new objective function mimics the information entropy-based minimization, in which the complexity of the spectral shapes is minimized. Therefore, a slightly wider and more general meaning of entropy is adopted/retained.

In BTEM, the vectors in the $V^{T}$ matrix are first inspected for significant spectral features. Such features usually appear only in the first few vectors of $V^{T}$, as they represent most of the variance in the data matrix $A_{k\times\nu}$. Since BTEM targets these features one at a time, narrow intervals corresponding to these local extrema are assigned. The reason for taking a region of wavenumbers for the targeted band rather than an exact single band peak wavenumber is that nonlinearities due to band shifting and shape changes exist in real spectra. In the process of transforming the right singular vectors, the targeted band peak absorbance $\hat{a}^{\text{target}}_{1\times\nu}$ is normalized to 1.0, and the maximum of the estimated spectrum $\hat{a}^{\max}_{1\times\nu}$ is constrained to be equal to or less than a preassigned scalar value $R \ge 1.0$ (eq 13). It is worth reiterating that no a priori information as such is required by this BTEM approach; only visual inspection and exhaustive search of interesting spectral features seen in the right singular vectors were performed.

EXPERIMENTAL SECTION

All the chemicals, i.e., n-hexane (99.6%, Fluka AG), toluene (99.9%, Mallinckrodt Chemical), acetone (99.7%, J. T. Baker), 3,3-dimethylbut-1-ene (33DMB, 99%, Fluka AG), dichloromethane (DCM, 99.5%, Merck), and 3-phenylpropionaldehyde (98%, Merck), were used as obtained. The spectra were measured using a Perkin-Elmer 2000 FT-IR spectrometer. A standard CaF2 IR cell, with 0.05-mm optical path length, was used for all measurements. The optical, sample, and detector chambers of the spectrometer were continually purged with N2 (99.999%, Soxal) at a flow rate of ∼2 L/min. The resolution was set to 4 cm⁻¹ for all spectroscopic measurements. All spectra were recorded over the range 950-3200 cm⁻¹. Data acquisition was made at 0.2-cm⁻¹ intervals. First, three background spectra containing moisture and CO2 signals were taken. Then 15 randomized mixtures of the 6 organic chemicals were prepared and measured. Thus, a total of 18 spectra were recorded in this study. The BTEM algorithm was implemented in MATLAB (version 5.3.1 (R11)), and computational tasks were executed using a dual-processor Intel Pentium 500-MHz Win NT workstation with 2 GB RAM.

RESULTS

All 18 raw experimental spectra were consolidated into an absorbance matrix having 18 rows and 11 251 columns. The first attempt at data decomposition on this very long rectangular absorbance matrix failed due to a shortage of workstation memory (2 GB RAM). To overcome this issue, the number of data channels was reduced by taking every other point of each spectrum. Singular value decomposition was then performed on this size-reduced matrix $A_{18\times 5626}$, yielding the three matrices $U$, $\Sigma$, and $V^{T}$. The first 8 experimental mixture spectra and the first 6 $V^{T}$ vectors plus the 17th and 18th $V^{T}$ vectors are shown in Figure 1 and Figure 2, respectively. In Figure 1, the complex spectral features of this mixture system are clearly seen, particularly in the spectral ranges of 950-2000 and 2800-3200 cm⁻¹. As the average toluene concentration was ∼47%, it is expected that the mean spectrum of all mixtures will exhibit strong toluene features. This is clearly shown in the first $V^{T}$ vector, which represents the average spectrum of all mixture observations. From Figure 2, it is also seen that the 17th and 18th $V^{T}$ vectors have very low signal-to-noise ratios. Indeed, the latter appears to contain primarily heteroscedastic noise (in the 950-2000 and 2800-3200 cm⁻¹ regions) and some randomly distributed white noise.
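The channel counts quoted above can be checked with a short calculation. The sketch below also estimates the storage of a square $\nu\times\nu$ matrix of right singular vectors as it appears in eq 1. The source attributes the failure only to a memory shortage, so the assumptions that a full (not economy-size) decomposition was formed and that values were stored as 8-byte floats are introduced here, and the function names are hypothetical.

```python
def channel_count(start_cm, stop_cm, step_cm):
    """Number of points on a uniform grid that includes both end points."""
    return int(round((stop_cm - start_cm) / step_cm)) + 1


def square_matrix_bytes(n_channels, bytes_per_value=8):
    """Storage for an n_channels x n_channels matrix of 8-byte floats (assumed)."""
    return n_channels * n_channels * bytes_per_value


n_full = channel_count(950.0, 3200.0, 0.2)      # 11251 channels, as in the text
n_reduced = len(range(0, n_full, 2))            # every other point: 5626 channels
gb_full = square_matrix_bytes(n_full) / 1e9     # roughly 1 GB for V alone
gb_reduced = square_matrix_bytes(n_reduced) / 1e9
```

Under these assumptions, halving the number of channels cuts the storage for the square matrix by about a factor of 4, which is consistent with the reduced problem fitting in memory.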

Figure 2. First 6 and the 17th and 18th right singular vectors with marked extrema used for BTEM spectral reconstruction.

From the first six $V^{T}$ vectors, eight extrema labeled a-h were chosen as indicated in Figure 2 and were used as targets in the BTEM algorithm. All reconstructions were performed using 17 $V^{T}$ vectors. The parameters for the BTEM spectral reconstructions and the estimated species identities are listed in Table 1. For comparison, the real experimental reference spectra and the pure component spectra estimated via BTEM are presented in Figure 3. The high degree of similarity is readily apparent for all six solvents as well as for CO2. The real gas-phase moisture reference spectrum is actually a superposition of moisture and spectrometer background, whereas the BTEM estimate of moisture is a very good representation of gas-phase moisture alone.

Table 1. Spectral Reconstruction Parameters and Species Identities

marked extrema   wavenumber region (cm⁻¹)   max abs, R   species identification
a                1029-1031                  5            toluene
b                2871-2874                  5            n-hexane
c                1215-1220                  5            acetone
d                1724-1730                  5            aldehyde
e                2959-2963                  5            33DMB
f                1264-1268                  5            dichloromethane
g                1558-1560                  5            H2O
h                2359-2363                  5            CO2
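The way one row of Table 1 enters a reconstruction can be sketched as follows: a trial transformation vector combines the right singular vectors (eq 4), the result is scaled so that the peak inside the target window equals 1.0, the maximum is compared with R, and concentrations follow from eq 5. The five-channel basis vectors, the trial vector, and all function names are hypothetical toy values, not experimental data, and the basis used here is not orthonormal.

```python
def combine(t, vt):
    """Eq 4: a_hat = T V^T, i.e. a weighted sum of the rows of V^T."""
    return [sum(tj * row[i] for tj, row in zip(t, vt)) for i in range(len(vt[0]))]


def normalize_to_target(a_hat, wavenumbers, lo, hi):
    """Scale a_hat so its largest value inside the window [lo, hi] is 1.0."""
    peak = max(x for x, w in zip(a_hat, wavenumbers) if lo <= w <= hi)
    if peak <= 0.0:
        raise ValueError("target band must be positive inside the window")
    return [x / peak for x in a_hat]


def concentrations(A, a_hat):
    """Eq 5 for a single spectrum: C_hat = A a_hat^T (a_hat a_hat^T)^-1."""
    norm = sum(x * x for x in a_hat)
    return [sum(r * x for r, x in zip(row, a_hat)) / norm for row in A]


# Toy inputs (hypothetical): 5 channels and z = 2 basis vectors.
wavenumbers = [1028.0, 1029.0, 1030.0, 1031.0, 1032.0]
vt = [[0.1, 0.5, 0.7, 0.5, 0.1],
      [0.6, 0.1, -0.2, 0.1, 0.6]]
t = [2.0, 0.5]            # trial row of the transformation matrix T
R = 5.0                   # "max abs, R" from Table 1
lo, hi = 1029.0, 1031.0   # target window of extremum "a"

a_hat = normalize_to_target(combine(t, vt), wavenumbers, lo, hi)
within_bound = max(a_hat) <= R   # constraint of eq 13

# Two toy mixture spectra that contain only this component.
A = [[2.0 * x for x in a_hat], [0.5 * x for x in a_hat]]
c_hat = concentrations(A, a_hat)
```

In an actual BTEM run the trial vector would be adjusted by the optimizer until the objective function is minimized; only the bookkeeping around one trial is shown here.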

Figure 3. (a) Real experimental reference pure component spectra. (b) The estimated pure component spectra via BTEM.

As mentioned in the Experimental Section, 15 of the 18 spectra used in this study contained the solvent mixtures. Table 2 presents the least-squares fit of (a) the six individual reference solvent spectra and experimental backgrounds, (b) the six individual BTEM estimates of the pure solvent spectra and experimental backgrounds, and (c) five individual BTEM estimates of the pure solvent spectra, the real reference toluene spectrum, and the experimental backgrounds. As shown in Table 2, there is a systematic error in the toluene and hexane concentrations using the six BTEM estimates (part b). This can be traced in part to the large toluene signals in the experiments, combined with a small discrepancy between the reference toluene spectrum and the BTEM estimate of toluene. A close inspection shows that some hexane signal has been mixed into the BTEM estimate of toluene. When five BTEM solvent estimates are used together with the real reference toluene spectrum, the concentration estimates (part c) are in reasonable agreement with the real concentrations (part a).

Table 2. Comparison between the Reconstructed Concentrations and the Real Concentrationsᵃ

                 relative component concentrations
spectra no.   toluene   hexane   acetone   aldehyde   33DMB   DCM

(a) Using the 6 Individual Reference Solvent Spectra and Experimental Backgrounds
1             0.462     0.143    0.281     0.107      0.000   0.005
2             0.462     0.143    0.281     0.107      0.000   0.005
3             0.183     0.032    0.133     0.388      0.123   0.141
4             0.255     0.110    0.184     0.087      0.277   0.087
5             0.241     0.055    0.207     0.000      0.003   0.475
6             0.372     0.266    0.222     0.105      0.006   0.028
7             0.617     0.054    0.247     0.011      0.024   0.050
8             0.591     0.057    0.079     0.027      0.183   0.063
9             0.688     0.077    0.166     0.063      0.001   0.007
10            0.424     0.127    0.287     0.046      0.012   0.105
11            0.453     0.052    0.377     0.096      0.000   0.015
12            0.878     0.023    0.059     0.016      0.017   0.008
13            0.627     0.164    0.107     0.040      0.010   0.052
14            0.674     0.098    0.164     0.011      0.021   0.032
15            0.669     0.029    0.119     0.115      0.000   0.064

(b) Using the 6 Individual BTEM Estimates of the Pure Solvent Spectra and Experimental Backgrounds
1             0.369     0.219    0.295     0.096      0.017   0.000
2             0.369     0.219    0.295     0.096      0.017   0.000
3             0.155     0.053    0.141     0.353      0.135   0.163
4             0.232     0.126    0.201     0.087      0.274   0.082
5             0.238     0.102    0.229     0.000      0.026   0.404
6             0.313     0.296    0.240     0.098      0.033   0.020
7             0.493     0.196    0.269     0.010      0.023   0.010
8             0.509     0.192    0.094     0.021      0.157   0.028
9             0.527     0.226    0.174     0.047      0.000   0.000
10            0.349     0.198    0.307     0.046      0.030   0.071
11            0.357     0.149    0.392     0.086      0.013   0.003
12            0.650     0.232    0.068     0.005      0.000   0.000
13            0.529     0.295    0.120     0.033      0.009   0.014
14            0.542     0.245    0.181     0.008      0.016   0.000
15            0.540     0.198    0.130     0.094      0.000   0.027

(c) Using 5 Individual BTEM Estimates of the Pure Solvent Spectra, the Real Reference Toluene Spectrum, and the Experimental Backgrounds
1             0.402     0.104    0.322     0.112      0.036   0.024
2             0.402     0.104    0.322     0.112      0.036   0.024
3             0.158     0.000    0.144     0.370      0.148   0.181
4             0.240     0.050    0.209     0.096      0.301   0.104
5             0.248     0.023    0.241     0.001      0.038   0.450
6             0.332     0.207    0.254     0.111      0.050   0.046
7             0.550     0.034    0.297     0.018      0.050   0.051
8             0.571     0.023    0.098     0.031      0.204   0.073
9             0.633     0.059    0.203     0.066      0.023   0.017
10            0.375     0.087    0.330     0.055      0.049   0.104
11            0.384     0.031    0.424     0.100      0.031   0.031
12            0.854     0.016    0.076     0.018      0.026   0.010
13            0.598     0.136    0.128     0.046      0.036   0.059
14            0.627     0.073    0.202     0.018      0.045   0.036
15            0.628     0.018    0.144     0.120      0.013   0.078

ᵃ Please note that samples 1 and 2 are replicates.

DISCUSSION

Although a limited number of spectroscopic measurements were taken (only 18 spectra) in this experiment, BTEM could indeed recover all six pure component spectra plus two components from the background air. Almost all primary bands of each component spectrum were retained in the pure component estimates. Although there are some minor but noticeably distorted spectral features for some pure component estimates in the highly overlapping spectral range of 2800-3200 cm⁻¹, the overall quality of the estimates is still very high. The most obvious distortion can be seen in the toluene estimate in the spectral range of 2850-3000 cm⁻¹. Our previous work shows that such imperfections/artifacts in spectral reconstructions are easily overcome if the number of observations/spectral measurements is modestly increased.

In addition, flatter baselines of the pure component spectral estimates are also apparent. This is due to the inherent tendency of the BTEM approach to produce smoother and simpler spectral features. In particular, such reconstruction is clearly seen in the
moisture estimate. A very flat baseline was resolved, compared to its reference spectrum. BTEM discarded a variety of optical effects and other spurious signals in the background. The peak maxima at 1050, 2861, 2926, and 3018.4 cm⁻¹ were eliminated; apparently they are not correlated to vapor-phase moisture and are associated with other airborne species. Toluene had the strongest signal intensities, contributing ∼47% on average, and carbon dioxide had the weakest, contributing only ∼0.6%. These results confirm the robustness and generality of the BTEM technique for FT-IR pure component spectral recovery with a limited number of spectral measurements. BTEM could even recover components whose signal intensities vary by more than 2 orders of magnitude.

Comparison of Results from BTEM and Other Spectral Reconstruction Methods. For comparison purposes, other SMCR methods, i.e., simple-to-use interactive self-modeling mixture analysis (SIMPLISMA),¹⁶ interactive principal component analysis (IPCA),¹⁷ orthogonal projection approach followed by alternating least squares (OPA-ALS),³⁷ and SIMPLISMA plus ALS, were also implemented. The basic principle used in both SIMPLISMA and IPCA is the determination of a pure variable for each component. If the pure variables of all components in the mixture are known, the concentration profiles can be estimated, and in turn, the pure component spectra can be resolved based on the linear relationship of Beer's law. IPCA relies on principal component analysis to obtain the pure variables, whereas SIMPLISMA does not. A different approach is used for OPA, which starts with a sequential selection of a set of the most dissimilar spectral vectors from the data matrix A. These purest spectra are later used as the initial estimates in the iterative refinement performed by ALS. Finally, adding ALS to SIMPLISMA often results in refinement of the pure spectral estimates.
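The pure-variable idea described above can be illustrated with a minimal two-component sketch. It assumes that the pure-variable channels are already known and uses synthetic, noise-free data; it is not SIMPLISMA or IPCA, which must also locate the pure variables, and the function name and toy spectra are hypothetical.

```python
def resolve_two_components(A, pure_idx):
    """Least-squares spectra from two known pure-variable channels.

    The intensities at the pure channels serve as relative concentration
    profiles C; spectra follow from the normal equations (C^T C) S = C^T A.
    """
    i, j = pure_idx
    c1 = [row[i] for row in A]
    c2 = [row[j] for row in A]
    g11 = sum(x * x for x in c1)
    g12 = sum(x * y for x, y in zip(c1, c2))
    g22 = sum(y * y for y in c2)
    det = g11 * g22 - g12 * g12
    if det == 0.0:
        raise ValueError("concentration profiles are linearly dependent")
    s1, s2 = [], []
    for ch in range(len(A[0])):
        b1 = sum(c1[k] * A[k][ch] for k in range(len(A)))
        b2 = sum(c2[k] * A[k][ch] for k in range(len(A)))
        s1.append((g22 * b1 - g12 * b2) / det)   # first row of (C^T C)^-1 C^T A
        s2.append((g11 * b2 - g12 * b1) / det)   # second row
    return s1, s2


# Synthetic bilinear data: channel 0 is pure for component 1, channel 2 for component 2.
true_s1 = [1.0, 0.5, 0.0, 0.2]
true_s2 = [0.0, 0.5, 1.0, 0.3]
conc = [(0.2, 0.8), (0.5, 0.5), (0.9, 0.1)]
A = [[c1 * x + c2 * y for x, y in zip(true_s1, true_s2)] for c1, c2 in conc]

s1, s2 = resolve_two_components(A, pure_idx=(0, 2))
```

With exact bilinear data and truly pure channels the spectra are recovered up to rounding error. When no channel is pure for a component, as with heavily overlapping bands, the assumed concentration profiles are contaminated and so are the resolved spectra.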
It should be noted again that all of these methods require an a priori or statistical estimate of the number of observable species, whereas BTEM does not. The chosen degrees of freedom (number of components) for these three SMCR methods was s = 8. For SIMPLISMA, the offset value used to avoid dividing by zero and to bias the purity slightly toward variables with a higher intensity was set to 3% of the maximum peak intensity of the mean of the data set. The IPCA algorithm was implemented using the SVD results of A (18 × 5626). For OPA-ALS, nonnegativity constraints on relative concentration and absorptivity estimates were imposed, and the ALS code used was written by Tauler and de Juan.38 Spectral reconstruction results using these four SMCR methods are presented in Figure 4. Although SIMPLISMA and IPCA could reconstruct the pure component spectra of toluene, n-hexane, acetone, 3-phenylpropionaldehyde, and dichloromethane, the reconstruction qualities are quite poor and not comparable to those obtained via BTEM. Negative absorptivities and overresolutions are clearly visible, especially for toluene, 3-phenylpropionaldehyde, and dichloromethane. SIMPLISMA and IPCA also failed to produce the pure component spectra of 3,3-dimethylbut-1-ene, moisture, and CO2. Instead of individual spectra, a combined moisture and CO2 spectrum was resolved. The worst reconstruction can be seen from the OPA-ALS results. Only the pure component spectrum of toluene was well resolved. The other reconstructions failed completely. The failure of OPA-ALS might be traced back to the reliance on obtaining the most orthogonal sets of initial estimates when one component (toluene) dominates the signal intensity. As such, the initial estimates are still quite similar and subsequent refinement does not improve the situation. It is worth noting that the inverted second-derivative SIMPLISMA was also performed; however, it did not produce any improvements. Finally, the last method, SIMPLISMA plus ALS, provided the best results of the four methods used in this comparison. The pure component spectra of the six solvents are relatively good; however, the estimations of moisture and CO2 are in considerable error. These results also do not match the quality of the BTEM estimates shown previously in Figure 3b.

(37) Sanchez, F. C.; Toft, J.; Van den Bogaert, B.; Massart, D. L. Anal. Chem. 1996, 68, 79-85.
(38) Matlab Code for ALS (als99.m), written by R. Tauler and A. de Juan, is available on the Internet. URL: http://www.ub.es/gesq/mcr/mcr.htm.

Figure 4. Estimated pure component spectra obtained via other SMCR methods: (a) SIMPLISMA; (b) IPCA; (c) OPA-ALS; (d) SIMPLISMA-ALS.

For this test case, additional useful comparisons between BTEM and the four other SMCR techniques can be provided. Table 3 shows the inner products of the spectral estimates with the experimental reference solvent spectra. From this table, it can be seen that in general BTEM and SIMPLISMA-ALS both significantly outperformed the other three techniques. Furthermore, with an average inner product value of 0.958, SIMPLISMA-ALS outperformed BTEM, whose average inner product value was 0.936 (note, however, that the background spectral estimates for SIMPLISMA-ALS are very poor, while the estimates from BTEM are very accurate). In summary, the present tests of BTEM on small and nondilute data sets indicate that the spectral estimates are not quite as good as those obtained previously from a large set of spectra from dilute

solutions. Typical average values for BTEM inner products, from the larger experimental studies, are ∼0.98+ when hundreds of spectra are used as input.30

Table 3. Measures of Spectral Similarity between the Reference Pure Component Spectra and the Spectral Estimates from Various SMCR Techniques(a)

            BTEM     SIMPLISMA   IPCA     OPA-ALS   SIMPLISMA-ALS
toluene     0.954    0.971       0.968    0.996     0.973
n-hexane    0.992    0.994       0.994    0.523     0.995
acetone     0.886    0.866       0.873    0.639     0.899
aldehyde    0.899    0.943       0.937    0.579     0.953
33DMB       0.983    0.576       0.584    0.555     0.963
DCM         0.904    0.969       0.971    0.502     0.967

(a) Entries are the corresponding values of the inner product.

Potential for Application to Other Spectroscopies. BTEM has now been applied to FT-IR spectroscopic data from dilute organometallic reactions,28 homogeneous catalyzed reactions,29-31 and concentrated simple organic mixtures, the latter being the present emphasis. However, it should be emphasized that BTEM has also been successfully applied to various environmentally relevant solid samples, containing inorganic and organic contaminants, using Raman and hyphenated FT-IR-Raman spectroscopies.39-41 This clearly indicates a more general applicability to spectroscopies having intense localized signals and a general robustness in dealing with nonlinearities and other difficult spectroscopic complications. A modified form of BTEM applicable to discrete mass spectroscopy data, rather than continuously differentiable data, has been successfully developed and tested.42 Extensions to NMR and powder XRD studies are currently in progress.

(39) Ong, L. R. Extension of Pure Component Spectra Reconstruction in Exploratory Chemometrics to Solid Samples for XPS and FT-Raman. B. Eng. Thesis, National University of Singapore, 2001.
(40) Sin, S. Y. Application of FT-Raman Spectroscopy Measurements. B. Eng. Thesis, National University of Singapore, 2002.
(41) Ong, L. R.; Widjaja, E.; Stanforth, R.; Garland, M. J. Raman Spectrosc. 2003, 34, 282-289.
(42) Zhang, H.; Garland, M.; Zeng, Y.; Wu, P. In Recent Advances in Computational Science and Engineering; Lee, H. P., Kumar, K., Eds.; Imperial College Press: London, 2002; pp 49-53.

CONCLUSION

The BTEM algorithm has shown its usefulness in recovering eight pure component spectra from a limited number of spectral measurements. High reconstruction qualities were obtained, and indeed, these results show that BTEM outperformed SIMPLISMA, IPCA, OPA-ALS, and SIMPLISMA-ALS. One significant strength of BTEM is that it does not require a priori knowledge of the number of observable components present. Significantly more right singular vectors than the anticipated number of components are used and then transformed into pure component spectral estimates. Hence, chemical information embedded in the latter V^T vectors (z > s) due to spectral nonlinearities can also be accounted for and utilized in the transformation. In addition, minor components having lower spectroscopic signals are recovered. This characteristic is also novel and cannot be found in other SMCR methods, where a statistical number of observable components is usually needed prior to spectral reconstruction. Finally, although the computational requirements of BTEM are larger than those of more popular techniques, this can often be justified, particularly for systems containing many new and previously unknown species and those containing such species at trace levels.
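The role of the number of retained right singular vectors can be illustrated with a small numerical sketch. It does not reproduce BTEM's band targeting or entropy minimization. It only shows, on synthetic data with assumed band positions, concentrations, and noise level, that a weak component is poorly represented in a basis truncated at the number of anticipated major components and well represented once more vectors are kept. The helper names `project_onto_basis` and `cosine` are hypothetical, and the normalized inner product is assumed here as the similarity measure.

```python
import numpy as np

def cosine(u, v):
    """Normalized inner product of two spectra."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def project_onto_basis(spectrum, Vt, z):
    """Best least-squares approximation of `spectrum` as T @ Vt[:z]."""
    basis = Vt[:z]                  # first z right singular vectors (orthonormal rows)
    T = basis @ spectrum            # expansion coefficients
    return T @ basis

# Synthetic data: two major components and one minor one (about 100x weaker).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 300)
centers = [0.2, 0.45, 0.7]
S_true = np.vstack([np.exp(-((x - c) / 0.04) ** 2) for c in centers])
C_true = rng.uniform(0.2, 1.0, size=(18, 3)) * np.array([1.0, 0.5, 0.01])
A = C_true @ S_true + 1e-5 * rng.standard_normal((18, 300))

# Singular value decomposition of the mixture data, A = U diag(sing) Vt.
U, sing, Vt = np.linalg.svd(A, full_matrices=False)

minor = S_true[2]
# Truncating at the two "anticipated" major components loses the minor species ...
sim_z2 = cosine(project_onto_basis(minor, Vt, 2), minor)
# ... whereas keeping more vectors than components (z = 6) retains it.
sim_z6 = cosine(project_onto_basis(minor, Vt, 6), minor)
```

Because the projection can only improve as z grows, retaining extra vectors costs little in representational terms. The harder step, selecting the right combination of those vectors, is what the BTEM objective function addresses and is not shown here.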

Received for review November 28, 2002. Accepted June 4, 2003. AC0263622
