Research: Science and Education
Multivariate Curve Resolution Methods Illustrated Using Infrared Spectra of an Alcohol Dissolved in Carbon Tetrachloride
Bjørn Grung* and Egil Nodland
Department of Chemistry, University of Bergen, Allégt. 41, N-5007 Bergen, Norway; *[email protected]
Geir Martin Førland
Chemical Institute, Bergen University College, P.O. Box 7030, N-5020 Bergen, Norway
Advances in instrument manufacturing influence the way an analytical chemist works. The enormous quantity of data a modern computerized instrument produces demands efficient methods for information extraction. Traditionally, analytical data involved scalar quantities. The last 20 to 30 years have seen a change, via vectorial data (e.g., a digitized spectrum), to the current situation where instruments may deliver gigabytes of data for each sample analyzed. For example, full-scan mass spectral detection for high performance liquid chromatography produces huge quantities of data. In spite of the advances made in instrumental design and the methodological advances within the field of model building, textbooks on analytical chemistry only briefly mention multivariate data analysis. In many institutes of education, this leads to multivariate modeling techniques being largely ignored. The aim of this article is to present some multivariate methods for curve resolution and to illustrate how they may come into play in the teaching of analytical chemistry.

To illustrate this, infrared spectroscopic measurements are used to study intermolecular hydrogen bonding of an alcohol dissolved in a nonpolar solvent. In this environment, the alcohol molecules self-associate and form various types of hydrogen-bonded species. The process of hydrogen bonding evolves with increasing alcohol concentration. At low concentration the alcohol molecules are present mainly as free alcohol monomers. As the gross concentration of alcohol increases, hydrogen-bonded aggregates are formed. The rank analysis reveals the number of these different species contributing to the absorption bands in the spectra. Here, a global and a local method are used, both based upon factor analysis. The curve resolution, which is basically solving the Beer–Lambert law for all parameters, was done iteratively.
The resulting concentration profiles can be used in calculating the degree of association and equilibrium constants, as in the work of Frohlich (1). Further information and interpretation of the resolved spectra are given in ref 2.

Theory
The term hyphenated instrumentation is sometimes used to describe a measurement system where two measurement techniques are coupled. This type of instrumentation characterizes each sample by a complete data matrix. As early as 1980, Hirschfeld (3) listed 66 possible hyphenated methods. An example is high performance liquid chromatography with diode array detection (LC-DAD). Plotting any row in the data matrix displays the spectrum of the analytes that elute
at that specific time. A plot of a column shows the combined chromatographic profile of the chemical components absorbing light at that specific wavelength. Although the methods presented in this work are designed to work with data from hyphenated instrumentation, they have also been successfully used to resolve data from reaction monitoring when spectral measurements (complete spectra) are used for monitoring. An example of such reaction monitoring is demonstrated later in this work. A fair assumption to make about the signal produced by this type of instrumentation is that it is additive and bilinear. Additive means that the systematic part of the measured signal can be regarded as individual contributions from each component. Let X be the data matrix recorded for a sample consisting of R chemical components. The contribution from component number i is referred to as Xi. The additive assumption may now be written as

$$\mathbf{X} = \mathbf{X}_1 + \mathbf{X}_2 + \dots + \mathbf{X}_R + \mathbf{E} = \sum_{i=1}^{R} \mathbf{X}_i + \mathbf{E} \qquad (1)$$
E is a matrix representing experimental noise. The bilinearity assumption implies that the signal from any chemical component can be seen as the outer product of two vectors: one describing the concentration profile and one the spectral profile belonging to the component of interest. The use of both these assumptions leads to

$$\mathbf{X} = \sum_{i=1}^{R} \mathbf{c}_i \mathbf{s}_i^{T} + \mathbf{E} = \mathbf{C}\mathbf{S}^{T} + \mathbf{E} \qquad (2)$$
The matrix C contains, as columns, the concentration profiles of all the chemical components with a detectable signal. The matrix S contains, as columns, the spectral profiles of all the chemical components with a detectable signal. Owing to experimental noise there is always a slight difference between the recorded signal X and the sum of the modeled contributions from each component, CSᵀ. This difference is usually referred to as the residuals and is collected in a matrix E of the same dimensions as the data matrix X. The curve resolution method described in this work aims to solve eq 2 for the unknowns C and S. Different methods for curve resolution vary both in their approach and in the extent to which they employ external information. What they have in common is that they try to answer the following three questions pertaining to the system investigated: How many? What? How much? The first question is a question of chemical rank: how many components exist in the system?
Vol. 84 No. 7 July 2007
•
Journal of Chemical Education
1193
The second question is about the identity of the components. This is usually answered by interpreting the resolved spectral profiles, S. The third question concerns the quantitative information (either relative or absolute through the use of standards) and is usually answered by studying the resolved concentration profiles, C.
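The additive and bilinear assumptions of eqs 1 and 2 can be illustrated with a small simulation. The sketch below is only an illustration under assumed conditions: the matrix sizes, the Gaussian profile shapes, and the noise level are arbitrary choices, not values from this work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N rows (e.g., retention times), M columns (e.g., wavelengths), R components
n_rows, n_cols, R = 60, 120, 3
t = np.arange(n_rows)[:, None]
w = np.arange(n_cols)[:, None]

# Assumed Gaussian-shaped profiles; centers and widths are illustrative only
C = np.exp(-0.5 * ((t - np.array([20, 30, 42])) / 5.0) ** 2)   # N x R concentration profiles
S = np.exp(-0.5 * ((w - np.array([30, 60, 90])) / 12.0) ** 2)  # M x R spectral profiles
E = 1e-3 * rng.standard_normal((n_rows, n_cols))               # simulated experimental noise

# Additivity (eq 1): X is a sum of single-component matrices X_i = c_i s_i^T
X_parts = [np.outer(C[:, i], S[:, i]) for i in range(R)]
X = sum(X_parts) + E

# Bilinearity (eq 2): the same matrix written as the product C S^T plus noise
X_bilinear = C @ S.T + E
```

Each `X_parts[i]` has rank one, which is why the number of significant factors in X can be read as the number of chemical components.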
Finding the Number of Chemical Components
The starting point for any curve resolution is chemical rank analysis. It is not possible to resolve any profiles unless one has a good estimate of the chemical rank of the system. Several methods exist for this purpose, and they may be divided into global and local methods. A simple global method is to study the size of the eigenvalues of the information matrix XᵀX, or the singular values of the data matrix (they contain the same information). As shown in ref 4, this method is insensitive to the presence of minor components. The method’s ability to correctly decide the chemical rank is also heavily dependent upon the analytical profiles from different constituents being significantly different from each other. Another global approach is based on principal component analysis (PCA) (5). Any data matrix X can be decomposed using PCA into a set of scores (T) and loadings (P):

$$\mathbf{X} = \mathbf{T}\mathbf{P}^{T} \qquad (3)$$

The number of principal components extracted is equal to the number of columns in T (and P). For a data matrix X, having N rows and M columns, it is possible to extract N or M principal components, whichever is the smaller number. Extraction of the maximum number of principal components is usually of little interest. Instead, one extracts only as many principal components as necessary to explain the systematic variation in X. Any data matrix contains noise in addition to the systematic variation, and it is therefore common to truncate the PCA after extraction of R principal components, where R is the chemical rank of the system. The shape of the loading vectors (the columns of P) can be used to estimate the proper chemical rank. The shape of the first loading vector is usually smooth, and it is often somewhat similar to the mean spectrum of a data matrix. The second loading vector is orthogonal to the first (by definition) but, more importantly, contains more noise than the first.
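A minimal sketch of the global approach, assuming simulated two-component data and using the singular value decomposition as one common way to obtain the PCA scores and loadings of eq 3 (without mean centering). The `roughness` helper is a hypothetical measure introduced here only to put a number on how noisy a loading vector looks.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(50)[:, None]
w = np.arange(100)[:, None]
C = np.exp(-0.5 * ((t - np.array([18, 30])) / 5.0) ** 2)    # assumed concentration profiles
S = np.exp(-0.5 * ((w - np.array([35, 65])) / 10.0) ** 2)   # assumed spectral profiles
X = C @ S.T + 1e-3 * rng.standard_normal((50, 100))

# X = U diag(s) Vt, so the scores are T = U * s and the loadings are P = Vt.T
U, s, Vt = np.linalg.svd(X, full_matrices=False)
T = U * s
P = Vt.T

# Eigenvalues of X^T X (descending) are the squared singular values of X
eig = np.linalg.eigvalsh(X.T @ X)[::-1]

# Truncate after R components to keep the systematic variation only
R = 2
X_hat = T[:, :R] @ P[:, :R].T

def roughness(v):
    """Sum of squared first differences: small for smooth vectors, large for noise."""
    return np.sum(np.diff(v) ** 2)

rough = [roughness(P[:, k]) for k in range(4)]
```

For this simulated rank-two matrix, the first two singular values stand far above the rest, and the third loading vector is much rougher than the first two.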
Figure 1. The first three loading vectors for a region of chemical rank 2. Notice that two of the loading vectors display structure. The third loading vector is noise.

By studying the shapes of loading vectors from the different principal components, it becomes apparent that they gradually become noisier as the number of extracted components increases. When the number of principal components exceeds the chemical rank R, any additional loading vector should in theory display a noisy structure. This is illustrated in Figure 1. In practice, factors such as correlated noise, instrumental artifacts, small net analytical signals (6), and so forth make it difficult to rely on this approach alone. A better approach is to utilize the evolving, or time dependent, nature of the data. Therefore, local methods are better suited to determine the chemical rank. Several local rank mapping methods exist. Some of the better known are evolving factor analysis (7, 8), fixed size moving window factor analysis (9), the cookie cutter method (10), and eigenstructure tracking analysis (ETA) (11). Only the ETA method is described in the work presented here. However, the methods are similar in their approach, and all can be referred to as
windowing techniques. All the methods mentioned here can be used to establish a rank map. This map gives information on the number of analytes—both globally and locally. The information in the rank map is subsequently used in the resolution process, where every analyte’s analytical profiles are calculated. The ETA procedure focuses on small regions of the matrix, so-called sub matrices. The size of these matrices is W × M, where W is called the window size. M is the number of variables in the data set. The window is placed so that it covers the first W rows of the matrix. The eigenvalues of this matrix are calculated. Then the window is moved down one row, so that it covers rows two and three (for a window size of 2). The eigenvalues of this matrix are calculated, and the window moved again. This procedure is repeated throughout the data matrix. The eigenvalues calculated by this procedure are called evolving eigenvalues. As the data evolve (and the window is moved through the data), we obtain new eigenvalues. The logarithms of the evolving eigenvalues are plotted as a function of the window position (retention times). The resulting plot is called an ETA plot. An ETA plot for window sizes 2, 3, and 4 is shown in Figure 2A, B, and C, respectively. For the first retention times in Figure 2A, the two evolving eigenvalues are fixed at a stable and low level. This region is a zero-component region, where no analytes give rise to a signal. After some time, the first evolving eigenvalue starts to increase. This indicates that an analyte has started to elute. We have moved into a rank one region. Such regions are called selective regions. Later, the second evolving eigenvalue starts to rise. This means that another analyte has started to elute. Two significant evolving eigenvalues imply chemical rank two, that is, the recorded signal has contributions from two analytes. Towards the middle of the plot in Figure 2B, the third eigenvalue rises above the noise level. 
We have entered a three-component region. As the analytes cease to elute, the corresponding eigenvalues drop down to the noise level again. One observes a second selective region towards the right-hand side of the plot, where only one eigenvalue lies above the noise level. Finally, we again move into a zero-component region.
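The moving-window procedure described above can be sketched as follows. This is an illustration on simulated two-component data, not the implementation used in this work; the function name `eta_log_eigenvalues` and the rule used to set the noise limit are assumptions made for the example.

```python
import numpy as np

def eta_log_eigenvalues(X, window):
    """Return log10 of the evolving eigenvalues for each window position."""
    n_positions = X.shape[0] - window + 1
    log_eig = np.empty((n_positions, window))
    for start in range(n_positions):
        sub = X[start:start + window, :]            # W x M submatrix
        sv = np.linalg.svd(sub, compute_uv=False)   # singular values, descending
        log_eig[start] = np.log10(sv ** 2)          # eigenvalues of sub @ sub.T
    return log_eig

# Simulated elution of two analytes (illustrative values only)
rng = np.random.default_rng(2)
t = np.arange(80)[:, None]
w = np.arange(100)[:, None]
C = np.exp(-0.5 * ((t - np.array([30, 40])) / 4.0) ** 2)
S = np.exp(-0.5 * ((w - np.array([35, 65])) / 10.0) ** 2)
X = C @ S.T + 1e-3 * rng.standard_normal((80, 100))

log_eig = eta_log_eigenvalues(X, window=3)

# Assumed noise limit: one log unit above the largest eigenvalue found in the
# first five windows, which are taken to be a zero-component region
noise_limit = log_eig[:5].max() + 1.0
local_rank = (log_eig > noise_limit).sum(axis=1)
```

Plotting the columns of `log_eig` against the window position gives the ETA plot; `local_rank` is a crude rank map that goes from zero to one, to two, and back again for this simulated data.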
www.JCE.DivCHED.org
Figure 2. ETA plots using window size 2 (A), 3 (B), and 4 (C) in the retention time direction.
It is not necessarily the same analyte that appears in both selective regions. We cannot decide from the ETA plot alone whether the two selective regions contain signal from the same analyte, or whether the two regions belong to different analytes. Furthermore, we cannot state that the three-component region appearing in Figure 2B only contains signal from three analytes. The presence of a possible fourth analyte can only be detected by running the ETA analysis again, but with a window size equal to four. By redoing the analysis with window size four, as shown in Figure 2C, a fourth evolving eigenvalue would be displayed in the plot. If this eigenvalue ever rises above the noise level, then at least four analytes are present. To detect five analytes, a window of size five would be needed. One repeats the analysis until the last evolving eigenvalue lies at the base level at all times. Successful use of ETA discloses the underlying evolving rank structures in the different regions of the data matrix. The information extracted using this method is invaluable when it comes to successfully resolving the data into analytical profiles.
ETA plots are simple to construct, but using them in a proper way is sometimes difficult owing to the complex behavior of real analytical data. Real data often contain heteroscedastic noise, where the intensity of the noise increases as the intensity of the recorded signal increases (12). This leads to additional, seemingly significant, evolving eigenvalues in the ETA plot—the number of lines above the noise limit is larger than the chemical rank. To a certain extent, this can be remedied by proper transformations (13). However, this and other deviations from the ideal situation make it necessary to validate the ETA plot. It is common to validate the ETA results by performing PCA on a region of suspected chemical rank R. As mentioned above, one would expect the first R loading vectors to contain visibly less noise than the latter loading vectors for a rank R system. The explanation above concerns the use of ETA on data with a chromatographic (separation) direction. As will be shown later, ETA plots can also be used when studying data without such a direction. In the experimental section, a dilution series is studied using infrared spectroscopy. The lack of a separation direction necessitates performing the ETA analysis in the wavenumber direction. This makes interpreting the plots more difficult, as the signal from an analyte is spread out throughout most of the wavenumber region. This is a different situation from the chromatography case, where the signal from an analyte appears within a well-defined region. Validation of the ETA results becomes even more important in such cases. To support the findings of the ETA, rank analysis by latent projective graphs (LPG) (14) can be applied. A two-way data structure of dimension N × M can be presented in two coordinate systems. These are the N-dimensional time space defined by N measurements of a spectrum and the M-dimensional wavenumber space defined by M wavenumbers. 
The N row vectors (spectral profiles) project as N single points in the M-dimensional wavenumber space, and consequently the M column vectors (concentration profiles) project as M single points in the N-dimensional time space. Selective spectral regions are revealed in time space as segments of straight lines that can be extrapolated through the origin of the coordinate system. This property is used in LPG. By replacing the original N coordinate axes by two latent (underlying) axes from a PCA, a simple bivariate representation (score or loading plot) is obtained. The selective spectral regions still map as straight-line segments, while nonselective regions do not. Thus, by visual inspection of latent projective graphs the desired selective regions can be found.

To correctly resolve the measured data, it is crucial to separate the systematic variation due to chemical components from the variation due to noise. This makes rank estimation important, and a host of techniques for this purpose therefore exists. Global techniques, such as PCA and LPG, make calculations based on all the data in the data matrix. Local rank analysis, such as ETA, works by investigating smaller regions of the data matrix. However, heteroscedastic noise may lead to an overestimation of the local rank when using ETA. Furthermore, the presence of minor components and the similarity of the various analytical profiles in the data make trusting the results obtained from a single rank analysis precarious. Thus, the use of more than a single rank analysis method is crucial for the determination of the correct
chemical rank. In this work, ETA is used as the main rank analysis tool. Its sensitivity and speed make it well suited for this purpose. PCA and LPG are used to validate the results from the ETA.
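The straight-line property that LPG relies on can be checked numerically. The sketch below assumes noise-free, simulated data in which channels 0–19 are selective for one component, channels 20–29 contain both, and channels 30–59 are selective for the other; these band shapes and the helper `cross` are hypothetical choices made for illustration.

```python
import numpy as np

t = np.arange(40)[:, None]
C = np.exp(-0.5 * ((t - np.array([15, 25])) / 5.0) ** 2)      # N x 2 concentration profiles

# Assumed spectra on 60 channels with partly overlapping bands
S = np.zeros((60, 2))
S[:30, 0] = np.sin(np.pi * (np.arange(30) + 0.5) / 30)        # component 1 absorbs in channels 0-29
S[20:, 1] = np.sin(np.pi * (np.arange(40) + 0.5) / 40)        # component 2 absorbs in channels 20-59
X = C @ S.T                                                   # noise-free for clarity

# Project every column of X (one point per channel) onto two latent axes
U, s, Vt = np.linalg.svd(X, full_matrices=False)
coords = Vt[:2].T * s[:2]                                     # M x 2 coordinates

def cross(a, b):
    """2D cross product: zero when a and b lie on one line through the origin."""
    return a[0] * b[1] - a[1] * b[0]

selective_pair = cross(coords[5], coords[15])    # both channels selective for component 1
mixed_pair = cross(coords[5], coords[25])        # channel 25 contains both components
```

A scatter plot of `coords` would show the selective channels as two straight-line segments pointing at the origin, joined by a curved segment for the overlapping channels.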
Resolution of Data into Analytical Profiles
Once the global (and possibly local) chemical rank has been established, resolution of the recorded data X into sets of analytical profiles C (concentration profiles) and S (spectral profiles) is attempted. Since the pioneering work of Lawton and Sylvestre (15), a multitude of methods have been developed. In the work presented in this article, alternating least squares (ALS) (16, 17) was employed. ALS, also known as alternating regression (AR), is based upon iterating between the following two equations:
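$$\mathbf{C} = \mathbf{X}\mathbf{S}\left(\mathbf{S}^{T}\mathbf{S}\right)^{-1} \qquad (4)$$

$$\mathbf{S}^{T} = \left(\mathbf{C}^{T}\mathbf{C}\right)^{-1}\mathbf{C}^{T}\mathbf{X} \qquad (5)$$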
Equation 4 is the least squares solution of the equation X = CSᵀ with respect to C, provided that S is known. Equation 5 is the least squares solution of the same equation provided that C is known. Of course, neither C nor S is known prior to resolution. By iterating between eqs 4 and 5, starting from an initial estimate of S, one approaches an acceptable solution. Karjalainen suggested using random numbers for starting estimates. While this may be the best starting point in a strict statistical sense, inasmuch as it provides an unbiased solution, other approaches have surfaced. Among these are needle vectors (18) and key profiles (19). If external estimates of the pure spectra of any of the components are available (for example through selective regions), these estimates should of course be used. Once a starting estimate has been selected, eq 4 provides the concentration profiles corresponding to the starting estimate of S. Because the spectral profiles are just starting estimates (and may be poor estimates) of the true, underlying spectra, the concentration profiles provided by eq 4 are (usually) far from correct. This is easily seen, as a visual examination reveals concentration profiles containing regions of unlikely features. Such features include negative parts, multi-modal profiles (for chromatographic data), and so forth. For reaction monitoring data, reactants can be assumed to decrease monotonically. Likewise, the concentrations of products ought to increase monotonically. The unlikely features are removed from the profiles by applying constraints such as non-negativity and unimodality. As a general rule, the solution improves as the number of constraints increases. After the concentration profiles have been adjusted according to the constraints, eq 5 is used to calculate the spectral profiles corresponding to the adjusted concentration profiles. Again, the spectral profiles are adjusted according to the constraints. This cycle continues until convergence, which means that the calculated profiles satisfy all constraints.
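The alternation between eqs 4 and 5 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `als`, the use of simple clipping to enforce non-negativity, the pseudoinverse in place of the plain inverse, and the stopping rule based on the change in the residual norm are all choices made for this example. The starting spectra are taken from two rows of simulated data where one component dominates, mimicking estimates from selective regions.

```python
import numpy as np

def als(X, S0, n_iter=100, tol=1e-8):
    """Alternate between eq 4 and eq 5, clipping negative values after each step."""
    S = S0.copy()
    previous = np.inf
    for _ in range(n_iter):
        C = X @ S @ np.linalg.pinv(S.T @ S)          # eq 4: C = X S (S^T S)^-1
        C = np.clip(C, 0.0, None)                    # non-negativity constraint on C
        S = (np.linalg.pinv(C.T @ C) @ C.T @ X).T    # eq 5: S^T = (C^T C)^-1 C^T X
        S = np.clip(S, 0.0, None)                    # non-negativity constraint on S
        residual = np.linalg.norm(X - C @ S.T)       # norm of E = X - C S^T
        if abs(previous - residual) < tol:
            break
        previous = residual
    return C, S, residual

# Simulated two-component data (illustrative values only)
rng = np.random.default_rng(3)
t = np.arange(50)[:, None]
w = np.arange(100)[:, None]
C_true = np.exp(-0.5 * ((t - np.array([20, 32])) / 5.0) ** 2)
S_true = np.exp(-0.5 * ((w - np.array([35, 65])) / 10.0) ** 2)
X = C_true @ S_true.T + 1e-3 * rng.standard_normal((50, 100))

# Starting spectra: rows 12 and 40, where one component dominates the signal
S0 = X[[12, 40], :].T
C, S, residual = als(X, S0)
```

The resolved profiles are only determined up to a scale factor per component, so `C` and `S` should be compared with the true profiles by shape (for example by correlation) rather than by absolute value.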
ALS is far from the only available method for curve resolution. It is possible to divide the plethora of methods into two main categories: iterative and direct methods. The iterative methods, of which ALS is one, are usually easier to use. Furthermore, one arrives at a solution faster than is the case with the direct methods. As the term implies, the solution estimates are iteratively improved during the analysis until one hopefully arrives at an acceptable solution. As mentioned above, the employment of constraints that adjust the intermediate profiles so that they appear in accordance with our knowledge of the shape of a proper profile is crucial to the success of these methods. Another well-known iterative method is iterative target transformation analysis (20, 21). It differs from ALS in that it iteratively refines only the chromatographic profiles and avoids estimating the spectral profiles until the chromatographic profiles are acceptable. In cases where the spectra of the analytes are similar, this may be a better approach. For further information on iterative methods, the interested reader is referred to the literature.

A problem with the iterative methods is that the use of the common constraints of unimodality and non-negativity is not enough to ensure that the correct solution is found. There exist an infinite number of sets of profiles, all fitting the data equally well, that fulfill these constraints. This is referred to as the fundamental uncertainty (22). The band of solutions can be narrowed by employing more constraints, but a unique solution is often difficult to obtain through the use of iterative methods alone.

The other main group of methods is the direct methods. As opposed to the iterative methods, these focus on local parts of the data, utilizing selectivity, local rank information, and so forth to set up a set of equations. Solution of these equations leads to a unique and correct set of profiles, provided that the local rank information is correct. The ETA procedure explained earlier is often used as a tool to obtain the rank maps necessary for successful resolution. Herein lies the problem with this group of methods. Obtaining this information with a sufficient degree of quality is far from trivial, and the task is often time consuming.
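The fundamental uncertainty mentioned above follows from the fact that CSᵀ = (CT)(T⁻¹Sᵀ) for any invertible matrix T. The sketch below gives one numerical instance; the profiles and the particular matrix `T` are assumed values chosen so that the transformed profiles remain non-negative.

```python
import numpy as np

t = np.arange(30)[:, None]
w = np.arange(80)[:, None]
C = np.exp(-0.5 * ((t - np.array([10, 18])) / 4.0) ** 2)          # N x 2, non-negative
S = 0.3 + np.exp(-0.5 * ((w - np.array([30, 50])) / 8.0) ** 2)    # M x 2, overlapping spectra

# Any invertible T gives another pair of profiles with the same product
T = np.array([[1.0, 0.2],
              [0.0, 1.0]])
C_alt = C @ T                       # new concentration profiles
S_alt = S @ np.linalg.inv(T).T      # new spectral profiles, so that S_alt^T = T^-1 S^T

same_fit = np.allclose(C @ S.T, C_alt @ S_alt.T)
```

Both sets of profiles are non-negative and reproduce the data matrix exactly, so non-negativity alone cannot decide between them; additional constraints or selectivity information are needed.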
User expertise plays an important role when applying these methods. Examples of such methods are heuristic evolving latent projections (4, 14), evolving factor analysis (7, 8), and sub window factor analysis (23). It is beyond the scope of this article to present a thorough overview of the field. The interested reader may consult a recent review article by Jiang et al. (24).

The best approach to curve resolution is usually to master a range of techniques, as no method is able to solve all problems. Often, the best result is obtained by combining several methods. For example, a direct method based on rank mapping only produces the correct results if the underlying rank map is correct. Still, some or all of the resolved profiles from such a resolution may be used as a starting point for an iterative method. Recently, there has been some interest in developing fully automated curve resolutions of huge data sets. An example of this approach can be found in the literature (25, 26).

Experimental
Pure 1-octanol and carbon tetrachloride (Merck p.a.) were stored on molecular sieves (3–4 Å zeolite) to remove trace water. A series of 23 solutions was prepared from the pure liquids, and the spectra were recorded in three different cells. For the first 9 spectra, representing the concentration range 0.0016–0.014 M, a 10 mm Infrasil quartz cell was used. For the next 5 spectra, representing the concentration range
0.016–0.050 M, a cell with CaF₂ windows and a 2 mm spacer was used. For the last 9 spectra, representing the concentration range 0.063–0.197 M, a cell with CaF₂ windows and a 0.5 mm spacer was used. A Nicolet Magna-IR 860 E.S.P. instrument, set to 32 scans/spectrum at an optical resolution of 2 cm⁻¹ and Happ–Genzel apodization, was used. The experiments were carried out at ambient temperature (22 ± 1 °C), and dried air was used as purge gas. In an attempt to reproduce the instrument purging and minimize temperature fluctuations, the cells were kept in the sample compartment for 4.5 min prior to scanning. After ratioing the sample spectra against background spectra of dried air, residual water vapor bands in the spectra were removed by subtraction of a pure spectrum of water vapor. The amount of water vapor to subtract was found by trial and error using the built-in subtract function in the Omnic software (Thermo Nicolet Corporation). When the baseline and leading edge of the OH stretching band in the resulting spectrum appeared smooth and even, it was saved and used for further analysis. This procedure was repeated for all 23 spectra when necessary.
Figure 3. The 23 spectra of 1-octanol in the investigated concentration range.
Results and Discussion
The theory section uses hyphenated chromatography data to explain curve resolution. The authors believe that this type of data is the best choice when explaining the techniques. Performing curve resolution within the context of a college or university course, however, can also be done using data from spectroscopy alone. Because stand-alone spectrometers are more common in educational institutions than hyphenated instrumentation, a dilution series measured using infrared spectroscopy is presented here. Infrared spectra of 1-octanol in carbon tetrachloride were measured and the data were collected in a data matrix. Each row in the data matrix represents an absorption spectrum of the alcohol in the carbon tetrachloride solution. As one moves down through the matrix, the absorption spectra represent data from successively increasing concentration of the alcohol.

All data analysis should start with an examination of the raw data available. The OH-stretching bands of 1-octanol at 23 different concentrations of the alcohol are shown in Figure 3. Three major absorption bands are seen in the spectra. At low alcohol concentrations only the free OH-stretching absorption band, near 3640 cm⁻¹, is present. As the alcohol concentration increases, a new band appears at about 3500 cm⁻¹ (2, 27). At even higher alcohol concentration, a second broad hydrogen-bonded OH-stretching band arises at about 3330 cm⁻¹. It increases rapidly in intensity with increasing concentration of alcohol, reducing the 3500 cm⁻¹ band to a shoulder (2, 27).

Correct estimation of the chemical rank is important for the technical aspect of the curve resolution, but it also contains information about a fundamental property of the system. The number of uncorrelated OH stretching frequencies (i.e., the chemical rank) describes the number of alcohol species (monomer, dimer, oligomer, etc.) present. The starting point of any curve resolution is therefore chemical rank analysis.
Figure 4. First four loadings in the wavenumber direction.

In an educational setting, the students must learn to emphasize the importance of a correct determination of the chemical rank. The first four loading vectors from a global PCA on all spectra are presented in Figure 4. The first three loadings are smooth curves. On the fourth loading vector, a high frequency signal is superimposed on the main features throughout the full OH-stretching region. High frequency signals are indicators of noise. The fourth loading vector explains 0.001% of the variation in the spectra, and at least 8 loading vectors are needed to get a noisy vector without any structure. The complex behavior of the system makes it difficult to determine the chemical rank from a global PCA on the spectra, and some experience is needed to estimate a reasonable rank based on this method. Apparently, the sudden increase in uncorrelated noise in the fourth loading vector
represents a limit for the rank determination. However, local rank analysis methods are normally needed to confirm the results obtained from global methods. A preliminary conclusion is that we have a three-component system, although a four-component system is a possibility. The result from the ETA method in the wavenumber direction is shown in Figure 5A. It is easily seen by comparing this plot to the plots shown in Figure 2 that the analysis is more complicated for data without a separation direction. First, in spectral data the analytes do not appear and disappear in the regular fashion characteristic of chromatographic data. Second, the spectra contain heteroscedastic and correlated noise. These features give rise to additional seemingly significant eigenvalues in the ETA plot. In particular, the first eigenvalue rises above the baseline level both at the high and low frequency regions. Even so, it is still possible to use the ETA plot to discover regions in which the spectra are dominated mainly by one component. However, validation of the ETA interpretations becomes crucial owing to these effects. A trial-and-error procedure, where the complete resolution is carried out using various numbers of components and regions, is usually necessary. Incorrect rank estimations are usually disclosed because the resolution process is unable to offer acceptable stable solutions. For the data set investigated here, several combinations of two to four candidate regions of different widths were tested. Resolution using four components gave no meaningful spectral results, thus indicating an overestimation of the chemical rank. By using only two components, the pure spectrum of the monomer could not be resolved. Thus, three components were assumed to be the proper choice. Figure 5A displays the three regions, labeled (a), (b) and (c), from where estimates of the concentration profiles were found.
Figure 5. Presentation of rank analysis and profile estimation. (A) Eigenstructure tracking analysis in the wavenumber direction with a window size of four; the figure shows the logarithm of the eigenvalues. Three selective regions are labeled (a) (around 3650 cm⁻¹), (b) (around 3540 cm⁻¹), and (c) (around 3230 cm⁻¹). (B) The latent projective graph, in which these regions map as straight-line segments. (C) Estimates of three concentration profiles found from the selective regions, giving (D) the estimates of the spectral profiles.

Estimates from these regions were used as starting points in the iterative ALS refining of the concentration profiles. In addition, the spectrum of the most diluted octanol solution was assumed to be that of the pure monomer, as only an insignificant degree of alcohol self-association was assumed to have taken place at this concentration. The monomer spectrum was therefore constrained to be similar to this measured spectrum in the resolution process. The first selective region, labeled (a), is observed at about 3650 cm⁻¹. The second region, labeled (b), is observed at about 3540 cm⁻¹. The last selective region, labeled (c), is found at about 3230 cm⁻¹. In the area between the selective regions, two components seem to be present. All three selected regions are found as straight-line segments in the LPG displayed in Figure 5B. If we study the spectra, the ETA plot, and the LPG, we can see that the same component is certainly not present in all selective regions. This supports the indication that we have a three-component system. Thus, matrices with chemical rank equal to three form the basis for further quantitative analysis of the spectral data. From the three seemingly selective regions revealed by ETA, we generate estimates of the concentration profiles of the unassociated and associated species. This is shown in Figure 5C. The initial concentration profiles give the initial spectral profiles as shown in Figure 5D. To independently refine the profiles, eq 2 is solved for C and S using ALS. Only the constraint of not allowing negative parts in any of the profiles was applied. In this spectroscopic example, eq 2 is a simplified version of the Beer–Lambert law (A = εcl), where A is the absorbance at a single wavenumber, ε is the extinction coefficient, c is concentration, and l is the cell thickness. The rewriting is done by combining the unknown ε and the known l into the new parameter S and replacing A at all wavelengths with X. Thus, the quantitative information in C will be based on information from all wavelengths, not just a single wavelength as in classical quantitative spectroscopic analysis. The use of all wavelengths in the calculation of the concentrations can be regarded as using hundreds of univariate replicates.

Figure 6. The resolved spectra for the three different alcohol species. (The species are shown in Figure 7.)

The curve resolution algorithm converged after only four rounds of iterations (the residuals E defined by E = X − CSᵀ reached a minimum after this number of iterations). The resolved spectra for all three components are shown in Figure 6. The first component represents the OH stretching frequencies in free alcohol monomers and thus displays the sharp band at 3638 cm⁻¹. This is due to the stretching
vibration of the bond labeled a in Figure 7. The second component displays two bands. The broad and highly asymmetric band at about 3500 cm⁻¹ is due to the bonds labeled b and c in Figure 7. The smaller one at 3623 cm⁻¹ stems from the stretching vibration of bond d. This component represents open-chain aggregates. In an open-chain aggregate, the OH-stretching frequencies can be divided into two major groups. The free-end molecule (bond d) contributes to the absorption in a high-frequency region near the absorption originating from free alcohol monomers (bond a). The hydrogen-bonded molecules absorb at lower frequencies. In the molecules inside the oligomeric chain (bond c), both the hydrogen and oxygen atoms are hydrogen-bonded. Thus, the absorption band at 3623 cm⁻¹ originates from the free-end molecule in an open-chain aggregate, while the broad band at 3500 cm⁻¹ represents the rest of the molecules in that aggregate. The asymmetry and the broadening of the band may be explained by the various frequencies originating from molecules at positions near the bound end (bond b) or inside the aggregate. We are now left with the broad and more or less symmetric absorption band at about 3300 cm⁻¹. This band originates from hydroxyl groups inside the cyclic structures and corresponds to the bonds labeled e in Figure 7 (2, 27).
The resolved concentration profiles give the relative increase in the species concentrations and not the real molar concentrations. An association model must be adopted to determine the real molar concentration profiles. Several association models have been described in the literature (1, 27–32). The more rigorous models describing a successive association process require information about the size and the size distribution of the alcohol species to be usable. Owing to the lack of this type of information, and to make further progress, we adopt the single-parameter model described by Nodland (2).
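As a rough illustration of what a monomer/n-mer association model looks like, the relations below sketch a single equilibrium between free monomers and one aggregate of size n. The symbols (A_1, A_n, K_n, c_0, α) are assumptions chosen for this illustration, not notation taken from ref 2; how the single parameter is defined, and whether n is fixed or fitted, should be taken from that reference.

```latex
% Assumed single equilibrium between monomers A_1 and one type of n-mer A_n
n\,\mathrm{A}_1 \rightleftharpoons \mathrm{A}_n,
\qquad
K_n = \frac{[\mathrm{A}_n]}{[\mathrm{A}_1]^{\,n}}

% Mass balance counted in monomer units (c_0 = gross alcohol concentration)
c_0 = [\mathrm{A}_1] + n\,[\mathrm{A}_n]
    = [\mathrm{A}_1] + n\,K_n\,[\mathrm{A}_1]^{\,n}

% Degree of association: fraction of alcohol molecules bound in aggregates
\alpha = \frac{c_0 - [\mathrm{A}_1]}{c_0}
```

Under these assumptions, a relative monomer profile from the curve resolution can be placed on a molar scale once the mass balance is required to reproduce the known gross concentration c_0 of each solution.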
This model describes the association between monomers and one type of n-mer and is used to determine the molar concentration profiles shown in Figure 8. Note that the molar concentration profiles of the different alcohol species are based on alcohol monomers and not on the species themselves. Further details on calculating the degree of association and equilibrium constants can be found in the literature (1, 2, 27).
Figure 7. Illustration of the alcohol structures.
Figure 8. The predicted molar concentration profiles of the monomers, the open-chain oligomers, and the cyclic oligomers.
Conclusion
Curve resolution is a new, powerful tool for the analytical chemist. The multivariate nature of the methods enables qualitative analysis that is impossible with univariate methods, and it gives quantitative information of better accuracy and precision. The best way to learn these techniques is by combining theoretical lectures with practical work on real data obtained in a laboratory. Thus, both chromatography data, as described in the Theory section, and reaction monitoring data, as shown in the Results and Discussion section, may be suitable for practicing multivariate curve resolution.
A course on multicomponent resolution techniques has been taught for several years. The course consists of three major parts: (i) a series of traditional lectures presents the theory; (ii) a set of computer exercises on carefully selected data introduces the students to the practical aspects of curve resolution; and (iii) two group exercises where the students employ the methods on real data sets and submit written reports. Scientific articles are used as the written curriculum, but a successful course is heavily dependent on proper software being available. The software package Xtricator (Pattern Recognition Systems AS, Bergen, Norway) is used by all students attending the course. A demo version of this software can be downloaded from the Web (33).
Students (and teachers) with a minimum of programming skills can easily program several curve resolution methods themselves, for example, in the MATLAB (The MathWorks, Natick, MA) environment. In our experience, trying to program a method described in an article (or taught in the lecture room) is the best test of whether one has truly understood it. To encourage such fundamental understanding, the course includes one day dedicated to introducing the students to MATLAB. Students are also encouraged to solve the mandatory exercises using self-written code (which is then submitted and evaluated). Free curve resolution MATLAB code is available on the Internet. As an example, the Group of Solution Equilibria
and Chemometrics at the Analytical Chemistry Department at the University of Barcelona has published curve resolution software (34). Usage of this code requires a valid MATLAB license. To further aid the interested reader, we have provided in the Supplemental Material three variants of MATLAB code performing alternating regression. The variants differ in the way the starting estimates of the profiles are found. The interested reader may of course modify and use the files freely.
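For readers without a MATLAB license, the idea behind alternating regression can be sketched in a few lines of plain Python. The example below is not the Supplemental Material code and is not taken from any published package; it is a minimal illustration that assumes the bilinear model X = CSᵀ + E, a noise-free synthetic two-component data set, starting concentration estimates read from two selective wavenumbers, and non-negativity imposed by simply clipping negative values to zero (a crude stand-in for proper non-negative least squares). All function and variable names are hypothetical.

```python
def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Plain matrix product a @ b for lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a @ x = b (a square) by Gauss-Jordan elimination with partial pivoting.

    Raises ZeroDivisionError if a is singular.
    """
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def lstsq_nonneg(a, y):
    """Least-squares solution b of a @ b ~ y, with negative entries clipped to zero."""
    at = transpose(a)
    b = solve(matmul(at, a), matmul(at, y))
    return [[max(v, 0.0) for v in row] for row in b]


def als(x, c_start, n_iter=4):
    """Alternate between estimating spectra S and concentrations C for X ~ C S^T."""
    c = c_start
    for _ in range(n_iter):
        st = lstsq_nonneg(c, x)             # S^T, given the current C
        s = transpose(st)
        ct = lstsq_nonneg(s, transpose(x))  # C^T, given the current S
        c = transpose(ct)
    fit = matmul(c, st)
    rss = sum((xv - fv) ** 2 for xr, fr in zip(x, fit) for xv, fv in zip(xr, fr))
    return c, s, rss


# Synthetic example: 4 samples, 3 wavenumbers, 2 species (values are made up).
c_true = [[1.0, 0.0], [0.8, 0.1], [0.5, 0.3], [0.2, 0.5]]
s_true = [[1.0, 0.0], [0.2, 0.8], [0.0, 0.5]]  # rows = wavenumbers
x = matmul(c_true, transpose(s_true))

# Wavenumbers 0 and 2 are selective, so their absorbances serve as starting C.
c_start = [[row[0], row[2]] for row in x]

c_est, s_est, rss = als(x, c_start)
print("residual sum of squares:", rss)
print("estimated concentration profiles:", c_est)
```

In this sketch the second concentration profile comes out at half its true value while the matching spectrum is doubled, and the fit to X is unaffected. This is the scale ambiguity that makes resolved profiles relative rather than molar.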
Supplemental Material
Three variants of MATLAB code performing alternating regression are available in this issue of JCE Online.
Literature Cited
1. Frolich, H. J. Chem. Educ. 1993, 70, A3–A6.
2. Nodland, E. Appl. Spec. 2000, 54, 1339–1349.
3. Hirschfeld, T. Anal. Chem. 1980, 52, A297.
4. Liang, Y. Z.; Kvalheim, O. M.; Keller, H. R.; Massart, D. L.; Kiechle, P.; Erni, F. Anal. Chem. 1992, 64, 946–953.
5. Wold, S.; Esbensen, K.; Geladi, P. Chemom. Intell. Lab. Syst. 1987, 2, 37–52.
6. Lorber, A. Anal. Chem. 1986, 58, 1167–1172.
7. Maeder, M. Anal. Chem. 1987, 59, 527–530.
8. Maeder, M.; Zuberbuehler, A. D. Anal. Chim. Acta 1986, 181, 287–291.
9. Keller, H. R.; Massart, D. L. Anal. Chim. Acta 1991, 246, 279–290.
10. Geladi, P.; Wold, S. Chemom. Intell. Lab. Syst. 1987, 2, 273–281.
11. Toft, J.; Kvalheim, O. M. Chemom. Intell. Lab. Syst. 1993, 19, 65–73.
12. Gerritsen, M. J. P.; Faber, N. M.; van Rijn, M.; Vandeginste, B. G. M.; Kateman, G. Chemom. Intell. Lab. Syst. 1992, 12, 257–268.
13. Keller, H. R.; Massart, D. L.; Liang, Y. Z.; Kvalheim, O. M. Anal. Chim. Acta 1992, 267, 63–71.
14. Liang, Y. Z.; Kvalheim, O. M. Anal. Chem. 1992, 64, 936–946.
15. Lawton, W. H.; Sylvestre, E. A. Technometrics 1971, 13, 617–633.
16. Karjalainen, E. J. Chemom. Intell. Lab. Syst. 1989, 7, 31–38.
17. Tauler, R.; Casassas, E. Chemom. Intell. Lab. Syst. 1992, 14, 305–317.
18. de Juan, A.; van den Bogaert, B.; Cuesta Sánchez, F.; Massart, D. L. Chemom. Intell. Lab. Syst. 1996, 33, 133–145.
19. Grande, B. V.; Manne, R. Chemom. Intell. Lab. Syst. 2000, 50, 19–33.
20. Gemperline, P. J. Chem. Inf. Comput. Sci. 1984, 24, 206–212.
21. Vandeginste, B. G. M.; Derks, W.; Kateman, G. Anal. Chim. Acta 1985, 173, 253–264.
22. Kruskal, J. B. Proc. Symp. Appl. Math. 1983, 28, 75–104.
23. Manne, R.; Shen, H. L.; Liang, Y. Z. Chemom. Intell. Lab. Syst. 1999, 45, 171–176.
24. Jiang, J. H.; Liang, Y. Z.; Ozaki, Y. Chemom. Intell. Lab. Syst. 2004, 71, 1–12.
25. Shen, H. L.; Grung, B.; Kvalheim, O. M.; Eide, I. Anal. Chim. Acta 2001, 446, 313–328.
26. Eide, I.; Neverdal, G.; Thorvaldsen, B.; Shen, H. L.; Grung, B.; Kvalheim, O. M. Environ. Sci. Tech. 2001, 35, 2314–2318.
27. Førland, G. M.; Liang, Y. Z.; Kvalheim, O. M.; Høiland, H.; Chazy, A. J. Phys. Chem. 1997, 101, 6960–6969.
28. Gupta, R. B.; Brinkley, R. L. AIChE J. 1998, 44, 207–213.
29. Førland, G. M.; Libnau, F. O.; Kvalheim, O. M.; Høiland, H. Appl. Spec. 1996, 50, 1264–1272.
30. Brot, C. J. Mol. Struct. 1991, 250, 253–257.
31. Hoffman, T. Fluid Phase Equilibria 1990, 55, 271–292.
32. Tucker, E. E.; Farnham, S. B.; Christian, S. D. J. Phys. Chem. 1969, 73, 3820–3829.
33. Pattern Recognition Systems. http://www.prs.no (accessed Mar 2007).
34. Multivariate Curve Resolution. http://www.ub.es/gesq/mcr/als2004.htm (accessed Mar 2007).
Journal of Chemical Education • Vol. 84 No. 7 July 2007 • www.JCE.DivCHED.org