Anal. Chem. 1983, 55, 1694-1703
Nonlinear Multicomponent Analysis by Infrared Spectrophotometry

Mark A. Maris and Chris W. Brown*
Department of Chemistry, University of Rhode Island, Kingston, Rhode Island 02881

Donald S. Lavery
Donald Lavery & Associates, Inc., 4225 Ruskin Street, Houston, Texas 77005
Matrix calculations are evaluated for least-squares regression analyses of multicomponent spectrophotometric calibration data using absorbance as the independent variable. Transformations of the absorbance data are investigated as means for modeling data exhibiting severe deviations from the Beer-Lambert law. Regression analyses have been performed on several sets of data from low-resolution vapor-phase infrared spectra of light alkanes. Accuracies of the analyses were compared after varying the following regression parameters: number of standard samples, number and spacing of analytical frequencies, and mathematical transformations applied to the raw absorbance matrix. In general, the results showed that for these data a power series of the absorbance data gives significant improvements in accuracy over a simple linear model. Addition of analytical frequencies improves the results of the two-component analyses. Overdetermination of the regression by using additional standard mixtures adds greatly to the accuracy of most analyses. If the correct equation is found for fitting the observed data, the optimum number of standard mixtures can be predicted for any individual analysis.
Quantitative spectrophotometric analysis is based on the Beer-Lambert law. Chemical systems obeying this linear relationship between concentration and absorbance have been quantified accurately for many years. Problems in quantitative spectrometry arise when dealing with chemical data which show deviations from Beer's law. Fortunately, methods
now exist which, in many cases, can easily deal with such problems.

The simplest application of the Beer-Lambert law is graphical. A calibration plot can be created for a single component by graphing chemical concentration vs. absorbance at a single analytical frequency. For multiple components measured at several frequencies this technique rapidly becomes cumbersome, especially if interferences exist between components. For this reason multicomponent quantitative spectrometry was not truly practical until the advent of modern computer hardware.

Most investigations have retained the traditional formulation of data with absorbance represented as a linear function of concentration. For many multicomponent systems, linear calibrations restrict the analyst to a narrow region of concentration of one or more of the chemical components. In addition, solving the problem using absorbance as the independent variable may offer definite advantages when dealing with data showing severe deviations from Beer's law.
0003-2700/83/0355-1694$01.50/0 © 1983 American Chemical Society
ANALYTICAL CHEMISTRY, VOL. 55, NO. 11, SEPTEMBER 1983

THEORY

In its simplest form, the Beer-Lambert law for one component is

$$A = abc \tag{1}$$

where $A$ is the absorbance, $a$ the absorptivity, $b$ the pathlength, and $c$ the concentration. When the pathlength is constant, the terms $a$ and $b$ are usually combined to give a single proportionality constant. A modification of the Beer-Lambert law can be written as

$$A = k_1 c + k_0 \tag{2}$$

where $k_1$ is the combined proportionality constant (the slope) and $k_0$ is an intercept term. This equation can be used to fit linear data with a constant background absorbance or to approximate nonlinear data over a narrow concentration region of interest.

When more than one component is present, the Beer-Lambert law becomes a series of simultaneous equations. Absorbances at each of the analytical frequencies are represented as the sum of the proportionate component concentrations. For two components, this can be written as

$$\begin{aligned} A_1 &= k_{11}c_1 + k_{12}c_2 \\ A_2 &= k_{21}c_1 + k_{22}c_2 \end{aligned} \tag{3}$$

where $A_i$ is the absorbance at the ith frequency, $c_j$ is the concentration of the jth component, and $k_{ij}$ is the proportionality constant. This can be expressed in matrix notation as

$$\begin{bmatrix} A_1 \\ A_2 \end{bmatrix} = \begin{bmatrix} k_{11} & k_{12} \\ k_{21} & k_{22} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} \tag{4}$$

or

$$A = KC \tag{5}$$

While the matrix representation is unnecessary for a problem with only two equations, it greatly simplifies the processing of larger sets of data. More thorough discussions of matrix representations of spectroscopic data may be found elsewhere (1-4).

In setting up a quantitative spectrometric analysis for a multicomponent system, absorbances at f specified frequencies are measured for a set of m standard samples. These standard samples may each contain one or more of the n components of interest. It then becomes a problem of relating the collected absorbance data to the known concentrations in the samples. Either absorbance or concentration may be used as the independent variable in seeking a solution to the problem. Concentration has generally been used as the independent variable (1-3), as in the Beer-Lambert law expressed in matrix notation

$$A = KC \tag{6}$$

Here, A is an f by m matrix containing the measured absorbance data for each frequency and standard mixture, C is an n by m matrix containing the known concentrations, and K is an f by n matrix containing the proportionality constants. A simple least-squares solution for K is

$$K = AC^t(CC^t)^{-1} \tag{7}$$

where $C^t$ is the transpose of C. The K matrix cannot be used directly to determine analyte concentrations in an unknown sample. It must be converted to a form such as

$$C = MA \tag{8}$$

where M is an n by f matrix. For f = n, K is square and may be simply inverted to give M. The same is true for a "nonzero intercept" fit if f = n + 1 and K is still square. Both of these problems are exactly determined and have the correct amount of data to solve for K. In a case overdetermined with respect to analytical frequencies, i.e., f > n + 1, K is no longer square and a solution for M is now

$$M = (K^tK)^{-1}K^t \tag{9}$$

Two major disadvantages can be seen in solving for M via this "K-matrix" method. To create a nonzero intercept fit, an extra row of 1's is added to the C matrix, an absorbance at an additional frequency must be measured, and an extra standard is necessary. This makes K an f by (n + 1) matrix, and M will have dimensions of (n + 1) by f, where f ≥ (n + 1). This implies that one entire row of M is useless in determining the concentrations of the n components, but it must still be calculated.

The second disadvantage is less obvious. In setting up an analysis, it may often be an advantage to overdetermine the problem in terms of the number of analytical frequencies and standards. The extra information gained and the degrees of freedom added to the regression analysis combine to give increased predictive value to the fit of the data (3). Since eq 9 must be used to solve for M in this case, it is adding the equivalent of a second least-squares analysis to the procedure, increasing the complexity of the calculations and the possibility of round-off error by the computer (1).

If absorbance is used as the independent variable, the problem is simplified. Here, the system is expressed as

$$C = PA \tag{10}$$

where P is an n by f matrix. The straightforward solution for P by a least-squares multiple regression is

$$P = CA^t(AA^t)^{-1} \tag{11}$$
and the P matrix may be used directly, as in eq 10, for analysis of an unknown sample (1).

There are a couple of inherent disadvantages to this approach, the first being that it is not intuitively obvious. The Beer-Lambert law as expressed in eq 6 is based on the easily visualized model of several components additively producing a resultant absorption spectrum. The terms in the K matrix may then be looked upon as the absorptivities per unit length of the components at the various frequencies. In calculating the P matrix, the physical analogy is much more difficult to comprehend. Here, the measured absorbances for each mixture are treated as a vector in f-dimensional space. The "direction" of the vector, defined by the relative absorbances, is indicative of the uniqueness of the spectrum and the relative chemical composition of the sample. The "length" of the vector is indicative of the sum of the concentrations of the various components.

The second disadvantage deals once again with the overdetermination of the analysis with respect to the number of frequencies and standards. To solve for P, it is always necessary that m ≥ f. So as f increases, the number of standard samples must also be increased. In setting up the standard samples for the P-matrix approach, mixtures of the various components are generally used (1). As the number of standards increases, the complexity of the sample preparation increases greatly. It then becomes a trade-off between the tedium of preparing extra sample mixtures and the accuracy which may be gained by overdetermination.

There are several major advantages to using absorbance as the independent variable. The first is that the solution for the P matrix requires fewer steps, giving a simpler algorithm. Also, in the performance of the regression analysis of eq 11, each row of the P matrix is determined independently of every other row.
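The P-matrix calculation can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: it takes the least-squares solution to have the usual form P = C Aᵗ (A Aᵗ)⁻¹, the helper names (`transpose`, `matmul`, `inverse`, `solve_p`) are hypothetical, and the absorptivities and concentrations are made-up, exactly linear data rather than values from this work.

```python
# Minimal sketch of a P-matrix calibration, C = P A.
# Matrices are lists of rows; every name and number here is illustrative.

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def inverse(X):
    """Gauss-Jordan inversion with partial pivoting (square X only)."""
    n = len(X)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(X)]
    for col in range(n):
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def solve_p(C, A):
    """Assumed form P = C A^t (A A^t)^-1, with C (n x m), A (f x m), m >= f."""
    At = transpose(A)
    return matmul(matmul(C, At), inverse(matmul(A, At)))

# Synthetic data: 2 components, 2 frequencies, 4 standards.
K_true = [[0.010, 0.002],
          [0.003, 0.008]]            # f x n, made-up absorptivities
C_std = [[10.0, 0.0, 20.0, 30.0],    # n x m known concentrations
         [0.0, 15.0, 5.0, 25.0]]
A_std = matmul(K_true, C_std)        # f x m absorbances from A = K C

P = solve_p(C_std, A_std)
unknown_A = matmul(K_true, [[12.0], [7.0]])   # absorbances of an "unknown"
predicted = matmul(P, unknown_A)              # concentrations from C = P A
```

Because each row of P comes from the corresponding row of C alone, calling `solve_p` with a single-row C gives the coefficients for one analyte without the concentrations of the others.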
This implies that P coefficients for one component may be solved for, and used, without using the concentrations of any of the other components in the calculations. The other (n - 1) components in such a case would be included in the standard samples merely as likely interferences, but not themselves as analytes of interest. This approach cannot be used with the K-matrix method.

The most interesting advantage in the P-matrix technique is that it solves for the concentration of the components of interest as a function of some measurable quantity. In most spectroscopic analyses, these measured quantities are unmodified absorbance values; however, the algebra of the P-matrix method allows for any mathematical transformation
to be made on the absorbance terms prior to solving for P. In such a case, the P matrix relates the analyte concentrations to these transformed absorbance terms. In the analysis of an unknown sample, the measured absorbances are also transformed in a consistent fashion, and the component concentrations may be determined as before. The only difference in setting up the analysis is that the number of standard mixtures must be greater than or equal to the number of transformed absorbance terms, t. If t > f, then the appropriate number of standards must be added to the system.

If the transformation is chosen correctly, the resulting model can be used to fit data which exhibit severe deviations from linearity. Barnett and Bartoli (4) suggested that higher-order powers of absorbance terms might be incorporated into a model to give an equation, for a single component, such as

$$c_{ik} = p_{i0} + \sum_{j=1}^{f}\sum_{q=1}^{r} p_{ijq}(A_{jk})^{q} \tag{12}$$

where $c_{ik}$ is the concentration of the ith component in the kth standard, $A_{jk}$ is the absorbance at the jth frequency in that standard, and r is the maximum specified power to which the data values will be raised. This results in a power series of the absorbance data and gives a distinctly curved calibration plot. Another possible model is one which incorporates cross terms, such as

$$c_{ik} = p_{i0} + p_{i1}A_{1k} + p_{i2}A_{2k} + p_{i3}A_{1k}A_{2k} + p_{i4}(A_{1k})^2 + p_{i5}(A_{2k})^2 \tag{13}$$
This model has been suggested (4) to provide a correction for intermolecular interactions in the form of the $p_{i3}A_{1k}A_{2k}$ term. Here, $A_{1k}$ and $A_{2k}$ are the absorbances measured at frequencies 1 and 2, respectively, for the kth standard. The applicability of such "curvilinear" models to spectroscopic data via the P-matrix approach will be demonstrated here. Calibration data with severe deviations from linearity will be examined, and comparisons for various models will be made.

EXPERIMENTAL SECTION

Apparatus. Vapor-phase infrared spectra of hydrocarbon mixtures were measured on a Beckman Model 4260 infrared spectrophotometer and acquired and stored on disk on a Data General NOVA minicomputer. A Wilks 20-m variable-path gas sample cell was used at a constant pathlength of 5.25 m. Samples were introduced into and flushed from the cell via a gas-handling system built in-house. This system is constructed of stainless steel and Teflon. Total volume of the sample system is approximately 5800 mL.

Procedure. Spectra of standard mixtures and simulated unknown samples were measured in the 3000-cm⁻¹ region, corresponding to the C-H stretching frequencies. Methane (U.H.P. grade, Matheson) and ethane and propane (C.P. grade, Matheson) were used. Samples were injected by gas-sampling syringes (Precision Sampling) into the gas-handling system. Dry nitrogen was used as a diluent in the cell, and all spectra were measured with samples at atmospheric pressure. To induce a consistent nonlinearity in the calibration curves, the spectra were recorded at a resolution of 25 cm⁻¹. This resolution obliterated all fine structure in the spectra of the gases, leaving only relatively broad peaks.

Concentrations for each component ranged from parts-per-million to parts-per-thousand levels. In addition to the standard samples, sets of simulated "unknown" samples were prepared in the same manner to test the performance of the analyses.
Unknown samples were prepared for single-component (ethane) and two-component (ethane/propane) systems. The unknown set for ethane consisted of ten samples in the calibration range and one sample outside the range. Two unknown sets were prepared for the two-component case. The set designated EP10U was made up of 21 samples in the calibration regions of interest. The set EP11U consisted of six samples outside of the calibration region.
Table I. Models for Fitting Absorbance Data (a)

LN, "Linear, Nonzero Intercept"
    $c = p_0 + p_1A_1 + p_2A_2$

QN, "Quadratic, Nonzero Intercept"
    $c = p_0 + p_1A_1 + p_2A_2 + p_3A_1^2 + p_4A_2^2 + p_5A_1A_2$

PNr, "Power Series, Nonzero Intercept, Order r"
    $c = p_0 + p_1A_1 + p_2A_2 + \cdots + p_{t-2}A_1^r + p_{t-1}A_2^r$

LZ, QZ, PZr
    These are "zero-intercept" versions of the three models shown above, omitting $p_0$ in each case.

(a) Only two frequencies used for clarity.
Concentration values used for standard mixtures and simulated "unknowns" are listed in Chart I.

Computer Techniques. The quantitative analysis programs and procedures described here were developed on the Data General NOVA 3/12 minicomputer. All programs were written in extended FORTRAN IV. The program package consists of several program units stored on disk and "swapped" in and out of memory on demand. This arrangement saves memory space and allows each program unit to be highly versatile. The absorbance and concentration data for a sample, or a set of samples, are stored as a data "file" on disk. Once an analysis matrix is calculated, it may also be stored for later use. Absorbance data may be entered manually via the computer keyboard, from spectra previously acquired and stored on disk, or directly from the Beckman 4260 spectrophotometer "on the fly". Concentration data are entered manually.

Analyses may be calculated by using either the "K-matrix" or "P-matrix" conventions. In the P-matrix option, the absorbance values may be transformed prior to solving for P. The following transformations can be used:

(1) Linear. The absorbance matrix is unmodified.

(2) Power series. Rows are added to the absorbance matrix, each element in a new row being a higher-order integer power of an element in an existing row. This gives an integer power series of the absorbance data. The order of the series is user definable.

(3) Quadratic. Rows of the existing matrix are multiplied together in every possible nonredundant combination. This gives second-order cross-product absorbance terms, as well as squared terms.

(4) "Nonzero intercept". This adds a row of 1's to the end of the absorbance matrix and may be applied to any of the above options as an additional transformation.

Abbreviations have been applied to these models and are listed in Table I. These abbreviations are primarily used in identification of computer files but will be used here for their convenience.
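The row transformations described above can be sketched as follows. This is a hedged illustration, not the FORTRAN package itself: the function names are hypothetical, the absorbance values are made up, and the order in which the added rows appear is an arbitrary choice that only fixes the order of the fitted coefficients.

```python
from itertools import combinations_with_replacement

# Absorbance matrix convention: one row per frequency, one column per standard.

def power_series(A, order):
    """Append rows holding the 2nd through order-th powers of each original row."""
    out = [list(row) for row in A]
    for q in range(2, order + 1):
        out += [[a ** q for a in row] for row in A]
    return out

def quadratic(A):
    """Append every nonredundant product of two rows (squares and cross terms)."""
    out = [list(row) for row in A]
    for i, j in combinations_with_replacement(range(len(A)), 2):
        out.append([a * b for a, b in zip(A[i], A[j])])
    return out

def nonzero_intercept(A):
    """Append a row of 1's so the fit gains an intercept term."""
    return [list(row) for row in A] + [[1.0] * len(A[0])]

# Two frequencies (rows) by three standards (columns); values are made up.
A = [[0.10, 0.20, 0.40],
     [0.05, 0.10, 0.30]]

pn2 = nonzero_intercept(power_series(A, 2))   # PN2: A1, A2, A1^2, A2^2, 1
qn = nonzero_intercept(quadratic(A))          # QN: A1, A2, A1^2, A1*A2, A2^2, 1
```

The number of rows in the transformed matrix is the quantity t, so a PN2 model at two frequencies needs at least five standards and a QN model at least six.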
The solution for the K or P matrix is a direct application of eq 7 or 11, respectively. The K matrix is converted to a usable form by direct use of eq 9. The matrix inversion algorithm is a standard Gauss-Jordan elimination technique, with a pivot value calculated to minimize round-off error. All matrix multiplications and inversions are performed in double-precision arithmetic, giving 17 significant decimal digits. The program can handle very large data sets due to its ability to store intermediate matrix calculations in disk "files". Currently, up to 70 analytical frequencies or 70 transformed absorbance terms may be used. Moreover, up to 10 components can be analyzed at one time. The number of standard samples used to build an analysis is limited only by disk space, due to "update"-style algorithms in which data from only one sample are used at a given time.

Analysis matrices were built for each of the following systems: (1) methane; (2) ethane; (3) propane; and (4) ethane/propane mixtures. Various combinations of the number and values of frequencies, number of standards, and types of transformations were applied to model the data. Analytical frequencies used to construct the analyses were varied to estimate the effect of these parameters on the quality of the results. Frequencies were chosen to fall near actual absorption maxima in some sets; in others, they were spaced evenly across the spectral region. The various frequency sets used to build the analyses are listed in Table II.
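The text does not say how the "update"-style algorithms are organized. One plausible reading, sketched here purely as an assumption, is that the two products needed for the P matrix, C Aᵗ and A Aᵗ, are accumulated as running sums of per-sample outer products, so only one standard has to be in memory at a time. All names and numbers below are hypothetical.

```python
def new_accumulators(n, t):
    """Zeroed running sums for C A^t (n x t) and A A^t (t x t)."""
    return [[0.0] * t for _ in range(n)], [[0.0] * t for _ in range(t)]

def update(CAt, AAt, c, a):
    """Fold one standard (concentrations c, absorbance terms a) into the sums."""
    for i, ci in enumerate(c):
        for j, aj in enumerate(a):
            CAt[i][j] += ci * aj
    for i, ai in enumerate(a):
        for j, aj in enumerate(a):
            AAt[i][j] += ai * aj

# Three made-up standards: (concentrations, transformed absorbance terms).
standards = [
    ([10.0, 0.0], [0.10, 0.03]),
    ([0.0, 15.0], [0.03, 0.12]),
    ([20.0, 5.0], [0.21, 0.10]),
]

CAt, AAt = new_accumulators(n=2, t=2)
for c, a in standards:          # in practice each sample would be read from disk
    update(CAt, AAt, c, a)
```

With this layout the storage needed is set by n and t only, which is consistent with the statement that the number of standards is limited by disk space rather than memory.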
Statistical Analysis. Several statistics may be used to describe the accuracy and precision of a multiple linear least-squares regression. A statistic which has been recommended (5) for use in reporting least-squares data is the standard error of the estimate, or Se. This statistic represents the degree of scatter of the true concentration values about the line (or plane) calculated as the best least-squares fit of the data. The value of the standard error is minimized as the representation of the data by the model is improved, and the confidence in the value rises as the number of standard samples is increased (5). In comparing regression results of a data set fitted by models with different numbers of terms, statistics should be adjusted for the difference in the number of degrees of freedom in the regressions (6). Statistics used for evaluation of results are summarized in Table III.

Table II. Frequency Sets Used in Analyses

set designation (a)    frequencies used in regression analyses, cm⁻¹
1A                     2980
2A                     2980, 2940
2B                     2980, 2975
3A                     3020, 2980, 2940
3B                     2980, 2975, 2945
4A                     3020, 2980, 2940, 2900
4B                     2980, 2975, 2945, 2940
5A                     3020, 2980, 2940, 2900, 2860
5B                     2980, 2975, 2945, 2940, 2880
6A                     3020, 2980, 2940, 2900, 2860, 2820
6B                     2980, 2975, 2945, 2940, 2880, 2860

(a) Sets designated "A" use frequencies evenly spaced across the hydrocarbon region. Sets designated "B" are located approximately at peak maxima for the individual components involved.

RESULTS AND DISCUSSION

Determination of Single Components. Typical spectra of methane, ethane, and propane at a spectral resolution of 25 cm⁻¹ are shown in Figure 1. The effects of band broadening by the low resolution are clear, and this results in marked nonlinearity in the calibration data. Results for the fits of the standard data for these compounds by different models are given in Table IV. From the relative errors, it is evident that each compound exhibits a different degree of nonlinearity under conditions of low resolution.

The effects of using more than one analytical frequency for the analysis are represented in Figure 2a, which illustrates the fit of two models to the standard samples for ethane.

Table III. Statistics Used to Evaluate Least-Squares Results
statistic and equation (a)

mean absolute deviation
    individual: $\frac{1}{m}\sum_{i=1}^{m} |c_i - \hat{c}_i|$
    total: $\frac{1}{n}\sum_{j=1}^{n} (\text{individual})_j$

standard error of estimate (Se)
    individual: $\left[\frac{\sum_{i=1}^{m}(c_i - \hat{c}_i)^2}{m - 1}\right]^{1/2}$
    total: $\left[\frac{\sum_{j=1}^{n}\sum_{i=1}^{m}(c_{ij} - \hat{c}_{ij})^2}{n(m - 1)}\right]^{1/2}$

adjusted standard error of estimate ($Se_a$)
    Substitute m - t for m - 1

coefficient of multiple determination ($R^2$)
    individual: $1 - \frac{\sum_{i=1}^{m}(c_i - \hat{c}_i)^2}{\sum_{i=1}^{m}(c_i - \bar{c})^2}$

adjusted coefficient of multiple determination ($R^2_a$)
    $1 - \frac{m - 1}{m - t}(1 - R^2)$

(a) Key: n is the number of components, m is the number of standard mixtures, t is the number of coefficients/component, $c$ is the "true" concentration, $\hat{c}$ is the estimated concentration, $\bar{c}$ is the mean true concentration.
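As a worked illustration of these statistics for a single component, the sketch below assumes the standard textbook definitions (sums of squared residuals divided by m - 1 or m - t, and R² as one minus the ratio of residual to total sum of squares). The function name and the five concentration values are hypothetical.

```python
from math import sqrt

def fit_statistics(true, est, t):
    """Return (mean abs deviation, Se, Se_a, R2, R2_a) for one component.

    true, est: true and estimated concentrations for the m standards.
    t: number of coefficients fitted for the component (must be < m).
    """
    m = len(true)
    resid_sq = sum((c - e) ** 2 for c, e in zip(true, est))
    mean_c = sum(true) / m
    total_sq = sum((c - mean_c) ** 2 for c in true)
    mad = sum(abs(c - e) for c, e in zip(true, est)) / m
    se = sqrt(resid_sq / (m - 1))
    se_adj = sqrt(resid_sq / (m - t))          # m - t substituted for m - 1
    r2 = 1.0 - resid_sq / total_sq
    r2_adj = 1.0 - (m - 1) / (m - t) * (1.0 - r2)
    return mad, se, se_adj, r2, r2_adj

# Made-up example: five standards, a two-coefficient (LN, one frequency) fit.
true = [0.0, 10.0, 20.0, 30.0, 40.0]
est = [1.0, 9.0, 21.0, 29.0, 41.0]
mad, se, se_adj, r2, r2_adj = fit_statistics(true, est, t=2)
```

The adjusted values penalize models with more coefficients, which is why they are the ones to compare when, for example, an LN fit is set against a PN2 fit of the same standards.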
Table IV. Results of Regression Analyses Using Different Models (Single Components, Single Frequencies)

compound             frequency,   model (a)   d (b),   adjusted standard     adjusted coefficient
(sample set used)    cm⁻¹                     ppm      error of estimate,    of determination, R²a
                                                       ppm
methane (M1)         8023         LZ          19.2     22.3                  0.886 64
                                  LN           7.4     10.0                  0.977 24
                                  PZ2          2.1      2.9                  0.998 09
                                  PN2          1.2      1.5                  0.999 49
ethane (E1)          2980         LZ          28.6     32.6                  0.992 64
                                  LN          14.4     18.5                  0.997 63
                                  PZ2         10.2     11.9                  0.999 02
                                  PN2          4.2      6.8                  0.999 77
propane (P1)         2980         LZ           6.8      7.6                  0.997 45
                                  LN           2.4      3.4                  0.999 48
                                  PZ2          3.5      4.2                  0.999 21
                                  PN2          1.7      2.4                  0.999 75

(a) See Table I for key to abbreviations of models. (b) d is the mean absolute deviation of calculated concentration values from true values.
Chart I. Concentrations of Standard and Simulated Unknown Samples

One-Component Systems (concentrations in ppm (v/v))

methane standard set M1: 0.0, 8.6, 17.2, 25.9, 34.6, 51.9, 69.2, 86.5, 103.8, 121.1, 138.4, 155.7, 173.0, 216.2

propane standard set P1: 0.0, 13.0, 21.6, 30.3, 43.3, 51.9, 77.9, 173.0, 259.4, 346.0, 432.5

ethane standard sample sets
E1: 0.0, 25.9, 51.9, 86.5, 155.7, 259.4, 432.5, 605.6, 692.0, 956.0, 1038
E2: 0.0, 51.9, 155.7, 432.5, 692.0, 1038
E3: 0.0, 259.4, 432.5, 692.0, 956.0
E4: 0.0, 259.4, 692.0, 1038
E5: 0.0, 432.5, 1038

ethane unknowns
E6U: 0.0, 16.9, 34.6, 69.2, 121.1, 173.0, 346.0, 519.0, 778.5, 865.0, 1211 (a)

Two-Component System: Ethane and Propane (concentrations in ppm, given as [ethane]/[propane])

Standard Sample Sets

EP1: 0.0/0.0, 0.0/25.9, 0.0/43.3, 0.0/77.9, 0.0/173.0, 0.0/259.4, 0.0/346.0, 0.0/432.5, 25.9/0.0, 25.9/13.0, 25.9/43.3, 25.9/60.5, 25.9/77.9, 86.5/0.0, 86.5/13.0, 86.5/25.9, 86.5/43.3, 86.5/77.9, 155.7/0.0, 155.7/13.0, 155.7/25.9, 155.7/33.3, 155.7/60.5, 155.7/77.9, 432.5/0.0, 432.5/173.0, 432.5/259.4, 432.5/346.0, 432.5/432.5, 692.0/0.0, 692.0/173.0, 692.0/259.4, 692.0/346.0, 692.0/432.5, 1038/0.0, 1038/173.0, 1038/259.4, 1038/346.0, 1038/432.5

EP2: 0.0/0.0, 0.0/25.9, 0.0/77.9, 0.0/173.0, 0.0/259.4, 0.0/346.0, 0.0/432.5, 25.9/0.0, 25.9/13.0, 25.9/43.3, 25.9/60.5, 86.5/0.0, 86.5/25.9, 86.5/43.3, 86.5/77.9, 155.7/0.0, 155.7/13.0, 155.7/43.3, 155.7/60.5, 155.7/77.9, 432.5/0.0, 432.5/173.0, 432.5/259.4, 432.5/432.5, 692.0/0.0, 692.0/173.0, 692.0/259.4, 692.0/346.0, 692.0/432.5, 1038/0.0, 1038/173.0, 1038/259.4, 1038/346.0, 1038/432.5

EP3: 0.0/0.0, 0.0/25.9, 0.0/77.9, 0.0/173.0, 0.0/259.4, 0.0/346.0, 0.0/432.5, 25.9/0.0, 25.9/13.0, 25.9/60.5, 86.5/0.0, 86.5/25.9, 86.5/43.3, 86.5/77.9, 155.7/0.0, 155.7/13.0, 155.7/43.3, 155.7/77.9, 432.5/0.0, 432.5/259.4, 432.5/432.5, 692.0/0.0, 692.0/173.0, 692.0/259.4, 692.0/432.5, 1038/0.0, 1038/173.0, 1038/259.4, 1038/432.5

EP4: 0.0/0.0, 0.0/25.9, 0.0/77.9, 0.0/259.4, 0.0/346.0, 0.0/432.5, 25.9/13.0, 25.9/60.5, 86.5/0.0, 86.5/25.9, 86.5/77.9, 155.7/13.0, 155.7/43.3, 155.7/77.9, 432.5/0.0, 432.5/259.4, 432.5/432.5, 692.0/0.0, 692.0/259.4, 692.0/432.5, 1038/0.0, 1038/173.0, 1038/259.4, 1038/432.5

EP5: 0.0/0.0, 0.0/77.9, 0.0/259.4, 0.0/346.0, 0.0/432.5, 25.9/60.5, 86.5/0.0, 86.5/77.9, 155.7/13.0, 155.7/43.3, 155.7/77.9, 432.5/0.0, 432.5/259.4, 432.5/432.5, 692.0/0.0, 692.0/432.5, 1038/0.0, 1038/173.0, 1038/432.5

EP6: 0.0/0.0, 0.0/77.9, 0.0/346.0, 0.0/432.5, 25.9/60.5, 86.5/77.9, 155.7/43.3, 155.7/77.9, 432.5/0.0, 432.5/259.4, 432.5/432.5, 692.0/432.5, 1038/0.0, 1038/432.5

EP7: 0.0/0.0, 0.0/346.0, 0.0/432.5, 86.5/77.9, 155.7/43.3, 432.5/0.0, 432.5/259.4, 692.0/432.5, 1038/0.0, 1038/432.5

EP8: 0.0/0.0, 0.0/346.0, 0.0/432.5, 86.5/77.9, 432.5/0.0, 692.0/432.5, 1038/0.0, 1038/432.5

EP9: 0.0/346.0, 0.0/432.5, 86.5/77.9, 432.5/0.0, 692.0/432.5, 1038/0.0

Set EP10U (unknowns in calibration range): 0.0/0.0, 0.0/13.0, 0.0/60.5, 0.0/129.8, 25.9/25.9, 34.6/0.0, 86.5/60.5, 121.1/0.0, 173.0/0.0, 173.0/173.0, 173.0/259.4, 173.0/346.0, 173.0/432.5, 519.0/0.0, 519.0/129.8, 519.0/259.4, 519.0/389.3, 865.0/0.0, 865.0/129.8, 865.0/259.4, 865.0/389.3

Set EP11U (unknowns out of calibration range): 0.0/519.0, 519.0/510.0, 865.0/519.0, 1211/0.0, 1211/129.8, 1211/389.3

(a) Unknown out of calibration range.
Figure 1. Absorbance spectra of single components (absorbance vs. frequency, cm⁻¹): (A) propane, 173 ppm; (B) ethane, 432.5 ppm; (C) methane, 216.2 ppm. All display spectra processed to flatten base lines. Methane spectrum expanded 5× and smoothed for clarity.

Figure 2. Standard error of estimate vs. number of analytical frequencies used for single component regression analyses. Data are from ethane analyses using standard set E1: (○) LN model, (△) PN2 model; frequencies evenly spaced (-), frequencies at peak maxima (- - -); (a) results of regression analyses of standard data, (b) results of analyses of simulated unknown sample set E6U.

Figure 3. Standard error of estimate vs. number of standard samples used for single component regression analyses. Data are for ethane using one frequency (2980 cm⁻¹). Results of regression analyses are as follows: (○) LN model; (△) PN2 model. Values plotted for regression results are adjusted statistics ($Se_a$). Results of analyses of simulated unknown sample set E6U are as follows: (○) LN model; (△) PN2 model.

The models used are the linear, nonzero intercept (LN) model and the second-order power series, nonzero intercept (PN2) model. For both models, two frequencies greatly improve the results, whereas additional frequencies do not add a great deal to the accuracy of the fit. Analysis of the "unknown" set of ethane samples was more informative, as shown in Figure 2b. For two or three frequencies, the second-order model gave better results. When f reached a value of four or five, the second-order analysis blew up, whereas the linear analysis maintained a lower level of error. Apparently, the use of analytical frequencies at or below 2900 cm⁻¹ leads to larger errors in the second-order model. Since the errors are still small for the linear model, it appears that these additional analytical frequencies have a linear dependence.

The effect of varying the number of standard samples used to construct the analysis is very interesting for the single-component case. Figure 3 shows the standard error as a function of the number of standards used for the LN and PN2 models. In plotting the adjusted standard error of the estimate, $Se_a$, vs. m for a given system, there is a definite trend to the curve. This suggests that the shape of the curve should be predictable, i.e., the number of standards necessary to fit a given concentration range should be predictable. For n components, the total $Se_a$ of the analysis is defined as

$$Se_a = \left(\frac{\sum_{j=1}^{n}\sum_{i=1}^{m}(c_{ij} - \hat{c}_{ij})^2}{n(m - t)}\right)^{1/2} \tag{14}$$

as shown in Table III. Squaring both sides of the equation and rearranging gives

$$Se_a^2(nm - nt) = \sum_{j=1}^{n}\sum_{i=1}^{m}(c_{ij} - \hat{c}_{ij})^2 \tag{15}$$

Dividing by n × m and rearranging the left side of the equation gives

$$Se_a^2\left(1 - \frac{t}{m}\right) = \frac{\sum_{j=1}^{n}\sum_{i=1}^{m}(c_{ij} - \hat{c}_{ij})^2}{nm} = D \tag{16}$$

where D is the mean squared deviance for the entire data set. Dividing by (1 - (t/m)) and taking the square root give

$$Se_a = \left(\frac{D}{1 - t/m}\right)^{1/2} \tag{17}$$

D is minimized for any least-squares fit of the standard data; therefore, if the "correct" model is being used to fit the data, the value of D should not change radically with the addition or deletion of standard samples, when building several analyses. So, for a constant D (meaning the correct model is being used), $Se_a$ will vary as

$$Se_a \propto \left(\frac{m}{m - t}\right)^{1/2}$$
and $Se_a$ will asymptotically approach a minimum value of $D^{1/2}$ as m is increased. In addition, the error found in the analysis of standard samples should approach that of the analysis of "unknown" samples as more standards are added, and the fit to the data is improved. Both of these behaviors are seen in the plot of $Se_a$ vs. m for the PN2 model in Figure 3.

If the proper model is not used to fit the data, the behavior of the plot of $Se_a$ vs. m is more difficult to characterize. Whether the plot levels off will depend on the concentrations of the standards which are added to the system. Figure 4 illustrates examples of this with a hypothetical curvilinear one-component system fitted by a linear model. Figure 4a shows the "true" curvature of the system compared to the linear model. Figure 4b illustrates this system fitted by only a few standards. In this case, the $Se_a$ will be relatively high because none of the standard samples fall on the fitted line. Figure 4c is an example of the standard error dropping greatly when more standards are used as a result of the new standards falling near or on the fitted line. Figure 4d shows another case, in which the $Se_a$ decreases less drastically with increased number of standards, since the new samples fall rather far from the fitted line.
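The predicted shape of the curve for a correctly chosen model can be tabulated directly. The sketch below assumes the relation that follows from the definition of the adjusted standard error when D is constant, namely Se_a = (D / (1 - t/m))^(1/2); the function name, the value of D, and the choice t = 3 are hypothetical.

```python
from math import sqrt

def adjusted_se(D, t, m):
    """Se_a = sqrt(D / (1 - t/m)); only defined when m > t."""
    if m <= t:
        raise ValueError("need more standards than coefficients (m > t)")
    return sqrt(D / (1.0 - t / m))

D = 4.0   # hypothetical mean squared deviation, ppm^2
t = 3     # e.g. a PN2 model at one frequency: p0, p1, p2

# Se_a for m = 4 ... 12 standards; it falls toward sqrt(D) = 2.0 ppm.
curve = [(m, adjusted_se(D, t, m)) for m in range(4, 13)]
```

Under these assumptions, most of the improvement comes from the first few standards beyond t, and the remaining gap to D^(1/2) shrinks slowly, which is the sense in which an optimum number of standards can be anticipated.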
In Figure 3 (ethane at 2980 cm⁻¹), the adjusted standard error for the linear model appears to continually decrease as the number of standards is increased. The gentle slope of the curve at m = 11 would seem to correspond to the case of Figure 4d. While the data appear to be fitted more accurately according to the $Se_a$ at this point, the actual value of the analysis is not increased, as indicated by the results from the "unknown" set. If different standards were added, the error of the regression would probably fluctuate about the average value seen for the analyses of the unknown set. This value is the lower error limit for a linear model with this data set. For the "unknown" samples, this linear model does not seem to improve in its predictive accuracy as standards are added to the model, and it should not. As in Figure 4a, any unknowns for this system should fall on the curved line, but regardless of the number of standards used to model the data, the fit will remain linear. Unknowns falling in the middle of the range will never be as accurately determined as those falling where the linear fit crosses the "true" curve for the system. Overall, the average accuracy for unknown samples would be fairly constant for any single linear model.

Determination of Two Components. For two components, mixtures of ethane and propane were used. From the individual spectra in Figure 1, the severe band overlap is obvious, with the entire region blanketed by both components. Table V gives data from analyses built for this two-component system using varying numbers and values of analytical frequencies. The standard error of the analyses for the standard data follows a common trend. High standard errors are seen for any model using only two frequencies; errors drop to reasonably constant, low levels as three and more frequencies are used. Selection of frequencies is important.
ANALYTICAL CHEMISTRY, VOL. 55, NO. 11, SEPTEMBER 1983

Table V. Results of Analyses Using Different Numbers of Analytical Frequencies
Ethane and propane, two components; set EP1, 39 standard samples. Statistic: standard error of estimate (ppm).

Frequencies evenly spaced (frequency sets "A")

Results of regression analyses (adjusted statistic, $S_{ea}$, used for regression results)

number of                          model used
frequencies used      LZ      LN      QZ      QN     PZ2     PN2
2                   23.0    22.2    17.5    16.5    19.2    17.4
3                   14.1     9.3     4.6     3.2     4.8     3.5
4                   11.6     5.4     2.5     2.2     3.4     2.8
5                    5.6     4.9     1.8     1.8     3.0     2.8
6                    5.1     5.1     1.5     1.6     2.9     2.9

Results of unknown analyses, unknown set EP10U: 21 unknowns in calibration range

number of                          model used
frequencies used      LZ      LN      QZ      QN     PZ2     PN2
2                   17.5    15.9    24.3    24.6    18.4    20.6
3                   11.0     9.6     8.5     8.4     8.5     7.6
4                   16.3    13.1    11.0    10.7     8.3     8.2
5                   12.2    12.2    17.0    17.2     7.7     8.0
6                   17.3    16.8    16.8    16.7    12.1    12.3

Unknown set EP11U: 6 unknowns out of calibration range

number of                          model used
frequencies used      LZ      LN      QZ      QN     PZ2     PN2
2                   35.7    35.8    60.1    56.6    46.2    46.4
3                   31.5    24.8    21.8    21.9    20.1    18.2
4                   50.3    41.4    46.2    37.5    21.0    20.0
5                   28.0    33.3    64.4    56.7    20.6    17.5
6                   26.6    27.4    34.6    48.7    20.7    23.3

Peak maxima (frequency sets "B")

Results of regression analyses (adjusted statistic, $S_{ea}$, used for regression results)

number of                          model used
frequencies used      LZ      LN      QZ      QN     PZ2     PN2
2                  150.8   150.6   132.5   131.0   143.1   139.6
3                   19.3    18.8    11.9    11.8    15.4    15.5
4                   14.7    13.3     6.1     5.2    11.8    11.2
5                   15.9    14.5     4.9     4.0     9.1     9.0
6                    5.7     4.9     1.6     1.6     2.2     2.7

Results of unknown analyses, unknown set EP10U: 21 unknowns in calibration range

number of                          model used
frequencies used      LZ      LN      QZ      QN     PZ2     PN2
2                  178.2   178.4   165.8   170.3   162.8   165.4
3                   17.1    16.6    21.6    21.7    15.1    14.6
4                   13.0    11.5     4.8     3.9     9.6     8.9
5                   11.5    11.2    34.3    57.5    15.7    14.4
6                   12.1    12.6    25.4    26.1     7.2     7.6

Unknown set EP11U: 6 unknowns out of calibration range

number of                          model used
frequencies used      LZ      LN      QZ      QN     PZ2     PN2
2                  398.3   431.8  2566.0  2210.0   399.3   466.6
3                   52.6    41.4   419.8   458.1    39.0    35.1
4                   17.8    15.8     1.5     2.0    12.7    11.8
5                   72.8    88.7   267.9   504.0    82.5    71.2
6                   28.4    28.2    67.5    71.5    19.3    17.5

In all models, the best fit to the standards was obtained by using frequencies evenly spaced across the absorption band. This enhancement of accuracy by equal spacing of frequencies has been reported (3) with other multicomponent systems. It is possible for a multicomponent system that the spacing of frequencies in this way functions as a factor analysis of the data, enhancing the unique features of the component absorbance patterns. When these analyses are used to predict “unknown” concentrations, the analyses using evenly spaced frequencies all gave better overall results than those using peak maxima; however, these varied according to the model used. The power series models seem to be more consistent in modeling these data, and most give better results than the best quadratic fit. This suggests that higher order terms are definitely needed to model this system but that the placement and number of the frequencies used will drastically affect the ability of a particular curvilinear model to accurately represent the data. In addition, it seems that the cross-product terms in the quadratic models are of little use for this system. Since there should be little or no chemical interaction of the components, this observation seems reasonable.

The effect of varying the number of standard mixtures used for a multicomponent analysis is similar to that seen with a single component. Plots of the standard error of the estimate vs. m are shown in Figure 5. The behavior of the LN model is easily explained. As m increased, most of the standards added emphasized the middle of the calibration region. This shifted the best fit toward those data at the expense of the high concentration region. Therefore, as m increased, the error went down for standards and unknowns in the calibration range whereas the error rose for unknowns out of the range. The “wild” points at m = 7
reflect the sudden lack of a high range calibration mixture.

The PN2 power series model gives very consistent results for unknowns both in and out of the calibration region. This would seem to indicate that this model fits the system reasonably well but is probably not the “true” model of the system. If it were the correct model, the standard errors for the standard data and the unknowns in the calibration region should be very close to one another. This is not the case. The same comments may be made for the third-order PN3 analyses, although the overall results seem slightly better when enough standards are used.

For systems exhibiting complex deviations from Beer’s law, it is difficult to determine the exact number of standard mixtures needed to adequately define the calibration surface. If the correct model is in use, this becomes equivalent to asking how many points are necessary to specify a function. Since the “true” model is not usually known, it is generally helpful to overdetermine the problem by adding extra standard mixtures. Margoshes and Rasberry (7) recommended, as a rule of thumb, that two standards should be included for each coefficient of the model in use. Thus, for a PN3 model with three frequencies, there are ten coefficients per component and 20 standards are recommended. The work of Margoshes and Rasberry was concerned with single components and single frequencies; nonetheless, extrapolation to larger systems should be valid. As is observed in Figure 5c, the PN3 model does indeed reach a constant error at m = 20 for the unknowns in the calibration region. For more complex systems, it would seem reasonable that more standard mixtures might be needed to be certain of adequately covering the calibration surface for all components.

Registry No. Propane, 74-98-6; ethane, 74-84-0; methane, 74-82-8.

Figure 4. Modeling a nonlinear system with a linear plot. Data are from an artificial sample set falling on a perfect second order curve: (a) "true" shape of calibration plot (---), best linear fit (-); (b) system fitted by four widely spaced data points; (c) eight data points, four added near fitted line; (d) eight data points, four added farther from fitted line.
[Figure 4, panels a-d. Horizontal axis: concentration, 0 to 350.]

Figure 5. Standard error of estimate vs. number of standard samples used in regression analyses for two-component system. Data are for ethane/propane sample sets designated EP1 to EP9. All analyses were calculated by using three frequencies (set 3A): (○) results of regression; (△) analyses of unknown set EP10U; (□) analyses of unknown set EP11U. Results are plotted with (a) LN model, (b) PN2 model, and (c) PN3 model. Adjusted statistic ($S_{ea}$) is plotted for results of regression.
[Figure 5, panels a-c. Horizontal axis: m, number of standard mixtures.]
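The rule of thumb attributed to Margoshes and Rasberry (two standards per coefficient) can be checked with a short sketch that builds the regression terms for one component and counts them. The reading of the model labels used here is an assumption inferred from the surrounding discussion, not a definition taken from the paper: L, Q, and P are taken to mean linear, quadratic with cross-products, and power series, and a nonzero intercept adds one zero-order term. The function and parameter names are hypothetical.

```python
import numpy as np
from itertools import combinations

def design_matrix(A, order=1, cross_terms=False, intercept=True):
    """Build regression columns from absorbances A (samples x frequencies).

    Columns are an optional constant, each absorbance raised to the powers
    1..order, and optionally the pairwise products of absorbances.
    """
    A = np.asarray(A, dtype=float)
    n_freq = A.shape[1]
    cols = [np.ones(A.shape[0])] if intercept else []
    for power in range(1, order + 1):
        cols.extend(A[:, j] ** power for j in range(n_freq))
    if cross_terms:
        cols.extend(A[:, j] * A[:, k] for j, k in combinations(range(n_freq), 2))
    return np.column_stack(cols)

def recommended_standards(n_freq, **model):
    """Return (coefficients per component, standards at two per coefficient)."""
    n_coef = design_matrix(np.ones((1, n_freq)), **model).shape[1]
    return n_coef, 2 * n_coef

# Assumed reading of "PN3": third-order power series with an intercept.
print(recommended_standards(3, order=3))                    # (10, 20)
# Assumed reading of "LN": linear with an intercept.
print(recommended_standards(3, order=1))                    # (4, 8)
# Assumed reading of "QN": quadratic with cross-products and an intercept.
print(recommended_standards(3, order=2, cross_terms=True))  # (10, 20)
```

Under this reading, the third-order power series with three frequencies gives the ten coefficients and 20 standards quoted in the text.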
LITERATURE CITED
(1) Brown, C. W.; Lynch, P. F.; Obremski, R. J.; Lavery, D. S. Anal. Chem. 1982, 54, 1472-1479.
(2) Deming, S. N.; Morgan, S. L. Clin. Chem. (Winston-Salem, N.C.) 1979, 25, 840-855.
(3) Sternberg, J. C.; Stillo, H. S.; Schwendeman, R. H. Anal. Chem. 1960, 32, 84-90.
(4) Barnett, H. A.; Bartoli, A. Anal. Chem. 1960, 32, 1153-1156.
(5) Davis, R. B.; Thompson, J. F.; Pardue, H. L. Clin. Chem. (Winston-Salem, N.C.) 1978, 24, 611-620.
(6) Wesolowsky, G. O. “Multiple Regression and Analysis of Variance”; Wiley: New York, 1976; pp 43-45.
(7) Margoshes, M.; Rasberry, S. D. Anal. Chem. 1969, 41, 1163-1172.

Received for review January 31, 1983. Accepted June 13, 1983.

Anal. Chem. 1983, 55, 1703-1707
Multiple Analytical Frequencies and Standards for the Least-Squares Spectrometric Analysis of Serum Lipids
Harold J. Kisner¹ and Chris W. Brown*
Department of Chemistry, University of Rhode Island, Kingston, Rhode Island 02881
George J. Kavarnos
Cyto Medical Laboratory, Inc., 12 Case Street, Norwich, Connecticut 06360
This paper describes an application of multiple least-squares regression analysis to the simultaneous determination of the predominant serum lipids. Severe band overlap of the ester carbonyl absorbance peaks used for analysis is offset by selecting a large number of evenly spaced analytical wavelengths extending over the carbonyl absorption region. Calibration by standard mixtures incorporates molecular interactions into the analysis matrix. Over-determination with respect to the number of standards extends the workable concentration range of the components without loss of accuracy within that range. Linear curve fitting models containing different combinations of zero-, first-, and second-order terms have been applied to the data. The optimum least-squares model for all three components is a linear relationship in the form $y = a + bx$. P matrix notation for the Beer-Lambert law, in the form $C = PA + P_0$, was used. Analysis time, depending on frequency selection and the number of signals averaged, is less than 5 min.
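The P-matrix form named in the abstract, $C = PA + P_0$, can be illustrated with a minimal sketch on synthetic data. Nothing below comes from the paper's measurements: the absorptivities, concentrations, baseline offset, and the function names `calibrate_p_matrix` and `predict` are all assumptions for illustration. The zero-order term is handled by appending a row of ones to the absorbance matrix, and the fit uses a least-squares solver rather than an explicit matrix inverse because noise-free synthetic absorbances make the normal-equations matrix singular.

```python
import numpy as np

def calibrate_p_matrix(C, A):
    """Least-squares fit of C = P A + P0.

    C: (components x standards) known concentrations.
    A: (frequencies x standards) measured absorbances.
    Returns P (components x frequencies) and P0 (components,).
    """
    ones = np.ones((1, A.shape[1]))
    A_aug = np.vstack([A, ones])  # extra row carries the zero-order term P0
    # Solves A_aug.T @ X = C.T in the least-squares sense. This matches
    # P_aug = C A_aug^t (A_aug A_aug^t)^-1 when that inverse exists, and it
    # also tolerates a rank-deficient A_aug.
    X, *_ = np.linalg.lstsq(A_aug.T, C.T, rcond=1e-10)
    P_aug = X.T
    return P_aug[:, :-1], P_aug[:, -1]

def predict(P, P0, A):
    """Concentrations predicted from absorbances A (frequencies x samples)."""
    return P @ A + P0[:, None]

rng = np.random.default_rng(0)
K = rng.uniform(0.2, 1.0, size=(6, 3))       # synthetic absorptivities, 6 frequencies x 3 components
C_std = rng.uniform(0.5, 3.0, size=(3, 12))  # 12 synthetic standard mixtures
A_std = K @ C_std + 0.05                     # Beer-Lambert response plus an assumed constant baseline

P, P0 = calibrate_p_matrix(C_std, A_std)

C_unknown = np.array([[1.0], [2.0], [0.7]])
A_unknown = K @ C_unknown + 0.05
C_pred = predict(P, P0, A_unknown)
print(np.round(C_pred.ravel(), 4))
```

Because the synthetic data are noise-free and exactly linear, the predicted concentrations reproduce the assumed unknown; real spectra would add noise and nonlinearity, which is where the choice of frequencies and standards discussed in the paper matters.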
¹Present address: Postdoctoral Resident in Clinical Chemistry, Department of Pathology, University of Maryland School of Medicine, Baltimore, MD 21201.

Table I. Matrix Representations of Beer-Lambert Law for Least-Squares Quantitative Analysis

                      K matrix                        P matrix
Beer-Lambert law^a    $A = KC + K_0$                  $C = PA + P_0$
calibration           $A = KC$                        $C = PA$
                      $AC^{t} = KCC^{t}$              $CA^{t} = PAA^{t}$
                      $K = AC^{t}(CC^{t})^{-1}$       $P = CA^{t}(AA^{t})^{-1}$
unknowns              $A = KC$                        $C = PA$
                      $K^{t}A = K^{t}KC$
                      $C = (K^{t}K)^{-1}K^{t}A$

^a Zero-order terms are included in the appropriate matrix (K or P) for computations.

Often a limiting factor in multicomponent quantitative spectrometry is the lack of mutually independent absorbance bands. Recently, we experienced this problem while investigating the simultaneous determination of triglycerides, phospholipids, and cholesteryl esters by infrared spectrometry based on the difference in their carbonyl ester absorbance peaks (1). Severe overlap of the carbonyl bands (Figure 1, ref 1) limited the effectiveness of the two-point calibration method used for analysis. Difficulties encountered when using strongly overlapping bands for quantitative analysis can be overcome by applying a least-squares curve fitting technique to absorbance data (2-4). Calibration by mixtures incorporates molecular interactions into the analysis matrix and, depending on the number of standards used, can extend the workable concentration range of the components. Selection of analytical wavelengths is not restricted by the number of components; rather, it is limited by the computer capacity for matrix inversion (4). Sternberg et al. (2) recommended using a large number of evenly spaced wavelengths over the absorption region of interest. Sustek (5) suggested that a 3- to 4-fold
overdetermination in analytical positions offers optimum results, even in the case of overlapping absorption bands. In our previous study, the use of zero-order terms allowed for the linear approximation of nonlinear data over a limited concentration range. Second-order terms should be more appropriate for a least-squares regression equation representing nonlinear data, as suggested by Barnett and Bartoli (3). When deviation from linearity was due to intermolecular interactions, they introduced “product terms” that corrected for molecular association between two compounds.

Recently, we (4) reported on matrix representations for spectroscopic multicomponent analyses, and these methods are summarized in Table I. Two methods of least-squares analysis are presented: traditional K-matrix notation, where A is a function of C, and P-matrix notation, where C is a function of A. The use of P-matrix notation for the Beer-Lambert law facilitates matrix computations and is more compatible with the inclusion of zero or higher-order terms (8). (For a more detailed description of the advantages of the P-matrix approach, see ref 4 and 8.)

We have examined the utilization of a multiple linear least-squares regression analysis of overdetermined absorbance data for the simultaneous determination of triglycerides, phospholipids, and cholesteryl esters. Results comparing wavelength selection and curve fitting using linear, power series, and quadratic models in P-matrix notation are discussed herein.

EXPERIMENTAL SECTION

Apparatus. Uncompensated infrared spectra were measured with the solution contained in a 13-mm cell, which was constructed in-house. The cell consisted of 25 × 25 × 4 mm AgCl windows
0003-2700/83/0355-1703$01.50/0 © 1983 American Chemical Society