Anal. Chem. 1993, 65, 2209-2222
Fourier Analysis of Multicomponent Chromatograms. Application to Experimental Chromatograms
Francesco Dondi,* Adalberto Betti, Luisa Pasti,† and Maria Chiara Pietrogrande
Department of Chemistry, University of Ferrara, Via Luigi Borsari, 46, I-44100 Ferrara, Italy
Attila Felinger‡
Department of Analytical Chemistry, University of Veszprém, P.O. Box 158, H-8201 Veszprém, Hungary
A complete procedure is presented for the quantitative estimation of both the separation performance and the retention pattern in an experimental multicomponent chromatogram obtained under programmed elution conditions. It is shown that, under a conventional good experimental setup ensuring limited peak width variation (±10%) and low peak asymmetry, the following quantities can be determined: peak capacity, saturation factor, peak tailing factor, number of single components (SC), and parameters of the SC interdistance distribution. The procedure is based on the experimental autocovariance function (EACVF) and power spectrum (EPS) of the experimental chromatogram, handled by numerical methods previously presented and validated. The main features of an EACVF plot are described, together with how and where to search in it for the information related to the retention pattern and the SC peak width, with reference to two typical examples of multicomponent capillary gas chromatograms: a chamomile lipophilic extract and a naphtha sample. The EACVF plots are fitted to four different theoretical models of the SC interdistances (exponential, uniform, normal, and gamma) in order to obtain the best description of the retention pattern and an evaluation of the SC peak width. Moreover, the ordered structure of the chromatogram was identified and analyzed in the EACVF plot, giving an additional, independent estimate of the SC peak width. Similar fittings performed on the EPS plot made it possible to validate and confirm the EACVF analysis results and, in addition, to detect peak width variations and peak asymmetry effects in the experimental multicomponent chromatograms. These last SC peak shape data were in very good agreement with the peak shape analysis performed with the Edgeworth-Cramér series over well-separated peaks of linear hydrocarbons in a reference chromatogram obtained under the same conditions.
† Present address: EniChem, P. le Donegani, 12, I-44100 Ferrara, Italy.
‡ Present address: Department of Chemistry, University of Tennessee, Knoxville, TN.
INTRODUCTION The need for quantitative characterization of the separation extent achieved in a chromatogram of a multicomponent
mixture was the topic of a series of relatively recent works. All these works focused on the great importance of evaluating the number of compounds from the number of separated bands and the degree of peak overlapping, which can seriously affect the identification and quantitation of selected compounds. The correct description of the retention pattern of the chromatogram, i.e., how the compounds occupy the available chromatographic space, also proved to be of fundamental importance. The matter has been successfully addressed by the Fourier analysis approach, since a multicomponent chromatogram can be represented as a sequence of random pulses.1 In practice, the statistical properties of complex multicomponent chromatograms were represented in terms of their power spectrum (PS; see Glossary at the end) or, equivalently, in terms of the autocovariance function (ACVF). From this description, various procedures were devised for determining the number of detectable single components (SCs), m, the common value of the peak width, $\sigma$, and the type of retention pattern (i.e., the distribution of the interdistances between subsequent SC peaks), simply called the interdistance model, IM. Basically two hypotheses were made: (1) lack of correlation between peak position and peak height; (2) constancy of the SC density along the time axis. In most cases constancy of the SC peak width and Gaussian-type (G) SC peak shape were assumed.
Moreover, the cases of nonconstant peak width (G type) and of exponentially modified Gaussian (EMG) peak shape were exploited.3 All the above-mentioned procedures were tested by using computer-generated chromatograms.2-4 In this paper, for the first time, these procedures are applied in practice to estimate the overall statistical attributes of a real multicomponent chromatogram (SC number greater than 50), to estimate, among other things, m, the retention pattern type (the IM type and its parameter values), the common value of the peak width and shape, and its eventual variability
(1) Felinger, A.; Pasti, L.; Dondi, F. Anal. Chem. 1990, 62, 1846.
(2) Felinger, A.; Pasti, L.; Reschiglian, P.; Dondi, F. Anal. Chem. 1990, 62, 1854.
(3) Felinger, A.; Pasti, L.; Dondi, F. Anal. Chem. 1991, 63, 2627.
(4) Felinger, A.; Pasti, L.; Dondi, F. Anal. Chem. 1992, 64, 2164.
(5) Davis, J. M.; Giddings, J. C. Anal. Chem. 1983, 55, 418.
(6) Nagels, L. J.; Creten, W. L.; Vanpeperstraete, P. M. Anal. Chem. 1983, 55, 216.
(7) Martin, M.; Guiochon, G. Anal. Chem. 1985, 57, 289.
(8) Martin, M.; Herman, D. P.; Guiochon, G. Anal. Chem. 1986, 58, 2200.
(9) Creten, W. L.; Nagels, L. J. Anal. Chem. 1987, 59, 822.
(10) El Fallah, M. Z.; Martin, M. Chromatographia 1987, 24, 115.
(11) Davis, J. M.; Giddings, J. C. J. Chromatogr. 1984, 289, 277.
(12) Davis, J. M.; Giddings, J. C. Anal. Chem. 1985, 57, 2168.
(13) Davis, J. M.; Giddings, J. C. Anal. Chem. 1985, 57, 2178.
(14) Dondi, F.; Kahie, Y. D.; Lodi, G.; Remelli, M.; Reschiglian, P.; Bighi, C. Anal. Chim. Acta 1986, 191, 261.
(15) Coppi, S.; Betti, A.; Dondi, F. Anal. Chim. Acta 1988, 212, 165.
(16) Oros, F. J.; Davis, J. M. J. Chromatogr. 1991, 550, 135.
(17) Davis, J. M. J. Chromatogr. 1988, 449, 41.
(18) Delinger, S. L.; Davis, J. M. Anal. Chem. 1990, 62, 436.
© 1993 American Chemical Society
ANALYTICAL CHEMISTRY, VOL. 65, NO. 17, SEPTEMBER 1, 1993
Figure 1. Chromatogram of chamomile, linearized and subjected to baseline correction: (a) total chromatogram; (b) 0-1000-s part. The numbers quoted under the time axis identify the different portions for the EACVF computation. [Plot omitted; time axis t, s, from 0 to 2500.]
Figure 2. (a) Chromatogram of naphtha, linearized and subjected to baseline correction. (b) Plot of retention time, $t_R$, vs carbon number: (1) data in the linearized chromatogram; (2) data in the original chromatogram. The numbers quoted under the time axis of the chromatogram identify the different portions for the EACVF computation. [Plot omitted; time axis t/s, from 0 to 2000; carbon numbers 9-18 in panel b.]
Figure 3. C9-C14 reference chromatogram, linearized and subjected to baseline correction. The enlarged details report the C9, C11, and C14 peak shapes. [Plot omitted; peaks labeled C9 to C14.]
range. These chromatograms were obtained by temperature-programmed GC analyses of a chamomile flower extract (Figure 1) and of a naphtha sample (Figure 2). A chromatography expert would surely classify the first as "completely disordered" and the second as partially "ordered", and thus as having different IMs. Since all the above-mentioned numerical procedures give estimated values of the common SC peak width, by which they can be validated, a reference chromatogram of the linear aliphatic hydrocarbon series C9-C14 (Figure 3) was recorded under the same experimental conditions, in order to have an independent estimate of the SC peak shape properties from well-separated peaks in different elution regions.
The other point considered here is related to the chromatogram time axis. Inspection of the original reference chromatogram or the naphtha chromatogram shows that the peak interdistances between successive terms of the homologous linear hydrocarbon series (the predominant peaks) are not constant but steadily decrease (see plot 2 in part b of Figure 2). If it is accepted that there should be a constant retention time increase for each CH2 addition, the time axis of these chromatograms must be modified accordingly. When performing autocorrelation analysis, this "clock regulation" allows one to automatically single out correlations among peak positions which would otherwise be lost. Moreover, since the relationships between retention time increments (referred to the CH2 increment basis) and molecular fragments are well-known, the correlation study should be able to single out other molecular structure relationships among the SCs of the analyzed mixture. The evaluation of all the above-mentioned statistical properties should give a much more detailed and precise description of a complex mixture than does the indistinct and vague term "fingerprint" usually attributed to a high-resolution chromatogram. In particular, the IM must also be characterized for a correct evaluation of other properties such as the stand-alone probability (i.e., the probability that any component will be resolved), the need to further increase the separation power, and the detection limits. All these properties have been described, but only under the specific exponential IM.5-10 The possibility of determining the IM type should surely improve the reliability of the m determination, a question faced in the past by other authors,11-18 but always using procedures based only on the exponential IM type.
PROCEDURE Preliminary Step: Search for Optimum Chromatographic Conditions. The preliminary step of this procedure consists of setting up common experimental conditions for recording optimum chromatograms of the two multicomponent mixtures and of the C9-C14 linear aliphatic hydrocarbon mixture (the reference chromatogram). The first optimum criterion to be met is that the maximum extent of separation be attained in the shortest possible analysis time. The extent of separation is7

$$\gamma = p/m \qquad (1)$$

where p is the number of bands separated at a given value of resolution $R_s$, defined for two SC peaks of Gaussian shape and equal height as5

$$R_s = x_0/(4\sigma_g) \qquad (2)$$

where $x_0$ is the interdistance between two adjacent SC peaks and $\sigma_g$ is their common peak standard deviation (under the hypothesis of G-type SC peak shape). However, specific requirements are needed for the present type of study. First, one needs to identify a uniformity of the SC peak density:

$$\lambda = m/X = 1/T \qquad (3)$$

where X is the considered chromatogram time span and T is the mean interdistance between subsequent SC peaks. Such a condition is imposed by the theoretical models at hand. This also means that the so-called saturation factor

$$\alpha = m/N_c \qquad (4)$$

over a given chromatographic span X must be constant. In eq 4, $N_c$ is the peak capacity, which for a resolution value $R_s = 0.5$ is5

$$N_c = X/(2\sigma_g) \qquad (5)$$

By combining eqs 3-5 one has

$$\alpha = 2\sigma_g/T \qquad (6)$$
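As an illustration of eqs 2-6, the following minimal Python sketch evaluates the peak capacity, the saturation factor, and the mean interdistance. The function names and the numerical values are hypothetical and chosen only for illustration; they are not taken from the chromatograms analyzed here.

```python
# Minimal sketch of eqs 2-6; function names and sample values are
# illustrative assumptions, not part of the original procedure.

def resolution(x0, sigma_g):
    """Eq 2: resolution of two equal-height Gaussian SC peaks."""
    return x0 / (4.0 * sigma_g)

def mean_interdistance(m, span):
    """Eq 3: T = X / m, the inverse of the SC density m / X."""
    return span / m

def peak_capacity(span, sigma_g):
    """Eq 5: peak capacity for a resolution of 0.5."""
    return span / (2.0 * sigma_g)

def saturation_factor(m, span, sigma_g):
    """Eq 4: alpha = m / N_c."""
    return m / peak_capacity(span, sigma_g)

if __name__ == "__main__":
    m, span, sigma_g = 100, 1000.0, 1.25  # hypothetical: 100 SCs in 1000 s
    alpha = saturation_factor(m, span, sigma_g)
    # Eq 6 gives the same value from sigma_g and T alone.
    alpha_eq6 = 2.0 * sigma_g / mean_interdistance(m, span)
    print(peak_capacity(span, sigma_g), alpha, alpha_eq6)
```

With these hypothetical inputs the two routes to the saturation factor (eq 4 and eq 6) coincide, which is a quick consistency check on the definitions.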
The second requirement regards the shape of the SC peak: it should be as close to Gaussian as possible and as constant as possible over the whole span X. These conditions are achieved by optimum tuning of the temperature program with a conventional recording integrator. On this output, a preliminary evaluation of the statistical attributes of the chromatogram and of the extent of peak overlapping can be made by using the Davis-Giddings methods.13,15 In this way, an approximate evaluation of both $N_c$ and $\alpha$ (eqs 4 and 5) in different portions of the chromatogram can be performed. If this preliminary inspection results in either too great an $\alpha$ value (>0.5) or in significant peak tailing, improvement of the separation is mandatory, since both the present and the Davis-Giddings methods are known to fail.2-4,12,13 The A/D conversion parameters (data acquisition frequency and range) are then set up as usual, in order to have enough points per peak (at least 15 points in the $6\sigma_g$ range for the Fourier analysis procedure; see details under Computation) and to keep the peak on scale at maximum height. The chromatogram can thus be run to make a digitized version. Moreover, the baseline drift must be subtracted, and this requires that a certain number of returns to the baseline be present in the considered chromatogram span. This simple baseline correction is relatively easy to perform in chromatograms which are not too crowded. It was also verified that this correction effectively subtracted only the typical column bleeding effect and that, with respect to this subtraction, any continuous part of the mixture chromatogram was not significant. Figures 1-3 report the digitized chromatogram plots, corrected for baseline drift, which have undergone the same "clock regulation" in order to have constant "CH2" retention time increments.
The original and the "regulated" chromatograms will be designated nonlinearized (NL) and linearized (L) chromatograms, respectively. The optimum experimental conditions followed here were the same as previously set up for a similar chamomile GC analysis.15 Step One: SC Peak Shape Analysis on the Reference Chromatogram. In essence, this step provides a quantitative evaluation of the peak shape properties in the different parts of the chromatogram. The independent estimate of the SC peak shape properties is also necessary to validate the results of the model fittings (third and fourth steps). Since the overall procedure allows one to make different types of estimates of the SC peak width, $\sigma$, they are fully discussed in the Appendix. Peak shape analysis is performed by applying the Edgeworth-Cramér (EC) nonlinear least-squares fitting procedure19 to the well-separated hydrocarbon peaks in the reference chromatogram of Figure 3. In this way an independent evaluation is obtained of the SC peak shape parameters $\sigma$ and S, respectively the standard deviation and the skewness, from which an estimation of $\sigma_G$ (the Gaussian component of $\sigma$) and of $\tau$ (the time constant of the EMG-type peak shape) is obtained:20

$$\tau = \sigma (S/2)^{1/3} \qquad (7)$$

$$\sigma_G^2 = \sigma^2 - \tau^2 \qquad (8)$$

From $\tau$ and $\sigma_G$, the peak tailing factor, $\tau/\sigma_G$, is obtained. Formulas and details of the EC series fitting procedure and
(19) Dondi, F.; Betti, A.; Blo, G.; Bighi, C. Anal. Chem. 1981, 53, 496.
(20) Dondi, F.; Pulidori, F. J. Chromatogr. 1984, 284, 293.
(21) Dondi, F.; Remelli, M. J. Chromatogr. 1984, 315, 67.
expression can be found in several references19,22 and are not reported here. The subsequent step of the procedure consists of computing the experimental ACVF (EACVF), the experimental autocorrelation function (EACF), and the experimental power spectrum (EPS) from the digitized chromatogram. Step Two: Computation of Peak Dispersion Ratios; EACVF, EACF, and EPS. All the quantities discussed in this section are directly computed from the digitized chromatogram ($Y_j$, j = 1, ..., $N_p$). First, the total area, $A_T$, is determined over a given span X. Then, the peak maxima ($h_{M,i}$, i = 1, ..., $p_M$) and the band areas ($a_{b,i}$, i = 1, ..., $p_b$) above given threshold levels are detected. See under Computation for how $p_M$ and $p_b$ are determined. This data set is collected by using a convenient algorithm which must be able to filter the intrinsic noise (see refs 2-4). The criterion followed was to discard all the detected peaks lower than 1% of the highest peak. From the set of peak maxima and the set of band areas, their averages and standard deviations ($a_M$ and $\sigma_M$, $a_b$ and $\sigma_b$, respectively) were computed, and then the peak dispersion ratios $\sigma_M/a_M$ and $\sigma_b/a_b$ were obtained. The EACVF is computed according to the well-known expression2

$$\hat{C}(i) = \frac{1}{N_p}\sum_{j=1}^{N_p - i} (Y_j - \bar{Y})(Y_{j+i} - \bar{Y}), \quad i = 0, 1, \ldots, M - 1 \qquad (9)$$

where $\bar{Y}$ is the mean chromatographic response

$$\bar{Y} = \frac{1}{N_p}\sum_{j=1}^{N_p} Y_j \qquad (10)$$

M multiplied by the sampling interval gives the maximum time span over which the ACVF is required. In practice, no more than half of the total chromatogram time span can be computed. Moreover, the most interesting part of the EACVF is restricted to the $0 < t < 20\sigma$ interval (see the next section). Instead of the EACVF, the EACF, which is the normalized EACVF, is often used:

$$\hat{\rho}(i) = \hat{C}(i)/\hat{C}(0) \qquad (11)$$

The EPS is computed as

$$F(\omega) = 2\left(\hat{C}(0) + 2\sum_{i=1}^{M-1} \hat{C}(i)\,w(i)\cos(\omega i)\right), \quad 0 \le \omega \le \pi \qquad (12)$$

where $\omega$ is the frequency. The function of the window w(i) in eq 12 is to filter out the random component of the EACVF, which is most commonly present at high i values. See ref 2 on how to choose the w(i) function. Step Three: EACVF Shape Analysis. The EACVF (or its normalized quantity, EACF) is the most direct quantity obtained from the recorded multicomponent chromatogram (see eqs 9 and 11). This section discusses how an EACVF plot can be read, i.e., what its major features are, and where and how the information contained within it can be singled out. In order to make the interpretation as easy as possible, the reader is invited to identify four typical features in the different parts of the EACVF plot. Their names are respectively "single peak part", "IM part", "deterministic part", and "EACVF noise".
(22) Remelli, M.; Blo, G.; Dondi, F.; Vidal-Madjar, M. C.; Guiochon, G. Anal. Chem. 1989, 61, 1489.
(23) Olivé, J.; Grimalt, J. O. Anal. Chim. Acta 1991, 249, 337.
(24) Jenkins, G. M.; Watts, D. G. Spectral Analysis and Its Applications; Holden-Day: San Francisco, 1968.
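The computations of Step Two can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program used in this work: the function names are hypothetical, the autocovariance uses the common form with norming factor $N_p$ and upper summation limit $N_p - i$, and the lag window defaults to a rectangular one (w(i) = 1), whereas ref 2 should be consulted for the actual choice of w(i).

```python
import math

def eacvf(y, n_lags):
    """Experimental autocovariance at lags 0 .. n_lags-1.

    Assumed form: sum over j of (Y_j - mean)(Y_{j+i} - mean), with
    upper limit N_p - i and norming factor N_p.
    """
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]
    return [sum(d[j] * d[j + i] for j in range(n - i)) / n
            for i in range(n_lags)]

def eacf(c):
    """Normalized autocovariance: C(i) / C(0)."""
    return [v / c[0] for v in c]

def eps(c, omega, window=None):
    """Power spectrum at frequency omega (0 <= omega <= pi).

    `window` is a function of the lag index; the rectangular default
    is an assumption made only to keep the sketch short.
    """
    if window is None:
        window = lambda i: 1.0
    tail = sum(c[i] * window(i) * math.cos(omega * i)
               for i in range(1, len(c)))
    return 2.0 * (c[0] + 2.0 * tail)

if __name__ == "__main__":
    # A single Gaussian peak (sigma = 5 points) as a toy chromatogram.
    y = [math.exp(-0.5 * ((j - 100) / 5.0) ** 2) for j in range(200)]
    c = eacvf(y, 64)
    print(eacf(c)[:5], eps(c, 0.1))
```

Any other lag window can be passed as a function of the lag index, for example `eps(c, 0.1, window=lambda i: 1.0 - i / 64)` for a triangular taper.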
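Table I. Some Theoretical Expressions of SC Peak Interdistance Distributions, f(t), and Their Corresponding Theoretical Autocovariance Functions (TACVFs), C(t)
Columns: interdistance model; f(t); C(t). Rows: Exponential ($0 \le t$), eq 1; Deterministic, eq 2. [The expressions in the body of the table are not recoverable from the source.]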
First, one must consider that an EACVF has the same meaning as the well-known correlation coefficient of regression analysis, the only difference being that, instead of only one value, a set of values vs the time span (the i quantity in eq 9) is considered. A high EACVF (or EACF) value at point i means a strong correlation between chromatographic response pairs separated by the time span i ($Y_j$, $Y_{j+i}$). One can thus understand that, for time spans lower than about $3\sigma$, positive correlations are expected, since this is roughly half the width of the SC peak and within it the chromatographic responses are related to one another. This part will be called the single peak part ($0 < t < 3\sigma$) of the EACVF. Beyond this part, at time values up to about $20\sigma$, the EACVF exhibits distinct features of the IM type and of the $\alpha$ value, as was previously described.4 This part is called the IM part ($3\sigma < t < 20\sigma$). All the relevant statistical attributes of a random multicomponent chromatogram (i.e., the SC peak width, the SC number, the $\alpha$ value, and the type and parameter of the IM) are hidden in the part of the EACVF plot made up of the single peak part and the IM part. It must be observed that restricting the EACVF analysis to the $0 < t < 20\sigma$ range means assuming that no significant correlations exist between SC peak positions which are more distant than $20\sigma$. If peaks repeat in the chromatogram at fixed interdistances $\Delta t$, even greater than $20\sigma$, as they do in the reference and in the naphtha chromatograms (see Figures 2 and 3), positive peaks will appear in the EACVF plot at time positions equal to $\Delta t$, $2\Delta t$, $3\Delta t$, ..., $n\Delta t$. Once identified in the EACVF, this pattern will be called the deterministic part, in order to distinguish it from the previous one (the IM part), which essentially reflects the random, short-distance aspect of the peak location in the chromatogram.
The deterministic part and the IM part represent a discontinuous, Dirac-type and a continuous interdistance distribution, respectively. All the above-mentioned features can be identified by a simple inspection of the EACVF plot. Two typical theoretical ACVF (TACVF) expressions, reported in Table I, can be used. They refer to two extreme cases of retention pattern: (a) the Poissonian type, which is the most general case of IM, i.e., that of a chromatogram in which the frequency function of the interdistance between subsequent SC peaks is exponential (E) and the peak positions are the most disordered ones, and (b) the deterministic (D) case, the example of the most ordered retention pattern, where SC peaks appear at repeated positions nT.1 In both cases a G-type SC peak shape is assumed and the peak width is reported as $\sigma_g$. The E-type TACVF is monotonically decreasing, since it is dominated by a negative exponential term. For $t > 5\sigma_g$ the TACVF goes to zero. The repeated pulses in the D-type TACVF are of Gaussian type with standard deviation $\sigma_{TACF}$ equal to $\sigma_g (2)^{1/2}$. The two plots are reported in Figure 4a,b. The IM part and the deterministic part of an experimental EACVF plot are more or less obscured by noise components, as discussed elsewhere. Here it is only to be recalled that
Figure 4. TACVF plots computed for E-type (a) and D-type (b) retention patterns (see eqs 1 and 2, respectively, in Table I). [Plots omitted; horizontal axes run from 0 to 100.]
these noise components result because the interval over which the EACVF is computed is finite (see eq 9). The EACVF is thus a "sampled" statistical quantity affected by bias, here called EACVF noise. The first classification of a real multicomponent chromatogram as Poissonian (E) or deterministic (D) must thus take these effects into account. In practice two extreme conditions will appear. In the limit case of a Poissonian-like, E-type IM, the "EACVF noise" should appear as noise around zero, since zero is the expected pattern in the IM part (see Figure 4 and Table I). Any pulses present in it must be lower than the EACVF noise level. In the second (D) case, to be significant, the repeated pulses must not only exceed the EACVF noise but must also have the expected width $\sigma_g (2)^{1/2}$ at fixed interdistances. It is thus important to know something about the confidence interval of this EACVF noise. Unfortunately the point is not theoretically well defined (at least to our knowledge). It has been solved here by numerical simulation (see Computation for the details). The study of a repeated peak position in a deterministic-type EACVF can also be made quantitative by applying EC series peak shape analysis to the D-type EACVF peaks. In this manner one should obtain an independent SC peak width, consistent with the one obtained from the SC peak shape analysis in the reference chromatogram (see step one of the Procedure), since the experimental analogue of $\sigma_{TACF}$, $\sigma_{EACF}$, must be equal to $\sigma (2)^{1/2}$ (see the Appendix). Mixed conditions, that is, a chromatogram having both periodically repeated and randomly distributed peaks, as is the case for naphtha (see Figure 2), constitute another
Table II. Description of the Different Fitting Methods in the PS-ACVF Analysis of Multicomponent Chromatograms
Entries per method, in column order: experimental fitted function; tested hypotheses on SC peaks; interdistance models; theoretical fitting function; parameters obtained from the best fitting; goodness of the fitting criteria and validation conditions.
Method 1: EACVF, eq 9; G, constant; D; EC series; m, $\sigma_{EACF}$, S; EC series pattern,b SC $\sigma \approx \sigma_{EACF}/(2)^{1/2}$.
Method 2: EACVF, eq 9; G, constant; E/U/N/$\Gamma$; FFT or TPS;a best fitting IM, m, $\sigma_g$, T [$\sigma_{IM}$ or p]; $s^2$ (eq 8) value, SC $\sigma$.
Method 3: EPS, eq 12; EMG, constant; E; TPS, eq 2;c m, $\sigma_G$, $\tau$, $\sigma$; SC $\sigma$ and $\tau$, method 1.d
Method 4: EPS, eq 12; G, nonconstant; E; TPS, eq 11;c m, $\sigma_1$, $\sigma_2$; SC $\sigma$ variation, $\sigma_1$ and $\sigma_2$.c
a Table II, ref 4. b Reference 4. c Reference 3. d Reference 2.
important case. This case can be interpreted as a mixture of two mixtures, one ordered and one disordered, superimposed in the same chromatogram. By numerical simulation it was observed that the separate features in the EACVF plots are in practice additive when they do not interfere with each other, i.e., when the separate EACVFs lie in different zones on the time axis.25 Besides the two extreme retention patterns described above (E and D types), the following additional models were developed:4 uniform (U), normal (N), and gamma ($\Gamma$), whose frequency functions are respectively

uniform (U):
$$f(t) = \frac{1}{2T}, \quad 0 \le t \le 2T \qquad (13)$$

normal (N):
$$f(t) = \frac{1}{\sigma_{IM}(2\pi)^{1/2}} \exp\left[-\frac{(t - T)^2}{2\sigma_{IM}^2}\right] \qquad (14)$$

gamma ($\Gamma$):
$$f(t) = \frac{t^{p-1}\exp(-t/\tau)}{\tau^p\,\Gamma(p)} \qquad (15)$$

where in the IM of type N (eq 14) $\sigma_{IM}$ is the standard deviation of the interdistances between subsequent SC peaks; in the IM of type $\Gamma$ (eq 15) p is the order, $\Gamma(p)$ is the gamma function,26 and $\tau$ is the basic time constant of the $\Gamma$ model, related to the mean value of the interdistance between subsequent peaks by
$$T = \tau p \qquad (16)$$

These three additional models make it possible to represent a variety of interdistance distributions between subsequent SC peaks.4 In fact, in the N case both the mean T and the standard deviation $\sigma_{IM}$ can be independently changed, the only limit being that no negative interdistance values can be allowed. In practice, the relative interdistance dispersion

$$\mathrm{RSD} = \sigma_{IM}/T \qquad (17)$$

begins to be meaningless for values greater than 0.5. In the U model the only variable parameter is T, with RSD = $1/(3)^{1/2}$. In the $\Gamma$ model the $\tau$ and the p parameters can be independently changed and the relative standard deviation is consequently variable (RSD = $1/(p)^{1/2}$). For all these IMs, the features of both the TPS and the TACVF plots have been extensively discussed.4 The EACVF shape analysis can be made more precise by numerical fitting methods, as discussed in the forthcoming section. Step Four: Model Fitting. By fitting either the EACVF, eq 9, or the EPS, eq 12, to a specific theoretical model, the hypothesis concerning the IM type can be more carefully
(25) Dondi, F.; Felinger, A., unpublished results.
(26) Abramowitz, M.; Stegun, I. A. Handbook of Mathematical Functions; Dover Publications: New York, 1965.
checked and the statistical attributes of the multicomponent chromatogram determined from the best fit. Since many theoretical models for the multicomponent chromatogram and related numerical approaches have been set up to date,2-4 there are several possibilities. To prevent the reader from getting lost among SC peak models, IM types, constancy or variability of the peak width, and different methods of computing the EPS or the TACVF, the different procedures are fully described in Table II. For example, the different models E, U, N, and $\Gamma$ can be compared only under the hypothesis of G-type SC peaks. On the other hand, a more complex peak shape is checked only under the hypothesis of a Poissonian chromatogram (see method 3, Table II). Finally, the nonconstancy of the SC peak shape should be tested, but only under the hypothesis of Poissonian IM and G-type shape of the SC peak (see method 4, Table II). The order in which the different hypotheses are checked is not immaterial; it is convenient to follow that of Table II. Validation conditions are an important part of the method application (see Table II). In fact, in addition to the experimentally computed quantities $A_T$ and EACVF (or EPS) over a given chromatogram span X, all the fitting methods require the SC peak dispersion ratios ($\sigma_h/a_h$ or $\sigma_a/a_a$, of the heights and of the areas, for methods 2 and 3 or for method 4, respectively). The latter quantities cannot be directly computed but are approximated by $\sigma_M/a_M$ and $\sigma_b/a_b$, which are computed over the digitized chromatogram, as described under Computation. Because of this approximation, methods 2-4 are unbiased only under well-specified conditions.2-4 The main point of method 2 (the multiple choice method) lies in the acceptance or rejection of the four interdistance model hypotheses.
This choice is made on different grounds: the first is based on the degree of fitting attained with the four different models (see Table II); the second is a comparison of the SC peak width obtained from the fitting with the values determined from separated single peaks in the reference chromatogram (see the Appendix). Moreover, the SC number, m, determined in different chromatogram spans X must be additive. It is thus convenient to cut the total chromatogram into different parts and to process them separately (see in Figures 1 and 2 how the chamomile and the naphtha chromatograms were partitioned). The Meaning of the Determined SC Number. A final remark must be made on the meaning of the SC number obtained from methods 1-4. Since a threshold level (1% of the highest peak or of the greatest band area) was employed in the peak detection step in order to filter out the noise, the determined m expresses how many of the single components of the mixture are present in an amount greater than 1% (w/w) with respect to the mass contained in the highest peak (or in the greatest band area; see Computation on how peaks and bands are detected). It must, in fact, be observed that the analyzed components have a prevailing hydrocarbon structure and that the response of the employed detector (flame ionization detector) to them is mass proportional.
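The four interdistance models compared in method 2 differ mainly in their relative interdistance dispersion (eq 17 and the accompanying text). The short Python sketch below draws interdistances from each model and compares the sample RSD with the expected values: 1 for E, $1/(3)^{1/2}$ for U, $\sigma_{IM}/T$ for N, and $1/(p)^{1/2}$ for $\Gamma$. The function names, the seed, and the parameter values are hypothetical; negative draws of the N model are simply rejected, which is one possible way of enforcing the no-negative-interdistance limit and is an assumption of this sketch.

```python
import math
import random

def sample_interdistances(model, mean_t, n, rng, sigma_im=None, p=None):
    """Draw n interdistances with mean T from the E, U, N, or G (gamma) model."""
    out = []
    while len(out) < n:
        if model == "E":
            t = rng.expovariate(1.0 / mean_t)
        elif model == "U":
            t = rng.uniform(0.0, 2.0 * mean_t)
        elif model == "N":
            t = rng.gauss(mean_t, sigma_im)
            if t < 0.0:          # assumption: reject negative interdistances
                continue
        elif model == "G":
            # shape p and scale tau = T / p, so that T = tau * p (eq 16)
            t = rng.gammavariate(p, mean_t / p)
        else:
            raise ValueError("unknown model: " + model)
        out.append(t)
    return out

def rsd(values):
    """Relative standard deviation: sample standard deviation over mean."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(var) / mean

if __name__ == "__main__":
    rng = random.Random(1)
    for model, kwargs in (("E", {}), ("U", {}),
                          ("N", {"sigma_im": 3.0}), ("G", {"p": 4.0})):
        draws = sample_interdistances(model, 10.0, 20000, rng, **kwargs)
        print(model, round(rsd(draws), 3))
```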
EXPERIMENTAL SECTION The capillary chromatograph used (HRGC 5160 Mega series; Carlo Erba, Milan, Italy) was equipped with a flame ionization detector (FID) and with a Model 4270 Spectra Physics integrator. Helium was used as the carrier gas. The general chromatographic conditions were as follows: injection port at 200 °C and FID temperature at 300 °C. The column was a fused-silica capillary SE 30, 0.15 μm, 30 m × 0.32 mm (Carlo Erba). The extract of chamomile was obtained with a Likens-Nickerson apparatus27 by countercurrent steam distillation with subsequent extraction of the vapor with n-pentane. The chromatographic conditions for the separation of chamomile were as follows: 50 °C for 5 min, 50-120 °C at 5 °C/min, 120-200 °C at 3 °C/min, 200-300 °C at 7 °C/min.15 The pentane extract was injected with a split ratio of 15. The naphtha sample was analyzed using the same injection mode, with a split ratio of 1:100 and the following experimental conditions: 50 °C for 5 min, 50-300 °C at 5 °C/min. The output signal of the detector was recorded by a Spectra Physics recorder, Model 4270, and digitized at a 10-Hz frequency by a 12-bit A/D converter Acro 900 (Acrosystem Co.) connected to an Olivetti M24 personal computer. COMPUTATION From the original 10-Hz frequency files of the digitized chromatograms, other files of 5- and 2-Hz frequencies were obtained. The linearization procedure of the chromatograms was performed as follows: the ratio of the effective ΔCH2 increment in a given Cn-Cn+1 span was computed with respect to the ΔCH2 increment of the C9-C10 span. This gives the relative increment of the number of time points to be inserted at regular positions in the same interval. The Y value of each inserted point was determined by interpolation of the two closest points. The peak maxima are determined as follows: five consecutive Y points are examined, and a maximum is detected when points 1-3 are increasing and points 3-5 are decreasing. The minima are detected in the same way.
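The five-point rule for peak maxima, combined with the 1% threshold stated in the Procedure, can be sketched as follows. This is an illustrative Python fragment, not the program used in this work; the function name and the default threshold argument are assumptions, and strict inequalities are used for "increasing" and "decreasing".

```python
def find_maxima(y, threshold_fraction=0.01):
    """Return the indices of peak maxima in the digitized signal y.

    A maximum is accepted at the central point of five consecutive
    points when points 1-3 are strictly increasing and points 3-5
    are strictly decreasing. Maxima lower than threshold_fraction
    times the highest detected maximum are then discarded.
    """
    candidates = []
    for c in range(2, len(y) - 2):
        rising = y[c - 2] < y[c - 1] < y[c]
        falling = y[c] > y[c + 1] > y[c + 2]
        if rising and falling:
            candidates.append(c)
    if not candidates:
        return []
    highest = max(y[c] for c in candidates)
    return [c for c in candidates if y[c] >= threshold_fraction * highest]

if __name__ == "__main__":
    signal = [0, 1, 3, 1, 0, 0, 0.001, 0.002, 0.001, 0, 0]
    print(find_maxima(signal))        # only the large peak survives the 1% cut
    print(find_maxima(signal, 0.0))   # no threshold: both maxima are kept
```

Minima can be found with the same routine applied to the sign-reversed signal, with the threshold set to zero.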
All maxima lower than 1% of the highest peak are discarded. The band areas correspond to chromatographic zones containing a peak maximum and limited by valleys. All the band area values lower than 1% of the greatest one were discarded. Because of the different threshold levels employed, a slight difference may exist between $p_M$ and $p_b$, respectively the number of detected peaks and bands. The numerical procedure followed in the nonlinear least-squares approximation by EC series is the one described in ref 19. The 5-Hz frequency data file was used, and about 60 points/peak were processed in the range of $\pm 6\sigma_g$ (= $12\sigma_g$). The best fitting degree attained in the EC series fitting was measured by

$$\mathrm{CV}\,\% = \frac{100}{h_M}\left(\frac{\sum_i (Y_i - Y_{i,c})^2}{N_p - k - 4}\right)^{1/2} \qquad (18)$$

computed in the peak range of $\pm 4\sigma_g$. Here $h_M$ is the maximum of the considered peak, $Y_{i,c}$ is the chromatographic response at the point i (= 1, ..., $N_p$) computed by the EC series of expansion order k, and $N_p$ is the number of points. The numerical procedures of methods 2-4 are the same as described in refs 2-4. The optimum nonlinear simplex procedure reported in ref 28 was employed. The minimized expression was the following:
z(
)
W' TACVF(i) - EACVF(i)
(19) = TACVF(0) with M = 64. In the EACVF processing, the 2-Hz data file was employed. This frequency corresponds to about 15 points in the 6ug range. No differences in the best fitting parameters were observed when the 5-Hz or the 10-Hz data files were processed. Mean and confidence intervals for EACF in Figure 9 were computed as follows: 200 repeated Poissonian chromatograms
(27) Nickerson, G. B.; Likens, S. T. J. Chromatogr. 1966, 21, 1.
(28) Morgan, E.; Burton, K. W.; Graham, N. Chemom. Intell. Lab. Syst. 1990, 8, 97.
of 200 SCs, with σ_h/a_h = 1 and a given α value, were generated according to the rules in ref 2. The EACVF was computed by using eq 9 over a circular chromatogram obtained by connecting the ending part to the beginning. In this way the EACVF values at each time span i were computed over the same number of point pairs, equal to N_p; i.e., the upper limit in the sum of eq 9 was always N_p, instead of N_p - i (by this artifice, the unequal effect of the norming factor N_p of eq 9 was avoided). For each time value, the EACF mean and standard deviation values were computed. From these the 95% confidence interval was calculated. It can be seen that the confidence intervals are nearly independent of t/σ_g after the beginning portion of the EACF plot (see Figure 9).
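The circular autocovariance and the repeated-simulation confidence band described above can be sketched in Python with NumPy. This is a hedged illustration, not the authors' program: the function names are hypothetical, the SC peaks are taken as Gaussian with exponentially distributed heights (which gives a height dispersion ratio of 1), the Poissonian pattern is simulated by uniformly distributed retention positions, and the 95% band is taken as the mean ± 1.96 standard deviations (a normal approximation that the text does not specify). The mapping from the saturation factor α to the peak width and chromatogram length relies on equations outside this section and is left to the caller.

```python
import numpy as np


def circular_acvf(y, max_lag):
    """Autocovariance over a circular signal: every lag uses all N point pairs."""
    d = np.asarray(y, dtype=float)
    d = d - d.mean()
    n = d.size
    # np.roll(d, -k)[j] is d[(j + k) % n], so the end is joined to the beginning.
    return np.array([np.dot(d, np.roll(d, -k)) / n for k in range(max_lag + 1)])


def poisson_chromatogram(m, n_points, sigma, rng):
    """Simulate m Gaussian SC peaks at uniform random positions (assumed model)."""
    positions = rng.uniform(0.0, n_points, size=m)
    heights = rng.exponential(1.0, size=m)   # exponential heights: SD/mean = 1
    t = np.arange(n_points)
    y = np.zeros(n_points)
    for pos, h in zip(positions, heights):
        dist = np.abs(t - pos)
        dist = np.minimum(dist, n_points - dist)   # wrap peaks around the ends
        y += h * np.exp(-0.5 * (dist / sigma) ** 2)
    return y


def eacf_band(n_runs=200, m=200, n_points=4000, sigma=5.0, max_lag=100, seed=0):
    """Mean EACF and an approximate 95% band from repeated simulations."""
    rng = np.random.default_rng(seed)
    runs = np.empty((n_runs, max_lag + 1))
    for r in range(n_runs):
        acvf = circular_acvf(poisson_chromatogram(m, n_points, sigma, rng), max_lag)
        runs[r] = acvf / acvf[0]               # normalize to EACF(0) = 1
    mean = runs.mean(axis=0)
    sd = runs.std(axis=0, ddof=1)
    return mean, mean - 1.96 * sd, mean + 1.96 * sd


if __name__ == "__main__":
    mean, low, high = eacf_band(n_runs=20, m=50, n_points=1000, sigma=4.0, max_lag=40)
    print(mean[:5], low[:5], high[:5])
```

Because the signal is treated as circular, the value at lag k equals the value at lag N - k, and the norming factor is the same at every lag.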
RESULTS AND DISCUSSION
Figures 1-3 report the three multicomponent chromatograms after baseline correction and "clock regulation". In the same figures the numbers quoted under the time axis identify the parts considered in the partial processing of the chromatogram. In all three chromatograms the first part (0-5 min) is missing since it refers to the isothermal condition of elution, in which no peak width constancy is possible. Parts 1-4 of the chamomile (c) chromatogram, the most extended portion that will be analyzed, appear to contain plenty of randomly located peaks and span the C9-C13 elution region of the reference chromatogram (see the enlarged detail in Figure 1). This part will be coded as Lc14 or NLc14 according to whether the linearized (L) or the nonlinearized (NL) portion is considered. The portion of the chamomile chromatogram beyond Lc14 (or NLc14) was not analyzed since it contains a low number of very high peaks: taking them into account would have significantly depressed the total number of peaks exceeding the detection threshold level (1% of the highest peak). In Figure 2 the naphtha (n) chromatogram is reported. In it the preponderant peaks correspond to the linear hydrocarbon sequence. In the enlarged detail b of Figure 2, the dependence of linear hydrocarbon retention times on carbon number before and after clock regulation is reported. The part of the naphtha chromatogram considered (parts 1-9, called Ln19 or NLn19; see Figure 2a) corresponds to the linear hydrocarbon sequence up to C18 and is more extended than both the reference chromatogram (Figure 3) and parts 1-4 of the chamomile chromatogram (Lc14 or NLc14; see Figure 1b). However, even the part of the naphtha chromatogram which is not covered by the reference chromatogram was obtained under the same programmed temperature elution condition (5 °C/min).
Therefore, the peak shape analysis performed on isolated peaks of the reference chromatogram can be taken as representative for both the naphtha and chamomile chromatograms.
Single-Component Peak Shape Analysis on the Reference Chromatogram. Table III reports the results of peak shape analysis by the EC series fitting procedure and the numerical values of the peak parameters. Note that this is the first example of peak shape analysis by EC series under programmed temperature capillary GC conditions. It can be seen that the attained degree of fitting is, in general, very good (CV % ≈ 0.5). With the sole exception of the C14 peak, the whole "fitting directory" observed upon increasingly expanding the EC series order was in agreement with what has been reported's22 for correct behavior. The EC series fitting directory is a set of features which are observed in practice when the EC series nonlinear fitting procedure is applied to a given experimental peak. It includes the following: stability of the peak parameters when the order k of the series is expanded; a fitting degree (CV %) and a k_max value compatible with a given skewness value. For the peak parameters to be accepted as unbiased, these fitting behaviors
ANALYTICAL CHEMISTRY, VOL. 65, NO. 17, SEPTEMBER 1, 1993
Table III. Peak Shape Analysis by EC Series of the n-Alkane Peaks in the Reference Chromatogram

                     nonlinearized                          linearized
alkane     m_p      σ       S     k_max   CV %      m_p      σ       S     k_max   CV %
C9        450.0    1.58    0.51     2     0.52     450.0    1.58    0.51     2     0.52
C10       689.1    1.23    0.14     3     0.43     689.9    1.26    0.16     3     0.48
C11       909.4    1.12    0.03     4     0.22     920.9    1.24    0.04     4     0.51
C12      1111.2    1.12    0.02     4     0.24    1153.2    1.29    0.02     4     0.56
C13      1308.0    1.23    0.07     3     0.57    1387.2    1.41    0.07     4     0.40
C14      1517.9    1.39   -0.01     2     0.60    1619.9    1.48   -0.02     4     0.65

mean and SD        Δt(CH2)       σ             σ_G(a)         τ(a)           τ/σ_G
nonlinearized      213 ± 15      1.28 ± 14%    1.18 ± 9.3%    0.37 ± 0.39    0.31
linearized         234 ± 3.5     1.37 ± 10%    1.26 ± 9.5%    0.40 ± 0.39    0.39

(a) Quantities computed from σ by using eqs 7 and 8. The units of m_p, Δt, σ, and τ are in seconds.
Figure 5. Comparison of SC peak parameters obtained from separated peaks in the reference chromatogram [(■) σ; (●) τ] and from D-type peaks of the EACVF plots (horizontal segments), vs carbon number. Segments 1-4 refer respectively to the first and second D-type peak of the reference EACVF plot and to the first and second D-type peak of the Ln17 chromatogram.
Figure 6. EACF plot of the chamomile chromatogram: (a) EACF of the 0-800-s portion of the linearized (shifted upper curve L; displacement 0.5 unit) and nonlinearized (lower curve NL) chromatogram. (b) EACF of the 0-40-s portion of the linearized chromatogram (single peak part + IM part).
must satisfy certain practical rules reported elsewhere (ref 20). On these grounds only the C10-C13 peak parameters can be accepted as unbiased. One can verify that the retention time increments in the homologous series, Δt(CH2), calculated from the peak means, m_p, are not absolutely constant (Δt(CH2) = 234 ± 3.5 s; see the L case in Table III). The observed variability is most likely the consequence of some differences in peak asymmetry in the different chromatographic regions (see the enlarged peak shapes in Figure 3) and of the fact that the clock regulation was obtained with reference to peak maxima. The C14 peak exhibits a fronting effect which is typical of the components eluted at high temperature under programmed GC conditions (see detail in Figure 3). The S value is accordingly negative, revealing a nonlinearity of the elution condition. In this case, the EC series is known to be unable to correctly approximate the chromatographic peak, and consequently the peak parameter determination can prove highly inaccurate (ref 21). The C14 data were reported in Table III only for the sake of completeness. A more complete analysis of these points lies beyond the aims of the present treatment. Figure 5 reports the peak width, σ (square symbols), and the decay constant τ (round symbols, computed under the hypothesis of an EMG shape by using eqs 5 and 6) as a function of the carbon number. It can be seen that the least retained peak (C9) exhibits the greatest τ value, which is likely to reveal significant extracolumn band deformation, more
Figure 7. EACF plot of the naphtha chromatogram: (a) EACF of the 0-800-s portion of the linearized (shifted upper curve L; displacement 0.5 unit) and nonlinearized (lower curve NL) chromatogram. (b) EACF of the 0-40-s portion of the linearized chromatogram (single peak part + IM part).
effective for the more volatile and less retained compounds (see the enlarged C9 peak shape detail in Figure 3). For the most retained peak (C14) a negative τ value is computed owing to its fronting shape (S < 0). Obviously, such a negative value has no physical meaning. The symmetrical aspect of the intermediate peaks (see, for example, the C11 peak in Figure 3), with their low decay constant τ values (see Figure 5), is most likely the result of these two opposite band deformation factors. In any case, one can appreciate the goodness of the Gaussian approximation (peak tailing factor τ/σ_G = 0.4) and the limited variability of the peak width (≈ ±10%) along the chromatogram (see Table III).
EACVF Shape Analysis. Figures 6-8 report the EACF plots of the chamomile, naphtha, and reference chromatograms, respectively. Both the NL and L EACFs are reported for these three cases. The L-type EACF is shifted upward in order to make the comparison more effective. In the chamomile and naphtha plots (Figures 6 and 7) the enlarged detail shows the beginning of the L-type EACF plot. Since the SC peak width σ is about 1.4 s (see Table III), these enlarged details effectively show the features of both the single peak part (0-3σ) and the IM part (3σ-20σ). The chamomile EACF plot is the only one which is not apparently affected by the linearization procedure (see L and NL plots in Figure 6). In fact, inspection does not reveal any significant differences between them: the single peak part is present in both cases, followed at higher t values by a more or less pronounced random part of similar pattern. These chamomile EACF plots look very similar to the TACVF of multicomponent chromatograms having an exponential IM (cf. Figures 6b and 4a). In fact, one can detect neither the D-type recursivity of Figure 4b nor other specific features, such as an extended negative concavity or nonrandom oscillatory patterns, typical of other models such as the N or the U ones, as extensively described elsewhere. The oscillations observed beyond the single peak part of the chamomile EACF plot are clearly random in both position and amplitude (see the 100-400-s range in the EACF plot in Figure 6a). In order to identify them as the EACVF noise, their amplitude was checked with reference to the 95% confidence intervals (Figure 9) of the EACF computed by simulation, as described under Computation. One can see that the correlation starts to become more uncertain for t > 3σ_g and that the noise level is significantly dependent on the α value. By comparison (cf.
Figure 6 and Figure 9) one can verify that the noisy oscillations of the chamomile EACF plot almost always lie within the confidence region of α = 0.4, which is most likely the α value for this chromatogram (see the following). They are thus interpreted as the EACVF noise. The conclusion is that no D-type structure can be recognized in the chamomile chromatogram. The fact that no deterministic pattern was significantly detected cannot, however, rule out the existence of other hidden structure intercorrelations among the mixture components. This failure may derive from the fact that these structure relationships are characteristic of minor peaks or that they were not sufficiently singled out by the clock regulation setup (only one of the many regulations possible). Surely column efficiency should be improved in order to attain stringent evidence. In fact, the confidence interval of the experimental autocovariance function noise decreases with α (see Figure 9) and thus with an increase in peak capacity N_c (see eq 4). As for the most convenient IM for the chamomile chromatogram, this will be analyzed later on.
Figure 9. Mean EACF plot (plot lying over the t/σ_g axis) and its confidence intervals (95% probability) for different α values (0.2, 0.4, 0.6, 0.8, and 1, respectively, in order of increasing width). Data obtained from 200 repeated simulated chromatograms, all having 200 SCs and σ_h/a_h = 1. The enlarged detail in the left-hand corner shows the behavior at low t/σ_g values.

Table IV. Peak Shape Analysis by EC Series of the First Two Deterministic Peaks of the EACVF Computed over Different Selected Sets of the Reference and Naphtha Chromatograms

                           first peak                              second peak
set                 σ_EACF/(2)^1/2 (a)   S     k_max  CV %   σ_EACF/(2)^1/2 (a)   S     k_max  CV %
reference C9-C14         1.33           0.13     3    0.24        1.34           0.16     2    0.26
naphtha   Ln17           1.36           0.14     3    0.43        1.36           0.10     4    0.27
naphtha   Ln19           1.64          -0.16     4    0.41        1.58           0.14     4    0.53

(a) SC peak width estimate, σ.
The naphtha and reference chromatogram EACF plots proved significantly affected by the linearization procedure (see Figures 7 and 8). In fact, a recursive pattern emerges in them as a consequence of this transformation, with a simultaneous lowering of the noisy portion. Moreover, since the interdistance between the most pronounced peaks exactly matches that existing in the original chromatograms between the linear hydrocarbon peaks (Δt = 234; see Table III), this pattern is to be referred to the CH2 group retention increment. EC series peak shape analysis was performed on these EACF "CH2" peaks, according to what was discussed under Procedure, and the results are reported in Table IV. In Figure 5 a comparison can be found of the different SC peak width estimates: from the separated peaks of the reference chromatogram (σ) and from the first two D-type EACF peaks of both the naphtha and the reference chromatogram. These last estimates are the σ_EACF/(2)^1/2 values, as discussed under Procedure and in the Appendix. One can see that agreement is in general very good. In the same Figure 5 a comparison is also made between the τ estimates, and the agreement is satisfactory. One can observe that this kind of comparison between peak parameters obtained from different sources (the true chromatographic peaks and the EACF D-type peaks) is rigorous only in the case of G-type SC peak shape. Nonetheless, the comparison made proves to be significantly validated by the close agreement found between the data of the reference chromatogram (agreement
Table V. Results of Method 2 Fitting of Chamomile Chromatogram (a)

        peak detectn          results of fitting                          IM params       separatn params
part(b)  p_M  σ_M/a_M   IM   s²            m     σ_g   third param(c)     T      RSD      α      N_c    γ
                                                        (σ_IM or p)
Lc14     52    1.98     E    9.8 × 10^-4   129   1.42                     8.16   1.00
                        U    2.2 × 10^-2   120   1.52                     8.72   0.58
                        N    5.7 × 10^-4   130   1.41    8.47             8.07   0.97
                        Γ*   3.8 × 10^-4   140   1.39    0.62             7.49   1.26     0.37   377    0.37
Lc12     19    1.65     E    1.8 × 10^-3    28   1.37                    11.57   1.00
                        U    6.6 × 10^-3    24   1.49                    13.46   0.58
                        N    1.1 × 10^-3    28   1.39   11.58            11.74   0.99
                        Γ*   1.0 × 10^-3    26   1.42    1.41            12.33   0.84     0.23   112    0.73
Lc24     33    1.09     E    7.6 × 10^-3    95   1.52                     7.64   1.00
                        U    0.13           62   2.07                    11.67   0.58
                        N*   2.3 × 10^-4   103   1.44    8.04             7.07   1.13     0.41   251    0.32
                        Γ    6.0 × 10^-3    99   1.50    0.90             7.35   1.05
Lc23     20    1.07     E    2.7 × 10^-4    59   1.40                     6.14   1.00
                        U    4.8 × 10^-2    48   1.54                     7.51   0.58
                        N*   2.5 × 10^-4    55   1.44    6.03             6.62   0.91     0.43   128    0.36
                        Γ    2.6 × 10^-4    60   1.41    0.99             6.02   1.01
Lc34     13    0.89     E    1.2 × 10^-2    52   1.44                     7.05   1.00
                        U    7.6 × …        43   1.73                     8.60   0.58
                        N    1.2 × 10^-3    44   1.57    6.76             8.24   0.82
                        Γ*   9.8 × 10^-4    43   1.56    1.60             8.49   0.79     0.37   116    0.30

(a) The best fitting results (set in italics in the original) are marked with an asterisk; the separation parameters are given for these rows only. (b) See Figure 1 for the reference key; Lc, chamomile linearized chromatogram. (c) Respectively, for N and Γ IM.
between σ values of the different hydrocarbon peaks and the corresponding σ_EACF/(2)^1/2 of the EACF plot computed from the reference chromatogram). A slight discrepancy is instead observed in the case of the data derived from the whole naphtha chromatogram (see the Ln19 case in Table IV), but in this case no exact correspondence exists between the considered chromatographic spaces (cf. Figures 2 and 3). The deterministic character of the "CH2" EACF peaks is thus largely proved. Moreover, the shape analysis of the deterministic EACF part proves to be a very precise approach for characterizing the efficiency features in a structured multicomponent chromatogram, such as that of naphtha. In fact, since a mean evaluation of SC peak width is obtained (σ_EACF/(2)^1/2), an independent estimate of the peak capacity, N_c, in a given chromatogram span can be computed (see eq 5). All the other minor peaks in the naphtha EACF plot (Figure 7b) are left out since they are both too low and unresolved. This part is thus the naphtha EACVF noise. Besides the deterministic pattern, a short-distance, random pattern is also identified. In the chromatogram this corresponds to the "forest" of peaks, one following the other; in the EACF plot this corresponds to the flat IM part (3σ < t < 20σ, σ = 1.4 s) appearing after the single peak part (see Figure 7b), typical of the Poissonian retention pattern (cf. Figure 4a). The beginning portion of the naphtha EACF plot will be checked for the type of IM by fitting it to the four different IM models (E, U, N, Γ).
EACVF Fitting by Constant Peak Width Interdistance Models. The results of method 2 (multiple choice model) are reported in Tables V and VI. Different sets of parameters are reported there: "peak detection", "results of fitting", "interdistance parameters", and "separation parameters". The peak detection quantities are directly computed from the digitized chromatogram parts without any assumption.
The results of fitting are obtained by applying the four models E, U, N, and Γ. The interdistance parameters are the mean and the standard deviation of the interdistance between subsequent single-component peaks, either directly obtained from the fitting (in the case of the N model) or computed according to ref 4. The separation evaluation data, α, γ, and N_c, are obtained by using the basic eqs 1-6. They are reported only for the best fitting model, whose results are in italics. The latter data represent the final goal of the analysis, which is,
in fact, to assign the retention pattern type and to evaluate the statistical density of the peaks over the chromatogram (α), the available separation power (N_c), and the obtained separation extent (γ). Let us now consider the fitting results reported in Table V for chamomile. In the most extended part of the chromatogram (Lc14), 52 maxima and a peak dispersion ratio (σ_M/a_M) of about 2 were detected. The latter figure is significantly greater than 1, the expected σ_h/a_h for an exponential distribution of the SC peak heights (ref 2), and it is most likely determined by the predominant central peak. When more restricted parts of the chromatogram are considered (Lc24, Lc23, and Lc34; Table V), a σ_M/a_M value close to 1 is found. When the fitting performance of the different models is compared through their s² values, the order is in general Γ > N > E >> U, the difference among the first three models being not as great as their difference from the last. Moreover, these three IMs (Γ, N, E) give close m, σ_g, T, and σ_IM estimates. In particular, the mean interdistance values between subsequent peaks (T) are nearly equal to their standard deviation (σ_IM). This finding is not unexpected when the underlying model of the chromatogram is Poissonian, for which T = σ_IM. The fact that other models (N and Γ) show comparable fitting performance and furnish comparable results for T and σ_IM is due to their flexibility, since they are functions of three parameters. One can also see that the third parameter of the Γ model (p) is near 1 (see Table V), the condition under which the Γ and E models coincide. All these findings confirm that the interdistance model of the chamomile chromatogram is of E type. The agreement between the different peak width estimates, from the chamomile EACFs and from separated peaks of the reference chromatogram, is shown in Figure 10.
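The separation parameters quoted for the best fitting models can be cross-checked against the fitted quantities with a few lines of Python. This is a minimal sketch under stated assumptions: the basic eqs 1-6 are not reproduced in this section, so the forms α = m/N_c and γ = p_M/m used below are the usual statistical-overlap definitions, assumed here because they reproduce the Table V entries; p is assumed to be the shape parameter of the Γ model, for which the relative standard deviation of the interdistance is p^(-1/2) and equals 1 (the exponential case) when p = 1. The function names are hypothetical.

```python
import math


def saturation_factor(m, peak_capacity):
    """alpha: number of single components per unit of peak capacity (assumed form)."""
    return m / peak_capacity


def separation_extent(p_detected, m):
    """gamma: detected maxima per single component (assumed form)."""
    return p_detected / m


def gamma_model_rsd(p):
    """Relative standard deviation of a gamma-distributed interdistance of shape p."""
    return 1.0 / math.sqrt(p)


if __name__ == "__main__":
    # Lc14, gamma-model row of Table V: m = 140, N_c = 377, p_M = 52, p = 0.62
    print(round(saturation_factor(140, 377), 2))
    print(round(separation_extent(52, 140), 2))
    print(round(gamma_model_rsd(0.62), 2))
```

With p = 1 the relative standard deviation is exactly 1, so the mean interdistance equals its standard deviation, which is the Poissonian condition discussed in the text.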
It must be observed that the comparison is now made between the σ_g and σ values, obtained respectively from the best fitting model of the EACF plot and from the reference chromatogram, because they both estimate the same quantity, i.e., the total standard deviation of the SC peak (see the Appendix). It must instead be observed that the σ_g estimates obtained from the worst fitting model, the U model, are significantly different (see the values close to 2 for Lc34 and Lc24, Table V). These findings prove the reliability of the present approach.
Table VI. Results of the Method 2 Fitting over Different Parts of the Naphtha Chromatogram (a)
part(b); peak detectn: p_M, σ_M/a_M; results of fitting: IM, s², m, σ_g, third param(c) (σ_IM or p); IM params: T, RSD
105
Ln19
1.51
E
272
1.47
8.29
1.00
5.8X 103 3.5X 103 5.3X 5.6X 5.5X1k2
232 259 250 165 133 134
1.64 1.51 1.52 1.17 1.31 1.28
3.46
9.72 8.70 9.02 4.32 5.38 5.31
2.7XlV
154
1.23
7.09
4.63
40 32 36
1.02 1.15 1.07
5.63
5.59 6.93 6.18
0.58 0.93 0.82 1.00 0.58 0.65 0.38 1.00 0.58 0.91
37
1.06
6.72
5.99
0.39
r
1.7X le2 1.3X
76 62 65
1.25 1.35 1.30
2.86
3.38 4.11 3.95
1.00 0.58 0.72
3.7X10-9
66
1.35
3.95
3.90
0.50
N
1.3 1.2 1.0
86 40 20
1.48 2.00 2.61
2.28
2.46 5.17 10.5
1.00 0.58 0.22
0.67
3.7 x 10-2 2.6 X 1k2 3.2X
65
1.49
2.80
3.23
0.60
112 93 96
1.43 1.58 1.54
4.85
6.04 7.28 7.06
1.00 0.58 0.69
2.9XlV
3.6X le2 4.3X 3.6X 1 t 2
102
1.51
4.60
6.62
0.47
268 218 225
1.33 1.46 1.41
4.19
5.17 6.33 6.16
1.00 0.58 0.68
1.2x 1 V
248
1.37
6.01
5.57
0.41
34 30
1.80 1.92
25.78 28.98
1.00 0.58
U N
r
65
Ln15
1.45
E U N
r
21
Ln12
1.11
E U N
r
Ln23
24
1.66
E U N
Ln45
18
0.48
E U
r
32
Ln58
1.64
E U N
r
97
Ln18
1.52
E U N
r
8
Ln89
1.47
20
1.66
N
3.8X 1 V
32
1.88
20.34
27.40
0.74
109
31 54 45 46
1.88 1.29 1.07 1.34
2.22
E U
5.7 X 3.5 x 4.3 x 3.8X
3.42
28.00 4.36 5.25 5.02
0.67 1.00 0.58 0.68
50
1.34
6.92
4.63
0.38
l . l X 10-'
1.2 x 10-1 1.1x 10-1
52 43 43
1.33 1.46 1.43
3.53
4.31 5.21 5.19
1.00 0.58 0.68
2.0 10-2
48
1.38
7.89
4.60
0.36
E U
1.6X 1.5X 1 t 2
48 41
1.47 1.59
5.33 6.25
1.00 0.58
N
1.4 X10-= 2.7X 1 t 2
41
1.57
4.18
6.24
0.67
46
1.48
1.71
5.62
0.76
r
17
1.60
E U N
r
Ln67
20
1.77
r
a The best
r IM. d
2.1 x 10-2 1.2X
7.3 x 103 7.1 X 103
N Ln56
5.7 X 1p2 9.4X 6.5X le2
8.06 1.49
E U
r
Ln35
2.3X 1 V 4.0 X le2
10-2 10-2 1b2 4.5 X 1 V
separatn paramsd a
NC
Y
0.35
767
0.39
0.53
291
0.42
0.35
104
0.57
0.69
95
0.36
0.92
70
0.28
0.46
219
0.32
0.49
505
0.39
0.14
233
0.25
0.58
86
0.40
0.60
79
0.35
0.53
79
0.49
fitting results are in italics. See Figure 2 for the reference key; Ln, naphtha linearized chromatogram. Respectively, for N and 0.5.
with the exception of the very first part (Lc12). Note that in this case the α estimate is poor, since the value of m in this part is low (26) and the error estimate of m is on the order of m^1/2 (ref 2). Finally, it must be observed that method 2 as applied here was validated by simulation under comparable conditions (ref 4). One can thus conclude that the EACVF method 2 is coherently able to define both the type of the retention pattern and the α, N_c, and γ parameters of the chamomile chromatogram. Let us now consider the results obtained by numerical handling of different parts of the naphtha chromatogram (see Table VI). If the whole chromatogram is considered (Ln19), the best IM is once more the exponential one (s² = 2.3 × 10^-3). The retention pattern of the whole naphtha chromatogram appears as the most disordered one, i.e., of the Poissonian type, as in the case of chamomile. This result is surprising since the two cases appear quite different. The other two types of IM exhibiting a comparable fitting degree (10^-3) toward the same Ln19 total chromatogram are Γ and N, as was observed and discussed for chamomile. When shorter chromatographic regions are instead handled (all cases of Table VI, except the Ln19 case), the E congruency disappears and the best fitting is generally obtained by gamma
Figure 12. Comparison between peak width estimates, σ, obtained in different parts of the linearized naphtha chromatogram (Ln) (reported as horizontal segments over the corresponding elution range of the linear hydrocarbon series) and the corresponding values (symbols) obtained from the reference chromatogram (C9-C14), vs carbon number. See Figure 2 for the reference keys of the chromatogram parts.
fundamental properties, that of stationarity in the detected peak density. Obviously this does not mean that in a "true" Poissonian chromatogram the peak density should be absolutely constant along the chromatogram. In fact, the interdistance between subsequent SCs is always a random quantity, and from this it follows that the expected number of components in a given span is a random quantity as well (m ± m^1/2; ref 2). However, the mean interdistance values in the Ln15 and Ln89 spans are significantly different. The pseudo-Poissonian condition of the chromatogram has the disadvantage that no hypothesis can be made as to local overlapping patterns. The same points in such chromatograms were also verified by Davis and Giddings (see the chromatogram of Figure 4 in ref 13). Comparison between the σ_g values obtained in different portions of the naphtha chromatogram and the values of σ obtained from the reference chromatogram is made in Figure 12. It can be seen that the agreement is not as good as that observed for chamomile (cf. Figures 11 and 12). The fact that no clear validation is attained here for the peak width estimate and that low fitting degrees are observed indicates that the separation efficiency must be improved for naphtha if a sharper description of its retention pattern is required. This is not a generic conclusion: it is also supported by what was found by simulation. In fact, for the Γ interdistance model with α values around 0.5 and for SC numbers as low as 50, the overall parameter estimation in simulated chromatograms was very poor, as is observed in short portions of the naphtha chromatogram.
PS Fitting by Constant Peak Width Interdistance Model. The results so far obtained by fitting the experimental autocovariance function are now compared and validated with respect to those which can be obtained by using method 3.
The latter differs from the former in that the experimental power spectrum of the chromatogram, computed according to eqs 9 and 12, is fitted to theoretical PS models (see Table II). Method 3 is numerically different from method 2 since it requires the proper windowing selection (see eq 12), and it has been extensively validated by numerical simulation in the case of E-type IM, for the G- and the EMG-type SC peak shape and in the presence of noise (ref 2). Application of this method can give an estimate of the decay constant τ and, thus, additional information can be effectively obtained when the E-type pattern holds true. Since a moderate peak asymmetry has been effectively detected in the reference chromatogram (see Table III and Figure 5), it is interesting to check its relevance to the m estimate. It can be seen that the m and σ_g values obtained by applying the different methods are practically equal for both the chamomile and
Table VII. Comparison of Different Fitting Methods (Autocovariance and Power Spectrum Fitting, EACVF and PS, Respectively) and of Different SC Peak Shape Functions (Gaussian and Exponentially Modified Gaussian Function, G and EMG; PS Fitting Method)

                      method 2(a)       method 3(b)
                      G type(c)         G type(c)         EMG type(c)
chromatogram part     m      σ_g        m      σ_g        m      σ(d)    σ_G     τ       τ/σ_G
chamomile  Lc14       129    1.42       129    1.38       126    1.51    1.21    0.90    0.74
           Lc12        28    1.37        29    1.35        28    1.37    1.37    0.02    0.01
           Lc24        95    1.52        96    1.51       101    1.77    1.46    1.01    0.69
naphtha    Ln19       272    1.47       283    1.39       271    1.25    1.24    0.09    0.07
           Ln15       165    1.17       164    1.16       164    1.18    1.17    0.06    0.05
           Ln58       112    1.43       116    1.44       115    1.51    1.51   -0.05   -0.03
           Ln89        34    1.80        37    1.73        36    1.88    1.82   -0.06   -0.03

(a) EACVF fitting. (b) PS fitting. (c) SC peak shape function. (d) Data obtained by using eq 25.

Table VIII. Method 4 Fitting Results (a)

chromatogram    p_b    σ_b/a_b    m      σ_1     σ_2     σ_av    r
NLc14            70     1.88      132    1.10    1.50    1.30    1.36
Lc14             56     1.84      124    0.96    2.01    1.49    2.09
NLn19           111     1.40      247    0.63    1.78    1.20    2.80
Ln19             88     1.41      264    0.89    1.81    1.35    2.03

(a) Retention pattern, exponential IM; SC peak shape, Gaussian. σ_1, σ_2, and σ_av are the lower, the upper, and the average peak width values, respectively. r = σ_2/σ_1.
the naphtha chromatograms when the G-type peak shape is assumed (see Table VII, the first two column sets). This result indicates that peak tailing factor, τ/σ_G, values lower than ≈0.6 have no influence on the procedure. For the chamomile, the τ estimates are in significant agreement with the data obtained from the peak shape analysis of separated peaks in the reference chromatogram and from D-type peaks of the EACF plot (τ ≈ 0.4; see Table III, Figure 5, and Table VII). It must also be observed that the σ_G values are always lower than the σ_g values (see Table VII): this is because the former are the estimates of only one part of the SC peak width (the Gaussian part; see the Appendix). One can also verify agreement between the σ estimates obtained under the EMG hypothesis and the σ_g values obtained under the G hypothesis. For naphtha, the τ/σ_G values are exceptionally low (near to zero), but this last finding may not be significant owing to the poor fitting degree of the E model in this case.
PS Fitting by Nonconstant Peak Width Interdistance Model. In Table VIII the existence of possible variations (random or steady) in SC peak width along the chromatogram is explored. In fact, with method 4 (cf. Table II) both the random and the nonrandom variations of the peak width along the chromatogram are theoretically accounted for in terms of the chromatogram power spectrum, under the hypothesis of E-type IM (ref 3). The application of this further method is thus able to detect the presence of hidden peak width variation in the chamomile or the naphtha chromatogram, which was instead directly detectable in the reference chromatogram (see Table III and Figure 5). The fitting results give the boundary, the ratio, and the average peak width values (σ_1, σ_2, r = σ_2/σ_1, and σ_av, respectively) together with an estimate of m. It can be seen that both the boundary and average values correspond to what was measured in separate parts of the chromatogram (cf. Table VIII and Figures 10 and 12).
Moreover, the values of the SC peak variability ratio, r, observed in the most extended region of the chromatograms are coherent with what was observed in the reference chromatogram (cf. Table III). It can be observed that the agreement is not perfect, but this
is most likely determined by the complexity of the model which, in this case, must simultaneously account for many independent parameters: peak width, number of components, peak width variability, and interdistance model. The linearized chromatogram gave a wider peak width spread in one case (chamomile) and a smaller one in the other (naphtha). This finding is not in perfect agreement with what the linearization procedure effectively produces, especially in the most extended Ln19 case. However, this disagreement may derive from the fact that slight peak tailing is simultaneously present and both effects cannot be perfectly accounted for. The SC number evaluation m is instead in good agreement with what was obtained from methods 2 and 3 (within 5% for chamomile and 20% for naphtha). Even in this case, it can be concluded that the slight experimentally observed peak width variations (±10%) are negligible for the present approach. Let us now briefly discuss the chromatographic significance of the complex picture attained by the PS-ACVF analysis: (1) the chamomile chromatogram is a typical Poissonian multicomponent chromatogram, with a peak height distribution close to exponential; the naphtha chromatogram is on the whole pseudo-Poissonian, but locally more ordered than the chamomile case; (2) the saturation factor value is acceptable (α = 0.36) for chamomile and worse for naphtha (α = 0.51); (3) the mean separation extent values, γ, of the naphtha and chamomile chromatograms are, instead, comparable (≈0.4) because the former is more ordered than the latter; (4) the statistics of peak overlapping presented by Davis and Giddings (ref 5) can be applied to infer probabilities of finding singlet, doublet, or higher multiplet peaks only for the chamomile but not for naphtha, because only the IM of the former is of type E, whereas that of the latter is much more similar to the Γ type. Evidence has been attained that the overlapping statistics are profoundly affected by the IM type (ref 29). This topic will be presented separately.
CONCLUSIONS
The present experimental study shows that the PS-ACVF methods allow for a direct, coherent determination of both the retention pattern and SC peak width parameters from an experimental multicomponent chromatogram. A good conventional setup of the chromatographic separation, assuring a peak width constancy within 10-15% and a peak tailing factor τ/σ_G lower than ≈0.6, is enough to furnish digitized chromatograms suitable for the present procedure. Fundamental performance attributes of a chromatographic separation of a multicomponent mixture (N_c, α, and γ) can be estimated together with the type of distribution of the interdistances between subsequent peaks. Moreover, the ordered structure present in a multicomponent chromatogram can be easily singled out and analyzed in the EACVF plot. In this way, not only can the structure-retention relationships be detected in the multicomponent mixture but also additional, independent estimates of the SC peak width can be obtained from deterministic-type EACF peaks. The peak tailing factor τ/σ_G and the extent of peak width variation can also be estimated by using more complex PS models, thus making the present approach a complete investigation method of the multicomponent separation.
ACKNOWLEDGMENT

This work was made possible by the financial support of the Italian Ministry of the University and Scientific Research (MURST, 60% and 40%).

(29) Pietrogrande, M. C.; Felinger, A.; Dondi, F. Analytical Chemistry National Symposium, Pavia (I), Sept 22-25, 1992.
(30) Dondi, F.; Ramelli, M. J. Phys. Chem. 1986, 90, 1885.
APPENDIX

Meaning of the Different SC Peak Width Estimates. The present procedure allows one to obtain peak width estimates from four different sources: (1) directly from isolated SC chromatographic peaks by using the general EC series fitting procedure; (2) from deterministic peaks of the EACVF plots by using the same general EC series fitting procedure; (3) from the beginning part of the EACF plot (single peak + IM part) by using pertinent TACF model functions; (4) from the EPS plot by using pertinent theoretical power spectrum model functions. In order to avoid confusion among these different estimates, some explanations are necessary. The SC peak width is defined as the peak standard deviation, i.e.:

$$\sigma = \left( \int Y(t)\,(t - m_p)^2 \, dt \right)^{1/2} \qquad (20)$$

where

$$m_p = \int Y(t)\, t \, dt \qquad (21)$$

is the mean. By using the general EC series fitting procedure, from a given peak (no matter whether it is a chromatographic peak or an EACF peak) an unbiased estimate of the peak standard deviation together with the other statistical peak parameters can be obtained as long as certain conditions are respected.18,21 When applied to well-separated SC peaks, the EC procedure will give an estimate of the SC peak width, $\sigma$. When applied to D-type EACF peaks, an unbiased estimate of the standard deviation of these peaks, denoted as $\sigma_{EACF}$, will be obtained. The following relationship will be assumed to hold true:

$$\sigma_{EACF} = \sigma\,(2)^{1/2} \qquad (22)$$

since eq 2 of Table I states the following relationship

$$\sigma_{TACF} = \sigma_g\,(2)^{1/2} \qquad (23)$$

between the peak width of a D-type TACF peak and the SC peak width under the hypothesis of G-type shape. According to eq 22, an additional and independent estimate of the SC peak width is thus possible:

$$\sigma' = \sigma_{EACF}/(2)^{1/2} \qquad (24)$$

which will be strictly correct only under the hypothesis of G-type SC peak shape. However, even without exact treatment of the problem, eq 24 will prove to be correct even under moderate peak skewing conditions. This check is made possible by processing the reference chromatogram, which contains only well-separated peaks.

The other sources (3 and 4) have a common feature: they make use of exact theoretical models of either the ACF or the PS. In the great majority of cases (see Table II), these methods assume G-type shape for the SC peaks, and in these models the peak standard deviation is reported as $\sigma_g$ to recall this basic hypothesis. When the G-based TACF or the TPS is fitted to an experimental case, an estimate of the SC peak width, reported as $\sigma_g$ and analogous to $\sigma$, is obtained. This can be argued by the following considerations. The G-type ACF or PS model can be considered as the zeroth expansion of a more general model, as the Gaussian function is the zeroth expansion of the general EC series expansion of the peak function.30 When we use these TACF or TPS models in nonlinear least-squares fitting, an unbiased estimate of $\sigma$ is obtained,

$$\sigma \approx \sigma_g \qquad (25)$$

just as an unbiased peak width estimate is obtained with the EC series zeroth order expansion. In only one case has a different hypothesis (the EMG one) been used (see method 3 in Table II). In this case, separate evaluation of the Gaussian part, $\sigma_G$, together with the decay constant, $\tau$, is obtained from the fitting, and the following relationship holds true:

$$\sigma^2 = \sigma_G^2 + \tau^2 \qquad (26)$$

which is thus employed to derive the SC peak width estimate. Note the different use of $\sigma_G$ and $\sigma_g$.

GLOSSARY

A: SC peak area
$A_T$: total area of the multicomponent chromatogram
ACF: autocorrelation function
ACVF: autocovariance function
a.: mean of the SC peaks
$a_b$: mean of the band areas
$a_{b,i}$: ith band area detected above a given threshold level
$a_h$: mean value of SC peak height
$a_M$: mean value of peak maxima in the multicomponent chromatogram
autocovariance function value at time t
$\hat{C}_i$: numerically computed EACVF at point i, eq 9
CV %: % variation coefficient, eq 17
C: chamomile (chromatogram)
D: deterministic IM
E: exponential (Poissonian) IM
EACF: experimental autocorrelation function, eq 11
EACVF: experimental autocovariance function, eq 9
EC: Edgeworth-Cramér
EMG: exponentially modified Gaussian function
EPS: experimental power spectrum, eq 12
$f(t)$: peak interdistance distribution
$F_n(\omega)$: numerically computed EPS at frequency $\omega$, eq 12
FFT: fast Fourier transform
G: Gaussian peak shape
GC: gas chromatography
$h_M(i)$: (ith) peak maximum in the chromatogram
$k$ ($k_{max}$): (maximum) EC series expansion order
IM: interdistance model, i.e., distribution of the interdistances between subsequent SC peaks in the chromatogram
L: linearized (chromatogram)
M: maximum time point in EACVF computation, eq 9
$m_p$: peak mean, eq 21
m: number of SCs in the chromatogram above a given threshold level
N: normal IM
$N_p$: number of points
NL: nonlinearized (original chromatogram)
$N_c$: peak capacity computed at a given resolution, eq 6
n: naphtha (chromatogram)
PS: power spectrum
p: parameter of the $\Gamma$ IM
p: number of peaks computed at a given $R_s$ value
$p_b$: number of detected bands above a given threshold level
$p_M$: number of detected maxima above a given threshold level
relative interdistance dispersion, eq 17
$R_s$: chromatographic resolution, eq 2
SC: single component
peak width ratio ($\sigma_2/\sigma_1$)
skewness
total squared deviation between EACVF and TACVF, eq 19
mean value of interdistance between subsequent SC peaks (above a given threshold level)
t: time axis
TACVF: theoretically computed autocovariance function
TPS: theoretical power spectrum
uniform IM
numerically computed window function at point i
time span in the chromatogram
SC peak interdistance at a given $R_s$
mean value of the chromatographic response, eq 10
(theoretically computed) chromatographic response at point i
$\alpha$: saturation factor ($m/N_c$), eq 4
mean interdistance between subsequent linear hydrocarbons
Dirac function
$\Gamma$: gamma IM
gamma function
$\gamma$: separation extent ($p/m$), eq 1
numerically computed EACF at point i, eq 11
$\sigma$: standard deviation of the SC peak shape, eq 20
$\sigma_{EACF}$: measured width of a D-type SC peak in the EACF plot
$\sigma_G$: Gaussian component of the peak width standard deviation under the EMG peak shape hypothesis
standard deviation of the peak interdistances
standard deviation of peak maximum distribution
$\sigma_{TACF}$: width of a D-type SC peak in the TACF plot
standard deviation of the SC area distribution
average value of the peak width in the chromatogram [$(\sigma_1 + \sigma_2)/2$]
standard deviation of band areas
$\sigma_g$: SC peak standard deviation in all EACF and PS methods under the Gaussian peak shape hypothesis
standard deviation of SC peak height distribution
$\sigma_1$, $\sigma_2$: lower and upper SC peak width
$\tau$: time constant
$\tau/\sigma_G$: peak tailing factor
$\omega$: frequency
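The peak width relationships of the Appendix (eqs 20-24 and 26) can be checked numerically with a minimal sketch. It is not the EC series fitting procedure used in this work: the standard deviation is computed directly by moments on synthetic, noise-free peaks, the response is normalized to unit area (an assumption about eqs 20 and 21), and the autocorrelation is the raw lagged product sum rather than the mean-subtracted EACVF of eq 9. All function names, grid settings, and peak parameters are hypothetical.

```python
import math


def gaussian(t, center, sigma):
    """Unit-area Gaussian peak."""
    return (math.exp(-0.5 * ((t - center) / sigma) ** 2)
            / (sigma * math.sqrt(2.0 * math.pi)))


def emg(t, center, sigma_g, tau):
    """Unit-area exponentially modified Gaussian (EMG) peak."""
    z = (sigma_g / tau - (t - center) / sigma_g) / math.sqrt(2.0)
    return (math.exp(0.5 * (sigma_g / tau) ** 2 - (t - center) / tau)
            * math.erfc(z) / (2.0 * tau))


def moment_sigma(x, y):
    """Standard deviation by moments (eqs 20 and 21), y normalized by its area."""
    area = sum(y)
    mean = sum(xi * yi for xi, yi in zip(x, y)) / area
    var = sum(yi * (xi - mean) ** 2 for xi, yi in zip(x, y)) / area
    return math.sqrt(var)


def autocorrelation(y, max_lag):
    """Raw (not mean-subtracted) lagged product sums for lags 0..max_lag."""
    n = len(y)
    return [sum(y[j] * y[j + k] for j in range(n - k))
            for k in range(max_lag + 1)]


dt = 0.05                           # hypothetical sampling interval
t = [i * dt for i in range(1201)]   # time axis from 0 to 60

# Gaussian SC peak: width of its autocorrelation peak (eqs 22-24).
sigma = 1.0
y_gauss = [gaussian(ti, 20.0, sigma) for ti in t]
max_lag = 200
acf = autocorrelation(y_gauss, max_lag)
acf_full = acf[:0:-1] + acf         # mirror to negative lags (symmetric function)
lags = [k * dt for k in range(-max_lag, max_lag + 1)]
sigma_acf = moment_sigma(lags, acf_full)
sigma_prime = sigma_acf / math.sqrt(2.0)   # eq 24

# EMG peak: total width from the Gaussian part and the time constant (eq 26).
sigma_g, tau = 1.0, 0.6
y_emg = [emg(ti, 20.0, sigma_g, tau) for ti in t]
sigma_emg = moment_sigma(t, y_emg)

print(f"sigma_acf / sigma          = {sigma_acf / sigma:.4f}")
print(f"sigma' from eq 24          = {sigma_prime:.4f}")
print(f"EMG sigma by moments       = {sigma_emg:.4f}")
print(f"sqrt(sigma_G^2 + tau^2)    = {math.sqrt(sigma_g**2 + tau**2):.4f}")
```

Under these assumptions the autocorrelation peak width should come out close to $(2)^{1/2}\sigma$, so that eq 24 returns the SC peak width, and the EMG width by moments should agree with eq 26. The chosen ratio $\tau/\sigma_G = 0.6$ corresponds to the upper tailing limit quoted in the Conclusions.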
RECEIVED for review December 29, 1992. Accepted May 6, 1993.