Anal. Chem. 1990, 62,1854-1860
Fourier Analysis of Multicomponent Chromatograms. Numerical Evaluation of Statistical Parameters

Attila Felinger
Department of Analytical Chemistry, University of Chemical Engineering, P.O. Box 158, H-8201 Veszprém, Hungary

Luisa Pasti, Pierluigi Reschiglian, and Francesco Dondi*
Department of Chemistry, University of Ferrara, I-44100 Ferrara, Italy
A procedure based on the power spectrum (PS) model of a multicomponent chromatogram is introduced by which the number m of detectable components (or single-component peaks) and the parameters of the single-component peak, such as standard deviation and asymmetry factor, can be evaluated. In essence, when fitted to theoretical models, the experimental PS, which expresses the dependence of the chromatographic response variance on the time distance, provides the information necessary to accept or reject the model and to give the necessary parameter estimations. The procedure is tested by using computer-generated multicomponent chromatograms with Poissonian retention time distribution and random and uncorrelated peak heights, in which density, asymmetry, and height distribution are widely varied. How to obtain an unbiased numerical determination of the PS by windowing is also discussed. It is shown that unbiased parameter estimations are obtained, the only limitation of the procedure being the approximation made in the evaluation of the single-component peak height dispersion. An example is given of how a retention time distribution other than the Poissonian can be detected.
INTRODUCTION

In the preceding paper (1) a new method was suggested for determining statistical properties of multicomponent chromatograms based on the analysis of chromatographic response covariance properties as a function of the time distance. The method consists of fitting the power spectrum (PS) or autocovariance function (ACVF) of an experimental multicomponent chromatogram to its theoretical model. In fact, Fourier analysis was successfully applied to describe stationary uncorrelated multicomponent chromatogram models, that is, chromatograms whose detectable components were taken to have the same form of elution peak but different heights and whose retention time sequence is not correlated to peak heights but has stationary properties along the time axis; i.e., the retention interdistances between subsequent components have stationary statistical properties. These theoretical models are likely to fit real cases of complex mixture chromatographic separations obtained under conditions of temperature-programmed GC and gradient elution HPLC (2-11). The most general expression of the PS (see Glossary) for a stationary uncorrelated multicomponent chromatogram is (1)
where F is the power spectrum as a function of the frequency ω; g(ω) and φ(ω) are respectively the Fourier transform of the unitary common peak shape function and the characteristic function of the interdistances between subsequent component peaks; T is the mean value of these interdistances; a_h and σ_h are respectively the mean peak height and its standard deviation; and δ(ω) is the Dirac function. The latter vanishes if the origin of the Y response axis of the chromatogram is shifted to its mean value computed over the considered time interval (1). Re means the real part of the argument. Equation 1 is considerably simplified, becoming the basis for chromatogram parameter determination
under the hypothesis of Poissonian retention time distribution and exponentially modified Gaussian (EMGF) peak shape function (1). A_T is the total area of the chromatogram over a given range X of the time axis. σ and τ are the parameters of the EMGF peak shape: σ is the standard deviation of the Gaussian part and τ the time constant. The same eq 2 represents the case of Gaussian peak shape by simply putting τ = 0. The autocovariance function (ACVF) is obtained by applying the Wiener-Khinchin theorem (1, 11)
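$$C(t) = \frac{A_T^2\,(\sigma_h^2/a_h^2 + 1)}{4\tau m X}\,\exp\!\left(\frac{\sigma^2}{\tau^2}\right)\left[\exp\!\left(-\frac{t}{\tau}\right)\operatorname{erfc}\!\left(\frac{\sigma}{\tau}-\frac{t}{2\sigma}\right)+\exp\!\left(\frac{t}{\tau}\right)\operatorname{erfc}\!\left(\frac{\sigma}{\tau}+\frac{t}{2\sigma}\right)\right] \qquad (3)$$

which simplifies considerably for Gaussian peak shapes

$$C(t) = \frac{A_T^2\,(\sigma_h^2/a_h^2 + 1)}{2\sigma\pi^{1/2} m X}\,\exp\!\left(-\frac{t^2}{4\sigma^2}\right) \qquad (4)$$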
The method based on eqs 2-4 can be an alternative to the Davis-Giddings (DG) method (2-6) in determining the component number m of the separated mixture. Moreover σ, and thus the peak capacity N_c of the separation, can also be determined. In fact, one can see that m, σ, and τ can be obtained by nonlinear regression if F(ω) or C(t) is determined, if A_T is measured over a given chromatographic space X, and if the σ_h/a_h value is known. If, instead, nothing is known about the latter quantity, it can be evaluated through the peak maxima of the chromatogram by computing σ_M/a_M, the dispersion ratio of the maxima. It must be observed that whereas experimental determination of X, A_T, and σ_M/a_M is trivial, this is not so for PS or ACVF. In fact, eqs 1-4 refer to an infinitely long multicomponent chromatogram, the real chromatogram being instead only a limited sample of it (1). In this paper the relevance of the above-mentioned approximation and the problem of the unbiased numerical determination of PS from a digitized chromatogram will be studied. The overall procedure is checked with respect to
0003-2700/90/0362-1854$02.50/0 © 1990 American Chemical Society
ANALYTICAL CHEMISTRY, VOL. 62, NO. 17, SEPTEMBER 1, 1990
In eq 7, M is the truncation point in the ACVF. The task of w(k), the lag window, is to smooth the ACVF out to zero at around M. Note that since the PS too is an even function of frequency, it is only necessary to calculate it over a positive frequency range. However, to preserve the Fourier transform relationship between PS and ACVF, it was necessary to double the power associated with each frequency ω > 0 (14). Instead of the ACVF, its normalized value with respect to the zero-lag value is often used

$$\rho_{xx}(k) = C_{xx}(k)/C_{xx}(0) \qquad (8)$$
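Figure 1. Numerical procedures for multicomponent chromatogram parameter computation. [Flow chart: method I, chromatogram to ACVF (eq 5) to PS (eq 7) to fitting; method II, chromatogram to FFT to PS to ACVF to PS to fitting; method III, chromatogram to ACVF to fitting (eqs 3 and 4).]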
simulated cases even in the presence of those error sources usually found in practice, such as noise and bias in base-line level determination. The main part of the work will be devoted to the handling of uncorrelated Poissonian chromatograms, one of the most interesting models (1, 2). However, because it is well-known that the quality of an attained fit can suggest the acceptance or rejection of the model employed in the fitting, the ability of the method to reveal the presence of a non-Poissonian retention time distribution will also be verified. Finally, it must be remarked that the two approaches, respectively based on PS or ACVF computations, are perfectly equivalent since they give the same information. The choice between them will thus be only a matter of computational ease. One can see that the PS expression is much simpler, and therefore the main part of this work will be focused on it.

POWER SPECTRUM AND AUTOCORRELATION FUNCTION DETERMINATION OF EXPERIMENTAL MULTICOMPONENT CHROMATOGRAMS

The problem is just an example of application of standard numerical techniques of time series spectral analysis (12-14). There are three different approaches (see Figure 1), which, however, result in similar estimated parameters. The first two are based on PS and the third on ACVF. The chromatogram [Y(t), 0 < t < X] is supposed to be sampled at N equally spaced points [Y_j, j = 1, ..., N], at the discrete time points t_j = (j - 1)Δt, with Δt = X/(N - 1). For the sake of simplicity it is supposed that Δt = 1.

Method I. The simplest method of computation is to calculate the ACVF of the chromatogram centered around the mean (14)

$$C_{xx}(k) = \frac{1}{N}\sum_{j=1}^{N-k}(Y_j-\bar{Y})(Y_{j+k}-\bar{Y}), \qquad k = 0, 1, 2, \ldots, M-1 \qquad (5)$$

where the mean $\bar{Y}$ is

$$\bar{Y} = \frac{1}{N}\sum_{j=1}^{N} Y_j \qquad (6)$$
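The ACVF and ACF computation of method I can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Basic program: the function name `acvf` and the test series are hypothetical, and the 1/N normalization is assumed here, following the usual biased estimator for mean-centered series.

```python
def acvf(y, max_lag):
    """Biased autocovariance estimate C_xx(k), k = 0 .. max_lag - 1.

    The series is centered on its mean first; the 1/N normalization
    is an assumption of this sketch.
    """
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]  # mean-centered response
    return [sum(d[j] * d[j + k] for j in range(n - k)) / n
            for k in range(max_lag)]


# Hypothetical eight-point "chromatogram"
y = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 2.0, 0.0]
c = acvf(y, 4)
rho = [ck / c[0] for ck in c]  # normalized ACVF, i.e. the ACF
```

The lag-zero value `c[0]` is the variance of the response, so `rho[0]` is 1 by construction.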
In practice the calculation is stopped at a given k value, since only a short correlation range is required. The PS is obtained from the ACVF by Fourier transform, on the basis of the Wiener-Khinchin theorem (1, 14). Since the ACVF is an even function, only the real part (cosine) is considered (12, 14)
The normalized ACVF is called the autocorrelation function (ACF). If Δt ≠ 1, the PS can be recovered from eq 7 by multiplying by Δt and plotting it against ω Δt instead of ω (14).

Method II. The estimation of the PS can be performed directly by computing the Fourier transform of the time series. In practice the fast Fourier transform (FFT) algorithm (13, 15) is applied to the chromatogram and the real part (Re) and the imaginary part (Im) of the Fourier transform are obtained. The PS of the chromatogram is computed (13)
$$F(\omega) = [\mathrm{Re}(\omega)^2 + \mathrm{Im}(\omega)^2] \qquad (9)$$
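As an illustration of eq 9, the raw (unsmoothed) PS can be computed from the real and imaginary parts of the discrete Fourier transform. The sketch below uses a direct O(N²) DFT from the standard library in place of an FFT routine, purely to stay self-contained; the function name `raw_power_spectrum` and the 1/N scaling are assumptions of this example, not part of the original procedure.

```python
import cmath
import math


def raw_power_spectrum(y):
    """Raw PS, Re^2 + Im^2, at the frequencies 2*pi*q/N, q = 0 .. N//2.

    A direct DFT stands in for the FFT; the 1/N scaling is an
    assumption of this sketch.
    """
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]  # remove the mean, so the zero-frequency term vanishes
    spec = []
    for q in range(n // 2 + 1):
        s = sum(d[j] * cmath.exp(-2j * math.pi * q * j / n) for j in range(n))
        spec.append((s.real ** 2 + s.imag ** 2) / n)
    return spec


y = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 2.0, 0.0]
spec = raw_power_spectrum(y)
```

With this scaling the spectrum summed over all N frequencies equals the sum of squared deviations of the series (Parseval's relation), which gives a quick sanity check on any implementation.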
Unfortunately, this spectrum cannot be used for parameter estimation as it is the Fourier transform of the total ACVF. The shape of an ACVF calculated from a stochastic sequence is deterministic only for short-range correlations. As t increases, random oscillations can be found in the ACVF (14). Therefore, the total ACVF is obtained from the total PS by Fourier inversion. Using an appropriate window, the long-range correlation in this total ACVF can be cut off and after another transform a smooth PS is obtained (see Figure 1). The cutting off is equivalent to calculating an average ACVF of a great number of random chromatograms having the same statistical parameters (14). The first procedure of PS estimation seems simpler, but it must be noted that when the digitized chromatogram contains a great number of points this second algorithm can be faster.

Method III. The statistical properties of the chromatogram can also be estimated by fitting the experimental ACVF to the theoretical ACVF (see eqs 3 and 4). In this case the calculation of the smooth PS is not needed, which can reduce computation time. However, since the ACVF expression is often complex (see, e.g., eq 3 for the EMGF case) or difficult to derive (1), methods I and II, which are based on the PS expression, appear to be of more general applicability.

Windowing. The lag window w(k) (see eq 7) plays an important role in estimating the PS, since its shape influences the PS shape. To estimate peak shape parameters, minimal PS distortion is desired. There is no general solution to this problem (14) and only the case of Poissonian retention time distribution is considered here. The basic property used to build up the windowing procedure is that the Poissonian ACVF is a monotonically decreasing function (see eqs 3 and 4).
In order to minimize PS distortion, caused either by the random fluctuations of the ACVF or by an inappropriate window, the following algorithm was chosen to filter out the random fluctuations in the numerically computed ACVF. When the ACVF is calculated, each value must be checked to see whether it is above or below zero. When the ACVF proves to decrease continuously, it is truncated at the last positive value (M) and the rectangular window is applied
$$F_{xx}(\omega) = 2\left[C_{xx}(0) + 2\sum_{k=1}^{M-1} C_{xx}(k)\,w(k)\cos(\omega k)\right], \qquad 0 \le \omega \le \pi \qquad (7)$$
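The smoothed PS of eq 7 is a short cosine sum over the windowed ACVF. The following sketch assumes Δt = 1 and reads eq 7 as F(ω) = 2[C(0) + 2 Σ C(k) w(k) cos(ωk)], with the sum running over k = 1, ..., M - 1; the function name `smoothed_ps` and the numerical values are hypothetical.

```python
import math


def smoothed_ps(c, window, omegas):
    """Smoothed PS from a truncated ACVF.

    c and window hold C_xx(k) and w(k) for k = 0 .. M - 1;
    omegas are angular frequencies in the range 0 .. pi (delta t = 1).
    """
    m = len(c)
    return [2.0 * (c[0] + 2.0 * sum(c[k] * window[k] * math.cos(om * k)
                                    for k in range(1, m)))
            for om in omegas]


# Hypothetical three-point ACVF with a rectangular window
c = [1.0, 0.5, 0.25]
w = [1.0, 1.0, 1.0]
f = smoothed_ps(c, w, [0.0, math.pi])
```

The factor 2 in front of the bracket is the doubling of power for positive frequencies; dropping it would give the two-sided spectrum instead.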
With the rectangular window, all the values for k ≤ M are unchanged and, above M, the ACVF is equal to zero. When, after having reached its minimum, which is not below zero, the ACVF
begins to increase, it is truncated at the first point after the minimum (M). The ACVF is unchanged in the interval 0 ≤ k ≤ M/2 and it is multiplied by the Tukey window (14) for M/2 < k ≤ M. The ACVF is equal to zero above M
$$w(k) = \begin{cases} 1 & 0 \le k \le M/2 \\ 0.5 + 0.5\cos(\ldots) & M/2 < k \le M \\ 0 & k > M \end{cases}$$
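The truncation rule described above can be sketched as follows. This is a hedged reading of the text, not the authors' code: the function name `choose_window` is hypothetical, and the cosine taper used between M/2 and M (falling from 1 at M/2 to 0 at M) is an assumption of this sketch, since the exact argument of the Tukey window is not given here.

```python
import math


def choose_window(c):
    """Return (M, w) for an ACVF list c, following the truncation rule.

    - ACVF decreases until it reaches zero or below: truncate at the last
      positive value and apply a rectangular window.
    - ACVF starts to rise while still positive: truncate at the first point
      after the minimum and taper the second half with a cosine
      (the taper form is an assumption of this sketch).
    """
    n = len(c)
    for k in range(1, n):
        if c[k] <= 0.0:
            m = k - 1  # last positive value
            return m, [1.0 if i <= m else 0.0 for i in range(n)]
        if c[k] > c[k - 1]:
            m = k  # first point after the minimum
            w = []
            for i in range(n):
                if i <= m / 2:
                    w.append(1.0)
                elif i <= m:
                    w.append(0.5 + 0.5 * math.cos(math.pi * (2 * i - m) / m))
                else:
                    w.append(0.0)
            return m, w
    return n - 1, [1.0] * n  # never crossed zero nor rose: keep everything


m_rect, w_rect = choose_window([4.0, 2.0, 1.0, -0.5, 0.3])
m_taper, w_taper = choose_window([4.0, 2.0, 1.0, 1.5, 0.2, 0.0, 0.0])
```

In the first call the ACVF crosses zero at lag 3, so the window is rectangular up to lag 2; in the second the ACVF rises at lag 3, so the tapered window is selected.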
The Presence of Noise and the Base-Line Determination. The presence of noise in the chromatogram can easily be accounted for within the PS method approach. The example considered here is the simplest case, white noise. Its effect can, in fact, easily be eliminated since white noise is a time series without any correlation (14). For this reason, when the ACVF of a noisy random chromatogram is calculated, the effect of the white noise appears only at around t = 0; that is, the white noise results in an increment of the variance. On this basis the increment in variance can be cut off by extrapolating the beginning of the ACVF to t = 0. It can be noted that the possibility of noise filtering makes the PS method superior to the DG method, in which the resolution factor R_s discriminating between peaks and components is a function of noise (4). The base-line correction of the chromatogram must be carefully established. In fact, the "experimental" chromatogram is to be considered as a discrete part of an infinitely long time series (1), whereas Fourier algorithms suppose the time signal to be periodic and transform exactly one period. For this reason the base-line values at the end and at the beginning of the "experimental" chromatogram must coincide. Otherwise "aliasing" appears (16). Nonetheless, small systematic errors in the base-line level determination are not critical, as will be demonstrated below.

COMPUTATIONS

All the programs were written in Basic and run on an IBM PS/2 Model 50 computer. Simulated chromatograms were generated according to refs 4-8. In order to have enough points in the ACVF, a sampling frequency of 4/σ and 6/σ, respectively for nonnoisy and noisy peaks, was used. The random character of the generated sequences was checked by the chi square test (17).
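A simulated Poissonian chromatogram of the kind described can be sketched as below. This is only an illustration under stated assumptions, not the generator of refs 4-8: the function name `simulate_chromatogram` is hypothetical, peaks are taken as Gaussian (τ = 0), heights are drawn uniformly between 1/HR and 1, and retention times are drawn uniformly over [0, X], which is how a Poisson process behaves once the number of events m is fixed.

```python
import math
import random


def simulate_chromatogram(m, x_len, sigma, dt, hr=50.0, seed=1):
    """Sum of m Gaussian peaks with random retention times and heights.

    Assumptions of this sketch: uniform retention times on [0, x_len],
    heights uniform on [1/hr, 1], each peak evaluated within +/- 6 sigma.
    """
    rng = random.Random(seed)
    times = [rng.uniform(0.0, x_len) for _ in range(m)]
    heights = [rng.uniform(1.0 / hr, 1.0) for _ in range(m)]
    n = int(x_len / dt) + 1
    y = [0.0] * n
    for t0, h in zip(times, heights):
        lo = max(0, int((t0 - 6.0 * sigma) / dt))
        hi = min(n - 1, int((t0 + 6.0 * sigma) / dt) + 1)
        for j in range(lo, hi + 1):
            y[j] += h * math.exp(-0.5 * ((j * dt - t0) / sigma) ** 2)
    return times, heights, y


# 25 components, sampled at 4 points per sigma (dt = sigma / 4)
times, heights, y = simulate_chromatogram(m=25, x_len=400.0, sigma=2.0, dt=0.5)
area = 0.5 * sum(y)  # dt * sum(y)
expected_area = math.sqrt(2.0 * math.pi) * 2.0 * sum(heights)
```

The total area is close to the sum of the single-peak areas, slightly reduced for peaks cut off at the ends of the time axis.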
ACVF, PS, and ACF were computed according to eqs 5, 7, and 8 and fitted to theoretical models (eqs 2 and 4) by using the Marquardt nonlinear least-squares procedure with numerically computed derivatives of the minimized function (15). The white noise was generated by using a STATGRAPHICS routine (18). The peak sensing algorithm for maximum identification was based on the comparison of three successive points in the case of nonnoisy peaks. For noisy peaks a maximum was identified in a sequence of seven points in which the first three were increasing and the last three decreasing. The fast Fourier transform routine of ref 14 was employed. Using different random sequences for each parameter combination, 25 runs were performed. Tables I-IV report the mean values and the standard deviations of the fitting model parameters computed over these repeated runs. All the data reported in Tables I-IV refer to a resolution of R_s = 0.5 (2). The signal-to-noise (S/N) ratio reported in Table III is computed by dividing the maximum peak by 4 times the noise standard deviation.

RESULTS AND DISCUSSION

Windowing Effect. Figure 2a reports a typical simulated case of an uncorrelated Poissonian chromatogram. The windowing effect in computing the PS of this limited-in-time chromatogram (see eqs 7, 10, and 11) is better understood by referring to Figures 3 and 4. In Figure 3 the problem of random fluctuations in an experimentally computed ACF is
Figure 2. (a) A simulated example of a Poissonian uncorrelated chromatogram (m = 25, α = 0.5 (R_s = 0.5), τ = 0). (b) A simulated example of a chromatogram with peak interdistances uniformly distributed between 0 and 4σ (m = 25, α = 0.5 (R_s = 0.5), τ = 0).
Figure 3. Curve 1 (-): ACF (ρ_xx) of the chromatogram reported in Figure 2a. Curve 2 (+++): average of 25 ACFs of Poissonian chromatograms having the same statistical parameters. Point "M" is the cut-off point at t = 4σ.
shown. In this figure the deterministic part of the ACF, where the single ACF coincides with the average of 25 ACFs computed over 25 repeated chromatograms all having the same parameters but obtained by using different random sequences, can be distinguished from the nondeterministic (=random) portion. The cut-off point (point "M" in Figure 3) is at t = 4σ, where the theoretical ACVF expression is expected to go practically to zero (see eq 4). If only this part is used, by means of the above-described windowing function, to build up the PS, the smoothed function of Figure 4, having the shape expected according to eq 2, is obtained. On the contrary, if the total ACVF is used, the noisy PS of Figure 4 is obtained.

Estimating σ_h/a_h. Since no "a priori" information on component peak height distribution is available when an
Table I. Comparison of Results Obtained by PS Method and DG Method

                                           PS method   DG method
set   m    HDF(a)  σ/τ(c)  HR(b)  α        m_e         m_i        m_s         p         σ_M²/a_M²  σ_h²/a_h²
A     200  ran     ∞       50     0.167    203 ± 10    200 ± 14   203 ± 16    160 ± 4   0.317      0.333
B     200  ran     ∞       150    0.167    204 ± 9     200 ± 17   203 ± 19    162 ± 5   0.339      0.333
C     200  ran     ∞       50     0.333    207 ± 15    197 ± 18   214 ± 24    134 ± 5   0.340      0.333
D     200  ran     ∞       150    0.333    205 ± 9     191 ± 18   204 ± 19    134 ± 4   0.326      0.333
E     200  ran     ∞       50     0.500    202 ± 12    190 ± 23   230 ± 43    111 ± 4   0.332      0.333
F     200  ran     ∞       150    0.500    205 ± 18    190 ± 29   225 ± 42    110 ± 5   0.334      0.333
G     200  ran     ∞       50     0.667    217 ± 17    179 ± 29   254 ± 58    97 ± 4    0.299      0.333
H     200  ran     ∞       150    0.667    210 ± 22    188 ± 19   254 ± 41    96 ± 5    0.312      0.333
I     200  ran     0.35    50     0.167    196 ± 19    188 ± 15   195 ± 16    148 ± 4   0.307      0.333
J     200  ran     0.35    150    0.167    198 ± 24    192 ± 19   201 ± 23    147 ± 5   0.315      0.333
K     200  ran     0.35    50     0.333    211 ± 33    178 ± 23   206 ± 26    119 ± 5   0.315      0.333
L     200  ran     0.35    150    0.333    202 ± 26    175 ± 13   200 ± 17    115 ± 5   0.310      0.333
M     200  ran     0.35    50     0.500    211 ± 45    155 ± 20   202 ± 50    95 ± 6    0.298      0.333
N     200  ran     0.35    150    0.500    216 ± 39    129 ± 21   158 ± 29    94 ± 5    0.291      0.333
O     200  ran     0.20    50     0.167    213 ± 34    186 ± 12   195 ± 14    146 ± 5   0.293      0.333
P     200  ran     0.20    150    0.167    200 ± 29    188 ± 18   196 ± 21    143 ± 6   0.317      0.333
Q     200  ran     0.20    50     0.333    209 ± 57    166 ± 8    189 ± 14    112 ± 5   0.282      0.333
R     200  ran     0.20    150    0.333    207 ± 40    169 ± 14   195 ± 25    113 ± 5   0.240      0.333
S     200  ran     0.20    50     0.500    215 ± 62    138 ± 22   176 ± 49    93 ± 4    0.256      0.333
T     200  ran     0.20    150    0.500    210 ± 48    118 ± 19   149 ± 38    92 ± 4    0.250      0.333
U     200  exp     …       ∞      0.167    192 ± 10    198 ± 14   201 ± 15    160 ± 6   0.914      1
V     200  con     …       1      0.167    217 ± 10    187 ± 13   186 ± 14    167 ± 5   0.080      0
W     200  exp     …       ∞      0.167    177 ± 24    183 ± 30   202 ± 55    139 ± 5   0.676      1
X     200  con     …       1      0.167    237 ± 29    191 ± 15   197 ± 16    150 ± 5   0.127      0
Y     200  exp     …       ∞      0.333    184 ± 12    199 ± 23   226 ± 46    130 ± 6   0.752      1
Z     200  con     …       1      0.333    235 ± 18    187 ± 19   194 ± 22    142 ± 6   0.134      0
A1    200  exp     …       ∞      0.333    179 ± 28    148 ± 18   169 ± 22    112 ± 5   0.662      1
B1    200  con     …       1      0.333    244 ± 35    179 ± 19   199 ± 27    123 ± 5   0.177      0
C1    200  exp     …       ∞      0.500    176 ± 15    179 ± 23   213 ± 31    109 ± 4   0.570      1
D1    200  con     …       1      0.500    246 ± 30    172 ± 29   183 ± 44    119 ± 7   0.179      0
E1    200  exp     …       ∞      0.500    167 ± 38    119 ± 14   139 ± 15    85 ± 5    0.479      1
F1    200  con     …       1      0.500    283 ± 38    149 ± 13   179 ± 22    101 ± 6   0.157      0
G1    200  exp     …       ∞      0.667    171 ± 17    152 ± 7    191 ± 9     93 ± 5    0.667      1
H1    200  con     …       1      0.667    252 ± 31    202 ± 24   273 ± 114   101 ± 6   0.204      0
I1    100  exp     …       ∞      0.333    92 ± 8      101 ± 7    112 ± 8     65 ± 4    0.769      1
J1    100  exp     …       ∞      0.500    88 ± 12     84 ± 13    102 ± 18    41 ± 4    0.650      1
K1    100  exp     …       ∞      0.667    86 ± 11     68 ± 16    90 ± 24     47 ± 2    0.608      1
L1    300  exp     …       ∞      0.333    266 ± 15    282 ± 21   311 ± 32    191 ± 8   0.827      1
M1    300  exp     …       ∞      0.500    261 ± 15    262 ± 13   311 ± 20    161 ± 6   0.737      1
N1    300  exp     …       ∞      0.667    251 ± 18    230 ± 19   295 ± 20    138 ± 5   0.620      1

(a) HDF, height distribution function. (b) HR, height ratio (maximum single component peak height to its minimum value). (c) For sets U-N1 the σ/τ entries (∞, 0.35, or 0.20) cannot be assigned to individual sets from the available text.

Figure 4. The PS (F_xx) of the chromatogram of Figure 2a: curve 1, PS calculated from the total ACF; curve 2, smoothed PS calculated from truncated ACF.
unknown mixture is considered, the component peak height dispersion, σ_h/a_h, in eqs 2 and 3 is estimated from the experimental chromatogram by means of the peak maxima dispersion ratio, σ_M/a_M. In Table I (last two columns) examples of (σ_M/a_M)² ratios vs (σ_h/a_h)² ratios can be found for different simulated cases of peak height distribution, saturation factor values, and tailing factors. Only in extreme cases can important differences be seen. However, since (σ_h/a_h)² appears as a term of a sum (see eqs 2-4), its bias effect is further reduced.

Parameter Determination by Nonlinear Fitting. The component number m and the peak shape parameters σ and τ were determined by using the above-described approximation and windowing method in the nonlinear least-squares procedure. In Figure 5 the results of PS fitting for a typical simulated case are reported. It can be observed that the degree of fitting is very good, implying very high precision of the obtained chromatogram parameters. On the contrary, the individual parameters obtained from different simulated cases belong to more or less dispersed statistical populations (see Tables I-IV, where the means and the standard deviations are reported). This finding is the result of the natural statistical variability among repeated chromatograms, which are repeated Poissonian sequences; in ref 6 the interdistance distribution was empirically shown to fit a log normal distribution. It is well-known that the mean and the standard deviation of the number of events falling within a given time period are respectively m and m^(1/2) (18). In the simulation of repeated chromatograms, an exact, constant, and known number of components has been introduced by random number generation. Under these conditions the statistical variability appears instead within the interdistance distribution, which does not perfectly match the expected distribution for Poissonian sequences (18).
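To show how m and σ follow from a fitted curve, the sketch below fits the Gaussian-peak ACVF, assuming it has the form C(t) = A_T²(σ_h²/a_h² + 1)/(2σπ^(1/2) m X) exp(-t²/(4σ²)). A linear least-squares fit of ln C against t² is used here in place of the Marquardt nonlinear procedure, purely for brevity; the function name `fit_gaussian_acvf` and all numerical values are hypothetical.

```python
import math


def fit_gaussian_acvf(c, dt, area_total, x_len, disp_ratio_sq):
    """Estimate (m, sigma) from positive ACVF values c[k] at lags k * dt.

    ln C(t) = ln C(0) - t^2 / (4 sigma^2), so a straight-line fit of ln C
    against t^2 gives sigma from the slope and m from the intercept.
    disp_ratio_sq is the assumed value of (sigma_h / a_h)^2.
    """
    t2 = [(k * dt) ** 2 for k in range(len(c))]
    ly = [math.log(v) for v in c]
    n = len(c)
    mx, my = sum(t2) / n, sum(ly) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(t2, ly))
             / sum((a - mx) ** 2 for a in t2))
    intercept = my - slope * mx
    sigma = math.sqrt(-1.0 / (4.0 * slope))
    c0 = math.exp(intercept)
    m = (area_total ** 2 * (disp_ratio_sq + 1.0)
         / (2.0 * sigma * math.sqrt(math.pi) * c0 * x_len))
    return m, sigma


# Synthetic, noise-free ACVF generated from known parameters
true_m, true_sigma = 200.0, 2.0
a_t, x_len, ratio_sq = 1000.0, 5000.0, 1.0 / 3.0
c0 = a_t ** 2 * (ratio_sq + 1.0) / (2.0 * true_sigma * math.sqrt(math.pi) * true_m * x_len)
c = [c0 * math.exp(-(k * 0.5) ** 2 / (4.0 * true_sigma ** 2)) for k in range(12)]
m_est, sigma_est = fit_gaussian_acvf(c, 0.5, a_t, x_len, ratio_sq)
```

On noise-free input the known parameters are recovered; with a real ACVF only the lags below the truncation point M should be passed in, since the logarithm requires positive values.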
Figure 5. Fitting the smoothed PS. Chromatogram case of Figure 2a (uncorrelated Poissonian chromatogram): (a) experimental PS (F_xx; *) and theoretical PS computed according to eq 2 (F; -); (b) difference between experimental PS (F_xx) and best fitting PS (F).
Table II. Comparison of the Number of Components Estimated by Different Methods

             m_e
set(a)  method I    method II   method III  method IV(b)
V       217 ± 15    222 ± 15    227 ± 17    205 ± 10
Z       235 ± 18    231 ± 20    231 ± 15    209 ± 16
D1      246 ± 30    242 ± 25    242 ± 17    207 ± 26
H1      252 ± 31    231 ± 32    247 ± 27    209 ± 33
W       177 ± 24    172 ± 29    175 ± 21    210 ± 29
A1      179 ± 28    169 ± 33    168 ± 23    207 ± 42
E1      167 ± 38    158 ± 38    159 ± 29    205 ± 53
M1      261 ± 15    262 ± 25    260 ± 15    308 ± 33
N1      251 ± 18    247 ± 25    246 ± 18    305 ± 29

(a) All the parameters are the same as in Table I. (b) Results obtained by using method I and the true values of σ_h²/a_h².
The m_e estimate, based on the interdistance properties through ACVF computation, will thus retain a typical Poissonian variability contribution. One can see that the standard deviation of the m_e estimate is about m^(1/2), as expected (see Tables I-IV). In Table II the three computational methods (methods I-III) are compared. It can be seen that the results agree within their standard deviations, so the methods are equivalent, as expected. For this reason only the results obtained by using method I will be discussed. The general performance of the present PS method was first tested with respect to the most complete case set previously studied (5). In this manner many factors (see Table I) affecting multicomponent chromatograms were considered: (1) the saturation factor α = m/N_c, for R_s = 0.5, within the 0.167-0.667 range; (2) three types of peak height distribution: random (ran), exponential (exp), and constant (con); (3) the number of components m between 100 and 300; (4) three values for the degree of peak asymmetry (σ/τ = ∞, 0.35, 0.20). In addition to the above reported points, the following were also considered: (5) the presence of white noise; (6) the base-line level determination; (7) the cases of component numbers m between 10 and 100 (see cases of Tables III and IV). Table I compares the results obtained by the present PS method with those previously obtained by the DG method
(5). First of all it can be observed that with the PS method one does not have to choose between the m_e estimates from slope or intercept, m_s and m_i respectively (although this last point was recently resolved by Davis (7)). As a general rule it can be observed that, for cases of symmetric peak shape, the PS and DG methods yield comparable results. In fact, there is good accuracy in m_e estimation, with well-defined outliers found only for exponential or constant distribution at high saturation values (see cases C1-H1 in Table I). On the contrary, when the peak shape is asymmetrical, the PS method is much more powerful than the DG method (see cases M, N, S, and T in Table I). Whereas the origin of the failure of the DG method is rather difficult to understand, in the case of PS it is easily explained as deriving from a biased σ_h²/a_h² estimate from the experimental maxima. In fact, one can see that the estimated m_e values are lower whenever the ratio σ_M²/a_M² is lower than σ_h²/a_h² (cases N1 and M1 in Table I, exponential type distribution). The opposite holds true for constant type distribution (see cases F1 and H1). It was confirmed that once the exact ratio is employed, this bias is removed (see the data under method IV in the last column of Table II). Therefore, the way to further improve the performance of the PS method would be to improve estimation of the σ_h²/a_h² ratio. This point is beyond the aim of the present work, but for the moment a practical rule can be put forward according to which the accuracy of determined m_e values can be estimated "a posteriori" by watching the σ_M²/a_M² ratio: if this is abnormally low or high (e.g. <0.2 or >0.7), m_e is respectively over- or underestimated. The PS performance was further verified in cases where the DG method failed or was not extensively characterized (see Table III). Cases O1-O5 show PS performance in determining low m values at α = 0.5.
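The practical rule above can be sketched as a small check on the peak maxima. The three-point maximum test follows the peak sensing described for nonnoisy peaks; the function names, the use of the sample variance (division by n - 1), and the short test series are assumptions of this illustration.

```python
def maxima_dispersion(y):
    """Return (sigma_M^2 / a_M^2, maxima) from three-point local maxima.

    Needs at least two maxima; the sample variance (n - 1) is an
    assumption of this sketch.
    """
    maxima = [y[j] for j in range(1, len(y) - 1) if y[j - 1] < y[j] > y[j + 1]]
    n = len(maxima)
    mean = sum(maxima) / n
    var = sum((v - mean) ** 2 for v in maxima) / (n - 1)
    return var / mean ** 2, maxima


def bias_hint(ratio_sq, low=0.2, high=0.7):
    """A posteriori hint on m_e, using the thresholds quoted in the text."""
    if ratio_sq < low:
        return "m_e likely overestimated"
    if ratio_sq > high:
        return "m_e likely underestimated"
    return "no warning"


ratio_sq, maxima = maxima_dispersion([0.0, 1.0, 0.0, 3.0, 0.0, 2.0, 0.0])
hint = bias_hint(ratio_sq)
```

For noisy chromatograms the three-point test would pick up spurious maxima, which is why a seven-point criterion is used in that case.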
It can be seen that accuracy is acceptable even for low m values, while the relative precision only improves with an increase in m. This depends once again on the above-explained intrinsic source of variability in random sequences, which is on the order of m^(1/2). Cases P1-P3 show that the PS method gives excellent results even with α > 1, where the DG method fails (5). Finally, cases Q1-Q3 show that the method is insensitive to white noise, at least for S/N > 20, provided it is properly filtered off as explained above. Cases R1-R4 and S1-S4 of Table III, when respectively compared to cases F and C1 in Table I, show that moderate errors in the base-line level determination are also not critical. The peak shape parameters σ and τ are likewise evaluated when estimating the component number m_e. Selected results concerning these are presented in Table IV. One can see that the mean accuracy is generally good for both peak shape parameters. The estimated σ precision is very good for τ = 0, but worsens slightly when peak asymmetry and saturation increase, since the two peak shape parameters become covariant under the above conditions. This last result is particularly important to validate the overall method. In fact, accuracy in σ and τ determination proves that the applied windowing method does not introduce any serious distortion in the PS. Secondly, when applied in practice, the method can avail itself of an important check by comparing the σ and τ values obtained from the multicomponent chromatogram PS with those measured from isolated component peaks using different procedures (e.g. by the Foley and Dorsey method (19) or by Edgeworth-Cramér series (20, 21)). Agreement found between these values will then prove the Poissonian character of the chromatogram and thus make it possible to accept or reject the performed statistics.
In order to check this point under very extreme conditions, a 100-component chromatogram having interdistances of uniform random distribution instead of exponentially distributed (=Poissonian case) was generated and subsequently submitted to PS numerical fitting as though it were of the Poissonian type. A portion of this chromatogram is reported in Figure 2b. It must be observed that in this case even an expert chromatographer would have some difficulty in discriminating the type of retention time distribution by simply observing the chromatogram. The resulting nonlinear fitting to the incorrect model is reported in Figure 6. It can be seen that this fitting is not as good as the one in Figure 5, where the correct model was applied. The σ estimate is 25% lower than the true value (3.73 instead of 4.92) and the m_e estimate is 199 instead of 100! The present PS method is thus potentially able to recognize the overlapping pattern of a multicomponent chromatogram. However, a full exploitation of this important point requires more extensive investigation and lies beyond the aim of the present work.

Table III. Application of PS Method to Cases of Great Overlapping, White Noise, and Biasing Base Line Level Determination

set   m    HDF  σ/τ  HR    α       m_e         p         B(b), %  S/N(a)
O1    10   ran  ∞    150   0.500   14 ± 4      6 ± 1     0        ∞
O2    25   ran  ∞    150   0.500   28 ± 7      10 ± 2    0        ∞
O3    50   ran  ∞    150   0.500   55 ± 10     28 ± 3    0        ∞
O4    75   ran  ∞    150   0.500   82 ± 13     41 ± 2    0        ∞
O5    100  ran  ∞    150   0.500   106 ± 12    43 ± 3    0        ∞
P1    200  ran  ∞    150   1       213 ± 21    73 ± 4    0        ∞
P2    200  ran  ∞    150   1.333   205 ± 31    57 ± 3    0        ∞
P3    200  ran  ∞    150   2       189 ± 45    39 ± 3    0        ∞
Q1    200  ran  ∞    150   0.500   197 ± 15    93 ± 5    0        100
Q2    200  ran  ∞    150   0.500   208 ± 22    71 ± 9    0        20
Q3    200  ran  ∞    150   0.500   259 ± 25    33 ± 8    0        5
R1    200  ran  ∞    150   0.500   208 ± 19    111 ± 5   +0.2     ∞
R2    200  ran  ∞    150   0.500   201 ± 12    110 ± 4   -0.2     ∞
R3    200  ran  ∞    150   0.500   214 ± 18    112 ± 4   +0.4     ∞
R4    200  ran  ∞    150   0.500   202 ± 20    112 ± 4   -0.4     ∞
S1    200  exp  ∞    ∞     0.500   177 ± 12    107 ± 5   +0.2     ∞
S2    200  exp  ∞    ∞     0.500   170 ± 16    106 ± 6   -0.2     ∞
S3    200  exp  ∞    ∞     0.500   182 ± 11    107 ± 4   +0.4     ∞
S4    200  exp  ∞    ∞     0.500   172 ± 12    108 ± 4   -0.4     ∞

(a) S/N, signal to noise ratio, computed by dividing the maximum peak by 4 times the noise standard deviation. (b) B, base line expressed as a percent of the maximum peak.

Table IV. Comparison between Theoretical (σ, τ) and Estimated (σ_e, τ_e) Peak Shape Parameter Values

                                                              theoretical parameter   estimated parameter
set   m    HDF  σ/τ    HR    α       m_e         p            σ       τ               σ_e            τ_e
A     200  ran  ∞      50    0.167   203 ± 10    160 ± 4      2.08    -               2.07 ± 0.05    -
E     200  ran  ∞      50    0.500   202 ± 12    111 ± 4      2.08    -               2.08 ± 0.06    -
J     200  ran  0.35   150   0.167   198 ± 24    147 ± 5      2.08    5.93            2.10 ± 0.20    5.88 ± 0.67
N     200  ran  0.35   150   0.500   216 ± 39    94 ± 5       2.06    5.93            2.11 ± 0.43    5.5 ± 1.4
P     200  ran  0.20   150   0.167   200 ± 29    143 ± 6      2.08    10.35           2.03 ± 0.49    10.3 ± 1.7
T     200  ran  0.20   150   0.500   210 ± 48    92 ± 4       2.06    10.35           1.77 ± 0.67    10.9 ± 2.7
Q1    200  ran  ∞      150   0.500   207 ± 14    112 ± 3      6.18    -               6.18 ± 0.18    -
Q2    200  ran  ∞      150   0.500   212 ± 12    113 ± 5      6.18    -               6.03 ± 0.31    -

Figure 6. Fitting of the smoothed PS, chromatogram case of Figure 2b: (a) experimental PS (F_xx; *) and theoretical PS computed according to eq 2 (F; -); (b) difference between experimental PS (F_xx) and best fitting PS (F).

CONCLUSIONS

The PS method developed here appears to have more general applications than the DG method previously described. Although a complete digitized chromatogram is required, it yields a great deal of useful information, even including validation of the statistics employed. Therefore, this method can truly cast light on the internal structure of mixture complexity, making it possible to detect the degree of order/disorder and to establish pattern cataloging procedures. It must be observed that the PS method does not include any detailed theoretical treatment of overlapping statistics by directly computing the number of single component peaks
(singlet) or multicomponent bands (doublets, triplets, etc.). Nonetheless, since the PS method can verify the proper statistical distribution of both peak heights and retention times, it is possible to correctly employ the DG approach which, instead, contains this aspect of peak overlapping. The work performed thus far indicates that a great deal is yet to be done: e.g. improvement of height dispersion estimates, testing different retention time distribution models, and so on. Nonetheless, what is most important at this point is to apply this approach to concrete practical cases so as to establish where it is of greatest use and what direction further development should take.

GLOSSARY

ACVF       autocovariance function
ACF        autocorrelation function
A_T        total area of the multicomponent chromatogram
a_h        mean value of single component peak height
a_M        mean value of peak maxima in the multicomponent chromatogram
C(t)       autocovariance function value at time t, eqs 3 and 4
C_xx(k)    numerically computed ACVF at point k, eq 5
DG         Davis-Giddings (method)
EMGF       exponentially modified Gaussian function
F(ω)       PS value at frequency ω, eqs 1 and 2
FFT        fast Fourier transform
FT         Fourier transform
F_xx(k)    numerically computed PS at point k, eq 7
g(ω)       Fourier transform of the unitary peak shape located at the origin
h          single component peak height
HDF        height distribution function
HR         height ratio (maximum single component peak height to its minimum value)
Im         imaginary part
m          number of single component peaks present in a given mixture
m_e        as above but estimated by PS method
m_s, m_i   number of single component peaks present in a given mixture estimated from slope and intercept, respectively, by DG method
N_c        peak capacity computed at a given resolution
p          number of peaks computed at a given R_s value
PS         power spectrum
Re         real part
R_s        chromatographic resolution
T          mean value of interdistance between subsequent single component peaks
t          time axis
Ȳ          mean value of the chromatographic response, eq 6
Y_j        chromatographic response at point j
w(k)       numerically computed window function at point k, eqs 7, 9, and 10
α          saturation factor (= m/N_c)
δ          Dirac function
Δt         sampling interval of the chromatogram
           separation extent (= p/m)
ρ_xx(k)    numerically computed autocorrelation function (ACF) at point k, eq 8
σ          standard deviation of the Gaussian form of the single component peak or the Gaussian part of the standard deviation of the EMGF
σ_e        as above but estimated by PS method
σ_h        standard deviation of the single component peak heights
σ_M        standard deviation of the peak maxima
τ          time constant of the exponentially modified Gaussian function
τ_e        as above but estimated by PS method
ω          frequency

LITERATURE CITED

(1) Felinger, A.; Pasti, L.; Dondi, F. Anal. Chem., preceding paper in this issue.
(2) Davis, J. M.; Giddings, J. C. Anal. Chem. 1983, 55, 418.
(3) Martin, M.; Guiochon, G. Anal. Chem. 1985, 57, 289.
(4) Davis, J. M.; Giddings, J. C. J. Chromatogr. 1984, 289, 277.
(5) Davis, J. M.; Giddings, J. C. Anal. Chem. 1985, 57, 2168.
(6) Davis, J. M.; Giddings, J. C. Anal. Chem. 1985, 57, 2176.
(7) Davis, J. M. J. Chromatogr. 1988, 449, 41.
(8) Dondi, F.; Kahle, Y. D.; Lodi, G.; Remelli, M.; Reschiglian, P.; Bighi, C. Anal. Chim. Acta 1988, 191, 261.
(9) Herman, D. P.; Gonnord, M. F.; Guiochon, G. Anal. Chem. 1984, 56, 995.
(10) Martin, M.; Herman, D. P.; Guiochon, G. Anal. Chem. 1986, 58, 2200.
(11) El Fallah, M. Z.; Martin, M. Chromatographia 1987, 24, 115.
(12) Middleton, D. An Introduction to Statistical Communication Theory; McGraw-Hill: New York, 1960; p 141.
(13) Massart, D. L.; Vandeginste, B. G. M.; Deming, S. N.; Michotte, Y.; Kaufman, L. Chemometrics: a Textbook; Elsevier: Amsterdam, 1988; Chapter 15.
(14) Jenkins, G. M.; Watts, D. G. Spectral Analysis and Its Applications; Holden-Day: San Francisco, CA, 1968.
(15) Annino, R.; Driver, R. D. Scientific and Engineering Applications with Personal Computers; John Wiley & Sons: New York, 1986.
(16) Bracewell, R. N. The Fourier Transform and Its Applications; McGraw-Hill: New York, 1986.
(17) Cramér, H. Mathematical Methods of Statistics; Princeton University Press: Princeton, NJ, 1974.
(18) STATGRAPHICS, Version 1.2; Statistical Graphics Corp. Copyright 1985 STSC, Inc.
(19) Foley, J. P.; Dorsey, J. G. J. Chromatogr. Sci. 1984, 22, 40.
(20) Dondi, F.; Remelli, M. J. Chromatogr. 1984, 315, 67.
(21) Remelli, M.; Blo, G.; Dondi, F.; Vidal-Madjar, M. C.; Guiochon, G. Anal. Chem. 1989, 61, 1489.
RECEIVED for review December 21, 1989. Accepted May 1, 1990. This work was made possible by the financial support of the Italian Ministry of Public Education (MPI), the Italian National Research Council (CNR), and the Hungarian Academy of Sciences.