Anal. Chem. 1990, 62, 1846-1853
Fourier Analysis of Multicomponent Chromatograms, Theory and Models
Attila Felinger
Department of Analytical Chemistry, University of Chemical Engineering, P.O. Box 158, H-8201 Veszprém, Hungary
Luisa Pasti and Francesco Dondi*
Department of Chemistry, University of Ferrara, I-44100 Ferrara, Italy
On the basis of only two assumptions, independence between peak heights and peak positions and a stationary retention time distribution, and under the hypothesis of constant peak shape, Fourier analysis is applied to the structure of multicomponent chromatograms. In essence, the properties of the chromatographic response covariance are analyzed as a function of time distance, together with its Fourier transform, the power spectrum (PS). A general theoretical expression for the PS of the chromatogram is derived as a function of relative peak height dispersion, peak position distribution, and peak shape properties. The PS of the total chromatogram is proportional to the PS of a single-component peak. Detailed expressions for the PS and for the autocovariance function of chromatograms having different retention time distributions are presented and discussed. The PS of multicomponent chromatograms having a Poissonian retention time distribution is found to be congruent to the PS of the single-component peak. A new way to obtain the statistical attributes of multicomponent chromatograms (i.e. retention time distribution type, number of components in the chromatogram, and peak shape properties) is thus emphasized.
INTRODUCTION
Peak overlapping in multicomponent chromatograms is a well-known experimental fact which arises whenever very complex mixtures are analyzed. It originates both from the limited peak capacity available to date, even in highly efficient chromatographic techniques, and from the random distribution of single components over the chromatographic space. Davis and Giddings (1) and later Martin and Guiochon (2) presented a quantitative explanation of the dependence of peak overlap on the number m of detectable components (single-component peaks) and on the peak capacity N_c (see Glossary). The theoretical descriptions presented were based on three assumptions: (1) Poisson distribution of retention times, (2) constant height of single-component peaks, and (3) Gaussian peak shape. Under these hypotheses a method was formulated that determines the component number m from the number of peaks p counted over one or more chromatograms (3-6). The first of these hypotheses, the Poissonian character of the retention time distribution, has a physical foundation (1) and was statistically shown to hold in practice (5, 6). Nonetheless, the last two assumptions are too strict to be applied to real cases, where nonuniform concentrations of single components and tailed peak shapes are often found. In order to determine broader conditions for application of this method, extensive numerical simulations of more realistic cases were presented (3-7). In this way it has been shown that the same Davis-Giddings method (DG method) can also be applied under nonconstant peak height distribution conditions, provided the saturation factor α = m/N_c at resolution R_s = 0.5 is no greater than 0.6-0.7. This practical approach, connected to the exact theoretical treatment mentioned above, was thus the first true success in describing complex mixture separation. In fact, with this approach any given chromatographic separation can be quantitatively measured by estimating the extent of separation attained (7), defined as the ratio p/m (2). More extended theoretical and numerical treatments of this question were presented (8-10), and concrete applications of this approach were given in LC, GC, and gradient elution optimization (5, 6, 11, 12). The weak points of the approach are self-evident: a significant part of the problem can only be solved by numerical simulation, and to date it has been impossible to handle retention time distributions other than the Poissonian one.

In this paper the whole topic is reconsidered. Here the variability of the chromatographic response as a function of time is analyzed to obtain the missing information on the overlapping pattern of multicomponent chromatograms. This variability can be evaluated through the chromatographic response covariance as a function of time or through its Fourier transform, the spectral measure or power spectrum (PS). The Fourier approach is not new in chromatography; it has been successfully applied in different cases of interest, such as characterizing single chromatographic peak shapes (13-16) and flame ionization detector signals (17), as well as in deriving a general theoretical model of the chromatographic process (18). The rationale for applying such a theoretical approach to multicomponent chromatograms lies in a basic property exhibited by a general random multicomponent process Y(t) made up, linearly, of individual processes
$$Y(t)=\sum_{n} h_n\,u(t-m_n) \tag{1}$$
where m_n is the shift constant and h_n a scale constant for the nth event u(t) (19). The term "stochastic process" applied to Y(t) and represented by eq 1 expresses the fact that the values of Y at all times t are random variables (19). The basic property is that the PS of the total process Y(t) and that of the single event u(t) are proportional, the proportionality constant depending only on the distributions of m_n and h_n. Equation 1 is easily recognized as the mathematical representation of a multicomponent chromatogram whose single-component peaks share a common shape: m_n is the peak position, h_n the single-component peak height, and u(t) the peak shape. Since measuring the PS poses no great problem, the aim of the present work is to extract useful and tractable information on (m_n, h_n) from it. The only hypothesis made about (m_n, h_n) is that their distributions are independent; that is, no correlation is assumed to exist between peak heights and peak positions. The other assumption, that the peak shape u(t) is constant, is most likely met in practice in programmed-temperature gas chromatography or gradient elution liquid chromatography, provided at least the following conditions are respected: (a) the temperature or gradient program is optimized so as to give constant resolving power over the whole chromatogram (1); (b) the mixture being analyzed does not span too wide a range in polarity, so that no significant peak shape variations are exhibited as elution progresses. Even under these conditions, the present approach removes all three limiting assumptions of the previous treatments at the outset. In this paper an analytical approach based on the rigorous Fourier transform (20) is applied to obtain the theoretical expression of the PS of a multicomponent chromatogram as a function of its statistical properties, such as the component number m, the relative peak height dispersion, and the peak shape attributes of a single peak (width and asymmetry factor). The aim of this paper is not to consider all the possible structures of real chromatograms, which can be quite varied: e.g. there can be a mixture of random components with nonrandom peak positions, or correlation may exist between height and peak position or peak shape. Rather, the purpose here is only to devise a new theoretical approach which could be the prerequisite for a new, more powerful computational method for the statistical properties of multicomponent chromatograms.

0003-2700/90/0362-1846$02.50/0 © 1990 American Chemical Society
ANALYTICAL CHEMISTRY, VOL. 62, NO. 17, SEPTEMBER 1, 1990

THEORY
Introductory Remarks. From a statistical point of view the chromatogram of a complex mixture can be regarded as a series of pulses whose parameters are random variables. The pulses are the single-component peaks. The shape of a single pulse is assumed to be constant along the chromatogram, whereas the position m and the height h of the pulses are random variables. The shape of the pulses may be a Gaussian function of unit height
$$u(t,m,\sigma)=\exp\left[-\frac{(t-m)^2}{2\sigma^2}\right] \tag{2}$$
or it may be a more complex peak shape such as the exponentially modified Gaussian function (EMGF), which is the convolution of the Gaussian peak given by eq 2 with an exponential decay function (21)
$$u(t,m,\sigma,\tau)=\frac{\sigma}{\tau}\left(\frac{\pi}{2}\right)^{1/2}\exp\left[\frac{\sigma^2}{2\tau^2}-\frac{t-m}{\tau}\right]\left[1+\operatorname{erf}\left(\frac{t-m}{\sqrt{2}\,\sigma}-\frac{\sigma}{\sqrt{2}\,\tau}\right)\right] \tag{3}$$
In eqs 2 and 3, t is the time, σ the standard deviation of the Gaussian part of the peak, and τ the time constant. In Figure 1 a piece of a chromatogram is represented as a sequence of random pulses.

Figure 1. A piece of a random chromatogram (response Y versus time t).

A chromatogram which contains m components is thus
$$Y(t)=\sum_{n=1}^{m} h_n\,u(t,m_n,\sigma,\tau) \tag{4}$$
where h_n is the height of component peak n. With respect to the general stochastic process Y(t) of eq 1, the multicomponent chromatogram of eq 4 is to be considered as one of its possible realizations, although limited in time. Conversely, Y(t) is to be considered as a collection of a very large number of chromatograms like the one in Figure 1, all having the same statistical properties. There are two ways in which Fourier analysis derives the PS of Y(t). The first is to take an average over time, that is, along the chromatogram represented as a stochastic process. In this case an infinitely long chromatographic sequence of type Y(t) is to be considered. The second is to take an average over the ensemble Y(t), which is built from replicas of the same piece of random chromatogram. The two approaches can be related to each other with unit probability under the "ergodic hypothesis", which roughly holds provided the stochastic process is stationary. "Stationary" means that the statistical properties of the process do not change with time. Put more simply, the density of components, the peak heights, etc., are almost constant in different parts of the chromatogram, and the chromatogram has no baseline drift. This hypothesis is reasonably true provided the chromatogram is long enough and the analyzed mixture sufficiently complex. The Wiener-Khinchin theorem (20, 22) establishes the desired equivalence between the two approaches
$$F(\omega)=2\int_{-\infty}^{\infty}C(t)\,e^{-i\omega t}\,dt=4\int_{0}^{\infty}C(t)\cos(\omega t)\,dt \tag{5}$$
$$C(t)=\frac{1}{4\pi}\int_{-\infty}^{\infty}F(\omega)\,e^{i\omega t}\,d\omega=\frac{1}{2\pi}\int_{0}^{\infty}F(\omega)\cos(\omega t)\,d\omega \tag{6}$$
where i is the imaginary unit and ω the frequency. C(t) is the autocovariance function (ACVF) and F(ω) the PS. The ACVF of a stationary stochastic process Y(t) is
$$C(t)=\lim_{\mathcal{T}\to\infty}\frac{1}{\mathcal{T}}\int_{-\mathcal{T}/2}^{\mathcal{T}/2}\left[Y(t')-\bar{Y}\right]\left[Y(t+t')-\bar{Y}\right]dt' \tag{7}$$
where Ȳ is the mean value of Y. The ACVF of a random process gives the degree to which the values of the process at one time depend on its values at another time. The ACVF is the time average employed to obtain the PS (see eqs 7 and 5). Instead of the ACVF, its normalized value
$$\rho(t)=C(t)/C(0) \tag{8}$$
called the autocorrelation function (ACF), is often employed. C(0), the ACVF value at t = 0, is the variance of the process. The PS determined by using the ensemble average is given by (20)
$$F(\omega)=\lim_{\mathcal{T}\to\infty}\frac{2}{\mathcal{T}}\,E\left\{\left|Z_{\mathcal{T}}^{(k)}(\omega)\right|^2\right\} \tag{9}$$
where the operator E stands for the expected value over the ensemble and Z_𝒯^(k)(ω) is the Fourier transform of the kth replica of the stochastic process Y(t), which is nonzero only within the limited time range 𝒯. The expected value of a random variable ξ is
$$E(\xi)=\int_{-\infty}^{\infty}\xi\,dF(\xi) \tag{10}$$
where F(ξ) is the distribution (i.e. cumulative) function of ξ.
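The time-average route (eqs 4, 7, and 8) can be illustrated with a short Python sketch. It is not the authors' procedure: it assumes EMGF peaks (eq 3) at Poissonian positions snapped to a sampling grid, exponentially distributed heights, and the usual biased finite-record ACVF estimator computed by a zero-padded FFT. Function names and parameter values are hypothetical.

```python
import math
import numpy as np


def emgf(t, m, sigma, tau):
    """EMGF peak shape of eq 3, evaluated in log space so the exponential cannot overflow.

    1 + erf(z) is computed as erfc(-z), which avoids cancellation on the leading edge.
    """
    x = np.atleast_1d(np.asarray(t, dtype=float)) - m
    z = x / (math.sqrt(2.0) * sigma) - sigma / (math.sqrt(2.0) * tau)
    erfc_term = np.array([math.erfc(-zi) for zi in z])
    out = np.zeros_like(x)
    ok = erfc_term > 0.0  # where erfc underflows, the true value is negligibly small
    log_u = sigma**2 / (2.0 * tau**2) - x[ok] / tau + np.log(erfc_term[ok])
    out[ok] = (sigma / tau) * math.sqrt(math.pi / 2.0) * np.exp(log_u)
    return out


def random_chromatogram(X, dt, sigma, tau, T, rng):
    """Eq 4 on a sampling grid: EMGF peaks at Poissonian positions, exponential heights.

    Peak positions are snapped to the grid so that all peaks are added by one convolution.
    """
    n = int(round(X / dt))
    m = rng.poisson(X / T)  # number of components in [0, X)
    idx = (rng.uniform(0.0, X, m) / dt).astype(int)
    impulses = np.zeros(n)
    np.add.at(impulses, idx, rng.exponential(1.0, m))
    k = np.arange(-int(8 * sigma / dt), int((8 * sigma + 30 * tau) / dt) + 1)
    kernel = emgf(k * dt, 0.0, sigma, tau)  # one peak, sampled around its position
    y = np.convolve(impulses, kernel)[-k[0]:-k[0] + n]  # align kernel origin with m_n
    return y, kernel


def acvf_time_average(y, max_lag):
    """Finite-record form of eq 7, lags in samples: C_l = (1/n) sum_i d_i d_(i+l), d = y - mean."""
    d = y - y.mean()
    n = d.size
    spec = np.fft.rfft(d, 2 * n)  # zero padding gives linear, not circular, lags
    return np.fft.irfft(np.abs(spec) ** 2, 2 * n)[: max_lag + 1] / n


rng = np.random.default_rng(3)
dt, sigma, tau = 0.05, 1.0, 0.5
y, kernel = random_chromatogram(X=2000.0, dt=dt, sigma=sigma, tau=tau, T=4.0, rng=rng)
C = acvf_time_average(y, max_lag=200)  # lags 0 ... 10 time units
rho = C / C[0]  # ACF, eq 8
print("peak area:", kernel.sum() * dt, "vs (2 pi)^(1/2) sigma =", math.sqrt(2 * math.pi) * sigma)
print("ACF every 2 time units:", np.round(rho[::40], 3))
```

Dividing by the record length n rather than by n − l keeps the estimated ACVF positive semidefinite, which is convenient if it is later Fourier transformed as in eq 5.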
Both approaches introduced above (ensemble and time averages) are useful in this context. The ensemble average allows us to obtain an analytical solution for the PS of Y(t) as a function of the statistical properties of the random quantities (h, m) and of the peak shape function, whereas the time average will be the basis for estimating it from real chromatograms Y(t).

Fourier Approach to "Ensemble" Derivation of the Chromatogram Power Spectrum. In this section the PS of a random multicomponent chromatogram is derived as a function of statistical chromatogram parameters. The mathematical development is highly technical; the reader interested only in the conclusions can skip to eq 42 at the end of this section. The general procedure employed here is the one used in describing random pulse sequences (20). However, the derivation developed here is specific in three main points: (1) Gaussian-like peak shapes, which never go to zero, are considered instead of rectangles or triangles; (2) allowance is made for mutual overlapping among different pulses (component peaks); (3) independence between peak heights and peak positions is introduced at the appropriate moment to keep the derivation as simple as possible, which also makes the possible extensions of the theory clearer. From a probabilistic point of view Y(t) is to be considered as built up by all the possible replicas of the same "chromatogram case", all having the same statistical properties, that is, the same distribution of distances between subsequent peaks, the same peak height distribution, and possibly the same correlation between these quantities. In practice a single chromatographic case is one typical experimental random multicomponent chromatogram, whereas a replica must be considered as another, similar chromatogram as far as the number and the type of peaks are concerned, but having different peak locations.
Such "limited in time" replicas, reported as an example in Figure 2, are built up from 2N + 1 pulses within the limited time period (2N + 1)T, T being the mean time between subsequent pulses. In the kth representation of such a random process each pulse has the height h_n^(k) and the center m_n^(k). The shape of a single pulse is given by the deterministic peak shape function u(t) (see eqs 2 and 3). The kth representation of the chromatogram as one of the possible issues of Y(t) is
$$Y^{(k)}(t)=\sum_{n=-N}^{N} h_n^{(k)}\,u\!\left(t,m_n^{(k)},\sigma,\tau\right),\qquad k=1,2,\ldots,\infty \tag{11}$$
and it is thus a "random function of time" because it contains the random quantities h_n^(k) and m_n^(k).

Figure 2. Three limited in time replicas of the same chromatogram case with Poissonian interdistance distribution and random peak height distribution belonging to the general ensemble Y(t) (α = 0.25, R_s = 0.5, m = 25); pulses are indexed from −N to +N over the time window; (-) component positions.

The Fourier transform of the peak shape function (eq 2 or 3) located at the origin (m = 0) is defined as
$$g(\omega)=\int_{-\infty}^{\infty}u(t,0,\sigma,\tau)\,e^{-i\omega t}\,dt \tag{12}$$
The Fourier transform of the peak shape function at its proper position m is
$$g(\omega)\,e^{-i\omega m} \tag{13}$$
according to the general shifting property of the Fourier transform (23). The Fourier transform of the kth "limited in time" issue of Y(t), eq 11, can be built up remembering that this is a linear transformation (23)
$$Z_N^{(k)}(\omega)=\sum_{n=-N}^{N} h_n^{(k)}\,g(\omega)\exp\left(-i\omega m_n^{(k)}\right) \tag{14}$$
The expression of the PS of Y(t) is obtained by introducing eq 14 into eq 9
$$F(\omega)=\lim_{N\to\infty}\frac{2\,E\left\{\left|Z_N^{(k)}(\omega)\right|^2\right\}}{(2N+1)T} \tag{15}$$
In deriving eq 15 the total time 𝒯 of eq 9 was set equal to (2N + 1)T, which is exact when N is large enough. With reference to Figure 2, eq 15 means that two extensions are to be made. The first is the extension to +∞ and −∞ on the time axis, corresponding to the limit N → ∞ in eq 15. The second is the average over all possible k replicas. To obtain F(ω) from eq 15, we must first calculate the expected value of |Z_N^(k)(ω)|²
$$\left|Z_N^{(k)}(\omega)\right|^2=Z_N^{(k)}(\omega)\,\overline{Z_N^{(k)}(\omega)} \tag{16}$$
where the bar denotes the complex conjugate. Writing eq 14 into eq 16 we have
$$\left|Z_N^{(k)}(\omega)\right|^2=g(\omega)\,\overline{g(\omega)}\sum_{n=-N}^{N}\sum_{j=-N}^{N}h_n^{(k)}h_j^{(k)}\exp\left[-i\omega\left(m_n^{(k)}-m_j^{(k)}\right)\right] \tag{17}$$
and
$$E\left\{\left|Z_N^{(k)}(\omega)\right|^2\right\}=g(\omega)\,\overline{g(\omega)}\,E\left\{\sum_{n=-N}^{N}\sum_{j=-N}^{N}h_n^{(k)}h_j^{(k)}\exp\left[-i\omega\left(m_n^{(k)}-m_j^{(k)}\right)\right]\right\} \tag{18}$$
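The shift property (eq 13) that underlies eq 14 can be checked numerically on one replica. The Python sketch below is only an illustration under assumptions not made in the text: Gaussian peaks (eq 2), for which g(ω) = (2π)^{1/2} σ exp(−ω²σ²/2), exponential interdistances and heights, and arbitrary parameter values. A brute-force quadrature of Y^(k)(t) exp(−iωt) should agree with g(ω) Σ h_n exp(−iω m_n).

```python
import numpy as np


def gaussian_ft(omega, sigma):
    """g(omega) of eq 12 for the unit-height Gaussian of eq 2."""
    return np.sqrt(2.0 * np.pi) * sigma * np.exp(-0.5 * (omega * sigma) ** 2)


rng = np.random.default_rng(7)
N, T, sigma = 10, 3.0, 0.6
# One limited-in-time replica (eq 11): 2N + 1 pulses, exponential interdistances, random heights
positions = np.cumsum(rng.exponential(T, 2 * N + 1)) - (N + 1) * T
heights = rng.exponential(1.0, 2 * N + 1)

t = np.linspace(positions.min() - 12 * sigma, positions.max() + 12 * sigma, 40_001)
dt = t[1] - t[0]
pulses = np.exp(-((t[None, :] - positions[:, None]) ** 2) / (2 * sigma**2))  # one row per pulse
y = (heights[:, None] * pulses).sum(axis=0)  # Y^(k)(t), eq 11

omega = np.array([0.3, 1.0, 2.5])
# Brute-force Fourier integral of the whole replica (the integrand vanishes at both ends)
Z_quadrature = np.array([np.sum(y * np.exp(-1j * w * t)) * dt for w in omega])
# Eq 14: shifted copies of the single-peak transform, weighted by the heights
Z_eq14 = gaussian_ft(omega, sigma) * np.array(
    [np.sum(heights * np.exp(-1j * w * positions)) for w in omega]
)
print(np.max(np.abs(Z_quadrature - Z_eq14)))
```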
Taking into account that the expected value of a sum is equal to the sum of the expected values (24), and distinguishing the cases n = j and n ≠ j, eq 18 is written as
$$E\left\{\left|Z_N^{(k)}(\omega)\right|^2\right\}=|g(\omega)|^2\sum_{n=-N}^{N}E\left\{\left(h_n^{(k)}\right)^2\right\}+|g(\omega)|^2\sum_{n=-N}^{N}\sum_{\substack{j=-N\\ j\neq n}}^{N}E\left\{h_n^{(k)}h_j^{(k)}\exp\left[-i\omega\left(m_n^{(k)}-m_j^{(k)}\right)\right]\right\} \tag{19}$$
Equation 19 is made up of two parts: the first depends on peak heights only; the second on both peak heights and peak positions. This equation, which is very complex, can be simplified into separate terms dependent only on peak shape, peak heights, or peak interdistance properties, under the hypothesis of independence between height and position. The dependence on peak shape is, on the other hand, already clear at this point: |g(ω)|², the PS of the peak shape, is a common factor, as anticipated in the Introduction. The first term of eq 19 is transformed by recalling that the mean value of the function [ξ − E(ξ)]², called the variance of the random variable ξ and denoted by D²(ξ), is (24)
$$D^2(\xi)=\int_{-\infty}^{\infty}\left[\xi-E(\xi)\right]^2dF(\xi)=E(\xi^2)-E^2(\xi) \tag{20}$$
Denoting the mean value and the variance of h_n^(k) by a_h and σ_h², respectively, and using eq 20 for the expected value of (h_n^(k))², we have
$$E\left\{\left(h_n^{(k)}\right)^2\right\}=E^2\left\{h_n^{(k)}\right\}+D^2\left\{h_n^{(k)}\right\}=a_h^2+\sigma_h^2 \tag{21}$$
In deriving eq 21 the index n is neglected because heights and positions are supposed to be independent. Now the first part of eq 19 can be written as
$$K(\omega)=(2N+1)\,|g(\omega)|^2\left(a_h^2+\sigma_h^2\right) \tag{22}$$
The symbol K(ω) indicates that this quantity does not depend on peak position. The second part of eq 19 depends only on the difference n − j. That is why a new function q_{n−j} is introduced, referring to all the possible time interdistances between component peak n and component peak j:
$$q_{n-j}(\omega)=|g(\omega)|^2\,E\left\{h_n^{(k)}h_j^{(k)}\exp\left[-i\omega\left(m_n^{(k)}-m_j^{(k)}\right)\right]\right\} \tag{23}$$
The above expression becomes relatively tractable by once again using the hypothesis that peak position m and peak height h are independent of one another. Using the property that
$$E(\xi\eta)=E(\xi)\,E(\eta) \tag{24}$$
for independent variables ξ and η (25), and remembering that h_n and h_j must have the same expected value (a_h), we can write eq 23 as
$$q_{n-j}(\omega)=|g(\omega)|^2\,a_h^2\,E\left\{\exp\left[-i\omega\left(m_n^{(k)}-m_j^{(k)}\right)\right]\right\} \tag{25}$$
In the cases n ≠ j, the terms of the double summation in eq 19 depend only on the difference between n and j. For this reason the double sum of eq 19 can be replaced by a single one, where the index p denotes the difference n − j
$$|g(\omega)|^2\sum_{n=-N}^{N}\sum_{\substack{j=-N\\ j\neq n}}^{N}E\left\{h_n^{(k)}h_j^{(k)}\exp\left[-i\omega\left(m_n^{(k)}-m_j^{(k)}\right)\right]\right\}=\sum_{n=-N}^{N}\sum_{\substack{j=-N\\ j\neq n}}^{N}q_{n-j}(\omega)=\sum_{p=1}^{2N}(2N+1-p)\left[q_p(\omega)+q_p(-\omega)\right] \tag{26}$$
In the above expression we used
$$\exp\left[-i\omega\left(m_n^{(k)}-m_j^{(k)}\right)\right]=\exp\left[-i(-\omega)\left(m_j^{(k)}-m_n^{(k)}\right)\right] \tag{27}$$
from which it follows that
$$q_{n-j}(\omega)=q_{j-n}(-\omega)\quad\text{or}\quad q_{-p}(\omega)=q_p(-\omega) \tag{28}$$
After writing eqs 22 and 26 into eq 19, and the latter into the general form of the PS of Y(t) (eq 15), we have
$$F(\omega)=\lim_{N\to\infty}\frac{2}{(2N+1)T}\left\{K(\omega)+\sum_{p=1}^{2N}(2N+1-p)\left[q_p(\omega)+q_p(-\omega)\right]\right\} \tag{29}$$
Let us now express the interdistance function q_p as a function of the distance between subsequent peaks. To do this, the characteristic function method is used, which is known to solve many problems involving sums of random variables (18, 25). Since the position of a component peak is a random variable, the distance between two subsequent component peaks is also a random quantity
$$\mu_n^{(k)}=m_{n+1}^{(k)}-m_n^{(k)} \tag{30}$$
In the cases n ≠ j (taking n > j), the distance between the nth and jth pulses is the sum of n − j random distances
$$m_n^{(k)}-m_j^{(k)}=\sum_{r=j}^{n-1}\mu_r^{(k)} \tag{31}$$
and the expected value is
$$E\left\{\exp\left(-i\omega\sum_{r=j}^{n-1}\mu_r^{(k)}\right)\right\}=E\left\{\prod_{r=j}^{n-1}\exp\left[-i\omega\mu_r^{(k)}\right]\right\} \tag{32}$$
The characteristic function of a random variable ξ is defined as the mean value of the function e^{iξω} (24). Denoting this function by φ_ξ(ω), we have
$$\varphi_\xi(\omega)=E\left(e^{i\xi\omega}\right)=\int_{-\infty}^{\infty}e^{i\xi\omega}\,dF(\xi) \tag{33}$$
If the distances between component peaks are independent of one another and independent of retention time position, then the common characteristic function of μ_r^(k) can be denoted as φ(ω). At this point the hypothesis of constant, or stationary, density of detectable components along the chromatogram is introduced. Remembering the property of expected values of products of independent random variables (eq 24), eq 32 can be written (18, 25) as
$$E\left\{\prod_{r=j}^{n-1}\exp\left[-i\omega\mu_r^{(k)}\right]\right\}=\varphi^{\,n-j}(-\omega) \tag{34}$$
and eq 25 as
$$q_{n-j}(\omega)=a_h^2\,|g(\omega)|^2\,E\left\{\exp\left[-i\omega\sum_{r=j}^{n-1}\mu_r^{(k)}\right]\right\}=a_h^2\,|g(\omega)|^2\,\varphi^{\,n-j}(-\omega) \tag{35}$$
The real part (Re) of the characteristic function is even and the imaginary part (Im) is odd (26). For this reason
$$q_p(\omega)+q_p(-\omega)=\operatorname{Re}q_p(\omega)+i\operatorname{Im}q_p(\omega)+\operatorname{Re}q_p(\omega)-i\operatorname{Im}q_p(\omega)=2\operatorname{Re}q_p(\omega) \tag{36}$$
and
$$\operatorname{Re}\varphi(\omega)=\operatorname{Re}\varphi(-\omega) \tag{37}$$
Writing eqs 22 and 35 into eq 29 and taking into account eqs 36 and 37, we have
$$F(\omega)=\frac{2a_h^2\,|g(\omega)|^2}{T}\left\{\frac{\sigma_h^2}{a_h^2}+1+2\operatorname{Re}\lim_{N\to\infty}\sum_{p=1}^{2N}\left(1-\frac{p}{2N+1}\right)\varphi^{\,p}(\omega)\right\} \tag{38}$$
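Equation 34 can be sanity-checked by simulation for the Poissonian case, where the interdistances are exponential with characteristic function φ(ω) = 1/(1 − iωT). The Python sketch below uses hypothetical helper names and arbitrary values (ω = 0.7, T = 1.5, n − j = 3). It estimates E{exp[−iω(m_n − m_j)]} from random sums of interdistances (eq 31) and compares the estimate with φ^{n−j}(−ω).

```python
import numpy as np


def phi_exponential(omega, T):
    """Characteristic function (eq 33) of an exponential interdistance with mean T."""
    return 1.0 / (1.0 - 1j * omega * T)


def mc_phase_expectation(omega, T, lag, n_rep=200_000, seed=0):
    """Monte Carlo estimate of E{exp[-i omega (m_n - m_j)]} for n - j = lag.

    Following eq 31, m_n - m_j is the sum of `lag` independent interdistances.
    """
    rng = np.random.default_rng(seed)
    mu = rng.exponential(scale=T, size=(n_rep, lag))
    return np.mean(np.exp(-1j * omega * mu.sum(axis=1)))


omega, T, lag = 0.7, 1.5, 3
estimate = mc_phase_expectation(omega, T, lag)
prediction = phi_exponential(-omega, T) ** lag  # eq 34: phi^(n-j)(-omega)
print(estimate, prediction)
```

With 200,000 replicas the Monte Carlo standard error is about 0.002, so agreement to roughly two decimal places is what to expect.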
Table I. Some Distribution Functions, Their Characteristic Functions, and Term No. 3 of Equation 42 (term no. 3 given for ω > 0; T is the mean interdistance)
$$\begin{array}{lllll}
\text{distribution} & f(\mu) & \varphi(\omega) & \text{term no. 3: } 2\operatorname{Re}\frac{\varphi(\omega)}{1-\varphi(\omega)} & T \\
\hline
\text{exponential} & \frac{1}{T}\exp(-\mu/T),\ 0\le\mu & \frac{1}{1-i\omega T} & 0 & T \\
\text{uniform} & \frac{1}{b},\ 0\le\mu\le b & \frac{2}{b\omega}\sin\left(\frac{b\omega}{2}\right)e^{ib\omega/2} & \frac{2\left[b\omega\sin(b\omega)+2\cos(b\omega)-2\right]}{b^2\omega^2-2b\omega\sin(b\omega)-2\cos(b\omega)+2} & b/2 \\
\text{gamma} & \frac{\mu^{p-1}e^{-\mu}}{\Gamma(p)},\ 0\le\mu & \frac{1}{(1-i\omega)^{p}} & 2\operatorname{Re}\sum_{k=1}^{\infty}(1-i\omega)^{-kp} & p \\
\text{delta function} & \delta(\mu-T) & e^{i\omega T} & \frac{2\pi}{T}\sum_{n=1}^{\infty}\delta\left(\omega-\frac{2n\pi}{T}\right)-1 & T
\end{array}$$

By use of the properties of the geometric progression (19, 27), the limit of the sum of the series in eq 38 is
$$\lim_{N\to\infty}\sum_{p=1}^{2N}\left(1-\frac{p}{2N+1}\right)\varphi^{\,p}(\omega)=\frac{\varphi(\omega)}{1-\varphi(\omega)}\qquad\text{if }\omega>0 \tag{39a}$$
(39b)
Equation 39a is the well-known classical limit of the geometric series, under the hypothesis that |φ(ω)| < 1. The latter condition holds true for characteristic functions in the domain ω > 0 (see eq 31 and ref 24). Equation 39b is the limit for the divergent case of the geometric series when ω = 0, since |φ(0)| = 1 (24), and it is obtained by using a property of the Dirac function calculus (28)
(40)
and noting that for ω = 0 (24)
$$\varphi'(0)=iT \tag{41}$$
where T, the mean interdistance between subsequent peaks, is the first moment of the distribution whose characteristic function is φ(ω), and φ'(0) is the first derivative of φ(ω) at ω = 0. The PS of the stationary uncorrelated multicomponent chromatogram is thus as follows
$$F(\omega)=\frac{2a_h^2}{T}\,|g(\omega)|^2\left[\frac{\sigma_h^2}{a_h^2}+1+2\operatorname{Re}\frac{\varphi(\omega)}{1-\varphi(\omega)}+\frac{2\pi}{T}\,\delta(\omega)\right] \tag{42}$$
It is important to note that the above expression is composed of three main parts. It is determined by (1) the PS of the single-component peak, |g(ω)|²; (2) the relative dispersion of the single-component peak heights, σ_h/a_h; and (3) the distribution of the peak positions (retention times) along the time axis, through the characteristic function φ(ω) of the distances between subsequent peaks. T and a_h are the mean peak interdistance and the mean height of the component peaks. The model is called "stationary" and "uncorrelated" because both the peak shape and the peak interdistance distribution are supposed to be constant over the chromatographic space, and no correlation is supposed to exist between peak height and peak position. The presence of the Dirac function at zero frequency derives from the fact that the ACVF does not reach zero as t → ∞ (20). If the chromatogram, considered as a stationary time series, is centered around its mean value when calculating the ACVF, then
$$\lim_{t\to\infty}C(t)=0 \tag{43}$$
since
$$\lim_{t\to\infty}E\left\{Y(t')\,Y(t'+t)\right\}=\bar{Y}^2 \tag{44}$$
and the Dirac function can drop out of eq 42 (20).

Power Spectrum of Typical Models of Multicomponent Chromatograms. Term no. 1 in eq 42 is the PS of the unit-height peak shape located at the origin. If the EMGF is taken as the peak shape (see eq 3), its Fourier transform is easily calculated by using the convolution theorem of the Fourier transform, which says that convolution in the time domain is equivalent to multiplication in the frequency domain
$$g(\omega)=(2\pi)^{1/2}\,\sigma\,\frac{e^{-\omega^2\sigma^2/2}}{1+i\omega\tau} \tag{45}$$
As τ tends toward zero, the asymmetry of u(t) decreases, and when τ = 0 we have a symmetric Gaussian function. The PS of the EMGF is thus
$$|g(\omega)|^2=2\pi\sigma^2\,\frac{e^{-\omega^2\sigma^2}}{1+\omega^2\tau^2} \tag{46}$$
Term no. 2 (eq 42) is easily calculated once the component peak height distribution has been assumed. For example, for an exponential or a uniform random distribution of heights, σ_h²/a_h² amounts to 1 and 0.33, respectively. Only for constant peak height is this term zero. Term no. 3 (eq 42) is presented in Table I for various distance distributions. First of all, it must be observed that a continuous interdistance distribution results in a continuous contribution to the PS. Among the three continuous distributions, the exponential distribution, which corresponds to the Poissonian case (24), is the one most usually referred to (1, 2). In this important case term no. 3 in eq 42 is simply zero (see Table I). In effect the Poissonian distribution is to be considered the fundamental interdistance distribution, because it is the limit distribution whenever a given sequence of retention times results from the superimposition of a great number of uncorrelated elementary subsequences (ref 19, page 370). In this case
$$\frac{1}{T}=\sum_{\nu=1}^{N}\frac{1}{T_\nu} \tag{47}$$
provided that N is great enough. In eq 47 T is the time constant of the resulting exponential distance distribution in the total sequence and T_ν the mean distance in the νth elementary subsequence.

Figure 3. (a) Example of Poissonian sequence generation, S_tot, by superimposition of elementary subsequences (S_ν) having uncorrelated phases: m, retention time axis; μ, interdistance between subsequent retention times. (b) Interdistance frequency function of the Poissonian sequence: (o) experimental frequency function data of the case reported in part a; (- - -) theoretical frequency function plot, f(μ) = (1/0.28) exp(−μ/0.28), of the same case.

Figure 4. Simulated chromatogram with deterministic interdistance (T) and random peak height distribution; m = 20, T = 4σ.

It is important to note that nothing is said about the type of distance distribution in the elementary subsequences. The only restriction is that the phases of the elementary subsequences must be uncorrelated. Under these circumstances the Poissonian distribution therefore plays the same central role as the normal distribution plays for sums of random variables. Such limit conditions are most likely fulfilled by many complex chemical mixtures of environmental or natural origin. In fact, these are often composed of many homologous series or of many recursively correlated subsets of chemical compounds, the retention times of which constitute elementary subsequences of single-component detectable peaks. Figure 3a shows how these superimposition effects generate a Poissonian sequence with loss of memory. In this case T_1 = 0.7, T_2 = 1.5, T_3 = 2.4, and T_4 = 1, and by applying eq 47 we obtain 1/T = 1/0.7 + 1/1.5 + 1/2.4 + 1/1 = 1/0.28. The "experimental" result is T = 0.26. In Figure 3b it can be seen that the experimental interdistance frequency function data are not far from the exponential frequency function with the theoretical time constant of 0.28. A full explanation of this point lies beyond the aim of the present paper. We obtain the PS of a multicomponent Poissonian chromatogram by writing eq 46 into eq 42, taking term no. 3 from Table I for the exponential distribution, and neglecting the delta function as explained above
$$F(\omega)=\frac{4\pi\sigma^2a_h^2}{T}\left(\frac{\sigma_h^2}{a_h^2}+1\right)\frac{e^{-\omega^2\sigma^2}}{1+\omega^2\tau^2} \tag{48}$$
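The continuous rows of Table I can be cross-checked numerically by evaluating 2Re[φ(ω)/(1 − φ(ω))] directly from each characteristic function. This Python sketch (helper names hypothetical, parameter values arbitrary) checks that the exponential term vanishes for ω > 0, that the uniform closed form matches the direct evaluation, and that the gamma series matches its sum. It also shows the uniform term tending to −2/3 as ω → 0. The delta-function row, which involves Dirac functions, is not checked.

```python
import numpy as np


def term3(phi_values):
    """Term no. 3 of eq 42, 2 Re[phi / (1 - phi)], computed directly from phi(omega)."""
    return 2.0 * np.real(phi_values / (1.0 - phi_values))


def phi_exponential(omega, T):
    return 1.0 / (1.0 - 1j * omega * T)


def phi_uniform(omega, b):
    return 2.0 / (b * omega) * np.sin(b * omega / 2.0) * np.exp(1j * b * omega / 2.0)


def phi_gamma(omega, p):
    return (1.0 - 1j * omega) ** (-p)


def term3_uniform_table(omega, b):
    """Closed form listed in the uniform row of Table I."""
    x = b * omega
    num = x * np.sin(x) + 2.0 * np.cos(x) - 2.0
    den = x**2 - 2.0 * x * np.sin(x) - 2.0 * np.cos(x) + 2.0
    return 2.0 * num / den


def term3_gamma_table(omega, p, n_terms=2000):
    """Series listed in the gamma row of Table I, truncated after n_terms terms."""
    k = np.arange(1, n_terms + 1)
    return 2.0 * np.real(np.sum((1.0 - 1j * omega) ** (-k * p)))


omega = np.linspace(0.1, 5.0, 50)
print(np.max(np.abs(term3(phi_exponential(omega, 2.0)))))  # numerically zero
print(np.max(np.abs(term3(phi_uniform(omega, 4.0)) - term3_uniform_table(omega, 4.0))))
print(term3(phi_gamma(0.5, 3.0)), term3_gamma_table(0.5, 3.0))
print(term3_uniform_table(0.01, 4.0))  # close to -2/3
```

The ω → 0 value is consistent with 1 + term no. 3 tending to the squared coefficient of variation of the interdistance, which is 1/3 for the uniform distribution and 1 for the exponential one.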
It can now be seen that the PS of the total multicomponent chromatogram is congruent with the PS of the single-component peak (see eqs 46 and 48). This highly distinctive property is a direct consequence of the assumed Poissonian model. It is worth noting that it also depends on allowing mutual overlap between peaks: this congruency property is not exhibited by nonoverlapping Poissonian pulse sequences (20). By using the Wiener-Khinchin theorem (eqs 5 and 6) and the well-known properties of Fourier transform inversion, it is possible to obtain the ACVF for Gaussian or EMGF peak shapes and a Poissonian interdistance distribution. The ACVF of a random chromatogram containing EMGF type peaks is
$$C(t)=\frac{\pi\sigma^2a_h^2}{2\tau T}\left(\frac{\sigma_h^2}{a_h^2}+1\right)e^{\sigma^2/\tau^2}\left[e^{-t/\tau}\operatorname{erfc}\left(\frac{\sigma}{\tau}-\frac{t}{2\sigma}\right)+e^{t/\tau}\operatorname{erfc}\left(\frac{\sigma}{\tau}+\frac{t}{2\sigma}\right)\right] \tag{49}$$
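Equation 48 can also be compared with a simulated Poissonian chromatogram. The Python sketch below is an illustration under assumptions that are not in the text: Gaussian peaks (τ = 0), exponentially distributed heights (so σ_h²/a_h² = 1), positions snapped to a sampling grid, and a single raw periodogram (eq 9 with the record length X in place of 𝒯) averaged over a frequency band to reduce its variance. Names and numerical values are hypothetical.

```python
import numpy as np


def simulate_chromatogram(X, dt, sigma, T, a_h, rng):
    """Unit-height Gaussian peaks (eq 2) at Poissonian positions with exponential heights.

    Positions are snapped to the grid; for a Poissonian sequence the impulse train stays white.
    """
    n = int(round(X / dt))
    m = rng.poisson(X / T)
    impulses = np.zeros(n)
    np.add.at(impulses, (rng.uniform(0.0, X, m) / dt).astype(int), rng.exponential(a_h, m))
    k = np.arange(-int(6 * sigma / dt), int(6 * sigma / dt) + 1)
    kernel = np.exp(-((k * dt) ** 2) / (2.0 * sigma**2))
    return np.convolve(impulses, kernel, mode="same")  # symmetric, odd-length kernel


def periodogram(y, dt):
    """Eq 9 for one record of length X: F ~ (2/X)|Z|^2, with Z approximated by dt * DFT."""
    X = y.size * dt
    Z = dt * np.fft.rfft(y - y.mean())
    omega = 2.0 * np.pi * np.fft.rfftfreq(y.size, d=dt)
    return omega, 2.0 * np.abs(Z) ** 2 / X


def ps_eq48_gaussian(omega, sigma, T, a_h, rel_var):
    """Eq 48 with tau = 0; rel_var stands for sigma_h^2 / a_h^2."""
    return 4.0 * np.pi * sigma**2 * a_h**2 / T * (rel_var + 1.0) * np.exp(-((omega * sigma) ** 2))


rng = np.random.default_rng(1)
X, dt, sigma, T, a_h = 100_000.0, 0.25, 1.0, 5.0, 1.0
y = simulate_chromatogram(X, dt, sigma, T, a_h, rng)
omega, F_est = periodogram(y, dt)
band = (omega > 0.3) & (omega < 0.7)  # average many periodogram ordinates
ratio = np.mean(F_est[band] / ps_eq48_gaussian(omega[band], sigma, T, a_h, rel_var=1.0))
print(f"simulated / eq 48 over 0.3 < omega < 0.7: {ratio:.3f}")
```

A raw periodogram ordinate has a standard deviation about equal to its mean, which is why the comparison averages over roughly 6000 frequencies; a ratio near 1, not exactly 1, is expected.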
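The ACVF is much simpler for Gaussian peak shapes
$$C(t)=\frac{\pi^{1/2}\sigma a_h^2}{T}\left(\frac{\sigma_h^2}{a_h^2}+1\right)\exp\left(-\frac{t^2}{4\sigma^2}\right) \tag{50}$$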
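The inverse transforms behind the Poissonian ACVF, and the 1/100 points quoted in the next paragraph, can be checked numerically. This Python sketch (hypothetical helper names, arbitrary parameter values) integrates eq 48 with eq 6 by the trapezoidal rule and compares the result with closed forms for Gaussian (τ = 0) and EMGF peaks. The EMGF closed form used here is obtained by convolving the inverse transforms of the Gaussian and Lorentzian factors of eq 48; treat it as an assumption to verify, not as a quotation of the original.

```python
import math
import numpy as np


def ps_poisson(omega, sigma, tau, T, a_h, rel_var):
    """Eq 48 (tau = 0 gives Gaussian peaks); rel_var stands for sigma_h^2 / a_h^2."""
    return (4.0 * math.pi * sigma**2 * a_h**2 / T * (rel_var + 1.0)
            * np.exp(-((omega * sigma) ** 2)) / (1.0 + (omega * tau) ** 2))


def acvf_numeric(t, omega, F):
    """Eq 6, one-sided form, by the trapezoidal rule on the grid omega."""
    y = F * np.cos(omega * t)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(omega)) / 2.0 / (2.0 * math.pi))


def acvf_gaussian(t, sigma, T, a_h, rel_var):
    """Inverse transform of eq 48 for tau = 0."""
    return math.sqrt(math.pi) * sigma * a_h**2 / T * (rel_var + 1.0) * math.exp(-t**2 / (4.0 * sigma**2))


def acvf_emgf(t, sigma, tau, T, a_h, rel_var):
    """Inverse transform of eq 48 for tau > 0 (Gaussian kernel convolved with a Laplace kernel)."""
    s = sigma / tau
    bracket = (math.exp(-t / tau) * math.erfc(s - t / (2.0 * sigma))
               + math.exp(t / tau) * math.erfc(s + t / (2.0 * sigma)))
    return math.pi * sigma**2 * a_h**2 / (2.0 * tau * T) * (rel_var + 1.0) * math.exp(s**2) * bracket


sigma, tau, T, a_h, rel_var = 1.0, 0.5, 5.0, 2.0, 1.0
omega = np.linspace(0.0, 12.0 / sigma, 40_001)
F_gauss = ps_poisson(omega, sigma, 0.0, T, a_h, rel_var)
F_emg = ps_poisson(omega, sigma, tau, T, a_h, rel_var)
for t in (0.0, 1.0, 3.0):
    print(t, acvf_numeric(t, omega, F_gauss), acvf_gaussian(t, sigma, T, a_h, rel_var),
          acvf_numeric(t, omega, F_emg), acvf_emgf(t, sigma, tau, T, a_h, rel_var))

# Gaussian case: PS and ACVF fall to 1/100 of their maxima at these points
omega_1pct = math.sqrt(math.log(100.0)) / sigma
t_1pct = 2.0 * math.sqrt(math.log(100.0)) * sigma
print(round(omega_1pct * sigma, 2), round(t_1pct / sigma, 1))  # 2.15 4.3
```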
The same procedure can be applied to obtain the analytical expressions of the PS and ACVF for chromatograms other than the Poissonian one. What is most interesting in this context is not the detailed analytical expression but the general shape, and how information regarding chromatographic structure is contained within it. In the case of a Poissonian retention time distribution, both the PS and the ACVF are dominated by a negative exponential term and go to zero very quickly with increasing ω or t (see eqs 48 and 50). In the case of Gaussian peak shapes, it is easily calculated that the PS and the ACVF are reduced to 1/100 of their maximum values at ω = 2.15/σ and t = 4.3σ, respectively. Physically this means that in Poissonian chromatograms the correlation between peak positions is significant only at these short distances. It must be recalled that both the PS and the ACVF derived here refer to an infinitely long chromatogram having the same statistical properties along the time axis. The case of finite real chromatograms will be considered in a forthcoming paper. Let us now see how differently structured chromatograms appear under the approach developed here. The chromatogram reported in Figure 4 has a retention time distribution structure very different from those reported in Figure 2. Even a beginner in chromatography will easily see that the interdistances between peaks are constant and about 4σ, the only randomness being in peak height. In this case the PS
and ACVF appear quite different from the shapes described above. They are obtained by introducing the proper term no. 3 in eq 42. Under the hypothesis of Gaussian peak shape we have
$$F(\omega)=\frac{4\pi\sigma^2a_h^2}{T}\exp\left(-\omega^2\sigma^2\right)\left[\frac{\sigma_h^2}{a_h^2}+\frac{2\pi}{T}\sum_{n=0}^{\infty}\delta\left(\omega-\frac{2n\pi}{T}\right)\right] \tag{51}$$
with n = 0, 1, ..., and by applying the Wiener-Khinchin theorem (eq 6)
$$C(t)=\frac{\pi^{1/2}\sigma a_h^2}{T}\left[\frac{\sigma_h^2}{a_h^2}\exp\left(-\frac{t^2}{4\sigma^2}\right)+\sum_{n=-\infty}^{\infty}\exp\left(-\frac{(t-nT)^2}{4\sigma^2}\right)\right] \tag{52}$$
where T is now the constant repeat distance between peak positions. It can be seen that decreasing, repeated delta pulses, corresponding to the constantly repeated peak positions, are found in the PS, whereas the ACVF, built up of repeated Gaussian functions, never goes to zero. Both the PS and the ACVF are thus significantly different from the Poissonian case described above. In a mixed case, that is, one which is the sum of deterministic and random components, the PS will appear as a combination of the cases described above; that is, both a continuous and a discrete part will be present. This case will not be enlarged upon here. We shall instead turn our attention to how the useful information contained in the PS described above can be related to measurable chromatographic quantities.

CHROMATOGRAPHIC PS EXPRESSION
Equation 48 must be transformed into a form useful for determining statistical properties from real Poissonian chromatograms. First of all, a_h² in the preexponential term can be expressed as a function of the total area of the chromatogram and the component number m. In fact, the area of component peak i in both the Gaussian and EMGF cases is
$$A_i=(2\pi)^{1/2}\sigma h_i \tag{53}$$
and the total area of the chromatogram
$$A_T=(2\pi)^{1/2}\sigma\sum_{i=1}^{m}h_i \tag{54}$$
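Substituting eqs 56 and 57 below into the Gaussian case of eq 48 gives F(ω) = (2A_T²/(mX))(σ_h²/a_h² + 1) exp(−ω²σ²), so ln F(ω) is a straight line in ω² whose slope gives σ and whose intercept gives m. The Python sketch below assumes that σ_h/a_h is known and uses noise-free synthetic data with hypothetical values. With a measured PS, the estimates would inherit the bias and variance of the spectral estimator, which this sketch ignores.

```python
import numpy as np


def ps_from_areas(omega, A_T, X, m, sigma, rel_var):
    """Eq 48 with tau = 0, after a_h = A_T / ((2 pi)^(1/2) sigma m) and T = X / m."""
    return 2.0 * A_T**2 / (m * X) * (rel_var + 1.0) * np.exp(-((omega * sigma) ** 2))


def estimate_m_sigma(omega, F, A_T, X, rel_var):
    """Straight-line fit of ln F = c - sigma^2 omega^2; returns the estimates (m, sigma)."""
    slope, intercept = np.polyfit(omega**2, np.log(F), 1)
    sigma = np.sqrt(-slope)
    m = 2.0 * A_T**2 * (rel_var + 1.0) / (X * np.exp(intercept))
    return m, sigma


omega = np.linspace(0.05, 2.0, 40)
F = ps_from_areas(omega, A_T=150.0, X=600.0, m=120, sigma=0.8, rel_var=1.0)  # noise-free input
m_hat, sigma_hat = estimate_m_sigma(omega, F, A_T=150.0, X=600.0, rel_var=1.0)
print(round(float(m_hat), 3), round(float(sigma_hat), 3))  # 120.0 0.8
```

For EMGF peaks the Lorentzian factor in eq 48 makes ln F nonlinear in ω², so a general nonlinear least-squares fit would be needed instead of this straight line.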
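As a_h is the mean height of the component peaks
$$a_h=\frac{1}{m}\sum_{i=1}^{m}h_i \tag{55}$$
eq 54 can be written as
$$A_T=(2\pi)^{1/2}\sigma\,m\,a_h \tag{56}$$
Assuming that X is the total time of the chromatogram considered, the mean distance T between two subsequent single-component peaks is
$$T=X/m \tag{57}$$
Writing eqs 56 and 57 into eqs 48, 49, and 50, we get
$$F(\omega)=\frac{2A_T^2}{mX}\left(\frac{\sigma_h^2}{a_h^2}+1\right)\frac{e^{-\omega^2\sigma^2}}{1+\omega^2\tau^2} \tag{58}$$
$$C(t)=\frac{A_T^2}{4\tau mX}\left(\frac{\sigma_h^2}{a_h^2}+1\right)e^{\sigma^2/\tau^2}\left[e^{-t/\tau}\operatorname{erfc}\left(\frac{\sigma}{\tau}-\frac{t}{2\sigma}\right)+e^{t/\tau}\operatorname{erfc}\left(\frac{\sigma}{\tau}+\frac{t}{2\sigma}\right)\right] \tag{59}$$
$$C(t)=\frac{A_T^2}{2\pi^{1/2}\sigma mX}\left(\frac{\sigma_h^2}{a_h^2}+1\right)\exp\left(-\frac{t^2}{4\sigma^2}\right) \tag{60}$$
which are respectively the general expression of PS and of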
ACVF for EMGF and Gaussian peak shape functions. Equations 58-60 are the basic equations for a method which evaluates statistical properties of an uncorrelated Poissonian multicomponent chromatogram. In fact, if the term σ_h/a_h is known and F(ω) (or C(t)), A_T, and X are measured, not only can the component number m be determined, but the single-peak properties σ and τ can also be evaluated by simple nonlinear fitting of the experimental PS to eq 58 (or of the ACVF to eqs 59 and 60). It can be noted that Martin and Guiochon previously suggested the use of chromatogram areas to analyze multicomponent chromatograms (2). In the present case a full and more general solution is derived. In fact, the degree of fit attained with eq 60 can also indicate whether the supposed model should be accepted or rejected; in addition, a comparison of the σ and τ values obtained from the experimental multicomponent chromatogram with those determined by injecting single components could further confirm the hypothesized model. One can easily see the possible concrete applications of this approach: precisely evaluating the separation achieved by computing N_c or γ, and revealing overloading or unsuspected tailing effects, column drift, etc. The major problem now is how to obtain an unbiased experimental determination of the PS (or ACVF) and of σ_h/a_h. This will be discussed in a subsequent paper.

CONCLUSIONS
The Fourier analysis approach to studying uncorrelated and stationary multicomponent chromatograms outlined here appears superior to previously reported statistical descriptions because it can take into account both peak height dispersion and peak interdistance distribution. The general method developed here not only can generate a great variety of specific models for uncorrelated chromatograms but, in principle, can also be further extended to handle correlated cases.
Because experimental measurement of the PS (or ACVF) is a standard technique of time series spectral analysis (29, 30), the approach developed here promises, when applied in practice, to yield much more information than could be obtained with previously presented methods.

GLOSSARY
ACVF autocovariance function
ACF autocorrelation function
$A_T$ total area of the multicomponent chromatogram
$\bar{a}_h$ mean value of single-component peak height
mean value of peak maxima in the multicomponent chromatogram
parameter of the uniform distribution (see Table I)
$C(t)$ autocovariance function value at time $t$
variance of the random variable $\xi$
expected value of the quantity
EMGF exponentially modified Gaussian function
$F(\omega)$ PS value at frequency $\omega$
distribution function of the random variable $\xi$
frequency function of the random variable $\xi$, Figure 3b
Fourier transform of the unitary peak shape located at the origin
single-component peak height
peak height of the $n$th single component in the $k$th representation of the chromatogram as a stochastic process
part of the power spectrum not dependent on the peak position distribution, eq 22
imaginary part
single-component retention time expressed as first moment
as above but referred to the $n$th single component or to the $n$th pulse (eq 1)
as above but referred to the $n$th component in the $k$th representation of the chromatogram as a stochastic process
$m$ number of single-component peaks present in a given mixture
$N_c$ peak capacity computed at a given resolution
$p$ number of peaks computed at a given $R_s$ value
parameter of the gamma function (see Table I)
PS power spectrum
part of the power spectrum dependent on retention time distributions, referring to all the possible interdistances between components $n$ and $j$, eq 23
real part
$R_s$ chromatographic resolution
$T$ limited time range over the infinite time axis, eq 9
mean value of the interdistance between subsequent single-component peaks
as above but referring to the $v$th subsequence building up a total random sequence of peak positions in the multicomponent chromatogram, eq 45
$t$ time axis
peak shape function of parameters $m$, $\sigma$, $\tau$
$X$ time range of the multicomponent chromatogram over which the power spectrum is computed
$y(t)$ multicomponent chromatogram as a function of time
$k$th representation of the multicomponent chromatogram belonging to the general ensemble
mean value of the chromatographic response
general ensemble of multicomponent chromatograms as a stochastic process
Fourier transform of the $k$th representation of the chromatogram as a stochastic process, within the time $T$
$\alpha$ saturation factor ($= m/N_c$)
$\delta$ Dirac function
characteristic function of the random variable $\xi$, eq 33
$\Gamma$ gamma function (27)
$\gamma$ separation extent ($= p/m$)
$\xi$ random variable
distance between subsequent peaks
as above, but referred to the $n$th and $(n-1)$th peaks
as above, but referred to the $k$th representation of the chromatogram as a stochastic process
autocorrelation function value at time $t$
$\sigma$ standard deviation of the Gaussian form of the single-component peak shape function, or the Gaussian part of the standard deviation of the EMGF
$\sigma_h$ standard deviation of the single-component peak heights
$\sigma_M$ standard deviation of the peak maxima
$\tau$ time constant of the exponentially modified Gaussian function
$\omega$ frequency

LITERATURE CITED
(1) Davis, J. M.; Giddings, J. C. Anal. Chem. 1983, 55, 418.
(2) Martin, M.; Guiochon, G. Anal. Chem. 1985, 57, 289.
(3) Davis, J. M.; Giddings, J. C. J. Chromatogr. 1984, 289, 277.
(4) Davis, J. M.; Giddings, J. C. Anal. Chem. 1985, 57, 2168.
(5) Davis, J. M.; Giddings, J. C. Anal. Chem. 1985, 57, 2178.
(6) Dondi, F.; Kahie, Y. D.; Lodi, G.; Remelli, M.; Reschiglian, P.; Bighi, C. Anal. Chim. Acta 1986, 191, 261.
(7) Herman, D. P.; Gonnord, M. F.; Guiochon, G. Anal. Chem. 1984, 56, 995.
(8) Martin, M.; Herman, D. P.; Guiochon, G. Anal. Chem. 1986, 58, 2200.
(9) El Fallah, M. Z.; Martin, M. Chromatographia 1987, 24, 115.
(10) Creten, W. L.; Nagels, L. J. Anal. Chem. 1987, 59, 822.
(11) Davis, J. M. J. Chromatogr. 1988, 449, 41.
(12) Coppi, S.; Betti, A.; Dondi, F. Anal. Chim. Acta 1988, 212, 165.
(13) Kelly, P. C.; Harris, W. E. Anal. Chem. 1971, 43, 1170.
(14) Kelly, P. C.; Harris, W. E. Anal. Chem. 1971, 43, 1184.
(15) Wakao, N.; Tanaka, K. J. Chem. Eng. Jpn. 1973, 4, 338.
(16) Malczewski, M. L.; Grushka, E. J. Chromatogr. Sci. 1981, 19, 187.
(17) Smit, H. C.; Walg, H. L. Chromatographia 1975, 8, 311.
(18) Dondi, F.; Remelli, M. J. Phys. Chem. 1986, 90, 1885.
(19) Feller, W. An Introduction to Probability Theory and Its Applications, 2nd ed.; John Wiley & Sons: New York, 1971; Vol. II, pp 370, 476, 625.
(20) Lévine, B. Fondements théoriques de la radiotechnique statistique; Éditions Mir: Moscou, 1973; Vol. 1, Chapter 11.
(21) Foley, J. P.; Dorsey, J. G. J. Chromatogr. Sci. 1984, 22, 40.
(22) Middleton, D. An Introduction to Statistical Communication Theory; McGraw-Hill: New York, 1960; p 141.
(23) Bracewell, R. N. The Fourier Transform and Its Applications; McGraw-Hill: New York, 1986.
(24) Cramér, H. Mathematical Methods of Statistics; Princeton University Press: Princeton, NJ, 1974.
(25) Feller, W. An Introduction to Probability Theory and Its Applications; John Wiley & Sons: New York, 1968; Vol. I, p 222.
(26) Métivier, M. Notions fondamentales de la théorie des probabilités, 2nd ed.; Dunod: Paris, 1979; Chapter 5.
(27) Abramowitz, M.; Stegun, I. A. Handbook of Mathematical Functions; Dover Publications: New York, 1965.
(28) Zeldovitch, I.; Mychkis, A. Éléments de mathématiques appliquées; Éditions Mir: Moscou, 1974; p 191.
(29) Jenkins, G. M.; Watts, D. G. Spectral Analysis and Its Applications; Holden-Day: San Francisco, CA, 1968.
(30) Massart, D. L.; Vandeginste, B. G. M.; Deming, S. N.; Michotte, Y.; Kaufman, L. Chemometrics: a Textbook; Elsevier: Amsterdam, 1988; Chapters 14 and 15.
RECEIVED for review December 21, 1989. Accepted May 3, 1990. This work was made possible by the financial support of the Italian Ministry of Public Education (MPI), the Italian Research Council (CNR), and the Hungarian Academy of Science.