Extension of Statistical Overlap Theory to Poorly Resolved

The theory shows that the average resolution required to separate single-component peaks varies with ... Joe M. Davis, Matevz Pompe, and Clint Samuel...
Anal. Chem. 1997, 69, 3796-3805

Extension of Statistical Overlap Theory to Poorly Resolved Separations

Joe M. Davis

Department of Chemistry and Biochemistry, Southern Illinois University at Carbondale, Carbondale, Illinois 62901-4409

A general theory of statistical overlap is developed that enables prediction of the number of observable peaks in separations, even when the peak capacity is much less than the number of single-component peaks requiring separation. The theory is applicable to any separation in which the distribution of intervals between adjacent single-component peaks is governed by homogeneous statistics. The theory shows that the average resolution required to separate single-component peaks varies with saturation. This variation occurs because multicomponent peaks, and not single-component peaks, overlap as separation efficiency decreases. Previously reported equations relating the average number of observable peaks to the average number of single-component peaks are shown to remain accurate even when the former is only about 10-15% of the latter, as long as this variation is addressed. The theory is verified further by its application to experimental gas chromatograms. This paper consequently represents a significant advance in the ability to describe overlap in separations by statistical means.

This paper describes means for correcting a major shortcoming of statistical theories of overlap based on point processes: the inability of theory to describe overlap in separations of low efficiency. Earlier work has shown that statistical theory correctly predicts the number of observable peaks in simple models of separations when the number of single-component peaks and peak capacity are known.1,2 A single-component peak (SCP) is a Gaussian-like concentration profile generated by a pure chemical species. In the model separation, the positions of SCP centers are represented by points (hence the earlier characterization, “point process”), and adjacent SCPs are postulated to overlap if the interval between the points representing them is less than the average bandwidth of the SCPs. However, actual separations are much more complicated than the model system.
For example, overlap theory based on Poisson statistics underestimates the number of observable peaks in real separations when overlap is severe,3,4 although its predictions are correct when overlap is slight.3-5 This underestimation occurs because overlap theory is based on the concept of resolution and resolution is defined only for two SCPs. Resolution is not useful in describing the overlap of three or more SCPs and, consequently, is not useful in describing severe overlap. If overlap is slight, then peaks in separations are either singlets or doublets. Because the formation of doublets by two overlapping SCPs is described by resolution, overlap theory works under these conditions. However, if overlap is severe, then some peaks in separations are triplets or more complicated peaks, and the formation of these peaks by three or more overlapping SCPs cannot be described by resolution. For example, a triplet peak formed by the overlap of the three ordered SCPs, A, B, and C, cannot be described by the overlap of A and B alone and of B and C alone. In reality, C does not overlap with B alone but with the composite profile formed by the overlap of A and B. Similarly, A does not overlap with B alone but with the composite profile formed by the overlap of C and B. Under these conditions, overlap theory breaks down because the pairwise interactions between SCPs represented by resolution simply do not occur.

Although this problem is evident, the modification of overlap theory required to correct it was developed only recently.6 This modification shows that the number of peaks in separations is determined not by the resolution of SCPs but by the resolution of multiplets. A multiplet is defined as a concentration profile having only one maximum but containing any number of SCPs. Thus, singlets, doublets, triplets, etc. are all multiplets. In fact, multiplets are identical to observable peaks.

It is easy to understand why the resolution of multiplets is important. Imagine first a separation of infinite efficiency, in which all SCPs are resolved and infinitely narrow. As efficiency decreases, some adjacent SCPs overlap to form doublets. As efficiency decreases further, however, some doublets overlap with adjacent SCPs to form triplets, some doublets overlap with adjacent doublets to form quartets, etc.

(1) Davis, J. M. Ph.D. Dissertation, University of Utah, Salt Lake City, UT, 1985.
(2) Pietrogrande, M. C.; Dondi, F.; Felinger, A.; Davis, J. M. J. Chemom. Intell. Lab. Syst. 1995, 28, 239.
(3) Herman, D. P.; Gonnord, M. F.; Guiochon, G. Anal. Chem. 1984, 56, 995.
(4) Delinger, S. L.; Davis, J. M. Anal. Chem. 1990, 62, 436.
(5) Davis, J. M. J. Chromatogr. 1988, 449, 41.

3796 Analytical Chemistry, Vol. 69, No. 18, September 15, 1997

If the degradation of efficiency were now stopped and efficiency were increased, then these processes would reverse; specifically, the quartets would resolve into two doublets, the triplets would resolve into singlets and doublets, and, ultimately, the doublets would resolve into two singlets. In each of these cases, two observable peaks would be generated from one observable peak, and one could speak of their resolution. But the resolution in question is the resolution of multiplets, not SCPs. This resolution is defined here as the apparent resolution, Rsa, of multiplets and only equals the resolution, Rs, of SCPs if the one observable peak is a doublet. Unsurprisingly, values of Rsa differ from those of Rs.

The correct prediction of peak number by overlap theory requires that one relate the apparent resolution Rsa of adjacent multiplets to the resolution Rs of the two SCPs (one in each multiplet) that actually are separated. The apparent resolution Rsa is important, because it determines the observable peak number. The resolution Rs also is important, because overlap theory is described relative to it. In ref 6, such a relationship was developed and used to modify the most familiar overlap theory based on the assumption that the positions of SCPs are determined by homogeneous Poisson statistics. This paper shows the relationship is general and applies to any homogeneous statistical distribution of SCP positions. By this relationship, the probability density function for the resolution Rs of separated SCPs in adjacent multiplets can be calculated. The average of this function is R*s, the average resolution required for separation. Theory shows that R*s varies with saturation (defined below), the most important parameter in statistical overlap theory. In contrast, previous work has postulated that R*s is independent of saturation. This postulate is erroneous; indeed, it is this error that causes point process statistical theories to break down when overlap is severe.

(6) Davis, J. M. Chromatographia 1997, 44, 81.

S0003-2700(97)00139-X CCC: $14.00 © 1997 American Chemical Society

THEORY

The theory developed in ref 6 is presented here. Consider a complex multicomponent separation spanning a region X. Although the positions of all SCPs in X are determined by well-defined physicochemical attributes (e.g., free energy differences), the mixture complexity is so large that this deterministic information is neither attainable nor available. Thus, SCP positions are postulated to be governed by a statistical distribution that determines the average number, m̄, of SCPs in X and the average number, p, of observable peaks in X. This representation of deterministic information by a statistical distribution is not unusual; for example, a statistical distribution is used to describe the outcomes of a coin toss, even though the physics of spinning coins is well understood.

Distribution of Intervals between SCPs. Overlap theory predicts that the average number p of observable peaks in X is2

$$p = \bar{m}\,p_1 \tag{1}$$

where p1 is the probability that adjacent SCPs are resolved,

$$p_1 \equiv p_1(\alpha) = \int_{x_o}^{\infty} h(z)\,dz \tag{2}$$

In eq 2, h(z) dz is the probability that the interval between adjacent SCPs lies between z and z + dz, and h(z) is the probability density function (pdf) for the distribution of intervals between adjacent SCPs. This pdf governs the statistical distribution of SCP positions and differs for different statistics types; for example, in ref 6, the statistics type was Poisson. Equation 2 states that adjacent SCPs are resolved if the interval between them exceeds span xo; this span equals 4σavR*s, where R*s is the average minimum resolution required for separation and σav is the average standard deviation of the two SCPs.7 Here, σav is assumed to be constant throughout the separation, although its systematic variation in the separation can be addressed.8 This constancy has been shown to be a good approximation in programmed chromatographic separations, in which the standard deviations of all SCPs are roughly constant.7 Equation 2 shows that p1 depends on the saturation, α, of the separation. Parameter α is defined as the expected number m̄ of SCPs in X, divided by the peak capacity nc of X:7

$$\alpha = \bar{m}/n_c; \qquad n_c = X/x_o = X/(4\sigma_{av}R_s^*) \tag{3}$$

(7) Davis, J. M.; Giddings, J. C. Anal. Chem. 1983, 55, 418.
(8) Davis, J. M. J. Microcolumn Sep. 1997, 9, 193.
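As a rough numerical illustration of eqs 1-3, the following Python sketch computes the peak capacity, the saturation, and the expected number of observable peaks. It assumes the exponential h(z), for which Table 1 gives p1 = exp(-α), and it treats R*s as a fixed input. The function names and the input values are hypothetical and serve only as an illustration.

```python
import math

def peak_capacity(span_x, sigma_av, rs_star):
    """Eq 3: n_c = X / x_o, with x_o = 4 * sigma_av * R*_s."""
    return span_x / (4.0 * sigma_av * rs_star)

def saturation(m_bar, span_x, sigma_av, rs_star):
    """Eq 3: alpha = m_bar / n_c."""
    return m_bar / peak_capacity(span_x, sigma_av, rs_star)

def expected_peaks_exponential(m_bar, alpha):
    """Eq 1 with p1 = exp(-alpha); valid only for the exponential h(z)."""
    return m_bar * math.exp(-alpha)

# Hypothetical inputs, not taken from the paper's data.
m_bar, span_x, sigma_av, rs_star = 250.0, 1.0, 0.001, 0.5
alpha = saturation(m_bar, span_x, sigma_av, rs_star)
p = expected_peaks_exponential(m_bar, alpha)
print(f"n_c = {peak_capacity(span_x, sigma_av, rs_star):.1f}, "
      f"alpha = {alpha:.3f}, p = {p:.1f}")
```

Because R*s enters through xo, any change in the assumed R*s changes both nc and α for the same separation.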

The peak capacity simply is the maximum number of SCPs of width xo that can be placed in X with all adjacent SCPs resolved by resolutions equal to R*s.

Distribution of Resolutions Required for Separation. Equations 1-3 characterize the interval distribution between adjacent SCPs and are not new.2,7 In addition, one must determine the distribution of resolutions Rs required for separation.6 A distribution arises because the minimum Rs required to separate two SCPs varies with the ratio of their amplitudes, and this ratio varies for different pairs of adjacent SCPs in the separation.

Definition of g1,1(Rs). Consider first the distribution of the minimum resolution required to separate a doublet into its two constituent SCPs. For two SCPs having nearly equal amplitudes, this minimum Rs is small; for two SCPs having very different amplitudes, this minimum Rs is large. For a statistically large number of SCP pairs, the minimum resolution has a distribution of values. Let g1,1(Rs) equal this distribution (or pdf); it is defined such that the product g1,1(Rs) dRs is the probability that the minimum resolution Rs required for separating the two SCPs in a doublet into two distinct maxima lies between Rs and Rs + dRs. Figure 1a is a graph of g1,1(Rs) vs Rs for Gaussian SCPs having equal standard deviations and exponentially distributed amplitudes (i.e., the amplitudes follow an exponential distribution, as they commonly do in complex mixtures of natural origin9,10). This graph can be determined numerically. Unsurprisingly, resolutions less than 0.5 are inadequate for separation, whereas resolutions greater than 1.2 are large enough to separate virtually any pair of adjacent SCPs into two maxima.

Definition of gik(Rs). The pdf g1,1(Rs) governs only the minimum resolution needed to separate doublets into two distinct maxima. Other closely related pdfs govern the minimum resolution needed to separate more complicated observable peaks into two simpler multiplets, e.g., a triplet into a singlet and doublet, a quartet into two doublets, etc. All such pdfs influence the observable peak number and must be considered in overlap theory. In addition, the importance of each distribution changes with the amount of overlap. For example, g1,1(Rs) is important when overlap is slight, because most overlapping peaks are doublets. In contrast, it is not important when overlap is very severe, because doublets are rare and most overlapping peaks are triplets or more complicated multiplets. To define these other pdfs, first consider i + k contiguous SCPs that overlap and form one observable peak. Consider further that separation efficiency is increased until two adjacent multiplets (observable peaks) are generated from these SCPs; one multiplet contains i SCPs and the other k SCPs. Now define gik(Rs) as the pdf for the distribution of the minimum resolution required to separate the two adjacent SCPs (one in each multiplet) that actually are separated. The meaning of gik(Rs) is similar to that for g1,1(Rs), i.e., the product gik(Rs) dRs is the probability that the minimum resolution Rs required to separate these SCPs into members of two adjacent multiplets lies between Rs and Rs + dRs.

(9) Nagels, L. J.; Creten, W. L.; Vanpeperstraete, P. M. Anal. Chem. 1983, 55, 216.
(10) Dondi, F.; Kahie, Y. D.; Lodi, G.; Remelli, M.; Reschiglian, P.; Bighi, C. Anal. Chim. Acta 1986, 191, 261.


Figure 1. (a) Distribution g1,1(Rs) vs Rs for two SCPs having equal standard deviations σ = σav and exponentially distributed amplitudes. (b) Interval distribution between centers of i and k SCPs composing two adjacent multiplets. (c) Model of adjacent multiplets. In the upper graph, i and k SCPs having equal amplitudes and standard deviations σ overlap to form two adjacent multiplets shown in the lower graph. The mean interval between SCPs known to overlap is κxo, the resolved SCPs A and B are separated by ∆x, the multiplet maxima are separated by ∆xa, and the multiplet standard deviations are σi and σk. (d) Relationship between κxo, ∆xa, and ∆x. (e) Illustration of synthesis of g(Rs) by eq 5. (f) Graphs of various pdfs h(z). The left-hand figure is a graph of the Γ h(z) for different values of parameter p̂ (p̂ = 1 corresponds to the exponential h(z)); the right-hand figure is a graph of the uniform h(z) and the normal h(z) for various RSDs. Graphs a, c, d, and e are reprinted with permission from ref 6.

Some examples may clarify these concepts. Consider that the three ordered SCPs, A, B, and C, are resolved into the singlet A and the doublet BC. Here, gik(Rs) ≡ g1,2(Rs), and the resolution Rs is between A and B. Similarly, consider that the four ordered SCPs, A, B, C, and D, are resolved into the doublet AB and the doublet CD. Here, gik(Rs) ≡ g2,2(Rs), and the resolution Rs is between B and C. It now is apparent that g1,1(Rs) is simply a specific case of gik(Rs) where i = k = 1.

Definition of g(Rs). Finally, let g(Rs) equal the pdf for the distribution of the minimum resolution needed to separate adjacent SCPs into members of two adjacent multiplets for all pairs of adjacent SCPs in the entire separation. Its meaning is identical to that of gik(Rs). Unlike gik(Rs), however, g(Rs) contains information about all multiplet pairs, regardless of specific values of i and k. The pdf g(Rs) is very important, because it determines the average resolution R*s needed to separate adjacent SCPs into members of two adjacent multiplets,

$$R_s^* = \int_0^{\infty} R_s\,g(R_s)\,dR_s \tag{4}$$

which in turn determines saturation α and average peak number p, in accordance with eqs 1-3. In ref 6, it was proposed that g(Rs) could be approximated by a weighted sum of the distributions gik(Rs):

$$g(R_s) = p_{1,1}g_{1,1}(R_s) + p_{1,2}g_{1,2}(R_s) + p_{1,3}g_{1,3}(R_s) + \cdots + p_{2,2}g_{2,2}(R_s) + p_{2,3}g_{2,3}(R_s) + \cdots = \sum_{i=1}^{\infty}\sum_{k=i}^{\infty} p_{ik}\,g_{ik}(R_s) \tag{5}$$

where pik is the probability of forming two adjacent multiplets containing i and k SCPs. Equation 5 states that each gik(Rs) contributes to g(Rs) in accordance with the likelihood that adjacent multiplets (i.e., observable peaks) in the separation contain i and k SCPs. Thus, if pik and gik(Rs) can be calculated, then eqs 1-5 determine the average number p of observable peaks in the separation.

Calculation of pik. The probability pik is calculated from the point process statistics that determine the distribution of intervals between adjacent SCPs. Consider two adjacent multiplets, with one multiplet containing i SCPs and the other containing k SCPs. The points representing the SCP centers in these multiplets are shown in Figure 1b. In the multiplet containing i SCPs, the i - 1 intervals between the SCP centers are insufficient for separation, since only one maximum is generated. Similarly, in the multiplet containing k SCPs, the k - 1 intervals between the SCP centers are insufficient for separation. Thus, i + k - 2 intervals are insufficient for separation. In contrast, three intervals in this distribution of intervals are sufficient for separation: the one between the left-most SCP in the multiplet on the left and the SCP to its immediate left, the one between the right-most SCP in the multiplet on the right and the SCP to its immediate right, and the one that actually separates the SCPs in the two multiplets (see Figure 1b). It is important to account for the first two intervals as well as the one between the multiplets; otherwise, the multiplets would contain more than i and k SCPs. The probability that an interval is sufficient for separation is p1, where p1 is defined by eq 2; the probability that it is insufficient for separation is 1 - p1. Furthermore, the intervals between adjacent SCP centers are postulated to be independent. Consequently, one easily can show that the average number Pik of occurrences of this interval distribution in an interval X containing m̄ homogeneously distributed SCPs is

$$P_{ik} = \bar{m}\,p_1^{3}\,(1 - p_1)^{i+k-2} \tag{6}$$

The probability pik of this outcome is simply the ratio of Pik to the summation of all possible Pik’s:

$$p_{ik} = p_{ik}(\alpha) = \frac{P_{ik}}{\sum_{j=1}^{\infty}\sum_{n=j}^{\infty} P_{jn}} = \frac{(1 - p_1)^{i+k-2}}{\sum_{j=1}^{\infty}\sum_{n=j}^{\infty} (1 - p_1)^{j+n-2}} = \frac{p_1(1 - p_1)^{i+k-2}}{\sum_{j=0,2,4,\ldots} (1 - p_1)^{j}}, \qquad i \ge 1;\ k \ge i \tag{7}$$

where j and n are dummy summation indexes. Equation 7 depends on α, because p1 depends on α. Thus, the weight of each gik(Rs) in eq 5 changes with the amount of overlap, as was argued earlier.

Calculation of gik(Rs). The calculation of the functions gik(Rs) is a difficult problem unless i = k = 1, because of the various combinations of SCP amplitudes, positions, etc. by which multiplets containing i and k SCPs can be formed. All such combinations affect the value of gik(Rs). Regardless of these combinations, however, all gik(Rs)’s qualitatively resemble g1,1(Rs) in Figure 1a. In other words, they are skewed bell-shaped curves that span a narrow range of Rs values, with each gik(Rs) reaching a maximum at some Rs.

In ref 6, the author assumed that gik(Rs) could be approximated by g1,1(Rs), provided it was appropriately shifted and scaled along the Rs axis. Figure 1c illustrates the model of two adjacent multiplets used to calculate this shift. In the model, i and k Gaussian SCPs having equal amplitudes and standard deviations σ overlap to form two multiplets. The upper part of the figure shows the individual overlapping SCPs; the lower part of the figure shows the two multiplets formed by this overlap. The baselines of both SCPs and multiplets are shown to emphasize that these figures are not simulated separations. One postulates that all adjacent overlapping SCPs are separated by the average interval κxo between adjacent SCPs that are known to overlap, where κ < 1 (κ is evaluated below), and, as stated earlier, xo is the average minimum interval required for separation. In Figure 1c, the resolved SCPs A and B are separated by interval ∆x, the multiplet maxima are separated by ∆xa, and the standard deviations of the multiplets containing i and k SCPs are σi and σk, respectively. The resolution $R_s^{ik}$ of A and B is defined as $R_s^{ik}$ = ∆x/4σav = ∆x/4σ, since σ is constant. Similarly, the apparent resolution $R_{sa}^{ik}$ of the two multiplets is defined here as $R_{sa}^{ik}$ = ∆xa/[4(σi + σk)/2], where (σi + σk)/2 is the average standard deviation of the two multiplets. The superscript ik is used simply to identify the equations that follow with adjacent multiplets containing i and k SCPs. These definitions determine the simple algebraic relationship

$$\frac{R_s^{ik}}{R_{sa}^{ik}} = \frac{\Delta x\,(\sigma_i + \sigma_k)/2}{\Delta x_a\,\sigma} \tag{8}$$

Relationship between ∆x and ∆xa. The interval ∆xa between adjacent multiplets is determined straightforwardly. Because the SCPs in either multiplet have equal amplitudes, the multiplet maximum is located at the center of the overlapping SCPs. For the left-most multiplet containing i SCPs, the interval between the first and last SCP centers in the multiplet is (i - 1)κxo. Consequently, the interval between the maximum of the left-most multiplet and the SCP center A shown in Figure 1d is (i - 1)κxo/2. By a similar argument, the interval between the maximum of the right-most multiplet and the SCP center B is (k - 1)κxo/2, also as shown in Figure 1d. Because the interval between the SCP centers A and B is ∆x, the interval ∆xa between adjacent multiplets is

$$\Delta x_a = \Delta x + (i + k - 2)\,\kappa x_o/2 \tag{9}$$
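The weights pik of eq 7 are easy to check numerically. The Python sketch below is an illustration only. It assumes the exponential h(z), so that p1 = exp(-α), and it uses the closed form of the geometric series in the denominator of eq 7, namely 1/(1 - (1 - p1)^2) for the sum of (1 - p1)^j over even j. The function name `pair_probability` and the truncation limit of the double sum are hypothetical choices.

```python
import math

def pair_probability(i, k, p1):
    """Eq 7: probability that adjacent multiplets contain i and k SCPs (k >= i >= 1)."""
    if i < 1 or k < i:
        raise ValueError("eq 7 is defined for i >= 1 and k >= i")
    q = 1.0 - p1                        # probability that an interval is too small
    even_series = 1.0 / (1.0 - q * q)   # sum of q**j over j = 0, 2, 4, ...
    return p1 * q ** (i + k - 2) / even_series

alpha = 1.0
p1 = math.exp(-alpha)   # exponential h(z) only (assumption)
# Truncated double sum over i >= 1 and k >= i; it should approach 1.
total = sum(pair_probability(i, k, p1)
            for i in range(1, 200) for k in range(i, 200))
print(f"p_11 = {pair_probability(1, 1, p1):.4f}, sum of all p_ik = {total:.6f}")
```

At low saturation p1 is close to 1 and nearly all of the weight falls on i = k = 1; as α grows, the weight moves to larger i + k, which is the behavior the text describes.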

Relationship between σ and σi, σk. It is a simple matter to express the concentration profile of the multiplet containing i SCPs in Figure 1c in terms of the concentration profiles of the i equally spaced Gaussian SCPs composing it. The second moment, σi², about the mean of the multiplet concentration profile can be calculated from this profile to be

$$\sigma_i^2 = \sigma^2 + (\kappa x_o)^2 (i^2 - 1)/12 \tag{10a}$$

and thus

$$\sigma_i = \sqrt{\sigma^2 + (\kappa x_o)^2 (i^2 - 1)/12} \tag{10b}$$
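The second-moment result can be verified numerically. The following Python sketch, offered only as a check under the model's assumptions (equal amplitudes, equal σ, equal spacing κxo), sums i Gaussians on a fine grid, computes the second central moment of the sum, and compares it with the closed form. The function names, the grid size, and the test values are hypothetical.

```python
import math

def multiplet_variance_numeric(i, sigma, spacing, n_grid=20001):
    """Second central moment of the sum of i equal Gaussians spaced `spacing` apart."""
    centers = [j * spacing for j in range(i)]
    lo = centers[0] - 10.0 * sigma
    hi = centers[-1] + 10.0 * sigma
    dx = (hi - lo) / (n_grid - 1)
    xs = [lo + n * dx for n in range(n_grid)]
    ys = [sum(math.exp(-0.5 * ((x - c) / sigma) ** 2) for c in centers) for x in xs]
    area = sum(ys)                      # the common factor dx cancels in the ratios
    mean = sum(x * y for x, y in zip(xs, ys)) / area
    return sum((x - mean) ** 2 * y for x, y in zip(xs, ys)) / area

def multiplet_variance_closed_form(i, sigma, spacing):
    """sigma_i**2 = sigma**2 + spacing**2 * (i**2 - 1) / 12, with spacing = kappa * x_o."""
    return sigma ** 2 + spacing ** 2 * (i ** 2 - 1) / 12.0

i, sigma, spacing = 3, 1.0, 0.8
print(multiplet_variance_numeric(i, sigma, spacing),
      multiplet_variance_closed_form(i, sigma, spacing))
```

The term spacing²(i² - 1)/12 is the variance of i equally spaced points, so the multiplet variance is the SCP variance plus the variance of the SCP centers.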


The standard deviation, σk, of the multiplet containing k SCPs is given by eq 10b, with k substituted for i.

Equation for Shift of g1,1(Rs). When ∆x = xo, the values of $R_s^{ik}$ and $R_{sa}^{ik}$ equal their average values, $R_s^{ik*}$ and $R_{sa}^{ik*}$, the average minimum resolutions needed for separation. It is these values that are relevant to overlap theory. The substitution of them, of the condition ∆x = xo, and of eqs 9 and 10 into eq 8 leads to the expression

$$R_s^{ik*} = R_{sa}^{ik*}\,\bigl(1 + (i + k - 2)\kappa/2\bigr)^{-1} \times \left\{ \sqrt{1 + (4\kappa R_s^{ik*})^2 (i^2 - 1)/12} + \sqrt{1 + (4\kappa R_s^{ik*})^2 (k^2 - 1)/12} \right\}/2 \tag{11a}$$
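Because $R_s^{ik*}$ appears on both sides of eq 11a, it must be found numerically. The Python sketch below shows one possible way to do so, by bisection on the interval from 0 to $R_{sa}^{ik*}$. It assumes 0 ≤ κ ≤ 1 and a value of $R_{sa}^{ik*}$ near the doublet value of 0.71 quoted in the text, for which the root lies in that interval. The function names and the tolerance are hypothetical.

```python
import math

def eq11a_rhs(r, i, k, kappa, r_sa_star):
    """Right-hand side of eq 11a evaluated at a trial value r of R_s^{ik*}."""
    width_i = math.sqrt(1.0 + (4.0 * kappa * r) ** 2 * (i * i - 1) / 12.0)
    width_k = math.sqrt(1.0 + (4.0 * kappa * r) ** 2 * (k * k - 1) / 12.0)
    return r_sa_star * 0.5 * (width_i + width_k) / (1.0 + (i + k - 2) * kappa / 2.0)

def solve_eq11a(i, k, kappa, r_sa_star=0.71, tol=1e-12):
    """Find r such that r = eq11a_rhs(r), by bisection on [0, r_sa_star]."""
    lo, hi = 0.0, r_sa_star
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - eq11a_rhs(mid, i, k, kappa, r_sa_star) < 0.0:
            lo = mid   # the root lies above mid
        else:
            hi = mid   # the root lies at or below mid
    return 0.5 * (lo + hi)

# Hypothetical kappa; a singlet next to a doublet needs less than 0.71.
print(solve_eq11a(1, 1, 0.4), solve_eq11a(1, 2, 0.4))
```

For i = k = 1 the right-hand side is constant and the routine returns $R_{sa}^{ik*}$ itself, as the equation requires.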

Thus, the average minimum resolution $R_s^{ik*}$ between the SCPs A and B in adjacent multiplets containing i and k SCPs is determined by eq 11a. One observes that $R_s^{ik*}$ = $R_{sa}^{ik*}$ when i = k = 1, as it should.

We now introduce two assumptions. The first is that, on average, multiplets are Gaussian-like in shape (see Figure 1c) and that they have a near-exponential distribution of amplitudes, even though the SCP amplitudes are constant. This assumption leads to the conclusion that, regardless of i and k, $R_{sa}^{ik*}$ has a value equal to the average minimum resolution separating a doublet into two distinct maxima. This value is 0.71. The second assumption is that any gik(Rs) approximated by g1,1(Rs) should be scaled, such that the ratio of its minimum Rs to its average Rs predicted by eq 11a remains constant. Thus, gik(Rs)’s for large i and k extend over narrow Rs ranges. This assumption is more difficult to justify, other than to observe that gik(Rs) can extend over negative values of Rs unless one introduces it. In addition, extremely large Rs’s are not needed to resolve complex multiplets into two simpler multiplets, so some rationale does exist. Since the minimum Rs spanned by g1,1(Rs) is 0.5, as shown in Figure 1a, this assumption modifies eq 11a to

$$R_s^{\min,ik} = R_s^{\min,ik}(\alpha) = 0.5\,\bigl(1 + (i + k - 2)\kappa/2\bigr)^{-1} \left\{ \sqrt{1 + (5.68\kappa R_s^{\min,ik})^2 (i^2 - 1)/12} + \sqrt{1 + (5.68\kappa R_s^{\min,ik})^2 (k^2 - 1)/12} \right\}/2 \tag{11b}$$

where $R_s^{\min,ik}$ is the minimum Rs spanned by gik(Rs), and the number, 5.68, is determined as 4 times the proportionality factor, 0.71/0.5. Equation 11b depends on α, because κ depends on α. Thus, all gik(Rs)’s are now determined; they equal g1,1(Rs), once it is shifted along the Rs axis such that $R_s^{\min,ik}$ is expressed by eq 11b and then scaled. The direction of the shift is to smaller Rs’s. Indeed, if i and k do not equal 1, then Rs’s less than 0.5 can be adequate for separation. This finding at first seems nonintuitive. As shown in Figure 1 of ref 6, however, three ordered overlapping Gaussian SCPs, A, B, and C, having pairwise resolutions less than 0.5, can form two multiplets (observable peaks). In this example, one has a doublet AB and singlet C that are separated into two observable peaks, even though the resolution between B and C is less than 0.5. This finding is consistent with the pdf, g1,2(Rs), having density for Rs’s less than 0.5.

Determination of κ. It only remains to determine the scalar κ that quantifies relative to xo the average interval κxo shown in Figure 1c between adjacent SCPs that are known to overlap. Scalar κ is defined by the equation

$$\kappa x_o = \kappa(\alpha)\,x_o = \frac{\int_0^{x_o} z\,h(z)\,dz}{\int_0^{x_o} h(z)\,dz} \tag{12}$$

where h(z) is the pdf, introduced in eq 2, for the distribution of intervals between adjacent SCPs. Equation 12 expresses a conditional expectation, i.e., it equals the average interval between adjacent SCPs, given that they overlap. Note that, in ref 6, the symbol γ was used for this scalar, instead of κ; the symbol is changed here to avoid further confusion with the extent of overlap, also symbolized by γ.11 Equation 12 completes the equations for the peak overlap of homogeneously distributed SCPs.

Synthesis of g(Rs). Figure 1e illustrates the synthesis of g(Rs) by the above equations. The dashed curve, g1,1(Rs), is identical to that in Figure 1a. The other dashed curves are various gik(Rs)’s, as calculated from g1,1(Rs) and eq 11b. The amplitudes of these gik(Rs)’s increase with increasing i and k, because the pdf’s are scaled and the area under each equals unity. The solid curves having small amplitudes are these gik(Rs)’s, multiplied by the probabilities pik expressed by eq 7. The bold curve is the sum of these solid curves (and others not shown) and is g(Rs), in accordance with eq 5. The average value of the g(Rs) so determined equals R*s, in accordance with eq 4.

For any homogeneous pdf h(z), eqs 1-12 determine R*s as a function of α, i.e., R*s ≡ R*s(α). In other words, only one α is consistent with these equations and a particular R*s. Consequently, one now must express eq 3 as

$$\alpha = 4\bar{m}\,\sigma_{av}R_s^*(\alpha)/X \tag{13}$$

which for a specified m̄, σav, and X (that is, for specified attributes of the separation) determines a unique α. Thus, α and R*s are not independent. The difficulty with previous formulations of the overlap problem by point process statistics was the erroneous assumption that R*s is independent of α. As is shown below, cognizance that R*s varies with α enables the simple equations for p previously published to describe overlap even at high saturations.

PROCEDURES

Equations 1-13 were solved numerically for various h(z)’s considered in previous studies to govern the distribution of intervals between adjacent SCPs. For each h(z), a series of R*s’s was specified, and for each R*s, eq 4 was solved iteratively for α using bisection. In each iteration of α, all parameters on which the distribution g(Rs) in eq 4 depended were calculated. Specifically, using the α of that iteration, probabilities pik were calculated from eq 7, κ was calculated from eq 12, resolution $R_s^{\min,ik}$ was calculated from eq 11b using bisection, and then g(Rs) was calculated from eq 5. The convergence criterion for α was a relative change in α of less than 10^-4 between successive iterations. A large number of coordinates, (α, R*s), was so determined and then fit by a cubic spline. Sufficient coordinates were chosen to define the spline with accuracy. The spline then was used to determine R*s at equally spaced values of α, e.g., 0.1, 0.2, etc.

Computer simulations of multicomponent separations were generated to test the coordinates (α, R*s) computed from the cubic spline. Each simulation contained m̄ = 250 Gaussian SCPs distributed in a separation space X of unit length, with intervals between successive SCPs consistent with the appropriate h(z). The numerical methods needed for these simulations are detailed elsewhere.2 All SCPs had equal standard deviations σ (hence, the average standard deviation of adjacent SCPs, σav, equaled σ) and exponentially distributed amplitudes. For any α, the σ of SCPs in any simulation was chosen to satisfy eq 13, with R*s equal to that determined by the cubic spline. The average number of maxima in 100 simulations generated at each α was identified with the average number p of observable peaks and compared to the prediction of eq 1. All integrals were evaluated either analytically or numerically by Simpson’s rule. All computations were made on a Power Macintosh 6100 using Language Systems FORTRAN (Sterling, VA).

To evaluate the theory’s application to experimental separations, an analysis of overlap in saturated chromatograms of petroleum mixtures reported by Herman et al.3 was reinterpreted. The coordinates, ((nc - 1)-1, ln (p -1)) in Figure 9 of ref 3 were determined with a digitizer pad and converted to the coordinates (α, p), with α defined by values of R*s calculated for the exponential h(z) and values of m̄ determined previously by Herman et al. The new coordinates (α, p) then were graphed and compared to the predictions of eq 1.

(11) Martin, M.; Guiochon, G. Anal. Chem. 1985, 57, 289.

Table 1. Various pdfs, h(z), and the Corresponding Expressions p1 and κ (see footnote a)

pdf: exponential (Γ of order 1)
h(z): $\lambda \exp(-\lambda z);\ z \ge 0$
p1: $\exp(-\alpha)$
κ: $\alpha^{-1} - \dfrac{e^{-\alpha}}{1 - e^{-\alpha}}$

pdf: Γ of order p̂
h(z): $(\lambda\hat{p})^{\hat{p}} z^{\hat{p}-1} \exp(-\lambda\hat{p}z)/(\hat{p}-1)!;\ z \ge 0$
p1: $\exp(-\hat{p}\alpha) \sum_{k=0}^{\hat{p}-1} (\hat{p}\alpha)^k/k!$
κ: $\alpha^{-1} \left\{ \dfrac{1 - \exp(-\hat{p}\alpha) \sum_{k=0}^{\hat{p}} (\hat{p}\alpha)^k/k!}{1 - \exp(-\hat{p}\alpha) \sum_{k=0}^{\hat{p}-1} (\hat{p}\alpha)^k/k!} \right\}$

pdf: normal (see footnote b)
h(z): $f(z)/\int_0^{\infty} f(z)\,dz$, where $f(z) = (\sqrt{2\pi}\,\sigma_g)^{-1} \exp(-(z - \lambda^{-1})^2/2\sigma_g^2);\ z \ge 0$
p1: $\dfrac{1 + \mathrm{erf}\left(\dfrac{1-\alpha}{\sqrt{2}\,\mathrm{RSD}}\right)}{1 + \mathrm{erf}\left(\dfrac{1}{\sqrt{2}\,\mathrm{RSD}}\right)}$
κ: $\alpha^{-1} \left\{ 1 + \dfrac{\sqrt{2/\pi}\,\mathrm{RSD}\left[e^{-(2\mathrm{RSD}^2)^{-1}} - e^{-(\alpha-1)^2 (2\mathrm{RSD}^2)^{-1}}\right]}{\mathrm{erf}[(\alpha - 1)(\sqrt{2}\,\mathrm{RSD})^{-1}] + \mathrm{erf}[(\sqrt{2}\,\mathrm{RSD})^{-1}]} \right\}$

pdf: uniform
h(z): $\lambda/2;\ 0 \le z \le 2/\lambda$
p1: $1 - \alpha/2$
κ: $1/2$

a λ = m̄/X; RSD = λσg; σg is the standard deviation of the Gaussian distribution, f(z); erf is the error function. b Method 1 expression of normal h(z) in ref 2.

RESULTS AND DISCUSSION

Description of h(z). Table 1 reports various pdfs, h(z), that have been used to model the distribution of intervals between SCPs and the expressions p1 and κ evaluated from them using eqs 2 and 12, respectively.
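The closed-form entries of Table 1 can be checked against direct quadrature of eqs 2 and 12. The Python sketch below does this for the Γ pdf of integer order p̂ (order 1 is the exponential pdf), using Simpson's rule as mentioned under PROCEDURES. It is an independent check written for illustration, not the original FORTRAN; the function names, the number of subintervals, and the test values are hypothetical. It relies on α = λxo, which follows from eq 3 with λ = m̄/X.

```python
import math

def gamma_pdf(z, lam, order):
    """Gamma h(z) of integer order p-hat; order 1 is the exponential pdf."""
    rate = lam * order
    return rate ** order * z ** (order - 1) * math.exp(-rate * z) / math.factorial(order - 1)

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3.0

def p1_and_kappa_numeric(alpha, lam, order):
    """Eqs 2 and 12 by quadrature, with x_o = alpha / lam."""
    x_o = alpha / lam
    below = simpson(lambda z: gamma_pdf(z, lam, order), 0.0, x_o)
    first_moment = simpson(lambda z: z * gamma_pdf(z, lam, order), 0.0, x_o)
    return 1.0 - below, first_moment / below / x_o

def p1_and_kappa_closed_form(alpha, order):
    """Closed forms for the Gamma pdf of order p-hat."""
    a = order * alpha

    def partial(top):
        # exp(-a) times the sum of a**k / k! for k = 0 .. top
        return math.exp(-a) * sum(a ** k / math.factorial(k) for k in range(top + 1))

    p1 = partial(order - 1)
    kappa = (1.0 - partial(order)) / (1.0 - p1) / alpha
    return p1, kappa

print(p1_and_kappa_numeric(0.8, 250.0, 3), p1_and_kappa_closed_form(0.8, 3))
```

Agreement between the two routes for several orders and saturations gives some confidence in the Γ expression for κ; the normal and uniform entries could be checked the same way.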
The relevance to analytical separations of these h(z)’s is discussed briefly. The exponential h(z) commonly studied by the author was shown by Felinger and coworkers to describe the distribution of intervals resulting from

the superposition of a large number of point processes.12,13 Consequently, it is a good model for the distribution of intervals between SCPs developed from complex mixtures containing many homologous series, such as mixtures of natural origin. The other h(z)’s have been studied principally by Dondi and Felinger using Fourier theories of statistical overlap.2,12,14-16 The normal h(z), depending on its parameters, can represent the distribution of intervals between SCPs originating from a small, intermediate, or large number of homologous series. Its flexibility makes it very useful, particularly in describing SCPs appearing at nearly periodic intervals. As the relative standard deviation (RSD) of the normal h(z) decreases, this period becomes more sharply defined (the RSD is defined in Table 1). The Γ h(z) is somewhat intermediate in behavior, with parameter pˆ ) 1 corresponding to the exponential h(z) and large pˆ ’s resulting in an asymptotic approximation of a normal h(z). The uniform h(z) is a bit unusual, but it is useful if an upper limit exists to the interval between SCPs. Although the exponential h(z) usually best approximates the distribution of intervals between SCPs of large, complex separations, the other h(z)’s can describe the interval distribution within specific portions of these separations even better than the exponential h(z).16 A wide variety of h(z)’s can be generated from these four distributions by choosing different parameters (e.g., RSD and pˆ ). Furthermore, these h(z)’s can be superposed to create even more varied distributions.17 Selection of the appropriate h(z) is made by fitting experimental data (e.g., peak numbers and capacities, autocorrelation functions, etc.) to overlap theories developed for different h(z)’s, with average SCP number m j and various parameters as unknown coefficients, and then choosing the best fit.14-17 Representative graphs of these h(z)’s are shown in Figure 1f. Interpretation of Simulation Separations. 
The p1’s in Table 1 have been reported previously.2 Expressions for κ are reported here for the first time, except for the exponential h(z).6

(12) Felinger, A.; Pasti, L.; Dondi, F. Anal. Chem. 1990, 62, 1846.
(13) Felinger, A. Anal. Chem. 1995, 34, 2078.
(14) Felinger, A.; Pasti, L.; Reschiglian, P.; Dondi, F. Anal. Chem. 1990, 62, 1854.
(15) Felinger, A.; Pasti, L.; Dondi, F. Anal. Chem. 1991, 63, 2627.
(16) Dondi, F.; Betti, A.; Pasti, L.; Pietrogrande, M. C.; Felinger, A. Anal. Chem. 1993, 65, 2209.
(17) Zou, M. Masters Thesis, Southern Illinois University, Carbondale, IL, 1996.
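As a minimal sketch of how intervals governed by the four h(z)’s discussed above might be drawn in a simulation, the Python fragment below samples intervals scaled to unit mean. The function names, the unit-mean scaling, the uniform range of 0 to 2, and the reading of RSD as standard deviation divided by mean are assumptions made for illustration only; Table 1 gives the definitions actually used in this work.

```python
import random


def sample_interval(kind, rng, p_hat=3.0, rsd=0.25):
    """Draw one interval between adjacent SCP centers, scaled to unit mean.

    All names and the unit-mean scaling are illustrative assumptions.
    """
    if kind == "exponential":
        return rng.expovariate(1.0)
    if kind == "gamma":
        # Shape p_hat with scale 1/p_hat gives unit mean;
        # p_hat = 1 reduces to the exponential case.
        return rng.gammavariate(p_hat, 1.0 / p_hat)
    if kind == "normal":
        # Mean 1 and standard deviation rsd; draws can be negative
        # when rsd is large (e.g., 0.50).
        return rng.gauss(1.0, rsd)
    if kind == "uniform":
        # Assumed range [0, 2]: unit mean with an upper limit on the interval.
        return rng.uniform(0.0, 2.0)
    raise ValueError(f"unknown interval distribution: {kind}")


def sample_centers(kind, count, rng, **params):
    """Accumulate `count` intervals into SCP center positions."""
    centers, position = [], 0.0
    for _ in range(count):
        position += sample_interval(kind, rng, **params)
        centers.append(position)
    return centers
```

For instance, `sample_centers("gamma", 250, random.Random(0), p_hat=5.0)` would give 250 center positions whose spacing grows more regular as `p_hat` increases, in keeping with the behavior described for the Γ h(z).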

Analytical Chemistry, Vol. 69, No. 18, September 15, 1997


Figure 2. (a) Graph of p vs R for the exponential pdf. Curve equals m̄p1, with p1 reported in Table 1; circles represent the average number of maxima in computer simulations containing m̄ = 250 Gaussian SCPs; error bars represent one standard deviation. (b) Graph of R*s vs R for the exponential pdf. Dashed curve is cubic spline fit to coordinates (R, R*s) determined by eqs 1-13. Inset is graph of p vs R, with simulation p’s scaled to R*s = 0.71. Solid curve equals m̄p1, with p1 reported in Table 1. (c,d) As in (a) and (b) but for the Γ pdf (p̂ = 3). Graphs a, b, and the inset in (b) are reprinted with permission from ref 6.
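The kind of simulation summarized in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the author’s program: it places SCP centers along a separation of unit length using exponentially distributed intervals, assigns exponentially distributed amplitudes and a common standard deviation (both properties are attributed to the SCPs elsewhere in this paper), sums the Gaussian SCPs on a grid, and counts the maxima. The function names, the grid spacing, and the unit separation length are hypothetical choices.

```python
import math
import random


def poisson_centers(m_bar, rng):
    """SCP centers on [0, 1) with exponential intervals of mean 1/m_bar."""
    centers, x = [], rng.expovariate(m_bar)
    while x < 1.0:
        centers.append(x)
        x += rng.expovariate(m_bar)
    return centers


def profile(centers, amplitudes, sigma, xs):
    """Sum of Gaussian SCPs with a common standard deviation, evaluated at xs."""
    y = [0.0] * len(xs)
    for c, a in zip(centers, amplitudes):
        for i, x in enumerate(xs):
            u = (x - c) / sigma
            y[i] += a * math.exp(-0.5 * u * u)
    return y


def count_maxima(y):
    """Count interior points above the left neighbor and not below the right one."""
    return sum(1 for i in range(1, len(y) - 1) if y[i - 1] < y[i] >= y[i + 1])


def simulate_maxima(m_bar, sigma, rng, points_per_sigma=5):
    """Return (number of SCPs, number of maxima) for one simulated separation."""
    centers = poisson_centers(m_bar, rng)
    amplitudes = [rng.expovariate(1.0) for _ in centers]
    step = sigma / points_per_sigma          # assumed grid spacing
    n = int((1.0 + 10.0 * sigma) / step) + 1
    xs = [-5.0 * sigma + i * step for i in range(n)]  # pad 5 sigma at each end
    return len(centers), count_maxima(profile(centers, amplitudes, sigma, xs))
```

Averaging the second return value of `simulate_maxima(250, sigma, rng)` over repeated runs, for a series of `sigma` values, would give points comparable in kind to the circles described in the caption; the pure-Python loops are slow but keep the sketch free of dependencies.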

For these h(z)’s, Figures 2-5 are graphs of R*s vs R calculated from the theory detailed above; of p vs R, with p determined by simulation and R calculated from these R*s’s and eq 13; and of p vs R, with p determined by simulation and R calculated from the constant R*s value, 0.71. The curves in graphs of p vs R represent theory for p calculated from eq 1, with m̄ = 250 and the p1’s in Table 1; the circles represent the average numbers of maxima in 100 computer simulations containing 250 SCPs on average; and the error bars represent the standard deviations of these maxima numbers.

Figure 3. (a,b) As in Figure 2a,b but for the Γ pdf (p̂ = 5). (c,d) As in (a) and (b), except p̂ = 7.

Some trends are noted below, but in general the agreement between simulation and theory is excellent for all h(z)’s, even when p approaches 30, i.e., when p/m̄ ≈ 0.12. Agreements of this caliber have not been reported for realistic simulations of separations since this field of research began in 1983. This outcome is somewhat surprising, in light of the simplicity of theory. Of particular interest is the finding that the assumptions of equal SCP amplitudes and equal spacings between SCPs known to overlap do not cause problems at high R.

Among these trends, one observes that theory slightly underestimates p for large R, except for the uniform h(z). Furthermore, at intermediate R’s, theory slightly overestimates p for normal h(z)’s having small RSDs (e.g., RSD = 0.15 and 0.25). Finally, theory slightly overestimates p for the normal h(z) even at low R when RSD = 0.50. This behavior is expected, however, and is caused by the slight extension of the normal h(z) over negative z values when RSD = 0.50 (see Figure 1f).2

Figure 4. (a,b) As in Figure 2a,b but for the uniform pdf. (c,d) As in Figure 2a,b but for the normal pdf (RSD = 0.50).

Figure 5. (a,b) As in Figure 2a,b but for the normal pdf (RSD = 0.25). (c,d) As in (a) and (b), except RSD = 0.15.

One also observes that graphs of R*s vs R correlate strongly with the rate at which overlap varies. In these graphs, the circles represent coordinates (R, R*s) determined by eqs 1-13 (except for the coordinate [R = 0, R*s = 0.726]; this R*s was calculated as the average of g1,1(Rs)); the dashed curve is the cubic spline fit to these coordinates. In general, overlap becomes more severe as dR*s/dR becomes increasingly negative. For example, overlap is virtually nonexistent at low R’s for Γ h(z)’s having large p̂’s and normal h(z)’s having small RSDs, and dR*s/dR is almost zero in these cases. In contrast, a rapid increase of overlap with increasing R is accompanied by very negative dR*s/dR’s (in particular, see Figure 5d). Incidentally, these h(z)’s cause this unusual overlap pattern because they are narrow, bell-like-shaped

curves (see Figure 1f). Consequently, adjacent SCPs are nearly periodically spaced and overlap only if their average bandwidth approaches the near-periodic spacing. When this happens, however, nearly all adjacent SCPs overlap.

One should not be troubled by R*s’s less than 0.5 in these graphs. As was discussed earlier, the resolution Rs of the separated SCPs in any two adjacent multiplets can be less than 0.5. Similarly, the average resolution R*s of separated SCPs in all adjacent multiplets throughout the entire separation can be less than 0.5.

The insets in parts b and d of Figures 2-5 are graphs of p vs R, with R defined by R*s = 0.71. This R*s was determined by El Fallah and Martin to be the average resolution required to separate two Gaussian SCPs having exponentially distributed amplitudes and equal standard deviations;18 it almost equals 0.726, the mean of g1,1(Rs) (these two numbers should be identical and probably differ because different approaches were used to determine them). These graphs are similar to ones published elsewhere3,4 and were generated from simulation results using the proportionality between R*s and R (see eq 13). Specifically, the three-part simulation coordinate, (p, R*s, R), was mapped into the simulation coordinate (p, 0.71, 0.71R/R*s) and then graphed against theory. Thus, all simulation p’s are identical to those in parts a and c of Figures 2-5; they are simply scaled relative to new R’s. For these scaled results, a marked disagreement between simulation and theory is observed at high R. The disagreement is familiar, and its origin now is clear. As shown by graphs of R*s vs R, one cannot assign R*s a constant value. Such an assignment results in agreement between simulation and theory only at low R, when g(Rs) almost equals g1,1(Rs) (i.e., when pik ≈ 0 unless i = k = 1) and R*s ≈ 0.71. Indeed, the failure to address the variation of R*s with R is the principal, if not the only, origin of previously reported departures from theory for large R.

The reason that simulation and theory now agree is simple. As shown by eq 13, R is proportional to the product, σavR*s. Because R*s now decreases with increasing R, σav is larger than it would be if R were defined by R*s = 0.71. Consequently, SCPs are broader. Thus, overlap is more extensive than previously considered, fewer peaks are observed, and the positive deviations from theory discussed above are eliminated.

Reinterpretation of Experimental Chromatograms.
To provide an application of this theory to experimental separations, the author reinterpreted two series of saturated homogeneous separations that contain SCPs having constant standard deviations and that were reported in 1984 by Herman et al.3 These researchers used gas chromatography to develop, at different peak capacities nc, five separations of the petroleum distillate, fulgene, and six of the aromatic fraction of Emeraude oil. They also verified that the exponential h(z) described overlap in the two most efficient separations of both mixtures. Parts a and c of Figure 6 are replicates of the graphs developed from the Emeraude oil and fulgene chromatograms, respectively; these graphs were reported as Figure 9 in ref 3. The straight lines are fits to the equation

$$\ln(p - 1) = \ln(\bar{m} - 1) - \frac{\bar{m}}{n_c - 1} \qquad (14)$$

of p’s and nc’s determined from the two most efficient chromatograms of both mixtures, with m̄ ≈ 68 determined for fulgene and m̄ ≈ 138 determined for Emeraude oil. The author observes that eq 14 simply is a linearized expression for p based on the exponential h(z) that is corrected for “end effects”. In Figure 6a,c, nc is defined relative to the constant R*s, 0.525, proposed by Herman et al. It is clear that eq 14 describes rather poorly most of the remaining experimental coordinates, ((nc - 1)⁻¹, ln(p - 1)), associated with the low-efficiency chromatograms. The poor description of overlap for these coordinates does not occur because eq 14 is flawed but simply because R*s cannot be assigned a constant value, as discussed above.

Figure 6. Graphs of ln(p - 1) vs (nc - 1)⁻¹ developed from gas chromatograms of (a) Emeraude oil and (c) fulgene and reported by Herman et al. in ref 3. (b,d) Reinterpretation of these graphs by theory presented here for exponential h(z). Graphs a and c are reprinted with permission from ref 3 (copyright 1984 American Chemical Society).

(18) El Fallah, M. Z.; Martin, M. Chromatographia 1987, 24, 115.

In contrast, eqs 1-13 are capable of describing all the coordinates. To prove this assertion, the abscissas, (nc - 1)⁻¹ = (X/[4σav(0.525)] - 1)⁻¹, in Figure 6a,c were converted to the values 4m̄σav/X ≡ θ, using eq 3 for nc, the assignment R*s = 0.525, and the m̄’s reported immediately above. Furthermore, the ordinates, ln(p - 1), were converted to values of p. The R’s corresponding to these data were calculated very easily since R*s for the exponential h(z) varies almost linearly with R, as shown in Figure 2b (correlation coefficient r > 0.999), with an intercept ao and slope a1 equal to 0.732 and -0.177, respectively. Thus, if theory is correct and the exponential h(z) describes overlap in the saturated chromatograms, then the R’s corresponding to these data should equal

$$R = \theta R_s^* \approx \theta(a_o + a_1 R) \qquad (15a)$$

or

$$R \approx \frac{\theta a_o}{1 - \theta a_1} \qquad (15b)$$

with θ defined immediately above. Figure 6b,d shows graphs of p vs R so determined for the Emeraude oil and fulgene chromatograms, respectively. The symbols represent the experimental p’s determined by Herman et al. that are paired with the R’s calculated from eq 15b. The curves are graphs of eq 1, with p1 equal to that for the exponential h(z) reported in Table 1. It is clear that statistical overlap theory, as developed here, closely describes the experimental p’s for both mixtures at all saturations.

CONCLUSIONS

In ref 6, the author observed that the description of overlap by eqs 1-13 had “a utility not currently envisioned...(the theory) is awkward.” The author now observes that this conclusion was premature. The description, once generalized, provides powerful means for describing overlap with point process statistical models. The author also observes that eqs 1-13 are an outgrowth of a theory describing overlap by modeling the breadths of SCPs by a random variable that varies linearly with the amplitude of SCPs. The expression so determined for the average number p of peaks is19

$$p = \bar{m} \int_0^{\infty} g(z)\, p_1(z)\, dz \qquad (16)$$

where z = Rs/(4σav) and p1(z) equals eq 2, with variable z replacing xo. The pdfs g(z) and g(Rs) are closely related but not identical. In particular, the g(z) postulated and verified in ref 19 for the exponential h(z) does not work well with most of the other h(z)’s examined here, and, consequently, the theory in ref 19 is inferior to that developed here.

(19) Davis, J. M. Chromatographia 1996, 42, 367.
(20) Davis, J. M.; Giddings, J. C. J. Chromatogr. 1984, 289, 277.
(21) Schure, M. R. J. Chromatogr. 1991, 550, 51.
(22) Creten, W. L.; Nagels, L. J. Anal. Chem. 1987, 59, 822.
(23) Pietrogrande, M. C.; Pasti, L.; Dondi, F.; Rodriguez, M. H. B.; Diaz, M. A. C. J. High Resolut. Chromatogr. 1994, 17, 839.
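The reinterpretation of the experimental chromatograms described earlier reduces to a few arithmetic steps, which can be sketched as follows under stated assumptions. The fragment solves eq 14 for p, recovers nc from the reported abscissa (nc - 1)⁻¹, forms θ = 4m̄σav/X from nc and the constant R*s = 0.525 of Herman et al., and applies eq 15b with the intercept 0.732 and slope -0.177 quoted for the exponential h(z). The function and variable names are hypothetical, and the relation θ = m̄/(0.525nc) is inferred from the stated definition nc = X/[4σav(0.525)] rather than quoted from the text.

```python
import math

A0, A1 = 0.732, -0.177   # intercept and slope of R*s vs R quoted for the exponential h(z)
RS_HERMAN = 0.525        # constant R*s used to define n_c in Figure 6a,c


def peaks_eq14(m_bar, n_c):
    """Solve eq 14, ln(p - 1) = ln(m_bar - 1) - m_bar/(n_c - 1), for p."""
    return 1.0 + (m_bar - 1.0) * math.exp(-m_bar / (n_c - 1.0))


def theta_from_abscissa(abscissa, m_bar, rs_const=RS_HERMAN):
    """Convert the abscissa (n_c - 1)^-1 to theta = 4 * m_bar * sigma_av / X."""
    n_c = 1.0 / abscissa + 1.0
    # n_c = X / (4 * sigma_av * rs_const)  =>  4 * sigma_av / X = 1 / (rs_const * n_c)
    return m_bar / (rs_const * n_c)


def r_from_theta_eq15b(theta, a0=A0, a1=A1):
    """Eq 15b: R = theta * a0 / (1 - theta * a1)."""
    return theta * a0 / (1.0 - theta * a1)
```

For example, `theta_from_abscissa(1 / 99, 68)` corresponds to nc = 100 with the m̄ reported for fulgene, and passing the result to `r_from_theta_eq15b` gives the R against which the corresponding experimental p would be graphed.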

An important aspect of this work is placement of statistical overlap theory on solid theoretical ground. In earlier studies, various R*s’s (e.g., 0.5 in refs 3 and 20, 0.71 in ref 18, 0.6 in ref 21, and 0.8 in ref 22) were proposed as empirical values to reconcile theory and observation. This empiricism reflected a basic lack of understanding of overlap in saturated separations. The author now has shown that the resolution of multiplets, not SCPs, determines p, has developed a physicochemical theory for the distribution of resolution, and has shown that the theory so determined predicts the number of maxima in both computer simulations and experimental gas chromatograms. Overlap in homogeneous one-dimensional separations now appears to be well understood from a statistical perspective.

Fourier models of statistical overlap are powerful tools for the characterization of separations. One attribute they cannot provide, however, is an estimate of the number of peaks,2 which is particularly useful in gauging the completeness of separation. Point process statistical models do permit the estimation of peak numbers, and the work presented here permits this estimation at high saturation for the first time.

The application of this work to experimental separations governed by h(z)’s other than the exponential one awaits the test of time. Since overlap theories based on Fourier analysis show that other h(z)’s are appropriate to some separations,16,23 the author is hopeful that this work also is applicable to such separations. Finally, it is clear that some assumptions made here, e.g., that SCPs have constant standard deviations and exponentially distributed amplitudes, will not always be valid. Additional modifications of theory to describe more realistic experimental situations may be needed, particularly to describe overlap in nonhomogeneous separations. Fortunately, the latter description should not prove difficult.19

Despite the potential shortcomings of this work, it is very satisfying that a reasonable physical model of overlap produces results of the quality shown here. Indeed, this accomplishment has eluded the author for over a decade. To speak whimsically, earlier point process statistical models were useful only when they were not needed; that is, when the overlap was so slight that one probably did not care. This situation no longer is true.

Received for review February 5, 1997. Accepted June 30, 1997. AC9701391

Abstract published in Advance ACS Abstracts, August 15, 1997.
