Anal. Chem. 1994, 66, 735-746
Statistical Theory of Overlap for Variable Poisson Density: Relaxation of Constraint of Randomness

Joe M. Davis
Department of Chemistry and Biochemistry, Southern Illinois University at Carbondale, Carbondale, Illinois 62901-4409

The statistical model of overlap is modified to address one-dimensional separations containing single-component peaks (SCPs) with variable Poisson densities. The resulting theory describes overlap in unsaturated separations, for which the variation of SCP density throughout the separation is known. The expected number of peaks is expressed by an integral, whose value depends on the average saturation of the separation, the number of detectable components, and a frequency function proportional to the SCP density. The theory is verified by its application to computer-simulated chromatograms containing SCPs with linear, quadratic, sinusoidal, and exponential frequencies. Procedures are proposed to estimate the frequency function from the distribution of chromatographic maxima and to estimate the number of detectable components in chromatograms. These procedures are verified by their application to computer-simulated chromatograms and to an experimental chromatogram of a well-characterized mixture. Computer simulations show that these procedures are more robust than those based on a constant Poisson density. A mathematical proof is offered, which shows that at constant low average saturation the maximum number of peaks in a separation containing SCPs with variable Poisson densities is obtained when the SCP density is constant. This proof may explain previous experimental observations.

Over the last decade several statistical theories have been developed to describe overlap in both simple and complex one-dimensional separations. The statistical model of overlap3 (SMO) proposed by Giddings and the author is among the simplest of these, and several applications of it to experimental chromatograms have been reported.9-14 The accuracy of its predictions under appropriate conditions also has been verified by experiment.15,16 It predicts, as do more complex theories, that most observed maxima in multicomponent separations are fused structures and that the fraction of singlet maxima in such separations is small.

(1) Rosenthal, D. Anal. Chem. 1982, 54, 63.
(2) Nagels, L. J.; Creten, W. L.; Vanpeperstraete, P. M. Anal. Chem. 1983, 55, 216.
(3) Davis, J. M.; Giddings, J. C. Anal. Chem. 1983, 55, 418.
(4) Martin, M.; Guiochon, G. Anal. Chem. 1985, 57, 289.
(5) Martin, M.; Herman, D. P.; Guiochon, G. Anal. Chem. 1986, 58, 2200.
(6) Herman, D. P.; Billiet, H. A. H.; de Galen, L. Anal. Chem. 1986, 58, 2999.
(7) Felinger, A.; Pasti, L.; Dondi, F. Anal. Chem. 1990, 62, 1846.
(8) Felinger, A.; Pasti, L.; Dondi, F. Anal. Chem. 1991, 63, 2627.
(9) Herman, D. P.; Gonnord, M. F.; Guiochon, G. Anal. Chem. 1984, 56, 995.
(10) Davis, J. M.; Giddings, J. C. Anal. Chem. 1985, 57, 2178.
(11) Dondi, F.; Kahie, Y. D.; Lodi, G.; Remelli, M.; Reschiglian, P.; Bighi, C. Anal. Chim. Acta 1986, 191, 261.
(12) Coppi, S.; Betti, A.; Dondi, F. Anal. Chim. Acta 1988, 212, 165.
(13) Dondi, F.; Gianferrara, T.; Reschiglian, P.; Pietrogrande, M. C.; Ebert, C.; Linda, P. J. Chromatogr. 1990, 485, 631.
(14) Oros, F. J.; Davis, J. M. J. Chromatogr. 1991, 550, 135.
(15) Davis, J. M. J. Chromatogr. 1988, 449, 41.
(16) Delinger, S. L.; Davis, J. M. Anal. Chem. 1990, 62, 436.
0003-2700/94/0366-0735$04.50/0 © 1994 American Chemical Society
Two limitations discourage the routine application of the SMO to chromatograms (or electropherograms) of complex mixtures. The first is that single-component peaks (SCPs), or peaks resulting from the detection of pure chemical species, have amplitudes proportional to analyte concentration. These amplitudes cause complications as the density of SCPs increases, because the SMO only addresses the relative positions of SCP maxima and not the amplitudes themselves. Computer simulations show that estimates of statistical parameters are in error by more than 10% when the number of SCPs exceeds by more than half or so the number of maxima that can be resolved.3,17 The second limitation is that the density of SCPs throughout the chromatogram, or the relevant fraction thereof, must be constant. This limitation results from an assumption that Poisson statistics governs the position of SCPs. While separation conditions often can be found for which this assumption holds, the search for these conditions can be time-consuming and occasionally frustrating. Furthermore, for some mixtures the separation conditions are such that this prerequisite cannot be satisfied.14

Several papers have been published recently by Felinger et al. on a statistical theory that reduces or avoids these limitations.7,8,18-20 The authors have shown that the power spectrum of chromatograms can be interpreted in a statistical manner. For appropriate amplitude distributions, the authors can calculate good statistical parameters from chromatograms having SCP densities greater than those addressed by the SMO. In addition, one does not need to assume that the SCP density is Poisson; rather, one may assume the density is one of several functions.

Efforts presently are underway to address shortcomings of the SMO.
For example, the work of Martin shows considerable promise in reducing the restrictions of SCP amplitudes.21 Here, the author seeks to remove the restriction that SCP density must be constant, while maintaining theoretical simplicity. As is shown below, with simple modifications, the SCP density can assume virtually any continuous function. Furthermore, this function need not be known; at low saturation, it can be approximated from the separation itself. With this approximation, one can calculate from theory various statistical parameters, such as the number of detectable components and the saturation of the separation.

(17) Davis, J. M.; Giddings, J. C. Anal. Chem. 1985, 57, 2168.
(18) Felinger, A.; Pasti, L.; Reschiglian, P.; Dondi, F. Anal. Chem. 1990, 62, 1854.
(19) Felinger, A.; Pasti, L.; Dondi, F. Anal. Chem. 1992, 64, 2164.
(20) Dondi, F.; Betti, A.; Pasti, L.; Pietrogrande, M. C.; Felinger, A. Anal. Chem. 1993, 65, 2209.
(21) Martin, M., personal communication, 1993.
Analytical Chemistry, Vol. 66, No. 5, March 1, 1994 735
In general, the major objectives of overlap theories are to provide theoretical frameworks by which to gauge the severity of overlap, to serve as regressional models by which to calculate the number of detectable mixture components from observed chromatographic signals, and (once the number of mixture components is known) to facilitate estimation of the additional capacity required to improve a separation to an acceptable level. The significance of the present work is that it suggests these objectives can now be achieved without the highly restrictive assumption that SCPs must be distributed with constant density throughout the separation. The modification introduced here thus removes a major restriction on the SMO and should facilitate its application to a variety of separations to which it previously has not been applied. The work below details the modification of theory, the testing of theory by computer-simulated chromatograms containing SCPs having amplitudes of zero (i.e., simple distributions of ordered numbers), the testing of theory by computer-simulated chromatograms containing SCPs having exponentially distributed amplitudes, and the estimation by theory of component numbers and saturations from simulated chromatograms and one experimental chromatogram. This work is not to be interpreted as an exhaustive study of the merits and limitations of this new theory but merely as a survey of the potential that exists. Much detailed work necessarily is deferred to future studies.
THEORY

Basic Equations. As shown over a decade ago by Davis and Giddings, the expected number $p$ of peaks in an interval $X$ containing $m$ randomly distributed SCPs is3

$$p = \bar{m}\, e^{-\bar{m} x_0 / X} \tag{1}$$

where $\bar{m}$ is a statistical approximation to $m$, $x_0$ is the minimum interval between adjacent SCPs sufficient for resolution, $n_c = X/x_0$ is the peak capacity, and $\alpha = \bar{m} x_0 / X = \bar{m}/n_c$ is the saturation of the separation. The saturation is a measure of the "crowdedness" of a separation; it is the ratio of the number $\bar{m}$ of SCPs requiring separation to the capacity $n_c$ available for their separation. For Gaussian SCPs with standard deviation $\sigma$, computer simulation17 and experiment9-11,13 have shown that $x_0 = 2\sigma$, when $p$ is identified with the number of resolved maxima. Equation 1 describes overlap for what shall be designated here as the constant-density case, in which SCPs are distributed in $X$ with a constant Poisson density.

Equation 1 easily is modified to account for the overlap of SCPs that are distributed in $X$ with variable Poisson densities. By dividing eq 1 by $X$, one obtains

$$\frac{p}{X} = \lambda\, e^{-\lambda x_0} \tag{2}$$

where $\lambda = \bar{m}/X$ is the constant SCP density. This density is the average number $\bar{m}$ of SCPs contained in span $X$ or, more generally, the average number of SCPs contained in any interval of separation space. If one now considers the case in which this density is a function of separation coordinate $x$, one can express eq 2 as the differential limit

$$\frac{dp}{dx} = \lambda(x)\, e^{-\lambda(x) x_0} \tag{3}$$

where $dp/dx$ replaces $p/X$ and $\lambda(x)$ is the SCP density at coordinate $x$. Equation 3 describes the infinitesimally small number $dp$ of peaks expected in the infinitesimally small interval $dx$, to which the constant Poisson density $\lambda(x)$ applies. The total number $p$ of peaks can be calculated by summing the $dp$ contributions for all intervals $dx$, i.e., by multiplying eq 3 by $dx$ and integrating over interval $X$

$$p = \int_0^X \lambda(x)\, e^{-\lambda(x) x_0}\, dx \tag{4}$$

where the lower limit, 0, is the coordinate at the beginning of interval $X$. Equation 4 describes overlap for what shall be designated here as the variable-density case.

While eq 4 is an acceptable expression for $p$, the following result is perhaps more useful. One now defines a dimensionless, continuous frequency function $f(x)$, such that

$$\lambda(x) = \frac{\bar{m}}{X}\, f(x) \tag{5}$$

If interval $X$ contains an average number $\bar{m}$ of SCPs, as in the constant-density case, then

$$\int_0^X \lambda(x)\, dx = \bar{m} \tag{6a}$$

which implies that

$$\int_0^X f(x)\, dx = X \tag{6b}$$

By substituting eq 5 into eq 4, one obtains the expression

$$p = \frac{\bar{m}}{X} \int_0^X f(x)\, e^{-\alpha f(x)}\, dx \tag{7}$$

where the constant $\alpha = \bar{m} x_0 / X = \bar{m}/n_c$ now is interpreted as the average saturation of the separation. The local saturation at any coordinate $x$ is $\alpha f(x)$ and may be greater or less than this value. If $f(x) > 1$, then the local saturation is greater than $\alpha$; if $f(x) < 1$, then the local saturation is less than $\alpha$. By introducing the frequency $f(x)$, one is able to express $p$ in a form in which the average saturation is equal to the saturation $\alpha$ of a constant-density separation. Consequently, at equal saturations the peak capacities of constant- and variable-density separations containing $\bar{m}$ SCPs are equal. These equivalences furthermore facilitate a simple comparison of eq 7 to eq 1; for $f(x) = 1$, eq 7 reduces to eq 1.

By defining the dimensionless variable $\zeta = x/X$, one can reexpress eqs 6b and 7 as

$$\int_0^1 f(\zeta)\, d\zeta = 1 \tag{8}$$

$$p = \bar{m} \int_0^1 f(\zeta)\, e^{-\alpha f(\zeta)}\, d\zeta \tag{9}$$

which are the expressions desired here.

Davis and Giddings also showed for the constant-density case that the expected number $P_\nu$ of peaks containing $\nu$ SCPs is3

$$P_\nu = \bar{m}\, e^{-2\bar{m} x_0 / X} \left(1 - e^{-\bar{m} x_0 / X}\right)^{\nu - 1} \tag{10}$$

For example, the expected numbers of singlet and doublet peaks are given by eq 10 for the $\nu$ values 1 and 2. This result also can be modified to account for the overlap of SCPs that are distributed with variable Poisson densities. By means identical to those above, one derives the expressions

$$P_\nu = \frac{\bar{m}}{X} \int_0^X f(x)\, e^{-2\alpha f(x)} \left(1 - e^{-\alpha f(x)}\right)^{\nu - 1} dx \tag{11a}$$

$$P_\nu = \bar{m} \int_0^1 f(\zeta)\, e^{-2\alpha f(\zeta)} \left(1 - e^{-\alpha f(\zeta)}\right)^{\nu - 1} d\zeta \tag{11b}$$

These equations reduce to eq 10 for $f(x) = f(\zeta) = 1$.

It is appropriate to consider what constraints on frequency $f(\zeta)$ exist. Clearly, $f(\zeta)$ must be nonnegative. Furthermore, it is a function; i.e., it has only one value at any $\zeta$. In addition, at least for the theory developed here, it is continuous, although not necessarily differentiable, at every $\zeta$.

Although the following issue will be deferred to a later study, the author sees no reason that the minimum span, $x_0$, necessary for the resolution of adjacent SCPs cannot also be a function of coordinate $x$. In this case, eq 4 becomes

$$p = \int_0^X \lambda(x)\, e^{-\lambda(x) x_0(x)}\, dx \tag{12}$$

(22) Davis, J. M.; Giddings, J. C. J. Chromatogr. 1984, 289, 277.
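The integrals in eqs 9 and 11b are easy to evaluate numerically once a frequency is chosen. The following is a minimal sketch, not the author's code: the linear frequency $f(\zeta) = 0.5 + \zeta$, the values of $\bar{m}$ and $\alpha$, the trapezoidal quadrature, and all function names are assumptions made for illustration. It checks that $f = 1$ recovers eq 1 and that summing eq 11b over $\nu$ recovers eq 9, which follows from the geometric series in $1 - e^{-\alpha f}$.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule, written out to avoid version-specific NumPy names."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def expected_peaks(m_bar, alpha, f, n=20001):
    """Eq 9: p = m_bar * integral over [0, 1] of f(zeta) * exp(-alpha * f(zeta))."""
    zeta = np.linspace(0.0, 1.0, n)
    fz = f(zeta)
    return m_bar * trapezoid(fz * np.exp(-alpha * fz), zeta)

def expected_multiplets(m_bar, alpha, f, nu, n=20001):
    """Eq 11b: expected number of peaks that contain exactly nu SCPs."""
    zeta = np.linspace(0.0, 1.0, n)
    fz = f(zeta)
    integrand = fz * np.exp(-2.0 * alpha * fz) * (1.0 - np.exp(-alpha * fz)) ** (nu - 1)
    return m_bar * trapezoid(integrand, zeta)

# Frequencies must be nonnegative and integrate to 1 over [0, 1] (eq 8).
constant = lambda z: np.ones_like(z)   # f = 1: the constant-density case
linear = lambda z: 0.5 + z             # hypothetical linear frequency (assumption)

m_bar, alpha = 250, 0.25               # illustrative values only
p_const = expected_peaks(m_bar, alpha, constant)
p_linear = expected_peaks(m_bar, alpha, linear)
singlets = expected_multiplets(m_bar, alpha, linear, nu=1)
total = sum(expected_multiplets(m_bar, alpha, linear, nu) for nu in range(1, 60))

print("constant density:", p_const, "eq 1:", m_bar * np.exp(-alpha))
print("linear frequency:", p_linear, "sum over nu of eq 11b:", total)
print("singlets for linear frequency:", singlets)
```

For these particular values the linear frequency yields fewer expected peaks than the constant one, because $f e^{-\alpha f}$ is concave over the range of $f$ involved.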
Estimation of Statistical Parameters. For specified values of $\bar{m}$ and $\alpha$, eqs 9 and 11b can be used to calculate $p$ and $P_\nu$ for any frequency $f(\zeta)$. Such calculations are useful in comparing the extent of separation expected for a particular frequency to the constant-density case. If previous applications of the SMO provide a basis for projection, however, the principal use of eq 9 will be to estimate statistical parameters, e.g., $\alpha$ and $\bar{m}$, from actual chromatograms (or other one-dimensional separations). For these chromatograms, $f(\zeta)$ typically will not be known. Fortunately, an approximation to $f(\zeta)$, with which statistical parameters can be calculated, can be deduced simply from the chromatogram.

Estimation of Frequency $f(\zeta)$. In accordance with basic probability theory, the derivative of the cumulative distribution $F(\zeta)$ for SCPs is the frequency $f(\zeta)$, which one needs to evaluate eqs 9 and 11. The value of $F(\zeta)$ at coordinate $\zeta$ is the fraction of SCPs whose center-of-gravity coordinates are less than or equal to $\zeta$ (e.g., if 65% of all SCP coordinates were less than or equal to $\zeta = 0.5$, then $F(\zeta = 0.5)$ would equal 0.65). Because overlap obscures some SCPs, $F(\zeta)$ cannot be determined simply by inspecting chromatograms. It can be approximated, however, by the easily determined cumulative distribution $F_a(\zeta)$ for chromatographic maxima (the subscript "a" designates "approximation").

The approximation of $F(\zeta)$ by $F_a(\zeta)$ is illustrated by Figure 1a,b. Figure 1a is a graph of the numbers of SCPs and chromatographic maxima, whose coordinates of maximum concentration in a computer-simulated chromatogram are less than $\zeta$, vs coordinate $\zeta$. The details underlying the chromatogram's simulation are not important here, other than to note that $f(\zeta)$ was a quadratic function of $\zeta$ and less than 2, $m$ was 250, $\alpha$ was 0.25, and SCP amplitudes were distributed exponentially. Unsurprisingly, the number of SCPs exceeds the number of maxima at any positive $\zeta$, because of overlap.
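The comparison of $F(\zeta)$ with $F_a(\zeta)$ can be reproduced in outline with a small simulation. The sketch below is illustrative only and makes several assumptions that are not taken from the paper: the quadratic frequency $f(\zeta) = 0.7 + 0.9\zeta^2$ (chosen merely because it is nonnegative, integrates to 1, and stays below 2), the grid resolution, the random seed, the degree-4 polynomial, and all function names. Each Gaussian is evaluated over the whole grid rather than in a truncated window, SCP coordinates are drawn by bisection on the cumulative distribution, and $f_a(\zeta)$ is obtained by differentiating a least-squares polynomial fitted to $F_a(\zeta)$.

```python
import numpy as np

def sample_zeta(cdf, n, rng, iters=50):
    """Map uniform random numbers z into zeta by bisection on cdf(zeta) = z."""
    z = rng.random(n)
    lo, hi = np.zeros(n), np.ones(n)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        below = cdf(mid) < z              # the root lies above mid
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    return 0.5 * (lo + hi)

def maxima_coordinates(centers, sigma, rng, points_per_sigma=10):
    """Sum equal-width Gaussians with exponential amplitudes; return grid maxima."""
    grid = np.arange(-6.0 * sigma, 1.0 + 6.0 * sigma, sigma / points_per_sigma)
    amplitudes = rng.exponential(1.0, size=centers.size)
    signal = np.zeros_like(grid)
    for a, c in zip(amplitudes, centers):
        signal += a * np.exp(-0.5 * ((grid - c) / sigma) ** 2)
    # A maximum is a grid point strictly higher than both neighbors.
    is_max = (signal[1:-1] > signal[:-2]) & (signal[1:-1] > signal[2:])
    return grid[1:-1][is_max]

def cumulative_fraction(coords, zeta):
    """Fraction of coordinates less than or equal to each zeta."""
    return np.searchsorted(np.sort(coords), zeta, side="right") / coords.size

rng = np.random.default_rng(0)
m, alpha = 250, 0.25
sigma = alpha / (2 * m)                    # from alpha = 2 * m * sigma_zeta
cdf = lambda z: 0.7 * z + 0.3 * z**3       # integral of the assumed f = 0.7 + 0.9 zeta^2
centers = sample_zeta(cdf, m, rng)
maxima = maxima_coordinates(centers, sigma, rng)

zeta = np.linspace(0.02, 1.0, 50)          # 50 equally spaced coordinates
F = cumulative_fraction(centers, zeta)     # known only in a simulation
F_a = cumulative_fraction(maxima, zeta)    # available from a real chromatogram
coeffs = np.polyfit(zeta, F_a, 4)          # least-squares polynomial for F_a
f_a = np.polyval(np.polyder(coeffs), zeta) # its derivative approximates f

print("maxima found:", maxima.size, "of", m, "SCPs")
print("largest |F - F_a|:", np.max(np.abs(F - F_a)))
```

Only the coordinates of the maxima enter `F_a` and `f_a`, so the last three computed arrays could in principle be obtained the same way from retention times in a measured chromatogram.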
Figure 1. (a) Numbers of SCPs and maxima in a computer-generated chromatogram of saturation $\alpha$ = 0.25 containing 250 SCPs, whose coordinates of maximum concentration are less than $\zeta$, vs $\zeta$. (b) Cumulative distributions $F(\zeta)$ and $F_a(\zeta)$ constructed from (a).
Figure 1b shows the same graph as two cumulative distributions, which were obtained by dividing the data in Figure 1a for SCPs by $m$ and those for maxima by $p_m$, the number of maxima. It is apparent that these distributions are almost identical. Therefore, one can use the distribution $F_a(\zeta)$, which one can calculate from the retention times and the number of maxima, as an approximation to $F(\zeta)$, which one cannot calculate because not all SCPs are observable. The derivative of $F_a(\zeta)$ then serves as an approximation to $f(\zeta)$, which is designated here as $f_a(\zeta)$.

Estimation of $\bar{m}$. Once $f_a(\zeta)$ is determined, parameter $\bar{m}$ can be estimated by at least three procedures. The first procedure is based on the numerical solution of a simple equation, and the latter two are based on a least-squares minimization of error between theory and experimental values of $p$. In the first two procedures, $p$ is identified with the number $p_m$ of maxima and $x_0$ is identified with $2\sigma$. Such identifications are proposed here as reasonable, because they are valid in the constant-density case.9,11,13 In the first procedure, one solves numerically the equation

$$p_m = \frac{\bar{m}}{X} \int_0^X f_a(x)\, e^{-2\bar{m}\sigma f_a(x)/X}\, dx \tag{13a}$$

or

$$p_m = \bar{m} \int_0^1 f_a(\zeta)\, e^{-2\bar{m}\sigma_\zeta f_a(\zeta)}\, d\zeta \tag{13b}$$

for $\bar{m}$, with $p_m$ and $\sigma$ determined from a single chromatogram. The notation $\sigma_\zeta$ in eq 13b simply is the value of $\sigma$ in $\zeta$ space; i.e., $\sigma_\zeta = \sigma/X$. In the second procedure, several $p_m$'s and $\sigma$'s are determined from a series of chromatograms, whose efficiencies are altered in a manner that does not change $f(\zeta)$.
In this case, one can calculate the sum of squares $SS$

$$SS = \sum_n \left[ p_{m,n} - \bar{m} \int_0^1 f_a(\zeta)\, e^{-2\bar{m}\sigma_{\zeta,n} f_a(\zeta)}\, d\zeta \right]^2 \tag{14}$$

and choose $\bar{m}$ to minimize this sum. The subscript $n$ in eq 14 denotes the $n$th data pair, $(\sigma_\zeta, p_m)$. This procedure is more robust than the first, because it utilizes the results of several experiments. The analog of this procedure for the constant-density case is called the "multiple-chromatogram method" and is based on interpretation of a graph of $\ln p_m$ vs $2\sigma/X = n_c^{-1}$, which is linear.9,11,13,22

A drawback to both procedures is that $\sigma$ must be known. A third procedure that is independent of $\sigma$ is based on a variation of eq 14. Here, one arbitrarily chooses an interval $x'_0$ and counts the number $p'$ of intervals between adjacent maxima, whose spans are greater than $x'_0$. For various $x'_0$'s, one determines a series of $p'$ values, which can be fit to an equation similar to eq 14

$$SS = \sum_n \left[ p'_n - \bar{m} \int_0^1 f_a(\zeta)\, e^{-\bar{m} x'_{0\zeta,n} f_a(\zeta)}\, d\zeta \right]^2 \tag{15}$$

As before, eq 15 is minimized by the appropriate choice of $\bar{m}$. Here, the symbol $x'_{0\zeta}$ represents the interval $x'_0$ in $\zeta$ space and the subscript $n$ denotes the $n$th data pair, $(x'_{0\zeta}, p')$. The analog of this procedure for the constant-density case is called the "single-chromatogram method" and is based on interpretation of a graph of $\ln p'$ vs $x'_0/X$, which is linear. As in that procedure, one carefully must exclude $p'$ values for which $x'_0 < 2\sigma$ or so, because $p'$ is independent of these $x'_0$'s.

Estimation of $\alpha$. If $\sigma$ is known, as in the first procedure, then $\alpha$ can be estimated simply as $2\bar{m}\sigma_\zeta$, once $\bar{m}$ is known. If $\sigma$ is not known, as in the third procedure, then the span, $2\sigma_\zeta$, required for resolution of maxima in $\zeta$ space can be calculated from eq 13b, with $p_m$, $\bar{m}$, and $f_a(\zeta)$ as known quantities. The average saturation $\alpha$ then can be computed as $2\bar{m}\sigma_\zeta$.

Estimation of $P_\nu$. Once $\bar{m}$, $f_a(\zeta)$, and $\alpha$ are known, eq 11b can be evaluated.

PROCEDURES

Generation of $\zeta$ Coordinates with Frequency $f(\zeta)$ from Uniform Random Numbers. To test eqs 9 and 11b, computer-generated chromatograms containing SCPs having coordinates consistent with various frequencies $f(\zeta)$ were required. Sequences of coordinates $\zeta$ distributed with frequency $f(\zeta)$ over the interval $0 \le \zeta \le 1$ were computed by mapping a uniformly distributed random number $z$ into $\zeta$ space in accordance with

$$z = \int_0^\zeta f(y)\, dy = g(\zeta) - g(0) = F(\zeta) \tag{16}$$

where $y$ is a dummy variable of integration and $g$ is the integral of the frequency $f(\zeta)$. The difference, $g(\zeta) - g(0)$, is the cumulative distribution $F(\zeta)$ of the frequency $f(\zeta)$. For all cases considered here, the integrals could be evaluated analytically, and all $g$'s consequently were analytical functions. The coordinate $\zeta$ into which $z$ maps was determined by solving eq 16 by the bisection method. In other words, the equation

$$g(\zeta) - g(0) - z = 0 \tag{17}$$

was solved numerically for $\zeta$, with the solution confined to the interval $0 \le \zeta \le 1$. From a sequence of $z$'s, a sequence of $\zeta$'s having frequency $f(\zeta)$ thereby was generated.

Testing of Equations 9 and 11b by Simulated Chromatograms Containing SCPs with Zero Amplitudes. Because SCP amplitudes are known to impose limits on the constant-density case, it was decided first to test the validity of eqs 9 and 11b by chromatograms containing SCPs with zero amplitudes. Such "chromatograms" in fact are simply sequences of coordinates $\zeta$ distributed with frequency $f(\zeta)$ and are identical to the "line chromatograms" in ref 17. The author's rationale for this action was that it was not clear that eqs 9 and 11b were valid at all, and a preliminary study of them was necessary prior to considering the more complicated case of SCP amplitudes.

A sequence of $m$ = 50 000 $\zeta$ coordinates was generated as detailed above for each of several frequencies $f(\zeta)$. For each $f(\zeta)$, the coordinates were ordered with a Quicksort algorithm (a highly efficient sorting algorithm for large arrays24), and the intervals between successive coordinates were computed and compared to a series of $x'_{0\zeta}$ values. The number of intervals that exceeded $x'_{0\zeta}$ was interpreted as the number of resolved "peaks" at the saturation, $\alpha = m x'_{0\zeta}$. Similar comparisons were made to determine the numbers of singlet, doublet, and triplet "peaks". For each $f(\zeta)$, 10 such sequences of 50 000 $\zeta$ coordinates were generated, and the numbers of "peaks" were averaged.

Testing of Equation 9 by Simulated Chromatograms Containing SCPs with Amplitudes. The numbers of "peaks" in the zero-amplitude chromatograms described immediately above were found to agree very well with eqs 9 and 11b (details are reported in the Results and Discussion section). Following this favorable observation, computer simulations containing SCPs with amplitudes other than zero were calculated to gauge the $\alpha$ ranges over which $p$ could be identified with the number $p_m$ of maxima (the plural of the word "range" is used, because the range could depend on frequency $f(\zeta)$). The amplitude $c$ of chromatograms containing $m$ = 250 SCPs was calculated as a sum of Gaussians

$$c(\zeta) = \sum_{i=1}^{m} A_i \exp\left[ -\frac{(\zeta - \zeta_i)^2}{2\sigma_{\zeta i}^2} \right] \tag{18}$$

where $A_i$ is the amplitude, $\zeta_i$ is the center of gravity, and $\sigma_{\zeta i}$ is the standard deviation of the $i$th SCP in $\zeta$ space. In any chromatogram, the distribution of amplitudes $A_i$ was exponential, because both experiment2,11,25 and theory26 suggest that amplitude distribution is typical for complex mixtures. The centers $\zeta_i$ of gravity were determined for various frequencies $f(\zeta)$ by the mapping algorithm described above. The standard deviations $\sigma_{\zeta i}$ in any chromatogram were equal to the constant, $\sigma_\zeta$, for all SCPs. The average saturation $\alpha = \bar{m} x_0 / X$ of any chromatogram was interpreted as $2m\sigma_\zeta$. In other words, $\bar{m}$ was approximated by $m$ and $x_0/X$ was approximated by $2\sigma_\zeta$. A wide range of $\alpha$ values, e.g., between 0.05 and 1.50, was investigated.

The number $p_m$ of maxima was determined simply by scanning the vector $c$ and determining the number of occurrences for which $c(j - 1) < c(j) > c(j + 1)$, where $j$ represents an element of $c$. The contribution to $c$ of any SCP was computed for $6\sigma$ units on either side of the SCP maximum to reduce to negligible levels the truncation errors that could be interpreted as false maxima. For each $\alpha$ and $f(\zeta)$, 100 simulations were carried out, and $p_m$ was identified with the average number of maxima found in them.

(23) Dahlquist, G.; Björck, A. Numerical Methods; Prentice-Hall: Englewood Cliffs, NJ, 1974.
(24) Starkley, J. D.; Ross, R. J. Fundamental Programming; West Publishing Co.: St. Paul, MN, 1984.
(25) Nagels, L. J.; Creten, W. L. Anal. Chem. 1985, 57, 2706.
(26) El Fallah, M. Z.; Martin, M. Chromatographia 1987, 24, 115.

Calculation of $f_a(\zeta)$ from Computer-Simulated Chromatograms. The frequency $f_a(\zeta)$ was calculated from computer-simulated chromatograms in two ways. In the first of these, chromatograms were divided into 50 equally spaced intervals, i.e., $\zeta$ = 0.02, 0.04, etc. The fraction $F_a(\zeta)$ of maxima having coordinates less than these $\zeta$'s was graphed against $\zeta$. A polynomial of degree between three and six then was fit to these data by least squares, and this polynomial was differentiated analytically to obtain $f_a(\zeta)$. In the second way, the cumulative distribution $F_a(\zeta)$ was calculated as detailed above (except that 25, instead of 50, equally spaced $\zeta$ values were used to reduce oscillations) and then differentiated numerically in accordance with
+
fa