
Anal. Chem. 1986, 58, 2200-2207

Probability Distributions of the Number of Chromatographically Resolved Peaks and Resolvable Components in Mixtures

Michel Martin,* David P. Herman, and Georges Guiochon

Ecole Polytechnique, Laboratoire de Chimie Analytique Physique, 91128 Palaiseau, France

A theoretical model of peak overlapping is developed on the basis that sample components are randomly distributed along the retention axis. This model overcomes some limitations of previous models, in that it takes into account the variation of the probability of interferences near the extremities of the retention interval, it does not require that the number of sample components be large, and it gives directly the probability distribution of the number of resolvable peaks for a known number of sample components rather than just simple averages. More importantly, the theory allows a prediction of the probability distribution of the number of components in a sample when the number of resolved peaks is known. Finally, the adherence of real-world chromatograms to the retention randomness assumption of the theory is tested by using retention data of 20 arbitrarily selected compounds with different chemical functionalities on 65 gas chromatographic stationary phases. The normalized cumulative frequency distributions of relative consecutive distances calculated for their specific retention volumes indicate agreement with the general hypothesis that component retentions of complex mixtures are free energy random.

The analyst generally recognizes that the peaks observed in a chromatogram may not necessarily correspond to pure individual components of a sample, especially when the sample is complex. In the past, the severity of the component overlapping problem has generally been overlooked. Only recently have statistical models addressing this problem in quantitative terms been reported in the literature. Rosenthal developed a model of the balls-in-a-box type, based on combinatorial analysis, to describe the number of occurrences of singlets, doublets, ..., multiplets in a chromatogram (1). His experimental results were in excellent agreement with the predicted values for the number of overlap occurrences. The Rosenthal model, however, applies only to discrete chromatograms such as those provided by GC/MS with finite time scans. More recently, Davis and Giddings developed a model of component overlap in multicomponent chromatograms based on Poisson statistics (2), which applies to continuous chromatograms. They derived a relationship that allows one to predict the probable number of components in the sample from a knowledge of the number of separated peaks in a chromatogram and the peak capacity of the system. Using the analogy between the depolymerization and separation processes, Martin and Guiochon derived an expression for the probable number of sample components in terms of the areas of the peaks observed in the chromatogram (3). These models allow an estimate of the average probable number of components in the sample mixture, where the relative accuracy of the estimate is expected to increase with increasing component number. In addition, computer simulations of random chromatograms have shown that the Davis and Giddings model is strictly followed only when the system peak capacity exceeds the number of sample components (4-6).

The objective of the present work is therefore to provide a new model of the peak overlapping problem that can be applied equally well to chromatograms with large and with small component numbers. For this reason, the model must take into account the boundary effects, which have been neglected in previous works. Indeed, the probability of interference between component zones decreases in the vicinity of the extremities of the separation domain, because no zone can be present beyond these extremities. This effect is negligible for large system peak capacities. Most importantly, this work intends to extend the previous models by predicting not only the average values of the number of peaks or of the number of components but also the probability distributions of these numbers.

THEORY

In this model, one first considers that the number of components, m, is known. It represents the total number of detectable sample components eluting within an interval X. In order to take into account the boundary effects, it is assumed that two components are located at the extremities of the retention interval, X, expressed in units of time, volume, length, retention index, or any other appropriate thermodynamic scale. In addition, the m - 2 other sample components are assumed to arrange themselves randomly within the interval X. The validity of this assumption is discussed later in this paper. As in the previous models (2, 3), a given component is represented by a point located at the zone maximum or, if desired, at its center of gravity. In order for two adjacent components to appear as two separated peaks in a chromatogram, the distance between their representative points must be larger than or equal to some constant value, $x_0$. The assumption that $x_0$ is constant implies that the width of each individual component zone is constant. In fact, this equal peak width assumption is generally considered to be valid only over limited retention time ranges in gas chromatography with linear temperature programming as well as in reversed-phase liquid chromatography with proper linear programming of the mobile phase composition (7, 8). For this reason, it is best to determine the total number of components in a sample by breaking a long chromatogram up into several smaller intervals within which the individual peak widths can be assumed to be constant. The value of $x_0$ depends on the means used to scrutinize real chromatograms and to count the number of separated peaks. As previously discussed, it will be close to $6\sigma$, where $\sigma$ is the standard deviation of the zones, when baseline separation is required between two peaks, or close to $2\sigma$ when a peak is counted each time a maximum appears in the chromatogram.
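The model just described lends itself to direct simulation, which gives an independent check on the closed-form results that follow. The Python sketch below is our own illustration rather than anything from the original work: it pins two components at the ends of an interval of length X, scatters the other m - 2 uniformly, and counts one peak plus one more for every neighbour-to-neighbour gap of at least $x_0$, with $x_0$ taken from the peak capacity through n = 1 + X/x0. The function names, the default X = 1, the trial count, and the seed are arbitrary choices.

```python
import random
from statistics import mean


def simulate_peak_count(m: int, n: float, rng: random.Random, X: float = 1.0) -> int:
    """Draw one random chromatogram under the model and count its x0-resolved peaks."""
    if m < 2 or n <= 1:
        raise ValueError("need m >= 2 components and a peak capacity n > 1")
    x0 = X / (n - 1.0)  # minimum resolvable distance, from n = 1 + X/x0
    # Two components pinned at the extremities, the other m - 2 placed uniformly.
    positions = sorted([0.0, X] + [rng.uniform(0.0, X) for _ in range(m - 2)])
    # Every gap of at least x0 between neighbouring components starts a new peak.
    wide_gaps = sum(1 for a, b in zip(positions, positions[1:]) if b - a >= x0)
    return 1 + wide_gaps


def simulate_peak_counts(m: int, n: float, trials: int = 20000, seed: int = 1) -> list[int]:
    """Repeat the experiment to build an empirical distribution of the peak number p."""
    rng = random.Random(seed)
    return [simulate_peak_count(m, n, rng) for _ in range(trials)]


if __name__ == "__main__":
    counts = simulate_peak_counts(m=20, n=50)
    print(f"mean number of peaks: {mean(counts):.2f}")
    print(f"smallest and largest counts seen: {min(counts)}, {max(counts)}")
```

For m = 20 and n = 50 the simulated mean should come out close to 14.1, the average given by eq 3.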
While the effect of individual peak height variations on the value of $x_0$ is not explicitly accounted for in the model, it is felt that this might be done by considering $x_0$ as an average value resulting from these variations. This effect, however, seems to be relatively minor (4). The value of $x_0$ can be related to commonly used parameters reflecting the efficiency of the separation system, such as the peak capacity, n, defined here on the basis of an $x_0$ separation rather than the $4\sigma$ separation originally described (9):

0003-2700/86/0358-2200$01.50/00 1986 American Chemical Society

ANALYTICAL CHEMISTRY, VOL. 58, NO. 11, SEPTEMBER 1986

$$n = 1 + \frac{X}{x_0} \qquad (1)$$
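As a small worked illustration of eq 1, the helper below (a hypothetical name, not from the paper) converts a retention window X and a zone standard deviation $\sigma$ into a peak capacity for the two peak-counting criteria mentioned above, $x_0 \approx 6\sigma$ and $x_0 \approx 2\sigma$. It assumes $\sigma$ is constant across the window, as the model requires.

```python
def peak_capacity(X: float, sigma: float, k_sigma: float = 6.0) -> float:
    """Eq 1, n = 1 + X/x0, with the resolution distance taken as x0 = k_sigma * sigma.

    k_sigma = 6 corresponds to baseline separation, k_sigma = 2 to counting every
    maximum; sigma is assumed constant over the whole window X.
    """
    if X <= 0 or sigma <= 0 or k_sigma <= 0:
        raise ValueError("X, sigma and k_sigma must be positive")
    return 1.0 + X / (k_sigma * sigma)


if __name__ == "__main__":
    # Made-up example: a 30-min window and zones with sigma = 0.05 min.
    print(peak_capacity(30.0, 0.05, k_sigma=6.0))
    print(peak_capacity(30.0, 0.05, k_sigma=2.0))
```

With these made-up values the 6σ criterion gives n = 101 and the 2σ criterion gives n = 301, which shows how strongly n depends on the counting rule.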

With the model described above, the problem of determining the probability, $P_{m,n}(p)$, that p separated peaks are observed in a separation system with a fixed peak capacity, n, when analyzing a sample with a known number of components, m, becomes the problem of estimating the probability that exactly (p - 1) of the (m - 1) intervals between adjacent points are larger than or equal to $x_0$ when (m - 2) points are randomly distributed in the interval X. The solution to this classical probability problem has been known since the beginning of this century (9) and is given by (9, 10)

$$P_{m,n}(p) = \binom{m-1}{p-1}\sum_{i=0}^{\min[m-p,\;\lfloor n-p\rfloor]}(-1)^i\binom{m-p}{i}\left(1-\frac{p-1+i}{n-1}\right)^{m-2} \qquad (2)$$

In this equation, the summation is terminated when i becomes larger than the smaller of the two values (m - p) and the integer part of (n - p). Simple expressions for the average number of peaks, $\bar p$, and for the variance of the probability distribution of the number of peaks, $\sigma_p^2$, can be obtained:

$$\bar p = 1 + (m-1)\left(1-\frac{1}{n-1}\right)^{m-2} \qquad (3)$$

and

$$\sigma_p^2 = (m-1)\left(1-\frac{1}{n-1}\right)^{m-2} + (m-1)(m-2)\left(1-\frac{2}{n-1}\right)^{m-2} - (m-1)^2\left(1-\frac{1}{n-1}\right)^{2(m-2)} \qquad (4)$$
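Eqs 2-4 can be checked against one another numerically. The sketch below is an illustrative transcription under two assumptions of ours: the peak capacity n is an integer (so every term is an exact fraction and the alternating sum of eq 2 suffers no cancellation error), and the factor 1 - 2/(n - 1) in eq 4 is clipped at zero for very small n. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def peak_distribution(m: int, n: int) -> dict[int, Fraction]:
    """Exact P_{m,n}(p) of eq 2 for p = 1 .. min(m, n), for an integer peak capacity n."""
    if m < 2 or n < 2:
        raise ValueError("need m >= 2 and n >= 2")
    dist = {}
    for p in range(1, min(m, n) + 1):
        total = Fraction(0)
        # i stops at min(m - p, n - p), the termination rule stated under eq 2
        for i in range(min(m - p, n - p) + 1):
            x = Fraction(n - p - i, n - 1)  # 1 - (p - 1 + i)/(n - 1)
            total += (-1) ** i * comb(m - p, i) * x ** (m - 2)
        dist[p] = comb(m - 1, p - 1) * total
    return dist


def mean_peaks(m: int, n: int) -> Fraction:
    """Eq 3: average number of peaks."""
    return 1 + (m - 1) * Fraction(n - 2, n - 1) ** (m - 2)


def variance_peaks(m: int, n: int) -> Fraction:
    """Eq 4: variance of the number of peaks; 1 - 2/(n - 1) is clipped at 0."""
    q1 = Fraction(n - 2, n - 1)
    q2 = max(Fraction(0), Fraction(n - 3, n - 1))
    return ((m - 1) * q1 ** (m - 2)
            + (m - 1) * (m - 2) * q2 ** (m - 2)
            - (m - 1) ** 2 * q1 ** (2 * (m - 2)))


if __name__ == "__main__":
    dist = peak_distribution(20, 50)
    print({p: round(float(w), 3) for p, w in dist.items() if w > Fraction(1, 200)})
    print(float(mean_peaks(20, 50)), float(variance_peaks(20, 50)))
```

Summing the returned probabilities gives exactly 1, and the mean and variance formed from them coincide exactly with eqs 3 and 4, which makes this a convenient check on any transcription of eq 2.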

The major steps in the derivation of these expressions are indicated in the Appendix. Equations 2-4 thus provide a solution to the problem stated above; i.e., what is the probability of observing p peaks for a given number of components, m, and at a fixed peak capacity, n? The analyst, however, does not always know the number of components in the sample but can only count the number of peaks separated in the chromatogram. The problem then becomes the following: after having observed p peaks at a given, known peak capacity, n, what is the probability distribution of the number of components in the sample, $P_{p,n}(m)$? The answer is provided by Bayes' theorem on the probability of hypotheses:

$$P_{p,n}(m) = \frac{P_0(m)\,P_{m,n}(p)}{\sum_{m'=p}^{\infty} P_0(m')\,P_{m',n}(p)} \qquad (5)$$

In this equation, $P_0(m)$ represents the a priori probability that the sample mixture contains m components, such that

$$\sum_{m'=p}^{\infty} P_0(m') = 1 \qquad (6)$$

The analyst usually has no reason to suspect that the mixture is more likely to contain $m_1$ than $m_2$ components ($m_1 \neq m_2$) unless $m_1$ and $m_2$ differ greatly. Consequently, we will assume that $P_0(m')$ is essentially constant for all reasonable values of m. Equation 5 then becomes

$$P_{p,n}(m) = \frac{P_{m,n}(p)}{\sum_{m'=p}^{\infty} P_{m',n}(p)} \qquad (7)$$

Therefore, the desired probability that p observed peaks originate from a mixture of m components is equal to the one given by eq 2 divided by a normalization constant. In order to calculate this constant, the infinite sum in eq 7 can conveniently be replaced by a finite one, which is given in the Appendix.

RESULTS AND DISCUSSION

Examples of the Probability Distribution of the Number of Peaks Obtained with a Known Number of Components. The calculation of the probability given by eq 2 for each value of p between 1 and $\min[m, \lfloor n \rfloor]$ when m and n are fixed leads to a histogram of the probability distribution of the number of $x_0$-resolved peaks. Representative histograms are plotted in Figure 1 for the case m = 20 at four different values of the peak capacity, i.e., n = 10, 20, 50, and 100. In all cases the distributions appear approximately symmetric about some mean value. It is interesting to compare them with a Gaussian probability density distribution, $P'$, having the same mean value and variance as the true one:

$$P'(p) = \frac{1}{\sigma_p\sqrt{2\pi}}\exp\left[-\frac{(p-\bar p)^2}{2\sigma_p^2}\right] \qquad (8)$$

with $\bar p$ and $\sigma_p^2$ given by eqs 3 and 4. The corresponding curves are also plotted in Figure 1. They are seen to closely approximate the true distributions, especially for intermediate values of the peak capacity, when they are not truncated near the extremities of the range of p values, i.e., where m/n is neither much larger nor much smaller than 1. The probability distributions shown in Figure 1 give some complementary insight to previous works as to the seriousness of the peak overlapping problem. The most interesting feature of these distributions is the information contained in their breadth and, more specifically, their limits. For example, we see that for the case where m = 20 and n = 50, if we were to perform a single chromatographic experiment, we could not (even with good luck) reasonably expect to observe more than about 18 resolved peaks in the resulting chromatogram. Conversely, we realize from these data that there is also a small yet distinct possibility that we may be so unfortunate as to obtain as few as 10 resolved peaks. It is exactly this information on the breadth of the probability distribution that distinguishes the current model from all previous models, since they only yield estimates of the average number of expected peaks and hence contain no inherent information as to practical limits.

Probability That All the Components in a Mixture Are Separated. A common objective of the analyst is to resolve all the components of a sample. The theory developed above allows one to estimate the probability that this objective will be satisfied. The application of eq 2 to the case where p = m gives the probability, $P_{m,n}(m)$, that all the sample components are separated. Obviously, this can occur only when m < n, in which case one has

$$P_{m,n}(m) = \left(1-\frac{m-1}{n-1}\right)^{m-2} \qquad (9)$$
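The probability of complete separation, (1 - (m - 1)/(n - 1))^(m - 2), is simple enough to evaluate directly. This sketch (hypothetical function name) returns 0 when m > n, where full separation is impossible, and can be used to reproduce the 20% figure quoted later for m = 10 and n = 50, as well as the "less than 1%" thresholds quoted for peak capacities of about 200 and 1000 in the discussion of Figure 2.

```python
def prob_all_separated(m: int, n: float) -> float:
    """Eq 9: probability that all m components are x0-separated at peak capacity n."""
    if m < 2:
        raise ValueError("need at least two components")
    if m > n:
        return 0.0  # base would be negative: complete separation is impossible
    return (1.0 - (m - 1) / (n - 1)) ** (m - 2)


if __name__ == "__main__":
    print(f"m = 10, n = 50: {prob_all_separated(10, 50):.3f}")
    for n, m_values in ((200, (30, 31)), (1000, (68, 69))):
        for m in m_values:
            print(f"n = {n}, m = {m}: {prob_all_separated(m, n):.4f}")
```

With n = 200 the probability first drops below 1% at m = 31, and with n = 1000 it does so at m = 69, consistent with the rounded thresholds of about 30 and 70 components given in the text.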

The variation of this probability is plotted in Figure 2 as a function of the peak capacity for four values of m. One notes that only under the most favorable of conditions (i.e., relatively long analysis times and/or use of highly efficient columns operated under programming conditions) can peak capacities based on a $2\sigma$ separation definition be larger than approximately 1000 in capillary GC or 200 in packed-column gradient elution LC. Hence we see from Figure 2 that one has only a marginal hope (less than 1% chance) of fully separating all the components of a sample from a single-shot, arbitrarily chosen, nonoptimized chromatogram in LC and GC when their component numbers exceed about 30 and 70, respectively. We realize that eluent optimization procedures are directed toward making more efficient use of the separation space than procedures in which chromatographic conditions are chosen at random. Hence, complete separation of mixtures of greater complexity than those indicated above may frequently be possible when eluent compositions are optimized. One should not conclude, however, that the process of optimization somehow invalidates the randomness hypothesis of the model; rather, optimization can be viewed as simply the process whereby one searches for those sets of conditions where the probability distribution, $P_{m,n}(p)$, is at an extreme limit, i.e., where p = m.

Figure 1. Probability distributions of the number (p) of peaks observed in chromatograms containing m = 20 components at peak capacities (n) of 10, 20, 50, and 100. The dotted curves represent the Gaussian approximations.

Figure 2. Plots of the probability that all (m) sample components will be separated, $P_{m,n}(p = m)$, as a function of the system peak capacity, n, for the representative cases where m = 10, 30, 50, and 70.

In Figure 3, the minimum value of the peak capacity, $n_{\min}$, required to separate all sample components with a given probability, $P_s$, is plotted vs. the number of components for three values of $P_s$. According to eq 9, it is equal to

$$n_{\min} = 1 + \frac{m-1}{1 - P_s^{1/(m-2)}} \qquad (10)$$

and is seen to increase dramatically with small increases in component numbers. Indeed, when this number is large, $n_{\min}$ increases as $m^2/\ln(1/P_s)$. Figure 3 shows that for a single trial-and-error chromatogram, one has little hope of separating all the components of a mixture with a good probability, for instance 90%, unless one uses separation systems with very large peak capacities, and the associated very long analysis times, to analyze samples of low or moderate complexity. If, however, the operator were to take a trial-and-error approach in which, after an unsuccessful initial run, several independent phase systems of differing selectivity were tried sequentially, the probability that the sample mixture can be totally resolved on at least one of the phase systems increases dramatically, in accordance with the multiplication theorem for stochastically independent events. For example, let us assume that one has k mutually independent chromatographic systems or columns on which the sample components are randomly retained as described above. The probability $P_b(k)$ that the mixture is totally resolved on at least one column is the complement of the probability that the separation is incomplete on all columns. Since this latter event must occur for all independent columns, one has, according to the multiplication theorem for stochastically independent events,

$$P_b(k) = 1 - \prod_{j=1}^{k}\left[1 - P_{m,n_j}(m)\right] \qquad (11)$$

where $n_j$ is the peak capacity of the jth column.
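Eq 10 follows from solving eq 9 for n, which gives n_min = 1 + (m - 1)/[1 - P_s^(1/(m - 2))]. The sketch below (hypothetical names) evaluates it for the three probability levels of Figure 3 and also returns the large-m estimate m²/ln(1/P_s) quoted in the text, so that the two can be compared.

```python
import math


def min_peak_capacity(m: int, p_s: float) -> float:
    """Eq 10: smallest peak capacity giving probability p_s that all m components separate."""
    if m < 3 or not 0.0 < p_s < 1.0:
        raise ValueError("need m >= 3 and 0 < p_s < 1")
    return 1.0 + (m - 1) / (1.0 - p_s ** (1.0 / (m - 2)))


def min_peak_capacity_large_m(m: int, p_s: float) -> float:
    """Large-m estimate quoted in the text: n_min grows like m**2 / ln(1/p_s)."""
    return m ** 2 / math.log(1.0 / p_s)


if __name__ == "__main__":
    for p_s in (0.9, 0.5, 0.25):  # the three levels plotted in Figure 3
        print(p_s, [round(min_peak_capacity(m, p_s)) for m in (5, 10, 15, 20)])
    print(min_peak_capacity(2000, 0.5) / min_peak_capacity_large_m(2000, 0.5))
```

For m = 2000 and P_s = 0.5 the two estimates agree to within about 0.2%, whereas for the small m values of Figure 3 the asymptotic form is only a rough guide.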

Figure 3. Plot of the peak capacity required to ensure with a given probability that all (m) sample components will be $x_0$-separated vs. the number of such sample components at the three probability levels of 0.9, 0.5, and 0.25.

Figure 4. Probability distribution of the number (m) of possible sample components responsible for producing a chromatogram exhibiting p = 10 peaks when the system peak capacity is measured to be n = 20, $P_{p,n}(m)$. The dotted curve represents the "Gaussian approximation" (see text).

If the k columns have the same peak capacity, n, one has, from eqs 9 and 11,

$$P_b(k) = 1 - \left[1 - \left(1-\frac{m-1}{n-1}\right)^{m-2}\right]^k \qquad (12)$$
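Eqs 11 and 12 translate into a few lines of code. In this sketch (hypothetical function name) each column contributes its own eq 9 probability and, because the columns are assumed independent, the probabilities of failure multiply; passing ten equal peak capacities reproduces eq 12 with k = 10. The example values are those of the worked example that follows.

```python
def prob_resolved_on_some_column(m: int, peak_capacities: list[float]) -> float:
    """Eq 11: chance that at least one of several independent columns resolves all m components."""
    p_fail_all = 1.0
    for n in peak_capacities:
        p_single = 0.0 if m > n else (1.0 - (m - 1) / (n - 1)) ** (m - 2)  # eq 9
        p_fail_all *= 1.0 - p_single  # independent columns: failure probabilities multiply
    return 1.0 - p_fail_all


if __name__ == "__main__":
    single = prob_resolved_on_some_column(10, [50.0])
    ten = prob_resolved_on_some_column(10, [50.0] * 10)  # eq 12 with k = 10
    print(f"one column: {single:.2f}, ten columns: {ten:.2f}")  # one column: 0.20, ten columns: 0.89
```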

It can easily be calculated that for a 10-component mixture analyzed on a column with a peak capacity of 50, the probability of complete resolution of the mixture from a single experiment is only 20%. If the mixture is now injected on 10 independent, noncorrelated columns of similar peak capacity (n = 50), there is an 89% probability that the mixture will be totally resolved on at least one column.

Probability Distribution of the Number of Sample Components When the Number of Separated Peaks and the Peak Capacity Are Known. The determination of the probability distribution of the number of sample components is of direct interest to the analyst who has counted the number of $x_0$-separated peaks in a mixture chromatogram and can determine the peak capacity of the chromatographic system by appropriate peak broadening measurements on pure standard compounds with a chemical structure similar to that of the sample components. This probability distribution is computed from eq 7 in combination with eq 2. More simply, a nonnormalized distribution is directly calculable from eq 2 by neglecting the normalization constant in the denominator of eq 7. Figure 4 shows the histogram of such a probability distribution when 10 peaks are observed with a chromatographic system having a peak capacity of 20. The calculation of the probability $P_{p,n}(m)$ can be simplified, as shown above, by using the Gaussian approximation expressed in eq 8, combined with eqs 3 and 4. The approximation curve so obtained is also plotted in Figure 4 and is seen to be in close agreement with the exact histogram. Although it is referred to as a "Gaussian approximation", this curve is obviously no longer Gaussian, since the variable in eq 8 is now m rather than p, p being now known and hence fixed. Although the most probable value of the number of components is 19-20, the calculated probability distribution clearly indicates that there is a small yet real chance that a chromatogram exhibiting 10 peaks within an interval of peak capacity 20 could have originated from a mixture containing as few as 10 or 11 components or as many as 36 or 37. The importance of the current model thus lies in its ability to calculate this range of "possible" component numbers at any specified probability level from experimentally determined p and n values, rather than simply $\bar m$, the average expected component number.

Depending upon the relative values of the number of resolved peaks and of the system peak capacity, unimodal or bimodal distributions of $P_{p,n}(m)$ can be obtained. In Figure 5 a histogram of the component number distribution is plotted for p still equal to 10 but with n now equal to 50. An expanded-scale plot at relatively small values of m is also shown along with its Gaussian approximation. It is an interesting feature of the present model that its single probability expression (eq 7), or its Gaussian approximation, provides the component number probability distribution with either one or two modes whenever they are present. Moreover, the theory allows any prior knowledge about the number of sample components to be included in the calculations. For instance, if it is known before performing the chromatographic analysis that the component number does not exceed, let us say, 50, then $P_0(m')$ will be set equal to 0 in eq 6 for every value of m' > 50, and the distribution will have only the first mode shown in Figure 5.
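The Bayesian inversion of eqs 5-7 can be sketched as follows. This is our own illustration, with these assumptions stated plainly: the peak capacity is an integer, the infinite sum over m is truncated at a user-chosen m_max, and the prior is either flat (eq 7) or supplied as a function, for instance one that is zero above 50 components as in the example just discussed. Names are hypothetical.

```python
from fractions import Fraction
from math import comb
from typing import Callable, Optional


def likelihood(p: int, m: int, n: int) -> Fraction:
    """P_{m,n}(p) of eq 2, evaluated exactly (integer peak capacity n assumed)."""
    if m < 2 or p < 1 or p > min(m, n):
        return Fraction(0)
    total = Fraction(0)
    for i in range(min(m - p, n - p) + 1):
        total += (-1) ** i * comb(m - p, i) * Fraction(n - p - i, n - 1) ** (m - 2)
    return comb(m - 1, p - 1) * total


def component_posterior(p: int, n: int, m_max: int = 300,
                        prior: Optional[Callable[[int], float]] = None) -> dict[int, float]:
    """Posterior P_{p,n}(m) for m = p..m_max: eq 5, or eq 7 when prior is None (flat).

    The infinite sum over m is truncated at m_max (our assumption). With a flat
    prior and p = 1 that sum diverges, so the result would then depend on m_max.
    """
    weights = {}
    for m in range(max(p, 2), m_max + 1):
        w = likelihood(p, m, n)
        if prior is not None:
            w *= Fraction(prior(m))  # an unnormalized prior P_0(m) is enough
        weights[m] = w
    norm = sum(weights.values())
    return {m: float(w / norm) for m, w in weights.items()}


if __name__ == "__main__":
    post = component_posterior(10, 20, m_max=200)
    print("most probable m:", max(post, key=post.get))
    print("P(m <= 12):", round(sum(v for m, v in post.items() if m <= 12), 4))
```

For p = 10 and n = 20 the mode of the returned distribution should fall near the 19-20 components quoted above; with n = 50 and a prior that vanishes above 50, only the low-m mode survives.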
The results from the separation process, such as those shown in Figure 5, which constitute an a posteriori probability distribution of the number of components, can be used as an a priori distribution, $P_0(m')$, to be entered into eq 5 when a further independent analysis of the sample is to be performed. The outcome from such a pair of analyses should lead to the virtual disappearance of one of the two initial apparent modes and to a significant sharpening of the remaining principal mode.

Figure 5. Total probability distribution of the component number, $P_{p,n}(m)$, for p = 10 and n = 50. The first mode is also shown on an expanded scale. The y axis is in arbitrary units and is therefore not indicated.

Adherence of Real-World Chromatograms to the Theory. In a previous publication the underlying assumption of component retention time randomness was demonstrated for a limited set of real samples chromatographed by TLC and GLC methods (4). We now wish to choose a much larger set of data (multiple independent chromatograms for which we know the actual number of components in the mixture) and address the question of which of the retention parameters, e.g., interaction energies, capacity factors, etc., in GC and HPLC are most nearly random. In one of the earlier publications by Davis and Giddings (2) an argument was made that the components of truly complex mixtures, when chromatographed, should distribute themselves randomly along the standard chemical potential scale, $\Delta\mu^0$. If this is generally true, one would expect component retentions to be ln k' random rather than k' random under isocratic and isothermal conditions in HPLC and GC, respectively. To date, no direct evidence in support of this hypothesis has been advanced in the literature. We should stress that such a conclusion remains valid only when nonprogramming techniques are used in GC and HPLC, for it has previously been demonstrated in the literature that the apparent k' of a sample component under typical programming conditions becomes nearly proportional to its interaction free energy at the beginning of the gradient program, and hence proportional to its ln k' at the start of the program (7, 8). Thus, if the hypothesis of random standard chemical potentials is to be accepted as a general phenomenon in the chromatography of complex mixtures, large volumes of data must be scrutinized and shown to be ln k' random and apparent k' random under nonprogramming and programming conditions, respectively. The data and discussion below should be viewed by the reader as only a start in that ongoing process. A literature search was begun for the largest possible set of chromatographic retention data of pure compounds possessing diverse polarities, functional groups, shapes, etc., such that when artificially mixed they could be representative of the elution spectrum of a "typical" complex mixture. A rather large set of such data can be found in the handbook of McReynolds (12). Therein the specific retention volumes, $V_g$, of approximately 340 compounds on 79 distinct stationary phases at two different isothermal temperatures are tabulated. From these data we have chosen to use a much smaller subset of 20 compounds on 65 stationary phases to test the above hypothesis of free energy randomness, for several reasons.
First, the data are not complete for all compounds on all stationary phases at a common analysis temperature. Second, many of the compounds tabulated are members of the same homologous series as other entries in the handbook; they are thus clearly not randomly spaced, nor can they be assumed to be present in abundance in our hypothetical complex mixture. For this reason, all structural homologues that are known to be nearly equally spaced on the ln k' scale were eliminated from consideration. Beyond these requirements and restrictions, an attempt was made to include at least one compound of each functional group type in the working subset of 20 and to choose compounds having approximately the same number of carbon atoms. Had the data set been much larger, i.e., had it contained many more compounds from among numerous additional functional groups, inclusion of all components (including homologues) could have been justified. However, when one is forced to use a limited subset of data to arrive at some conclusion as to the inherent distribution of a larger parent population, obvious outliers at the extremes of the distribution, as well as consecutive data points having obvious correlation, need to be excluded from the limited subset. A list of the 20 eluites and 65 stationary phases selected can be found in Table I. We have chosen to determine which of the two retention parameters of the 20 compounds, ln $V_g$ or $V_g$, is most nearly Poisson distributed on each of the 65 stationary phases by attempting to fit the data to the following normalized cumulative Poisson probability function:

$$\frac{\ln\left(1 - \dfrac{N'}{m-1}\right)}{-m} = \frac{x_0}{X} \qquad (13)$$
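The linearized test of eq 13, together with the weighted sum of squares used later in eq 14, can be tried on synthetic data. This is a sketch of our own, and the way N' is counted is an assumption: the m - 1 relative consecutive distances are sorted, the i-th smallest one is taken as $x_0/X$ with N' = i, and the largest distance is dropped because it would make the logarithm diverge (which leaves the m - 2 points mentioned in the text). Uniform random retentions should scatter about the unit-slope line, whereas perfectly regular spacing should not. Function names are hypothetical.

```python
import math
import random


def poisson_plot_points(retentions: list[float]) -> list[tuple[float, float]]:
    """Return the (x_i, y_i) points of eq 13 for one set of m retention values.

    x_i is the i-th smallest relative consecutive distance x0/X, and
    y_i = ln(1 - N'/(m - 1)) / (-m) with N' = i (our counting assumption).
    The largest distance is dropped because it would give ln(0).
    """
    values = sorted(retentions)
    m = len(values)
    if m < 3 or values[-1] == values[0]:
        raise ValueError("need at least three distinct retention values")
    X = values[-1] - values[0]
    gaps = sorted((b - a) / X for a, b in zip(values, values[1:]))  # m - 1 distances
    return [
        (x, math.log(1.0 - i / (m - 1)) / (-m))
        for i, x in enumerate(gaps[:-1], start=1)  # i = 1 .. m - 2
    ]


def weighted_r2(retentions: list[float]) -> float:
    """Eq 14: sum over the m - 2 points of (y_i - x_i)**2 / y_i."""
    return sum((y - x) ** 2 / y for x, y in poisson_plot_points(retentions))


if __name__ == "__main__":
    rng = random.Random(3032)  # arbitrary seed
    random_set = [rng.random() for _ in range(20)]  # 20 uniform "retentions"
    even_set = [k / 19 for k in range(20)]  # perfectly regular spacing
    print(f"random retentions: r2 = {weighted_r2(random_set):.4f}")
    print(f"evenly spaced:     r2 = {weighted_r2(even_set):.4f}")
```

Applied to ln $V_g$ rather than $V_g$ values of one stationary phase, the same two functions would give a per-phase r² of the kind that the text averages over the 65 phases.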

Hence a plot of the cumulative integer number of points, N', separated from their nearest neighbors by less than the relative distance $x_0/X$, when plotted as $\ln(1 - N'/(m-1))/(-m)$ vs. $x_0/X$, will yield a straight line with a slope of 1.0 and an intercept equal to zero. In Figure 6 the normalized cumulative distributions of relative consecutive distances in units of $V_g$ and ln $V_g$ for the 20 eluites of Table I are plotted for each of the 65 stationary phases and shown superimposed on single plots of $\ln(1 - N'/(m-1))/(-m)$ vs. $x_0/X$. Note that each point in these plots represents one consecutive distance on one of the 65 chosen stationary phases. The line of slope 1.0 and zero intercept predicted for ideally random data is also drawn. It is easily seen that the GLC retention data of the 65 stationary phases, when expressed on the ln $V_g$ retention scale, are on average more evenly centered about the predicted line for ideally random data than when scrutinized as specific retention volumes, $V_g$. On the $V_g$ retention scale the consecutive distances are clearly and consistently lower than predicted for randomly placed components on nearly all the stationary phases studied. Also shown in the figure are the normalized cumulative Poisson distributions for 20 data points generated by using the random number generator function of the Commodore 3032 computer and repeated 65 times so as to closely simulate the GLC retention data, but where the data are now known to be a finite subset of an infinite random number population. As anticipated, there are in all instances large degrees of scatter about the predicted lines owing to the finiteness of the data sets. Via comparisons to the normalized cumulative frequency distribution of the random number generator data, it is possible to obtain a numerical, qualitative measure of the extent to which the finite $V_g$ and ln $V_g$ data are Poisson distributed. Because all the data presented above must fall along the same universal line according to eq 13, a simple sum of squares of deviations between the observed data points of Figure 6 for each stationary phase and the predicted ideal line will be an adequate numerical measure of the apparent randomness of the finite data population. Defining $y_i$ and $x_i$ to be equivalent to the left-hand and right-hand sides of eq 13, respectively, we calculate the sum of squared deviations for the m - 2 consecutive intervals of each stationary phase as

$$r^2 = \sum_{i=1}^{m-2}\frac{(y_i - x_i)^2}{y_i} \qquad (14)$$

We divide the observed squared deviation by $y_i$ in eq 14 because the deviations are observed to increase nearly linearly with increasing $y_i$, and we hence choose to weight the residuals accordingly. The $r^2$ values of the 65 stationary phases were averaged for the $V_g$, ln $V_g$, and random number generator data of Figure 6 and were calculated to be 4.7 × 10^…, 2.7 × 10^…, and 2.6 × 10^…, respectively. The corresponding standard deviations in $r^2$ were 3.1 × 10^…, 1.4 × 10^…, and 1.9 × 10^…, respectively. We note that there is sizable overlap in these distributions, such that on 12 of the 65 stationary phases the $V_g$ data of the chosen 20 compounds are more random according to the criterion of eq 13 than when expressed on the ln $V_g$ scale. However, on the great majority of the stationary phases studied, ln $V_g$ values are observed to be considerably more randomly spaced than $V_g$ values, as evidenced by the high correlation between the cumulative frequency distributions of the ln $V_g$ data and the random number generated data. Hence we conclude that our findings are in general agreement with the hypothesis of Davis and Giddings; i.e., component retention of complex mixtures in chromatographic systems is free energy random.

Figure 6. Normalized cumulative frequency distributions of relative consecutive distances, $x_0/X$. Comparisons between the $V_g$ (upper plot) and the ln $V_g$ (lower plot) data for the 20 eluites on the 65 GC stationary phases listed in Table I. The cumulative distributions calculated for 65 repetitions of a subset of 20 data from a random number generator are also shown (middle plot).

APPENDIX

Average and Standard Deviation of the Probability Distribution of the Number of Peaks When m and n Are Fixed. Let r be the number of intervals between adjacent points larger than or equal to $x_0$. One has

$$r = p - 1 \qquad (A\text{-}1)$$

so that, from eq 2, the probability distribution of r is

$$P''_{m,n}(r) = P_{m,n}(r+1) = \binom{m-1}{r}\sum_{i=0}^{\min[m-1-r,\;\lfloor n-1-r\rfloor]}(-1)^i\binom{m-1-r}{i}\left(1-\frac{r+i}{n-1}\right)^{m-2} \qquad (A\text{-}2)$$

From eq A-1, one has

$$\bar p = 1 + \bar r \qquad (A\text{-}3)$$

with

$$\bar r = \sum_{j=0}^{\min(m-1,\,n-2)} j\,P''_{m,n}(j) \qquad (A\text{-}4)$$

Developing the sum in eq A-4 for each value of j with the help of eq A-2 and rearranging the result by grouping terms with a constant value of j + i (denoted k), one gets

$$\bar r = \sum_{k=1}^{\min(m-1,\,n-2)}\left(1-\frac{k}{n-1}\right)^{m-2}\frac{(m-1)!}{(m-1-k)!}\sum_{j=1}^{k}\frac{(-1)^{k-j}\,j}{j!\,(k-j)!} \qquad (A\text{-}5)$$

For all values of k except 1, one can write

$$\sum_{j=1}^{k}\frac{(-1)^{k-j}\,j}{j!\,(k-j)!} = \frac{1}{(k-1)!}\sum_{j=1}^{k}(-1)^{k-j}\binom{k-1}{j-1} \qquad (A\text{-}6)$$

The sum in the right-hand term of eq A-6 is easily recognized as equal to $[1 + (-1)]^{k-1}$, which is zero, according to the binomial theorem. Therefore, the double sum in eq A-5 is reduced to only the k = 1 term, which gives

$$\bar r = (m-1)\left(1-\frac{1}{n-1}\right)^{m-2} \qquad (A\text{-}7)$$

From eq A-3, one gets the average value of p given in eq 3.
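The binomial cancellations that drive this derivation can be confirmed with exact rational arithmetic. The verification sketch below is ours, not part of the original Appendix, and its function names are illustrative: it evaluates the inner sum of eq A-5 and the analogous inner sum of eq A-9 (the one needed for the variance) for k = 1 to 7, and rebuilds $\bar r$ term by term from eq A-5 to compare it with the closed form (m - 1)(1 - 1/(n - 1))^(m - 2).

```python
from fractions import Fraction
from math import factorial


def mean_inner_sum(k: int) -> Fraction:
    """Inner sum of eq A-5: sum over j = 1..k of (-1)**(k-j) * j / (j! (k-j)!)."""
    return sum(
        (Fraction((-1) ** (k - j) * j, factorial(j) * factorial(k - j)) for j in range(1, k + 1)),
        Fraction(0),
    )


def variance_inner_sum(k: int) -> Fraction:
    """Inner sum of eq A-9: sum over i = 1..k of (-1)**(k-i) * i / ((i-1)! (k-i)!)."""
    return sum(
        (Fraction((-1) ** (k - i) * i, factorial(i - 1) * factorial(k - i)) for i in range(1, k + 1)),
        Fraction(0),
    )


def mean_r_via_a5(m: int, n: int) -> Fraction:
    """Rebuild r-bar term by term from eq A-5 (integer n assumed)."""
    total = Fraction(0)
    for k in range(1, min(m - 1, n - 2) + 1):
        x = Fraction(n - 1 - k, n - 1)  # 1 - k/(n - 1)
        total += x ** (m - 2) * Fraction(factorial(m - 1), factorial(m - 1 - k)) * mean_inner_sum(k)
    return total


if __name__ == "__main__":
    print([str(mean_inner_sum(k)) for k in range(1, 8)])      # ['1', '0', '0', '0', '0', '0', '0']
    print([str(variance_inner_sum(k)) for k in range(1, 8)])  # ['1', '1', '0', '0', '0', '0', '0']
    print(mean_r_via_a5(20, 50) == 19 * Fraction(48, 49) ** 18)  # True
```

Only k = 1 survives in the mean and only k = 1 and k = 2 survive in the variance, which is exactly why eqs 3 and 4 contain one and two power terms, respectively.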

2206

ANALYTICAL CHEMISTRY, VOL. 58, NO. 11, SEPTEMBER 1986

Table I. Eluites and Stationary Phases Selected from Reference 12 To Test the Random Interaction Energy Hypothesis

Eluites: isobutyl alcohol; sec-butyl alcohol; tert-butyl alcohol; 3-buten-1-ol; 3-buten-2-ol; butyraldehyde; isobutyraldehyde; 2-butanone; 3-buten-2-one; propyl formate; isopropyl formate; ethyl acetate; methyl propionate; methyl acrylate; ethyl methyl formal; ethylene glycol acetal; propyl methyl ether; 1,2-butylene oxide; trans-2,3-butylene oxide; tetrahydrofuran

GLC stationary phases: Apiezon J; Apiezon N; Carbowax 400; Carbowax 1540; Carbowax 20M; dibutyl tetrachlorophthalate; diisodecyl phthalate; docosanol; Ethofat 60-25; Hallcomid M18 OL; neopentyl glycol adipate; Oronite NIW; Pluronic F88; Pluronic L63; Pluronic P65; polyphenyl ether-5 rings; SE 30; SE 52; sucrose octaacetate; tricresyl phosphate; Ucon LB-1715; XF 1150; Apiezon L; bis(2-ethoxy) phthalate; Carbowax 600; Carbowax 4000; Castorwax; di-2-ethylhexyl adipate; dioctyl phthalate; Dow Corning 550 Fluid; Flexol 8N8; Igepal CO 880; neopentyl glycol adipate terminated; Pluronic F68; Pluronic L42; Pluronic L72; Pluronic P84; polyphenyl ether-6 rings; SE 30; polyester NPGA terminated; squalane; Tergitol NPX; triethylene glycol succinate; Ucon 50 HB-2000; Zonyl E7; Apiezon M; Carbowax 300; Carbowax 1000; Carbowax 6000; Citroflex A-4; di-2-ethylhexyl sebacate; dioctyl sebacate; Dow Corning FS 1265 Fluid; Hallcomid M18; isooctyl decyl adipate; neopentyl glycol succinate; Pluronic F77; Pluronic L44; Pluronic P46; Pluronic P85; Quadrol; SE 31; sucrose acetate isobutyrate; TMP tripelargonate; Triton X 305; Versilube F-50

The variance of the $p$ distribution is equal to the variance of the $r$ distribution and is given by

$$\sigma_p^2 = \sum_{r=0}^{\min(m-1,\,n-2)} r^2\,P_{m,n}(r+1) - \bar{r}^2 = S - \bar{r}^2 \tag{A-8}$$

The double sum $S$ can be rearranged similarly to what has been done above to give

$$S = \sum_{k=1}^{\min(m-1,\,n-2)} \frac{(m-1)!}{(m-1-k)!}\left(1-\frac{k}{n-1}\right)^{m-2} \sum_{i=1}^{k} \frac{i\,(-1)^{k-i}}{(i-1)!\,(k-i)!} \tag{A-9}$$

Writing $i = (i-1) + 1$, one can express the inner sum of eq A-9, $S'$, as

$$S' = \sum_{i=1}^{k} \frac{i\,(-1)^{k-i}}{(i-1)!\,(k-i)!} = \sum_{i=2}^{k} \frac{(-1)^{k-i}}{(i-2)!\,(k-i)!} + \sum_{i=1}^{k} \frac{(-1)^{k-i}}{(i-1)!\,(k-i)!} \tag{A-10}$$

$S'$ can be rearranged in order to make apparent the binomial coefficients in the two sums of eq A-10. With $j = k - i$, one gets

$$S' = \frac{1}{(k-2)!}\sum_{j=0}^{k-2} (-1)^{j}\binom{k-2}{j} + \frac{1}{(k-1)!}\sum_{j=0}^{k-1} (-1)^{j}\binom{k-1}{j} \tag{A-11}$$

where the first term is absent for $k = 1$. According to the binomial theorem, $\sum_{j=0}^{N}(-1)^j\binom{N}{j} = 0$ for $N \geq 1$. Each sum in eq A-11 is therefore nonzero only when it reduces to its first, $j = 0$, term, that is, for $k = 2$ and $k = 1$, respectively. Hence $S' = 1$ for $k = 1$ and $k = 2$, and $S' = 0$ for $k \geq 3$. Substituting into eq A-9 gives for $S$

$$S = (m-1)\left(1-\frac{1}{n-1}\right)^{m-2} + (m-1)(m-2)\left(1-\frac{2}{n-1}\right)^{m-2} \tag{A-12}$$

After rearrangement of $S$ and combination with eq A-8, one gets the desired expression for $\sigma_p^2$ given in eq 4.

Normalization Constant of the Probability Distribution of the Component Number When p and n Are Fixed. This normalization constant, $C$, is equal to

$$C = \sum_{m=p}^{\infty} P_{m,n}(p) \tag{A-13}$$

where $P_{m,n}(p)$ is given by eq 2. Developing the sum for each value of $m$ and grouping terms with constant value of $m + i$, it becomes

$$C = \sum_{i=0}^{n-p-1} \frac{(-1)^i}{(p-1)!\,i!} \sum_{j=0}^{\infty} \frac{(p-1+i+j)!}{j!}\left(1-\frac{p-1+i}{n-1}\right)^{p-2+i+j} \tag{A-14}$$

It is easy to show that the function

$$S_k(x) = \sum_{j=0}^{\infty} \frac{(k+j)!}{k!\,j!}\,x^j \tag{A-15}$$

obeys the following recurrence relationship:

$$S_k(x) = \frac{S_{k-1}(x)}{1-x} \tag{A-16}$$

Since $S_0(x)$ represents the serial development of $1/(1-x)$, one gets

$$S_k(x) = \frac{1}{(1-x)^{k+1}} \tag{A-17}$$

If one replaces $k$ by $(p-1+i)$ and $x$ by $[1-(p-1+i)/(n-1)]$ in eq A-17 and combines the results with eq A-14, it becomes

$$C = \sum_{i=0}^{n-p-1} (-1)^i\,\frac{(p-1+i)!}{(p-1)!\,i!}\left(1-\frac{p-1+i}{n-1}\right)^{p-2+i}\left(\frac{n-1}{p-1+i}\right)^{p+i} \tag{A-18}$$

After rearrangement, one obtains

$$\left(1-\frac{p-1+i}{n-1}\right)^{p-2+i}\left(\frac{n-1}{p-1+i}\right)^{p+i} = \frac{(n-1)^2}{(n-p-i)(p-1+i)}\left(\frac{n-p-i}{p-1+i}\right)^{p-1+i} \tag{A-19}$$

Combining eq A-18 and A-19 and noting that

$$T_i \equiv \frac{(p-1+i)!}{(p-1)!\,i!} = \prod_{l=1}^{i} \frac{p-1+l}{l} \tag{A-20}$$

with the convention that the product is equal to 1 if $i = 0$, one obtains the final expression for $C$:

$$C = (n-1)^2 \sum_{i=0}^{n-p-1} \frac{(-1)^i\,T_i}{(n-p-i)(p-1+i)}\left(\frac{n-p-i}{p-1+i}\right)^{p-1+i} \tag{A-21}$$

For practical computational purposes, one can note that $T_i = T_{i-1}(p-1+i)/i$, with $T_0 = 1$. Although still expressed as a sum, eq A-21 is much more convenient for practical purposes than eq A-13 or A-14. Its sum contains only $n - p$ terms, while the latter equations are made of an infinite number of terms.

ACKNOWLEDGMENT
Fruitful discussions with Joseph Gril and Jean-Baptiste Leblond of the Laboratoire de Mécanique des Solides, Ecole Polytechnique, are gratefully acknowledged.

LITERATURE CITED
(1) Rosenthal, D. Anal. Chem. 1982, 54, 63-66.
(2) Davis, J. M.; Giddings, J. C. Anal. Chem. 1983, 55, 418-424.
(3) Martin, M.; Guiochon, G. Anal. Chem. 1985, 57, 289-295.
(4) Herman, D. P.; Gonnord, M. F.; Guiochon, G. Anal. Chem. 1984, 56, 995-1003.
(5) Davis, J. M.; Giddings, J. C. J. Chromatogr. 1984, 289, 277-298.
(6) Giddings, J. C.; Davis, J. M.; Schure, M. R. Ultrahigh Resolution Chromatography; Ahuja, S., Ed.; American Chemical Society: Washington, DC, 1984; ACS Symposium Series 250, pp 9-26.
(7) Habgood, H. W.; Harris, W. E. Anal. Chem. 1960, 32, 450-453.
(8) Snyder, L. R. High-Performance Liquid Chromatography: Advances and Perspectives; Horváth, C., Ed.; Academic Press: New York, 1980; Vol. 1, pp 207-318.
(9) Giddings, J. C. Anal. Chem. 1967, 39, 1027-1028.
(10) Kendall, M. G.; Moran, P. A. P. Geometrical Probability; Charles Griffin & Co. Ltd.: London, 1963; Chapter 2.
(11) Santalo, L. A. Integral Geometry and Geometric Probability; Encyclopedia of Mathematics and Its Applications; Addison-Wesley: London, 1976; Vol. 1, Chapter 2.
(12) McReynolds, W. O. Gas Chromatographic Retention Data; Preston Technical Abstracts: Evanston, IL, 1966.

RECEIVED for review October 19, 1984. Resubmitted May 5, 1986. Accepted May 5, 1986. Part of this paper was presented at the 15th International Symposium on Chromatography, Nurnberg, Germany, October 1-5, 1984.

Anal. Chem. 1986, 58, 2207-2212

Physical Model for Gas-Solid Chromatography with Volatile and Nonvolatile Modifiers

Jon F. Parcher,* Ping J. Lin, and David M. Johnson

Chemistry Department, University of Mississippi, University, Mississippi 38677

The two-dimensional (adsorption) version of the scaled particle solution theory has been used to interpret the single-component isotherms of propane, butane, and acetone on a graphitized carbon black adsorbent over a range of temperatures. In addition, the multicomponent adsorption model has been used to develop an equation to describe the retention volume of an infinite dilution solute as a function of surface coverage by a volatile modifier. It has also been used to interpret the experimental results from previous investigations of such quasi-binary chromatographic systems. Several sets of binary chromatographic data were regressed to the proposed retention volume equation. These data took the form of the retention volume of one solute as a function of the surface coverage by a different solute. The adsorption coefficients of each solute obtained from the binary isotherm data (propane + butane and acetone + butane) agreed with those determined from the single-component isotherms. They also agreed with those calculated from specific retention volumes of the infinite dilution solutes with no modifier present. The retention volume equation provides the first quantitative description of the retention of one solute as a function of the surface coverage by a second component.

Volatile “modifiers” are commonly used in gas, liquid, and supercritical fluid chromatography (SFC) to decrease the retention and improve the peak shape of eluted samples. The modifiers are usually low molecular weight, polar materials, such as water, alcohols, or amines. They are typically added to the inert gas mobile phase in relatively low concentrations. Nonvolatile modifiers have been used as well to improve the efficiency of gas-solid chromatographic systems. In particular, the graphitized carbon black adsorbents, such as the Carbopacks, are commonly coated with a low percentage of a polar liquid phase. Although the use of these modifiers is common, they introduce a certain “black-magic” aspect to chromatography. No theoretical model exists for predicting the effect of a given modifier in any chromatographic system. Most of the current applications involving volatile or nonvolatile modifiers have been developed empirically by experienced chromatographers. Thus, there is a tremendous need for a sound theoretical model that will allow the design of a system for a particular separation that requires the use of some type of modifier.

Recently, supercritical fluid chromatography has emerged as a viable separation method to bridge the gap between GC and HPLC. Many studies (1-5) have shown that volatile modifiers added to the supercritical mobile phase improved the chromatographic separations in SFC. The most common interpretation of the effect of polar modifiers, such as ethanol, was that the modifiers interacted with the stationary phase to block residual active sites, usually silanol groups, on the solid support. However, Hirata (4) has shown that the mechanisms are probably more complex. In some cases the modifier, especially at high concentrations, can cause an increase in the retention of some solutes. This enhancement has been attributed either to decreased solubility of the solute in the mobile phase or to increased adsorption on, or absorption in, the stationary phase (3, 4). However, the exact mechanisms are uncertain. In addition, the physical properties, such as

0003-2700/86/0358-2207$01.50/0 © 1986 American Chemical Society