Anal. Chem. 1985, 57, 289-295
Analogy between the Depolymerization and Separation Processes. Application to the Statistical Evaluation of Complex Chromatograms
Michel Martin* and Georges Guiochon
Laboratoire de Chimie Analytique Physique, Ecole Polytechnique, 91128 Palaiseau, France
The separation process is regarded as a peak-generating process, and an analogy is made between this process and a polymer degradation process. A new index of performance is defined, the extent of separation. Relationships between the number of sample components, the number of observed peaks, and the number of multiplets are derived, from their analogue counterparts, in terms of the extent of separation. They are shown to be identical with those derived by Davis and Giddings provided the extent of separation is appropriately related to the peak capacity. A method is proposed for the determination of the extent of separation and the number of sample components from the measurement of the peak areas performed by any suitable integrator. The method is applied to GC crude oil fractions and LC fat samples.
Nowadays, the reports produced by analytical chemists are frequently of the utmost importance for many industrial, environmental, or biological processes. Highly accurate and precise analytical data are requested about the major components of the sample, as well as about significant minor constituents, sometimes present at trace levels. This implies minimal interference between the sample constituents at the measurement stage of the analytical system, which can be achieved either by using highly selective, as well as sensitive, measuring devices or by physically sorting the sample constituents. For this reason, highly efficient analytical separation techniques have been developed in the last two decades and are still being developed today. The driving force for this development is the achievement of higher and higher efficiencies in order to separate more and more components. Good examples of this trend are the widespread use of long capillary gas chromatographic columns and the effort to use relatively long, possibly microbore, columns packed with very small particles in liquid chromatography. The peak capacity of such columns, which is the maximal number of components separated with a stated minimum, usually unit, resolution (1), generally lies in the range 50-300. However, such values are too low for resolving highly complex mixtures, such as many samples of natural origin. Accordingly, in order to increase the overall resolving power, separation methods have been combined with analytical methods, such as gas chromatography and mass spectrometry, or combined together. For instance, high-resolution two-dimensional electrophoresis (2) allows the quantitation of several thousand proteins in physiological samples (3). Similarly, two-dimensional liquid chromatography is expected to yield peak (or spot) capacities exceeding $10^3$ (4).
However, in spite of these high peak capacity values, it has been recognized in the past few years that there are fewer resolved peaks than expected, even when the number of components in the sample is much lower than the peak capacity. This has been pointed out by Rosenthal who found
that, during a GC/MS analysis, the reason for the computer's failure to correctly identify some components in a complex sample was that the mass spectrum presented to the library search routine corresponded to that of a mixture of components rather than that of a single compound (5). He obtained excellent agreement between the number of occurrences of overlaps in the chromatogram and the theoretical prediction of the number of multiplets, based on combinatorial analysis. The seriousness of the component overlapping problem has recently been put in quantitative terms by Davis and Giddings, who, using Poisson statistics, showed that, on the average, the maximum number of observable peaks in a chromatogram cannot exceed 37% of the peak capacity and that, worse, in this case, only 14% of the components can be isolated as pure substances (6). The Davis and Giddings approach is based on the assumption of a random distribution of the retention times of the sample components. Although this appears to many chromatographers to be an unrealistic hypothesis, as it gives the impression that nothing is understood about the retention mechanisms and elution order of compounds, statistical analysis of data for various samples in GC as well as in LC has shown that this is a generally correct assumption, except in obvious cases, such as homologous series (7, 8). One of the interesting results of this theory is that it gives a simple relationship between the number of observed peaks in the chromatogram and the probable number of components in the sample in terms of the system peak capacity. Strictly speaking, the Davis and Giddings model assumes, apart from the fact that retention times are uniformly (randomly) distributed, that the component zones have equal peak width and peak height and that the density of components is constant all over the retention interval to which the statistical theory is applied.
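The 37% and 14% figures quoted above can be recovered numerically. The sketch below assumes the Davis and Giddings expressions that this paper tabulates further on (Table II), namely p = m·exp(-α) and n_1 = m·exp(-2α), with α = m/n_c the saturation factor; the function names and the scanning grid are illustrative choices, not part of the original work.

```python
import math


def peaks_per_capacity(alpha: float) -> float:
    """Expected observed peaks per unit peak capacity: p/n_c = alpha * exp(-alpha)."""
    return alpha * math.exp(-alpha)


def singlet_fraction(alpha: float) -> float:
    """Expected fraction of components eluting as singlets: n_1/m = exp(-2*alpha)."""
    return math.exp(-2.0 * alpha)


# Scan the saturation factor alpha = m/n_c on a grid (spacing chosen arbitrarily).
alphas = [i / 1000 for i in range(1, 5001)]
best_alpha = max(alphas, key=peaks_per_capacity)

print(f"alpha at the maximum of p/n_c : {best_alpha:.3f}")
print(f"maximum p/n_c                 : {peaks_per_capacity(best_alpha):.3f}")
print(f"singlet fraction n_1/m there  : {singlet_fraction(best_alpha):.3f}")
```

Under these assumptions the maximum falls at α = 1, where the number of components equals the peak capacity, which is why the singlet fraction can be quoted either relative to the components or relative to the peak capacity.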
The equal peak width assumption is usually valid for most complex analyses, which are generally performed in gradient conditions, that is, temperature programming in GC and gradient elution in LC. The equal peak height assumption has been shown to give estimates of the number of sample components which are not significantly different from those obtained with more realistic peak height distributions (7, 9). In the following, we present a different approach to the peak overlapping problem, based on the analogy between the separation and depolymerization processes. This allows the definition of a new parameter, which we call the extent of separation, by analogy with the classical definition of the extent of a chemical reaction. While parameters such as the plate number or the peak capacity characterize the efficiency of a chromatographic system whatever the separation difficulty, the extent of separation measures the efficiency of a chromatographic operation with regard to a specific separation problem. A method is provided for the estimation of this extent of separation. The results of the analogy are compared with those of the Davis and Giddings model.

© 1984 American Chemical Society

THEORY

Polymerization and Depolymerization Processes. Assume we have a very large number, m, of molecules of a
ANALYTICAL CHEMISTRY, VOL. 57, NO. 1, JANUARY 1985
substance which may react together to give oligomeric and polymeric molecules. Each molecule of the reaction mixture can be characterized by its degree of polymerization, k, which is the number of structural units it contains. After some time, the reaction mixture contains molecules with values of k varying over a wide range. Usually, there is a systematic variation of the frequency of occurrence of a given degree of polymerization, which may be expressed as a distribution function, $X_k = f(k)$, where $X_k$ represents the mole fraction of k-mer in the mixture. We are interested in the form of the distribution function when the polymerization reaction is random, that is, when a given molecule has an equal chance to bind to any other molecule present in the mixture whatever its k value. In practice, such a reaction can be modeled by the condensation polymerization of a bifunctional substance, such as H2N-R-COOH, which would polymerize to molecules of the form H2N-R-CO(-HN-R-CO)$_{k-2}$-HN-R-COOH, where k is the degree of polymerization (10). Indeed, in such a case, it can be assumed that there is virtually no change in the reactivity of the functional groups with chain length. Then, the distribution function can be simply obtained in terms of the De Donder extent of reaction, $\beta$. It is known as the Flory most probable distribution and is expressed as (11, 12)

$$X_k = (1 - \beta)\,\beta^{k-1} \tag{1}$$

The extent of reaction represents the fraction of the functional groups of a given type, for instance, the acid group of an amino acid, which have reacted. Since each molecule in the reaction mixture contains one and only one nonreacted acid group, the fraction of these nonreacted acid groups represents the ratio of the total number of polymeric molecules at a given time, p, to the number of initial monomers, m

$$p/m = 1 - \beta \tag{2}$$
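As a quick plausibility check on eq 1 and 2, the following sketch sums the Flory distribution numerically. It illustrates two properties that follow from the geometric form: the mole fractions add up to 1, and the number-average degree of polymerization equals 1/(1 - β), which is the ratio m/p of eq 2. The function names and the truncation limit `k_max` are assumptions made for this illustration.

```python
def flory_mole_fraction(k: int, beta: float) -> float:
    """Eq 1: mole fraction of k-mers at extent of reaction beta."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    return (1.0 - beta) * beta ** (k - 1)


def truncated_sums(beta: float, k_max: int = 500) -> tuple:
    """Return (sum of X_k, sum of k*X_k) for k = 1..k_max.

    k_max is an arbitrary truncation; the neglected tail is of order beta**k_max.
    """
    total = sum(flory_mole_fraction(k, beta) for k in range(1, k_max + 1))
    mean_k = sum(k * flory_mole_fraction(k, beta) for k in range(1, k_max + 1))
    return total, mean_k


total, mean_k = truncated_sums(beta=0.8)
print(f"sum of mole fractions   : {total:.6f}")
print(f"number-average degree   : {mean_k:.6f}")
print(f"1/(1 - beta), i.e. m/p  : {1.0 / (1.0 - 0.8):.6f}")
```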
The probability of formation of a k-mer, $P_k$, can be obtained as the ratio of the number of k-mers, $n_k$, to the initial number of monomers. Combining eq 1 and 2 and noting that $X_k$ represents $n_k/p$, one gets

$$P_k = (1 - \beta)^2\,\beta^{k-1} \tag{3}$$

In the process of polymer degradation, when the scission of a polymer chain is random, it can be shown that eq 1 to 3 remain valid to describe the distribution of the degrees of polymerization of the molecules (12). In this case, $\beta$ represents the probability, at a given time, that a given functional group is bound. Analogy between the Depolymerization and the Separation Processes. At the beginning of the separation process of a complex mixture, all the components of the sample are clumped together in a single peak of high complexity. During the course of the separation, several peaks of lower complexity appear progressively, in increasing number and decreasing complexity with time. Furthermore, peaks already separated can never be remixed in the later stages of the separation process. Therefore, the separation process can be likened to a polymer degradation, or depolymerization, process. The objective of the analyst is that the separation can be pursued for a time long enough for the degradation to occur completely, until no more polymer exists and the whole substance is recovered as monomeric molecules. How long such a process should last depends, of course, on the complexity of the mixture to be resolved. Accordingly, in this analogy, the components of the sample are equivalent to the monomeric molecules. At the early stages of the course of the separation, they are clumped in several distinguishable peaks of varying complexity. These peaks are equivalent to the polymeric molecules and their complexity can be described similarly to a degree of polymerization. The evolutions with time of the polymerization, depolymerization, and separation processes are compared in Table I.

Table I. Evolution of the Polymerization, Depolymerization, and Separation Processes with Time

time   polymerization                 depolymerization               separation
0      m monomers                     one very large macromolecule   one peak (one very large multiplet)
t      distribution of polymers       distribution of polymers       distribution of multiplets (p peaks)
∞      one very large macromolecule   m monomers                     m singlets (sample components)

While theory predicts that the occurrence of one very large macromolecule or of m monomers cannot be observed during the polymerization or degradation processes, respectively, except after an infinite time, the analyst hopes that the m singlet peaks corresponding to the m components of the sample will occur after a finite time. If the sample mixture is highly complex, however, this may be achieved only after a very long time, often much longer than usual analysis times, which for all practical purposes can be considered as infinite. The analyst is chiefly interested in a correct description of the result of the separation, whether it is a chromatogram, an electropherogram, or any record of another zonal separation process. He or she wants to get a picture of the level of complexity of the different peaks or zones which have been separated. Davis and Giddings, in their discussion of the randomness assumption of their model (6), have clearly shown that disorder is much more likely than order in the chromatographic retention spectrum of complex mixtures, which is supported by experimental observations (7, 8) and, thus, validates the assumption. Then, peaks appear randomly on the chromatogram. Similar observations are likely to be made on isoelectric focusing patterns of complex protein samples. Therefore, at the early stages of the separation process, when the initial peak containing all the sample components splits, this most likely occurs at random. Any further splitting again most likely occurs through a random process. Consequently, if this separation process, which is a peak-generating process, is to be compared with a polymer degradation process, the best description will be obtained by assuming that the scission of the polymer is random.
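The random-scission picture can be illustrated with a small Monte Carlo experiment. This is only a sketch under stated assumptions: a single linear chain of m units is used, each of its m - 1 bonds is taken to survive independently with probability β, and the seed, chain length, and β value are arbitrary. The fragment statistics are then compared with eq 2 and 3.

```python
import random
from collections import Counter


def random_scission(m: int, beta: float, rng: random.Random) -> Counter:
    """Cut a linear chain of m units at random.

    Each of the m - 1 bonds survives with probability beta (assumption of
    independent bonds). Returns a Counter mapping fragment length k to the
    number n_k of fragments of that length.
    """
    sizes: Counter = Counter()
    run = 1  # length of the fragment currently being built
    for _ in range(m - 1):
        if rng.random() < beta:
            run += 1          # bond intact: fragment grows
        else:
            sizes[run] += 1   # bond broken: fragment is complete
            run = 1
    sizes[run] += 1           # close the last fragment
    return sizes


rng = random.Random(1985)     # fixed seed so the run is reproducible
m, beta = 200_000, 0.6
sizes = random_scission(m, beta, rng)
p = sum(sizes.values())

print(f"p/m simulated {p / m:.4f}   eq 2 predicts {1 - beta:.4f}")
for k in range(1, 5):
    predicted = (1 - beta) ** 2 * beta ** (k - 1)
    print(f"k={k}: n_k/m simulated {sizes[k] / m:.4f}   eq 3 predicts {predicted:.4f}")
```

In the separation analogy, each fragment plays the role of an observed peak and its length the role of the peak multiplicity.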
Equations 1 to 3, which describe a random polymerization as well as a random polymer degradation process, can be applied to the separation process. Then, m represents the number of components in the sample, which is unknown to the analyst, p is the observable number of separated peaks, $n_k$ is the number of peaks with a degree of multiplicity k (number of k-plets), and $X_k$ is the fraction of the number of peaks which are k-plets. Equations 1 to 3 depend on the parameter $\beta$, which has been defined as the extent of the polymerization reaction. As suggested by the comparison in Table I, $\beta$ can be considered as the complement to 1 of the extent of the polymer degradation reaction. Extent of Separation. Consequently, the analogy between the depolymerization and separation processes suggests the definition of the extent of separation, $\gamma$, which, in the above equations, is equal to

$$\gamma = 1 - \beta \tag{4}$$

Therefore, one gets

$$p/m = \gamma \tag{5}$$

$$P_k = n_k/m = \gamma^2 (1 - \gamma)^{k-1} \tag{6}$$

and

$$X_k = n_k/p = \gamma (1 - \gamma)^{k-1} \tag{7}$$

In the special case when k = 1, one has the following expression of the fraction of the components which are resolved as singlets:

$$n_1/m = \gamma^2 \tag{8}$$

With this analogy, the above definition of the extent of separation arises naturally from the De Donder concept of the extent of reaction for chemical reactions and has similar properties. In particular, it is normalized and its value ranges between 0 and 1. As mentioned above, while widely used parameters such as the number of theoretical plates or the peak capacity characterize the efficiency of a separation system, the extent of separation can be used as an index characterizing the efficiency of a separative operation with regard to the specific sample mixture at hand. Moreover, it is applicable to any zonal separation method and can serve as a means of comparison of various techniques for a given problem as well as for optimization purposes. Indeed, as this extent of separation is simply equal to the ratio of the number of observed, separated peaks or zones to the number of sample components, the separation technique which yields, for a given sample, the largest number of peaks will have the highest extent of separation. Much effort has been devoted in the past to finding a suitable criterion of merit (13) for the quantitative evaluation of a separation and the comparison of the performances of separation methods. Several parameters have been defined for this purpose. Most of them, however, such as the impurity index (14), the widely used chromatographic resolution (15), the separation function (16), or the valley to peak ratio (17), can be applied only to two-component mixtures. Another one has been called, by Rony, the extent of separation (18). Although also normalized between 0 and 1 and especially derived in the case of elution chromatography (19-21), it has, however, no similarity with the above $\gamma$ parameter, especially because it only applies to binary mixtures. The measurement of the entropy of separation has been suggested as a general separation criterion (22).
Its application to multicomponent mixtures, however, is not easy and requires the determination of a resolution matrix (23). The determination of functions based on the overlap integrals is also complex (16). The more recently proposed inefficiency number is of limited convenience as, in order to evaluate it, the analyst has to define a penalty matrix on a somewhat arbitrary basis (24). The saturation factor (6) is a kind of separation efficiency parameter, but its usefulness for this purpose is limited to the case where the observed number of peaks is strictly equal to the predicted average for a purely random chromatogram. It is not our purpose to criticize these various criteria of merit, some of them, for instance the chromatographic resolution, having proven to be extremely useful. The extent of separation, $\gamma$, the introduction of which arises from the analogy between the polymer degradation reaction and the separation, may be considered as defined by eq 5. As it stands, it has, compared to other indexes for the evaluation of separative performance, its own merits and limitations. It has a clear and simple physical meaning. But, in contradistinction with most of the above-mentioned indexes, it is useful mostly for complex mixtures. Indeed, in the case of a binary mixture, for instance, it could only take two values, 0.5 and 1. One could have defined it slightly differently, as the (p - 1)/(m - 1) ratio, so that it could effectively take all the values between 0 and 1, including 0. However, the difference between this ratio and p/m decreases with increasing m and becomes negligible for sufficiently complex mixtures. One should note that, similarly, the extent of a chemical reaction is an intrinsically discrete variable when the number of reacting molecules is small. Nevertheless, its discrete nature arises from the discreteness of the enumeration of separated peaks. At a given time during the course of the separation, the diachorismogram (from the Greek diachorismos, which means the separation) appears as a succession of peaks and valleys delimiting the contours of the separated observable peaks in the separation space. In the above discussion, one peak is counted each time a maximum is encountered in the overall concentration profile of the sample. This mode of numbering peaks can be extended if more information is known about the individual component zones. For instance, in chromatograms with nearly Gaussian component concentration profiles, the appearance of a shoulder on a peak unambiguously reveals the presence of at least two components. In this case, the number of peaks can be stated as equal to half the number of zeros of the second derivative of the overall concentration profile, rather than equal to half the number of zeros of the first derivative minus one. Since it is known that each component zone has two and only two inflection points, the former number can never be smaller than the latter one. At this stage we leave open the possibility of further extension of the mode of numeration of the separated peaks, when a device for scrutinizing the diachorismogram more powerful than an overall concentration detector can be used, as, for instance, in GC/MS coupling, or with diode-array HPLC detectors. Comparison with the Davis and Giddings Model. Equations 4 to 7 relate the number of multiplets of different kinds to the number of components and to the extent of separation. Similar relationships have been derived by Davis and Giddings (6), but in terms of the system peak capacity, $n_c$, in their model based on Poisson statistics. The results obtained by these two models are compared in Table II. In this table, $\alpha$, the saturation factor (6), is equal to $m/n_c$.

Table II. Comparison of the Results Obtained by the Davis and Giddings and the Present Models

Davis and Giddings model                                   present (analogy) model
$p/m = e^{-\alpha}$                                        $p/m = \gamma$
$P_k = n_k/m = e^{-2\alpha}(1 - e^{-\alpha})^{k-1}$        $P_k = n_k/m = \gamma^2 (1 - \gamma)^{k-1}$
$n_1/m = e^{-2\alpha}$                                     $n_1/m = \gamma^2$
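The relations of the analogy model (eq 5 to 8) translate directly into a small calculation: given a number of observed peaks p and an extent of separation γ, one obtains the estimated number of components and the expected numbers of singlets, doublets, and higher multiplets. The sketch below is illustrative only; the function name and the numerical values p = 50 and γ = 0.5 are hypothetical and are not taken from the paper.

```python
def multiplet_counts(p: float, gamma: float, k_max: int) -> dict:
    """Expected number n_k of k-plets for k = 1..k_max.

    Uses m = p/gamma (eq 5) and n_k = m * gamma**2 * (1 - gamma)**(k - 1) (eq 6).
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    m = p / gamma
    return {k: m * gamma ** 2 * (1.0 - gamma) ** (k - 1) for k in range(1, k_max + 1)}


# Hypothetical example: 50 observed peaks at an extent of separation of 0.5.
p_obs, gamma = 50.0, 0.5
counts = multiplet_counts(p_obs, gamma, k_max=60)

print(f"estimated components m = {p_obs / gamma:.1f}")
for k in (1, 2, 3):
    print(f"expected number of {k}-plets: {counts[k]:.2f}")

# Consistency checks: the multiplets add up to the peaks, and the
# multiplicities add up to the components (up to the truncation at k_max).
print(f"sum of n_k     = {sum(counts.values()):.4f}")
print(f"sum of k * n_k = {sum(k * n for k, n in counts.items()):.4f}")
```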
The comparison in Table II clearly shows that both models are identical, provided that the extent of separation is related to the peak capacity as follows:

$$\gamma = e^{-\alpha} = e^{-m/n_c} \tag{9}$$

This simple relationship, as well as the complete identity of the results obtained by these two models, may appear quite surprising. In fact, their basic underlying assumptions, random distribution of retention times for the Davis and Giddings model and random generation of separated peaks from the initial sample peak for the analogy model, lead to the same distribution pattern for the component zones. The application of these models is then restricted by the same limitations noted above. Therefore, all the conclusions derived from the Davis and Giddings model also apply to the polymerization analogy model. It can be noted in eq 9 that the extent of separation depends not only on the performance characteristics of the separation system, reflected in $n_c$, but also, through m, on the difficulty of the separation problem. Equation 9 allows the estimation of the rate of separation, which is proportional to the rate of generation of separated peaks, as a function of the rate of generation of peak capacity. In Figure 1, the extent of separation, that is, the number of observed peaks relative to the number of components, is plotted as a function of $n_c/m$ (curve a). The fractional number of components which are isolated as single peaks is also plotted in Figure 1 vs. $n_c/m$ (curve b). In order for the number of peaks to become a significant fraction of the number of components, the peak capacity must largely exceed this number of components.

Figure 1. (Upper curve) Extent of separation, $\gamma$, vs. the ratio $n_c/m$ of the peak capacity to the number of components. (Lower curve) Fraction, $n_1/m$, of the number of components isolated as singlet peaks vs. $n_c/m$.

In the above discussion, the peak capacity, $n_c$, must be understood as the maximum number of peaks which can be accommodated in the separation space, when ordered in a perfect pattern. This definition of the peak capacity must be consistent with the method of counting the number of separated peaks. For instance, if the number of peaks corresponds to the number of maxima observed in separation systems, then the peak capacity of these systems will be close to the one defined on a $2\sigma$ resolution basis, because two maxima are obtained by addition of two equal height and equal width Gaussian curves when the distance between their apices exceeds $2\sigma$. Estimation of the Extent of Separation. The number of components present in a sample, m, as well as the number of singlets, $n_1$, doublets, $n_2$, ..., k-plets, $n_k$, can be estimated from the number of observed peaks, p, using eq 5-7, provided that the extent of separation, $\gamma$, is known. One can go further into the analogy between the depolymerization and separation processes to determine the extent of separation, recalling that, according to eq 4, it is the complement to one of the extent of polymerization. The relative broadness of the distribution of the degree of polymerization (or of the molecular weight) is frequently characterized by a parameter $\mu$, called the polydispersity index and defined as the ratio of the weight average degree of polymerization, $k_w$ (or weight average molecular weight, $M_w$), to the corresponding number average, $k_n$ (or $M_n$) (10)

$$\mu = k_w/k_n = M_w/M_n \tag{10}$$

In the case of a condensation polymerization, which models a random linear polymerization process, the polydispersity index is easily related to the extent of reaction through (10-12)

$$\mu = 1 + \beta \tag{11}$$

Its value consequently lies between 1 and 2. It can easily be shown that $\mu$, classically defined by eq 10, may also be expressed in terms of the first and second moments, $m_1$ and $m_2$, respectively, of the number distribution of the degree of polymerization (or molecular weight) as

$$\mu = m_2/m_1^2 \tag{12}$$

In the case of the separation process, the analogue of the degree of polymerization or molecular weight of a polymeric molecule is the multiplicity of the separated peak. Let $\mu_k$ be the polydispersity index of the peak multiplicity number distribution, defined in terms of the moments of this distribution as in eq 12. Then the combination of eq 4, 5, and 11 gives the following expressions for $\gamma$ and m:

$$\gamma = 2 - \mu_k \tag{13}$$

and

$$m = p/\gamma = p/(2 - \mu_k) \tag{14}$$

Since $\mu_k$ cannot be directly determined, one will assume that it is equal to the polydispersity index of the peak area distribution, $\mu_a$

$$\mu_k = \mu_a \tag{15}$$

In fact, we do not presently know whether $\mu_k$ and $\mu_a$ are equal or not. There are, however, indications that they may be close to each other. First of all, the amount of substance present in a sample is likely to vary more or less randomly from one component to another and over a rather broad range, several orders of magnitude. Therefore, with a detector giving the same response per unit amount of substance for all components, a singlet peak may have a large area. But at the same time, a multiplet may contain several minor components and have a rather small peak area, which may somewhat level off the discrepancy introduced between the multiplicity and area distributions by the fact that the former peak contributes only a little to the moments of the multiplicity distribution and relatively largely to those of the peak area distribution, while the opposite is true for the latter peak. Furthermore, with some detectors, the response factor may vary largely and more or less randomly from one component to another, which may constitute another factor leveling off the differences between the multiplicity and peak area distributions. In addition, the frequency distribution of relative peak areas observed by Nagels et al. (25) in HPLC determinations, with a UV detector, of phenolic compounds in 62 different extracts of plant leaves can be shown to closely approach the distribution of the degrees of polymerization given by eq 1. This suggests that the $\mu_a$ value of the peak area distribution is close to the $\mu_k$ value of the separated peaks, according to the analogy presented above. Work is in progress to elucidate the relationship between $\mu_k$ and $\mu_a$. $\mu_a$ can easily be determined, provided that the individual areas, $a_i$, of the separated peaks are known. The first two moments of the distribution are given by

$$m_1 = \Big(\sum_{i=1}^{p} a_i\Big)\Big/p \tag{16}$$

and

$$m_2 = \Big(\sum_{i=1}^{p} a_i^2\Big)\Big/p \tag{17}$$

The polydispersity index, $\mu_a$, is then, according to eq 12, equal to

$$\mu_a = p\Big(\sum_{i=1}^{p} a_i^2\Big)\Big/\Big(\sum_{i=1}^{p} a_i\Big)^2 \tag{18}$$

If the relative peak areas, $r_i$, are known, with

$$r_i = a_i\Big/\Big(\sum_{i=1}^{p} a_i\Big) \tag{19}$$

then eq 18 becomes

$$\mu_a = p\sum_{i=1}^{p} r_i^2 \tag{20}$$
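The peak-area route to the extent of separation can be sketched in a few lines. The code below computes the polydispersity index of a list of peak areas (eq 18), then applies eq 13 to 15 to obtain γ and the estimated number of components. The function names, the guard against non-positive γ, and the four example areas are assumptions made for this illustration; they are not data from the paper.

```python
def polydispersity_index(areas: list) -> float:
    """Eq 18: mu_a = p * sum(a_i**2) / (sum(a_i))**2 for p peak areas."""
    if not areas or any(a <= 0 for a in areas):
        raise ValueError("areas must be a non-empty list of positive numbers")
    p = len(areas)
    s1 = sum(areas)
    s2 = sum(a * a for a in areas)
    return p * s2 / (s1 * s1)


def estimate_from_areas(areas: list) -> tuple:
    """Return (mu_a, gamma, m) using gamma = 2 - mu_a and m = p/gamma.

    Assumes mu_k = mu_a (eq 15). If mu_a >= 2 the model gives no usable
    estimate; raising an error in that case is a choice made for this sketch.
    """
    mu_a = polydispersity_index(areas)
    gamma = 2.0 - mu_a
    if gamma <= 0.0:
        raise ValueError("mu_a >= 2: area distribution too broad for this estimate")
    return mu_a, gamma, len(areas) / gamma


# Hypothetical integrator report with four peaks (arbitrary area units).
example_areas = [4.0, 1.0, 1.0, 2.0]
mu_a, gamma, m = estimate_from_areas(example_areas)
print(f"mu_a = {mu_a:.3f}, gamma = {gamma:.3f}, estimated m = {m:.2f}")
```

Because eq 18 is a ratio of sums, the result does not depend on the units in which the integrator reports the areas.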
The desired values of the extent of separation and of the number of constituents are obtained by combination of eq 13-15, 18, and 20

$$\gamma = 2 - \Big[p\Big(\sum a_i^2\Big)\Big/\Big(\sum a_i\Big)^2\Big] = 2 - \Big[p\sum r_i^2\Big] \tag{21}$$

and

$$m = p/\gamma = 1\Big/\Big[(2/p) - \sum r_i^2\Big] \tag{22}$$

Most integrators designed for chromatography or electrophoresis provide a report of the areas of the separated peaks in arbitrary units, together with their retention characteristics. In order to extend the capabilities of data stations, one can suggest the inclusion in the report of the values of the squares of the peak areas, or, at least, the sum of these squares, so that the probable number of components can easily be estimated from eq 21 and 22. This is an easy task for microcomputerized systems. Furthermore, in the case when the equal peak width assumption is justified for the component zones, the peak capacity of the retention interval in which peaks are counted can be estimated from the combination of eq 9, 21, and 22

$$n_c = -m/(\ln \gamma) = -p/[(2 - \mu_a)\ln(2 - \mu_a)] \tag{23}$$

This peak capacity value is consistent with the mode used to number the separated peaks and may be different from the one directly calculated from peak width measurements on the basis of a $4\sigma$ resolution.

EXPERIMENTAL SECTION

Fractions of two crude oils, Emeraude (Congo) and Midway (California), were analyzed by GC on wall-coated capillary columns. Chromatograms of the Me2SO and n-pentane extraction fractions of the Emeraude crude oil, containing aromatic and aliphatic compounds, respectively, were obtained on a 25 m long, 0.3 mm i.d. column coated with a 0.5 μm thick layer of the apolar OV73 stationary phase, in temperature programming conditions from 40 °C to 250 °C at 3 °C/min (26). Analyses of the Midway crude oil extract containing nitrogen bases were performed on a 56 m long column coated with a 0.15 μm thick layer of the OV73 liquid phase by programming the temperature from 50 °C to 260 °C at 1.5 °C/min (27). A flame ionization detector was used and hydrogen was the carrier gas. Fats and fat oils were analyzed by LC on two 30 cm × 4 mm Lichrospher 100 CH-18/2 columns (Merck, Darmstadt, Germany) coupled in series and packed with 5-μm particles, in gradient elution conditions by means of a Model 660 solvent programmer (Waters Associates, Milford, MA). The butterfat chromatogram was obtained by linearly programming the content of acetonitrile in acetone from 20% to 1% in 20 min. With the homemade light scattering detector used, the peak area is found to be proportional to the power 1.55 of the amount of sample, nearly independently of the compound (28). The nonlinearity of the detector was taken into account in the peak area calculations.

Table III. Estimation of the Extent of Separation, the Number of Components, and the Peak Capacity of Three Isomeric Families of Substituted Aromatics in an Emeraude (Congo) Crude Oil Fraction

family                  no. of peaks obsd   extent of separation   estimated no. of components   estimated peak capacity
dimethylnaphthalene     7                   0.671                  10.4                          26.4
trimethylnaphthalene    11                  0.677                  16.3                          41.6
C3-alkylphenanthrene    11                  0.155                  70.8                          38.0

RESULTS AND DISCUSSION

Equations 21 to 23 have been applied to GC chromatograms of petroleum fractions. With the flame ionization detector used, the response factor for a given component is nearly proportional to its number of carbon atoms. In order to avoid the introduction of any systematic variation in the response factor, the peak areas were computed for fractions of the chromatogram containing isomeric substances, such as alkyl derivatives of aromatic hydrocarbons. In the present analytical conditions it has been found that there is virtually no interference between the various isomeric families, at least when the number of carbon atoms of the substituents as well as the number of aromatic rings are small. In fact, it may happen that, for example, alkyl derivatives of benzene having a total number of substituent carbon atoms equal to six interfere with dimethyl- or ethylnaphthalene. However, the abundance of
the former derivatives in petroleum fractions is much less than that of the latter ones, so that their contribution to the total peak area is negligible. The results of the computations of the peak areas are reported in Table III for three families, dimethylnaphthalene, trimethylnaphthalene, and C3-alkylphenanthrene, of the Me2SO extraction fraction of the Emeraude crude oil. This last family corresponds to the isopropyl, n-propyl, methylethyl, and trimethyl derivatives of phenanthrene. The numbers of components estimated from eq 22 and reported in Table III can be compared with the numbers of possible isomers, which are 10 for the dimethylnaphthalenes, 14 for the trimethylnaphthalenes, and 115 for the C3-alkylphenanthrenes (60 trimethyl, 45 methylethyl, 5 n-propyl, and 5 isopropyl derivatives). The values of m for the first two families are quite close to the total numbers of isomers, so that one may reasonably think that all or almost all of them are present in the chromatogram. However, for the last family, the estimated number of components is significantly smaller than the total number of isomers. It is therefore likely that only a fraction of the possible isomers are present at a detectable level in the sample. It must be emphasized that statistical methods of evaluation of diachorismograms can give information only on those sample components which are present in sufficiently large amounts to be measurable by the detection device used. The concept of the number of components in a sample has no meaning by itself if the limit of detection of the measuring device is not specified.

Table IV. Estimation of the Extent of Separation, the Number of Components, and the Peak Capacity of Two Isomeric Families of Alkanes in an Emeraude (Congo) Crude Oil Fraction (a)

family                  no. of peaks obsd   extent of separation   estimated no. of components   estimated peak capacity
dodecane                28                  0.536                  52.2                          83.8
tridecane               22                  0.411                  53.6                          60.2
dodecane + tridecane    50                  0.476                  105.0                         141.5

(a) The n-alkane peaks are not included in the calculations because their peak areas in the chromatogram are systematically much larger than the areas of their isomers. These families are not purely isomeric families, as discussed in the text.
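The entries of Tables III and IV can be cross-checked against one another. The sketch below recomputes the number of components as m = p/γ (eq 22) and the peak capacity as n_c = -m/ln γ (eq 23) from the tabulated numbers of peaks and extents of separation, and prints them next to the tabulated values. The data are copied from the two tables; the function names and the 2% tolerance used in the check are choices made for this illustration, and small differences are to be expected because γ is tabulated to only three decimals.

```python
import math

# (family, peaks observed p, extent of separation gamma,
#  tabulated components m, tabulated peak capacity n_c)
ROWS = [
    ("dimethylnaphthalene", 7, 0.671, 10.4, 26.4),
    ("trimethylnaphthalene", 11, 0.677, 16.3, 41.6),
    ("C3-alkylphenanthrene", 11, 0.155, 70.8, 38.0),
    ("dodecane", 28, 0.536, 52.2, 83.8),
    ("tridecane", 22, 0.411, 53.6, 60.2),
    ("dodecane + tridecane", 50, 0.476, 105.0, 141.5),
]


def components(p: int, gamma: float) -> float:
    """Eq 22 in the form m = p / gamma."""
    return p / gamma


def peak_capacity(p: int, gamma: float) -> float:
    """Eq 23 in the form n_c = -m / ln(gamma)."""
    return -components(p, gamma) / math.log(gamma)


def relative_gap(computed: float, tabulated: float) -> float:
    """Relative difference between a recomputed and a tabulated value."""
    return abs(computed - tabulated) / tabulated


for family, p, gamma, m_tab, nc_tab in ROWS:
    m_calc = components(p, gamma)
    nc_calc = peak_capacity(p, gamma)
    print(f"{family:22s} m: {m_calc:6.1f} (table {m_tab:6.1f})   "
          f"n_c: {nc_calc:6.1f} (table {nc_tab:6.1f})")
```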
However, it may happen that two or several components, each present at too small a concentration to be individually detectable, elute so closely that the resulting band becomes detectable. The importance of such interferences should be evaluated by probabilistic methods in order to interpret the results properly. The results obtained for the aliphatic fraction of the Emeraude sample are reported in Table IV for the dodecane and tridecane isomers, respectively. The estimated number of components is much smaller than the number of possible
[Figure 2: chromatogram trace; time axis from 0 to 30 min]
Figure 2. LC chromatogram of butterfat: columns, 2 × 30 cm × 4 mm Lichrospher 100 CH-18/2 (Merck, Germany), 5 μm particles; gradient elution, 80-20 to 99-1 acetone-acetonitrile in 20 min; sample, 5 μL of a 10% solution of butterfat in acetone; detector, homemade light scattering detector.
isomers, equal to 355 and 802 for dodecane and tridecane, respectively (29). It is obvious here that only a small fraction of these isomers are present in the sample at a measurable level under typical GC operating conditions. In fact, the portion of the chromatogram attributed to the dodecane family, for example, also contained substituted cyclanes, as well as higher, highly branched alkanes, while some highly branched dodecanes eluted before n-undecane.

It is interesting to compare the results obtained for these individual families with those obtained by combining them. The polydispersity index of the whole peak area distribution of the dodecane and tridecane portions of the chromatogram was computed without correction for the slight difference in the response factor of the flame ionization detector for the two families. The results are also reported in Table IV. Both the estimated number of components and the estimated peak capacity are very close to the sums of the corresponding numbers for the two families. The excellent agreement found in this internal consistency test gives confidence in the applicability of the method to samples of natural origin.

Although the peak area method of estimation of the component number is strictly valid only for complex mixtures, one may try to apply it to less complex samples or fractions of a chromatogram. The portion of the chromatogram of the Midway extract where C2-alkylbenzo[h]quinolines are found contains only three observed peaks, while there are 350 possible isomers of these compounds. Computation from the areas of these peaks, measured with a flame ionization detector, yields an extent of separation equal to 0.986 and m equal to 3.0. That only three components are present in this fraction was confirmed by GC/MS and high-resolution Shpol'skii spectrofluorimetry (27). 
Such an agreement is quite surprising, as the peak area method is not expected to work properly for simple samples or fractions of samples. In another case, for the similar fraction of an Emeraude sample, three peaks are observed in GC, GC/MS, and Shpol'skii fluorimetry, whereas the method yields 6.8 components. The agreement is less satisfactory than for the other sample; nevertheless, the method confirms that the number of components present in the sample is much smaller than the number of possible isomers.

Equations 21-23 were also applied to the 36 peaks observed in the LC chromatogram of butterfat, which contains mostly triglycerides, shown in Figure 2. The extent of separation is found to be 0.435, which gives an estimate of 82.7 for the number of components and 99.4 for the peak capacity. In the case of the chromatogram of the triglycerides in cod liver oil obtained in chloroform-acetonitrile gradient conditions, 22 peaks are observed and the extent of separation is 0.298, which gives the values 73.9 and 61.0 for the estimates
of m and n_c, respectively. While it is not possible to compare these values with other data because none are available, they appear to be quite reasonable. It can even be expected that this statistical analysis of chromatograms, together with the analysis of the fatty acid composition of the samples, can give some clues to the formation of triglycerides in natural oils.

For the purpose of estimating the number of sample components present at a level measurable by the detecting device used, the peak area method offers an alternative to the Davis and Giddings method based on peak capacity measurements and has specific advantages over it. First of all, it does not require any measurement other than the generally reported determination of the peak areas. Second, it gives a single value of m, while the Davis and Giddings method requires that the analysis be performed at least twice under different operating conditions to resolve its double-valued solution. Furthermore, the mode of peak area measurement is automatically adjusted to the mode of peak number counting. However, the peak area method also has its drawbacks. It requires well-operating integrators which do not distort the peaks, especially since it is quite sensitive to the presence of small peaks. Indeed, a small peak will not contribute significantly to the total peak area or to the sum of the squares of the peak areas, but it will increase p by one unit and may therefore significantly affect the polydispersity index calculated from eq 18 and, consequently, the extent of separation and m. It must be kept in mind that both the Davis and Giddings theory and the present theory based on the analogy are statistical methods, and their results are expected to describe the real separation process closely only for complex mixtures. 
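The sensitivity to small peaks noted above can be illustrated numerically. The following Python sketch is only an illustration under stated assumptions: it assumes that eq 18 defines the polydispersity index as p ΣA_i²/(ΣA_i)² and that eq 21 gives the extent of separation as 2 minus that index, forms that are consistent with the statements in the text but are not quoted from it. The relations m = p/γ and n_c = −m/ln γ are inferred from the tabulated results, which they reproduce. Function names and peak areas are hypothetical.

```python
import math


def polydispersity(areas):
    """Assumed form of eq 18: p * sum(A_i^2) / (sum(A_i))^2."""
    p = len(areas)
    total = sum(areas)
    return p * sum(a * a for a in areas) / (total * total)


def extent_of_separation(areas):
    """Assumed form of eq 21: gamma = 2 - polydispersity index."""
    return 2.0 - polydispersity(areas)


def components_and_capacity(p, gamma):
    """Relations inferred from Table IV: m = p/gamma, n_c = -m/ln(gamma)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("extent of separation must lie strictly between 0 and 1")
    m = p / gamma
    n_c = -m / math.log(gamma)
    return m, n_c


# Hypothetical peak areas (arbitrary units), not taken from the paper.
base = [50.0, 30.0, 20.0, 10.0, 40.0, 25.0, 35.0, 15.0]
# The same chromatogram with three very small extra peaks detected.
with_small = base + [0.1, 0.1, 0.1]

for label, areas in (("without small peaks", base), ("with small peaks", with_small)):
    gamma = extent_of_separation(areas)
    m, n_c = components_and_capacity(len(areas), gamma)
    print(f"{label}: p={len(areas)}, gamma={gamma:.3f}, m={m:.1f}, n_c={n_c:.1f}")
```

In this hypothetical case the three small peaks add almost nothing to the total area, yet they lower the computed extent of separation markedly and raise the estimate of m, which is the behavior the text warns about.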
For samples of limited complexity, it may happen that these methods do not work, for instance, when the polydispersity index calculated from eq 18 becomes larger than 2, which gives, in eq 21, a meaningless negative extent of separation, or, in the Davis and Giddings method, when p/n_c becomes larger than 0.37, since there is then no solution for m. In spite of these limitations, the application of these methods to samples of limited complexity, such as the portions of chromatograms of crude oil fractions discussed above, has given fairly reasonable results. With Davis and Giddings, we consider that these methods should serve as guidelines for the analyst attempting to improve the characterization of complex mixtures. Furthermore, it is our view that the extent of separation defined above should serve as a criterion of utmost importance for the optimization, development, and comparison of separation methods for complex samples.
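Both the double-valued solution and the absence of a solution above p/n_c ≈ 0.37 follow from the Davis and Giddings relation p = m exp(−m/n_c), whose maximum over m equals n_c/e and is reached at m = n_c. The Python sketch below illustrates this; the function names are hypothetical and the plain bisection solver is an illustrative choice, not a procedure taken from either paper.

```python
import math


def expected_peaks(m, n_c):
    """Davis-Giddings relation: expected number of observed peaks."""
    return m * math.exp(-m / n_c)


def _bisect(f, lo, hi, tol=1e-9):
    """Root of f on [lo, hi], assuming f changes sign over the interval."""
    f_lo = f(lo)
    if f_lo == 0.0:
        return lo
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0.0:
            return mid
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def solve_components(p, n_c):
    """Return the two roots m of p = m*exp(-m/n_c), or [] if none exists."""
    if p <= 0 or n_c <= 0:
        raise ValueError("p and n_c must be positive")
    if p / n_c > 1.0 / math.e:
        return []  # p exceeds the maximum n_c/e: no solution for m

    def f(m):
        return expected_peaks(m, n_c) - p

    # The relation rises on (0, n_c) and falls on (n_c, infinity).
    low_root = _bisect(f, 0.0, n_c)
    hi = 2.0 * n_c
    while f(hi) > 0.0:
        hi *= 2.0
    high_root = _bisect(f, n_c, hi)
    return [low_root, high_root]


print(solve_components(28, 83.8))   # two candidate values of m
print(solve_components(40, 100.0))  # p/n_c = 0.40 > 1/e, so no solution
```

With p = 28 and n_c = 83.8 (the dodecane row of Table IV), the lower root lies close to the tabulated estimate of 52.2, while the second root lies above n_c; a single measurement of p and n_c cannot tell the two apart.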
ACKNOWLEDGMENT
We thank Gael de Rycke and Jean-Marie Schmitter for providing data on the GC crude oil samples and Andrzej Stolyhwo for recording the chromatogram shown in Figure 2. Discussion with Ioannis Ignatiadis about the Greek term for separation was greatly appreciated.

Registry No. Dimethylnaphthalene, 28804-88-8; trimethylnaphthalene, 28652-77-9; dodecane, 112-40-3; tridecane, 629-50-5.
LITERATURE CITED
(1) Giddings, J. C. Anal. Chem. 1967, 39, 1027.
(2) O'Farrell, P. H. J. Biol. Chem. 1975, 250, 4007.
(3) Anderson, N. L.; Taylor, J.; Scandora, A. E.; Coulter, B. P.; Anderson, N. G. Clin. Chem. (Winston-Salem, N.C.) 1981, 27, 1807.
(4) Guiochon, G.; Beaver, L. A.; Gonnord, M. F.; Siouffi, A. M.; Zakaria, M. J. Chromatogr. 1983, 255, 415.
(5) Rosenthal, D. Anal. Chem. 1982, 54, 63.
(6) Davis, J. M.; Giddings, J. C. Anal. Chem. 1983, 55, 418.
(7) Herman, D. P.; Gonnord, M. F.; Guiochon, G. Anal. Chem. 1984, 56, 995.
(8) Martin, M.; Herman, D. P.; Guiochon, G. presented at the 15th International Symposium on Chromatography, Nürnberg, Germany, October 1-5, 1984.
(9) Davis, J. M.; Giddings, J. C. J. Chromatogr. 1984, 289, 277.
(10) Champetier, G.; Monnerie, L. "Introduction à la Chimie Macromoléculaire"; Masson: Paris, 1969; p 46.
(11) Flory, P. J. J. Am. Chem. Soc. 1936, 58, 1877.
(12) Tanford, C. "Physical Chemistry of Macromolecules"; Wiley: New York, 1961; Chapters 3 and 9.
(13) de Clerk, K.; Buys, T. S.; Pretorius, V. Sep. Sci. 1971, 6, 759.
(14) Glueckauf, E. Trans. Faraday Soc. 1955, 51, 34.
(15) Jones, W. L.; Kieselbach, R. Anal. Chem. 1958, 30, 1590.
(16) Giddings, J. C. Anal. Chem. 1960, 32, 1707.
(17) Christophe, A. B. Chromatographia 1971, 4, 445.
(18) Rony, P. R. Sep. Sci. 1968, 3, 239.
(19) Rony, P. R. Sep. Sci. 1968, 3, 357.
(20) Rony, P. R. Sep. Sci. 1970, 5, 121.
(21) Rony, P. R. J. Chromatogr. Sci. 1971, 9, 350.
(22) de Clerk, K.; Cloete, C. E. Sep. Sci. 1971, 6, 627.
(23) Stewart, G. H. Sep. Sci. Technol. 1978, 13, 201.
(24) Corry, W. D.; Seaman, G. V. F.; Szafron, D. A. Sep. Sci. Technol. 1982, 17, 1469.
(25) Nagels, L. G.; Creten, W. L.; Vanpeperstraete, P. M. Anal. Chem. 1983, 55, 216.
(26) de Rycke, G. Thèse de Docteur-Ingénieur, University of Paris 6, 1983.
(27) Schmitter, J. M. Thèse de Doctorat d'Etat, University of Paris 6, 1983.
(28) Stolyhwo, A.; Colin, H.; Martin, M.; Guiochon, G. J. Chromatogr. 1984, 288, 253.
(29) Hunt, J. M. "Petroleum Geochemistry and Geology"; Freeman: San Francisco, CA, 1979; Chapter 3.

RECEIVED for review June 4, 1984. Accepted September 4, 1984. Part of this paper was presented at the 8th International Symposium on Column Liquid Chromatography, New York, May 20-25, 1984.

Anal. Chem. 1985, 57, 295-302
Application of Pyrolysis/Gas Chromatography/Pattern Recognition to the Detection of Cystic Fibrosis Heterozygotes Judith A. Pino and John E. McMurry* Department of Chemistry, Baker Laboratory, Cornell University, Ithaca, New York 14853 Peter C. Jurs* and Barry K. Lavine Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802 Alice M. Harper* Biomaterials Profiling Center, University of Utah, Salt Lake City, Utah 84112
High-resolution pyrolysis/gas chromatography/pattern recognition methods have been used to develop a potential method for the detection of carriers of the cystic fibrosis gene. The test data consisted of 144 pyrochromatograms (Py/GCs) of cultured human skin fibroblasts from obligate cystic fibrosis heterozygotes and from normal controls. A two-stage pyrolysis procedure using a modified chromatographic inlet yielded well-resolved reproducible profiles. Microcomputer-controlled instrumentation enabled transmission of pyrochromatograms to a host facility, where data-conditioning software provided for peak alignment and data set optimization. Each Py/GC contained 214 peaks corresponding to a set of standardized retention-time windows. Discriminants were developed by nonparametric pattern-recognition procedures that could classify these Py/GCs into the proper group based on chemical differences. A discriminant based on six of the Py/GC peaks correctly classified 136 of the 144 Py/GCs (94%), and a discriminant based on nine principal components formed from the Py/GC peaks correctly classified 134 of the 144 (93%).
Cystic fibrosis (CF) is the most common life-threatening genetic disorder in Caucasians. With an occurrence rate of one in every 1600-2000 live births, CF appears to be inherited as an autosomal recessive trait and to have a gene frequency of 0.05. In spite of intensive effort, the underlying genetic defect(s) has not been identified (1). One of the most critical problems in CF research is the development of a method for detecting carriers (heterozygotes) of the CF gene. There is currently no method available for identifying carriers and there is no reliable method of prenatal
diagnosis. We wish to report our work evaluating the use of pyrolysis/gas chromatography/pattern recognition (Py/GC/PR) as an analytical technique for detecting carriers of the CF gene.

Pyrolysis/gas chromatography (Py/GC) is an analytical technique that consists of rapid thermal fragmentation of a sample in the absence of oxygen, followed by separation of the volatile fragments on a gas chromatograph (2-4). The chromatographic record of pyrolysates forms a reproducible fingerprint of the parent material, while the individual peaks and their relative intensities provide both qualitative and quantitative information about the original sample. Pyrolytic analysis was first applied to complex materials in 1952 by Zemany, who showed that reproducible decomposition patterns could be obtained from biopolymers such as albumin and pepsin (5). In 1960, the application of Py/GC to amino acids was reported (6), and in 1965 the use of Py/GC for characterization of bacteria was first published (7). In addition, applications have been reported in the past decade for tissue pathology (8, 9), forensic science (10), microorganism taxonomy (11, 12), and carbohydrate chemistry (13). It now seems well established that Py/GC is suitable for the analysis of complex biomaterials that are nonvolatile or for which derivatization is not feasible.

Two major problems have plagued investigators in the Py/GC field. The first has been reproducibility: minor variations in sample preparation may affect fragmentation pathways, and minor variations in analytical conditions may affect retention times, thus hindering the comparative identification of peaks between pyrochromatograms. To a considerable extent, this reproducibility problem has been minimized by improved instrument design. Better control of pyrolysis conditions has been achieved by the design of low-mass pyrolyzers with minimal dead volume and with tem-
0003-2700/85/0357-0295$01.50/0 © 1984 American Chemical Society