Decoding Complex Multicomponent Chromatograms

New statistical approaches can extract more information from complex chromatograms with overlapping peaks.

Natural samples often contain hundreds of components. For example, crude oil products can contain thousands or tens of thousands of components. The complete chromatographic resolution of such complex samples requires tens of millions of theoretical plates, considering that peak capacity is roughly proportional to the square root of the number of theoretical plates. Because peaks do not elute equidistantly, analyzing a complex sample creates many coeluting components even with an extremely high-efficiency separation system. The severe peak overlap often observed in such multicomponent separations arises mostly because of the random distribution of retention times and the limited peak capacity of the separation system.

Routine one-dimensional (1-D) chromatographic methods cannot handle a complete qualitative and quantitative analysis of complex mixtures. For example, a 450-m long open tubular GC column with more than 1.3 million effective plates identified ~970 compounds in a gasoline standard, yet that separation still had many unresolved peaks (1).

To better understand the complexity of multicomponent separations, serious efforts have been made in the past 20 years to describe the retention pattern of complex chromatograms. In this report, we summarize the most important characteristics of multicomponent chromatograms and show how much information is hidden (usually neglected and buried) behind the forest of peaks. Different approaches are presented for decoding complex chromatograms, showing how analytical information can be extracted by going beyond conventional data such as retention time and peak area. We also describe an approach for decoding complex chromatograms that differs from the standard deconvolution methods.
In a deconvolution process, a short section of a chromatogram, usually one cluster of several overlapping peaks, is investigated, and the profiles of the individual single-component peaks are estimated with an algorithm. A statistical analysis, by contrast, yields no specific information on a particular component: the presence or absence of a compound cannot be determined, nor can its concentration be estimated. Instead, the total chromatogram is regarded as a statistical ensemble whose common attributes, such as peak width, peak shape, extent of separation, number of detectable components, saturation of the separation space, and order/disorder of the peaks, are estimated.

Attila Felinger, University of Veszprém (Hungary)
Maria Chiara Pietrogrande, University of Ferrara (Italy)
Analytical Chemistry, November 1, 2001

A statistical model of overlap

Although chromatography is a deterministic process (that is, repeated injections of the same sample will lead to practically identical chromatograms), the concept of randomness provides a very powerful way to describe the extent of peak overlap. For example, when a sample is composed of many compounds, the interval between adjacent peaks varies; peak clusters and void spaces can also be observed in the chromatogram. The retention patterns of complex mixtures can be remarkably different. This is because the distribution of the standard free energy differences between the stationary and mobile phases defines a pseudorandom retention time distribution (2). Accordingly, a complex chromatogram looks like a random series of peaks.

FIGURE 1. The superposition of ordered chromatograms leads to a disordered complex chromatogram with serious peak overlap. On the right-hand side of the figures, the density functions of change in retention times, $\Delta t_R$, are plotted. (a)–(d) Ordered chromatograms with normally distributed $\Delta t_R$ (relative standard deviation = 0.2). (e) Pooled chromatogram. The color of the lines below the peaks in (e) identifies their origin. (Horizontal axis: time; the average intervals are labeled $T_a$, $T_b$, $T_c$, $T_d$, and $T$.)

When there is no chemical similarity between the sample components because they come from numerous chemical families, the chromatogram is considered disordered. In this case, it is assumed that single-component peaks can be found with a constant probability per unit time, $\lambda$, at any point in the chromatogram. This assumption leads to the Poisson distribution of the number of single-component peaks. The probability density function of the Poisson distribution is given by

$$P(m) = \frac{(\lambda x)^m}{m!}\, e^{-\lambda x} \tag{1}$$
in which $P(m)$ expresses the probability that there are exactly $m$ single-component peaks within the length $x$. (See the glossary of terms.) Due to the fundamental properties of the Poisson distribution, both the mean value and the variance of the number of components are $\lambda x$.

For an ordered chromatogram, the probability of finding a peak at a given location is not constant. For instance, if compounds of a homologous series are separated, peaks are found at regular intervals, and the retention time increments can be forecast. On the other hand, for a disordered multicomponent chromatogram, the distribution of the intervals (that is, the retention time increments) can be derived by applying Poisson statistics. In this case, the retention time increments are given by the exponential distribution. The distribution of the retention times themselves is uniform, provided that peak density is constant along the chromatogram.

Davis et al. proposed and tested a three-part model in which the enthalpy change followed a Poisson distribution. The average entropy change depends on the enthalpy change, and the actual entropy change is uniformly distributed about the average entropy change (3). This model also confirms that the retention pattern of complex chromatograms is controlled by Poisson statistics.

To characterize the complexity of a multicomponent chromatogram, Giddings introduced the concept of dimensionality (4). The chromatogram of a series of homologues is quite ordered and only slightly complex. The dimensionality of such a sample is low; increasing the dimension of the separation space would not really improve the extent of separation. The higher the complexity of the mixture, the higher the sample dimensionality. In many instances, the separation pattern is less disordered when the dimension of the separation space increases. For really complex mixtures, the repetitive retention patterns disappear and the retention times become irregular.

Klein and Tyler first applied the concept of Poisson statistics to complex chromatograms and determined the probabilities of finding several peaks in a given interval of a chromatogram (5).

When a mixture containing a homologous series is separated, the multicomponent chromatogram, of course, is not disordered, but mixing a few homologous series can result in a pseudodisordered chromatogram. Figure 1 illustrates that the superposition of four ordered chromatograms yields a disordered chromatogram. In contrast to the elementary, ordered chromatograms, in which the average intervals between adjacent peaks are equal ($T_a = T_b = T_c = T_d$), separation in the complex chromatogram is incomplete, as evidenced by the presence of doublets and triplets. In the pooled chromatogram (Figure 1e), the average interval between adjacent peaks is 4 times smaller than in the elementary chromatograms, and the probability density function of the intervals has become exponential; that is, the superposition process results in a Poisson chromatogram. Complex multicomponent chromatograms are, therefore, generally disordered, either because the compounds are dissimilar or because the number of homologous series is high. In either case, the severity of peak overlap can be estimated by Poisson statistics (6).

This statistical model of peak overlap was originally based on the assumption that multicomponent chromatograms are completely disordered (2), although this restriction has since been removed, and the extent of peak overlap can be estimated for more-or-less ordered, complex chromatograms (7).

Very simple equations can be derived to determine the number of stand-alone peaks (singlets), doublets, triplets, etc., in a chromatogram or even the total number of observable peaks. The average number, $p_n$, of n-tets (fused peaks that are composed of $n$ single-component peaks) is

$$p_n = m\, e^{-2\alpha} \left(1 - e^{-\alpha}\right)^{n-1} \tag{2}$$

$$\alpha = m/n_c \tag{3}$$

$$n_c = X/x_0 \tag{4}$$

in which $m$ is the total number of detectable components, $\alpha$ is the saturation level of the chromatogram, $n_c$ is the peak capacity, $x_0$ is the distance needed for the resolution of two adjacent peaks, and $X$ is the total length of the chromatogram. Thus, the total number of peaks (either stand-alone or the fused cluster of several overlapping single components) that can be counted in the chromatogram is

$$p = \sum_n p_n \tag{5}$$

The critical distance needed for resolution is determined by the peak width and the critical resolution.

FIGURE 2. Using statistics to study complex chromatograms. (a) Plot of the ratio of the number of singlets, doublets, triplets, and peaks against $\alpha$ (vertical axis: $p_n/n_c$). (b) Plot of the probability that all sample components will be separated against $n_c$ (vertical axis: probability that $p = m$; curves for $m$ = 10, 20, 30, 40, and 50).

Glossary of terms
$a_a$: Average value of peak areas
$A_T$: Total area of a chromatogram
$c(t)$: Autocovariance function of a signal
$d_{h,c}$: Half width of $c(t)$ at half height
$m$: Number of observable components
$n_c$: Peak capacity
$p$: Total number of peaks
$p_n$: Average number of fused peaks containing $n$ single-component peaks
$P(m)$: Probability density function
$R_s$: Peak resolution
$t$: Time
$t_R$: Retention time
$T$: Average interval between adjacent peaks
$u$: Auxiliary variable
$w$: Average baseline width of individual overlapping peaks
$x$: Length
$x_0$: Distance needed for resolving two adjacent peaks
$X$: Total length of the chromatogram
$y(t)$: Total chromatogram signal
$\bar{y}$: Mean intensity of isolated peaks
$\bar{y}_{obs}$: Computed mean intensity of observed peaks
$\alpha$: Saturation level of the chromatogram
$\lambda$: Probability per unit time
$\sigma$: Gaussian peak standard deviation (peak width)
$\sigma_a$: Standard deviation of peak area
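The superposition argument illustrated in Figure 1 can be checked with a short simulation. The sketch below is not from the article. It assumes four ordered series with the same mean interval (T = 1), normally distributed increments with a relative standard deviation of 0.2 (the value quoted in the Figure 1 caption), random starting offsets, and a fixed random seed; the function names are hypothetical. It pools the series and compares the mean and the coefficient of variation (standard deviation divided by mean) of the intervals between adjacent peaks. An exponential distribution has a coefficient of variation of 1, so a pooled value that is much closer to 1 than the single-series value of about 0.2 reflects the drift toward a Poisson-like pattern. With only four series the agreement is approximate.

```python
import random
import statistics


def ordered_retention_times(n_peaks, mean_interval, rsd, rng):
    """Retention times of an 'ordered' chromatogram: normally distributed increments."""
    t = rng.uniform(0.0, mean_interval)  # random starting offset (assumption)
    times = []
    for _ in range(n_peaks):
        times.append(t)
        step = rng.gauss(mean_interval, rsd * mean_interval)
        while step <= 0.0:  # redraw the (very rare) non-positive increment
            step = rng.gauss(mean_interval, rsd * mean_interval)
        t += step
    return times


def interval_stats(times):
    """Mean and coefficient of variation of the intervals between adjacent peaks."""
    ordered = sorted(times)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    mean = statistics.fmean(gaps)
    return mean, statistics.pstdev(gaps) / mean


rng = random.Random(2001)  # fixed seed so the run is repeatable
T = 1.0                    # hypothetical mean interval of each ordered series
series = [ordered_retention_times(2000, T, 0.2, rng) for _ in range(4)]

# Keep only the time window covered by all four series, so that the
# pooled peak density is constant along the simulated chromatogram.
t_start = max(s[0] for s in series)
t_end = min(s[-1] for s in series)
pooled = [t for s in series for t in s if t_start <= t <= t_end]

single_mean, single_cv = interval_stats(series[0])
pooled_mean, pooled_cv = interval_stats(pooled)

print(f"single series: mean interval = {single_mean:.3f}, CV = {single_cv:.3f}")
print(f"pooled series: mean interval = {pooled_mean:.3f}, CV = {pooled_cv:.3f}")
```

Pooling more than four series in the list comprehension should move the pooled coefficient of variation closer to 1, in line with the statement that a high number of homologous series produces a disordered chromatogram.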
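Equations 2 to 5 are simple to evaluate numerically. The following sketch is an illustration rather than the authors' code: the function names and the example values (a peak capacity of 100 and the listed numbers of components) are assumptions chosen for the demonstration, and the sum in Equation 5 is truncated at 500 terms, which is adequate for saturation levels up to about 3. Summing the geometric series in Equation 5 gives the closed form $p = m e^{-\alpha}$, so $p/n_c = \alpha e^{-\alpha}$, which peaks at $\alpha = 1$ with the value $1/e \approx 0.37$.

```python
import math


def saturation(m, n_c):
    """Equation 3: saturation level alpha = m / n_c."""
    return m / n_c


def n_tets(m, alpha, n):
    """Equation 2: average number of fused peaks made of n single-component peaks."""
    return m * math.exp(-2.0 * alpha) * (1.0 - math.exp(-alpha)) ** (n - 1)


def total_peaks(m, alpha, n_max=500):
    """Equation 5, with the sum over n truncated at n_max terms (an assumption)."""
    return sum(n_tets(m, alpha, n) for n in range(1, n_max + 1))


n_c = 100.0  # hypothetical peak capacity

print("   m  alpha  singlets  doublets  triplets   peaks  m*exp(-alpha)")
for m in (25, 50, 100, 200):  # hypothetical numbers of components
    alpha = saturation(m, n_c)
    singlets, doublets, triplets = (n_tets(m, alpha, n) for n in (1, 2, 3))
    p = total_peaks(m, alpha)
    closed_form = m * math.exp(-alpha)
    print(f"{m:4d}  {alpha:5.2f}  {singlets:8.2f}  {doublets:8.2f}  "
          f"{triplets:8.2f}  {p:6.2f}  {closed_form:13.2f}")

# Scan alpha from 0.01 to 3.00 to locate the maximum of p / n_c.
alphas = [i / 100 for i in range(1, 301)]
best_alpha = max(alphas, key=lambda a: total_peaks(a * n_c, a) / n_c)
best_ratio = total_peaks(best_alpha * n_c, best_alpha) / n_c
print(f"max p/n_c = {best_ratio:.3f} at alpha = {best_alpha:.2f}")
```

The last two table columns compare the truncated sum with the closed form; the scan over the saturation level locates the maximum of the "peaks" curve plotted in Figure 2a.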
no fused peaks. The number of singlets very soon deviates from this line when $\alpha$ is increased. The plot of $p/n_c$ shows that its maximum value is 0.37 when $m$ equals $n_c$. This means that in a complex multicomponent chromatogram, $p$ is never higher than 37% of $n_c$. The number of stand-alone peaks is still smaller. In a disordered chromatogram, even in the most favorable cases, the number of single-component peaks is