Anal. Chem. 1999, 71, 709-714
Automated Measurement of Peak Widths for the Determination of Peak Capacity in Complex Chromatograms
Kevin Lan and James W. Jorgenson*
Department of Chemistry, University of North Carolina at Chapel Hill, Venable Hall, CB 3290, Chapel Hill, North Carolina 27599-3290
The peak capacity was measured for an ultrahigh-pressure gradient elution chromatogram of a fluorescently tagged tryptic digest of ovalbumin. The peak widths in the chromatogram were determined by measuring the peak height and the second derivative at the peak maximum. This approach for measuring peak widths was programmed into a computer, and the software accurately determined the general progression of peak widths by measuring 47 peaks throughout the chromatogram in under 10 s. Peak capacity was determined by taking the definite integral of the plot of reciprocal base peak width versus retention time. This calculation of peak capacity is a linear transformation with respect to separation space, so the method is more rigorously accurate than previous methods. The peak capacity for the chromatogram was calculated to be 316.
Measurement of Peak Capacity. Peak capacity is commonly used as an indicator of separation power, yet its value is frequently determined by subjective approximation. Much of the difficulty in the determination of peak capacity arises from the conceptual nature of its definition: peak capacity is the maximum number of components that can be separated to a specified resolution within a given separation space (ref 1). Since it is impossible to fill a separation space experimentally so that all adjacent peaks have exactly the specified resolution, the determination of peak capacity is inherently approximate. Nonetheless, the estimation of peak capacity can be made more accurate by establishing a clear approach to its measurement.
The most common method of measuring peak capacity $n_c$ is to apply the equation (refs 2, 3)
$$n_c = \frac{t_2 - t_1}{4\bar{\sigma}R_s} \tag{1}$$
where $t_1$ and $t_2$ are the bounds of retention times within the temporal separation space, $\bar{\sigma}$ is the average standard deviation of the peaks (determined by either arithmetic or integrated averaging; ref 4), and $R_s$ is the specified level of resolution. Although this equation is easily applied, it is inconsistent with the conceptual definition of peak capacity because it is not a linear transformation with respect to separation space; i.e., it implies that peak capacities are not additive. For example, suppose a separation space that ranges from $t_1$ to $t_2$ is divided into two sections: section A ranges from $t_1$ to $t_{1.5}$, and section B ranges from $t_{1.5}$ to $t_2$. According to eq 1, the peak capacities for these individual sections are
$$n_A = \frac{t_{1.5} - t_1}{4\bar{\sigma}_A R_s} \tag{2}$$
and
$$n_B = \frac{t_2 - t_{1.5}}{4\bar{\sigma}_B R_s} \tag{3}$$
where $n_A$ and $n_B$ are the peak capacities of sections A and B, respectively, and $\bar{\sigma}_A$ and $\bar{\sigma}_B$ are the average standard deviations of the peaks in sections A and B, respectively. The sum of these two peak capacities is not necessarily equal to the peak capacity calculated for the entire range:
$$\frac{t_{1.5} - t_1}{4\bar{\sigma}_A R_s} + \frac{t_2 - t_{1.5}}{4\bar{\sigma}_B R_s} \neq \frac{t_2 - t_1}{4\bar{\sigma}R_s} \tag{4}$$
(1) Giddings, J. C. Anal. Chem. 1967, 39, 1027-1028.
(2) Giddings, J. C.; Dahlgren, K. Sep. Sci. 1971, 6, 345-356.
(3) Giddings, J. C. Unified Separation Science; John Wiley & Sons: New York, 1991; p 105.
(4) Delinger, S. L.; Davis, J. M. Anal. Chem. 1990, 62, 436-443.
10.1021/ac980702v CCC: $18.00. Published on Web 01/05/1999. © 1999 American Chemical Society
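To make the non-additivity of eq 1 concrete, the following minimal sketch applies it to a hypothetical separation split into two equal halves. All numerical values are invented for illustration, and the overall average standard deviation is assumed to be the arithmetic mean of the two section values.

```python
def peak_capacity_eq1(t_start, t_end, sigma_avg, rs=1.0):
    """Peak capacity from eq 1: n_c = (t2 - t1) / (4 * sigma_avg * Rs)."""
    return (t_end - t_start) / (4.0 * sigma_avg * rs)

# Hypothetical values (not from the paper); times and sigmas in min.
t1, t_mid, t2 = 0.0, 50.0, 100.0
sigma_a, sigma_b = 0.1, 0.3   # narrow peaks early, broad peaks late

n_a = peak_capacity_eq1(t1, t_mid, sigma_a)   # section A
n_b = peak_capacity_eq1(t_mid, t2, sigma_b)   # section B

# Assumption: the overall average sigma is the arithmetic mean of the sections.
sigma_avg = 0.5 * (sigma_a + sigma_b)
n_total = peak_capacity_eq1(t1, t2, sigma_avg)

print(f"n_A + n_B = {n_a + n_b:.1f}")
print(f"n (whole range) = {n_total:.1f}")
```

With these made-up numbers the two sections sum to roughly 167, whereas eq 1 applied to the whole range gives 125, which is the inconsistency expressed by eq 4.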
It is therefore desirable to develop a method of measuring peak capacity that is more rigorously consistent with its conceptual definition. Peaks at a resolution of $R_s$ are separated by $4\bar{\sigma}R_s$ of space, where $\bar{\sigma}$ is the average standard deviation of the peaks. Thus, if all peaks within a temporal separation space have exactly the same width, the peak capacity $n_c$ can be calculated exactly by
$$n_c = \frac{t_2 - t_1}{4\sigma R_s} \tag{5}$$
where $\sigma$ is the standard deviation of all peaks within these bounds. Let us denote the quantity $4\sigma R_s$ as the base peak width $w$ so that eq 5 can be rewritten as
$$n_c = \frac{t_2 - t_1}{w} \tag{6}$$
Analytical Chemistry, Vol. 71, No. 3, February 1, 1999
Although this equation is completely valid, it cannot be used in practice to calculate peak capacities because experimental separations do not have peaks of constant width. Nonetheless, the differential form (ref 5) of eq 6 remains applicable across infinitesimal ranges of retention times $dt_r$:
$$dn_c = \frac{dt_r}{w} \tag{7}$$
Integration of eq 7 across the entire separation space yields
$$n_c = \int_{t_1}^{t_2} \frac{dt_r}{w} \tag{8}$$
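As a rough illustration of eq 8, the sketch below integrates a quadratic model of reciprocal base peak width analytically. The coefficients are hypothetical placeholders (they are not regression values from this work); only the integration bounds of 35.2 and 146.4 min are taken from the text.

```python
def peak_capacity(coeffs, t1, t2):
    """Integrate 1/w(t) = c0 + c1*t + c2*t**2 from t1 to t2 (eq 8)."""
    c0, c1, c2 = coeffs

    def antiderivative(t):
        return c0 * t + c1 * t**2 / 2.0 + c2 * t**3 / 3.0

    return antiderivative(t2) - antiderivative(t1)

# Hypothetical coefficients for 1/w in min^-1 (assumed, for illustration only).
coeffs = (4.0, -0.02, 5.0e-5)
t1, t_mid, t2 = 35.2, 90.0, 146.4

# Additivity: the capacities of two adjacent sections sum to the whole.
n_whole = peak_capacity(coeffs, t1, t2)
n_parts = peak_capacity(coeffs, t1, t_mid) + peak_capacity(coeffs, t_mid, t2)

# A constant width w = 0.5 min reduces eq 8 to eq 6: (t2 - t1) / w.
n_const = peak_capacity((1.0 / 0.5, 0.0, 0.0), t1, t2)

print(f"whole = {n_whole:.2f}, sum of sections = {n_parts:.2f}")
print(f"constant-width case = {n_const:.1f}")
```

Because the integral of a sum of intervals is the sum of the integrals, splitting the range at any intermediate time leaves the total unchanged, unlike eq 1.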
Thus, when reciprocal base peak width is plotted against retention time, the definite integral of the curve gives the peak capacity. This calculation of peak capacity is linear with respect to separation space.
Measurement of Peak Widths. Another obstacle in the accurate determination of peak capacity is the inconvenience of measuring the widths of numerous peaks in the chromatogram. Since this task is usually very tedious, peak capacity is frequently estimated on the basis of only a few measurements of peak width. It would thus be useful to establish a computer automation of peak width measurements so that many measurements can be easily gathered.
Common methods for determining the standard deviation of a peak include fitting a peak model to the local data or measuring the peak width at half-height. Unfortunately, both of these techniques are relatively sensitive to peak overlap, and it is difficult to program a computer to take the effect of unresolved peaks into account. A more robust method of determining the standard deviation of a Gaussian peak $h(t)$ is to measure the peak height $h(t_r)$ (i.e., magnitude) and the second derivative at the peak maximum $h''(t_r)$ (i.e., concavity):
$$\sigma = \sqrt{-h(t_r)/h''(t_r)} \tag{9}$$
(See Appendix for derivation.) Since both of the measurements required for this calculation are taken at the center of the peak, this determination of standard deviation is less sensitive to the overlap of peaks. The magnitude-concavity method is thus amenable to computer automation because the computer is less likely to report highly inaccurate peak width measurements. For example, using this approach to measure the peak widths of two similar peaks (equal width, equal height) at a resolution of 0.70 produces an error of only 10%. In contrast, measuring the peak width at half-height for the same profile produces an error of ∼119%. This enormous error arises because the valley between the two peaks does not extend below half of the profile's maximum height. The method of fitting a Gaussian peak model can also lead to very large errors (>100%) if the bounds of the fit are poorly selected by the computer.
The retention time, peak height, and second derivative at the peak maximum can be determined by fitting a second- or higher-order polynomial to the tip of the peak. The local maximum of the polynomial fit gives the retention time, the magnitude of the polynomial fit at the retention time gives the peak height, and the second derivative of the polynomial fit at the retention time gives the second derivative at the peak maximum. Although such calculations may seem tedious, they are easily accomplished by computer.
(5) Giddings, J. C. Sep. Sci. 1969, 4, 181-189.
EXPERIMENTAL SECTION
Effect of Peak Overlap on the Magnitude-Concavity Method. In-house software was written in Mathematica 3.0 (Wolfram Research, Champaign, IL) to produce simulated chromatograms of two overlapped Gaussian peaks of equal width. This program varied the resolution of these peaks from 0 to 1, inclusive, in 33 increments. At each value of resolution, the decimal logarithm of the peak height ratio was varied from 1 to -1, inclusive, in 33 increments (i.e., the peak height ratio was varied exponentially from 10 to 0.1), where the peak height ratio is the height of the first peak divided by the height of the second. For each of these simulated chromatograms, the program numerically found the coordinate of the first local maximum and then calculated the second derivative at that coordinate. Equation 9 was then applied to determine the measured standard deviation of the peaks.
Effect of Peak Asymmetry on the Magnitude-Concavity Method. Mathematica 3.0 was used to generate simulated chromatograms of an exponentially modified Gaussian (EMG; refs 6-8) peak. The arc tangent of the ratio of the exponential time constant $\tau$ to the standard deviation $\sigma_g$ of the precursor Gaussian was varied from 0 to π/2, exclusive, in 127 increments (i.e., the EMG profile was varied from a nearly Gaussian profile to a nearly exponential profile) while the second normalized central statistical moment (ref 9), $\sigma_g^2 + \tau^2$, remained constant. For each simulated chromatogram, the program numerically found the coordinate of the peak maximum and then calculated the second derivative at that coordinate. Equation 9 was then applied to determine the measured standard deviation of the peak. The actual standard deviation of the peak is the square root of the second normalized central statistical moment.
Chromatogram.
Peak capacity was determined for an ultrahigh-pressure gradient elution chromatogram of a tetramethylrhodamine isothiocyanate-tagged tryptic digest of ovalbumin (Figure 1). The column was a 53.5-cm-long × 33-µm-i.d. capillary column packed with 1.5-µm octadecylsilane-modified nonporous silica particles (Micra Scientific, Northbrook, IL). The exponential dilution method (ref 10) was used to vary the mobile phase from an initial composition of 15:85:0.1 acetonitrile/water/trifluoroacetic acid to a limiting composition of 55:45:0.1 acetonitrile/water/trifluoroacetic acid. The time constant of the exponential dilution was ∼120 min, and the mobile-phase linear velocity through the column was ∼0.11 cm/s. Laser-induced fluorescence was the detection method, and data were acquired at 20 Hz. Details of the instrumentation and chromatographic procedures can be found in ref 11.
Figure 1. Ultrahigh-pressure gradient elution chromatogram of a fluorescently tagged tryptic digest of ovalbumin. This chromatogram was median filtered at a rank of 5 (0.25 s) and baseline subtracted. See the Experimental Section for conditions.
(6) Sternberg, J. C. In Advances in Chromatography; Giddings, J. C., Keller, R. A., Eds.; Marcel Dekker: New York, 1966; Vol. 2, pp 205-270.
(7) Kissinger, P. T.; Felice, L. J.; Miner, D. J.; Reddy, C. R.; Shoup, R. E. In Contemporary Topics in Analytical and Clinical Chemistry; Hercules, D. M., et al., Eds.; Plenum Press: New York, 1978; Vol. 2, pp 67-74, 159-175.
(8) Hanggi, D.; Carr, P. W. Anal. Chem. 1985, 57, 2394-2395.
(9) Grushka, E. Anal. Chem. 1972, 44, 1733-1738.
(10) Donaldson, K. O.; Tulane, V. J.; Marshall, L. M. Anal. Chem. 1952, 24, 185-187.
(11) MacNair, J. E.; Patel, K. D.; Jorgenson, J. W. Anal. Chem. 1999, 71, 700-708.
Data Analysis. Digital Filtering and Baseline Subtraction. Igor Pro (WaveMetrics, Lake Oswego, OR) was used to median filter the chromatogram at a rank of 5 to reduce noise. An in-house program written in LabVIEW 4.1 (National Instruments, Austin, TX) was then used to subtract a line of background fluorescence from the chromatogram.
Magnitude-Concavity Program. In-house software was written in LabVIEW 4.1 on a Power Macintosh 7100/66 (Apple Computer, Cupertino, CA) to automatically search a chromatogram for peaks and to measure their retention times and peak widths using the magnitude-concavity method. This program uses the Peak Detector subroutine (Analysis:Additional Numerical Methods:Peak Detector.vi), which identifies peaks by least-squares fitting a quadratic polynomial to a moving window of chromatographic data points. A change in the sign (from positive to negative) of the first derivative of the quadratic polynomial indicates a peak; however, only peaks that exceed a specified threshold of magnitude are identified by this subroutine. For each of the identified peaks, the retention time, peak height, and second derivative at the peak maximum are reported. Since these measurements will be incorrect for peaks that exceed the range of the signal converter (i.e., a "pegged" signal), the main program rejects the peaks that are above a specified maximum magnitude. The standard deviation for each of the remaining peaks is then determined by applying eq 9. (This program was written in ∼1 h.)
For the analysis of the chromatogram in Figure 1, the peak height threshold of the program was set to 2 µA, and the peak height cutoff was set to 14 µA. The width of the quadratic fit in the moving window was set to 60 data points (3 s), which was chosen because it is ∼1 standard deviation of the narrowest peak in the chromatogram. We recommend this guideline so that the window of data points at the tip of a given peak is sufficiently small to closely approximate a quadratic polynomial, yet not so small that the fit is strongly influenced by moderate levels of noise.
EMG Fitting.
Using in-house software written in LabVIEW 4.1, an EMG peak model was fit to each of the peaks that was identified by the program described above. EMG functions were fit to resolved peaks, and sums of EMG functions were fit to unresolved peaks. This software employs the Levenberg-Marquardt method (ref 12) for finding the least-squares regression solution. The standard deviation of each peak was calculated by taking the square root of the second normalized central statistical moment.
Peak Capacity. The reciprocal base peak width for a resolution of unity was plotted against retention time using Excel 98 (Microsoft Corp., Redmond, WA). A quadratic least-squares fit of the data series established the general progression of reciprocal base peak widths across the chromatogram. The definite integral of this quadratic fit gives the peak capacity (eq 8).
(12) Sen, A.; Srivastava, M. Regression Analysis: Theory, Methods, and Applications; Springer-Verlag: New York, 1990.
RESULTS AND DISCUSSION
Effect of Baseline Offset on the Magnitude-Concavity Method. The baseline subtraction procedure did not remove all background fluorescence from the chromatogram, so the central region of the chromatogram still exhibits a slight, positive baseline offset. This shift of the baseline causes the measurements of peak height in the central region to be artificially large, and such error affects the calculation of peak width via eq 9. Since the standard deviation is directly proportional to the 1/2 power of the peak height, the relative error in the calculated peak width is only about half of the relative error in the measured peak height (ref 13). For example, if a positive baseline offset causes a 20% error in the measured peak height, eq 9 will produce a ∼10% error in the calculated peak width. Note that the signs of the relative errors are the same, so a positive baseline offset causes overestimation of the peak widths.
(13) Skoog, D. A.; Leary, J. J. Principles of Instrumental Analysis, 4th ed.; Saunders College Publishing: Fort Worth, 1992; p A16.
Figure 2. Contour plot of the relative error of peak width measurements by the magnitude-concavity method for various resolutions and relative peak heights. See the Results and Discussion section for a discussion of the three domains. The bounds of domain 1 indicated in this figure are approximate.
Figure 3. (a) Relative error of peak width measurements by the magnitude-concavity method for EMG profiles at various levels of asymmetry. (b) EMG profiles of various asymmetries constrained to the same peak height and the same second normalized central statistical moment. Each profile has been truncated to discard ordinate values less than 1/1024 of the peak height. The center of each profile is located approximately below the corresponding asymmetry value in the abscissa of Figure 3a. From left to right, $\arctan(\tau/\sigma_g)$ = 0, π/8, π/4, 3π/8, and π/2.
Effect of Peak Overlap on the Magnitude-Concavity Method. Most complex chromatograms (such as Figure 1) have many sets of unresolved peaks, so it is important to establish how peak overlap affects the magnitude-concavity method. The contour plot in Figure 2 shows the relative error of calculated peak widths for various possibilities of overlap between two Gaussian peaks. Although this plot contains quantitative information, its major utility is to provide qualitative insights about the robustness of the magnitude-concavity method. There are three major domains in the plot, each of which is discussed below. All relative errors in this plot are greater than zero, so overlapping peaks cause overestimation of the actual peak widths.
Domain 1 represents cases where the two peaks are so overlapped that only a single local maximum is apparent. Since there is only one local maximum to be measured, this region is symmetric about the horizontal line at the log peak height ratio of zero. When the two peaks have similar heights and have a resolution close to 0.5, enormous errors arise because the tops of the profiles are nearly flat, which produces a second derivative close to zero. When the resolution between the peaks is extremely poor ($R_s$ < 0.25), the sum of the two peaks is very similar in shape to a single peak. Consequently, measurements that are made at extremely poor resolution are still representative of the actual peak widths despite the severe overlap.
Domain 2 corresponds to the measurement of the larger of two peaks that are sufficiently resolved to yield two local maximums. Note that only a small fraction of this region produces significant error.
Domain 3 corresponds to the measurement of the smaller of two peaks that are sufficiently resolved to yield two local maximums. The left edge of this region has large relative errors because the local maximum of the smaller peak has a nearly flat profile at low resolutions.
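The two-peak simulation can be sketched in a few lines. The code below is an assumed re-implementation in Python (the original work used Mathematica) for a single case: two Gaussians of equal width and height at a resolution of 0.70. It compares eq 9 with a half-height measurement of the first local maximum; the helper names, grid sizes, and finite-difference step are illustrative choices.

```python
import math

def doublet(t, sep, ratio=1.0, sigma=1.0):
    """Two equal-width Gaussians: first peak (height `ratio`) at 0, second (height 1) at `sep`."""
    g1 = ratio * math.exp(-t**2 / (2 * sigma**2))
    g2 = math.exp(-(t - sep)**2 / (2 * sigma**2))
    return g1 + g2

def second_derivative(f, t, dt=1e-3):
    """Central finite-difference estimate of f''(t)."""
    return (f(t - dt) - 2 * f(t) + f(t + dt)) / dt**2

def first_local_max(f, lo, hi, n=20000):
    """Scan left to right and return the first grid point that is a local maximum."""
    step = (hi - lo) / n
    prev, cur = f(lo), f(lo + step)
    for i in range(2, n + 1):
        nxt = f(lo + i * step)
        if cur >= prev and cur > nxt:
            return lo + (i - 1) * step
        prev, cur = cur, nxt
    raise ValueError("no local maximum found")

def half_height_width(f, t_max, step=1e-3):
    """Width of the profile at half the height of the maximum at t_max."""
    half = f(t_max) / 2
    left = right = t_max
    while f(left) > half:
        left -= step
    while f(right) > half:
        right += step
    return right - left

sigma, rs = 1.0, 0.70
sep = 4 * sigma * rs                      # Rs = separation / (4 * sigma)
f = lambda t: doublet(t, sep, ratio=1.0, sigma=sigma)

t_max = first_local_max(f, -5 * sigma, sep + 5 * sigma)
sigma_mc = math.sqrt(-f(t_max) / second_derivative(f, t_max))             # eq 9
sigma_hh = half_height_width(f, t_max) / (2 * math.sqrt(2 * math.log(2)))

err_mc = sigma_mc / sigma - 1
err_hh = sigma_hh / sigma - 1
print(f"magnitude-concavity error: {100 * err_mc:.0f}%")
print(f"half-height error: {100 * err_hh:.0f}%")
```

Under these assumptions the magnitude-concavity error should come out near 10%, while the half-height error exceeds 100% because the valley between the peaks never drops below half of the maximum; the exact half-height figure depends on how the width is defined.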
Figure 4. Progression of reciprocal base peak width for the determination of peak capacity. Quadratic regression curves are given by the lines. (a) Magnitude-concavity method. Outliers are indicated by solid circles. (b) EMG peak model fitting method.
Effect of Peak Asymmetry on the Magnitude-Concavity Method. Although gradient chromatograms usually have relatively symmetric peaks, a few of the peaks in Figure 1 are noticeably tailed. The measurement of such peaks using the magnitude-concavity method produces errors because eq 9 was derived specifically for Gaussian (symmetric) profiles. Nonetheless, mildly asymmetric peaks are still similar in shape to Gaussian profiles, so the magnitude-concavity method is applicable to peaks of limited asymmetry. The relative error for various asymmetries of EMG profiles is shown in Figure 3. Since the asymmetry quantity $\tau/\sigma_g$ is plotted on an arc tangent scale, all levels of peak asymmetry are represented in this figure. Peaks that have asymmetry quantities $\tau/\sigma_g$ less than unity (arctan 1 = π/4 ≈ 0.79) produce relative errors of -15.4% or smaller in magnitude. All relative errors in this plot are negative, so the magnitude-concavity method is expected to underestimate the widths of asymmetric peaks.
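The asymmetry test can be reproduced approximately as follows. This is a sketch in Python rather than the original Mathematica code; it uses the standard closed form of the EMG function, scales each profile to unit total variance, and estimates the second derivative by finite differences. The function names, search interval, and chosen ratios are assumptions made for illustration.

```python
import math

def emg(t, sigma_g, tau):
    """Unit-area exponentially modified Gaussian; precursor Gaussian centred at t = 0."""
    z = (sigma_g / tau - t / sigma_g) / math.sqrt(2.0)
    return math.exp(sigma_g**2 / (2.0 * tau**2) - t / tau) * math.erfc(z) / (2.0 * tau)

def argmax_unimodal(f, lo, hi, iters=200):
    """Ternary search for the maximum of a single-peaked function."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

def mc_relative_error(ratio):
    """Relative error of eq 9 for an EMG with tau/sigma_g = ratio and unit total variance."""
    sigma_g = 1.0 / math.sqrt(1.0 + ratio**2)   # so that sigma_g^2 + tau^2 = 1
    tau = ratio * sigma_g
    f = lambda t: emg(t, sigma_g, tau)
    t_max = argmax_unimodal(f, -5.0, 10.0)
    dt = 1e-3
    h2 = (f(t_max - dt) - 2.0 * f(t_max) + f(t_max + dt)) / dt**2
    sigma_measured = math.sqrt(-f(t_max) / h2)  # eq 9
    return sigma_measured - 1.0                 # true standard deviation is 1

errors = {ratio: mc_relative_error(ratio) for ratio in (0.5, 1.0, 2.0)}
for ratio, err in errors.items():
    print(f"tau/sigma_g = {ratio:.1f}: relative error = {100 * err:+.1f}%")
```

For $\tau/\sigma_g$ = 1 the sketch should give a relative error close to -15%, in line with the value quoted above, and the errors for the other ratios should also be negative.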
Magnitude-Concavity Program. When a large peak height threshold (such as 2 µA for Figure 1) is selected, the magnitude-concavity program ignores the majority of peaks in the chromatogram and analyzes only the large peaks. If these large peaks are distributed throughout the chromatogram, they are likely to be much larger than their adjacent peaks. According to Figure 2, the measurement of peak width usually produces little error when the height of the measured peak is relatively large, so the selection of a large peak height threshold is likely to improve the accuracy of peak width calculations. Furthermore, when the peak height threshold is set much higher than the baseline offset, the relative error of the measured peak height is reduced, which also improves the accuracy of peak width measurements. Note that the strategy of selecting a large peak height threshold assumes that there is little or no correlation between peak magnitude and peak width, which is usually a fair assumption.
Comparison between Magnitude-Concavity and EMG Fitting Methods. In under 10 s, the automated magnitude-concavity program identified 47 peaks in the chromatogram and measured their retention times and peak widths. In contrast, the process of fitting EMG peak models to each of these 47 peaks required ∼2 h of tedious work. Despite the discrepancy in data analysis time, the two methods produced very similar measurements of reciprocal base peak width in most cases (Figure 4); only seven pairs of corresponding measurements differ by more than 15%. The two main outliers of the magnitude-concavity data series are indicated in Figure 4a by solid circles. The peaks corresponding to these data points are shown in Figure 5. A sum of two EMG functions was fit to the data in Figure 5a, and the results of the fit suggest that this outlier represents a pair of peaks that have a resolution of 0.45 and a log peak height ratio of -0.07. According to Figure 2, the magnitude-concavity method yields excessively large measurements of peak width for such cases. The second outlier in the data series corresponds to the second peak of Figure 5b, which shows a distortion for unknown reasons. The sharp point in the peak profile gives a large negative second derivative, which leads to an underestimated peak width. Neither of these outliers was omitted from the quadratic regression of the data series.
Figure 5. Profiles that caused large errors in the measurement of standard deviation by the magnitude-concavity method. (a) The earlier outlier in Figure 4a. (b) The later outlier in Figure 4a.
Peak Capacity. A quadratic fit of the progression of reciprocal base peak width ($R_s$ = 1) was generated for each of the measurement methods. (See Figure 4 for regression coefficients.) The bounds of the separation space were selected as the retention times of the earliest and latest peaks exceeding 2 µA in magnitude (35.2 and 146.4 min). The calculated peak capacities based on the magnitude-concavity and EMG fitting methods are 316 and 320, respectively.
SUMMARY
Measurement of Peak Capacity. Peak capacity is given by the definite integral of a plot of reciprocal base peak width versus retention time. This method of calculating peak capacity is linear with respect to separation space, so it is more rigorously accurate than previous methods. The approach may be useful for the experimental confirmation of equations involving peak capacity (refs 1, 2, 4, 5, 14-20).
Measurement of Peak Widths. The magnitude-concavity method establishes a robust means of measuring peak widths that is relatively insensitive to baseline error, peak overlap, and peak asymmetry. This method is easily automated by computer software so that numerous representative measurements of peak width can be quickly obtained from a chromatogram.
ACKNOWLEDGMENT
This research was supported by the National Institutes of Health under Grant GM 39515. We thank the Department of Mathematics at the University of North Carolina at Chapel Hill for providing us access to their computer laboratory, where Mathematica 3.0 was used.
(14) Grushka, E. Anal. Chem. 1970, 42, 1142-1147.
(15) Rosenthal, D. Anal. Chem. 1982, 54, 63-66.
(16) Davis, J. M.; Giddings, J. C. Anal. Chem. 1983, 55, 418-424.
(17) Herman, D. P.; Gonnord, M. F.; Guiochon, G. Anal. Chem. 1984, 56, 995-1003.
(18) Martin, M.; Guiochon, G. Anal. Chem. 1985, 57, 289-295.
(19) Davis, J. M.; Giddings, J. C. Anal. Chem. 1985, 57, 2178-2182.
(20) Shen, Y.; Lee, M. L. Anal. Chem. 1998, 70, 3853-3856.
APPENDIX
A Gaussian profile $h(t)$ is given by
$$h(t) = \frac{A}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(t - t_r)^2}{2\sigma^2}\right) \tag{10}$$
where $A$ is the peak area, $t_r$ is the retention time, and $\sigma$ is the standard deviation. The second derivative of eq 10 is
$$h''(t) = \frac{(t - t_r)^2 - \sigma^2}{\sigma^4} \, \frac{A}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(t - t_r)^2}{2\sigma^2}\right) = \frac{(t - t_r)^2 - \sigma^2}{\sigma^4} \, h(t) \tag{11}$$
Evaluating the second derivative at the retention time yields
$$h''(t_r) = -h(t_r)/\sigma^2 \tag{12}$$
which can be rearranged to give eq 9.
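As a numerical check of eq 9, the sketch below fits a quadratic polynomial to the tip of a synthetic Gaussian peak and recovers the retention time, height, and standard deviation from the fit. It is an illustrative Python stand-in, not the LabVIEW Peak Detector routine; the function names, peak parameters, and window placement are assumptions.

```python
import math

def quadratic_fit(xs, ys):
    """Least-squares fit y = a + b*x + c*x**2 via the 3x3 normal equations."""
    s = [sum(x**k for x in xs) for k in range(5)]
    r = [sum(y * x**k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [r[i]] for i in range(3)]
    for i in range(3):                       # Gauss-Jordan with partial pivoting
        p = max(range(i, 3), key=lambda k: abs(m[k][i]))
        m[i], m[p] = m[p], m[i]
        for k in range(3):
            if k != i:
                factor = m[k][i] / m[i][i]
                m[k] = [u - factor * v for u, v in zip(m[k], m[i])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

def magnitude_concavity(ts, hs):
    """Return (t_r, height, sigma) from a quadratic fitted to the tip of a peak."""
    t0 = sum(ts) / len(ts)                   # centre the abscissa for conditioning
    a, b, c = quadratic_fit([t - t0 for t in ts], hs)
    if c >= 0:
        raise ValueError("window does not contain a maximum")
    t_r = t0 - b / (2.0 * c)                 # vertex of the parabola
    height = a - b * b / (4.0 * c)           # fitted value at the vertex
    sigma = math.sqrt(-height / (2.0 * c))   # eq 9 with h'' = 2c
    return t_r, height, sigma

# Synthetic Gaussian (illustrative values): sigma = 3 s, t_r = 100 s, sampled at 20 Hz.
true_sigma, true_tr, dt = 3.0, 100.0, 0.05
ts = [98.4 + i * dt for i in range(61)]      # ~3 s window near the tip, slightly off-centre
hs = [10.0 * math.exp(-(t - true_tr)**2 / (2 * true_sigma**2)) for t in ts]

t_r, height, sigma = magnitude_concavity(ts, hs)
print(f"t_r = {t_r:.2f} s, height = {height:.2f}, sigma = {sigma:.2f} s")
```

With a window of about one standard deviation, the recovered $\sigma$ should agree with the true value to within a few percent; the small positive bias arises because a Gaussian tip is not exactly parabolic over a finite window.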
Received for review July 1, 1998. Accepted November 10, 1998. AC980702V