Iterative Curve Fitting of Chromatographic Peaks Stephen N. Chesler and Stuart P. Cram Analytical Chemistry Division, National Bureau of Standards, Washington, D. C. 20234
Iterative curve fitting of an eight parameter function to chromatographic peak profiles by nonlinear residual least squares is reported. Gaussian, exponential, and hyperbolic tangent functions are convoluted and iteratively fit to any experimental chromatographic peak shape and integrated to give total statistical moments with errors as small as 1%, even for the higher order moments. Exponential and band broadening operators are deconvolved for measurement of physicochemical and analytical studies. The models and calculations may be extended to the resolution of overlapping peaks and complex elution profiles for the measurement of the rate of on-column chemical reactions.
Characterization of chromatographic peak profiles has become increasingly important for the deconvolution of analytical and physicochemical data. This work has been stimulated by the development of excellent models describing the equilibrium and mass transport behavior of adsorption (1-11) and partition (12) columns. The partial differential equations for these models may be solved in terms of Laplace coordinates which permit the calculation of statistical moments. From an analysis of the physical significance of the moments it is possible to design experiments to measure thermodynamic, diffusion, and kinetic parameters and surface phenomena (13-23). The analytical significance of these calculations resides in the importance of a fundamental understanding of chromatographic processes, and the formulation of a very sensitive and complete set of peak profile characteristics which do not assume any idealized form or model function. Peak shapes will be discussed here in terms of their statistical moments because the moments are applicable to all peak profiles as long as the finite integral (1) (2) (3) (4) (5) (6) (7)
(8) (9) (10) (11) (12) (13) (14) (15) (16) (17) (18) (19) (20) (21) (22) (23)
1354
J. C. Giddings and H. Eyring, J. Phys. Chem., 59, 416 (1955). J . C. Giddings, J. Chem. Phys., 26, 169 (1957). J. C. Giddings, J. Chromatogr., 2, 44 (1959). J. C. Giddings, J , Chem. Phys., 31, 1462 (1959). J. C. Giddings,J. Chromatogr., 3, 443 (1960). M. McQuarrie, J. Chem. Phys., 38, 437, (1963). M. Kubin, Collect. Czech. Chem. Commun., 30, 1104 (1965). M. Kubin, Collect. Czech. Chem. Commun., 30, 2900 (1965). E. Kucera, J. Chromatogr., 19, 237 (1965). 0 . Grubner and E. Kucera. "Gas Chromatographie 1965," H. G. Struppe, Ed., Akademie Verlag, Berlin, 1965, p 157. 0. Grubner,Advan. Chromatogr.. 6, 173 (1968). E. Grushka, J. Phys. Chem., in press. S. P. Cram and S. N. Chesler, Pittsburgh Conference on Analytical Chemistry and Applied Spectroscopy, Cleveland, Ohio, March 6-10, 1972, paper No. 24. P. Schneider and J. M. Smith, AlChEJ.. 14, 762 (1968). G. Padberg and J . M. Smith, J. Cafal., 1 2 , 172 (1968). J . C. Adrian and J . M. Smith, J. Catal., 18, 57 (1970). E. N. Fuller, K. Ensley. and J. C. Giddings. J. Phys. Chem., 73, 3679 ( 1969). S. L. Seager. L. R. Geertson, and J. C. Giddings, J. Chem. Eng. Data, 8 , 168 (1963). J. C. Giddingsand S. L. Seager. J. Chem. Phys., 35, 2242 (1961). J. C. Giddings and S. L. Seager, J. Chem. Phys., 33, 1579 (1960). E. N. Fuller, P. D. Schettler, and J. C. Giddings, lnd. Eng. Chem.. 58 (5). 19 (1966). J. C. Giddings,Anal. Chem., 36, 1170 (1964). J. C. Giddings and K. L. Mallik, lnd. Eng. Chem.. 59 (4), 18 (1967). A N A L Y T I C A L CHEMISTRY, VOL. 45,
NO. 8,
JULY 1973
exists and can be evaluated either in closed form or numerically. Therefore nothing is assumed about the peak shape and all peak profiles may be treated in this manner. The hybrid moments, such as skew ( p 3 / ~ 2 3 / ~ specific ) , asymmetry ( p 3 / p 1 3 ) , and excess (114/pz2), are excellent peak shape parameters as they are insensitive to any changes in scale and therefore reflect only intrinsic peak shape characteristics. Consequently, they have been used as indicators of the presence of peak overlap and even proposed as a means of qualitative identification (24). The moments may be determined directly from digital chromatographic data or by integration of a function which accurately describes the experimental peak shape. The limitations of the digital summation method have been previously considered and shown to approach a limiting accuracy which is determined by the data acquisition rate, peak symmetry, noise level, order of the moment, and resolution of the ADC (25, 26). On the other hand, the complexity of chromatographic separation phenomena and the endless number of different peak shapes has made it impossible to derive a single closed-form function which will accurately characterize all peak shapes. However, by developing a convoluted function, it is possible to iteratively fit any experimental curve if enough variables are defined. The fitted function may then be integrated in closed form or numerically to determine all of the statistical moments and subsequently the physicochemical and analytical properties of the column. The advantages, limitations, and theory of curve fitting for chromatographic peaks have been very well treated by Anderson, Gibb, and Littlewood (27, 28). It should be noted that the evaluation of peak areas by integration of the convoluted function and statistical moments is not subject to errors of starting or terminating the peak, and therefore, total moments can be determined. When slope sensing algorithms or analog methods of peak detection are used, only partial moments may be calculated and these are assumed to be good approximations to the total moments. This assumption rapidly breaks down as the order of the moment increases. Curve fitting techniques generally assume that a continuous function is the sum of one or more independent operators, each of which can be described by a known functional form. This technique has been successfully applied to the analysis of overlapping spectroscopic lines (29) and bands (30) where the function is well defined. For example, Stone (29) showed that if every line in the spectrum has the same equation of shape and the same known bandwidth parameter that individual components E. Grushka, M. N. Myers, and J . C. Giddings, Anal. Chem.. 42, 21 (1970). S. N. Chesler and S. P. Cram, Anal. Chem., 43, 1922 (1971) S. N. Chesler and S. P. Cram, Anal. Chem., 44, 2240 (1972). A. H. Anderson, T. C. Gibb, and A. B. Littlewood, J. Chromafogr. Sci., 8 , 640 (1970). A. H . Anderson, T. C. Gibb, and A. B. Littlewood, Anal. Chem., 42, 434 (1970). H. Stone, J . Opt. SOC.Amer., 52, 998 (1962). R. D. 6.Fraser and E. Suzuki, Anal. Chem.. 38, 1770 (1966).
can be deconvolved by Fourier transforms. For lines of different half-widths, an iterative least squares procedure may be used. These techniques have been extensively applied for the deconvolution or spectrum stripping of y-ray pulse-height spectra. Often a large number of parameters are required for polynomial or other continuous function fits to be quantitative. Heath et al. (31) used a modified Gaussian function to fit the photopeak and three sine series to fit the other parts of the spectrum. Kowalski and Isenhour (32) developed a continuous, differentiable, analytical function for describing monoenergetic NaI(T1) pulseheight distributions. Their program determined the optimum value of the parameters for a function which was the sum of a modified hyperbolic secant (representing the photopeak) and a quadratic with a modified hyperbolic tangent multiplier (for generating the Compton area) by using the residual least squares method and counting statistics criteria. Gutknecht and Perone (33) reported the numerical deconvolution of overlapping stationary electrode polarographic curves by curve fitting techniques. They used an empirical equation of closed form to fit the experimental waveforms by adjustment of the peak shape parameters. The function was first fit to a number of different standard polarograms and the constants of the function for each species were stored in a library. For analysis of an unknown mixture, these constants were used to regenerate the standard curves and a composite fit to the unknown signal by nonlinear residual least squares. Quantitative analyses were achieved a t the 1-2% error level for peak separations of only 100/n mV, where n is the number of electrons in the reduction reaction, and thereby significantly enhanced the analysis of mixtures by the stationary electrode technique. In addition to the spectroscopic and electrochemical applications of curve fitting cited above, we envision a t least four applications for chromatographic separations: development of a multivariant function for moment analysis; deconvolution of exponential operators; sensing and determination of on-column rates of reaction; and deconvolution of overlapping peaks. In this work we will consider only the first two objectives. These are significant in terms of developing a general functional relationship to describe all experimental chromatographic peak shapes (without assuming a model or idealized behavior) on a basis which can be related to fundamental parameters such as the peak area, center of gravity, sorption equilibrium constants, etc. Exponential contributions are difficult to measure precisely and yet they are necessary in order to characterize instrumental effects, such as the time constant of the detector-transducer system (34, 35), and the kinetic effects of sorption and desorption (36). The exponential contribution may be further deconvolved and treated as a series expansion, and it will be shown to be an accurate representation of high performance gas chromatographic data. The accuracy of measuring the peak area, center of gravity, hybrid moments, and higher order moments will be considered and compared to the summation approximation methods commonly used with digital data. On-column chemical reac(31) R. L. Heath, R. G. Helrner. L. A. Schrnittroth, and G. A. Cazier, Nucl. lnstrum. Methods. 47, 281 (1967). (32) 6. R. Kowalski and T. L. Isenhour. Anal. Chem.. 40, 1186 (1968). (33) W. F. Gutknecht and S. P. Perone, Anal. Chem., 42, 906 (1970). (34) I. G. McWilliarn and H. C. Bolton, Anal. Chem., 43, 883 (1971). (35) T. H. Glenn and S. P. Cram, Anal. Chem., in press. (36) J. C. Giddings, Anal. Chem., 35, 1999 (1983).
tions, their detection, and the measurement of reaction rates are beyond the scope of this work and will be treated in a subsequent paper. A number of excellent and innovative approaches for resolution enhancement of overlapping chromatographic peaks have been reported. Westerberg (37) has shown that the triangulation method is less accurate than perpendicular drop for peaks of about the same height, but triangulation becomes more accurate when the peak heights are significantly different. The triangulation method is much more sensitive to the degree of overlap, although it is not as reproducible, even when triangulation is more accurate than the perpendicular drop. Therefore, techniques such as curve fitting are proposed in this paper because the method is not inherently sensitive to changes in the degree of overlap and is reproducible. I t is also more accurate in an absolute sense, although in the past it was necessary to assume a model in order for the method to be operational. The peak area ratios are much less sensitive to errors in the assumed model if the peak shapes are similar to each other. Resolution has also been enhanced using the fast Fourier transform to mathematically sharpen peaks while retaining the true peak area. The advantage of working in the Fourier domain is the enhanced discrimination of the information above the noise (38). Goldberg (39) developed a numerical linear regression method of resolving peak areas by using a number of statistical subroutines to decrease the calculation time, estimate the base line drift, and to detect systematic errors in the presence of random experimental errors. This method is unique in that it is not necessary to assume a functional form of the peak shape. It should also be more accurate than those methods which rely on changes in slope for peak sensing. Least squares approximation methods of curve fitting for deconvoluting chromatographic peaks avoid many of the problems commonly encountered with less sophisticated data reduction techniques, such as differentiation in the presence of high frequency noise. It does require that a model function be assumed and this has generally been a simple exponentiated Gaussian (40-42). This model requires only that four variables be specified in order to define the peak shape. I t is important that these be kept to a minimum when doing overlapped peaks in order that the fit converge properly and with the minimum number of iterations. Further, a function of the type
12) where f(t) is the distribution function, t o the retention time, A the amplitude of the peak, u the line width, r a skewness factor, and t the independent variable, can be integrated simply in closed form to determine the peak area directly once the parameters have been determined by the least squares calculation (40) from J
(3) -m
(37) A. W. Westerberg, Anal. Chem.. 41, 1770 (1969). (38) D. W. Kirmse and A. W. Westerberg, Anal. Chem., 43, 1035 (1971). (39) 6. Goldberg, J. Chromatogr. Sci., 9 , 287 (1971). (40) H. M. Gladney, B. F. Snowden. and J. D. Swalen, Anal. Chem., 41, 883 (1969). (41) S. M. Roberts, D. H. Wilkinson, and L. R. Walker, Anal. Chem.. 42, 886 (1970). (42) S. M. Roberts, Anal. Chem., 44, 502 (1972) A N A L Y T I C A L C H E M I S T R Y , VOL. 45, NO. 8, J U L Y 1973
1355
the sum of the squares are significantly increased when a large number of variables are used. Thus good initial estimates, the use of forcing functions, and minimization techniques in solving the least squares matrix are of paramount importance. The functional form of the equation is given by convoluting a Gaussian, exponential, and hyperbolic tangent joining function
'
I
c4
/ I
i
c3 c8
Figure 1. Simulated representation of the three component function using eight parameters to characterize the peak amplitude, width, symmetry, and tailing However, this model severely restricts the accuracy [about 5% (41)] and the versatility of fitting, and the assumption that the relative asymmetry is uniform within a group of peaks is not entirely valid. In their detailed treatment of the least squares method, Roberts e t al. (41) point out that this method requires high accuracy digital data, storage of the raw data, removal of base-line drift, estimation of the partial derivatives, scaling the matrix of the normal equations, estimating the initial values of the parameters, methods of forcing the convergency, and enough computer time that it must be done off line or in a background mode. Gaussian and exponentiated G4ussian peaks were also used by Anderson, Gibb, and Littlewood (27, 43) to resolve convoluted chromatograms by curve fitting. They point out that curve fitting enables the resolution required for the separation to be decreased by a factor of 4.5, which corresponds to a reduction in the analysis time of 20-fold. Their results also indicate that the choice of the model to be fit to the experimental peak determines directly the accuracy of the results and the residuals of the least squares fit. Other improved models for chromatographic peaks have been sought, such as the Cauchy function (30), bi-Gaussian (24, 441, Poisson f44), Gramm-Charlier series (11, 451, and a linear summation of Gaussian, exponential, and triangular functions (46). The other alternative is to use a chromatographic peak of a pure component which elutes near the envelope to be resolved as a working model (28). This assumes that all of the components in the convolved peak are similar to the model component and exhibit the same thermodynamic and kinetic behavior on the column.
THEORY A N D CALCULATIONS This paper presents the development of a new function for iterative curve fitting of single peaks with the expressed objective of increasing the measurement accuracy of statistical moments and deconvolving exponential operators in order to measure column kinetic effects. The function described is a convoluted three component function with eight variables, and was designed to be run on a small dedicated computer (8K core) in real time. As the number of variables in the fit are increased, the precision, accuracy, and versatility in fitting is increased. However, the degree of difficulty in getting the fit to converge may be increased as are the number of false minima. The time per iteration, the number of iterations, and the residual of (43) A . 6.Littlewood, A . H. Anderson, and T. C. Gibb, "Gas Chromatography 1968," C. L. A . Harbourn, Ed., Institute of Petroleum, London, 1969, p 297. (44) T. S. Buys and K. de Clerk, Anal. Chem., 44, 1273 (1972). (45) 0. Grubner,Ana/, Chem.. 43, 1934 (1971). (46) J. C. Sternberg. Advan. Chrornafogr.. 2, 205 (1966).
1356
A N A L Y T I C A L C H E M I S T R Y , VOL. 45, NO. 8, J U L Y 1973
[I
-
0.5(1
-
t a n h (C,(t
-
C6 exp[-0.5C7{[(t - C,)*]'*
C,)))]X
+ (t
- C,))])
(4)
where Y ( t ) is the amplitude at any time, t , C1 the peak height, Cz the rate of change of the tanh, C3 is the position of the location of the midpoint of the symmetrical tanh function, C4 the position of the peak maximum, C5 the Gaussian variance, Cg the amplitude of the exponential, C7 the rate of decrease of the exponential, and CS is the starting position of the exponential function. These parameters are illustrated with the simulated peak profile shown in Figure 1. This particular form of the equation assumes that the leading edge of the peak can be fit with a Gaussian (for the case where all of the band broadening is diffusional). The fitting form for the back side of the peak is very versatile and incorporates an exponential tail and a broadening function. Obviously this treatment can be easily modified and extended to the leading edge of the peaks, or arranged so that complex peak profiles, such as on-column chemical reactions, can be described by an appropriate combination of the same functions, but Equation 4 was chosen here as being the most representative case. The Gaussian (C4.C 5 ) and exponential (c6, C7, cg) parameters do describe real chromatographic phenomena, although they have been combined in an empirical but functional fashion in order to also introduce asymmetrical band broadening. Equation 4 can be integrated precisely and used to describe the functiqn f ( t ) (Equation 1) and thereby derive all of the statistical moments, the analytical parameters of interest, and physicochemical data. It can be seen that the Gaussian variance, C5, should be taken from the leading edge of the peak at 60.77~of the peak height. The variance, position of the peak maximum, C4, and the amplitude, C1,were obtained with high accuracy by running simple peak parameter search routines to establish the initial estimates for these parameters. Since the final residual of the sum of the squares and the rate of convergence will depend on the initial estimates, it is imperative that these three parameters be predetermined directly from the digital data prior to fitting. Equation 4 can be simplified to
Y ( t ) = C , G ( t ) + rl - J ( t ) ] E ( t )
(5)
where the three functions are the Gaussian. G ( t ) , the joining function, J ( t ) , and the exponential, E ( t ) . Then it can be seen that the 1 - J ( t ) term goes from zero at t > C3 and is 112 at C3 The rate of the sigmoidal transition from zero to one is determined by C Z . and therefore both C2 and C3 determine the peak broadening and the relative weighting of the exponential. As long as the joining function [I - J ( t ) ]is zero (where t