
Anal. Chem. 1986, 58, 1404-1410

RECEIVED for review August 13, 1985. Resubmitted February 10, 1986. Accepted February 10, 1986. Support provided for this work in part by the Department of Energy Office of Basic Energy Sciences (analytical considerations), by the National Science Foundation (high-precision and finite concentration work), and by the Boris Kidrič Institute of Nuclear Sciences is gratefully acknowledged.

Deconvolution of Overlapping Chromatographic Peaks

Richard F. Lacey
Hewlett-Packard Laboratories, 1651 Page Mill Road, Palo Alto, California 94304

The use of multichannel chromatographic detectors that produce spectra characteristic of the eluents permits the deconvolution of partially overlapping chromatographic peaks without any assumptions about chromatographic peak shape or prior knowledge of the spectra of the individual compounds. With only weak assumptions about peak shape (e.g., nonnegativity of concentration), an error bound on the areas of the deconvolved peaks can be computed and the estimates of the spectra of eluting compounds improved. Examples are given of the application of a deconvolution method based on factor analysis of the spectra of eluting mixtures to GC/MS, LC/UV-vis, and GC/FTIR spectra. The method is embodied in a computer program that performs the deconvolution within a few minutes with a minimum of intervention by the user. For this deconvolution method to be successful, no more than three compounds can coelute at a given instant, and the periods when three compounds elute must be preceded and followed by periods where varying mixtures of just two compounds are eluting.

0003-2700/86/0358-1404$01.50/0 © 1986 American Chemical Society
ANALYTICAL CHEMISTRY, VOL. 58, NO. 7, JUNE 1986

The information available in spectra produced by gas chromatography/mass spectroscopy (GC/MS), liquid chromatography with diode array detection (LC/UV-vis), or gas chromatography/Fourier transform infrared spectroscopy (GC/FTIR) can be used to separate mathematically overlapping chromatographic peaks. This deconvolution is straightforward if the spectra of the compounds responsible for the peaks are known. Generally, however, they are not known, and in order to perform the deconvolution it is necessary to estimate them from information in the experimental spectra. The contributions of each component to the experimental mixture spectra are then computed by least-squares fitting. The sequence of computed contributions vs. time describes the actual elution profiles of the components.

A technique for estimating the component spectra of a series of binary mixtures was described by Lawton and Sylvestre (1) and applied to GC/MS data by Sharaf and Kowalski (2) and to LC/UV-vis data by Osten and Kowalski (3). A similar approach was taken by Chen and Hwang (4) and by Vandeginste et al. (5) to deal with three-component mixtures eluting from GC/MS and LC/UV-vis, respectively. Borgen and Kowalski (6) have described a method for placing bounds on the possible component spectra for three components that is generally applicable to chromatography. Gemperline (7) and Vandeginste et al. (8) have developed a method for deconvolution using iterative target transformation factor analysis, also generally applicable to chromatography, that has produced results comparable to those given here. In their method, elution profiles are estimated first and the spectra computed from these. In this paper, using an approach related to those mentioned above, estimates of the component spectra are first obtained from the information in the experimental spectra, and then improved estimates are found by using the results of deconvolution with the earlier estimates.

The method can be used to deconvolve a series of overlapping chromatographic peaks as long as no more than three compounds are coeluting at once, and each period when three compounds are eluting is preceded and followed by a period when substantially only two compounds are eluting. That is, if compounds A, B, C, D, and E are eluting successively from a chromatograph, the method assumes that spectra are acquired while first compound A elutes, then A and B together in varying proportions, then A, B, and C, then B and C, followed by B, C, and D, then C and D, etc., until finally E alone elutes. Of course, simpler sequences, without one or more of the regions where three compounds are coeluting, are also handled satisfactorily. As Borgen and Kowalski have pointed out (6), good estimates of the spectra of compounds B, C, and D can be obtained even though no experimental spectrum is close to the pure spectrum of any of those compounds. In order for the method to be generally applicable, the only assumption made about the spectra is that spectral amplitudes are linear with component concentration.

After estimated spectra of the pure compounds have been obtained, and the contribution of each of them to each of the experimental spectra has been computed, we show how to use weak (i.e., generally true and not highly constraining) assumptions about the chromatographic peak shapes to compute improved estimates of the component spectra. In the work described below, two assumptions are used. The first is the very weak (because universally true) one that all concentrations are nonnegative. The other is somewhat stronger. For each chromatographic peak, an interval equal to 3 times the square root of its second central moment (i.e., three standard deviations) is computed. It is assumed that the peak's amplitude is negligible at points that differ from the mean elution time by more than this interval.
This assumption is less strong than requiring the chromatographic peaks to be Gaussian, for example. The improved estimates are used to compute new elution profiles, which may be used to reestimate the component spectra in an iterative process. Finally, the various computed quantities are used to calculate an error bound on the computed amounts of all the constituents. This bound, together with the computed elution curves, can be used to judge the success of the deconvolution process. The deconvolution method is described in detail below. A computer program to carry it out runs in a few minutes on a Hewlett-Packard 9000 series 500 minicomputer.
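The two peak-shape assumptions can be expressed as a small operation on one computed elution profile. The NumPy sketch below is illustrative only and is not the program described here: it takes "second moment" to mean the intensity-weighted second central moment about the mean elution time, uses k = 3 as in the text, and the name `constrain_profile` is hypothetical.

```python
import numpy as np


def constrain_profile(c, t, k=3.0):
    """Apply the two weak peak-shape assumptions to one computed elution profile.

    c : computed concentrations of one component at elution times t.
    k : half-width of the allowed window, in standard deviations (3 in the text).
    """
    c = np.clip(np.asarray(c, dtype=float), 0.0, None)   # assumption 1: nonnegativity
    t = np.asarray(t, dtype=float)
    total = c.sum()
    if total == 0.0:
        return c
    mean = np.sum(t * c) / total                          # mean elution time
    sd = np.sqrt(np.sum((t - mean) ** 2 * c) / total)     # sqrt of 2nd central moment
    c[np.abs(t - mean) > k * sd] = 0.0                    # assumption 2: negligible tails
    return c
```

A profile that is already nonnegative and compact comes back essentially unchanged, which is why these assumptions are described as weak.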

THEORY

Assumptions. The underlying assumption of the deconvolution method, generally true for chromatography, is that the spectra of mixtures are linear sums of the spectra of the components of the mixtures, with the contribution of each component proportional to its concentration. A second assumption is that the component spectra are linearly independent. The experimental spectra are factor analyzed to find the principal factors, which are assumed to be equal in number to the compounds present. A third assumption is that the spectra of these compounds can be represented to a very good approximation by a linear combination of the principal factors. Below we describe in detail how the number of compounds present is determined and how the principal factors are found. The changes from spectrum to spectrum are assumed to be due to changes in the relative concentrations of the components present, rather than to noise. If several successive spectra are essentially similar, it is assumed they are spectra of a pure compound. As stated in the introduction, the method assumes that no more than three compounds are coeluting at any time and that a series of spectra resulting from three compounds is preceded and followed by several spectra (at least four) that result essentially from mixtures of two compounds. As many


as six compounds overall can be handled by the program in its present form. Modifying it to handle more, as long as the above criteria are met, is straightforward, but computation time and mathematical uncertainty increase quickly as the number of components increases. If a long chain of overlapped peaks is to be deconvolved, it should be broken up. The program in its present state has not been designed to handle such cases; there must be regions where only one compound is eluting to effect a successful division. A final assumption is that the component spectra, when expressed as linear combinations of the first three principal components, are distinctly different. The reason for this will be seen in the next section. If these assumptions are not true, the results from the program will be invalid and will appear so because of poor fit, implausible peak shapes, or large values of estimated error.

Estimation of Component Spectra. Factor Analysis. The series of spectra spanning the peaks to be deconvolved is used to construct a set of principal components in the following way. The total series, typically 30-100 spectra long, is divided into up to 12 equal groups of successive spectra, the division being done in such a way that the number of groups is maximized, up to 12, consistent with minimizing the number of spectra left over at the beginning and end of the series. The spectra in each group are coadded, a spectrum being regarded as a column vector, and the result forms one column of a data matrix D. Coaddition reduces noise and makes use of all the data, while limiting the number of columns in D to 12 reduces computation time. This treatment does not seriously impair finding the principal components as long as the span of each group of spectra is no greater than the width of a chromatographic peak. The experimental spectra may be windowed or weighted, but they are not normalized.
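The grouping rule above leaves some room for interpretation. The sketch below takes one plausible reading, which is an assumption rather than the program's documented rule: use the smallest group size that keeps the number of groups at or below 12, form as many whole groups as fit, and split the few leftover spectra (always fewer than one group's worth) between the two ends. It assumes NumPy and stores spectra as the columns of a channels × spectra array; `group_layout` and `coadd_groups` are hypothetical names.

```python
import numpy as np


def group_layout(n_spectra, max_groups=12):
    """Return (number of groups, spectra per group, spectra skipped at the start)."""
    size = -(-n_spectra // max_groups)       # ceil(n / max_groups): smallest size giving <= 12 groups
    n_groups = n_spectra // size             # as many whole groups as fit
    leftover = n_spectra - n_groups * size   # always fewer than `size`
    return n_groups, size, leftover // 2     # split leftovers between the two ends


def coadd_groups(spectra, max_groups=12):
    """Build the data matrix D by coadding successive spectra (columns of `spectra`)."""
    spectra = np.asarray(spectra, dtype=float)
    n_channels, n_spectra = spectra.shape
    n_groups, size, start = group_layout(n_spectra, max_groups)
    used = spectra[:, start:start + n_groups * size]
    # Column g of D is the sum of spectra start + g*size ... start + (g+1)*size - 1.
    return used.reshape(n_channels, n_groups, size).sum(axis=2)
```

Under this reading a 37-spectrum series gives 9 groups of 4 with one spectrum left over, and a 100-spectrum series gives 11 groups of 9.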
A square (uncentered) covariance matrix Z is formed by multiplying D by its transpose D':

$$ Z = D'D \tag{1} $$

The eigenvectors and eigenvalues of Z are found by solving the equation $ZQ = Q\Lambda$, where Q is a square matrix whose columns are the eigenvectors of Z and $\Lambda$ is a diagonal matrix whose elements are the eigenvalues. Both are arranged in order of decreasing eigenvalue. A matrix of factors is computed from the relation

$$ F = DQ\Lambda^{-1/2} \tag{2} $$

The principal factors, or principal components, are the columns of F corresponding to the larger eigenvalues. It is easy to show that the columns of F are orthonormal (since $Q'ZQ = \Lambda$, $F'F = \Lambda^{-1/2}Q'ZQ\Lambda^{-1/2} = I$), and they consequently form an orthonormal basis that spans the same space as the columns of D and the experimental spectra. From eq 2 it follows that

$$ D = F\Lambda^{1/2}Q' \tag{3} $$

This is the singular value decomposition of D. The relation generally remains true to good approximation even if the columns of F and the matching rows of Q' corresponding to the smaller eigenvalues are dropped from these matrices. In the absence of noise, the number of nonzero eigenvalues would be equal to the number of distinct compounds contributing to the spectra from which the data matrix is formed. In the presence of noise, and with small concentrations of one or more compounds, it may not be clear how many components are present.

Projection of Normalized Spectra onto a Plane. The heart of the method for estimating the spectra of the component compounds lies in expressing each experimental spectrum as a linear combination of the principal components or factors

$$ T^{(j)} = FF'R^{(j)} \tag{4} $$

where $R^{(j)}$ is the jth experimental spectrum, $T^{(j)}$ is the expansion of $R^{(j)}$ as a linear combination of factors, and F is the factor matrix truncated to retain the desired number of factors. Each spectrum $T^{(j)}$ may be considered as a point in the space spanned by the orthonormal set of factors, with coordinates $F'R^{(j)}$. Furthermore, if each $T^{(j)}$ is normalized so that the sum of its components (its expansion coefficients) is one, all the points lie on a (hyper)plane in this space, as is shown in ref 6. If there are two factors, all the points lie on a straight line; if three, they all lie on a plane; and if more, on a hyperplane in the higher dimensional space. Since each spectrum is a point, spectra that differ only in overall intensity map to the same point. Spectra that result from mixtures of two compounds become points that lie on a straight line between the points representing the spectra of the pure compounds. Similarly, spectra that result from mixtures of three compounds lie within the triangle whose vertices are the points for the spectra of the three pure compounds. The generalization to four or more compounds is obvious but not useful, because there is no simple way to use the geometrical information. To attack our problem of estimating the pure spectra, we limit the factors to the first three principal components so that all the points for our normalized experimental spectra lie on a plane. By means of a coordinate transformation, the spectra may then be represented by two coordinates in the plane, plus a third representing the distance of the plane from the origin. Viewed in this plane, the sequence of points corresponding to the sequence of spectra lies along a curve. Ideally, at least, this curve will consist of straight segments joined by more or less sharply curved segments.

Determining the Vertices. In the limit of low noise, the number of vertices or cusps in the curve described in the preceding section is equal to the number of components represented in the series of spectra. The series is subdivided for further analysis using the positions of the vertices as reference points.
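A minimal NumPy sketch of eqs 1-4 follows, assuming the coadded data matrix D and the experimental spectra R are stored with one spectrum per column; the function names are hypothetical. `numpy.linalg.eigh` returns eigenvalues in ascending order, so they are re-sorted before eq 2 is applied. The normalization step assumes that the coordinates of each spectrum do not sum to zero.

```python
import numpy as np


def factor_analysis(D, n_factors):
    """Eqs 1-3: eigen-decompose Z = D'D and form orthonormal factors F = D Q Lambda^(-1/2)."""
    Z = D.T @ D                                   # eq 1
    evals, Q = np.linalg.eigh(Z)                  # ascending eigenvalues, eigenvectors as columns
    order = np.argsort(evals)[::-1][:n_factors]   # keep the largest ones, in decreasing order
    evals, Q = evals[order], Q[:, order]
    F = (D @ Q) / np.sqrt(evals)                  # eq 2: divide column i by sqrt(lambda_i)
    return F, evals, Q


def normalized_projection(F, R):
    """Coordinates F'R(j) of each spectrum (column of R), scaled so each column sums to one."""
    coords = F.T @ R
    return coords / coords.sum(axis=0)
```

With three factors, each column of the result is a point on the plane whose coordinates sum to one, which is the geometry used to locate the vertices.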
Thus it is important for the program to determine the vertices if it is to proceed without input from the user. Noise displaces the spectral points from the locations they would otherwise have and, as might be expected, can make determining the number and position of the vertices difficult. The algorithm found most satisfactory for locating the vertices consists of finding maxima in the correlation between each spectrum $T^{(j)}$ and the next, $T^{(j+1)}$. If maxima are too close to each other, they are merged to the point midway between. If there are more than three vertices, the series of spectra is subdivided into overlapping groups at points midway between vertices in such a way that each group will have three components in it. Each group is factor analyzed as described above, and the three principal factors are used for the expansion of the spectra in a range containing three vertices. This expansion of each spectrum into a linear combination of the three principal components is used for the estimation of the component spectra.

Estimation of the Component Spectra. The expansions of the spectra at the first and last vertices in the total series of spectra are used as the estimates of the spectra of the first and last components that elute in the section of chromatogram being analyzed. Estimates for the intermediate components are found by extrapolating the straight-line segments on each side of the vertex (q.v., ref 6). The coordinates of their intersection are the coefficients of the linear expansion of the estimated spectrum in terms of the principal components. The spectra are given by the relation $S^{(j)} = FX^{(j)}$, where $S^{(j)}$ is the estimated spectrum, F is the matrix of factors, and $X^{(j)}$ is the vector of the coordinates of the intersection at the jth vertex. An iterative procedure is used to find the fit to the experimental spectral points that produces the least uncertainty in the intersections. The estimates of the spectra

of the component compounds that have been made as described above form a matrix S whose columns are the spectra $S^{(j)}$.

Computation of Concentration. The concentrations, relative to the concentration of each compound that produces a total absorbance or total ion current of one, are found by least-squares fitting of the estimated spectra of the component compounds to each experimental spectrum and then integrating over the peaks, using the trapezoidal rule, to find their total area. Since the spectra are formed from linear combinations of the principal factors, the overall fit will be excellent; i.e., the sum of the relative concentrations will agree very well with the total relative concentration. The individual concentrations may nevertheless be substantially in error. The estimation of this error will be discussed in the next section. In this section we discuss some aspects of the computation of relative concentrations. The least-squares solution to the equation

$$ R^{(j)} = SC^{(j)} \tag{5} $$

where $R^{(j)}$ is the experimental spectrum to be fitted by the spectra that are the columns of the matrix S, and $C^{(j)}$ is the corresponding vector of relative concentrations, is the one that minimizes the Euclidean norm $\lVert R^{(j)} - SC^{(j)} \rVert$. There are a variety of techniques for solving this problem. An orthogonalization method, in which Householder transformations change S into an upper triangular matrix, was chosen because it is efficient and has very good numerical stability. In the initial solution, the series of experimental spectra is segmented in a way similar to that used for factor analysis in connection with estimating the spectra. The matrix S is then limited to three columns containing the estimated spectra of the compounds present in that interval. This procedure improves the overall fit.

It is convenient to proceed differently in estimating the error and improving the estimates of the component spectra and concentrations. Instead of the complete spectra, the coefficients of an expansion in principal factors are used. The factors are found by factor analyzing all the experimental spectra between the first and last vertices. The number of principal factors is equal to the number of compounds present. If the matrix of factors is F, then

$$ Y^{(j)} = F'R^{(j)}, \qquad A = F'S \tag{6} $$

The columns of A are now vectors whose number of elements equals the number of compounds present; i.e., A is a square matrix. The equation $Y^{(j)} = AC^{(j)}$ for $C^{(j)}$ is therefore no longer overdetermined, but the same algorithm used before for the least-squares problem still provides the solution.

Error Analysis and Improvement of Fit. There are two sources of error in the computation of concentration: noise in the experimental spectra and error in the estimated component spectra. Given values for these, it is possible to compute a bound on the error in the concentrations.
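The path from vertex coordinates to peak areas described above can be sketched with NumPy as below. This is an illustration under stated assumptions, not the program's code: straight segments are fitted by total least squares (an SVD of the centered points, a choice the text does not specify), an intermediate component's coordinates are taken as the crossing point of the two extrapolated segments, the least-squares fit of eq 5 uses `numpy.linalg.qr` (which calls LAPACK routines that build Q from Householder reflections), and areas use the trapezoidal rule. All function names are hypothetical.

```python
import numpy as np


def fit_line(points):
    """Total-least-squares straight line through points (rows): (centroid, unit direction)."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[0]


def intersect_lines(line_a, line_b):
    """Crossing point of two coplanar lines, from p_a + t d_a = p_b + s d_b."""
    (pa, da), (pb, db) = line_a, line_b
    M = np.column_stack([da, -db])
    (t, s), *_ = np.linalg.lstsq(M, pb - pa, rcond=None)
    # Midpoint of the two closest points; identical when the lines truly cross.
    return 0.5 * ((pa + t * da) + (pb + s * db))


def lstsq_qr(S, R):
    """C minimizing ||R - S C|| via the reduced QR factorization S = Qm U."""
    Qm, U = np.linalg.qr(S)               # U is upper triangular
    return np.linalg.solve(U, Qm.T @ R)


def trapezoid_area(c, dt=1.0):
    """Peak area of a sampled elution profile by the trapezoidal rule."""
    c = np.asarray(c, dtype=float)
    return float(np.sum(0.5 * (c[1:] + c[:-1])) * dt)
```

Multiplying the factor matrix by an intersection point gives the corresponding estimated spectrum, $S^{(j)} = FX^{(j)}$, which becomes one column of S.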
Of course, an error bound may not be a sufficient measure of error if a mistake is made in determining the number of compounds present or if an assumption such as linearity does not apply. In addition to the error bound, it is important to examine the deconvoluted chromatographic peaks. The program will return a result, but if the shape of the peaks is implausible, the result should be treated with extreme, if unquantifiable, skepticism.

Noise on Spectra. The noise on each experimental spectrum is assumed to be orthogonal to the principal factors. Its magnitude (the root-mean-square amplitude multiplied by the square root of the number of spectral channels) is therefore given by the Euclidean norm $\lVert R^{(j)} - FF'R^{(j)} \rVert$. Because of the factor expansion we are using, this is also equal to the norm of the residual misfit between the experimental spectrum and the


least-squares fitted linear combination of estimated component spectra.

Error in Component Spectra. The error in the matrix of component spectra is more difficult to evaluate. We estimate it by using the deviations of the computed concentration curves from a standard. Consider the matrix of concentrations C, whose rows are the computed elution profiles for the components and whose columns are the concentrations computed from each experimental spectrum as described above. All negative values of the concentrations are set to zero, and the standard deviations of the concentration profiles are computed. All values more than three standard deviations from the centers of gravity of the concentration profiles are also set to zero. The remaining concentrations are renormalized so that the sum of the elements in each column remains what it was before. The difference between the original value of C and its corrected value is E, the estimated error matrix for concentrations; from here on, C denotes the corrected concentrations, so that the originally computed ones are C + E. Denoting by B the error in the matrix A of component spectra, so that the matrix actually used is A + B, and including both error matrices in the equation Y = AC, we have

$$ Y = (A + B)(C + E) \tag{7} $$

Because the true quantities satisfy Y = AC, this reduces to

$$ (A + B)E = -BC \tag{8} $$

where we know A + B and have estimates for C and E. This can be solved to get an estimate for B.

Improvement of Deconvolution and Spectral Estimates. We can now get an improved value for the matrix A from A = (A + B) - B. An improved value of C is then obtained by solving Y = AC. By iterating, new values for E and B are computed, leading to presumably better values of A and C. The improved estimates of the component spectra can be made explicit by the relation S = FA. Generally, with iteration, the value of B becomes smaller, but it does not necessarily go to zero, for the result we are trying to force may not be attainable with a limited number of factors. When the iteration process seems to work well, the change in the spectral estimates is relatively slight.

Estimation of Total Error Bound. The expressions derived in Stoer and Bulirsch (9), p 202 ff, are adapted to compute the error bound. An additional quantity is also evaluated: the norm of the difference between the total signal chromatogram and the reconstructed total signal chromatogram.

This is generally a smaller number, unimportant in its contribution; it is a measure of the misfit of the total chromatogram. Since A is a square matrix, the important quantity in estimating the error bound is the least upper bound of its inverse (the matrix norm induced by the Euclidean vector norm), $\operatorname{lub}(A^{-1}) = \lambda_s^{-1/2}$, where $\lambda_s$ is the smallest characteristic value of A'A. We also need to know $\operatorname{lub}(B)$, as well as the norms of the total concentration vector X and of the total noise $\Delta Y$.

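The error bound $\lVert \Delta X \rVert$, adapting the expression of Stoer and Bulirsch, is then

$$ \lVert \Delta X \rVert = \operatorname{lub}(A^{-1})\,\operatorname{lub}(B)\,\lVert X \rVert + [\operatorname{lub}(A^{-1})]^2\,\operatorname{lub}(B)\,\lVert \Delta Y \rVert + \operatorname{lub}(A^{-1})\,\lVert \Delta Y \rVert \tag{12} $$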
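One pass of the improvement procedure (eqs 7 and 8) and the bound of eq 12 can be sketched as follows. Several details are assumptions rather than statements from the text: the elution profiles are the rows of C sampled at times t, B is obtained from eq 8 by ordinary least squares (the text says only that the equation can be solved), lub(·) is taken as the spectral norm so that $\operatorname{lub}(A^{-1})$ is the reciprocal of the smallest singular value of A, and X and $\Delta Y$ are passed in as plain vectors. Function names are hypothetical.

```python
import numpy as np


def constrain_profiles(C, t, k=3.0):
    """Rows of C are elution profiles. Zero negative values and values more than k
    standard deviations from each profile's centroid, then rescale every column so
    that its sum is what it was before the correction."""
    C = np.asarray(C, dtype=float)
    t = np.asarray(t, dtype=float)
    Cc = np.clip(C, 0.0, None)
    w = Cc.sum(axis=1, keepdims=True)
    w[w == 0.0] = 1.0                                   # avoid 0/0 for empty profiles
    mean = (Cc * t).sum(axis=1, keepdims=True) / w
    sd = np.sqrt((Cc * (t - mean) ** 2).sum(axis=1, keepdims=True) / w)
    Cc[np.abs(t - mean) > k * sd] = 0.0
    col = Cc.sum(axis=0)
    scale = np.divide(C.sum(axis=0), col, out=np.ones_like(col), where=col != 0.0)
    return Cc * scale


def improvement_step(Y, A_est, t):
    """One pass of eqs 7-8; A_est plays the role of A + B, Y holds factor coefficients."""
    C_raw = np.linalg.solve(A_est, Y)                   # computed concentrations, C + E
    C = constrain_profiles(C_raw, t)                    # corrected concentrations
    E = C_raw - C
    # eq 8, (A + B)E = -BC, rearranged as C'B' = -((A + B)E)' and solved by least squares
    B = np.linalg.lstsq(C.T, -(A_est @ E).T, rcond=None)[0].T
    A_new = A_est - B
    return A_new, np.linalg.solve(A_new, Y), B


def error_bound(A, B, X, dY):
    """Eq 12 with lub(.) taken as the spectral norm."""
    lub_Ainv = 1.0 / np.linalg.svd(A, compute_uv=False).min()   # = lambda_s ** -0.5
    lub_B = np.linalg.norm(B, 2)                                 # largest singular value
    nX, ndY = np.linalg.norm(X), np.linalg.norm(dY)
    return lub_Ainv * lub_B * nX + lub_Ainv ** 2 * lub_B * ndY + lub_Ainv * ndY
```

Calling `improvement_step` repeatedly on its own output mirrors the iteration summarized in Table I; as the text notes, whether it converges depends on starting from reasonably accurate estimates.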

It should be emphasized that the error bound is only an


estimate, primarily because we can only estimate B, and $\operatorname{lub}(B)$ in turn is not precisely the same thing as the least upper bound of the errors in the matrix of component spectra. If $\operatorname{lub}(A^{-1})$ is large, as it will be if the component spectra are highly correlated, the error bound may very well be larger than the integrated area of a component peak, especially if the peak is small. The error in each component peak must be less than the error bound, but one cannot really say how much of the total error should be ascribed to each peak. While the shape of the peaks is an important diagnostic tool for deciding whether the answers produced are to be believed, the peaks may look all right and yet there can still be considerable error. A large value of $\operatorname{lub}(A^{-1})$ will indicate that this may be the case.

Synthesis of Component Spectra. The complete spectra of the components are reconstructed from the same spectra used to form the data matrix, without any weighting or windowing, by assuming that they are the same linear combination of the original data that the weighted component spectra used for deconvolution are of the weighted spectra that comprise the data matrix. When the estimates of the weighted component spectra are "improved", the complete spectra are also updated and stored.

EXPERIMENTAL SECTION

The LC/UV-vis data were obtained by using a mixture of polynuclear aromatics (PNA) that was partially separated by liquid chromatography. The eluting mixtures were detected with a Hewlett-Packard 8450A UV-vis spectrometer using a special LC absorption cell. The data were stored on a floppy disk; another computer was used to read the disk and generate an ASCII file on a tape cartridge, which was read by the computer used to analyze the data. Finally, the data were stored in a binary file on a hard disk. The spectra were windowed to include the 200 absorbances at 1-nm intervals between 200 and 399 nm.
The GC/MS data were obtained with a Hewlett-Packard 5790 gas chromatograph and a Hewlett-Packard 5970 mass-selective detector. The column was 50 m × 0.2 mm × 0.5 μm film, coated with cross-linked methyl silicone. The injection was 1 μL, split 400:1; the injection port was at 250 °C. The oven was programmed from 35 °C (15 min) to 70 °C at 1.5 °C/min to 130 °C (60 min) at 3 °C/min. Data from a PNA calibration sample were windowed between 40 and 150 amu. The data were transferred to the computer's disk memory in ASCII format with a tape cartridge.

For the infrared data, a synthetic mixture was injected into a Hewlett-Packard 5890 chromatograph using a split/splitless injection port in the splitless mode. The column was 12 m long, 0.32-mm-i.d. fused silica with a 1.0-μm cross-linked 5% methyl silicone coating. Oven temperature and flow rate were adjusted to give overlapped peaks. The effluent was measured with a noncommercial FTIR spectrometer. Interferograms to be transformed into spectra with a nominal 8 cm⁻¹ resolution were acquired at a rate of three per second and stored on disk. The interferograms were Fourier transformed, and the resulting spectra were normalized and converted to absorbances after acquisition. The processed spectra were transferred to the analyzing computer by tape cartridge. During execution of the program, the data were windowed between 800 and 1800 cm⁻¹ and between 2800 and 3100 cm⁻¹ and further processed as described below.

RESULTS AND DISCUSSION

All the data analyzed are real experimental data taken by others, and the samples were complete unknowns to the author.

UV-vis Spectra. A section of LC/UV-vis chromatogram, where the total absorbance shows three peaks, was analyzed. The three dominant principal components of the set of spectra comprising this section are used as a basis for the projections of each spectrum in the series, normalized so that the sum of the elements of the expansion is unity.
As noted above, each spectrum is then a point in the space spanned by the three basis vectors, and all the points lie in a plane, shown in Figure 1. The algorithm for finding the number of chemical compounds present in the chromatographic stream detects four of them, whose spectra are estimated as described in the previous section. The deconvolved chromatogram is shown in Figure 2. In this case the noise is small, and consequently the spectral estimates are good, so the procedure to improve them results in barely perceptible changes. The estimated spectrum of the second, and least resolved, of the four compounds is shown in Figure 3a; the spectrum of the same compound, well isolated chromatographically, is given in Figure 3b.

Figure 1. Normalized projections of UV-vis spectra in the space of their first three principal factors. The solid line joins running averages of five consecutive points. The arrows point to the vertices.

Figure 2. Section of chromatogram of overlapped peaks deconvolved using UV-vis spectra (horizontal axis: time). The solid line is the sum of the components.

Figure 3. (a) Spectrum of the second compound of Figure 2 deduced from the experimental mixture spectra. (b) Spectrum of the same compound well isolated chromatographically. (Horizontal axis: wavelength, 200-400 nm.)

Figure 4. Normalized projections of mass spectra in the space of their first three principal factors. The vertices are indicated by arrows.

Figure 5. Section of chromatogram deconvolved using mass spectra (horizontal axis: time). The negative concentration for part of the second peak is the result of a poor estimate for the spectrum of the first compound.

Figure 6. The section of chromatogram of Figure 5 after the spectral estimates have been automatically improved (horizontal axis: time).

Mass Spectra. The effect of the improvement procedure shows up much more strongly in the deconvolution of a section of GC/MS chromatogram discussed below. Figure 4 shows the projections of the mass spectra in the space of the first three principal components. From the figure, it can be seen that there are initial and final bunches of points and two changes in direction of the locus of points in between. These constitute four vertices, and hence four compounds are taken to be present. In this case, unless there is some preliminary smoothing of the data, the vertex-picking algorithm finds too many vertices because of the noise in the data. Vertices should be at regions where the density of data points in their locus plane is a maximum, corresponding to maxima in the relative concentrations of the eluting compounds. It is therefore easy to reject the extra ones that appear where the density of points is low. Doing so leads to the deconvoluted peaks shown in Figure 5. The negative amplitude for the second component indicates that the estimates for the spectra could be improved.
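The vertex-picking logic described in the Theory section, together with the density-based rejection of spurious vertices discussed in the Mass Spectra paragraph above, might be sketched as below. Using cosine similarity as the "correlation", the merge distance `min_sep`, and the density threshold are all assumptions, as are the function names; any smoothing (for example, the running averages of five points mentioned in the caption of Figure 1) would be applied to the projected points beforehand.

```python
import numpy as np


def successive_correlation(T):
    """Cosine similarity between each projected spectrum (row of T) and the next one.
    Entry i compares spectra i and i + 1."""
    a, b = T[:-1], T[1:]
    return (a * b).sum(axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))


def pick_vertices(corr, min_sep=2):
    """Indices of local maxima of `corr`; maxima no more than min_sep apart are merged."""
    n = len(corr)
    peaks = []
    for i in range(n):
        left = corr[i - 1] if i > 0 else -np.inf
        right = corr[i + 1] if i < n - 1 else -np.inf
        if corr[i] > left and corr[i] >= right:     # takes the first index of a plateau
            peaks.append(i)
    merged = []
    for p in peaks:
        if merged and p - merged[-1] <= min_sep:
            merged[-1] = (merged[-1] + p) // 2     # merge to the point midway between
        else:
            merged.append(p)
    return merged


def point_density(P, radius):
    """For each point (row of P, e.g. plane coordinates), the number of other points within radius."""
    d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    return (d <= radius).sum(axis=1) - 1


def reject_sparse(vertices, density, min_count):
    """Drop candidate vertices that sit where the locus of points is sparse."""
    return [v for v in vertices if density[v] >= min_count]
```

For example, the correlation sequence 0.90, 0.99, 0.95, 0.80, 0.97, 0.96, 0.98, 0.70, 0.99 has maxima at indices 1, 4, 6, and 8; with `min_sep=2` the two close maxima at 4 and 6 merge, leaving vertices at 1, 5, and 8.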

ANALYTICAL CHEMISTRY, VOL. 58, NO. 7, JUNE 1986


Figure 8. Deconvolved GC/FTIR chromatogram. The solid line is the sum of the components; the dashed line is the total signal. [Plot: horizontal axis TIME]

Figure 7. (a) Estimated mass spectrum of peak 2 in Figure 6. (b) Estimated spectrum of peak 3 in Figure 6. [Plots (a) and (b): horizontal axis MASS NUMBER, 50-150]

Table I. Change in Peak Areas and Error Bound with Iteration of Routine To Improve Spectral Estimates

                 area of peak
iteration    1        2       3       4       error bound
1            167.2    17.8    7.7     41.1    77.0
2            155.0    31.2    9.4     39.5    20.2
3            153.4    33.3    9.3     39.1    15.2
4            152.5    34.4    9.3     38.8    11.7
5            151.9    35.3    9.3     38.6    10.1
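The statement in the text that peak areas change by less than 3% after the first iteration (except for peak 2) can be checked directly against Table I. This sketch reads "after the first iteration" as the cumulative change from iteration 2 to iteration 5; that interpretation is an assumption, and the values are transcribed from the table.

```python
# Peak areas from Table I, iterations 1-5 (transcribed from the table)
AREAS = {
    1: [167.2, 155.0, 153.4, 152.5, 151.9],
    2: [17.8, 31.2, 33.3, 34.4, 35.3],
    3: [7.7, 9.4, 9.3, 9.3, 9.3],
    4: [41.1, 39.5, 39.1, 38.8, 38.6],
}


def percent_change_after_first(values):
    """Relative change (%) from iteration 2 to the last iteration."""
    return 100.0 * (values[-1] - values[1]) / values[1]


for peak, values in AREAS.items():
    print(f"peak {peak}: {percent_change_after_first(values):+.1f}%")
```

This gives about -2.0%, +13.1%, -1.1%, and -2.3% for peaks 1 to 4, so only peak 2, whose negative part disappears during the iterations, exceeds 3%.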

The results after five iterations of the "improvement" procedure are shown in Figure 6. The final estimated mass spectra for compounds 2 and 3 are shown in Figure 7. The negative mass peaks, which are also present with even larger amplitude in the original spectral estimates, must be regarded as noise. The small preliminary peak in the elution profile of the second compound may be the result of the presence of another compound very similar to the first. It is not possible to deconvolute two peaks that are both strongly overlapped and whose spectra differ by amounts similar in magnitude to the noise. The iterative improvement process does not change the spectral estimates very much. As might be expected when there are only weak requirements on the elution profiles, the estimates must be reasonably accurate in the first place if the procedure is to converge to a better result. Table I shows how the estimates of peak areas and error bound change with each iteration. Except for the increase in the second peak's area as the negative part disappears, changes in peak area are less than 3% after the first iteration.

IR Spectra. The same methods work with GC/FTIR data as well. Because GC/FTIR spectra tend to be noisier than, for example, LC-UV-vis spectra, and often have base lines with a slight curvature that varies from spectrum to spectrum, it helps to preprocess the IR spectra to minimize the effects of noise and base-line curvature. The procedure used is first to window the spectra to include only those regions that contain information and then to take second differences, with a difference interval of several resolution elements to match a typical IR bandwidth. Typically this is five resolution intervals for IR spectra taken with 8 cm⁻¹ resolution and 1 order of zero filling.
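The IR preprocessing described in this subsection (windowing, second differences over an interval of several resolution elements, and the subsequent step of keeping only about 100 of the largest values) might be sketched as follows. The window bounds, the interval expressed in data points, and ranking the values by magnitude are assumptions: the number of data points per resolution element depends on the sampling and zero filling, and the text does not say whether "largest" means signed values or magnitudes.

```python
import numpy as np


def preprocess_ir_spectrum(spectrum, window, interval=5, n_keep=100):
    """Window a spectrum, take second differences, and keep the largest values.

    window:   (start, stop) indices of the informative region (hypothetical).
    interval: difference interval k in data points (assumed conversion from
              resolution elements).
    n_keep:   number of values kept, ranked by absolute value (assumption);
              the rest are set to zero. None keeps everything.
    """
    s = np.asarray(spectrum, dtype=float)[window[0]:window[1]]
    k = interval
    # Second difference centered on s[j + k]: s[j + 2k] - 2 s[j + k] + s[j].
    # A linear base line gives exactly zero; gentle curvature gives a small offset.
    d2 = s[2 * k:] - 2.0 * s[k:-k] + s[:-2 * k]
    if n_keep is None or n_keep >= d2.size:
        return d2
    kept = np.zeros_like(d2)
    top = np.argsort(np.abs(d2))[-n_keep:]  # indices of the n_keep largest magnitudes
    kept[top] = d2[top]
    return kept


# Hypothetical example: a single sharp band on a sloping base line
x = np.arange(300)
spectrum = 0.2 + 0.001 * x + np.exp(-0.5 * ((x - 150) / 4.0) ** 2)
features = preprocess_ir_spectrum(spectrum, window=(50, 250), interval=5, n_keep=20)
```

Because a straight base line has zero second difference, only band structure and residual curvature survive the transform. This is consistent with the stated aim of reducing base-line effects before the spectra are compared.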

About 100 of the largest values are retained and the rest set to zero. Over a very large range, the final result is not sensitive to the number of second differences retained. Although this process is nonlinear, it works well in practice. Figure 8 shows a mixture deconvoluted by use of infrared spectra.

Discussion. Mathematical deconvolution is by no means a substitute for good chromatography. For one thing, there must be a chromatographic resolution of about 0.4-0.5 or greater for mathematical methods to work with any reliability. However, since overlapping chromatographic peaks are common (10), a mathematical method that can in many cases complete a separation will be useful if it requires about the same or less time than an additional chromatographic run. Further, in the case of intractable separations, spectral identification can help lead to more appropriate chromatographic conditions. The program described here runs in several minutes (approximately 1-5, depending on the amount of data processing done) on a moderately powerful technical computer. It does not require much input or judgment from the user. In those cases where the mathematical deconvolution does not work, this is apparent from the result, and even then useful information on the complexity of the chromatographic region examined will have been obtained. When it does work, time may be saved, and there will be a better understanding of the reliability and nature of the result than if the information present in the spectra had not been used. Another obvious requirement for practicality is that the computer on which the program runs have reasonably direct access to the data produced by the chromatographic detector. Ideally, the program would run on a computer that controls the instrument producing the series of spectra, or at least has direct access to the data files the instrument produces. There is now a considerable body of literature on deconvoluting overlapping chromatographic peaks.
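The resolution threshold quoted in the Discussion can be made concrete under the conventional definition $R_s = 2(t_2 - t_1)/(w_1 + w_2)$, with base widths $w = 4\sigma$ for Gaussian peaks. The text does not state which definition it uses, so this is an assumption. For two peaks of equal width, $R_s = 0.5$ then corresponds to apexes only $2\sigma$ apart. The helper names below are hypothetical.

```python
def resolution(t1, t2, sigma1, sigma2):
    """Rs = 2 (t2 - t1) / (w1 + w2) for Gaussian peaks with base width w = 4 sigma."""
    return 2.0 * (t2 - t1) / (4.0 * sigma1 + 4.0 * sigma2)


def separation_for(rs, sigma):
    """Retention-time difference that gives resolution rs for two peaks of equal sigma."""
    return 4.0 * sigma * rs


print(resolution(10.0, 12.0, 1.0, 1.0))  # 0.5
print(separation_for(0.4, 1.0))          # apex spacing, in units of sigma, at Rs = 0.4
```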
This article describes an attempt to increase the practicality and ease of use of mathematical deconvolution. The method has these characteristics: (1) Deconvolution can be performed in a reasonable time. (2) The chromatographic overlap can be fairly complicated; i.e., up to three components may elute at once, and chains of up to six compounds can be handled at a time. (3) The method is applicable to any sort of chromatographic detector that produces spectra whose amplitudes are linear with concentration. (4) No assumptions are made about the shape of the elution peaks except nonnegativity and negligible amplitude far from the mean. (5) No assumptions whatsoever are made about the structure of the spectra of the constituent compounds. (6) An error bound is obtained on the computed concentrations. (7) No previous knowledge of the spectra of the eluting compounds is required. (8) Estimates of the full spectra of the eluting compounds are created that can be used to identify them. With stronger assumptions on peak shape, and with more computation, it is possible to deconvolve overlapping peaks where more than three compounds elute simultaneously.
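Characteristics (3) and (4), together with the negative concentration noted in the caption of Figure 5, can be illustrated with the bilinear model that linearity implies: the data matrix is approximately the elution profiles times the spectra. Given estimated spectra, the profiles follow by ordinary least squares, and a negative stretch in a profile flags a poor spectral estimate. This is a generic sketch with hypothetical function names, not the article's program. In the usage example, the spectrum of compound 1 is deliberately contaminated with half of compound 2's spectrum. The recovered profile of compound 2 is then $p_2 - 0.5\,p_1$, which goes negative where compound 1 dominates.

```python
import numpy as np


def elution_profiles(data, spectra):
    """Least-squares elution profiles P for the bilinear model data ≈ P @ spectra.

    data: (n_times, n_channels); spectra: (n_compounds, n_channels) estimates.
    Returns P with shape (n_times, n_compounds).
    """
    # Solve spectra.T @ P.T ≈ data.T for all time points at once
    p_t, *_ = np.linalg.lstsq(np.asarray(spectra, dtype=float).T,
                              np.asarray(data, dtype=float).T, rcond=None)
    return p_t.T


def negative_profiles(profiles, tolerance=1e-6):
    """Boolean flag per compound: does its profile dip below -tolerance?"""
    return (np.asarray(profiles) < -tolerance).any(axis=0)


# Hypothetical two-compound example with noise-free data
rng = np.random.default_rng(0)
t = np.linspace(0.0, 20.0, 201)
true_profiles = np.stack([np.exp(-0.5 * (t - 8.0) ** 2),
                          np.exp(-0.5 * (t - 11.0) ** 2)], axis=1)
spectra = rng.uniform(0.1, 1.0, size=(2, 30))
data = true_profiles @ spectra

good = elution_profiles(data, spectra)
# Poor estimate: compound 1's spectrum contains half of compound 2's spectrum
bad_spectra = np.vstack([spectra[0] + 0.5 * spectra[1], spectra[1]])
bad = elution_profiles(data, bad_spectra)
print(negative_profiles(good), negative_profiles(bad))
```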


Anal. Chem. 1986, 58, 1410-1414

Harris et al. (11, 12) have done this by assuming a specific peak shape and finding by iteration the set of peak shape parameters (two for each compound) that minimizes the residual of the least-squares fit to the data. It is important to realize that more can be done with more prior information and more computational effort. However, the biggest obstacle to the usefulness of these methods is not their limitations in handling complicated, highly overlapping peaks, but the problem of doing the required computations on the data produced by a chromatographic detector with a minimum of time and effort.
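The approach attributed above to Harris et al. (a specific peak shape with two parameters per compound, adjusted iteratively to minimize the least-squares residual) can be sketched as follows. This is a minimal illustration of the general idea, not a reproduction of refs 11 and 12. It assumes Gaussian elution profiles (a center and a width per compound) and solves for the spectra by linear least squares for each trial set of shape parameters. It then adjusts the shape parameters with a Levenberg-Marquardt step that uses a finite-difference Jacobian. The function names, the Gaussian form, and the damping schedule are illustrative assumptions.

```python
import numpy as np


def gaussian_profiles(t, params):
    """Elution profiles, one column per compound; params = [c1, w1, c2, w2, ...]."""
    centers, widths = params[0::2], params[1::2]
    return np.exp(-0.5 * ((t[:, None] - centers[None, :]) / widths[None, :]) ** 2)


def residual(params, t, data):
    """Residual after solving for the spectra by linear least squares.

    For fixed peak shapes the model data ≈ profiles @ spectra is linear in the
    spectra, so only the two shape parameters per compound are iterated.
    """
    profiles = gaussian_profiles(t, params)
    spectra, *_ = np.linalg.lstsq(profiles, data, rcond=None)
    return (data - profiles @ spectra).ravel()


def fit_peak_shapes(t, data, initial_params, max_iter=200, h=1e-7):
    """Levenberg-Marquardt iteration on the peak-shape parameters."""
    params = np.asarray(initial_params, dtype=float).copy()
    r = residual(params, t, data)
    cost, lam = r @ r, 1e-3
    for _ in range(max_iter):
        # Forward-difference Jacobian of the residual with respect to each parameter
        jac = np.empty((r.size, params.size))
        for k in range(params.size):
            step = np.zeros_like(params)
            step[k] = h
            jac[:, k] = (residual(params + step, t, data) - r) / h
        jtj = jac.T @ jac
        delta = np.linalg.solve(jtj + lam * np.diag(np.diag(jtj)), -jac.T @ r)
        trial_r = residual(params + delta, t, data)
        trial_cost = trial_r @ trial_r
        if trial_cost < cost:       # accept the step and relax the damping
            params, r, cost = params + delta, trial_r, trial_cost
            lam *= 0.3
        else:                       # reject the step and damp more strongly
            lam *= 10.0
        if np.linalg.norm(delta) < 1e-10:
            break
    return params


# Hypothetical noise-free example: two overlapping peaks, 25 spectral channels
rng = np.random.default_rng(0)
t = np.linspace(0.0, 20.0, 201)
true_params = np.array([8.0, 1.0, 10.5, 1.2])
spectra = rng.uniform(0.1, 1.0, size=(2, 25))
data = gaussian_profiles(t, true_params) @ spectra
fitted = fit_peak_shapes(t, data, [7.8, 1.1, 10.7, 1.1])
print(np.round(fitted, 3))
```

Solving for the spectra inside the residual keeps the nonlinear search down to two parameters per compound. This matches the parameter count mentioned in the text, but as noted there, it needs a good starting guess and the correct peak shape.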

LITERATURE CITED

(1) Lawton, W. H.; Sylvestre, E. A. Technometrics 1971, 13, 617-633.
(2) Sharaf, M. A.; Kowalski, B. R. Anal. Chem. 1982, 54, 1291-1296.
(3) Osten, D. W.; Kowalski, B. R. Anal. Chem. 1984, 56, 991-995.
(4) Chen, J.-H.; Hwang, L.-P. Anal. Chim. Acta 1981, 133, 271-281.
(5) Vandeginste, B.; Essers, R.; Bosman, T.; Reijnen, J.; Kateman, G. Anal. Chem. 1985, 57, 971-965.
(6) Borgen, O. S.; Kowalski, B. R. Anal. Chim. Acta 1985, 174, 1-26.
(7) Gemperline, P. J. J. Chem. Inf. Comput. Sci. 1984, 24, 206-212.
(8) Vandeginste, B.; Derks, W.; Kateman, G. Anal. Chim. Acta 1985, 173, 253-264.
(9) Stoer, J.; Bulirsch, R. Introduction to Numerical Analysis; Springer-Verlag: New York, 1980.
(10) Davis, J. M.; Giddings, J. C. Anal. Chem. 1983, 55, 418-424.
(11) Knorr, F. J.; Thorsheim, H. R.; Harris, J. M. Anal. Chem. 1981, 53, 821-825.
(12) Frans, S. D.; McConnell, M. L.; Harris, J. M. Anal. Chem. 1985, 57, 1552-1559.

ACKNOWLEDGMENT

I thank Barry Willis, Edward Reus, Leslie Hodges, and Edward Darland for providing the experimental data on which the program was developed and tested and John Michnowicz for guidance and encouragement.

RECEIVED for review November 13, 1985. Accepted January 21, 1986.

Integral Method for Evaluating Component Contribution to Total Solution Absorbance from Chromatographic Data

David T. Rossi,* Frank Pacholec, and Linda M. Dawkins

Monsanto Industrial Chemicals Company, 800 N. Lindbergh Blvd., St. Louis, Missouri 63167

A method for the conversion of isocratic HPLC ultraviolet-visible absorbance data into static (solution) absorbance data has been developed and evaluated. The method is used to indicate whether all chromophores of a multicomponent mixture elute from the chromatographic column and to estimate what percentage of solution absorbance at a given wavelength is attributable to a particular peak. The procedure involves summation of absorbance vs. time chromatographic data and subsequent normalization by constants that account for the sample injection volume, the flow rate, and the time interval between data points. For single-component solutions, linear regressions of data obtained from this integration process vs. static absorbance data yield straight lines with unity slopes and zero intercepts. The standard deviation associated with the integrated absorbance is ±8 mAU compared with ±3 mAU for static absorbance.

A problem often presented to the analytical chemist involves the characterization of mixtures of solutes containing chromophores (1-3). High-pressure liquid chromatography (HPLC) is one of the most frequently employed techniques for separating such mixtures because it is readily implemented, has a great deal of resolving power, and in one form or another, is almost universally applicable (4). In such characterizations, it would be useful to determine whether all the components that contribute to the static absorbance spectrum of the mixture contribute to the chromatogram in the same proportion. This is desirable because components that are stable in solution are not always stable under the conditions required for an HPLC separation and because components of interest often cannot be eluted from a particular column packing. With these limitations in mind, methodology for direct comparison of HPLC elution profiles with solution absorbance spectra has been developed. Previous work involved numerical integration and normalization of multiwavelength chromatographic data sets as a means for optimizing detection limits

of chromatographic components (5). Through a related numerical procedure, multiwavelength chromatographic data can now be integrated and normalized in order to yield absorbance data that are equivalent to those obtained by static absorbance measurements (i.e., a spectrum of the solution). This data processing strategy has been applied to one-component solutions in order to evaluate accuracy and precision and to the characterization of a multicomponent reaction mixture.

THEORY

The mass, $m_i$, of component $i$ flowing through the chromatographic detector cell during the period between two observations is given by the average concentration of that component between observations, $c_i$, times the volume of mobile phase passing through the cell between the observations, $\Delta V$ (6):

$$m_i = c_i\,\Delta V \qquad (1)$$

The volume of mobile phase that passes through the cell between observations is given by

$$\Delta V = F\,\Delta t \qquad (2)$$

where $F$ is the flow rate and $\Delta t$ is the time interval between observations (6). Combining eq 1 and 2 and summing between observation times $a$ and $b$, the total mass of component $i$ in the sample is

$$m = \sum_{t=a}^{b} m_i = \sum_{t=a}^{b} c_i F\,\Delta t \qquad (3)$$

The concentration of the component in the sample, $C_i$, is the total mass of $i$ injected divided by the injected sample volume, $V_s$ (7):

$$C_i = \frac{\sum_{t=a}^{b} c_i F\,\Delta t}{V_s} \qquad (4)$$

By use of Beer's law, the average concentration of component $i$ between observations is given by

$$c_i = \frac{A_i}{l\,\varepsilon_i} \qquad (5)$$

© 1986 American Chemical Society
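Substituting eq 5 into eq 4 gives $C_i = F\,\Delta t \sum_{t=a}^{b} A_i / (l\,\varepsilon_i V_s)$. If the static measurement uses a cell of the same path length $l$ (an assumption made here; the article's subsequent treatment may differ), the static absorbance $\varepsilon_i l C_i$ of the sample reduces to $(F\,\Delta t / V_s)\sum_{t=a}^{b} A_i$. This matches the summation-and-normalization procedure summarized in the abstract. The following is a minimal sketch with hypothetical function names and units (flow in mL/min, time step in s, injection volume in µL). The example trace is synthetic, constructed so that its integral corresponds to a static absorbance of 0.5.

```python
import numpy as np


def integrated_absorbance(absorbance, flow_ml_per_min, dt_s, injection_volume_ul,
                          start=0, stop=None):
    """Static-equivalent absorbance from an absorbance-vs-time trace.

    Implements A_static = (F * dt / V_s) * sum(A) over points start..stop-1,
    which follows from eqs 4 and 5 if the detector cell and the static
    cuvette share the same path length (an assumption of this sketch).
    """
    trace = np.asarray(absorbance, dtype=float)[start:stop]
    flow_ml_per_s = flow_ml_per_min / 60.0              # F in mL/s
    injection_volume_ml = injection_volume_ul / 1000.0  # V_s in mL
    return flow_ml_per_s * dt_s * trace.sum() / injection_volume_ml


def percent_of_static(peak_absorbance, static_absorbance):
    """Share (%) of the measured solution absorbance attributed to one peak."""
    return 100.0 * peak_absorbance / static_absorbance


# Hypothetical example: Gaussian peak (sigma = 3 s) sized to represent A_static = 0.5
flow, dt, volume_ul = 1.0, 0.5, 20.0                 # mL/min, s, µL (illustrative)
t = np.arange(0.0, 120.0, dt)
area = 0.5 * (volume_ul / 1000.0) / (flow / 60.0)    # required sum(A) * dt, in AU*s
trace = area / (3.0 * np.sqrt(2.0 * np.pi)) * np.exp(-0.5 * ((t - 60.0) / 3.0) ** 2)
print(round(integrated_absorbance(trace, flow, dt, volume_ul), 4))  # 0.5
```

Restricting `start` and `stop` to the points spanning one peak and dividing by the measured static absorbance with `percent_of_static` gives the per-peak percentage described in the abstract. If all chromophores elute, the per-peak contributions should add up to roughly 100%.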