Computer Analysis of Unresolved Non-Gaussian Gas Chromatograms by Curve-Fitting

Andrew H. Anderson, Terence C. Gibb, and Anthony B. Littlewood
School of Chemistry, The University, Newcastle upon Tyne, England NE1 7RU

Overlapping chromatographic peaks can be analyzed quantitatively by constructing mathematically a synthetic chromatogram which is fitted to the experimental chromatogram by iterative nonlinear regression analysis. The analytical parameters are then obtained from the synthetic chromatogram. In this paper, the method is extended so that nonanalytic peak shapes such as are obtained in ordinary nonideal chromatography may be fitted. The mathematical technique was coded for machine digital computation and evaluated on realistic nonideal chromatograms of mixtures of aromatic hydrocarbons. For any degree of peak overlap for which the existence of more than one peak could be detected by the program, the analytical accuracy obtained from the fitted chromatogram was not worse than about ±1/2%, which was the accuracy of the experimentation. Thus, the accuracy of this technique of mathematical analysis at least equals that of the authors’ technique in chemical analysis and is compatible with the standards of ordinary gas chromatographic analysis.


In obtaining analytical data from chromatographic peaks, it is usual to measure the chromatogram itself. The simplest method is to estimate peak positions, areas, etc. from a strip-chart record. This procedure can be automated by using an integrator for area determination; the integrator can either be a separate instrument, or a computer program operating on a digitized record. Most computer programs used for this purpose evaluate the area by summation of strips, by trapezoidal addition, by the use of Simpson’s Rule, or by some other such simple numerical method applied to the digitized data points.

A fundamentally different method of data reduction is not to obtain information directly from the data record, but instead to construct a synthetic data record from estimates of the parameters of analytical interest, subsequently modifying the parameters until the synthetic record fits the actual record as closely as possible. The parameters giving this fit are then used to give the analytical information. Curve-fitting procedures of this kind have several advantages.

They can provide separate parameters for each peak in a data record consisting of overlapping peaks. This is the aspect emphasized in this paper. As an example, the authors showed previously that quantitative analysis is possible on Gaussian peaks separated by only about 1.5σ, where σ is the standard deviation.

Data subject to noise can be fitted as easily as smooth data; furthermore, if the criterion of fit adopted is that of least squares residues, the final result is the most probable fit, and the method contains the means of calculating the relevant statistical parameters, e.g., confidence limits for the derived parameters.

If the data are not noisy, the number of data points required to obtain the analytically significant parameters need not grossly exceed the number of parameters to be found. With perfect data, of course, the number of data points need usually not exceed the number of parameters. With smooth data such as are found in ordinary gas chromatography, the number of data points need only be two to three times the number of parameters, which is of importance where data transmission is difficult.

ANALYTICAL CHEMISTRY, VOL. 42, NO. 4, APRIL 1970

It is an easy matter to carry out preliminary tests on the data to identify and correct trivial imperfections such as base-line shift, spikes of noise, etc., before applying the main data reduction procedures.

Against these advantages, there are two main drawbacks:

It is necessary to know the general shape of the peak to be fitted even though for a particular application the shape is not relevant in obtaining the analytical result.

In curve-fitting by nonlinear regression analysis, as used here, one must have an initial estimate of the parameters to be determined. In the case of badly overlapping peaks where the curves being fitted are not perfect representations of the data, the minimum in least squares residues can be a very shallow function of the values of the parameters, usually resulting in slow convergence or convergence on to false values.

Previously, we described fitting Gaussian peaks to the peaks of gas chromatograms (1). With alkanes as solutes on alkane solvents, we found that analytically acceptable results were obtainable so long as the overlap of peaks in the data did not exceed that corresponding to a separation of 2σ. It was clear, however, that the inaccuracies obtained for peaks overlapping more than this were due to the fact that the peaks were not Gaussian. There are two remedies for this defect. One is to contrive, either empirically or on the basis of theories of nonideal chromatography, equations for chromatographic peaks more realistic than Gaussian (1-3). The other is the more empirical approach of using the chromatogram of one pure component to define the peak shape, and to assume that this shape is modified in different chromatographic peaks only by change of a limited number of parameters, e.g., position, height, or width. In this paper we consider the second solution (4, 5).
Though the procedures described in this paper are applied to gas chromatography, it should be emphasized that they are applicable to any technique where the instrumental response appears as “peaks” even if these be noisy, for example, in the analysis of Mössbauer spectra (6) in which Poisson-type noise is inherent.

THEORY OF CURVE-FITTING PROCEDURES

General Theory. In the programs, curves are fitted by nonlinear regression analysis, a detailed general account of which is given by Box (7); the method as it pertains to IR analysis is outlined by Pitha and Jones (8), and as it pertains to GC by the authors (1) and by Fraser and Suzuki (9). The description below indicates how the basic technique can be generalized to cover most of the circumstances encountered in fitting data in the form of peaks on a base line.

It is assumed throughout that each and every peak can be specified by a single general function f of a single independent variable conveniently taken as the time, t, each peak being specified by the values of a finite set of parameters in f. Thus, an individual peak k can be specified by:

$$Y_k = f(t, \mathbf{b}_k) \tag{1}$$

where $\mathbf{b}_k$ is a vector of parameters. Most commonly, a peak is specified by parameters of position, width, and height. In the case of overlapping peaks, a set of which we call a “range,” for which curve-fitting methods are particularly applicable, each peak is specified by a set of independent parameters. Also, there may be additional parameters belonging to the range, e.g., base-line position, or a slit width or introducer volume providing a function convoluted with the peaks, etc. (3, 10). Thus, in general, a range of peaks is described by:

$$Y = f(t, \mathbf{b}) \tag{2}$$

where $\mathbf{b}$ is a vector of parameters which includes all those which characterize the peaks and also those which characterize the range as a whole. In practice, the choice of parameters depends very much on the experimental circumstances, and different authors have made different choices. The simplest is to specify a range of p peaks by p height parameters, assuming positions and widths to be given (11). In this case, the regression analysis becomes linear. However, any slight error in external specification of the unfitted parameters can lead to large errors in analysis. Alternatively, a range of p peaks can be specified by 3p parameters of height, width, and position, assuming a constant base line (9). Since, however, peak areas are extremely sensitive to the position of the base line, the authors consider that this should always be included as a parameter, thus giving 3p + 1 parameters per range (1). If necessary, a fourth parameter per peak can be added to take account of changes in peak shape (2, 3). Another choice, adopted for the analysis of experimental data described in this paper, is to assume a constant chromatographic plate number, thus making 2p + 2 parameters per range. The ability thus to retain freedom of choice of variables appropriate to the data forms a valuable feature of the technique.

Let the experimental data to be fitted be an array of m points $\mathbf{c} = (c_1, c_2, c_3, \ldots, c_m)$. Since the relation between Y and at least some of the parameters is nonlinear, the curve fitting must be performed by iterative improvement of an initial estimate of the n parameters (7). Let the initial estimate of the parameters be ${}^{(1)}\mathbf{b} = ({}^{(1)}b_1, {}^{(1)}b_2, \ldots, {}^{(1)}b_n)$; let these parameters give rise to estimates of the data ${}^{(1)}\mathbf{A} = ({}^{(1)}A_1, {}^{(1)}A_2, \ldots, {}^{(1)}A_m)$ by

$${}^{(1)}A_i = f(t_i, {}^{(1)}\mathbf{b}) \tag{3}$$

Let the error of a point ${}^{(1)}A_i$ be ${}^{(1)}F_i$, so that ${}^{(1)}F_i = c_i - {}^{(1)}A_i$. Thus the sum ${}^{(1)}S$ of the squares of the errors at all data points is:

$${}^{(1)}S = {}^{(1)}\mathbf{F}'\,{}^{(1)}\mathbf{F} \tag{4}$$

and similarly for any subsequent iteration q. The criterion of the best fit is that S be minimized with respect to variation in any parameter ${}^{(q)}b_j$, i.e.,

$$\frac{\partial\,{}^{(q)}S}{\partial\,{}^{(q)}b_j} = 0 \qquad (j = 1 \text{ to } n) \tag{5}$$

Since the initial estimate is not too far from the best estimate and subsequent estimates are presumed to be closer, the changes in the components of ${}^{(q)}\mathbf{b}$, $\Delta{}^{(q)}\mathbf{b}$, required to convert ${}^{(q)}\mathbf{A}$ into ${}^{(q+1)}\mathbf{A}$ can be related with ${}^{(q)}\mathbf{A}$ by a Taylor expansion in which all terms higher than the first can be neglected. This leads to the relation

$${}^{(q+1)}\mathbf{A} = {}^{(q)}\mathbf{A} + \mathbf{G}\,\Delta{}^{(q)}\mathbf{b} \tag{6}$$

where

$$\Delta{}^{(q)}\mathbf{b} = {}^{(q+1)}\mathbf{b} - {}^{(q)}\mathbf{b} \tag{7}$$

and $\mathbf{G}$ is an m × n (row × column) matrix of differential coefficients defined by

$$G_{ij} = \frac{\partial\,{}^{(q)}A_i}{\partial\,{}^{(q)}b_j} \tag{8}$$

Equations 4, 5, and 6 can now be combined as in the standard technique of linear regression analysis in many variables (12) to give a set of n normal equations:

$$\mathbf{G}'\mathbf{G}\,\Delta\mathbf{b} = \mathbf{G}'\mathbf{F} \tag{9}$$

where the prime denotes the transpose. $\mathbf{G}'\mathbf{G}$ is an n × n symmetrical matrix; for discussion let $\mathbf{G}'\mathbf{G} = \mathbf{Z}$. Providing that m ≥ n and that the data points are not by remote chance such that the rank of $\mathbf{G}$ is less than n, Equation 9 can be solved for $\Delta\mathbf{b}$ given the components of $\mathbf{Z}$. In most previous work, the latter have been obtained by analytic differentiation of $f_k$ for each peak, f being available as an analytic function, e.g., Gaussian, Lorentzian, etc. (1-3, 8, 9).

(1) A. B. Littlewood, A. H. Anderson, and T. C. Gibb, “Gas Chromatography 1968,” Proceedings of the Seventh International Symposium, C. L. A. Harbourn, Ed., Institute of Petroleum, London, Eng., 1969, p 297.
(2) R. D. B. Fraser and E. Suzuki, ANAL. CHEM., 41, 37 (1969).
(3) J. Pitha and R. N. Jones, Can. J. Chem., 45, 2347 (1967).
(4) A. H. Anderson, T. C. Gibb, and A. B. Littlewood, Chromatographia, 2, 466 (1969).
(5) W. D. Keller, T. R. Lusebrink, and C. H. Sederholm, J. Chem. Phys., 44, 782 (1966).
(6) B. J. Duke and T. C. Gibb, J. Chem. Soc. (A), 1478 (1967).
(7) G. E. P. Box, Ann. N. Y. Acad. Sci., 86, 792 (1960).
(8) J. Pitha and R. N. Jones, Can. J. Chem., 44, 3031 (1966).
(9) R. D. B. Fraser and E. Suzuki, ANAL. CHEM., 38, 1130 (1966).
(10) A. Savitzky and M. J. E. Golay, ANAL. CHEM., 36, 1627 (1964).
(11) A. D. Booth, Trans. Soc. Inst. Tech., 19, 12 (1967).
(12) e.g., P. G. Guest, “Numerical Methods of Curve-Fitting,” Cambridge, 1961.

Use of Empirical Peak Shapes. In the present work, we do not assume any analytic form for $f_k$; merely that such a function exists. We also assume the peaks are geometrically similar and thus can be specified by parameters of position, ordinate scale, and abscissa scale. Thus, we assume that the equation of a peak may be written

$$Y_k = h_k\, f\!\left(\frac{t - t_k}{w_k}\right) \tag{10}$$

so that $\mathbf{b}_k = (h_k, t_k, w_k)$, the components being chosen to stand for height, position (on a time axis), and width. An analogous technique has been applied to NMR spectra by Keller, Lusebrink, and Sederholm (5).

The peak shape f is obtained in a separate experiment in which a single peak is digitized. In the computer program, it is convenient to modify this peak so that its base line is set at zero and all points are normalized so as to give a maximum of unit height. Since in this case flat base-line data are not significant, the total size of the data set can be reduced to typically below one hundred numbers during this preliminary processing. Let this peak be called the “standard,” with parameters $\mathbf{b}_s$, of which $t_s$ and $w_s$ can be arbitrary. Points ${}^{(q)}A_i$ are obtained from the standard array and from ${}^{(q)}\mathbf{b}$ as follows. ${}^{(q)}A_i$ is the sum of the current base-line estimate and a set of contributions from each peak:

$${}^{(q)}A_i = {}^{(q)}b_0 + \sum_{k=1}^{p} {}^{(q)}A_{ik} \tag{11}$$

where ${}^{(q)}b_0$ here denotes the base-line component of ${}^{(q)}\mathbf{b}$.
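As a rough illustration of how the pieces of this section might fit together, the sketch below builds a synthetic range as a base line plus peaks read off a digitized standard, obtains G by finite differences, and takes one step of the normal equations (Equation 9). It is a minimal sketch under stated assumptions, not the authors' program: the function names, the parameter layout `[base, h1, t1, w1, ...]`, and the default `eps = 1e-4` are all hypothetical, NumPy is assumed, and the step is a plain unscaled one without any of the convergence safeguards a practical program needs.

```python
import numpy as np

def peak_from_standard(std, t_s, w_s, h_k, t_k, w_k, n_points):
    """One peak at data indices 0..n_points-1, read off the digitized standard."""
    i = np.arange(n_points, dtype=float)
    r = (i - t_k) * (w_s / w_k) + t_s      # data index mapped into standard space
    grid = np.arange(len(std), dtype=float)
    # Linear interpolation between successive digits of the standard; outside
    # the stored array the standard is taken as zero (assumption: flat base line).
    return h_k * np.interp(r, grid, std, left=0.0, right=0.0)

def synthetic_range(b, std, t_s, w_s, n_points):
    """Base line plus sum of peaks; assumed layout b = [base, h1, t1, w1, h2, ...]."""
    b = np.asarray(b, dtype=float)
    a = np.full(n_points, b[0])
    for h_k, t_k, w_k in b[1:].reshape(-1, 3):
        a += peak_from_standard(std, t_s, w_s, h_k, t_k, w_k, n_points)
    return a

def finite_difference_G(b, model, eps=1e-4):
    """m x n matrix of dA_i/db_j, each parameter augmented by the fraction eps."""
    b = np.asarray(b, dtype=float)
    a0 = model(b)
    G = np.empty((a0.size, b.size))
    for j in range(b.size):
        step = eps * b[j] if b[j] != 0.0 else eps
        shifted = b.copy()
        shifted[j] += step
        G[:, j] = (model(shifted) - a0) / step
    return G

def gauss_newton_step(b, c, model, eps=1e-4):
    """Solve G'G db = G'F once, with F = c - A, and return b + db."""
    b = np.asarray(b, dtype=float)
    F = c - model(b)
    G = finite_difference_G(b, model, eps)
    Z = G.T @ G
    return b + np.linalg.solve(Z, G.T @ F)

# Illustrative use with an invented skewed standard (not experimental data).
x = np.arange(61, dtype=float)
std = x**2 * np.exp(-x / 6.0)
std /= std.max()                            # unit height, maximum at index 12
t_s, w_s, n = 12.0, 6.0, 120
model = lambda b: synthetic_range(b, std, t_s, w_s, n)
b_true = np.array([5.0, 100.0, 30.0, 7.5, 60.0, 50.0, 8.0])
c = model(b_true)                           # noise-free synthetic "data"
b0 = b_true.copy()
b0[[0, 1, 4]] *= 1.05                       # perturb base line and heights only
b1 = gauss_newton_step(b0, c, model)
```

Because the model is linear in the base line and the heights, a single step from `b0` recovers `b_true` in this contrived case; errors in position or width generally need several iterations and some control of the step length.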

The base-line contribution requires no separate calculation, being already a component of ${}^{(q)}\mathbf{b}$. The contributions ${}^{(q)}A_{ik}$ for each peak k at each data point i are calculated by operations which consist, in effect, of applying ${}^{(q)}\mathbf{b}_k$ of each peak to the standard, thus constructing a peak from which values of the ordinate at any point may be measured from the standard array. The procedure employed to do this is complicated by the fact that both the standard and the data consist of arrays of digits discontinuous in the abscissa. In the alteration in the abscissa scale implied when $w_k$ is substituted for $w_s$, digital points in the abscissa of the data array will not coincide with those of the array of the standard unless by remote chance $w_s/w_k$ for a given peak k happens to be an integer. It is thus necessary to render either the standard or the array of peaks effectively continuous, so that points scaled from one array can be assigned values in the other. We choose to render the space of the standard continuous. Thus, for each peak k let there be a variable:

$$r = (i - t_k)\,\frac{w_s}{w_k} + t_s \tag{12}$$

r is discontinuous and nonintegral, but has the scale of standard space, and if i were a continuous variable, ${}^{(q)}A_{ik}$ would be given by:

$${}^{(q)}A_{ik} = \frac{h_k}{h_s}\, f(r) \tag{13}$$

As mentioned previously it is convenient to set $h_s = 1$. The r-space is now rendered sectionally smooth in the first derivative by enabling points from the data array to be linearly interpolated between successive digits of the standard; let t be the integer next below r (i.e., t = entier r) so that points t are those of the standard array. Then:

$$A_{ik} = h_k \left\{ f(t) + [f(t+1) - f(t)]\,(r - t) \right\} \tag{14}$$

In order to obtain differentials of the form required for $\mathbf{G}$, values of A are obtained from Equations 11 and 14 from a set of parameters ${}^{(q)}\mathbf{b}$. Each parameter in ${}^{(q)}\mathbf{b}$ is then augmented by an arbitrary small fraction, ε, and new values of A are obtained. This procedure is repeated for every data point and every parameter, and the finite differences are given by:

$$G_{ij} \approx \frac{A_i\!\left({}^{(q)}\mathbf{b} + \varepsilon\,{}^{(q)}b_j\,\mathbf{j}\right) - A_i\!\left({}^{(q)}\mathbf{b}\right)}{\varepsilon\,{}^{(q)}b_j} \tag{15}$$

where $\mathbf{j}$ is the unit vector along direction j. Since the standard is rendered sectionally smooth for the evaluation of the differentials, the choice of the finite difference ε matters little, except that if it is made too small, rounding errors may be appreciable in computers with small word lengths. We have used ε = … . It should be noted that this procedure is suitable only for calculating first differences, since the standard is smoothed only in its first differential.

Organization of the Iterations. Straightforward application of Equation 7 for iterative improvement of $\mathbf{b}$ is impractical, since, because of the nonlinear nature of the problem, it can lead to slow convergence, oscillation, or convergence on to false minima. The literature contains many techniques for modifying the iteration step so as to secure rapid, uniform convergence, and there are also comparisons of the use of different techniques in spectroscopic contexts (5, 7). The majority of the work described here uses the method of Powell (13). In this, Equation 7 is modified to

$$\Delta{}^{(q)}\mathbf{b} = \frac{{}^{(q+1)}\mathbf{b} - {}^{(q)}\mathbf{b}}{\lambda} \tag{16}$$

where λ is a scalar parameter which is calculated in a separate minimization procedure also described by Powell (14) so that ${}^{(q)}F$ is minimized. In an alternative method, derivatives were calculated by the finite difference method described, and were incorporated into the procedure CALFUN previously used by the authors in the minimization procedure LABMIN (1). Both this and Powell's method converged satisfactorily and the computation times used by each were comparable on the experimental data tested.

The procedure of Powell is sophisticated, but as at present coded has the drawback that the matrix $\mathbf{G}$ is stored in the course of an elegant method of matrix inversion of $\mathbf{Z}$ (15). Since $\mathbf{G}$ is of dimension m × n, the storage required is proportional to the number of data points in a range, which may prove prohibitive. The LABMIN technique stores only $\mathbf{Z}$, which is n × n, and is necessarily smaller. It should be noted, however, that since one of the advantages of curve-fitting techniques with smooth data is that they require comparatively small data arrays, the extra storage required by $\mathbf{G}$ as compared to $\mathbf{Z}$ may be of little importance. For example, if the parameters are four times over-determined (m = 4n), and in a typical instance there are 4 peaks so that probably n = 13, $\mathbf{Z}$ has 169 numbers, while $\mathbf{G}$ has 676 numbers. Neither figure is large for storage in even a small computer.

PROGRAM ORGANIZATION

The procedures for curve-fitting were incorporated into a program for handling gas-chromatographic data in which any imperfections in the data were either corrected or noted, after which an initial estimate ${}^{(1)}\mathbf{b}$ was found. This estimate was improved iteratively by curve-fitting as described, and eventually the results were displayed in the form of retentions and peak areas. The principal logical decisions particularly relevant to GC data are outlined below.

Finding Peaks. Peaks were found by a procedure based on first differences similar to that of PEAKFIND previously described (1), operating on data smoothed as in the section following.

Smoothing. The method described previously was used (1). It may be noted that the technique of deducing the first derivative from the slope of the straight line drawn through a group of points and giving the smallest least squares residues is identical with the technique described by Savitzky and Golay (10) for quadratic smoothing when the number of points over which the smoothing is applied is odd.

Subdivision of the Chromatogram. The chromatogram is divided into ranges of peaks each of which is processed separately by curve-fitting. The boundaries of ranges are defined by points where the record has virtually returned to the base line. Such boundaries can be located either on a threshold basis by looking for points between which the record is higher than the base line by at least a given value, or on a slope basis by looking for points where the slope rises above a given value. A similar choice exists with the familiar digital integrator. We have used procedures with both criteria, and, as was found for digital integrators, the slope criterion was found more satisfactory, especially for picking up small peaks on a sloping base line. In the present context, however, the experiments were designed to produce “bad” peaks, i.e., non-Gaussian ones with which to evaluate the curve-fitting procedures, and these peaks usually had very long tails for which the slope criterion was unsuitable. Thus, we have used a procedure embodying the threshold criterion. Since an accurate value of the base line is of great importance for accurate quantitative analysis, the bounds of a range are extended outward from the points indicated by crossing the threshold to include where possible extra weighting of the base line in the curve-fitting.

Base-Line Drift. A sample of base line is selected from a group of data points near the beginning of any chromatogram, such that it satisfies tests for a low level of noise. Any sample which contains, e.g., a spike of noise or an initial disturbance often found in chromatograms will not be accepted. Its average is taken as the initial base-line value. The same procedure is employed at the far end of the chromatogram in order to select a suitable end-value for the base line. The base line is then assumed to be the straight line between the initial and final values. It should be noted that the value of the base line so constructed is not necessarily, and indeed not usually, that credited to a range of peaks after fitting, since the base line is a separate parameter in $\mathbf{b}$. However, both values will have the same relative slope. Therefore, if in a given run the base line is actually “bowed” between its initial and final values, areas determined from the finally fitted values of $\mathbf{b}$ for any range need not be in gross error because of the bowing.

Avoidance of Gross Over-Determination. The computer central processing unit time is roughly proportional to the number of points in the data array, and in the case of programs using the matrix $\mathbf{G}$, so is the storage required. Thus, with smooth data, it is wasteful to have very many more data points than are required. A procedure is therefore adopted so that in any range the total number of data points is determined and compared with the total number of peaks. A parameter d is then defined and applied so that only 1 in d of the data points is admitted to the curve-fitting procedures, d being that integer such that there are between 8 and 16 points per peak. In the case of badly tailing peaks, it is sometimes judicious to increase this number somewhat to ensure adequate definition of the leading edges of the peaks.

Other Logic. The criteria for minimum range size, peak rejection, and termination of iterations are as described previously (1). Spikes of noise from an FID detector were rejected by a clause replacing any data point grossly different from its neighbors by an interpolated value.

(13) M. J. D. Powell, Computer J., 7, 303 (1965).
(14) Ibid., p 155.
(15) J. B. Rosen, J. Soc. Ind. Appl. Math., 8, 181 (1960).

EXPERIMENTAL

In order to evaluate programs embodying the above procedures, they were applied to chromatograms obtained from accurately compounded mixtures of ethylbenzene, p-xylene, and m-xylene chromatographed on columns using dinonyl phthalate as stationary liquid at temperatures between about 40 and 60 °C. The elution is in the order given, with relative retention of ethylbenzene and p-xylene being between 1.09 and 1.10, and that between the xylenes being between 1.03 and 1.04, the separation increasing with decrease in temperature. Columns of about 7000 ± 2000 plates were used, thus usually securing good separation of ethylbenzene, but only very partial separation of the xylenes. The badly overlapping xylene peaks provided the principal test of the accuracy of the analysis using our procedures.
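The preprocessing steps described under Program Organization (spike rejection, a straight base line drawn between samples at the two ends of the record, subdivision into ranges on a threshold basis, and thinning of the data to 8 to 16 points per peak) might be sketched as follows. This is only an illustrative sketch assuming NumPy: the function names, the `limit`, `n_sample`, and `threshold` arguments, and the exact spike test are hypothetical, and the noise tests on the base-line samples and the outward extension of the range bounds are omitted for brevity.

```python
import numpy as np

def despike(y, limit):
    """Replace a point that differs from BOTH neighbours by more than `limit`,
    in the same direction, by the mean of those neighbours."""
    y = np.asarray(y, dtype=float).copy()
    for i in range(1, len(y) - 1):
        d_prev, d_next = y[i] - y[i - 1], y[i] - y[i + 1]
        if abs(d_prev) > limit and abs(d_next) > limit and d_prev * d_next > 0:
            y[i] = 0.5 * (y[i - 1] + y[i + 1])
    return y

def linear_baseline(y, n_sample):
    """Straight line joining the mean of the first n_sample points to the mean
    of the last n_sample points (each mean placed at the centre of its sample)."""
    y = np.asarray(y, dtype=float)
    first, last = y[:n_sample].mean(), y[-n_sample:].mean()
    x0 = (n_sample - 1) / 2.0
    x1 = len(y) - 1 - x0
    x = np.arange(len(y), dtype=float)
    return first + (last - first) * (x - x0) / (x1 - x0)

def find_ranges(y, baseline, threshold):
    """(start, stop) index pairs, stop exclusive, where the record stands above
    the base line by at least `threshold`."""
    above = (np.asarray(y, dtype=float) - baseline) >= threshold
    ranges, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, len(above)))
    return ranges

def decimation_factor(n_points, n_peaks, max_per_peak=16):
    """Smallest integer d for which taking 1 point in d leaves at most
    max_per_peak points per peak (this also keeps more than half that number
    whenever the range holds at least max_per_peak points per peak)."""
    return max(1, -(-n_points // (max_per_peak * n_peaks)))  # ceiling division

# Invented record: flat base line at 10, one block "peak", one noise spike.
record = np.full(50, 10.0)
record[20:30] += 50.0
record[5] = 200.0
clean = despike(record, limit=30.0)
ranges = find_ranges(clean, linear_baseline(clean, 5), threshold=5.0)
d = decimation_factor(300, 3)   # 300 points and 3 peaks give d = 7, about 14 points per peak
```

In a real program the slice `clean[start:stop:d]` for each range would then be passed to the curve-fitting step, the value of `limit` being tied to the noise level of the detector.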

Since also the aim was to test the programs on non-Gaussian peaks, no precautions against tailing were taken. Packed columns about 4 m long were used, and a capillary column of about 15 m. The introducer used had an appreciable volume, and thus peaks from the capillary column were considerably convoluted with an exponential function. As may be seen from the figures, the resulting peaks were both skewed and tailed, but not unreasonably so, and the peak shapes are not inconsistent with those obtained in a great deal of ordinary practice.

Two detectors were used; first a reaction coulometer (16, 17), and second, a flame ionization detector (FID). With the former, areas are accurately proportional to weights since molecules of the C8 hydrocarbons each burn the same amount of oxygen. With the latter, runs on accurately compounded mixtures of ethylbenzene and p-xylene, and ethylbenzene and m-xylene were also performed; with these mixtures, the degree of overlap is small, and calibration coefficients for the FID could be determined without assuming a priori the accuracy of the procedures we were evaluating. The calibration coefficients for the three compounds in the FID varied appreciably according to the operating conditions; values for the ratio

$$\frac{A_{\mathrm{EtBz}}\; w_{p(m)\text{-Xyl}}}{A_{p(m)\text{-Xyl}}\; w_{\mathrm{EtBz}}}$$

where A is for peak area and w is for sample weight varied between about 1.08 and 1.00 on different occasions.

In order to avoid the possibility of any segregation of components, no splitter was used in the inlet to the capillary column, and thus samples were restricted to the submicrogram region; for consistency, also, similar samples were used with the packed columns. Samples of the order of 10 µl liquid of a given mixture were injected into a clean, nitrogen filled bottle fitted with a serum cap, thus forming an unsaturated vapor. Samples of unsaturated vapor were then extracted for injection into the chromatographs.

Chromatograms lasted 10 to 15 minutes, and were digitized onto paper tape by a simple data logger. The digitization rate was accurate and was usually about once every 2 seconds. The data were processed and reduced as described above either by an IBM 360/67 computer operated on a time-shared basis, or by an English Electric KDF9 computer on batch. In the former case the use of a remote terminal to control data processing was demonstrated with considerable success.

RESULTS AND DISCUSSION

Choice of Variables. Given that the standard has exactly the same shape as the peaks, that there is no noise, and that the data both of the standard and the chromatogram are perfect, fits cannot fail to be perfect; this has been demonstrated previously with synthetic data (1). With real data, however, fits are less than perfect. We have found by experience that the principal defect that can occur is that the exact optimum of all the variables for which the least square residues are minimum can be very considerably perturbed with little increase in residues by even a small perturbation of either the standard or the data. In other words the optimum may lie at the bottom of a valley so shallow that there are other regions considerably distant from the true optimum that are little higher, and any small depression in one of these regions may easily be mistaken by the program for the true minimum. The result is that the program either gives the right answer or a grossly wrong answer according to whether the true or a false minimum is found. There is, however, no way of easily distinguishing one from the other. In general, the more variables in the fit, the shallower the true minimum and hence the greater the chance of finding a false minimum. Thus, when operating on ranges of peaks which are short but badly overlapping, as with our C8 hydrocarbons, we have usually reduced the number of variables by declaring only one plate number per range and thus fitting 2p + 2 variables. With this set of variables, we have never encountered convergence onto a false minimum except in cases where the data array was so faulty (because of instrumental misbehavior) that no analysis of any sort was possible.

(16) G. Burton, A. B. Littlewood, and W. A. Wiseman, “Gas Chromatography 1966,” Proceedings of the Sixth International Symposium, A. B. Littlewood, Ed., Institute of Petroleum, London, Eng., 1967, p 193.
(17) A. B. Littlewood and W. A. Wiseman, J. Gas Chromatog., 5, 334 (1967).

Examples of Runs. For purposes of program development, a monitor procedure was employed in which the initial estimates of the parameters, the final values of the parameters after the ultimate iteration, the initial sum of residuals at each point in a range, and the final sum of residuals for each point were printed out. The final fit was illustrated by a graph procedure showing the experimental data chromatogram, each calculated peak, and their sum, the latter being the calculated chromatogram. Figure 1 illustrates a typical chromatogram traced from the computer print-out. For presentation purposes, a smooth curve has been drawn through the digitized points. In this and Figure 2, the upper line of irregular appearance is drawn as a smooth line through a plot of the error array at the ultimate iteration, ${}^{(q)}\mathbf{F}$, which was also plotted out by the monitor program on a magnified scale. The magnification factor is added to the diagram. A good fit is indicated by the absence of large undulations corresponding in position to the peaks. The higher frequency variations indicate the noise, which can either be inherent in the chromatogram, or due to the effect of rounding-off in digitization and calculation. Table I gives the initial and final estimates of the parameters (in this case, width is also fitted) and the sum of residues. Abscissa units of position and width are in ordinal number of digits of the digitizer; in the case shown there is one digit per 2 seconds. Ordinate units of height are in digits of the digital voltmeter. The product of the two can be converted into coulombs in the case of the reaction coulometer to give an absolute quantity.

Figure 2 illustrates malfunctions. Figure 2a illustrates a case where a Gaussian standard has been fitted to a chromatogram. The large, regular undulations of the error array indicate error in the peak shape fitted to the data. Figure 2b illustrates a case in which width was retained as a variable and in which there has been convergence onto a false minimum. The program has interpreted the doublet as a tall narrow peak superimposed upon a low wide one covering the whole doublet, yet the error array has no large significant regular undulations. Figure 2c illustrates an isolated defect only once encountered, in which the PEAKFIND procedure has found a spurious peak which has, however, been well fitted. This case is discussed below.

Figure 1. Typical example of a chromatogram (outer envelope) and peaks fitted to it (inner peaks) traced from a computer printout. Upper line represents error in fit on an ordinate scale multiplied by the factor stated. Time axis, left to right.

Table I. Data Pertaining to Chromatogram of Figure 1

                              Initial estimates   Final result
Ethylbenzene
  Position of peak maximum         335.8             335.6
  Peak height                      219.3             249.0
  Width (a)                         11.3               9.7
p-Xylene
  Position of peak maximum         365.0             366.4
  Peak height                       69.7              79.0
  Width                             12.3              10.6
m-Xylene
  Position of peak maximum         377.8             377.8
  Peak height                      391.2             403.3
  Width                             12.8              10.9
Base line                          179.3             163.4
Sum of residuals                 7.9 × 10^4        1.6 × 10^3

(a) Since the peak shape is non-analytic, the figure for width is arbitrary, and is generated in the program merely to be able to scale the peaks. Thus, it is quoted only so that initial and final values can be compared.

Figure 2. Examples as in Figure 1 of program malfunctions. (a) Standard of Gaussian shape fitted to chromatogram composed of non-Gaussian peaks. (b) Convergence onto false fit. (c) Introduction of spurious peak, yet retaining good fit.

Figure 3. Examples of chromatograms of results contributing to Table II; time axis, right to left. (a) Mixture 3 at 50 °C on RC/packed. (b) Mixture 3 at 70 °C on RC/packed. (c) Mixture 4 at 60 °C on FID/packed. (d) Mixture 2 at 50 °C on RC/packed.

Overall Evaluation. Mixtures of the hydrocarbons were accurately compounded, and were chromatographed several times on one or more of the four instrumental combinations of RC/packed, FID/packed, RC/capillary, FID/capillary. In some cases also, different chromatograms on any one instrumental combination were made in different conditions, e.g., of temperature or flow rate. For each new temperature a new standard was required, but it was found by experience that, for results on a given type of column at a given temperature, an FID standard could be used for RC chromatograms and vice versa. We found that there were no significant differences either in the error of the mean result or in the average deviations of individual results from the mean for sets of results from any of the four instrumental combinations. Thus, errors and deviations do not appear to have depended on the instrumentation. Given this, therefore, results from the various instrumentations have been included indiscriminately in determining means and deviations.

The accuracy of the method was evaluated by comparing the analysis given by the mean of several runs with the composition of the mixture as compounded. Examples of such comparisons are given in Table II, and some of the chromatograms corresponding to entries in this table are illustrated in Figure 3. Figure 3c is a chromatogram of Mixture 4 of Table II, and illustrates about the greatest overlap for which PEAKFIND will operate. It is apparent from Table II that the errors in analysis are usually less than 1%.
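The “Error” entries of Table II are the mean analysis found minus the composition as compounded, and the “Av dev” entries are the average deviations of individual results from their mean. The short sketch below reproduces the Mix 1 errors from the tabulated figures. The function names are hypothetical, and since the individual runs are not given in the paper, the average deviation is only defined here, not recomputed.

```python
def analysis_error(mean_found, correct):
    """Signed error per component: mean found minus amount compounded (in %)."""
    return [round(m - c, 2) for m, c in zip(mean_found, correct)]

def average_deviation(results):
    """Mean absolute deviation of individual results from their own mean."""
    mean = sum(results) / len(results)
    return sum(abs(x - mean) for x in results) / len(results)

# Mix 1 of Table II, in the order ethylbenzene, p-xylene, m-xylene.
correct = [31.92, 11.56, 56.53]
mean_found = [30.98, 11.10, 57.85]
errors = analysis_error(mean_found, correct)
print(errors)   # [-0.94, -0.46, 1.32]
```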
We find in general that the error for the p-xylene/m-xylene analysis is approximately the same as that for ethylbenzene, and is certainly no greater; indeed, in the results shown in Table II, the m-xylene analysis is the most accurate. Thus, the overlap does not appear to influence the accuracy of the analysis. Since the errors are apparently also not due to the instrumentation, they have some other cause, and this probably lies in our sampling technique. It is widely recognized that it requires elaborate care to obtain analyses better than about 1 to [...], and we have here used only that care which is typical of routine analytical gas chromatography. Our net conclusion is, therefore, that the use of the curve-fitting procedures on overlapping peaks produces no extra error at all in analytical gas chromatography as found in ordinary practice. It is not unreasonable to suppose that the use of our data reduction techniques must necessarily add its own characteristic error, but this error, if it exists, appears to be less than [...].

Table II. Quantitative Analysis of Mixtures of Ethylbenzene and Xylenes Giving Overlapping Peaks (all values in %)

                                                              Ethylbenzene   p-Xylene   m-Xylene
Mix 1
  Correct composition                                             31.92        11.56      56.53
  Mean of four determinations on RC/cap and FID/cap               30.98        11.10      57.85
  Error                                                           -0.94        -0.46      +1.32
  Av dev                                                           0.55         0.61       1.11
Mix 2
  Correct composition                                             15.18        32.04      52.78
  Mean of three determinations on RC/packed and FID/cap           15.84        31.43      52.73
  Error                                                           +0.66        -0.61      -0.05
  Av dev                                                           0.80         0.81       0.10
Mix 3
  Correct composition                                             61.64        19.51      18.85
  Mean of ten determinations on all apparatus combinations        62.92        18.07      19.06
  Error                                                           +1.29        -1.44      +0.21
  Av dev                                                           0.56         0.42       0.55
Mix 4
  Correct composition                                             45.17        46.04       8.79
  Mean of four determinations on RC/packed, FID/packed            44.98        46.20       8.76
  Error                                                           -0.19        +0.16      -0.03
  Av dev                                                           0.45         0.55       0.64

Incidence of Malfunctions. When the method fails to give accurate analyses, the reasons are normally associated with the misbehaviors illustrated in Figures 2a and 2b. If two peaks are widely separated, the use of a standard with shape different from that of the peaks produces only a small error, but as the peaks overlap more and more, the error from using a false standard becomes progressively greater. Thus, it is necessary for accurate analysis to use a standard whose shape is very close to that of the peaks to be determined. It is well known that the shape of a chromatographic peak depends on its position in the chromatogram, and sometimes it also depends on the molecular composition of the solute, particularly if there is any adsorption. The ideal regimen for the application of the present technique, therefore, is where the

ANALYTICAL CHEMISTRY, VOL. 42, NO. 4, APRIL 1970


components of interest are of the same general type and occupy the same region of the chromatogram. Such regimens apply frequently; for example, in the xylenes illustrated here, in the separation of isotopically varied compounds, in the separation of close fatty acid esters, etc. In the present context, we found that standards of different sizes did not greatly affect the analyses of xylenes and ethylbenzene, so long as they were applied to chromatograms taken at the same temperature on the same column type. This is illustrated in Table III, which compares the results of four runs on a given mixture (No. 3 of Table II) taken on FID/capillary and processed according to two different standards. Differences are of the order of 1%.

The misbehavior of Figure 2b occurs particularly commonly with doublets for which one peak is of the order of twice the height of the other, and it is obvious from the error arrays produced by the misfit that there is little difference in residuals between the misfit and the true fit. The misbehavior is completely removed by constraining the peak widths by declaring a constant plate number. The error introduced by using a fixed plate number for each range appears in practice to be small for a small range of closely overlapping peaks, which is the only circumstance where it is necessary. This is illustrated for the xylenes by the results of Table III, for which the runs described in the section on “Incidence of Malfunctions” were compared by being processed both by fitting 3p + 1 variables (free plate number) and by fitting 2p + 2 variables (constant plate number). It is seen that the differences in analysis are again of the order of 1%.

Table III. Comparison of Use of Different Standards and of Use of Different Numbers of Variables. FID/Capillary Results (all values in %)

                       Ethylbenzene                     p-Xylene                       m-Xylene
Actual compn              61.64                          19.51                          18.85
                  Std A          Std B           Std A          Std B           Std A          Std B
              Free N Const N Free N Const N  Free N Const N Free N Const N  Free N Const N Free N Const N
Run 1          63.27  62.89   61.12  61.80    19.24  18.45   17.00  18.34    17.43  18.65   21.82  19.80
Run 2          63.02  62.89   61.80  61.76    18.75  18.79   17.32  18.88    18.18  18.32   20.80  19.30
Run 3          64.38  63.99   63.08  63.23    16.95  18.52   18.14  17.64    18.61  17.49   18.71  19.07
Run 4          63.65  61.23   60.52  59.95    19.92  19.13   17.67  18.59    16.77  19.66   21.75  21.40
Av             63.58  62.75   61.63  61.68    18.71  18.75   17.54  18.36    17.75  18.53   20.77  19.89

N is plate number.

Curve-fitting techniques should be used for quantitative analysis only if each and every peak to be fitted has been reliably identified before the regression analysis is used to fit parameters to them, since it is almost invariably possible to produce a virtually perfect fit to smooth data by fitting an arbitrarily large number of peaks to it. This important point is illustrated by Figure 2c, which shows an isolated instance in which the procedure PEAKFIND has found a spurious peak appearing just before p-xylene. It is seen from the

error line that the fit is excellent, and if the percentages for the spurious peak and the real p-xylene peak are added together, an excellent analysis is obtained. However, the spurious peak is not there!

Conclusions. It is apparent from the above discussion that malfunctions of the program can be avoided, by suitable choice of standards and of the number of variables to be fitted, for any data for which PEAKFIND can find all the peaks. It is apparent from the general evaluation that the program enables analyses to be obtained without additional error from peaks overlapping to the extent indicated in Figure 3. Since the peaks of Figure 3 are nonanalytic, specification of their degree of overlap in precise terms, e.g., as the ratio of the difference in their means to their standard deviations, is not completely informative. However, for approximate discussion, we can regard them as approximately Gaussian. In this case, measurement shows that the program can handle analyses without apparent loss of accuracy for any overlap corresponding to a separation greater than about 1.3 σ. Since complete separation adequate for quantitative analysis by simple area determination is normally about 6 σ, the use of curve-fitting enables the degree of resolution to be reduced by a factor of about 4.5. The relation between resolution and analysis time is complex (18); nevertheless, it is adequate here to consider that analysis time is proportional to the square of resolution for a given column type. If this is so, the use of curve-fitting secures a reduction in analysis time of up to 20-fold (4.5² ≈ 20). This, together with the other factors described at the beginning of the paper, recommends the use of curve-fitting procedures in reduction of GC data by computer.
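The choice of the number of variables referred to in the Conclusions, 3p + 1 with a free plate number against 2p + 2 with a constant plate number for p peaks, can be illustrated by a small sketch. Several things here are assumptions rather than the authors' code: a Gaussian is used as a stand-in for the empirical standard peak shape, the per-peak variables are taken to be area, retention time, and plate number, the single extra variable is taken to be a baseline offset, and all function names and numerical values are hypothetical. The only relation used that is standard is the plate number definition N = (t_R/σ)², so that a shared N fixes each peak width from its retention time.

```python
import math


def sigma_from_plate_number(t_r, n):
    """Peak standard deviation implied by N = (t_r / sigma)**2."""
    return t_r / math.sqrt(n)


def gaussian_peak(t, area, t_r, sigma):
    """Gaussian stand-in for the empirical standard peak shape (assumption)."""
    z = (t - t_r) / sigma
    return area / (sigma * math.sqrt(2.0 * math.pi)) * math.exp(-0.5 * z * z)


def model_free_n(t, params, p):
    """3p + 1 variables: (area, t_r, N) per peak, then one baseline (assumed)."""
    if len(params) != 3 * p + 1:
        raise ValueError("expected 3p + 1 parameters")
    signal = params[-1]
    for i in range(p):
        area, t_r, n = params[3 * i : 3 * i + 3]
        signal += gaussian_peak(t, area, t_r, sigma_from_plate_number(t_r, n))
    return signal


def model_const_n(t, params, p):
    """2p + 2 variables: (area, t_r) per peak, then shared N and baseline (assumed)."""
    if len(params) != 2 * p + 2:
        raise ValueError("expected 2p + 2 parameters")
    n, baseline = params[-2], params[-1]
    signal = baseline
    for i in range(p):
        area, t_r = params[2 * i : 2 * i + 2]
        signal += gaussian_peak(t, area, t_r, sigma_from_plate_number(t_r, n))
    return signal


# Hypothetical doublet: areas 1.0 and 0.5, retention times 100 and 104, N = 2500.
free_params = [1.0, 100.0, 2500.0, 0.5, 104.0, 2500.0, 0.0]
const_params = [1.0, 100.0, 0.5, 104.0, 2500.0, 0.0]

# With equal plate numbers the two parameterizations describe the same curve.
same = all(
    abs(model_free_n(t, free_params, 2) - model_const_n(t, const_params, 2)) < 1e-12
    for t in range(90, 115)
)
print(len(free_params), len(const_params), same)
```

In the constrained form the regression can no longer widen one peak and narrow the other independently, which is the degree of freedom exploited by the false minimum of Figure 2b; either model function could be passed to any nonlinear least-squares routine.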

RECEIVED for review August 25, 1969. Accepted December 1, 1969. Work supported by the Office for Scientific and Technical Information.

(18) A. B. Littlewood, “Gas Chromatography,” Academic Press, New York, N. Y., 1962, pp 178-182.