
RESULTS

The reciprocal time computer was tested by applying voltage signals of known input slopes to the voltage interval sensor (Figure 1) to simulate the outputs of typical reaction monitor-signal modifier systems. Table I shows the results of measurements of input rates from 5 mV/sec to 20 V/sec. The instrument was calibrated with a 50 mV/sec input rate by adjusting P2 (Figure 3) to give the desired readout. This adjustment is necessary because of component tolerances in the linear sweep generator. The entire range of input rates shown in Table I was measured with only one instrument change. For rates > 1 V/sec, the scaling factor F and the integrator input voltage Ein were changed by factors of ten. At the highest rate measured, the actual time interval (Δt) was 5 msec. With this short Δt, the time-interval-to-voltage converter output was only 5 mV. Faster rates could be measured by increasing Ein. However, the analog gates used to switch Ein (Figure 3) are only capable of switching up to 5 V. Table II shows results for the determination of phosphate. For this procedure, the input voltage to the reciprocal time

computer had a negative slope, and the NAND gates shown in Figure 2 were used. With the 100% transmittance set at 1.000 V, recorded curves enabled the selection of V1 and ΔV. V1 was set at 0.975 V and ΔV was adjusted to be 25 mV. Because of the slowness of the reaction, Ein was reduced to 50 mV to avoid limiting OA1. Potentiometer P2 (Figure 3) was adjusted to give a direct concentration readout using a phosphate standard of 5 ppm P. At the lowest concentration used, the measurement time Δt was about 120 sec. The results shown in Tables I and II are typical of many which have been obtained. The reciprocal time computer was constructed using the same digital modules as used in the fixed-time readout system described earlier (2). By a simple change of circuit cards and intercard connections, the same basic modules can be used for both variable-time and fixed-time methods. This allows the user of reaction rate methods to have both automated readout systems readily available for use with different reactions.
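To make the variable-time arithmetic concrete, the short sketch below measures the time Δt taken for a falling signal to traverse a fixed interval ΔV below the level V1 (the negative-slope case described above) and converts it to concentration through a single-standard calibration, just as P2 is set against the 5 ppm P standard. The function names and the digitized-curve representation are illustrative assumptions; the instrument itself does this with analog gates and counters rather than software.

```python
def traversal_time(times_s, volts, v1, dv):
    """Time for a falling signal to go from v1 down to v1 - dv (the measured dt)."""
    t_enter = None
    for t, v in zip(times_s, volts):
        if t_enter is None:
            if v <= v1:          # signal reaches the top of the voltage interval
                t_enter = t
        elif v <= v1 - dv:       # signal leaves the bottom of the interval
            return t - t_enter
    raise ValueError("signal never traversed the full interval")

def calibration_constant(dt_standard_s, conc_standard):
    """Fix k so that concentration = k / dt, using one standard (e.g., 5 ppm P)."""
    return conc_standard * dt_standard_s

def concentration(dt_s, k):
    """Reciprocal-time readout: concentration is proportional to 1/dt."""
    return k / dt_s
```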

RECEIVED for review March 14, 1969. Accepted April 11, 1969.

Computer-Assisted Gas-Liquid Chromatography

H. M. Gladney, B. F. Dowden, and J. D. Swalen

International Business Machines Corp., Research Division, San Jose, Calif. 95114

The use of computers for data acquisition and analysis in gas-liquid chromatography is becoming increasingly widespread. We describe an implementation of computer-assisted chromatography within a general-purpose time-shared laboratory automation system. Particular attention is paid to a method of avoiding timing conflicts at the computer, to an inexpensive method of providing a computer-to-instrument interface, and to an economical method of curve-fitting to resolve overlapping skewed peaks. It is anticipated that many of the methods and some of the computer programs will be directly applicable to various spectroscopic experiments.

THE RECENT IMPLEMENTATION in this laboratory of a time-shared laboratory automation system for spectroscopic-type instruments (1) has made possible on-line data collection and computation for a high-sensitivity gas chromatograph. We wish to describe our system, which includes two novel features: a simple and inexpensive interface for time-shared communications between an instrument and a remote computer, and a method for the resolution of overlapping skewed peaks, involving a least-squares curve-fitting procedure. In Figure 1 is shown a simple calculated curve compared with the observed chromatogram. In general, the methods used to schedule the communications between a process-control computer and several instruments are largely selected by considerations of economy and of the relative requirements of the instruments being connected. The choice of method becomes particularly difficult if the full requirements of the laboratory cannot be anticipated, which is usually the case. It is also conceded that an extremely desirable, if not absolutely necessary, feature is that each experiment can be programmed and run independently.

(1) H. M. Gladney, J. Comp. Physics, 2, 255 (1968).

Figure 1. A poorly resolved chromatogram of 30-60 °C petroleum ether showing computer-fitted peaks

From the point of view of the experiment and experimenter, it is perhaps conceptually simpler if the computer is enslaved to the laboratory instrument. However, at the computer, this arrangement implies not one master but several, usually with conflicting requirements. There are at least three possible resolutions of the problem: to dedicate a computer or a sub-computer (commonly called a data channel) either permanently or temporarily to each experiment (2); to employ a computer which is so fast relative to the time-sensitive experiments that the unavoidable interferences become negligibly small (3); or to avoid guaranteeing the user precisely controlled timing. The last possibility becomes viable when it is recognized that it is generally not necessary to sample at equal time intervals, but only to know, a posteriori, when you have sampled. It is not the purpose of this paper to argue the relative merits of the three approaches, but only to show that the last one, which has not heretofore been exploited, works very well even for our time-dependent experiments, in which the sampling interval varies between 0.5 and 2.0 seconds, with allowable but infrequent delays resulting in the loss of up to four data points.

Data reduction for gas chromatography can be considered in two consecutive operations: estimation of the integrated detector response for a number of peaks, and comparison of those peaks with standards for qualitative and quantitative analysis of the sample's components. The arithmetic of the second stage is trivial, although the interpretation of areas as concentrations is still only imperfectly understood (4). In any case, in this laboratory the anticipated variety of samples is large and the expected duration of interest in any particular sample is short, so that the planning for cataloging multiple samples has not yet seemed worthwhile. Accordingly, discussion will be restricted to a method of resolving overlapping peaks.

Until now, changes in the sign of the first derivatives of the absorption have been used to detect multiple peaks, followed by ad hoc methods for estimating the area of each peak of the group. This procedure seemed to us to be inadequate and unsatisfactory. Differentiation is notoriously sensitive to high-frequency noise, even if analog and digital filtering have already been applied to the signal. Another way to look at the problem of using derivatives is that their determination depends on only a few points in the neighborhood of the extremal points. The bulk of the data, which one has presumably taken some pains to collect, is, for the most part, simply ignored. Of more import, however, is that, if two peaks overlap, the estimate of the area of one, particularly if it is the smaller of the two, is sensitive to the amplitude of the other. Now, least squares methods of resolving overlapping peaks are well known (5, 6) and are relatively free of the problems cited. It seemed to us worthwhile, therefore, to investigate whether they could be made reliable, completely automatic, and fast enough to be feasible on a relatively small process-control computer working in real time. In addition we propose a new line shape function, a convoluted Gaussian distribution, which has the desired shape, is valid over the complete range of variables, and requires only four parameters (amplitude, retention time, line width, and asymmetry) for acceptable fitting.

Data Collection and Control. Direct connection of a gas-liquid chromatograph to a digital computer is no longer novel. However, a brief description of our hardware and software interface is perhaps of interest. The output signal of the Varian Aerograph 1200, prior to attenuation, is applied to an automatic ranging device (7) capable of switching over three decades. The output signal from this, together with the ranging information, is transmitted to unfiltered, low-level analog inputs of the 1800 ADC.
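As a rough sketch of the acquisition scheme just outlined, the fragment below pairs each raw low-level reading with its range code, records the actual sampling instant rather than guaranteeing one, and later interpolates the irregularly timed record onto equal intervals (as the data reduction programs described below do). The read_adc() and read_range() callables, the gain values, and the timing loop are hypothetical stand-ins, not the IBM 1800 submonitor interface.

```python
import time
import numpy as np

RANGE_GAIN = {0: 1.0, 1: 10.0, 2: 100.0}      # assumed gains for the three decades

def acquire(read_adc, read_range, duration_s, nominal_dt_s=1.0):
    """Collect (time, signal) pairs; the timestamps are simply whatever they
    turn out to be, which is all the a posteriori timing approach requires."""
    times, signal = [], []
    start = time.monotonic()
    while time.monotonic() - start < duration_s:
        raw_mv = read_adc()                     # low-level (0-50 mV) reading
        gain = RANGE_GAIN[read_range()]         # ranging information, sent separately
        times.append(time.monotonic() - start)  # when we actually sampled
        signal.append(raw_mv / gain)            # undo the assumed programmed gain
        time.sleep(nominal_dt_s)                # an occasional extra delay only loses a few points
    return np.asarray(times), np.asarray(signal)

def to_uniform_grid(times, signal, dt_s):
    """Interpolate the irregularly timed samples onto equal time intervals."""
    grid = np.arange(times[0], times[-1], dt_s)
    return grid, np.interp(grid, times, signal)
```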

(2) T. R. Lusebrink and C. H. Sederholm, IBM J. Res. Develop., 13, 65 (1969).
(3) IBM 1800 Gas Chromatograph Monitoring Program 1800-23.5.001.
(4) W. O. Caster, P. Ahm, and R. Pogue, Chem. Phys. Lipids, 1, 393 (1967).
(5) W. O. Keller, T. R. Lusebrink, and C. H. Sederholm, J. Chem. Phys., 44, 782 (1966).
(6) H. Stone, J. Opt. Soc. Amer., 52, 998 (1962).
(7) D. Horne, private communication, IBM Research Laboratory, San Jose, Calif., 1968.

The instrument is removed from the computer by some 500 feet; to minimize noise pickup, shielded twisted pairs are used to carry these low-level (0-50 mV) signals. Run conditions may be specified by means of toggle switches connected to a digital input group, and action by the computer is initiated by means of a momentary closure switch connected across the terminals of a contact interrupt input. A typewriter at the location of the chromatograph completes the hardware interface. Closure of the interrupt switch causes a program to be loaded and executed which either reads the data switches or initiates data collection, depending upon the position of one of the data switches. Data acquisition is carried out by the submonitor system (1), commanded by a single FORTRAN subroutine call. At the conclusion of the run a second program is queued which, upon execution, reads the data switches, which are now used to specify the type of data reduction required, i.e., initial peak and area determination, Calcomp plots, and least squares fitting of overlapped peaks. The data reduction is then performed and a report is presented at the typewriter.

Resolution and Integration of Overlapping Distributions. As has already been mentioned, we felt it worthwhile to investigate the possibility of using least squares methods for the final data reduction. However, first-order iterative least squares calculations are known to be subject to divergence on occasion; even when they do converge, the process may require an inordinate number of iterations and may not result in the desired solution. It appeared that these potential difficulties might be resolved if fairly accurate starting estimates of the peak descriptors could be provided by the conventional peak-searching methods, and if a suitable order for variation of the adjustable parameters could be discovered and implemented by program decisions. Before such a procedure could be considered, a suitable distribution function for the shape of a single skewed peak had to be found; it was desirable that the function include a minimum number of variable parameters. We chose four numbers, A, w, t0, and τ, which can be identified as measures of amplitude, width, retention time, and asymmetry, respectively.

Criteria for an acceptable calculation method for GLC suggest themselves naturally. The final estimate for the concentration of any component should be independent of the concentrations of interfering components. This requires, inter alia, that the width and skewing of peaks, which are known to vary with retention time, should not be constrained to be uniform, even for overlapping peaks. If this objective is achieved, the criteria for acceptable separations on a column are significantly relaxed. The calculations must be relatively insensitive to noise in the signal. Curve fitting is much superior to extremum sensing in this respect; in any case, as much analog and digital filtering as is economically possible without sacrificing resolution is indicated. Spurious solutions should be infrequent, and fairly obvious when they occur. The computation must be entirely automatic, with no need for human decisions or intervention to help an iterative method to a reasonable solution. Hand adjustment of interacting parameters, such as is employed with some recent analog curve comparison devices or their digital counterparts, is impossibly time-consuming if large numbers of samples are to be processed.
Finally, the method must be economical; on a time-shared process-control computer as powerful as the IBM 1800, which is expected to support a large number of instruments, one or two minutes' computation for a record with about 10 overlapping peaks seems an acceptable time.

Figure 3. This least squares fit to an experimental skewed chromatographic peak indicates that the functional form adopted to represent peaks is reasonable

Figure 2. Skewed-Gaussian distributions for varying relative asymmetries

Our search for an analytic form to approximate skewed GLC lines started from the well-known fact that for symmetric peaks the Gaussian distribution exp{-(t - t0)²/2w²} is usually acceptable (8). Functions with ad hoc skewing factors, such as {1 + a(t - t0)} exp{-(t - t0)²/2w²} and exp[-(t - t0)²/{2w² + d(t - t0)}], were considered. However, the former has negative regions and the latter a singularity, so they are not very appealing and, in fact, have been shown by us not to fit experimental curves very well. A Gaussian probability distribution convoluted by an exponential decay, however, does not possess such difficulties. Let

$$v(t, t_0, w, \tau) = \frac{1}{\tau}\int_{-\infty}^{t} dt'\,\exp\{-(t' - t_0)^2/2w^2\}\,\exp\{-(t - t')/\tau\} \qquad (1)$$

where t0 is the retention time, w the line width, and τ the asymmetry. Its functional shapes, plotted for a series of values of τ/w in Figure 2, are certainly evocative of variously skewed GLC absorptions. A least-squares fit, illustrated in Figure 3, to a line from methanol on a 20% SE-30 column at 110 °C was convincing enough to justify the adoption of this line shape function. It may be well to point out, however, that most of the calculation methods used can easily be adapted to other line shape functions which might be considered appropriate (9, 10).

The useful properties of v(t, t0, w, τ) follow directly from its definition and an examination of Figure 3. For small τ, v is approximated closely by a Gaussian distribution function:

$$\lim_{\tau \to 0} v(t, t_0, w, \tau) = \exp\{-(t - t_0)^2/2w^2\} \qquad (2)$$

and for our purposes can be so estimated when τ/w < 0.4. For larger values of τ, the integral must be evaluated using the simple closed-form expression, Equation 3, written in terms of

$$\operatorname{erf}(x) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{x} \exp(-u^2)\,du,$$

the error integral or probability integral normalized on the full range of the abscissa (11). If erf(x) is calculated by linear interpolation into a table of about 20 unequally spaced values, v can be sufficiently accurately calculated by Equation 3 for τ/w > 1.0. In the intervening region of τ/w, for convenience, our algorithm combines a delayed Gaussian curve on the leading edge with values calculated by Equation 3 on the trailing edge. Further, the area of f(t) = Av, where A is the amplitude of the peak, can be determined by integration in closed form,

$$\int_{-\infty}^{\infty} dt\, f(t) = \sqrt{2\pi}\,Aw = 2.506\,Aw \qquad (4)$$

with a result which is independent of the asymmetry.

For least squares calculations, derivatives of the distributions with respect to the parameters are required. Trivially, the derivative with respect to the amplitude is ∂f/∂A = v. For small τ, or on the leading edge of the curve, the derivative with respect to t0 may be approximated as the derivative of the limiting Gaussian of Equation 2, that is, by (t - t0)/w² times a Gaussian; but this approximation is considerably improved upon by replacing the Gaussian on the right by the distribution function itself. (This is also very convenient as a calculation algorithm.) The derivative with respect to w is obtained similarly, and the derivatives with respect to τ we estimate numerically.

(8) J. C. Bartlett and D. M. Smith, Can. J. Chem., 38, 2057 (1960).
(9) E. J. Levy and A. J. Martin, Pittsburgh Conference on Analytical Chemistry and Applied Spectroscopy, Cleveland, Ohio, March 3-8, 1968.
(10) R. D. B. Frazer and Eikichi Suzuki, Anal. Chem., 41, 37 (1969).
(11) B. O. Pierce, "A Short Table of Integrals," Ginn & Company, Boston, Mass., 1929.
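The sketch below evaluates the line shape and the area of Equation 4 numerically. Because the printed form of Equation 3 is not reproduced above, the closed form used here is the standard expression obtained by carrying out the integral of Equation 1 in terms of the normal probability integral; the simple switch to the Gaussian limit at τ/w = 0.4 only approximates the authors' algorithm, which blends a delayed Gaussian with Equation 3 in the intermediate region. Python with NumPy and SciPy is used purely for illustration.

```python
import numpy as np
from scipy.special import erf

def normal_cdf(x):
    # Standard normal CDF; equals the paper's erf, normalized on the full
    # range of the abscissa, evaluated at x / sqrt(2).
    return 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

def v(t, t0, w, tau):
    """Convoluted Gaussian line shape of Equation 1 (assumed closed form)."""
    t = np.asarray(t, dtype=float)
    if tau < 0.4 * w:
        # Gaussian limit, Equation 2, adequate for tau/w < 0.4
        return np.exp(-(t - t0) ** 2 / (2.0 * w ** 2))
    # Closed form of the convolution integral; far out on the leading edge the
    # exponential can overflow, one reason the authors use a delayed Gaussian there.
    pre = np.sqrt(2.0 * np.pi) * w / tau
    return (pre * np.exp(w ** 2 / (2.0 * tau ** 2) - (t - t0) / tau)
                * normal_cdf((t - t0) / w - w / tau))

def peak(t, A, t0, w, tau):
    """A single skewed peak, f(t) = A * v(t, t0, w, tau)."""
    return A * v(t, t0, w, tau)

def area(A, w):
    """Equation 4: integrated response sqrt(2*pi)*A*w = 2.506*A*w, independent of tau."""
    return np.sqrt(2.0 * np.pi) * A * w
```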

Table I. Peak Maximum vs. Relative Asymmetry

τ/w      y_max
0.0      1.000
0.2      1.000
0.4      0.998
0.6      0.971
0.8      0.832
1.0      0.782
1.2      0.738
2.0      0.599
4.0      0.397
10.0     0.202
20.0     0.111
40.0     0.059

To perform a least squares fit of individual lines, it is important to know how many experimental points are necessary for a noise-insensitive fit, how they are best distributed within the line, whether a Newton-Raphson local linearization procedure can be employed, which parameters should be simultaneously varied, and how sensitive the convergence is to bad initial estimates of the parameters. A series of test calculations with a general nonlinear least-squares program (12) on an IBM 360/50 indicated that Newton-Raphson least squares converged fairly quickly, particularly if the calculated corrections were somewhat damped at each iteration. The parameters A, w, and t0 were simultaneously adjusted by fitting about six points on the leading edge and about half-way down the trailing edge of the peak (Figure 3). A separate adjustment of τ was made by using four points in the tail. The whole procedure, repeated once, was found to give an acceptable fit to the entire line. It was also determined that the A, w, and t0 convergence was much faster if the estimate of τ was too small than if it was too large.

A strategy to fit multiple overlapping peaks is suggested by the work of Keller, Lusebrink, and Sederholm (5) in connection with the similar problem for nuclear magnetic resonance. The necessary matrix inversions become cumbersome if all peaks in a group are simultaneously fit. However, at any stage in the calculation, estimates of the parameters of all peaks are available, so that one can correct the intensity for a single peak of interest by the expected overlaps of all other peaks before undertaking a fit of that peak alone. The accuracy of this procedure is enhanced if one weights the corrected observations by their own values. By ordering the calculation with the largest peaks first, three iterations of the entire procedure can give excellent results.

Good initial estimates of A and t0 are available for each peak from the coordinates corresponding to a local minimum of the second derivative of the observations. Because these coordinates are, however, a function of the asymmetry τ (large τ values result in a delayed peak and reduced amplitude, as shown in Figure 2), corrections are necessary. The delay of the peak maximum may be approximated by

$$t_{\max} = t_0 + \operatorname{Max}(0.5\,\tau,\ 2w) \qquad (9)$$

where Max(x1, x2) is a function which selects the larger of x1 or x2.
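As an illustration of the damped Newton-Raphson strategy described above, the sketch below adjusts A, t0, and w together on points running from the leading edge to roughly half-way down the trailing edge, then adjusts τ separately on a few points in the tail, and repeats the cycle. It uses the peak() function from the earlier sketch; the point-selection windows, the forward-difference Jacobian, and the damping factor of 0.5 are our own assumptions rather than the authors' exact procedure.

```python
import numpy as np

def fit_single_peak(t, y, p0, n_sweeps=2, damping=0.5):
    """t, y: NumPy arrays of times and smoothed intensities for one region.
    p0: initial (A, t0, w, tau) from the peak search.  Returns refined parameters."""
    params = np.array(p0, dtype=float)

    def residual(p, idx):
        A, t0, w, tau = p
        return y[idx] - peak(t[idx], A, t0, w, tau)   # peak() from the sketch above

    def correction(p, idx, free):
        # One Gauss-Newton step over the parameters listed in `free`,
        # using a simple forward-difference Jacobian.
        r0 = residual(p, idx)
        J = np.empty((idx.size, len(free)))
        for col, j in enumerate(free):
            dp = np.zeros_like(p)
            dp[j] = 1e-4 * max(abs(p[j]), 1.0)
            J[:, col] = (residual(p + dp, idx) - r0) / dp[j]
        delta, *_ = np.linalg.lstsq(J, -r0, rcond=None)
        return delta

    for _ in range(n_sweeps):
        A, t0, w, tau = params
        # leading edge down to ~half-way along the trailing edge: adjust A, t0, w
        core = np.where((t > t0 - 3.0 * w) & (t < t0 + tau + 2.0 * w))[0]
        if core.size >= 3:
            params[[0, 1, 2]] += damping * correction(params, core, [0, 1, 2])
        # a few points well out in the tail: separate adjustment of tau
        tail = np.where(t > params[1] + params[3] + 2.0 * params[2])[0][:4]
        if tail.size:
            params[3] += damping * correction(params, tail, [3])[0]
    return params
```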

(12) E. T. Whittaker and G. Robinson, "The Calculus of Observations," Blackie & Son, Ltd., London, 1924.


The peak amplitudes, y, are listed in Table I for various asymmetries for the normalized function v. The width of one peak moderately well separated from its neighbors can be estimated by locating the point of half amplitude on the leading edge. For the other peaks, the approximation that the observed width is nearly proportional to the retention time has been exploited. This is particularly effective if the width estimate for the Nth peak fit is taken from the least squares value for the (N - 1)th peak. No such self-evident procedure exists for initial estimates of τ, so we adopted an analogous procedure.

Data Reduction Programs. In the preliminary data reduction program, following recovery (from disk file) of the intensities and the time intervals between successive readings and the correction for analog-to-digital converter d.c. offset, the data are interpolated to equal time intervals and the range factors for automatic gain control are included. If a digital plot of the input data has been requested, it is presented at this time. Smoothing with the 9-point, 3rd-degree polynomial fit (12),

$$y_i' = \frac{1}{231}\bigl[59\,y_i + 54(y_{i-1} + y_{i+1}) + 39(y_{i-2} + y_{i+2}) + 14(y_{i-3} + y_{i+3}) - 21(y_{i-4} + y_{i+4})\bigr] \qquad (10)$$

is performed at every even-numbered point. The intervening odd-numbered points are no longer used, but note their large contribution in Equation 10. The background reduction program compensates for a linearly varying instrument drift by locating line-free regions near the beginning and end of the record, estimating a baseline, and subtracting it from each datum. At the same time this program estimates the root-mean-square noise. "Interesting regions" containing a line or a group of overlapping peaks are identified as sets of contiguous points whose amplitude exceeds twice the noise plus about 20% of the mean background correction. Fortunately, noise spikes are quite effectively differentiated against. Over each "interesting region" the smoothed second derivative, correct to third differences, is calculated at alternate points. Peaks and valleys are identified as minima and maxima in this quantity; changes in sign of the derivative are acknowledged only relative to a threshold about 40 times the noise. Peaks too close together (