Evaluation of Chromatographic Integrators and Data Systems
Andrew N. Papas
U.S. Food and Drug Administration, Winchester Engineering and Analytical Center, Winchester, Mass. 01890
Michael F. Delaney
Department of Chemistry, Boston University, Boston, Mass. 02215
Computer use has increased exponentially in the past decade, as evidenced in analytical chemistry by the increasing number of computer-aided techniques and the development of computer-assisted instrumentation. In manufacturing and process control, computers automate entire processes. In fact, entire plant facilities are run by computers, remote sensors, robots, and one supervisor. Most problems that could arise in these processes have been anticipated, and the controlling computers have been preprogrammed to resolve them. As the programming becomes more sophisticated, however, unexpected or unanticipated conditions can result in “bugs.” Some of these bugs are flagged, whereas others remain undetected. One problem thus precipitated is the validation or quality assurance of the computer-automated final product.
General users of computer software and hardware, like general users of analytical instrumentation, must be confident that the results are accurate. As with instrumentation, a rigorous quality assurance program can be implemented and the results compared with well-known, expected values. With software, however, one must also look at the results obtained with erroneous and out-of-range data or conditions. Thus, the overall system should be rigorously exercised through all possible data combinations. In pharmaceutical manufacturing, there are many efforts under way to define the extent of computer validation required for the regulated industry. Software and hardware vendors are required to perform validation studies internally and to provide these studies to the end user, the manufacturer. If these studies are not available from the vendor, the manufacturer must perform the validation. Quality assurance of the computer-automated final product, pharmaceuticals, is thus validated. In analytical laboratories chromatography is the predominant separation method, with gas chromatography (GC) or high-performance liquid chromatography (HPLC) used most often. Detection schemes vary, but data re-
available. An instrument as vital as the chromatograph itself remains unchecked. Here we present a study designed to evaluate chromatographic data systems and integrators objectively. We describe possible approaches to the problem, the evaluation study protocol, chromatographic parameters investigated, data system optimization schemes, analog and digital data conversion constraints, and some preliminary results. Questions addressed by this evaluation include: Will an integrator evaluate the same raw data in a similar fashion from one time to the next? (repeatability) Do integrators differ significantly in their interpretation of the same raw data? (accuracy)
separations using the electronic integrator it possessed. With very well-behaved chromatography, it was found that peak area relative standard deviations (RSDs) obtained in different laboratories ranged from 1.8% to 4.5%. This represents a sum of the chromatographic variances (pumps, detector, etc.) and the electronic integrators’ variances. To avoid the scheduling problem and data system optimization problems, one could record the chromatograms on some form of magnetic medium, such as an audio cassette. The fidelity of recording and playing back these chromatograms must be considered, however. In recording, one creates time vs. amplitude data. For high-precision studies, both the timing and recording
To what extent does the operator’s selection of integration parameters or base line reconstruction affect the results? (ruggedness) How fast can the peaks elute before the results change significantly? If differences are found, can they be attributed to one chromatographic factor or to a combination of factors that adversely affect the results?
must be carefully controlled. Conventional magnetic media would not provide the precision required. This experimental chromatogram approach does not allow any preselection of chromatographic factors, such as peak width, skewness, and resolution, or the independent control of each; it would require a massive data base of preliminary chromatograms for final selection. Additionally, characterizing peak skewness, signal-to-noise (S/N) ratios, true base line construction, etc., would be difficult. Finally, for statistical evaluation purposes the true peak area or parent population is never known. Thus, an experimental chromatographic approach was rejected because we could not retain control of all systematic factors and sources of variability. Simulation approach. Many previous chromatographic studies have relied on nonexperimental, mathematically modeled chromatography to exhibit various chromatographic effects, such as detector response time, resolution of overlapped GC peaks via least-squares curve fitting, effects of dead volume and flow rate on peak shape, and analysis of instrumentally distorted overlapping curves, as well as other effects summarized in Reference 2. Unlike ideal chromatographic theory, in which peaks assume a Gaussian distribution, these studies make use of the exponentially modified Gaussian (EMG) to simulate more typical experimental data. The EMG, a convolution of a Gaussian distribution with an exponential function, gives a realistic and theoretically sound representation of skewed chromatographic peaks. (A recent paper (3) has cited some confusion over the published EMG equations.) A simulation approach was taken in
Design of the evaluation study
To conduct this study, we needed a highly reliable set of test waveforms that could be described in chromatographic terms. These stored chromatograms had to be easily reproducible and repeatable. They had to test the ability of the integrators to handle normally encountered chromatographic conditions. It was important that the process of reproducing the chromatograms introduce as little variability as possible. Finally, the study had to use commercially available equipment so that this validation process could be easily duplicated by others. Several approaches were considered for reliability and practicality. An experimental approach. From a chromatographer’s viewpoint, the most obvious approach would be to wire the same chromatographic detector’s output to several different integrators and record each data system’s response. One would have to carefully design a chromatographic separation scheme, carefully control the chromatography, then optimize each and every data system for that particular chromatogram. Only the data systems present could be tested, however. A recent collaborative study (1) used this approach, comparing the precision of peak areas among laboratories using well-specified, well-resolved chromatography. Each lab performed the same
ANALYTICAL CHEMISTRY, VOL. 59, NO. 1, JANUARY 1, 1987
this study. We mathematically constructed chromatograms by specifying peak retention times, peak width, tau value (the time constant of the exponential peak modifier), peak areas, and base line construction, thereby maintaining complete control of chromatogram parameters. The EMG ratio (tau/sigma) controls the level of skewness (4); its relationship to measured peak asymmetry has been previously characterized (5). This approach circumvents all three of the previous problems: The integrators need not be present simultaneously since the test chromatograms can be exactly repeated; computer-generated chromatograms have exactly known true peak areas; and computer generation provides complete freedom of choice and independent control of the chromatographic parameters (number of peaks, peak shapes, noise, drift, overlap, and underlying base line). Time has restricted the scope of the chromatography simulated here. Negative peaks occur too infrequently to warrant attention. Peak area quantitation instead of peak height was evaluated because there is more variability observed with peak areas (6). Irregular base lines, such as a rectangular rise and fall, that are not commonly observed were not considered. Only conditions found in well-behaved, quantitative chromatography were selected. One final point of discussion centers on how precisely we could evaluate our results. This is limited by the precision of waveform generation. Chromatographic integrators use an analog-to-digital converter (ADC) to transform analog data into digital data. How small a change in analog voltage the ADC can ultimately discern depends on the resolving power of the ADC. Typical ADCs used in data systems have 16-bit resolution, or 1 indivisible unit in 65,536 or 0.002% of full scale. In our simulated chromatograms we are transforming digital data into analog data. The resolution of our digital-to-analog converter (DAC) must be equal to or greater than that of the data system’s ADC.
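The construction described above can be sketched in a few lines. The following is a minimal illustration, not the authors' Pascal program: it uses the closed-form EMG (a Gaussian of width sigma convolved with an exponential of time constant tau) rather than any faster approximation, and the function names, peak values, and the 100-Hz rate are hypothetical choices made only for the example. It shows that the specified peak areas are recovered by numerical integration, which is the property that makes the true area known in advance.

```python
import math

def emg(t, area, t_r, sigma, tau):
    """Closed-form exponentially modified Gaussian evaluated at time t.

    area  : true peak area (known by construction)
    t_r   : center of the parent Gaussian
    sigma : standard deviation of the parent Gaussian
    tau   : time constant of the exponential modifier
    """
    z = (sigma / tau - (t - t_r) / sigma) / math.sqrt(2.0)
    if z > 26.0:
        # Far left tail: erfc underflows and the peak is effectively zero.
        return 0.0
    expo = sigma ** 2 / (2.0 * tau ** 2) - (t - t_r) / tau
    return area / (2.0 * tau) * math.exp(expo) * math.erfc(z)

def chromatogram(peaks, duration_s, rate_hz, baseline=0.0):
    """Sum EMG peaks on a flat base line, sampled at rate_hz."""
    n = int(duration_s * rate_hz)
    return [baseline + sum(emg(i / rate_hz, *p) for p in peaks)
            for i in range(n)]

# Hypothetical example: two peaks of equal area, tau/sigma = 0.5 and 2.0.
rate_hz = 100.0
peaks = [(1.0, 20.0, 1.0, 0.5), (1.0, 40.0, 1.0, 2.0)]
signal = chromatogram(peaks, duration_s=80.0, rate_hz=rate_hz)

# Rectangle-rule integration recovers the specified total area.
total_area = sum(signal) / rate_hz
print(f"total area = {total_area:.4f}")
```

Because the area is a direct input, any deviation an integrator reports can be attributed to the integrator rather than to uncertainty in the test signal.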
If we assume all other contributions to waveform generation variance are smaller than the DAC resolution, we have the capability of observing a 0.002% variation.
Generating simulated chromatograms
Chromatogram simulation was conducted in two stages. Peak response vs. time profiles were first created using the EMG function, converted to binary data and stored on a data disk, then later sent point by point in a precisely timed sequence to the DAC. This two-step approach was necessary for speed, convenience, and practicality. Calculating one datum point in real time for output to the DAC at 1-kHz frequency
requires that the calculation be completed before 1 ms elapses. The waveform generation software also must complete its operation before 1 ms elapses. This was not feasible. Chromatogram construction was performed in Pascal on a VAX-11/750 (Digital Equipment Corporation) computer. Specification of chromatogram length, base line construction, peak number, individual peak retention time, peak width (sigma), peak tailing factor (tau), and sampling rate was required. A polynomial approximation to the EMG (7) was used, providing a 1000-fold decrease in execution time compared with the closed form of the EMG. Binary conversion of the chromatograms required two steps. The first was to scale the chromatogram so that the minimum = 0 and the maximum = 177777 (octal) or 16 bits full scale, the format required by the DAC. The second was to circumvent the shortcomings of VAX Pascal, which does not allow 2-byte binary data types. Using logical binary operations, two binary data points were forced into the VAX Pascal 4-byte INTEGER word. A sampling rate of 1 kHz was chosen as a compromise between large data storage requirements and computation time on the one hand and a better approximation of an analog waveform on the other. This sampling rate limited the simulated noise and maximally constructed waveform to 500 Hz, according to sampling theory. Each 10-min chromatogram at the 1-kHz sampling rate required approximately 1.15 Mbytes of storage. Approximately 27 chromatograms are currently used. Waveform generation. When simulating a continuous waveform with discrete steps, the time interval between steps must be precisely controlled. Generally a high-precision clock generates a pulse that is used to send data via strobe to the DAC. If there is no buffering of the data at the DAC, there are severe restrictions on what the central processing unit can be doing in between these steps; the process of sending out the next datum value must have the highest priority at the clock interrupt.
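The two binary conversion steps and the storage estimate can be illustrated as follows. This is only a sketch under stated assumptions: the helper names are hypothetical, and placing the first sample in the low half of the 32-bit word is an assumption, since the text does not say which order the original program used.

```python
FULL_SCALE = 0o177777  # 16 bits full scale (65535 decimal)

def scale_to_dac(samples):
    """Scale so that the minimum maps to 0 and the maximum to FULL_SCALE."""
    lo, hi = min(samples), max(samples)
    span = hi - lo
    if span == 0:
        return [0] * len(samples)
    return [round((s - lo) / span * FULL_SCALE) for s in samples]

def pack_pairs(codes):
    """Pack consecutive 16-bit codes into 32-bit words.

    Assumption: the first code of each pair goes in the low 16 bits.
    """
    if len(codes) % 2:
        codes = codes + [0]  # pad an odd-length record
    return [(codes[i + 1] << 16) | codes[i] for i in range(0, len(codes), 2)]

def unpack_pairs(words):
    """Recover the 16-bit codes from the packed 32-bit words."""
    codes = []
    for w in words:
        codes.append(w & 0xFFFF)
        codes.append((w >> 16) & 0xFFFF)
    return codes

codes = scale_to_dac([0.0, 0.25, 0.9, 1.0])
words = pack_pairs(codes)

# Storage for one 10-min chromatogram at 1 kHz, two bytes per point.
n_samples = 10 * 60 * 1000
storage_bytes = n_samples * 2
print(n_samples, storage_bytes, storage_bytes / 2 ** 20)
```

The arithmetic gives 600,000 points and 1,200,000 bytes per chromatogram, the same order as the figure quoted in the text.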
Because the DAC used did not support buffering, a multiuser environment was counterproductive. At regular sub-second intervals the operating system performs overhead operations, such as checking queues, updating the clock, etc., that are variable in length. The timing of the resultant waveform would be irregular. A more controlled, single-user environment was required. Based on the resources available to the project, a PDP-11/23+ 16/22-bit minicomputer under the single-user RT-11 operating system, an Analogic Corporation ANDS5400 data acquisition system 16-bit DAC, and a Data Translation DT2769 programmable 10-MHz crystal-controlled clock board were used for waveform generation. (Other combinations of hardware and software may also be appropriate.) Each millisecond, the clock generates an interrupt, halting lower priority processes. The interrupt service routine then moves a datum point from memory to the DAC, where it is converted to an analog voltage. Because the range of the DAC is 0-5 V and the input of the integrators requires 0-1 V, an operational amplifier circuit was designed to divide the voltage by 5.
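As a rough check on what the output stage delivers, the mapping from a 16-bit code to the voltage seen by the integrator can be written out. The sketch below assumes a 0 to 5 V DAC span, which is what a divide-by-5 stage feeding a 1 V input implies; the constant and function names are hypothetical.

```python
DAC_BITS = 16
DAC_SPAN_V = 5.0   # assumed DAC output span in volts
DIVIDER = 5.0      # operational amplifier stage divides by 5

def integrator_volts(code):
    """Voltage at the integrator input for a given DAC code."""
    if not 0 <= code < 2 ** DAC_BITS:
        raise ValueError("code must fit in 16 bits")
    dac_volts = code / (2 ** DAC_BITS - 1) * DAC_SPAN_V
    return dac_volts / DIVIDER

# Smallest step the generated waveform can take at the integrator input.
lsb_microvolts = integrator_volts(1) * 1e6
print(f"full scale = {integrator_volts(2 ** DAC_BITS - 1):.3f} V")
print(f"one step   = {lsb_microvolts:.2f} microvolts")
```

Under these assumptions one step is roughly 15 microvolts on a 1 V full scale, consistent with the 0.002% resolution cited for a 16-bit converter.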
Chromatographic parameters
The five parameters that were investigated independently and then paired were four levels of background noise, four levels of peak skewness with a resultant loss of resolution, a wide range of peak widths, chromatographic elution speeds, and three examples of base line disruptions. Each is discussed below. Most were perturbations of a pure Gaussian test set of 10 peaks separated with a resolution of 2.0, incrementally increased in peak (base) width from 4 s to 40 s (Figure 1a). All simulated peaks were of the same net peak area. In one series of chromatograms, various levels of noise were overlaid onto the Gaussian peaks to observe the direct effect of noise. In another series, varying amounts of peak tailing were simulated, which results in a loss of resolution (Figure 1b-d). These two series show the independent effects of background noise and peak skewness. The four levels of noise were then paired to these levels of skewing systematically. This coupling tests the ability of the
integrator to filter noise without miscalculating skewed peaks. Results were compared with the Gaussian test set. Capillary speed chromatograms with peaks of 400-ms base width and varying levels of peak skewness at one level of noise were also constructed. This was the minimal chromatographic speed that we considered routine in capillary GC. Base line disturbances were formed by taking skewed, noisy chromatograms and adding a much larger, skewed solvent peak; an underlying broad peak, where valley-to-valley base line construction was required; or a large overlaid ramp. Thus, most normally encountered chromatogram types were considered. Another set currently being developed looks at various levels of peak areas within the same chromatogram to find a minimum quantitation level of small impurity peaks. These results will not be presented here. Integrator testing procedure. Several assumptions were made in testing each integrator. The first is that procedures for correct parameter optimization come from the manufacturer's documentation. The integrator provides clues in its peak plotting and final report for correct placement of tick markers, printing of retention times, base line construction, and ways to optimize the key peak integration parameters. This includes use of a utility normally provided that will monitor the (noisy) base line and report a noise sensitivity value. Where different parameters gave different results, the results using the monitored sensitivity value were reported. Proper shielding is essential to avoid unnecessary noise. In all cases, the
[Figure 2. Box plots of results for manufacturers A to D. Horizontal axis: manufacturer.]
manufacturer’s cabling and cabling instructions were used. Also, because we were simulating realistic chromatography, few if any special base line constructions were necessary. Therefore, in almost all cases, time-programmed special base line construction was avoided.
Results and discussion
To date, we have tested four integrators, with several more scheduled for testing. Here we describe trends in performance among those tested. Each chromatogram of 10 peaks was repeated three times, an average and standard deviation were calculated, and the results were graphed. All data calculations were done in RS/1. Gaussian test set. The pure Gaussian test set is a chromatogram of 10 peaks, separated by a resolution of 2.0, and increasing from 4 s to 40 s in base width (Figure 1a). All peaks have the same net area with no simulated noise present. Results among integrators spanned a wide range; the repeatability (RSD of 6 runs × 10 peaks) of manufacturers A to D was 0.024%, 0.031%, 0.44%, and 0.0031%, respectively. This repeatability can be compared with the resolution of the DAC, which is 0.002%. All other chromatogram peak area results were normalized to the Gaussian areas and presented as percent deviation because all chromatographic peaks were created with this same net area, even though the EMG distorted the peak shape in the skewed data set. Simulated noise. Normally distributed random noise was generated from the product of two random numbers and then scaled to the desired S/N ratio, representative of typical background or “white” noise. One-half of the sampling rate determined its frequency. Four levels of noise were overlaid onto the Gaussian test set to yield S/N ratios (peak to peak) of 12, 9, 6, and 3 for the smallest peak. Where full scale was 1 V, and the smallest peak was approximately 1/10 of full scale, maximal noise was approximately 8 mV, 12 mV, 17 mV, and 33 mV, respectively.
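The scaling of noise to a peak-to-peak S/N ratio can be sketched as below. This is an illustration only: it draws normally distributed values with the standard library generator instead of the authors' own noise routine, takes the smallest peak as exactly 100 mV (the text says approximately 1/10 of a 1 V full scale), and the function name and seed are hypothetical.

```python
import random

def scaled_noise(n, peak_height_v, snr, seed=0):
    """White Gaussian noise rescaled so that its peak-to-peak span
    equals peak_height_v / snr."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    span = max(raw) - min(raw)
    target = peak_height_v / snr
    return [x * target / span for x in raw]

SMALLEST_PEAK_V = 0.100  # assumed: exactly 1/10 of a 1 V full scale

p2p_mv = {}
for snr in (12, 9, 6, 3):
    noise = scaled_noise(10_000, SMALLEST_PEAK_V, snr)
    p2p_mv[snr] = (max(noise) - min(noise)) * 1000.0
    print(f"S/N = {snr:2d}: peak-to-peak noise = {p2p_mv[snr]:.1f} mV")
```

With a 100 mV peak the four levels come out near 8, 11, 17, and 33 mV, close to the approximate values quoted above; the small differences reflect the "approximately 1/10" peak height.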
Figure 2a shows the results of the four noise levels for four manufacturers, A to D, expressed as percent deviation from the noise-free Gaussian test set. In these box plots, the outer bars indicate the range for the 10 peaks averaged together for the three replicates, whereas the rectangular box shows a nonparametric ±25% range, and the inner line represents the median. As with all of the box plots, each parameter change is presented in a separate column for each manufacturer. Medians range from +6% to -3% with an apparent correlation of noise levels to percent deviation for D only. The repeatability of the smallest peak
(not indicated by the graph) is not significantly different from the larger peaks. This indicates that the noise rejection of the systems tested is exceptional, especially when most (not all) integrators center their internal sampling rate around the line frequency of 60 Hz and not at the 1 kHz used. Peak tailing. From the base Gaussian test set, four levels of peak skewness were investigated, characterized by their tau/sigma (t/s) ratio of 0.5, 1, 1.5, and 2.0, corresponding to USP XXI (8) tailing factors of 1.0, 1.2, 1.4, and 1.6. In each case, the calculated resolution deteriorates progressively (R = 2.0, 1.6, 1.3, and 1.1, respectively, based on USP XXI criteria). It is important to note that when t/s = 1.5 and 2.0 the peaks overlap, are no longer base line resolved, and become a fused group. Noise sensitivities for each system were set at a minimum as there was no simulated noise present. Among integrators medians range from 0 to -13% (Figure 2b) at the four skewness levels. For manufacturers A, C, and D, the medians center around zero, whereas B shows large negative deviations with overlapped, skewed peaks. These large variations can be explained by observing the base line construction. Normally, the fused group is treated in the following manner: Vertical lines are dropped in the valleys, a base line is drawn for the entire fused group, and the areas are summed between the verticals (Figure 3a). Instead of treating
these peaks as a fused group, the software of B considered some of these valley points as ending base line points (Figure 3b), causing large differences among peak areas. Every effort was made to adjust parameters beyond those recommended to force the system to treat these chromatograms as a fused group, but with no success. This base line construction of B was consistent throughout this evaluation study. Tailing and noise combined. Twelve chromatograms of 10 peaks were investigated using three levels of noise (S/N = 9, 6, 3) and the four levels of peak skewing (t/s = 0.5, 1.0, 1.5, 2.0). In preliminary testing, chromatograms with S/N = 12 matched the results of S/N = 9, so in the interest of time the chromatogram with S/N = 12 was eliminated. The levels of noise combined with the levels of peak skewness showed little effect beyond those already observed independently (Figure 2). One item to note is the performance of manufacturer A. As with all integrators tested, A required that the noise sensitivity parameter be increased as the S/N decreased from 9 to 6 to 3. When this happened, however, it began to perform similarly to manufacturer B’s integrator by confusing valley points with base line points, as evidenced by an increase in the variability. This base line selection by A was consistent throughout the noisy, skewed test set. Base line construction. Three base line disturbances were used. The first
overlaid a large, broad peak in the middle of the chromatogram, as would occur when a highly retained compound elutes during a later chromatogram. A second consisted of a linear ramp at 20% of full scale to mimic an ideal temperature gradient rise. The third base line irregularity placed a large, tailed peak 20× the peak height of the first peak to simulate a solvent peak. In all cases, the unperturbed chromatogram had a S/N = 6 and t/s = 1.5, which was shown above to reveal differences among manufacturers. In the case of the underlying solvent peak, all integrators recognized the first peak as a solvent and tangent skimmed the remaining peaks. All had difficulty recognizing the smaller peaks; further investigation is needed to show any trends definitively. In the case of the ramped base line, all were capable of recognizing the rising base line and were able to quantitate the peaks. One unit, however, failed in many cases to recognize an ending base line before the ramp dropped to its initial level. It incorporated the ramp into the base line and caused large errors. In the case of the underlying broad peak, it was necessary to force a valley-to-valley base line construction with all data systems. When allowed to run through their normal routines, all integrators found proportionally larger peak areas for the middle peaks where the underlying peak eluted. There is no reasonable way that present-day integrators can recognize the underlying peak, so the burden for correct operation remains with the chromatographer in this case. Fast-eluting chromatographic peaks. With a S/N = 6 and four baseline-resolved peaks (t/s = 0.5, 1.0, 1.5, and 2.0) at a nominal 400-ms peak width, the reproducibility of manufacturers A-D was 0.54%, 0.034%, 0.17%, and 1.06%, respectively. There was no correlation of this repeatability with the Gaussian repeatability.
All integrators flagged these peak areas as being potentially erroneous because of the 400-ms peak width, even though all data systems were set to sample at their maximum rate. Peak widths between this 400-ms value and the 4 s of the Gaussian series are currently under investigation. Integrator parameter effects. Correct integrator parameter selection (sampling rate, anticipated peak width, and noise sensitivity) is paramount for correct base line construction, yielding correct peak areas. Guided by peak tick markers, base line codes, and any other clues provided by the manufacturer, parameters were extended beyond their normal ranges to observe their effects on peak quantitation. In most cases, large deviations
were observed only with very obvious warnings of improper settings. Manufacturer D, however, did not provide clues of improper settings when parameter changes caused >40% error. In fact, we were guided in selecting its parameters only by knowing the results beforehand and not by warnings or guidance in the documentation. Chromatographers do not generally have this a priori knowledge, so that parameter optimization here is extremely difficult.
Conclusion
Gaussian waveforms may provide a means of evaluating the precision of a data system, but they do not indicate how well the system works with less than ideal chromatography. Noise does not independently appear to affect peak quantitation of Gaussian peaks. Differences among manufacturers occur in base line construction with noisy, fused, skewed (EMG) peaks, which affect peak quantitation. This may present a problem if comparisons are not made with a calibration run having the same noise levels and skewness or if different integrators are used for external standardization. How prevalent these trends are will be known only after additional data systems are evaluated.
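The size of the base line effect can be illustrated numerically. The sketch below is not a model of any tested integrator: it uses two noise-free Gaussian peaks of unit area (not EMG peaks) with a hypothetical spacing of 5 sigma, and compares the usual fused-group treatment (one base line under the whole group, vertical drop at the valley) with a construction that takes the valley as a base line point. All names and values are assumptions made for the example.

```python
import math

def gaussian(t, area, center, sigma):
    """Gaussian peak with the given area, center, and standard deviation."""
    return (area / (sigma * math.sqrt(2.0 * math.pi))
            * math.exp(-0.5 * ((t - center) / sigma) ** 2))

def net_area(y, dt, i0, i1, b0, b1):
    """Trapezoidal area of y[i0..i1] above the straight base line drawn
    through the points (b0, y[b0]) and (b1, y[b1])."""
    slope = (y[b1] - y[b0]) / (b1 - b0)
    net = [y[i] - (y[b0] + slope * (i - b0)) for i in range(i0, i1 + 1)]
    return dt * (sum(net) - 0.5 * (net[0] + net[-1]))

dt = 0.01
t = [8.0 + i * dt for i in range(1301)]            # 8 s to 21 s
y = [gaussian(x, 1.0, 12.0, 1.0) + gaussian(x, 1.0, 17.0, 1.0) for x in t]

last = len(y) - 1
valley = min(range(400, 901), key=lambda i: y[i])  # lowest point between apexes

# Fused group: base line from group start to group end, vertical at valley.
drop_first = net_area(y, dt, 0, valley, 0, last)
drop_second = net_area(y, dt, valley, last, 0, last)

# Valley treated as a base line point: each peak gets its own base line.
valley_first = net_area(y, dt, 0, valley, 0, valley)
valley_second = net_area(y, dt, valley, last, valley, last)

print(f"vertical drop:    {drop_first:.3f}, {drop_second:.3f}")
print(f"valley base line: {valley_first:.3f}, {valley_second:.3f}")
```

In this idealized case the vertical drop recovers essentially the full unit area for each peak, whereas drawing the base line to the valley cuts off a wedge under each peak and gives a negative bias of roughly a tenth of the area. The exact size depends on the spacing and shape chosen, so the numbers should not be read as a reproduction of the reported deviations.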
Acknowledgment
The authors would like to acknowledge the Boston University Department of Chemistry for allowing us unrestricted use of its computer facilities while this study was being conducted. We would also like to acknowledge the members of the Massachusetts Institute of Technology DEC Local Users Group, especially Garth Fletcher of Fletcher Applied Science, for advice and counseling on DEC minicomputer nuances. This work is being continued at the University of Lowell Department of Chemistry, Lowell, Mass.
References
(1) Pauls, R. E.; McCoy, R. W.; Ziegel, E. R.; Wolf, T.; Fritz, G. T.; Marmion, D. M. J. Chromatogr. Sci. 1986, 24, 273-77.
(2) Foley, J. P.; Dorsey, J. G. J. Chromatogr. Sci. 1984, 22, 40-46.
(3) Hanggi, D.; Carr, P. W. Anal. Chem. 1985, 57, 2395-397.
(4) Delaney, M. F. Analyst (London) 1982.
(5) Barber, W. E.; Carr, P. W. Anal. Chem. 1981, 53, 1939-942.
(6) Kipiniak, W. J. Chromatogr. Sci. 1981, 19, 332-37.
(7) Foley, J. P.; Dorsey, J. G. Anal. Chem. 1983, 55, 730-37.
(8) The United States Pharmacopeia, 21st ed.; U.S. Pharmacopeial Convention, Inc.: Rockville, Md., 1985.
This work was sponsored by the U.S. Food and Drug Administration (FDA) Science Advisors Research Associate Program. Reference to any commercial materials, equipment, or process does not constitute approval, endorsement, or recommendation by FDA.