Anal. Chem. 1983, 55, 626-629
Pattern Recognition for Identification and Quantitation of Complex Mixtures in Chromatography

Robert E. Lea, Entek, Inc., 1015 South Louisiana, Little Rock, Arkansas 72202
Randall Bramston-Cook*, Varian Associates, Instrument Group, 9901 Paramount Boulevard, Downey, California 90240
Janice Tschida, Varian Associates, Instrument Group, 4665 Kipling Avenue, Wheatridge, Colorado 80033
Numerous applications in gas and liquid chromatography require the identification and quantitation of mixtures of compounds that together constitute a single entity. This analysis may be complicated when other compounds, ordinarily not part of the mixture, interfere with one or more of its peaks. A method is described that uses currently available chromatographic data systems to provide pattern recognition, quantitation, and elimination of interferences by simple programming techniques. The method is illustrated by the identification and quantitation of some polychlorinated biphenyl (PCB) mixtures.
Both gas chromatography and liquid chromatography are in routine use for the determination of specific single components in a sample matrix. Concentrations are calculated from the ratio of the detector response to the unknown sample to the response to a standard of known concentration:

\[ C_x = C_s \frac{A_x}{A_s} \tag{1} \]
where $C_x$ is the concentration of a single unknown component and $C_s$ is the known concentration of the single standard; $A_x$ and $A_s$ are the corresponding computed areas. Retention times are normally used to identify specific compounds in mixtures. If a chromatographic peak does not match the retention time of a known compound, the peak can be neither identified nor quantitated without severe assumptions about its retention time and detector response (1, 2). When complex mixtures are analyzed by classical chromatographic methods, each peak must be identified by retention time, and a standard must be prepared for it so that the unknown concentration can be calculated by the equation

\[ C_x = \sum_i C_{s,i} \frac{A_{x,i}}{A_{s,i}} \tag{2} \]
where $C_x$ is the total concentration of the mixture, $C_{s,i}$ is the standard concentration for each component $i$, and $A_{x,i}$ and $A_{s,i}$ are the computed areas for each unknown peak and its corresponding standard, respectively. This method requires extensive analysis of the mixture, frequently involving mass, infrared, or nuclear magnetic resonance spectrometry, and synthesis of pure standards of the individual components (3). Such a complete analysis is often beyond the capabilities of many analytical laboratories. Even if it were undertaken, so many peaks may be involved that compounds coelute, preventing proper quantitation. Gas chromatography with capillary columns would improve the analysis significantly, since peaks would be more fully resolved (4, 5), yet the analytical problem would remain very complicated because additional peaks would have to be identified and quantitated. Another method used to quantitate mixtures measures the total area of the standard mixture and compares it with the total area of the unknown to determine concentration (2):

\[ C_x = C_s \frac{\sum_i A_{x,i}}{\sum_i A_{s,i}} \tag{3} \]

where $C_s$ is the total concentration of the standard mixture and the sums run over all peaks of each chromatogram.
This method assumes that overlapping, unrelated peaks are absent, a difficult requirement in most analyses (6). Both of these methods for mixtures require visual inspection of the chromatogram to establish the identity of the pattern before quantitation (5). This may require repeated injections at different attenuations to produce on-scale, significant peaks, even though the data system generates no new quantitative data from them.

The method described here is a straightforward, reliable mechanism for simultaneous identification and quantitation of mixtures without characterization of each component. It reduces operator judgement in the identification of the mixture and properly eliminates interfering peaks due to possible contaminants. This simplified numerical pattern recognition compares unknown mixtures with known standards. Individual chromatographic peak areas of the standard mixture are mathematically ratioed with the areas of the unknown peaks at the same retention times:

\[ C_{x,i} = C_s \frac{A_{x,i}}{A_{s,i}} \tag{4} \]

where $C_s$ is the total concentration of the standard mixture. A perfect match of patterns results when the reported ratios, $C_{x,i}$, have the same value for all peaks. The extent to which the ratios agree provides an objective measure of the match, and statistical treatment of the results assigns a confidence level to it. If the unknown mixture does not match the standard mixture, the resultant ratios are scattered, with no regions of consistency. If a match is confirmed, the concentration of the unknown mixture, $C_x$, can be reported as the average of the $n$ ratios used in the match:

\[ C_x = \frac{1}{n} \sum_{i=1}^{n} C_{x,i} \tag{5} \]

© 1983 American Chemical Society
ANALYTICAL CHEMISTRY, VOL. 55, NO. 4, APRIL 1983 627
If several complex mixtures have been combined, consistent ratios are observed in the chromatographic regions that are unique to each mixture. Regions where the peaks of both mixtures overlap give inconsistent, meaningless ratios. Examining the unique regions therefore allows combinations of mixtures to be both identified and quantitated.

EXPERIMENTAL SECTION

The Varian Vista 401 chromatographic data system (Varian Associates, Walnut Creek, CA) was used to illustrate this simplified pattern recognition method. The demonstration was performed both with synthetic chromatograms and with an authentic analytical problem. To illustrate the concept more readily, artificial chromatograms of mixtures were created with the base line subtraction routine of the data system: negative values entered manually at specific retention times, when subtracted from a flat base line, produced positive triangular peaks suitable for controlled experiments.

Second, PCB mixtures were analyzed with the Varian Vista 44 gas chromatographic system (Varian Associates), using a 2-m stainless steel column, 1/8 in. o.d., packed with 3% OV-101 on Chromosorb G-AW, 80/100 mesh. Detection was by electron capture, with nitrogen as the carrier gas at a flow rate of 30 mL/min. Raw chromatographic data were stored on floppy disk for later recall and reprocessing. The Varian CDS-111 (Varian Associates) has also been programmed successfully to perform this pattern recognition computation.

NUMERICAL PATTERN RECOGNITION

In the overall scheme for pattern matching, a method is created in the data system for each standard mixture. After the floppy disk containing the raw chromatographic data of a standard mixture is loaded into the data system, the “learn” mode is executed first; it automatically enters the retention times of all peaks to be used in the matching process into the peak table for that method.
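Pairing unknown peaks with the learned peak table is what makes the per-peak ratios meaningful. The sketch below is a hypothetical illustration, not the data system's algorithm: the `Peak` type, the function names, the 0.05-min tolerance, and the 0.20-min window are assumptions. The reference-peak correction, which the paper describes later as choosing the largest peak within a time window of its expected retention time, is modeled here, as an assumption, by a simple additive shift applied to every table entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    time: float   # retention time, min
    area: float   # integrated area


def reference_shift(expected_time: float, peaks: list[Peak], window: float) -> float:
    """Offset of the largest peak found within +/- window of the expected time."""
    candidates = [p for p in peaks if abs(p.time - expected_time) <= window]
    if not candidates:
        return 0.0  # no reference peak found: leave the table uncorrected
    reference = max(candidates, key=lambda p: p.area)
    return reference.time - expected_time


def match_peaks(table_times: list[float], peaks: list[Peak],
                tolerance: float, shift: float = 0.0) -> dict[float, Peak]:
    """Pair each (shifted) table retention time with the nearest peak within tolerance."""
    matches: dict[float, Peak] = {}
    for t in table_times:
        target = t + shift
        nearby = [p for p in peaks if abs(p.time - target) <= tolerance]
        if nearby:
            matches[t] = min(nearby, key=lambda p: abs(p.time - target))
    return matches


# Learned table (pattern B retention times) and a hypothetical run eluting 0.08 min late.
table = [3.50, 4.00, 4.50, 5.00, 5.50]
run = [Peak(3.58, 40.0), Peak(4.08, 55.0), Peak(4.58, 90.0),
       Peak(5.08, 30.0), Peak(5.58, 45.0)]

print(len(match_peaks(table, run, tolerance=0.05)))                 # 0
shift = reference_shift(4.50, run, window=0.20)
print(round(shift, 2), len(match_peaks(table, run, 0.05, shift)))   # 0.08 5
```

Without the correction none of the shifted peaks falls inside the tolerance; once the reference peak at 4.58 min fixes the offset, all five pair up. A proportional (rather than additive) correction would be an equally plausible assumption for longer runs.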
The “learn” mode eliminates the need to enter peak retention times into the peak table manually. To perform pattern recognition, the peak areas of the unknown are divided by the corresponding peak areas of the standard mixture with matching retention times. The computation is performed in two steps. First, the reciprocal of the peak area is calculated for each peak of the standard mixture, using the “calibration” mode of the data system:

\[ R_i = \frac{1}{A_{s,i}} \tag{6} \]
where $R_i$ is the reciprocal of the peak area of component $i$ in the standard mixture. Second, the areas of the unknown mixture are multiplied by the corresponding reciprocals in the “analysis” mode; a match is confirmed if the reported ratios are consistent:

\[ A_{x,i} R_i = \frac{A_{x,i}}{A_{s,i}} \tag{7} \]

Multiplying these ratios by the total concentration of the standard mixture, $C_s$, gives the reported concentrations of the unknown mixture, $C_{x,i}$. Linking data system methods, each specific to one standard mixture, automatically generates one report per standard. The report with consistent results for the most peaks in the unknown chromatogram identifies the match. Ratios may not be perfectly consistent because of imperfections in the sample or the chromatography.

RESULTS AND DISCUSSION

Artificially created chromatograms that demonstrate the principle of the method are illustrated in Figure 1. Each of the three standard patterns (A, B, and C) yields a set of computed reciprocal values that are then stored in the method for that standard. When these reciprocals are multiplied by the corresponding areas of the “unknown” (Figure 2), 80% of the unknown peaks (four of five) give very consistent reported ratios against pattern B (Table I), whereas the other reports show either inconsistent ratios (pattern C) or only a small percentage of peaks (40%) matching in retention time (pattern A).

Figure 1. Artificially generated peak patterns illustrate the method of pattern recognition. Three standard patterns are used to compute retention times and factors to represent each mixture.

Figure 2. Computation of an unknown artificial sample against the standard patterns shown in Figure 1 results in consistent reported concentrations matching pattern B (except for one reported value). (See Table I.)

Table I. Results for Areas of the Unknown Artificial Sample Divided by Areas for Each Standard Pattern

Peak no.   Peak name   Result ratio   Time, min
Pattern “A”
1          six         0.49           3.50
2          seven       0.53           4.00
Pattern “B”
1          six         0.74           3.50
2          seven       0.71           4.00
3          eight       0.75           4.50
4          nine        1.63           5.00
5          ten         0.72           5.50
Pattern “C”
4          nine        2.45           5.00
5          ten         0.48           5.50

Figure 3. Standard mixtures of three common PCB mixtures are assigned individual data system programs. Retention times and response factors are determined for peaks in each mixture. These values are stored for later comparison with unknown samples.

Figure 4. By computing concentrations for peaks found in an unknown sample from response factors of standards (determined from data in Figure 3), the unknown is matched with Aroclor 1260. (See Table II.)

Several data treatments were tried on the ratios obtained from the matching comparisons, with the objective of reducing the operator interpretation needed to confirm a match. Three major criteria were identified for evaluating the validity of these treatments: reduction of operator judgement, proper elimination of interfering peaks due to possible contaminants, and assignment of a confidence level to a match. Visual inspection of chromatograms to identify mixtures does not fulfill these requirements: the matching depends heavily on the operator's judgement, major peaks dominate the matching process, multiple comparisons become tedious, and no numerical confidence value can be assigned.

The relative standard deviation of the ratios obtained in the pattern recognition step measures the consistency of a match. If the one deviant peak ratio is eliminated from the comparison of the “unknown” with pattern B, the low relative standard deviation of 2% for the remaining results indicates a match. Since the four consistent peaks in the “unknown” correspond in retention time to four of the five peaks in pattern B, the two patterns must be a perfect match. The relative standard deviation required to indicate a match must be determined empirically from replicate analyses of standards and control samples, to allow for experimental error. A judgement level can then be assigned on the basis of computation rather than operator bias. The “Q test” permits a deviant ratio to be eliminated statistically from the relative standard deviation calculation (7).
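The relative standard deviation check and the Q test can be sketched in a few lines of Python. This is a hypothetical illustration, not the Vista program. Three choices are assumptions. First, the 90% Dixon Q critical values are the commonly tabulated ones (0.64 for five values, as quoted in the text). Second, the standard deviation uses n in the denominator: this is our inference, since with the rounded table values it reproduces, to within rounding, the 2% quoted for pattern B and the 17% and 100% quoted later for the PCB comparisons. Third, the ±10% band in the median-based fallback is invented for illustration.

```python
from statistics import mean, median, pstdev

# Commonly tabulated Dixon Q critical values at 90% confidence, keyed by n.
Q90 = {3: 0.941, 4: 0.765, 5: 0.642, 6: 0.560, 7: 0.507,
       8: 0.468, 9: 0.437, 10: 0.412}


def rsd_percent(values: list[float]) -> float:
    """Relative standard deviation, %, using the population SD (n in the denominator)."""
    return 100.0 * pstdev(values) / mean(values)


def dixon_q(values: list[float]) -> tuple[float, float]:
    """Return (Q, suspect) for whichever end of the sorted data lies farther out."""
    s = sorted(values)
    spread = s[-1] - s[0]
    if spread == 0:
        return 0.0, s[-1]  # all values identical: nothing to reject
    q_low = (s[1] - s[0]) / spread     # gap to nearest neighbor / total range
    q_high = (s[-1] - s[-2]) / spread
    return (q_high, s[-1]) if q_high >= q_low else (q_low, s[0])


def drop_outlier(values: list[float]) -> list[float]:
    """Remove one suspect value when its Q exceeds the 90% critical value."""
    crit = Q90.get(len(values))
    if crit is None:  # outside the tabulated range: leave the data unchanged
        return list(values)
    q, suspect = dixon_q(values)
    kept = list(values)
    if q > crit:
        kept.remove(suspect)
    return kept


def fraction_near_median(values: list[float], rel_tol: float = 0.10) -> float:
    """Fallback: share of ratios within rel_tol (hypothetical) of the median."""
    m = median(values)
    return sum(abs(v - m) <= rel_tol * m for v in values) / len(values)


pattern_b = [0.74, 0.71, 0.75, 1.63, 0.72]   # Table I, unknown vs. pattern B
q, suspect = dixon_q(pattern_b)
kept = drop_outlier(pattern_b)
print(round(q, 2), suspect, kept)            # 0.96 1.63 [0.74, 0.71, 0.75, 0.72]
print(round(rsd_percent(kept), 1), round(mean(kept), 2))   # 2.2 0.73
print(fraction_near_median(pattern_b))       # 0.8
```

With the rounded ratios of Table I the sketch gives Q ≈ 0.96, close to the 0.95 quoted in the text and well above 0.64, so the 1.63 ratio is dropped and the remaining four agree to about 2%. The same `rsd_percent` function can be applied to each comparison in a multi-standard report.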
The ratio of the deviation of the suspect value from its closest neighbor to the total range of all values (the difference between the highest and lowest ratios) is 0.95 for the comparison of the “unknown” with pattern B. This value is well above the rejection ratio of 0.64 defined in the Q test for five data points. The outlying peak is therefore simply dropped from the averaging, and the reported mean concentration becomes more representative of the true value. This mathematical treatment permits the elimination of apparent ratio inconsistencies caused by overlapping chromatographic interferences. If too many inconsistent ratios are present to be eliminated properly by the Q test, a match can instead be determined by computing the percentage of peaks whose ratios lie near the median of the series (the median being the middle value of an ordered set of values). A large percentage of peaks close to the median would confirm a match (8).

These data treatments can be applied to the analysis of polychlorinated biphenyls (PCB). Figure 3 shows the chromatograms of three standard PCB mixtures (Aroclors 1242, 1254, and 1260). When the unknown sample (Figure 4) is ratioed with each of the three standards (Table II), the low relative standard deviation of 17% for the Aroclor 1260 comparison indicates that the unknown matches this standard. The high proportion (92%) of standard peaks found in the unknown sample also indicates a match. The other two comparisons (Aroclors 1242 and 1254) give very high relative standard deviations (89% and 100%, respectively) and cannot be considered possible matches.

Table II. Results for Comparison of the Unknown PCB Sample and the Three Standard Aroclor Mixtures

Peak no.   Result ratio, ppm   Time, min
Unknown Sample vs. Aroclor 1242
6          0.32                 0.982
9          0.11                 1.690
10         0.81                 2.518
11         0.07                 2.790
Unknown Sample vs. Aroclor 1254
12         0.22                 3.080
13         0.20                 3.627
14         0.21                 4.456
16         0.28                 5.321
16         0.48                 6.351
18         0.38                 7.465
19         1.29                 8.688
20         1.08                10.035
21         3.13                12.144
22         1.95                14.379
Unknown Sample vs. Aroclor 1260
17         0.38                 6.809
18         0.53                 7.466
19         0.40                 8.688
20         0.40                10.036
21         0.37                12.144
22         0.40                14.379
23         0.33                16.403
24         0.30                19.757
25         0.31                23.347

Retention time repeatability is crucial to the matching process: slight shifts in retention time can cause corresponding peaks to be mismatched and produce very inconsistent ratios. To correct for the minor shifts in retention time normally encountered in chromatography, major peaks can be designated as reference peaks. The data system assigns as a reference peak the largest peak within a specified time window of its expected retention time. The retention times of the remaining peaks in the data system peak table are then adjusted to correct for small time shifts. This correction ensures that the proper peaks are matched in the pattern recognition process.

Combinations of mixtures can be determined when the reported ratios are consistent in the regions unique to each mixture. An artificial chromatogram is shown in Figure 5. Table III shows consistent concentrations for the regions unique to pattern B and to pattern C (see Figure 1). Reported peaks common to both mixtures have very inconsistent values against either standard and can be eliminated from the matching. The concentration of each mixture is then computed as the average of its consistent values.

Figure 5. Computation of an unknown artificial sample against the standard patterns in Figure 1 results in two reports, each with a region of consistent concentrations. Pattern B is consistent with early eluting peaks and pattern C is consistent with late eluting peaks. (See Table III.)

Table III. Results for Artificial Chromatogram Generated by Combining Two Standard Mixtures when Compared with Each Standard Pattern

Peak no.   Peak name   Result ratio   Time, min
Pattern “A”
1          six         0.67           3.50
2          seven       0.76           4.00
Pattern “B”
1          six         1.00           3.50
2          seven       1.00           4.00
3          eight       1.00           4.50
4          nine        1.67           5.00
5          ten         2.49           5.50
Pattern “C”
4          nine        2.50           5.00
5          ten         1.67           5.50
6          eleven      1.25           6.00
7          twelve      1.00           6.50
8          thirteen    1.00           7.00
9          fourteen    1.00           7.50
10         fifteen     1.00           8.00
11         sixteen     1.00           8.50
12         seventeen   1.00           9.00
13         eighteen    1.00           9.50

Close examination of the PCB results in Table II reveals consistent ratios near the beginning of the chromatogram that correspond to Aroclor 1254. The relatively low relative standard deviation of 33% for the six early peaks raises the suspicion that the unknown PCB mixture contains both Aroclor 1254 and Aroclor 1260, with average concentrations of 0.38 ppm and 0.29 ppm, respectively.

CONCLUSION

This simplified numerical pattern recognition method can greatly assist in identifying complex mixtures that have consistent proportions of unidentified components. It overcomes several difficulties of existing analytical procedures. Because positive identification and quantitation are given in the same report, the chromatographer's interaction is greatly simplified, and judgements are based on objective numerical results. Reported concentrations are improved because each result is an average value. However, the method is not applicable when the pattern differs between standards and samples because only a few of the components have been preferentially degraded or lost through metabolism or differences in volatility. The analyst must then revert to quantitation by summing individual component concentrations, since the pattern of the mixture cannot be identified.

ACKNOWLEDGMENT

The authors thank Anne Warner for the preparation of this manuscript.

LITERATURE CITED

(1) Gas Processors Association. “Method of Analysis for Natural Gas and Similar Gaseous Mixtures by Gas Chromatography”; GPA: Tulsa, OK, 1972 (Revised); GPA Publication 2261-72, p 8.
(2) Association of Official Analytical Chemists. “Official Methods of Analysis”, 12th ed.; AOAC: Washington, DC, 1975; Chapter 29.
(3) Webb, R. G.; McCall, A. C. J. Chromatogr. Sci. 1973, 11, 366-373.
(4) Webb, R. G.; McCall, A. C. J. Assoc. Off. Anal. Chem. 1972, 55, 746-752.
(5) Gordon, R. J.; Szita, J.; Faeder, E. J. Anal. Chem. 1982, 54, 470-401.
(6) U.S. Environmental Protection Agency. “Manual for Analytical Quality Control for Pesticides and Related Compounds”; National Technical Information Service, U.S. Department of Commerce: Springfield, VA, 1981; EPA 600/2-81-059.
(7) Waser, J. “Quantitative Chemistry”; W. A. Benjamin: New York, 1964; p 83.
(8) Lepisto, M.; Bramston-Cook, R. Submitted for publication in J. Forensic Sci.
RECEIVED for review November 13, 1981. Resubmitted January 7, 1983. Accepted January 7, 1983. This work was presented at the 1981 Pacific Conference on Chemistry and Spectroscopy, Anaheim, CA, October 20, 1981.