Anal. Chem. 1994, 66, 4288-4294

Prediction of Temperature-Programmed Retention Indexes for Polynuclear Aromatic Hydrocarbons in Gas Chromatography

Agneta Bemgård,* Anders Colmsjö, and Kent Wrangskog
Department of Analytical Chemistry, National Institute of Occupational Health, S-171 84 Solna, Sweden
The objective of this study was to relate temperature-programmed GC retention indexes of nonsubstituted PAHs to calculated molecular properties. Retention was predicted by a partial least squares regression equation using five descriptors, i.e., first-order valence molecular connectivity, ionization potential, length, height, and quadrupole moment. Retention index predictability was studied with respect to individual molecular weight ranges. With the descriptor set employed, pericondensed PAHs were predicted more accurately (RMS = 1.6 index units (iu) or coefficient of variation of 0.31%) than catacondensed PAHs (RMS = 5.8 iu or coefficient of variation of 1.1%).

Polycyclic aromatic hydrocarbons (PAHs) comprise a large group of environmentally interesting and hazardous compounds. The complexity and the comparatively low amounts of these compounds in the environment make it necessary to develop suitable cleanup and separation methods for analysis. However, identification of many of these substances is difficult due to a lack of pure reference compounds. One attempt to solve this problem is to find the relationship between gas chromatographic retention and molecular properties.

Retention in gas chromatography is a result of interactions between the solutes and the stationary phase and is determined primarily by dispersion, induction, and orientation forces.1,2 For nonpolar stationary phases, the dispersion (London) forces are considered to be of utmost importance; i.e., the chromatographic retention is related to the solute vapor pressure or the solute boiling point, which has been demonstrated by, for example, Borwitzky and Schomburg,3 Bartle et al.,4 and White.5 Unfortunately, boiling point data of PAHs are very limited,4 especially for the higher molecular weight PAHs.
However, boiling points have been found to correlate with molecular structure (topology), and as early as 1947 Wiener6 determined boiling points of paraffins by use of two topological indexes. After this report, many authors followed and introduced new topological indexes for a wide range of compounds.7-12 As has been pointed out in several studies, a major disadvantage of the topological indexes is their degeneracy; i.e., isomers tend, in many cases, to obtain identical numerical values. The above-mentioned structural indexes can all be calculated from knowledge of the chemical structure of the molecule alone.

The availability of computer programs makes it possible to energy-minimize molecules in three dimensions, which provides the values for bond lengths, bond angles, planarity, energy, and charge on each atom in the molecule. Thereafter, geometrical and electrical descriptors can be derived as well. Recently, rather rigorous studies have been made possible by the use of statistical methods to correlate retention indexes (or boiling points) with a variety of descriptors. By use of linear combinations of descriptors in quantitative structure-property relationships (QSPR), more or less accurate relations to retention have been derived for different types of compound groups such as alcohols and olefins,13-15 PCBs, PCDDs, and PCDFs,16-22 hydrocarbons,23-26 and PAHs and nitro-PAHs.3-5,20,27-33 The results in these studies show good correlation of topological, geometrical, and electrical descriptors with retention on chromatographic columns. However, the physicochemical meaning of a descriptor set selected by computer is often hard to evaluate. As mentioned by Ong and Hites,20 there may be risks with this type of linear combination, such as random correlation when too many descriptors are used in the correlation data set.

(1) Keulemans, A. I. M.; Kwantes, A.; Zaal, P. Anal. Chim. Acta 1955, 13, 357.
(2) Vernon, F. Dev. Chromatogr. 1978, 1, 1.
(3) Borwitzky, H.; Schomburg, G. J. Chromatogr. 1979, 170, 99.
(4) Bartle, K. D.; Lee, M. L.; Wise, S. A. Chromatographia 1981, 14, 69.
(5) White, C. M. J. Chem. Eng. Data 1986, 31, 198.
(6) Wiener, H. J. Am. Chem. Soc. 1947, 69, 17.
(7) Hosoya, H. Bull. Chem. Soc. Jpn. 1971, 44, 2332.
(8) Randic, M. J. Am. Chem. Soc. 1975, 97, 6609.
(9) Kier, L. B.; Murray, W. J.; Randic, M.; Hall, L. H. J. Pharm. Sci. 1976, 65, 1226.
(10) Balaban, A. T. Chem. Phys. Lett. 1982, 89, 399.
(11) Randic, M. J. Chem. Inf. Comput. Sci. 1984, 24, 164.
(12) Kier, L. B. Quant. Struct.-Act. Relat. Pharmacol., Chem. Biol. 1985, 4, 109.
(13) Hansen, P. J.; Jurs, P. C. Anal. Chem. 1987, 59, 2322.
(14) Smeeks, F. C.; Jurs, P. C. Anal. Chim. Acta 1990, 233, 111.
(15) Arruda, A. C.; Heinzen, V. E. F.; Yunes, R. A. J. Chromatogr. 1993, 630, 251.
(16) Hale, M. D.; Hileman, F. D.; Mazer, T.; Shell, T. L.; Noble, R. W.; Brooks, J. J. Anal. Chem. 1985, 57, 640.
(17) Donnelly, J. R.; Munslow, W. D.; Mitchum, R. K.; Sovocool, G. W. J. Chromatogr. 1987, 392, 51.
(18) Hasan, M. N.; Jurs, P. C. Anal. Chem. 1988, 60, 978.
(19) Robbat, A., Jr.; Xyrafas, G.; Marshall, D. Anal. Chem. 1988, 60, 982.
(20) Ong, V. S.; Hites, R. A. Anal. Chem. 1991, 63, 2829.
(21) Makino, M.; Kamiya, M.; Matsushita, H. Chemosphere 1992, 25, 1839.
(22) Seybold, P. G.; Bertrand, J. Anal. Chem. 1993, 65, 1631.
(23) Stanton, D. T.; Jurs, P. C. Anal. Chem. 1990, 62, 2323.
(24) Stanton, D. T.; Jurs, P. C.; Hicks, M. G. J. Chem. Inf. Comput. Sci. 1991, 31, 301.
(25) Woloszyn, T. F.; Jurs, P. C. Anal. Chem. 1992, 64, 3059.
(26) Woloszyn, T. F.; Jurs, P. C. Anal. Chem. 1993, 65, 582.
(27) Lamparczyk, H.; Wilczynska, D.; Radecki, A. Chromatographia 1983, 17, 300.
(28) Kaliszan, R.; Lamparczyk, H. J. Chromatogr. Sci. 1978, 16, 246.
(29) Whalen-Pedersen, E. K.; Jurs, P. C. Anal. Chem. 1981, 53, 2184.
(30) White, C. M.; Robbat, A., Jr.; Hoes, R. M. Chromatographia 1983, 17, 605.
(31) Doherty, P. J.; Hoes, R. M.; Robbat, A., Jr.; White, C. M. Anal. Chem. 1984, 56, 2697.
(32) Robbat, A., Jr.; Corso, N. P.; Doherty, P. J.; Marshall, D. Anal. Chem. 1986, 58, 2072.
(33) Rohrbaugh, R. H.; Jurs, P. C. Anal. Chem. 1986, 58, 1210.

4288 Analytical Chemistry, Vol. 66, No. 23, December 1, 1994
0003-2700/94/0366-4288$04.50/0 © 1994 American Chemical Society
In this work, the focus has been on finding a relation between different descriptors and retention indexes for nonsubstituted PAHs in the molecular weight range 178-350. Temperature-programmed retention indexes, with PAHs as bracketing compounds,34 were correlated to a set of physical descriptors by means of multiple linear regression (MLR) and partial least squares regression (PLS). The two statistical methods were further compared. As purely statistical methods are used in this process, the quality of the result can only be interpreted by means of predictability. Obviously, a good prediction model is of utmost interest for qualitative chemical analysis. Unknown compounds might thus be identified despite the lack of reference compounds. This technique is especially useful for the identification of polynuclear aromatic compounds (PACs), polychlorinated PACs, and other types of compounds where great efforts are being made to distinguish between isomeric forms or congeners. Due to the large number of similar compounds in such groups, synthesis is often a very time-consuming and difficult way to approach the identification process.

EXPERIMENTAL SECTION

Chromatography. Temperature-programmed retention indexes (RI) for 70 compounds with molecular weights ranging from 178 to 350 (Table 1) were taken from either ref 34 or ref 35 or obtained according to Lee et al. and Vassilaros et al.,34,36 on a Carlo Erba Mega HRGC (Milan, Italy) equipped with a flame ionization detector and a 15 m × 0.28 mm i.d. XTI-5 column having a film thickness of 0.25 µm (Restek Corp., Bellefonte, PA). This column can be considered equivalent to the homemade SE-52 column used in ref 34. As bracketing compounds, phenanthrene, chrysene, picene, benzo[c]picene, and dinaphtho[2,1-a:2,1-h]anthracene were used. On-column injections were performed at an oven temperature of 70 °C, and after 2 min the oven temperature was programmed at 5 °C/min to 350 °C. Hydrogen was used as carrier gas.
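Retention indexes of this bracketing type are obtained by linear interpolation between the two standards that elute on either side of a compound. The following is a minimal sketch of that calculation, not the authors' program. It uses the index values this paper assigns to its standards on its GC system (phenanthrene 300, chrysene 407.50, picene 500, benzo[c]picene 582.50). The function name, the dictionary layout, and all retention times are hypothetical, and dinaphtho[2,1-a:2,1-h]anthracene is left out because no index value for it is given here.

```python
from bisect import bisect_right

# Index values assigned to the bracketing standards in this work.
BRACKET_INDEX = {
    "phenanthrene": 300.0,
    "chrysene": 407.50,
    "picene": 500.0,
    "benzo[c]picene": 582.50,
}


def retention_index(t, standard_times):
    """Interpolate a retention index linearly between bracketing standards.

    t              -- retention time of the compound (any consistent unit)
    standard_times -- mapping of standard name -> retention time (hypothetical input)
    """
    points = sorted((standard_times[name], BRACKET_INDEX[name])
                    for name in standard_times)
    times = [time for time, _ in points]
    # Only interpolated values are allowed: a standard must elute on each side.
    if not times[0] <= t <= times[-1]:
        raise ValueError("compound is not bracketed by two standards")
    hi = min(bisect_right(times, t), len(times) - 1)
    lo = hi - 1
    (t_lo, i_lo), (t_hi, i_hi) = points[lo], points[hi]
    return i_lo + (i_hi - i_lo) * (t - t_lo) / (t_hi - t_lo)


# Hypothetical retention times in minutes, for illustration only.
example_times = {
    "phenanthrene": 20.0,
    "chrysene": 30.0,
    "picene": 40.0,
    "benzo[c]picene": 48.0,
}
```

A compound eluting outside the first and last standard raises an error rather than being extrapolated, which mirrors the restriction to interpolated index values.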
The 70 PAHs (Table 1) were either purchased from Promchem (Wessel, Germany) or obtained from other laboratories (see Acknowledgment). The retention indexes were recalculated and linearized by the use of two index determinators before further calculations. Phenanthrene was given RI = 300 and picene RI = 500. Thus, for the GC system used, chrysene was adjusted to 407.50 and benzo[c]picene to 582.50 in order to achieve linearity between elution time and retention index. Only interpolated index values were used; i.e., a bracketing standard must elute on each side of the compound to be determined.

Table 1. Compound Names, Molecular Weights (MW), and References from Where the Retention Indexes Were Obtained

no.  compound                        MW   ref
1    phenanthrene                    178  34
2    anthracene                      178  34
3    pyrene                          202  34
4    fluoranthene                    202  34
5    triphenylene                    228  34
6    chrysene                        228  34
7    benz[a]anthracene               228  34
8    benzo[c]phenanthrene            228  34
9    naphthacene                     228  34
10   benzo[e]pyrene                  252  34
11   benzo[a]pyrene                  252  34
12   perylene                        252  34
13   benzo[a]fluoranthene            252  a
14   benzo[b]fluoranthene            252  34
15   benzo[j]fluoranthene            252  a
16   benzo[k]fluoranthene            252  34
17   dibenzo[ghi,k]fluoranthene      276  a
18   indeno[1,2,3-cd]fluoranthene    276  a
19   benzo[ghi]perylene              276  a
20   indeno[1,2,3-cd]pyrene          276  a
21   dibenzo[c,g]phenanthrene        278  a
22   dibenz[a,c]anthracene           278  34
23   benzo[g]chrysene                278  a
24   dibenzo[b,g]phenanthrene        278  a
25   benzo[c]chrysene                278  a
26   dibenz[a,j]anthracene           278  a
27   pentaphene                      278  a
28   benzo[a]naphthacene             278  a
29   dibenz[a,h]anthracene           278  34
30   benzo[b]chrysene                278  34
31   picene                          278  34
32   pentacene                       278  34
33   coronene                        300  35
34   dibenzo[a,e]pyrene              302  a
35   dibenzo[a,h]pyrene              302  a
36   dibenzo[a,l]pyrene              302  a
37   naphtho[2,3-a]pyrene            302  a
38   naphtho[2,3-e]pyrene            302  a
39   dibenzo[a,e]fluoranthene        302  a
40   dibenzo[a,f]fluoranthene        302  a
41   dibenzo[a,k]fluoranthene        302  a
42   dibenzo[b,e]fluoranthene        302  a
43   dibenzo[b,k]fluoranthene        302  a
44   dibenzo[j,l]fluoranthene        302  a
45   naphtho[1,2-b]fluoranthene      302  a
46   naphtho[2,3-b]fluoranthene      302  a
47   naphtho[2,3-j]fluoranthene      302  a
48   naphtho[2,3-k]fluoranthene      302  a
49   benzo[a]perylene                302  a
50   benzo[b]perylene                302  a
51   rubicene                        326  a
52   peropyrene                      326  a
53   naphtho[8,1,2-bcd]perylene      326  a
54   dibenzo[e,ghi]perylene          326  a
55   phenanthro[3,4-c]phenanthrene   328  35
56   dibenzo[g,p]chrysene            328  35
57   naphtho[2,3-g]chrysene          328  35
58   naphtho[1,2-b]triphenylene      328  35
59   benzo[h]pentaphene              328  35
60   benzo[c]pentaphene              328  35
61   naphtho[1,2-b]chrysene          328  35
62   dibenzo[b,k]chrysene            328  35
63   benzo[b]picene                  328  35
64   benzo[c]picene                  328  35
65   naphtho[1,2-a]naphthacene       328  35
66   dibenzo[a,c]naphthacene         328  35
67   hexaphene                       328  35
68   naphtho[2,1-a]naphthacene       328  35
69   benzo[a]pentacene               328  35
70   benzo[a]coronene                350  35

a Retention indexes measured in this work according to the Experimental Section.

(34) Lee, M. L.; Vassilaros, D. L.; White, C. M.; Novotny, M. Anal. Chem. 1979, 51, 768.
(35) Bemgård, A.; Colmsjö, A.; Lundmark, B. O. J. Chromatogr. 1992, 595, 247.
(36) Vassilaros, D. L.; Kong, R. C.; Later, D. W.; Lee, M. L. J. Chromatogr. 1982, 252, 1.

Descriptor Generation. The molecular structures were initially created by use of MacMimic (Instar Software, Lund, Sweden), and the molecular geometrical structures were transferred to a molecular mechanics MM2(91) computer program for self-consistent field calculations to obtain geometrically optimized structures by energy minimization. In the optimized structures, aromatic C-C bond lengths were 1.378-1.462 Å and C-H bond lengths were 1.099-1.103 Å. Bond angles of C-C-C were between 117.6 and 124.3°, and for H-C-C, the angles varied between 116.1 and 122.4°. The variations in bond lengths and dihedral angles were dependent on the planarity or nonplanarity of the molecules, i.e., fjord and bay regions as well as the five-membered ring content influencing the dihedral angles. The C-C-C angle in the five-membered rings varied between 104.6 and 135.7°.

The geometrically optimized molecular structures, defined by the three-dimensional Cartesian coordinates, were entered as input files for the calculation of the physicochemical descriptors. The calculations were made by use of the semiempirical molecular orbital package MOPAC 6.00, QCPE 455 (F. J. Seiler Research Laboratory, U.S. Air Force Academy), on a VAX-11/750 (DEC) computer running under a VMS 5.4-3 operating system. The semiempirical modified neglect of diatomic differential overlap (MNDO) Hamiltonian was used in the electronic part of the calculations to obtain data for molecular orbitals, heat of formation, polarizabilities, ionization potential, dipole moment, highest occupied molecular orbital (HOMO), and lowest unoccupied molecular orbital (LUMO). Multipole moments and topological indexes were calculated from the MNDO structure and charge data by a homemade program for IBM-PC.

Molecular dimensions were calculated from the three-dimensional Cartesian coordinates after addition of the van der Waals radii of the carbon and the hydrogen atoms, respectively. The dimensions (in Å) were obtained by fitting the molecule into a box of maximum length to breadth dimensions, yielding a height value as a measure of planarity. Two types of boxes were created: one based only on hydrogen atoms (r = 1.20 Å) and one based on all atoms (radius of aromatic carbon = 1.85 Å). The difference between the boxes is found mainly in the heights, where the box based on hydrogen atoms detects small nonplanarities otherwise concealed by the larger carbon atoms.

Computer Programs Used for Regression. All statistics were run on an IBM-PC program that handles PLS statistics38 with optimization of the number of factors and prediction statistics according to predicted error sum of squares (PRESS) statistics.
All data were extracted from a matrix containing 25 descriptor values for 70 compounds.

Retention Index Models. Physicochemical variables or descriptors were linearly correlated to the linearized retention indexes by use of MLR or PLS. The validity of a correlation was assessed by PRESS statistics, where each retention index is predicted by a model based on the others16 and the root-mean-square (RMS) errors of these residuals give the root-mean-square errors of prediction (RMSP). The results from PRESS statistics give an idea of the prediction strength of the model. As a final test of prediction strength, isomeric groups were withdrawn from the data set during modeling. The obtained model was then used for prediction of the withheld group.

Outlier Analysis. Outliers were excluded from the calculations based on their individual deviations from experimental data. The level of rejection was determined by how large a share of the group to be calculated a particular compound constitutes; i.e., a large group of compounds demands a lower level of rejection than does a smaller group. Outliers were rejected at the 5% level.

Units. In all calculations, ionization potential was measured in eV, physical dimensions in Å, and quadrupole moment in C·m²·10⁻⁴⁰.

(37) STATED. Multivariate Statistic Software for IBM-PC; Arbetsmiljöinstitutet/IKA: Solna, Sweden, 1994.
(38) Martens, H.; Naes, T. Multivariate Calibration; John Wiley & Sons: New York, 1992.
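The PRESS procedure described under Retention Index Models can be sketched as a leave-one-out loop. The sketch below is an illustration under stated assumptions, not the STATED program used in this work: it uses ordinary least squares (the MLR case) through NumPy, it computes both RMS and RMSP as the square root of the mean squared residual (the exact divisor used in the paper is not stated), and all function names are hypothetical.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares fit with intercept; returns [I0, k1, k2, ...]."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Evaluate I = I0 + sum(k_j * x_j) for every row of X."""
    return coef[0] + X @ coef[1:]


def rms_and_rmsp(X, y):
    """Return (RMS of the fit, RMSP from leave-one-out PRESS residuals)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # RMS: residuals of the model fitted to all compounds.
    fitted = predict_mlr(fit_mlr(X, y), X)
    rms = np.sqrt(np.mean((y - fitted) ** 2))

    # PRESS: each index is predicted by a model built from the others.
    loo_pred = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        loo_pred[i] = predict_mlr(fit_mlr(X[keep], y[keep]), X[i:i + 1])[0]
    press = np.sum((y - loo_pred) ** 2)
    rmsp = np.sqrt(press / len(y))
    return rms, rmsp
```

For a least-squares model the leave-one-out residual of a compound is never smaller in magnitude than its fitted residual, so RMSP is at least as large as RMS, which is the pattern seen in the tables of this paper.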
RESULTS AND DISCUSSION
Choice of Descriptors. Recently, Ong and Hites20 presented a reasonable equation relating the retention of PAHs to polarizability, ionization potential, and the square of the dipole moment, of which polarizability was shown previously to correlate linearly with retention.27 All these descriptors can further be explained by chromatographic theories, although dipole moments are very small for all compounds in this study, yielding an error of calculation in the same range as the calculated dipole moment. The model presented by Ong and Hites was also applied to the 70 PAHs in our work and compared with a similar model in which the square of the dipole moment was replaced by the quadrupole moment.39 This exchange resulted in an approximately 15% reduction of the RMS and the RMSP. The predictability of the model was in the same range as reported earlier,20 i.e., a standard error of the mean of approximately 2.4%. From this the conclusion was drawn that the quadrupole moment could be a possible descriptor in the assessment of gas chromatographic retention of PAHs.

Molecular connectivity indexes (χ) have previously been correlated to retention and boiling points of PAHs and nitrated PAHs.5,28,29,31,32 First-, second-, and third-order valence molecular connectivities were thus added to our descriptor set. Finally, the molecular shape, mostly described by the length to breadth ratio (L/B or η), has been shown to correlate linearly with retention in both gas40 and reversed-phase liquid chromatography.41 The molecular shape was taken into consideration by inclusion of the length, breadth, and height (in Å) in the descriptor set as well as the maximum length to breadth ratio. Molecular weight can be used as a descriptor, but because it correlates with the first-order valence molecular connectivity and does not discriminate between isomers, this descriptor was not included in the data set.

Apart from the above-mentioned descriptors, heat of formation, LUMO, number of fjord regions, and number of five-membered rings were included in the data set. After descriptors with negligible or small predictive value, or with collinearity to other descriptors, had been discarded, five were chosen for the final model, i.e., ionization potential (IP), first-order valence molecular connectivity (¹χ), length (L), height (H), and quadrupole moment (Q). The molecular length and height were obtained from the box based on the carbon atom radii. Neither polarizability nor the squared dipole moment fell out as useful descriptors from this descriptor set.

Modeling Process. The PLS method is supposed to determine the trends in a material with few data points but many descriptors and should thus be superior to MLR. The number of factors that have to be included in the model is a measure of how fast the trend is located. If the maximum number of factors is used, the PLS method is equivalent to MLR. The number of factors used in this work was determined by the first local minimum obtained by PRESS statistics, i.e., the point at which the addition of an extra factor to the model resulted in decreased predictability. If, on the other hand, no such minimum was obtained, the maximum number of factors was used (five in this work), making MLR and PLS equivalent. The latter can also indicate that no simple linear relation can be found with the descriptors used.

(39) Buckingham, A. D. Quart. Rev. 1958, 183.
(40) Kaliszan, R.; Lamparczyk, H.; Radecki, A. Biochem. Pharmacol. 1979, 28, 123.
(41) Wise, S. A.; Bonnett, W. J.; Guenther, F. R.; May, W. E. J. Chromatogr. Sci. 1981, 19, 457.
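The factor-selection rule in the Modeling Process paragraph (take the first local minimum of PRESS, otherwise use all factors) can be sketched with a small single-response PLS routine. This is a textbook NIPALS-style PLS1 written for illustration, not the software used by the authors. It assumes mean-centered but unscaled descriptors, leave-one-out PRESS, and hypothetical function names. With the maximum number of factors the routine reproduces the least-squares (MLR) coefficients, which is the equivalence the text describes.

```python
import numpy as np


def pls1_fit(X, y, n_factors):
    """PLS1 regression; returns (intercept, coefficient vector)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # centered, then deflated in the loop
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = E.T @ f                        # weight vector for this factor
        w /= np.linalg.norm(w)
        t = E @ w                          # scores
        tt = t @ t
        p = E.T @ t / tt                   # X loadings
        c = (f @ t) / tt                   # y loading
        E = E - np.outer(t, p)             # deflate X
        f = f - c * t                      # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return y_mean - x_mean @ beta, beta


def press(X, y, n_factors):
    """Leave-one-out prediction error sum of squares for a given factor count."""
    total = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        b0, b = pls1_fit(X[keep], y[keep], n_factors)
        total += (y[i] - (b0 + X[i] @ b)) ** 2
    return total


def choose_factors(X, y):
    """First local PRESS minimum; falls back to the maximum number of factors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    max_factors = X.shape[1]
    scores = [press(X, y, a) for a in range(1, max_factors + 1)]
    for a in range(1, max_factors):
        if scores[a] > scores[a - 1]:      # one more factor predicts worse
            return a, scores
    return max_factors, scores
```

Returning the full list of PRESS values alongside the chosen factor count makes it easy to inspect how flat the minimum is before trusting the selection.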
Table 2. Regression Equations for Isomeric Groups: I = I0 + k1·IP + k2·¹χ + k3·L + k4·H + k5·Q a,b

MWc   n   outliers  factors in PLS  I0       k1 (IP)        k2 (¹χ)          k3 (L)        k4 (H)         k5 (Q)         RMS (iu)  RMSP (iu)  RMS (%)  RMSP (%)
252d  7   0         4               -2612.4  -15.65 (3.60)  470.6 (93.18)    0.849 (0.31)  not incl.      -0.957 (2.91)  0.60      1.73       0.13     0.38
278   12  3e        2               -957.18  9.717 (5.33)   156.85 (80.43)   2.190 (2.20)  -4.579 (1.27)  1.427 (10.77)  1.36      2.65       0.28     0.54
302   17  0         3               -1256.8  -12.00 (4.12)  244.24 (88.58)   0.370 (0.23)  -7.157 (1.20)  -1.042 (5.86)  0.86      1.27       0.16     0.24
328   15  1f        1               5193.9   -15.36 (2.62)  -468.67 (90.79)  1.625 (0.59)  -4.207 (0.38)  -1.947 (5.62)  5.20      6.01       0.89     1.03

a Abbreviations: IP, ionization potential; ¹χ, first-order valence molecular connectivity; L, length; H, height; Q, quadrupole moment. b In parentheses are the relative contributions in percent from the descriptors to the retention model of PAHs. c For compound identity see Table 1. d The height (H) was not included in the regression. e Outlier compounds 21, 28, and 32 (Table 1). f Outlier compound 55 (Table 1).

Predictability, expressed as PRESS statistics, was always better for the PLS model than for the MLR model in all calculations where less than the maximum number of factors was needed. Thus, only the results for the PLS models are presented.

Prediction of Retention Indexes for Isomeric Groups. The obtained relationships between retention indexes of PAHs and the five descriptors for four isomeric groups are shown in Table 2. For all groups the first-order valence molecular connectivity turned out to be the most important factor, contributing more than 80% to the regression. The physical dimensions, length and height, contributed only to a small extent to the regression equations. However, removal of L and H from the model resulted in significantly increased RMS and RMSP. In general, the contribution of descriptors other than the connectivity to the regression model is difficult to interpret. The RMS derived from the difference between the measured values and the values given by the regression equation was comparatively small. Even the RMS values derived from the prediction models (RMSP) were in some cases small although the data points are few. For example, the RMSP for isomers with molecular weight (MW) 302 (n = 17) was 1.27 index units (iu). The smallest individual prediction residual was 0.11 iu, and the largest was 3.4 iu. Expressed as retention time for the GC system employed in this study, this means that the retention of a PAH with MW 302 can be expected to be predicted with an error of less than 6.5 s in a 40 min chromatogram (RMSP). This can also be expressed as a 0.24% standard error of the mean.

Outliers were found in the groups with MW 278 and 328. In MW group 328, compound 55 (Table 1), phenanthro[3,4-c]phenanthrene, was erroneously predicted to have an index more than 40 iu higher than the experimental value. This is the most nonplanar compound within the data set. Nonplanar compounds are known to elute earlier than planar ones, and the descriptor expected to take this property into account, the height, did not fully explain this deviation. In MW group 278, three outliers were found, i.e., compounds 21, 28, and 32. Compound 21, dibenzo[c,g]phenanthrene, is strongly nonplanar and was also predicted too high, probably due to insufficient information in the height descriptor. No simple explanation for the deviation of the two other compounds can be given. The maximum number of factors, i.e., four, was needed for the isomers with MW 252. Due to zero information content in the height descriptor, this descriptor was excluded. Only seven data points were available, which is too few compared to the number of descriptors (four in that regression), so no conclusions could be drawn from these results. Apart from MW 252, the optimum was achieved with fewer than the maximum number of factors for all other groups, indicating more reliable results, particularly for the isomers with MW 302.

Prediction of Retention Indexes in Molecular Weight Ranges. For the molecular weight intervals in Table 3, the accuracy was somewhat lower but the trends were more significant. Different outliers were found when regression was made for the whole interval (MW 178-350), i.e., anthracene, pentacene, and phenanthro[3,4-c]phenanthrene (compounds 2, 32, and 55, Table 1). The maximum number of factors was required in the PLS algorithm in order to obtain optimal predictability. This yielded equivalent regression equations for PLS and MLR. The connectivity index is still the most significant factor although slightly less important than in the isomeric groups. Thus, the connectivity descriptor may replace the molecular weight in all models discussed in this paper due to its strong correlation with molecular weight and the good linear regression between connectivity and retention within isomeric groups.

Table 3. Regression Equations for Molecular Weight Intervals: I = I0 + k1·IP + k2·¹χ + k3·L + k4·H + k5·Q a,b

MW       n   outliers  factors in PLS  I0      k1 (IP)          k2 (¹χ)         k3 (L)        k4 (H)         k5 (Q)          RMS (iu)  RMSP (iu)  RMS (%)  RMSP (%)
178-350  70  3c        5               180.63  -17.631 (15.21)  77.344 (66.20)  1.041 (1.63)  -9.116 (3.87)  -1.009 (13.10)  5.09      5.70       0.99     1.11
178-300  33  3d        5               80.457  -9.693 (10.01)   82.354 (72.42)  0.497 (0.84)  -7.389 (3.60)  -0.994 (13.12)  3.77      5.05       0.84     1.13
300-350  38  1e        5               226.96  -20.319 (15.16)  78.429 (64.54)  0.822 (1.19)  -9.069 (3.39)  -1.246 (15.72)  4.40      5.16       0.78     0.92

a Abbreviations: as in Table 2. b In parentheses are the relative contributions in percent from the descriptors to the retention model of PAHs. c Outlier compounds 2, 32, and 55 (Table 1). d Outlier compounds 21, 28, and 32 (Table 1). e Outlier compound 55 (Table 1).
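To make the use of a regression row concrete, the sketch below evaluates the equation I = I0 + k1·IP + k2·¹χ + k3·L + k4·H + k5·Q with the coefficients reported in Table 3 for the whole interval 178-350. The coefficients are taken from the table; the descriptor values are hypothetical and chosen only to show the arithmetic, and the function and variable names are illustrative. Units follow the Units paragraph (eV, Å, and the quadrupole unit given there).

```python
# Coefficients for the 178-350 interval as listed in Table 3.
TABLE3_ALL = {
    "I0": 180.63,
    "IP": -17.631,
    "chi1": 77.344,
    "L": 1.041,
    "H": -9.116,
    "Q": -1.009,
}


def predict_index(coef, ip_ev, chi1, length, height, quadrupole):
    """Evaluate the five-descriptor retention index equation."""
    return (coef["I0"]
            + coef["IP"] * ip_ev
            + coef["chi1"] * chi1
            + coef["L"] * length
            + coef["H"] * height
            + coef["Q"] * quadrupole)


# Hypothetical descriptor values, not taken from the paper.
example = dict(ip_ev=8.0, chi1=5.5, length=12.0, height=3.7, quadrupole=-10.0)
predicted = predict_index(TABLE3_ALL, **example)
```

Because the connectivity term dominates the sum, a small error in ¹χ shifts the predicted index far more than a comparable relative error in length or height.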
Figure 1a shows the correlation between experimental and predicted (by PRESS statistics) retention indexes according to the equation for the total molecular weight interval, shown in Table 3, row 1, and Figure 1b shows the corresponding residual plot. Only minor changes in the adapted model were achieved upon division of the data set into low (178-300) and high (300-350) molecular weight intervals, suggesting similar chromatographic behavior for low- and high-boiling compounds.

Figure 1. (a, top) Predicted (by PRESS statistics) vs experimental retention indexes for the total molecular weight interval 178-350 and (b, bottom) the corresponding residual plot.

Table 4. Regression Equations for Cata- and Pericondensed PAHs: I = I0 + k1·IP + k2·¹χ + k3·L + k4·H + k5·Q a,b

PAHs  n   outliers  factors  I0      k1 (IP)          k2 (¹χ)          k3 (L)        k4 (H)         k5 (Q)          RMS (iu)  RMSP (iu)  RMS (%)  RMSP (%)
cata  34  3c        5        114.42  -14.038 (7.38)   118.193 (61.08)  1.873 (1.87)  -9.032 (2.43)  -3.576 (27.24)  5.80      7.45       1.13     1.45
peri  36  3d        5        166.49  -14.080 (13.25)  73.924 (70.61)   0.146 (0.24)  -6.750 (3.02)  -0.867 (12.88)  1.62      1.98       0.31     0.38

a Abbreviations: as in Table 2. b In parentheses are the relative contributions in percent from the descriptors to the retention model of PAHs. c Outlier compounds 2, 32, and 55 (Table 1). d Outlier compounds 3, 4, and 53 (Table 1).
Prediction of Retention Indexes for Cata- and Pericondensed PAHs, Respectively. From the residual plot in Figure 1b, it was observed that the equation fitted better for some isomers than for others; e.g., compounds with molecular weights of 252, 276, and 302 were predicted with higher accuracy than the rest. Due to the relatively good predictability of the pericondensed PAHs, an assumption was made that different models should be applied for peri- and catacondensed PAHs or that the latter group requires a more complex approach. The data were thus divided into a catacondensed group consisting of PAHs having molecular weights of 228, 278, and 328 and a pericondensed group including molecular weights of 202, 252, 276, 300, 302, 326, and 350. Subsequent outlier analysis resulted in the same compounds as earlier being assessed as outliers in the catacondensed group, i.e., compounds 2, 32, and 55. In the pericondensed group, three new outliers appeared (compounds 3, 4, and 53 in Table 1). The results obtained from the PLS calculations are presented in Table 4. From the RMS and RMSP values, it is obvious that the model is good for prediction of retention of pericondensed PAHs. Overall predictability determined by PRESS statistics was 2 iu. In Figure 2a experimentally determined vs predicted retention indexes for the pericondensed PAHs are plotted. The residual plot in Figure 2b, derived from PRESS statistics, indicated an undistorted model with a quite homogeneous spread of the residuals around zero. For the pericondensed compounds with molecular weights of 252, 276, and 302, the equation is useful, although predictability for MW 326 is lower. The results obtained by the model derived from the catacondensed group indicated decreased accuracy of predictability compared to the total regression equation in Table 3.

Evaluation of the Models by Prediction of Isomeric Group Retention Indexes. From the discussion above, it can be concluded that, among the presented models, the equation obtained by relating retention indexes of pericondensed PAHs to the five descriptors gave the most accurate predictability for PAHs. The predictability has been assessed, thus far, by a jackknife procedure and expressed by PRESS statistics. In order to evaluate the models for different isomers, all isomeric groups were excluded, one by one, from the data set and predicted by the model determined from the remaining PAHs (except outliers). This was performed by the equation of all 70 (67 after removal of outliers) PAHs (similar to Table 3, row 1), and of all pericondensed and catacondensed PAHs from Table 4. For the compounds with molecular weights of 202 (n = 2), 228 (n = 5), 252 (n = 7), 276 (n = 4), 278 (n = 11), and 326 (n = 3 or n = 4), all compounds in the groups were withdrawn from the model processing, whereas the groups having molecular weights of 302 (n = 17) and 328 (n = 14) were divided into two. However, when all compounds were used for regression (n = 67), all isomers with MW 302 and 328 were withdrawn for prediction, respectively. The residual plots from prediction are presented in Figure 3a-c. Comparing these plots with the residual plots in Figures 1b and 2b, no pronounced changes can be observed.
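The withdrawal of whole isomeric groups described above amounts to leave-group-out validation. The sketch below illustrates the idea under explicit assumptions: it uses ordinary least squares as a stand-in for the five-factor PLS models (which the paper states are equivalent to MLR when the maximum number of factors is used), it withdraws every group in full rather than splitting the MW 302 and 328 groups in two, and the function name and inputs are hypothetical.

```python
import numpy as np


def leave_group_out_rmsp(X, y, groups):
    """Predict each group from a model fitted to all other groups.

    X      -- descriptor matrix (compounds x descriptors)
    y      -- retention indexes
    groups -- group label per compound, e.g. molecular weight
    Returns (RMSP over all withheld predictions, array of residuals).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    residuals = np.empty_like(y)
    for label in np.unique(groups):
        held = groups == label
        # Fit on every compound that is not in the withheld group.
        design = np.column_stack([np.ones((~held).sum()), X[~held]])
        coef, *_ = np.linalg.lstsq(design, y[~held], rcond=None)
        # Predict the withheld group and store its residuals.
        residuals[held] = y[held] - (coef[0] + X[held] @ coef[1:])
    return np.sqrt(np.mean(residuals ** 2)), residuals
```

Plotting the returned residuals against the group label gives the kind of residual plot per isomeric group that the text compares with the leave-one-out results.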
RMSP for all predicted isomer indexes (Figure 3a) was 6.2 iu, RMSP for all predicted catacondensed isomers (Figure 3b) was 6.88 iu, and RMSP for the predicted pericondensed isomers (Figure 3c) was 2.5 iu, thus indicating that the models were stable for these types of compounds.

Figure 2. (a, top) Predicted (by PRESS statistics) vs experimental retention indexes for the pericondensed PAHs (plot annotation: r² = 0.998, RMSP = 1.98 iu, n = 33) and (b, bottom) the corresponding residual plot.

Practical Remarks. In practice, the RMSP of the retention data must be compared to the retention data of the entire isomeric group. That is, a small elution time interval demands a very good prediction model if individual compounds are to be discerned from other unknown isomers. On the other hand, a more or less inaccurate model can be useful if the compounds to be distinguished are spread out over a large elution time interval. As an example, the GC/MS analysis of the commercially available rubicene (MW = 326) showed two compounds with the same molecular weight in the product. Fluorescence analysis indicated that the minor compound was rubicene. If so, rubicene must be considered as an outlier in the models calculated in this paper (the error of prediction was 19 index units). Further analysis proved that previously recorded fluorescence data had been based on the impurity of the product. The inclusion of the other compound in the model described here yielded an error of only 2.3 index units. This example demonstrates that the purpose of the model must be considered together with the evaluation of RMSP values.

Figure 3. Residual plots when each isomeric group was withdrawn from the modeling process and predicted by the rest of the compounds using a model including (a, top) all PAHs for the regression, (b, middle) the catacondensed PAHs for regression, and (c, bottom) the pericondensed PAHs for regression.

Comparing the predictability of this work with results presented by other authors is not simple. In general, the standard error of the mean is used to describe the overall predictability of a model. In Table 5, an attempt was made to compare the accuracy of the predicted values obtained by the models presented in this work with the predictability of a number of other models, calculated for PAHs, PCBs, etc. Comparably successful results concerning the prediction accuracy have been achieved by the selection of the presented five descriptors. Descriptors that are important but not yet used are probably missing in the models presented. The interaction of solutes with different phases in a chromatographic column may not be
Table 5. Comparison of Coefficient of Variation Data Described as Standard Error of the Mean with Literature Data
compd types
std error of the mean (CV %)
ref
PCB PCB PCB NO-PAH PAH PAH PAH PAH (pericondensed)
2.5 1.7