Energy & Fuels 2006, 20, 1111-1117
Prediction of Gas Chromatographic Retention Times of Carbazoles in Light Cycle Oil
Nobumasa Nakajima,† Cecile Lay,‡ Hongbin Du,§ and Zbigniew Ring*,‡
National Center for Upgrading Technology, 1 Oil Patch Drive, Devon, Alberta, Canada T9G 1A8
Received September 6, 2005. Revised Manuscript Received March 5, 2006
Forty-nine individual carbazoles were identified in a fluid-catalytic-cracking light cycle oil (LCO) using gas chromatography (GC) with nitrogen chemiluminescence detection (GC-NCD). Their retention times were correlated with molecular descriptors generated from their molecular structures. The best seven-parameter multilinear-regression model showed a good fit, with a standard deviation of s = 0.58 min for the retention times. The descriptors involved in the model reflect the geometrical, topological, and electronic properties of the molecules, which are related to the interactions between the solute and the stationary phase. For forty-five of the forty-nine carbazoles (carbazole, alkylcarbazoles, and hydrocarbazoles), those of most interest in LCO processing, another seven-parameter multilinear-regression model was developed with a standard deviation of s = 0.37 min, a closer fit to the retention-time database. QSPR models of this quality can be used as an important source of information when identifying unknown chromatographic peaks by matching their retention times with those of carbazoles of known molecular structure for which chemical standards are unavailable.
1. Introduction

Light cycle oil (LCO) is a byproduct of the fluid catalytic cracking (FCC) process, an important refining process for converting heavy gas oils to naphtha. Consequently, the increasing demand for gasoline is accompanied by increased production of LCO. LCO can be used as a diesel fuel blending component. However, it is very aromatic and contains large concentrations of refractory sulfur (and nitrogen) compounds. Direct introduction of LCO into the diesel fuel lowers diesel quality unless it is previously hydrotreated. With diesel specifications becoming stricter around the world, hydrotreating of LCO has been receiving increasing attention.1-9 Deep hydrodesulfurization (HDS) of diesel blending components to meet the ultralow-sulfur specification targets the removal of specific sulfur compounds and is affected by specific nitrogen and aromatic compounds. Therefore, knowledge of the feedstock composition at the molecular level is becoming increasingly important.

Several authors have performed detailed characterizations of the sulfur compounds in LCO.10,11 Fewer studies have been devoted to the characterization of nitrogen compounds in LCO.12-15 Yet full identification of individual nitrogen or even sulfur compounds in LCO is far from a reality. Most of the compounds of potential interest are unavailable as chemical standards, and the large number of similar structures present in LCO makes it difficult to obtain fully resolved chromatograms and to identify and assign structures to specific chromatographic peaks. The advent of 2D-GC and the development of speciation methods that do not require chemical standards, including more rigorous quantification of various factors affecting the chromatographic retention times, may soon make it possible to fully speciate sulfur and nitrogen at least in cracked light distillates and, consequently, to predict their processability with respect to sulfur removal.

This paper presents an attempt to develop a quantitative structure-property relationship (QSPR) model for the chromatographic retention times of carbazole derivatives using a limited database developed using chemical standards. The intent is to later predict retention times of other carbazole derivatives for which such standards are unavailable. The QSPR method very effectively captures the properties of the molecular structure in the form of a set of molecular descriptors with fundamental

* To whom correspondence should be addressed. E-mail: zring@nrcan.gc.ca.
† Visiting researcher from Cosmo Oil Co. Ltd in Japan.
‡ National Centre for Upgrading Technology, NCUT.
§ State Key Laboratory of Coordination Chemistry, Coordination Chemistry Institute, Nanjing University.

(1) Sano, Y.; Choi, K.-H.; Korai, Y.; Mochida, I. Appl. Catal. B 2004, 53, 169-174.
(2) Yang, H.; Chen, J.; Fairbridge, C.; Briker, Y.; Zhu, Y. J.; Ring, Z. Fuel Proc. Technol. 2004, 85, 1415-1429.
(3) Thakkar, V. P.; Abdo, S. F.; Gembicki, V. A.; Gehee, J. F. M. LCO Upgrading: A Novel Approach for Greater Added Value and Improved Returns. Proceedings of the NPRA 2005 Annual Meeting, San Francisco, CA, 2005.
(4) Turaga, U. T.; Song, C. Catal. Today 2003, 86, 129-140.
(5) Corma, A.; González-Alfaro, V.; Orchillés, A. V. J. Catal. 2001, 200, 34-44.
(6) Saint-Martin, G. C. L.; Martinez, M. C.; Castillo, J.; Cano, J. L. Fuel 2004, 83, 1381-1389.
(7) Ancheyta-Juárez, J.; Aguilar-Rodríguez, E.; Salazar-Sotelo, D.; Betancourt-Rivera, G.; Leiva-Nuncio, M. Appl. Catal. A 1999, 180, 195-205.
(8) Moyse, B. M.; Topsøe, H. World Refin. 2001, January/February, 28.
(9) Jaramillo, J. C. P.; Velazco, D. R. M.; Baldrich, C. Fuel 2004, 83, 337-342.
(10) Depauw, G. A.; Froment, G. F. J. Chromatogr. A 1997, 761, 231-247.
(11) Nylén, U.; Delgado, J. F.; Järås, S.; Boutonnet, M. Fuel Proc. Technol. 2004, 86, 223-234.
(12) Du, H.; Ring, Z.; Briker, Y.; Arboleda, P. Catal. Today 2004, 98, 217-225.
(13) Dorbon, M.; Bernasconi, C. Fuel 1989, 68, 1067-1074.
(14) Barrie, P. J.; Lee, C. K.; Gladden, L. F. Chem. Eng. Sci. 2004, 59, 1139-1151.
(15) Laredo, G. C.; Leyva, S.; Alvarez, R.; Mares, M. T.; Castillo, J.; Cano, J. L. Fuel 2002, 81, 1341-1350.
10.1021/ef050289b CCC: $33.50 Published 2006 by the American Chemical Society Published on Web 04/15/2006
1112 Energy & Fuels, Vol. 20, No. 3, 2006
Nakajima et al.
meaning. This method probably offers the best hope of understanding how the molecular structure affects a compound’s property of interest (in this paper, its chromatographic retention time) and of “extrapolating” the model to “unknown” compounds.

Several researchers have suggested the usefulness of this method for predicting gas chromatographic retention times.10,12,16-18 For example, Depauw and Froment10 developed a linear regression correlation for the retention times of substituted benzothiophenes based on the positions of the substituents. Their model predicted elution times well for di-, tri-, and tetramethyl-substituted benzothiophenes, providing a useful method for identifying some sulfur compounds in a cracked middle distillate. Recently, Du et al.12 successfully predicted the retention times of sulfur components in LCO by the QSPR method. We extended the study of Du et al.12 to carbazole compounds.

In this study, forty-nine individual carbazoles of most interest in LCO processing were used to develop a general model relating the chromatographic retention times to the molecular structures of the carbazoles. Our intent was to enable the prediction of retention times on the basis of the compounds’ structures in the absence of corresponding chemical standards. A simple attempt to use this model for the analysis of an LCO chromatogram was also made.

2. Experimental Section

2.1. Sample Analysis. The database of retention times and LCO chromatograms used in this study was generated by means of an Agilent Technologies HP-6890 GC coupled to an Antek model 705E nitrogen chemiluminescence detector (NCD). The GC was equipped with a cool-on-column injection port and a Restek RTx-1 capillary column (Crossbond 100% dimethylpolysiloxane with dimensions of 60 m × 250 µm i.d. × 1.0 µm film thickness). The injection volume used in this work was 0.1 µL, but it can be optimized depending on the nitrogen concentration level of LGO.
Helium was used as the carrier gas at a constant flow of 1.9 mL/min and an initial pressure of 28.23 psi. The oven was heated from 40 °C to 300 °C at a rate of 2 °C/min and then held at this temperature for 20 min.

Figure 1 shows the calibration straight lines for carbazole, 9-ethylcarbazole, 1,4-dimethylcarbazole, 9-phenylcarbazole, and 1,3,6,8-tetra-tert-butylcarbazole. These lines were obtained by fitting straight lines to the corresponding five data sets with a single slope. The good fit (R2 = 0.9897-0.9958) indicates that a common slope is indeed a good assumption and that only the intercepts of those plots depend on the compound. These lines adequately model the chemical standard data in the 2-160 wt ppm concentration range. All the retention time measurements used in this study were obtained in the 50-80 wt ppm range, and the retention times used to develop the QSPR models below were corrected to 20 wt ppm using the slope determined above. A separate study, to be reported in the future, looked at the effects of hydrocarbon matrix composition on the retention time.

The stability of the NCD system was evaluated by analyzing two standards 19 times over a one-month period. The satisfactory repeatability of the retention time for the system is illustrated in Figure 2. The standard deviations of the retention times for carbazole and 1,4-dimethylcarbazole were 0.06 and 0.07 min, respectively.

2.2. Data Sets. The 49 nitrogen compounds considered in this study are listed in Table 1. During the attempt to analyze an LCO

(16) Lučić, B.; Trinajstić, N. J. Chem. Inf. Comput. Sci. 1999, 39, 610-621.
(17) Katritzky, A. R.; Ignatchenko, E. S.; Barcock, R. A.; Lobanov, V. S.; Karelson, M. Anal. Chem. 1994, 66, 1799-1807.
(18) Georgakopoulos, C. G.; Kiburis, J. C.; Jurs, P. C. Anal. Chem. 1991, 63, 2021-2024.
Figure 1. (a) Retention time vs concentration of carbazole. (b) Retention time vs concentration of 9-ethylcarbazole. (c) Retention time vs concentration of 1,4-dimethylcarbazole. (d) Retention time vs concentration of 9-phenylcarbazole. (e) Retention time vs concentration of 1,3,6,8-tetra-tert-butylcarbazole.
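The common-slope calibration described in Section 2.1 can be illustrated with a minimal sketch. It assumes that retention time is linear in concentration and that the shared slope is estimated by pooled within-compound least squares; neither the estimator nor the data below comes from the paper, and the compound names, concentrations, and retention times are hypothetical placeholders.

```python
from statistics import mean


def fit_common_slope(groups):
    """Fit t_R = a_g + b * conc with one shared slope b and one intercept a_g per compound.

    groups maps a compound name to a list of (concentration, retention_time) pairs.
    The slope is the pooled within-group least-squares estimate (an assumption).
    """
    sxy = 0.0
    sxx = 0.0
    centers = {}
    for name, points in groups.items():
        x_bar = mean(x for x, _ in points)
        y_bar = mean(y for _, y in points)
        centers[name] = (x_bar, y_bar)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in points)
        sxx += sum((x - x_bar) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercepts = {n: y_bar - slope * x_bar for n, (x_bar, y_bar) in centers.items()}
    return slope, intercepts


def correct_to_reference(t_r, conc, slope, ref_conc=20.0):
    """Shift a retention time measured at conc (wt ppm) to the reference concentration."""
    return t_r - slope * (conc - ref_conc)


# Hypothetical calibration data: two compounds sharing a slope of 0.005 min per wt ppm.
data = {
    "compound A": [(c, 90.0 + 0.005 * c) for c in (2, 40, 80, 160)],
    "compound B": [(c, 100.0 + 0.005 * c) for c in (2, 40, 80, 160)],
}
slope, intercepts = fit_common_slope(data)

# A measurement taken at 60 wt ppm, corrected to 20 wt ppm.
corrected = correct_to_reference(90.0 + 0.005 * 60, 60, slope)
```

Because only the intercept depends on the compound, a single slope is enough to move any measurement from the 50-80 wt ppm range to the 20 wt ppm reference.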
chromatogram, the peak assignments were established based on retention times that matched previously analyzed chemical standards. Kovats retention indices (RIs) were calculated for the retention-time database to account for retention time shifts due to
Gas Chromatographic Retention Times of Carbazoles
Table 1. Comparison of Measured Retention Times and Those Predicted Using the Multilinear 7-Descriptor Model for 49 Carbazoles
Figure 2. (a) Retention time vs run number for carbazole. (b) Retention time vs run number for 1,4-dimethylcarbazole. UWL, upper warning limit (UWL = average + 3s); LWL, lower warning limit (LWL = average - 3s); UCL, upper control limit (UCL = average + 2s); LCL, lower control limit (LCL = average - 2s).
the effects of the instrument parameters.19 4A-Methyl-3,4,4A,9-tetrahydro-2H-carbazole, carbazole, 1,4-dimethylcarbazole, 9-phenylcarbazole, and 1,3,6,8-tetra-tert-butylcarbazole were chosen as retention time standards (structures shown in Figure 3) and arbitrarily assigned the exact RI values of 100, 200, 300, 400, and 500. In the case of temperature-programmed chromatography, an approximate Kovats retention index can be calculated using the retention times directly instead of their logarithm, according to the formula

$$RI = 100y + 100\left[\frac{t_R(x) - t_R(y)}{t_R(z) - t_R(y)}\right]$$

where tR is the retention time, x is the compound of interest, and y and z correspond to the standards eluting immediately prior to and after the compound of interest, respectively, so that 100y is the RI value assigned to standard y. Although many groups have reported the prediction of retention indices by the QSPR method,20-26 retention times reflect the compound property more directly than indices. Therefore, in the modeling step, we used the retention times recalculated from the retention indices rather than the retention indices themselves (see Figure 4).

2.3. Descriptor Generation. Over 400 descriptors corresponding to the molecular structures of the 49 chemical standards used in

(19) McNaught, A. D.; Wilkinson, A. Kovats (retention) Index. Compendium of Chemical Terminology “The Gold Book”, 2nd ed.; Blackwell Science: Cambridge, MA, 1997; (http://www.iupac.org/publications/compendium).
(20) Sutter, J. M.; Peterson, T. A.; Jurs, P. C. Anal. Chim. Acta 1997, 342, 113-122.
(21) Woloszyn, T. F.; Jurs, P. C. Anal. Chem. 1992, 64, 3059-3063.
(22) Guo, W.; Lu, Y.; Zheng, X. M. Talanta 2000, 51, 479-488.
(23) Jalali-Heravi, M.; Fatemi, M. H. J. Chromatogr. A 2001, 915, 177-183.
(24) Yao, X.; Zhang, X.; Zhang, R.; Liu, M.; Hu, Z.; Fan, B. Talanta 2002, 57.
(25) Junkes, B. d. S.; Amboni, R. D. d. M. C.; Yunes, R. A.; Heinzen, V. E. F. Anal. Chim. Acta 2003, 477, 29-39.
(26) Woloszyn, T. F.; Jurs, P. C. Anal. Chem. 1993, 65, 582-587.
Table 1 (tR in min)

| no. | compound | measured(a) | model | Δ(b) | %Δ(c) |
|---|---|---|---|---|---|
| 1 | 4A-methyl-3,4,4A,9-tetrahydro-2H-carbazole | 75.91 | 76.19 | 0.28 | 0.4 |
| 2 | carbazole | 89.77 | 90.77 | 1.01 | 1.1 |
| 3 | 1,4-dimethylcarbazole | 100.28 | 100.75 | 0.47 | 0.5 |
| 4 | 9-phenylcarbazole | 112.19 | 112.28 | 0.10 | 0.1 |
| 5 | 1,3,6,8-tetra-tert-butylcarbazole | 128.17 | 127.95 | -0.22 | -0.2 |
| 6 | 1,2,3,4-tetrahydrocarbazole | 88.19 | 88.01 | -0.19 | -0.2 |
| 7 | 9-methylcarbazole | 89.85 | 89.46 | -0.39 | -0.4 |
| 8 | 9-ethylcarbazole | 91.64 | 92.04 | 0.40 | 0.4 |
| 9 | 9-vinylcarbazole | 92.44 | 92.31 | -0.13 | -0.1 |
| 10 | 5,11-dihydro-6H-benzo(A)carbazole | 115.54 | 115.19 | -0.36 | -0.3 |
| 11 | 1-methylcarbazole | 94.12 | 94.59 | 0.47 | 0.5 |
| 12 | 1,6-dimethylcarbazole | 99.59 | 99.03 | -0.56 | -0.6 |
| 13 | 2,5-dimethylcarbazole | 102.17 | 102.14 | -0.02 | 0.0 |
| 14 | 2,6-dimethylcarbazole | 101.28 | 101.03 | -0.25 | -0.2 |
| 15 | 1,4,5-trimethylcarbazole | 107.98 | 107.82 | -0.15 | -0.1 |
| 16 | 1,4,8-trimethylcarbazole | 103.72 | 104.19 | 0.47 | 0.5 |
| 17 | 2,3,5-trimethylcarbazole | 108.54 | 107.76 | -0.78 | -0.7 |
| 18 | 2,3,6-trimethylcarbazole | 107.93 | 107.50 | -0.43 | -0.4 |
| 19 | 2,4,7-trimethylcarbazole | 107.07 | 107.65 | 0.58 | 0.5 |
| 20 | 2-methylcarbazole | 95.79 | 96.09 | 0.30 | 0.3 |
| 21 | 3-methylcarbazole | 95.49 | 95.20 | -0.28 | -0.3 |
| 22 | 4-methylcarbazole | 96.56 | 97.14 | 0.58 | 0.6 |
| 23 | 1,2-dimethylcarbazole | 102.01 | 101.80 | -0.21 | -0.2 |
| 24 | 1,3-dimethylcarbazole | 99.56 | 98.64 | -0.92 | -0.9 |
| 25 | 1,5-dimethylcarbazole | 100.66 | 101.06 | 0.40 | 0.4 |
| 26 | 1,8-dimethylcarbazole | 97.95 | 98.20 | 0.24 | 0.2 |
| 27 | 2,3-dimethylcarbazole | 103.09 | 101.88 | -1.21 | -1.2 |
| 28 | 2,4-dimethylcarbazole | 101.92 | 101.56 | -0.36 | -0.4 |
| 29 | 2,7-dimethylcarbazole | 101.54 | 102.29 | 0.75 | 0.7 |
| 30 | 3,4-dimethylcarbazole | 103.68 | 103.74 | 0.05 | 0.1 |
| 31 | 3,5-dimethylcarbazole | 101.43 | 101.57 | 0.14 | 0.1 |
| 32 | 3,6-dimethylcarbazole | 100.99 | 100.61 | -0.38 | -0.4 |
| 33 | 4,5-dimethylcarbazole | 105.84 | 104.24 | -1.59 | -1.5 |
| 34 | 1,3,4-trimethylcarbazole | 107.30 | 106.84 | -0.46 | -0.4 |
| 35 | 1,3,5-trimethylcarbazole | 105.24 | 104.88 | -0.36 | -0.3 |
| 36 | 1,4,7-trimethylcarbazole | 105.97 | 106.00 | 0.03 | 0.0 |
| 37 | 1,5,7-trimethylcarbazole | 105.86 | 106.06 | 0.20 | 0.2 |
| 38 | 2,4,6-trimethylcarbazole | 106.47 | 106.50 | 0.04 | 0.0 |
| 39 | 3,4,6-trimethylcarbazole | 108.34 | 108.83 | 0.48 | 0.4 |
| 40 | 1,2,5,7-tetramethylcarbazole | 112.71 | 112.86 | 0.15 | 0.1 |
| 41 | 1,2,6,7-tetramethylcarbazole | 113.80 | 114.09 | 0.29 | 0.3 |
| 42 | 1,3,4,6-tetramethylcarbazole | 111.85 | 111.99 | 0.14 | 0.1 |
| 43 | 1,3,6,7-tetramethylcarbazole | 111.67 | 110.91 | -0.77 | -0.7 |
| 44 | 1,4,5,8-tetramethylcarbazole | 110.99 | 111.31 | 0.32 | 0.3 |
| 45 | 2,3,5,7-tetramethylcarbazole | 113.12 | 114.02 | 0.90 | 0.8 |
| 46 | 2,3,6,7-tetramethylcarbazole | 114.72 | 115.55 | 0.83 | 0.7 |
| 47 | 3,4,5,6-tetramethylcarbazole | 115.96 | 116.55 | 0.59 | 0.5 |
| 48 | 1,2,4,6,8-pentamethylcarbazole | 114.76 | 114.80 | 0.04 | 0.0 |
| 49 | 2,4,5,6-tetramethylcarbazole | 115.21 | 114.99 | -0.21 | -0.2 |

(a) Observed retention times (recalculated from retention indices). (b) Absolute error. (c) Relative error.
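The approximate retention index of Section 2.2, and the back-calculation of retention times from indices mentioned in footnote a of Table 1, can be sketched as follows. This is a sketch under stated assumptions rather than the authors' procedure: the back-calculation is assumed to be the simple inverse of the RI formula, the reference times of the five standards are taken from the "measured" column of Table 1, and values outside the range of the standards are assumed to be extrapolated linearly from the nearest pair. The function names are hypothetical.

```python
from bisect import bisect_right

# Retention times (min) of the five RI standards in elution order, taken from the
# "measured" column of Table 1; their assigned RI values are 100, 200, ..., 500.
STANDARD_TIMES = [75.91, 89.77, 100.28, 112.19, 128.17]


def _lower_index(value, grid):
    """0-based index of the lower member of the pair of grid points bracketing value."""
    k = bisect_right(grid, value) - 1
    # Clamp so that values outside the grid use the nearest pair (assumption).
    return min(max(k, 0), len(grid) - 2)


def retention_index(t_x, times=STANDARD_TIMES):
    """RI = 100*y + 100 * (tR(x) - tR(y)) / (tR(z) - tR(y))."""
    k = _lower_index(t_x, times)
    t_y, t_z = times[k], times[k + 1]
    return 100 * (k + 1) + 100 * (t_x - t_y) / (t_z - t_y)


def retention_time(ri, times=STANDARD_TIMES):
    """Inverse of retention_index: map an RI back onto the reference time scale."""
    grid = [100 * (k + 1) for k in range(len(times))]
    k = _lower_index(ri, grid)
    t_y, t_z = times[k], times[k + 1]
    return t_y + (ri - grid[k]) / 100 * (t_z - t_y)


# 1-methylcarbazole (94.12 min) elutes between carbazole and 1,4-dimethylcarbazole.
ri_1mc = retention_index(94.12)
t_back = retention_time(ri_1mc)
```

Working through the index removes run-to-run shifts in the time axis: a measured time is converted to an RI against the standards of the same run, and the RI is then mapped onto one fixed set of reference times.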
this study were generated using the CODESSA program17 (CODESSA 2.64 by SemiChem, Inc.). This program generates sets of conventional constitutional, geometrical, topological, electronic, and quantum-chemical descriptors after minimizing the internal energy of the respective molecule. The constitutional descriptors are derived from the molecular composition of the compounds, including
Figure 3. Carbazoles used as standards in the retention index calculation.
Figure 4. Calculation of corrected retention times.
molecular weights, numbers of atoms, bonds, rings, special groups, etc. Topological descriptors, derived from molecular graph invariants, describe the atomic connectivity in the molecule (e.g., the Wiener index,27 the Randić index,28 the Kier and Hall index,29-31 the Balaban index,32,33 and the information content index and its derivatives34,35). The geometrical descriptors include moments of inertia, shadow indices,36 molecular volume, surface area, and gravitational indices. These are calculated from the 3D coordinates of the atoms in the given molecule. The electronic descriptors reflect characteristics of the charge distribution of the molecule. These include the minimum and maximum partial charges in the molecule, the minimum and maximum partial charges for particular types of atoms, the polarity parameter (qmax - qmin), the polarity parameter divided by the square of the distance between the atoms bearing the minimum and maximum partial charges, the topological electronic index, and charged-partial surface-area (CPSA) descriptors. The quantum chemical descriptors are supplementary to the conventional

(27) Wiener, H. J. Am. Chem. Soc. 1947, 69, 17.
(28) Randić, M. J. Am. Chem. Soc. 1975, 97, 6609-6615.
(29) Kier, L. B.; Hall, L. H. Structure-Activity; J. Wiley & Sons: New York, 1986.
(30) Kier, L. B.; Hall, L. H. Eur. J. Med. Chem. 1977, 12, 307.
(31) Kier, L. B.; Hall, L. H. J. Pharm. Sci. 1981, 70, 583.
(32) Balaban, A. T. Chem. Phys. Lett. 1981, 89, 399.
(33) Balaban, A. T. Pure Appl. Chem. 1983, 55, 199.
(34) Basak, S. C.; Harriss, D. K.; Magnuson, V. R. J. Pharm. Sci. 1984, 73, 429.
(35) Kier, L. B. J. Pharm. Sci. 1980, 69, 807.
(36) Rohrbaugh, R. H.; Jurs, P. C. Anal. Chim. Acta 1987, 199, 99.
descriptors and are related to the charges, bonding, energies, reactivities, and thermodynamic properties of molecules. These were calculated by the AM1 quantum semiempirical method37 with the AMPAC program (AMPAC 7.0, Semichem Inc.).

2.4. Regression Analysis. The QSPR models of this study were built using the multilinear-regression method and following the methodology implemented in the CODESSA heuristic program.17 Over 412 descriptors were initially calculated for the entire data set of 49 compounds. After the elimination of high pairwise correlations, all orthogonal pairs were found from the leftover descriptors and treated using two-parameter regression with the property. The top pairs with the highest regression correlation coefficients were chosen for higher-order regression analysis that involved successively adding (from the pool of still available descriptors) the third (and then higher) descriptor to the best two-descriptor models until the Fisher criterion at a given probability level was smaller than that for the previous model or until all the available descriptors had been tried. Table 2 lists the subset of descriptors that resulted from such elimination. The general form of the retention time models was

$$t_{Rj} = c_0 + \sum_{i=1}^{n} c_i D_{i,j}$$

where tRj is the retention time for a given compound j, c0 is the

(37) Dewar, M. J. S.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. J. P. J. Am. Chem. Soc. 1985, 107, 3902-3909.
Table 2. Descriptors Used in This Study

| Di | descriptor name |
|---|---|
| D0 | intercept |
| D1 | Kier & Hall index (order 3) |
| D2 | HOMO-1 energy |
| D3 | 1/2 × BETA polarizability (DIP) |
| D4 | no. of benzene rings |
| D5 | max partial charge (Qmax) |
| D6 | ESP-max net atomic charge for a H atom |
| D7 | relative positive-charged surface area (Zefirov’s) |
| D8 | moment of inertia C |
| D9 | max σ-π bond order |
| D10 | max electrophilic reactivity index for C |
| D11 | av electrophilic reactivity index for C |
| D12 | HA-dependent HDCA-2 (Zefirov’s PC) |
| D13 | max partial charge for C (Zefirov’s PC) |
| D14 | min. net atomic charge for C |
| D15 | ESP-max net atomic charge for H |
| D16 | HA-dependent HDCA-1/TMSA |
| D17 | ESP-HA-dependent HDCA-2 |
| D18 | HOMO-LUMO energy gap |
| D19 | HA-dependent HDCA-2/TMSA |
| D20 | av nucleophilic reactivity index for C |
| D21 | moment of inertia, A |
| D22 | min. total interaction for a H-N bond |
Figure 5. Calculated vs observed retention times for the 49-carbazole model.
Table 3. Seven-Parameter Multilinear Regression Model for 49 Carbazoles

| i | ci (regress coeff) | Di (std error of regress) | t |
|---|---|---|---|
| 0 | 280.170 | 7.2607 | 38.59 |
| 1 | 9.598 | 0.3245 | 29.58 |
| 2 | 25.086 | 0.7399 | 33.90 |
| 3 | -0.042 | 0.0028 | -14.98 |
| 4 | 8.470 | 0.4842 | 17.49 |
| 5 | 417.720 | 27.9900 | 14.92 |
| 6 | -85.769 | 13.5940 | -6.31 |
| 7 | -0.947 | 0.1622 | -5.84 |
Table 4. Development of the Multilinear Regression Model for 49 Carbazoles

| eq | N(a) | descriptors(b) | F(c) | R2(d) | R2CV(e) | s(f) |
|---|---|---|---|---|---|---|
| 1 | 2 | D8, D9 | 333 | 0.9353 | 0.9183 | 2.37 |
| 2 | 3 | D10, D11, D12 | 439 | 0.9670 | 0.9372 | 1.71 |
| 3 | 4 | D1, D2, D3, D13 | 544 | 0.9802 | 0.9665 | 1.34 |
| 4 | 5 | D1, D2, D3, D13, D14 | 1119 | 0.9924 | 0.9849 | 0.84 |
| 5 | 6 | D1, D2, D3, D4, D16, D17 | 1436 | 0.9951 | 0.9917 | 0.68 |
| 6 | 7 | D1, D2, D3, D4, D5, D6, D7 | 1703 | 0.9966 | 0.9938 | 0.58 |

(a) No. of params in the correlation. (b) Descriptors involved in the correlation; numbering corresponds to that in Table 2. (c) Fisher criterion. (d) Correlation coefficient. (e) Cross-validation correlation coefficient. (f) Standard deviation.
intercept, ci is the descriptor coefficient, and Di,j represents the values of descriptor Di for compound j.
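As an illustration of the general model form above, the following minimal sketch evaluates t_Rj = c0 + Σ ci Di,j with the coefficients of Table 3 and recomputes the Student's t-values as the ratio of each coefficient to its standard error. The function names are hypothetical, and the descriptor vectors are placeholders, since the paper does not tabulate descriptor values for individual compounds.

```python
# Regression coefficients and their standard errors from Table 3;
# index 0 is the intercept c0, indices 1..7 belong to descriptors D1..D7.
COEFFS = [280.170, 9.598, 25.086, -0.042, 8.470, 417.720, -85.769, -0.947]
STD_ERRORS = [7.2607, 0.3245, 0.7399, 0.0028, 0.4842, 27.9900, 13.5940, 0.1622]


def predict_retention_time(descriptors, coeffs=COEFFS):
    """Evaluate t_R = c0 + sum_i c_i * D_i for one compound."""
    if len(descriptors) != len(coeffs) - 1:
        raise ValueError("expected one descriptor value per non-intercept coefficient")
    return coeffs[0] + sum(c * d for c, d in zip(coeffs[1:], descriptors))


def t_values(coeffs=COEFFS, std_errors=STD_ERRORS):
    """Student's t-value of each coefficient: coefficient divided by its standard error."""
    return [c / se for c, se in zip(coeffs, std_errors)]


# Placeholder descriptor vectors (not real compounds): all zeros returns the intercept,
# and a unit value of D1 alone adds the D1 coefficient.
t_zero = predict_retention_time([0.0] * 7)
t_unit_d1 = predict_retention_time([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
t_stats = t_values()
```

Recomputing the t-values this way is a quick consistency check on a transcribed coefficient table, because each tabulated t should equal its coefficient divided by its standard error to within rounding.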
3. Results and Discussion

As the first step, the best seven-parameter multilinear model was found for the entire data set of 49 nitrogen compounds. The ratio of the number of data points to the number of descriptors for the model was seven. Ratios greater than five are usually considered statistically sufficient.21 A summary of this model is given in Table 3. The Student’s t-value characterizes the relative significance of the regression coefficient in a particular model. The report of the model development is shown in Table 4. The Fisher criterion (F = 1702 for the final model) can be interpreted as the ratio of the variance explained by the model to the variance not explained by the model. High F values indicate that the model is statistically credible. The correlation coefficient, R2 = 0.9966 for the final model, is an indicator of the amount of variation in the dependent variable accounted for by the model. R2 values close to 1 indicate good model fit. The leave-one-out correlation coefficient (R2CV = 0.9938 for the final model) is a measure of the predictive ability (“stability”) of the model on one hand and the quality of the database on the other. By predictive ability, we mean the ability of the model to predict the retention time of a compound whose retention time was not used in the model development. A more stable model has an R2CV value closer to R2. This model predicts retention times surprisingly well, with a standard deviation of s = 0.58 min. If the errors are normally distributed, this means that 68.27% of the data are predicted within ±0.58 min and 95.45% within ±1.16 min. The calculated retention times are compared with the corresponding experimental values in Table 1. Figure 5 shows the relationship between the measured retention times and those calculated from the model. The plot of the corresponding residuals versus the observed retention times is shown in Figure 6. As shown in Table 3 and Figures 5 and 6, the multilinear QSPR model of retention times is of relatively good quality. All of the residuals are within ±1.00 min, except for carbazole (1.01 min) and 4,5-dimethylcarbazole (-1.59 min).

Figure 6. Residuals vs observed retention times for the 49-carbazole model.

In chromatographic separation, the molecules of those components that exhibit stronger affinities for the stationary phase venture into the carrier gas less frequently and require a
Table 5. Seven-Parameter Multilinear Regression Model for 45 Carbazoles

| i | ci (regress coeff) | Di (std error of regress) | t |
|---|---|---|---|
| 0 | 29.655 | 107.4100 | 0.28 |
| 1 | 7.063 | 0.1969 | 35.88 |
| 2 | 19.947 | 1.3496 | 14.78 |
| 18 | -25.238 | 1.1410 | -22.12 |
| 19 | 11238 | 1247 | 9.01 |
| 20 | -4894.000 | 586.8700 | -8.34 |
| 21 | -46.617 | 765.0200 | -6.09 |
| 22 | 35.191 | 7.3674 | 4.78 |
longer period of time to reach the detector than the molecules of components that are less strongly attracted to the stationary phase; hence, separation is achieved.38 The affinity of a molecule for the stationary phase is related to adsorptivity, solubility, and chemical bonding. Such information is reflected by the various molecular descriptors used in the development of our QSPR model. As indicated in Table 3, the model contains one topological (D1), one constitutional (D4), one electrostatic (D5), and four quantum-chemical (D2, D3, D6, and D7) descriptors. The Kier & Hall index (order 3) (D1) describes the connectivity of three contiguous bond-fragment valences and contains information about the size and degree of branching of the molecules. The HOMO-1 energy (D2) is the energy of the second highest occupied molecular orbital. As indicated by the Student’s t-value, this is the most significant descriptor in the model, that is, the one with the largest effect on the retention time. Generally, the HOMO level is affected by electron-donating groups, such as alkyl groups. The 1/2 × BETA polarizability (D3), which is the hyperpolarizability calculated from the dipole moment of the molecule, represents the effectiveness of intermolecular induction and dispersion interactions with the medium. The number of benzene rings (D4) is related to the size and electronic distribution of the molecule. The maximum partial charge (Qmax) (D5) is calculated by a method based on the Sanderson electronegativity scale, which represents the molecular electronegativity as a geometric mean of atomic electronegativities.39 The ESP-maximum net atomic charge for the H atom (D6) is related to the ability of a molecule to participate in electrostatic interactions with the stationary phase. The relative positive-charged surface area (D7) is one of the CPSA descriptors, which encode features responsible for polar interactions between molecules and the stationary phase. Since the Kier & Hall index (D1), HOMO-1 energy (D2), and BETA polarizability (D3) descriptors are common to most of the models, these three descriptors should contribute strongly to the seven-parameter QSPR model.

The retention time prediction errors make it difficult for these models alone to be applied to peak identification in a chromatogram, particularly at small retention times. To further improve the model, we critically reviewed our database. It is generally accepted that the carbazole compounds in LCO are mostly alkylcarbazoles. Also, alkylcarbazoles substituted at the 9 position (on the nitrogen atom) are rare.13,15 Therefore, we decided to redevelop the model using a 45-compound subset of the original database that excluded 9-phenylcarbazole, 9-vinylcarbazole, 9-methylcarbazole, and 9-ethylcarbazole. The resulting seven-parameter model is summarized in Table 5. The descriptors involved in this model are similar in nature but are not the same as those from the study using the full set. This is an indication of a weakness of a completely automatic statistical

(38) Jennings, W. Gas Chromatography with Glass Capillary Columns, 2nd ed.; Academic Press: New York, 1980.
(39) Csizmadia, I. G. Theory and Practice of MO Calculations on Organic Molecules; Elsevier: Amsterdam, 1976.
Table 6. Comparison of Measured Retention Times and Those Predicted Using the Multilinear 7-Descriptor Model for 45 Carbazoles (tR in min)

| no. | compound | obsd (corr.) | calcd | Δ | %Δ |
|---|---|---|---|---|---|
| 1 | 4A-methyl-3,4,4A,9-tetrahydro-2H-carbazole | 75.91 | 75.89 | -0.02 | |
| 2 | carbazole | 89.77 | 89.65 | -0.11 | -0.1 |
| 3 | 1,4-dimethylcarbazole | 100.28 | 100.79 | 0.51 | 0.5 |
| 4 | 9-phenylcarbazole | | | | |
| 5 | 1,3,6,8-tetra-tert-butylcarbazole | 128.17 | 128.28 | 0.11 | 0.1 |
| 6 | 1,2,3,4-tetrahydrocarbazole | 88.19 | 88.19 | 0.00 | 0.0 |
| 7 | 9-methylcarbazole | | | | |
| 8 | 9-ethylcarbazole | | | | |
| 9 | 9-vinylcarbazole | | | | |
| 10 | 5,11-dihydro-6H-benzo(A)carbazole | 115.54 | 115.51 | -0.03 | -0.02 |
| 11 | 1-methylcarbazole | 94.12 | 94.37 | 0.25 | 0.26 |
| 12 | 1,6-dimethylcarbazole | 99.59 | 99.27 | -0.32 | -0.32 |
| 13 | 2,5-dimethylcarbazole | 102.17 | 102.48 | 0.31 | 0.30 |
| 14 | 2,6-dimethylcarbazole | 101.28 | 101.41 | 0.13 | 0.13 |
| 15 | 1,4,5-trimethylcarbazole | 107.98 | 107.91 | -0.07 | -0.06 |
| 16 | 1,4,8-trimethylcarbazole | 103.72 | 104.36 | 0.64 | 0.62 |
| 17 | 2,3,5-trimethylcarbazole | 108.54 | 108.61 | 0.07 | 0.06 |
| 18 | 2,3,6-trimethylcarbazole | 107.93 | 108.21 | 0.28 | 0.26 |
| 19 | 2,4,7-trimethylcarbazole | 107.07 | 107.15 | 0.09 | 0.08 |
| 20 | 2-methylcarbazole | 95.79 | 95.87 | 0.08 | 0.08 |
| 21 | 3-methylcarbazole | 95.49 | 95.51 | 0.02 | 0.02 |
| 22 | 4-methylcarbazole | 96.56 | 96.87 | 0.31 | 0.32 |
| 23 | 1,2-dimethylcarbazole | 102.01 | 101.69 | -0.32 | -0.32 |
| 24 | 1,3-dimethylcarbazole | 99.56 | 99.36 | -0.21 | -0.21 |
| 25 | 1,5-dimethylcarbazole | 100.66 | 100.69 | 0.04 | 0.04 |
| 26 | 1,8-dimethylcarbazole | 97.95 | 97.78 | -0.17 | -0.18 |
| 27 | 2,3-dimethylcarbazole | 103.09 | 102.93 | -0.16 | -0.15 |
| 28 | 2,4-dimethylcarbazole | 101.92 | 102.15 | 0.22 | 0.22 |
| 29 | 2,7-dimethylcarbazole | 101.54 | 101.54 | 0.01 | 0.01 |
| 30 | 3,4-dimethylcarbazole | 103.68 | 103.67 | -0.01 | -0.01 |
| 31 | 3,5-dimethylcarbazole | 101.43 | 101.61 | 0.18 | 0.18 |
| 32 | 3,6-dimethylcarbazole | 100.99 | 101.07 | 0.08 | 0.08 |
| 33 | 4,5-dimethylcarbazole | 105.84 | 104.38 | -1.46 | -1.38 |
| 34 | 1,3,4-trimethylcarbazole | 107.30 | 107.26 | -0.04 | -0.04 |
| 35 | 1,3,5-trimethylcarbazole | 105.24 | 105.02 | -0.22 | -0.21 |
| 36 | 1,4,7-trimethylcarbazole | 105.97 | 105.99 | 0.03 | 0.02 |
| 37 | 1,5,7-trimethylcarbazole | 105.86 | 105.85 | 0.00 | 0.00 |
| 38 | 2,4,6-trimethylcarbazole | 106.47 | 106.94 | 0.47 | 0.44 |
| 39 | 3,4,6-trimethylcarbazole | 108.34 | 108.33 | -0.01 | -0.01 |
| 40 | 1,2,5,7-tetramethylcarbazole | 112.71 | 112.59 | -0.12 | -0.11 |
| 41 | 1,2,6,7-tetramethylcarbazole | 113.80 | 113.24 | -0.56 | -0.49 |
| 42 | 1,3,4,6-tetramethylcarbazole | 111.85 | 111.32 | -0.52 | -0.47 |
| 43 | 1,3,6,7-tetramethylcarbazole | 111.67 | 111.15 | -0.52 | -0.47 |
| 44 | 1,4,5,8-tetramethylcarbazole | 110.99 | 111.13 | 0.14 | 0.13 |
| 45 | 2,3,5,7-tetramethylcarbazole | 113.12 | 113.19 | 0.07 | 0.06 |
| 46 | 2,3,6,7-tetramethylcarbazole | 114.72 | 114.80 | 0.09 | 0.08 |
| 47 | 3,4,5,6-tetramethylcarbazole | 115.96 | 116.45 | 0.49 | 0.42 |
| 48 | 1,2,4,6,8-pentamethylcarbazole | 114.76 | 114.81 | 0.05 | 0.04 |
| 49 | 2,4,5,6-tetramethylcarbazole | 115.21 | 115.42 | 0.22 | 0.19 |
approach to the QSPR model development and its arbitrary multilinear form. It is also an indication that further work, particularly using a more sophisticated method for descriptor selection and a nonlinear model, could possibly result in a much better model. The descriptors involved in the second model are as follows: one topological (D1), one geometrical (D21), and five quantum-chemical (D2, D18, D19, D20, D22). D18 expresses the energy gap between the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO). Generally, the HOMO level is affected by electron-donating groups, such as an alkyl group, while the LUMO level is affected by electron-accepting groups. The HA-dependent HDCA-2/TMSA (HA = hydrogen acceptor, HDCA = hydrogen-bonding molecular surface area, and TMSA = total molecular surface area) (D19) is one of the quantum-chemical descriptors and reflects the capability of a molecule to form hydrogen bonds. The average nucleophilic reactivity index (D20) is based on the frontier-electron theory and reflects the property of the HOMO level. The moment of inertia A (D21) is related to the difficulty of rotation around the x rotational axis. The minimum total interaction for the H-N bond (D22) reflects the interaction energy of the bond between nitrogen and hydrogen. The total interaction energy is defined as the difference between the electrostatic interaction energy and the electronic exchange energy. The seven-parameter model for the 45-carbazole subset has better correlation characteristics (F = 3548, R2 = 0.9985, R2CV = 0.9980, and s = 0.37 min) than the model based on all 49 carbazoles. The calculated retention times are compared with the corresponding experimental values in Table 6 and Figure 7. The plot of residuals versus observed retention times is shown in Figure 8. All of the residuals for the subset data are within ±0.64 min, except for the residual for 4,5-dimethylcarbazole (-1.46 min), which, as in the first model, is clearly an outlier.

Figure 7. Calculated vs observed retention times for the 45-carbazole model.

Figure 8. Residuals vs observed retention times for the 45-carbazole model.

It was our intention to use the retention-time model developed here to analyze an LCO chromatogram. Figure 9 shows such a GC-NCD chromatogram for material derived from a Canadian crude oil. The procedure involved assuming a molecular structure of a carbazole, calculating the corresponding values of the model descriptors, and estimating the retention time. To assess the probability of successful identification, we identified and zoomed in on the most “crowded” ±0.64 min range of the chromatogram and found five peaks. This indicates that the retention-time model developed here cannot be used alone to positively identify a peak. Another source of information is needed to discriminate between those peaks. This could possibly be achieved by measuring the molecular masses associated with those peaks with a mass spectrometry technique, by examining the expected (modeled) relative reactivity of the associated structures, etc.

Figure 9. GC-NCD nitrogen chromatogram of LCO.

Therefore, although at this time our model alone cannot provide positive identification of all individual carbazole species in a sample, it is a useful tool that can be used in association with other sources of information to completely speciate carbazoles in a fully resolved chromatogram. One-dimensional chromatography is an adequate tool for obtaining a fully resolved chromatogram of a hydrotreated sample, but the chromatogram for the original LCO would likely have to be obtained using 2D-GC.

4. Conclusion
This study demonstrates that a QSPR model is a useful tool that can be used for speciation of organic nitrogen (and sulfur) in cracked distillates (e.g., LCO) with only limited need for chemical standards. The retention times of 49 carbazole compounds identified in LCO were modeled using a seven-parameter multilinear regression model based on calculated molecular descriptors. The model was further improved over a subset of data consisting of 45 carbazoles of most interest in petroleum processing. Both models have relatively high correlation coefficients and show good predictive ability, so they can be used to estimate, with a significant degree of confidence, the retention times of carbazoles that are unavailable as chemical standards. The initial success of the QSPR approach in predicting retention times of dibenzothiophenes (Du et al.12) and carbazoles encouraged us to continue this work toward full speciation of sulfur and nitrogen in middle distillates and, in the future, toward the prediction of retention times of heavy distillates.

Acknowledgment. Special thanks to Professor Isao Mochida, Kyushu University in Japan, who kindly provided a precious set of 40 alkylcarbazole compounds. Partial funding for NCUT has been provided by the Canadian Program for Energy Research and Development (PERD), the Alberta Research Council (ARC), and the Alberta Energy Research Institute (AERI).