Anal. Chem. 1990, 62, 1884-1891
Simulation of Carbon-13 Nuclear Magnetic Resonance Spectra of Alkyl-Substituted Aromatic Compounds
G. Paul Sutton and Peter C. Jurs*
Department of Chemistry, The Pennsylvania State University, 152 Davey Laboratory, University Park, Pennsylvania 16802
Carbon-13 nuclear magnetic resonance spectra for a collection of alkyl-substituted benzenes and polycyclic aromatic compounds have been simulated by using computer-based methodology. Several new types of structure-based parameters were developed and were shown to be useful in characterizing structural features related to the conjugated π systems present in these molecules. These new descriptors, along with others developed previously, were used to generate parametric equations for chemical shift prediction using a training set of 30 compounds. Two different approaches for the grouping of aromatic ring carbons were investigated and evaluated for use in generating accurate chemical shift models and complete simulated spectra. Partial simulated spectra were also constructed, such that the methyl carbon atom resonances were excluded, to assist in the evaluation of the ring carbon atom model equations. The external predictive ability of the derived models was evaluated through the simulation of 13C NMR spectra for a group of 44 compounds not included in the model development process. External prediction accuracy improved when ring-bridging and non-ring-bridging carbons were treated separately.
INTRODUCTION Carbon-13 nuclear magnetic resonance spectroscopy (13C NMR) is an important analytical tool, useful in the structural identification of organic compounds. The interpretation of complex 13C NMR spectra can be aided by making computer-assisted spectral comparisons of observed spectra with library reference spectra and through the use of spectrum simulation methods. Library search methods require that high-quality reference spectra be available for comparison, thus placing limits upon the utility of this approach. Spectrum simulation techniques offer an important alternative to library search methods and allow the chemist to generate approximate spectra which can be compared to the actual spectrum of an unidentified compound.
Aromatic molecules are a chemically important class of compounds that have been examined extensively by both 1H and 13C NMR spectroscopies. Chemical shift prediction methods are an important means of aiding the spectral interpretation and identification of such molecules. Numerous studies have focused on the derivation and use of additivity relationships based upon substituent chemical shift effects (1-4), while others have used theoretical and empirical methods to characterize the nuclear shielding and electronic environments of aromatic carbons (5-8). Retrieval of chemical shifts from libraries of specially coded structural environments (9, 10) has also been used as a means for assembling simulated spectra for aromatic compounds. However, these approaches have their drawbacks. Methods based upon substituent chemical shift increments or coded structural environments require that the appropriate substituent parameters or coded
substructures exist in the database being used. The theoretical methods can be hindered by heavy computational demands and often yield poor chemical shift predictions.
One approach to spectrum simulation centers around the development of mathematical and statistical relationships between the structural environment of carbon atoms and their observed chemical shift values. Through the use of multiple linear regression analysis, linear model equations are formed, which relate the observed chemical shift of a carbon atom to a series of numerically encoded structural parameters (descriptors). These models are of the form

$$S = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_d X_d$$

where S is the predicted chemical shift of a given carbon atom and the X_i are the numerical descriptors encoding structural features of that atom's chemical environment. The b_i are the regression coefficients derived from a set of known chemical shifts for a set of reference compounds and d is the number of descriptors in the model. This parametric approach to spectrum simulation was first demonstrated for linear and branched alkanes (11, 12) and later applied to compounds containing unsaturations and heteroatoms (13, 14). Other investigations by Bernassau and co-workers have applied this approach to rigid alkanes (15) and to collections of alcohols and ketones (16). Work in this laboratory has led to the development of an interactive, computer-based system which allows large numbers of complex structural descriptors to be easily calculated and managed, while also providing model development and spectral prediction capabilities (17). With this system, spectral simulations for several classes of compounds have been studied: cyclohexanols and decalols (18), hydroxy steroids (19), cyclopentanes and cyclopentanols (20), norbornanols (21), cyclic ketones (22), piperidines (23), and polychlorinated biphenyls (24).
Efforts by Small and co-workers have extended this methodology to include the spectral prediction for carbohydrates (25, 26) as well as methyl-substituted linear cyclic aromatic compounds (27). In this paper, work is presented that involves the further extension of this spectrum simulation methodology to include alkyl-substituted linear and nonlinear aromatic compounds. Several new structural descriptors are developed and examined in an effort to encode structural features reflecting the conjugated π-electron systems present in these molecules. Different methods of subsetting the aromatic ring carbons prior to model generation are investigated and compared. Further developments regarding the evaluation and use of partial simulated spectra are presented, as is an examination of spectral simulation accuracy for selected subgroups of the aromatic compounds used in this study.
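As a rough illustration of the parametric model form described in the Introduction, the sketch below fits S = b0 + b1X1 + ... + bdXd by ordinary least squares. It is not the ADAPT implementation (which is written in FORTRAN and uses stepwise regression with descriptor screening); the function names are hypothetical and the descriptor matrix and shifts are synthetic values chosen only so the fit can be checked.

```python
import numpy as np

def fit_shift_model(X, shifts):
    """Least-squares fit of S = b0 + b1*X1 + ... + bd*Xd.

    X      : (n_atoms, d) array of descriptor values
    shifts : (n_atoms,) array of observed chemical shifts, ppm
    Returns the coefficient vector [b0, b1, ..., bd].
    """
    X = np.asarray(X, dtype=float)
    # Prepend a column of ones so that b0 (the intercept) is fitted too.
    design = np.column_stack([np.ones(len(X)), X])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(shifts, dtype=float), rcond=None)
    return coeffs

def predict_shift(coeffs, x):
    """Predicted shift (ppm) for one carbon with descriptor vector x."""
    return float(coeffs[0] + np.dot(coeffs[1:], x))

# Synthetic demonstration data (NOT taken from the paper).
X_demo = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0], [0.0, 2.0]])
true_b = np.array([120.0, 5.0, -2.0])
shifts_demo = true_b[0] + X_demo @ true_b[1:]

b = fit_shift_model(X_demo, shifts_demo)
```

Because the synthetic shifts are generated without noise, the fitted coefficients should reproduce `true_b`; with real data the residuals would feed the standard error and F statistics reported later in the paper.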
EXPERIMENTAL SECTION A total of 74 alkyl-substituted aromatic compounds were used in this investigation. Thirty compounds were used as the reference
(training) set for the formation of chemical shift models, while the remaining 44 molecules served as a prediction set to test the
0003-2700/90/0362-1884$02.50/0 © 1990 American Chemical Society
ANALYTICAL CHEMISTRY, VOL. 62, NO. 17, SEPTEMBER 1, 1990
external predictive ability of the chemical shift models. The compound names and literature references for the NMR spectral data for the training and prediction set compounds are contained in Tables I and II, respectively. The spectra were recorded under a variety of experimental conditions, not described here. All the chemical shift values reported in this work are relative to tetramethylsilane (Me4Si). The chemical shift values for compound 29, originally reported by Ozubko et al. (38), were later updated by Jones and Shaw (39). The shifts in ref 39 were therefore used for compound 29. The chemical shift values for the methyl carbons in octamethylnaphthalene (compound 56) had not been reported in ref 41.
The structures of the 74 compounds used in this study were entered into computer disk files using graphics routines implemented as part of the ADAPT software system (45-47). The approximate three-dimensional coordinates for the atoms within each compound were generated by using an interactive molecular mechanics program (48). All of the computer programs used in this work are written in FORTRAN and are implemented on a Sun 4/110 Workstation. Tektronix PLOT-10 software provides the graphics capabilities for this study.

Table I. Training Set Compounds

no.  name                              ref
1    benzene                           28
2    1,2-dimethylbenzene               28
3    1,3-dimethylbenzene               28
4    1,2,4-trimethylbenzene            28
5    1,2,3,5-tetramethylbenzene        28
6    pentamethylbenzene                28
7    ethylbenzene                      28
8    isopropylbenzene                  28
9    1,3-di-tert-butylbenzene          28
10   naphthalene                       29
11   2-methylnaphthalene               29
12   1,3-dimethylnaphthalene           29
13   1,6-dimethylnaphthalene           29
14   1,8-dimethylnaphthalene           29
15   2,3,6-trimethylnaphthalene        30
16   1,4,6,7-tetramethylnaphthalene    30
17   1-isopropyl-2-methylnaphthalene   31
18   2-tert-butylnaphthalene           32
19   1-methylanthracene                33
20   9-methylanthracene                33
21   2,3-dimethylanthracene            34
22   1,4,9-trimethylanthracene         33
23   1,4,5,9-tetramethylanthracene     33
24   9-tert-butylanthracene            35
25   phenanthrene                      36
26   4-methylphenanthrene              36
27   pyrene                            37
28   triphenylene                      38
29   7,12-dimethylbenz[a]anthracene    39
30   dibenz[a,h]anthracene             38

Table II. Prediction Set Compounds

no.  name                              ref
31   1-methylbenzene                   28
32   1,4-dimethylbenzene               28
33   1,2,3-trimethylbenzene            28
34   1,3,5-trimethylbenzene            28
35   1,2,3,4-tetramethylbenzene        28
36   1,2,4,5-tetramethylbenzene        28
37   hexamethylbenzene                 28
38   n-propylbenzene                   28
39   n-butylbenzene                    28
40   sec-butylbenzene                  28
41   isobutylbenzene                   28
42   tert-butylbenzene                 28
43   1,2-di-tert-butylbenzene          28
44   1,4-di-tert-butylbenzene          28
45   1-methyl-4-tert-butylbenzene      40
46   1-methylnaphthalene               29
47   1,2-dimethylnaphthalene           29
48   1,4-dimethylnaphthalene           29
49   1,5-dimethylnaphthalene           29
50   1,7-dimethylnaphthalene           29
51   2,3-dimethylnaphthalene           29
52   2,6-dimethylnaphthalene           29
53   2,7-dimethylnaphthalene           29
54   2,3,5-trimethylnaphthalene        30
55   1,3,5,8-tetramethylnaphthalene    30
56   octamethylnaphthalene             41
57   1-tert-butylnaphthalene           32
58   2-neopentylnaphthalene            42
59   2-neopentyl-6-methylnaphthalene   42
60   anthracene                        33
61   2-methylanthracene                33
62   1,4-dimethylanthracene            33
63   1,8-dimethylanthracene            33
64   9,10-dimethylanthracene           33
65   2,7,9-trimethylanthracene         33
66   1,4,5,8-tetramethylanthracene     33
67   1,4,5,8,9-pentamethylanthracene   33
68   9-isopropylanthracene             31
69   1-methylphenanthrene              43
70   4,5-dimethylphenanthrene          36
71   tetracene                         44
72   benz[a]anthracene                 38
73   7-methylbenz[a]anthracene         38
74   dibenz[a,c]anthracene             38

RESULTS AND DISCUSSION Description of the Data Set. The 30 compounds in the training set include nine benzenes, nine naphthalenes, six anthracenes, two phenanthrenes, one pyrene, one triphenylene, one benz[a]anthracene, and one dibenz[a,h]anthracene and have substituents which include methyl, ethyl, isopropyl, and tert-butyl. The 44 prediction set molecules can be categorized as follows: 15 benzenes; 14 naphthalenes; 9 anthracenes; 2 phenanthrenes; 1 tetracene; 2 benz[a]anthracenes; and 1 dibenz[a,c]anthracene. The substituents include methyl, n-propyl, isopropyl, n-butyl, sec-butyl, isobutyl, tert-butyl, and neopentyl groups. Both linear and nonlinear aromatic ring systems are present. Prediction compounds 38-41, 58, and 59 contain alkyl substituents that were not included among the 30 reference set molecules. In addition, the aromatic ring systems present in compounds 71 and 74 of the prediction set
are not represented in the training set. The accuracy of spectral predictions for compounds containing structural attributes not present in the modeling phase of the study would help in evaluating the scope of the calculated regression equations.
Atom List Construction. The 30 compounds in the training set contain a total of 399 carbon atoms. One hundred twenty of these carbon atoms are structural duplicates and were removed from the model development process. Twelve of the remaining atoms, present in the ethyl, isopropyl, and tert-butyl side chains of compounds 7-9, 17, 18, and 24, were also removed from consideration. Reduction of the atom list in this manner prevents the introduction of statistical bias into the regression models arising from the presence of replicate carbons and those atoms insufficiently represented in the data set. This left a collection of 267 structurally unique carbon atoms (399 - 120 - 12) for the development of chemical shift models.
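The bookkeeping behind the atom list can be sketched as follows. This is only an illustration under stated assumptions: each carbon carries a hypothetical environment key that is identical for structurally duplicate atoms within a compound, and a hypothetical group label used to drop the side-chain carbons. How ADAPT actually detects structural duplicates is not described in this section, and the filtering order below is a simplification.

```python
def reduce_atom_list(atoms, excluded_groups):
    """Keep one carbon per (compound, environment) and drop excluded groups.

    atoms           : list of dicts with hypothetical keys
                      'compound', 'env', and 'group'
    excluded_groups : set of group labels to remove entirely
    """
    seen = set()
    kept = []
    for atom in atoms:
        if atom["group"] in excluded_groups:
            continue  # under-represented side-chain carbon
        key = (atom["compound"], atom["env"])
        if key in seen:
            continue  # structural duplicate of an atom already kept
        seen.add(key)
        kept.append(atom)
    return kept

# Illustrative input: the eight carbons of compound 2 (1,2-dimethylbenzene),
# which has four symmetry-distinct carbons, plus the two ethyl carbons of
# compound 7 (ethylbenzene). Environment labels are made up.
demo_atoms = (
    [{"compound": 2, "env": e, "group": g}
     for e, g in [("C1/C2", "ring"), ("C1/C2", "ring"),
                  ("C3/C6", "ring"), ("C3/C6", "ring"),
                  ("C4/C5", "ring"), ("C4/C5", "ring"),
                  ("CH3", "methyl"), ("CH3", "methyl")]]
    + [{"compound": 7, "env": "CH2", "group": "ethyl"},
       {"compound": 7, "env": "CH3(ethyl)", "group": "ethyl"}]
)
kept = reduce_atom_list(demo_atoms, excluded_groups={"ethyl", "isopropyl", "tert-butyl"})

# Totals quoted in the text: 399 carbons, 120 duplicates, 12 side-chain atoms.
remaining = 399 - 120 - 12
```

The same counts reappear later as the sizes of the atom subsets (231 ring carbons plus 36 methyl carbons).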
New Structural Parameters for Aromatic Compounds. Previous 13C NMR spectral simulation work in this laboratory has focused almost exclusively upon saturated molecules. The chemical shifts of aromatic carbons can be influenced by structural attributes over greater (interatomic) distances than for saturated bonding configurations. The modeling of chemical shifts in aromatic compounds therefore requires that the structural environments of carbon atoms be described such that features related to conjugated π-electron systems are encoded. Thus, in order to more fully characterize the effects upon chemical shifts arising from the presence of aromaticity, new structural descriptors are required. In this study four new types of descriptors were investigated and are presented in Table III.

Table III. New Structural Descriptors for Aromatic Compounds

Aromatic Shell Count Descriptors
AASC n     number of heavy atoms^a located in the nth spherical shell from the target carbon
           n    radial distance, Å
           1    2.3-3.1
           2    3.1-3.9
           3    3.9-4.6
           4    4.6-5.4
AHSC n     number of hydrogen atoms located in the nth spherical shell from the target carbon
           n    radial distance, Å
           1    0.0-1.5
           2    1.5-2.4
           3    2.4-3.0
           4    3.0-3.5
           5    3.5-4.3
           6    4.3-5.5

Carbon Bonding Configuration Type Descriptors
CBxx n^b   number of carbon atoms of bonding configuration type xx, located n bonds from the carbon center of interest
CxxD p^c   inverse throughspace distance, to power p, from the carbon center of interest to the closest carbon atom of bonding configuration type xx

Partial Atomic Charge Descriptors
CACG 1     the partial Abraham/Smith (49, 50) atomic charge on the carbon center of interest
AVAC n^d   the average of the Abraham/Smith atomic charges for atoms n bonds removed from the carbon center of interest
NTAC n     the sum of the Abraham/Smith atomic charges for atoms n bonds removed from the carbon center of interest
TOAC n     the sum of the absolute values of the Abraham/Smith atomic charges for atoms n bonds removed from the carbon center of interest
MPAC n     the most positive Abraham/Smith atomic charge for atoms n bonds removed from the carbon center of interest
MNAC n     the most negative Abraham/Smith atomic charge for atoms n bonds removed from the carbon center of interest

Continuous Electronic Environment Descriptors
WCGn^e p^f the sum of the partial sigma atomic charges, each divided by the number of bonds from the target carbon center raised to the p power, for all heavy atoms^g from 1 through n bonds away
WHKn p     the sum of the partial Hückel atomic charges, each divided by the number of bonds from the target carbon center raised to the p power, for all heavy atoms from 1 through n bonds away
WACn p     the sum of the partial Abraham/Smith atomic charges, each divided by the number of bonds from the target carbon center raised to the p power, for all heavy atoms from 1 through n bonds away

^a "Heavy atoms" denote all non-hydrogen atoms. For example, AASC 3 refers to the number of non-hydrogen atoms located in the spherical region 3.9-4.6 Å from the carbon center being described. ^b n ranges from 1 to 5 bonds. For example, CB08 2 is the number of carbons of bonding configuration type 8 (see Figure 1), located 2 bonds from the target carbon. ^c p can be 1, 2, or 3. For example, C11D 3 is the inverse cubed throughspace distance from the target carbon to the closest carbon of bonding configuration type 11 (see Figure 1). ^d n ranges from 1 to 5 bonds. ^e n ranges from 2 to 5 bonds. ^f p is the weighting factor (power), which ranges from 1 to 3. ^g "Heavy atoms" refer to all non-hydrogen atoms.

Aromatic Shell Count Descriptors. One type of structural parameter developed previously (17) was a count of the number of hydrogen or non-hydrogen atoms located between two boundaries (shells) at specified radial distances from the carbon being described. The shell boundaries were determined empirically by examining the natural clustering of bonding arrangements observed in histograms of interatomic distances. Interatomic distances in aromatic molecules are somewhat different from those observed in saturated ring systems, thus changing the bond geometry grouping patterns and requiring modification of the shell count boundaries. The upper portion of Table III shows the 10 new shell count descriptors developed for aromatic molecules. Correlation studies, using several sets of compounds having aromatic and saturated ring systems, indicated that these new aromatic shell count descriptors were encoding information significantly different from that of their nonaromatic predecessors.
Carbon Bonding Configuration Descriptors. Valency count descriptors have been found useful in earlier spectral simulation studies. These are counts of the number of atoms of a given connectivity (1°, 2°, 3°, and 4°) located at various topological bond distances from the target carbon. An atom's valency was defined by simply specifying the number of connections it possessed, regardless of bond type. Thus, for example, all of the following bonding configurations would be classified as secondary carbon atoms:
In order to explicitly encode information regarding unsaturations and the presence of aromaticity, it is necessary to specify both the valencies and the bond types of these connections. Thus, the precise bonding configuration of each carbon atom can be described in a more rigorous fashion. Figure 1 shows the 13 different carbon bonding configurations defined for use in developing two new types of structural descriptors. The dashed lines in the figure represent aromatic bonds. The first type of descriptor constructed by using this scheme is simply a count of the number of carbon atoms of each given bonding configuration located at different topological bond distances from the carbon atom being described. A second class of parameters generated by using this approach involves computations of inverse throughspace distances from the target carbon to the closest carbon of a given bonding configuration. The second section of Table III describes these two new groups of descriptors.

Figure 1. The thirteen different bonding configurations for carbon that were used as a basis for the development of new structural descriptors.

Partial Atomic Charge Descriptors. Earlier spectral simulation work has included the use of several descriptors based upon the partial σ charge of the carbon atom of interest and its surrounding atoms. To more fully characterize the electronic environments of carbon atoms in aromatic systems, new electronic descriptors are required. Recent work involving a set of polychlorinated biphenyl (PCB) compounds (24) has yielded a set of descriptors, analogous to the σ charge parameters, but which are based upon partial atomic charges calculated by using extended Hückel methods. To improve upon this, a new scheme for calculating partial atomic charges, based upon the method described by Abraham and co-workers (49, 50), has been implemented as part of the ADAPT software system. The partial atomic charge on each atom is computed as a sum of σ and π charge contributions, parameterized to yield accurate electric dipole moments for a wide variety of test compounds. The σ charge portion is based upon charge transfer due to differences in atomic orbital electronegativities and charge-dependent atomic polarizabilities. The π contribution is derived from the application of simple Hückel molecular orbital calculations to each conjugated π system in the molecule. This has yielded a new set of descriptors, analogous to previously existing electronic parameters, presented in the third section of Table III. These new descriptors were shown to be significantly different from the corresponding parameters based upon σ or extended Hückel partial atomic charges.
Continuous Electronic Environment Descriptors. All of the electronic descriptors developed up to this point have been functions of partial atomic charges computed at specified bond distances from the carbon center being described. Thus, these parameters encode electronic structural features in a stepwise, rather than a continuous, fashion. A new set of descriptors was developed which describe the electronic environment about each carbon atom in terms of a weighted summation of partial atomic charges on surrounding atoms. These descriptors are presented in the final segment of Table III. These parameters can encode information from two through five bonds away from the target carbon, and the topological weighting schemes allow the user to select the degree of influence that partial atomic charges on distant atoms have upon calculated descriptor values. The partial atomic charges used in these descriptors can be computed by using any one of the three available methods described above.
Carbon Atom Subdivision. Previous spectral simulation studies have indicated that improved results are obtained when carbon atoms in similar structural environments are grouped together and each atom subset is modeled separately.
By use of automated methods previously described (51), the pool of 267 unique carbons present in the training set was divided into four atom subsets as follows: (1) 231 aromatic ring carbons, (2) 36 methyl carbons, (3) 178 non-ring-bridging carbon atoms, and (4) 53 ring-bridging carbons. Atom groups 3 and 4 represent a subdivision of group 1 and allow alternate schemes to be investigated in the spectral assembly phase of the study, in particular whether it is necessary and useful to divide aromatic ring carbons into ring-bridging and non-ring-bridging groups in order to obtain accurate simulated spectra.
Model Construction and Evaluation. Topological, geometrical, and electronic descriptors were calculated for all 267 atoms in the training set and included the newly developed parameters described above. The dependent variable for regression was the chemical shift value corresponding to each
carbon atom. Prior to regression analysis, the descriptors were screened by using several statistical criteria described in previous spectrum simulation studies (17, 18). Separate regression models were developed for each of the four atom subsets. Stepwise multiple linear regression was performed (52, 53), coupled with an automated method of progressive descriptor deletion (17), designed to help improve the accuracy of the model equations. Numerous models were generated for each of the four atom subsets and were evaluated by using statistical criteria (52-56) focusing on outlier detection and collinearity diagnostics (54) and internal validation (55, 56). Further evaluation of the regression models was performed by examining the accuracy of complete simulated spectra using various combinations of models from the different atom subsets. Thus, the performance of the regression equations, in the context of other models, was used as a means of testing the models.
The best models for each of the four atom subsets are presented in Table IV. p is the number of parameters (number of descriptors plus the intercept) in each equation. The descriptor labels are defined and the mean, standard deviation, regression coefficient, and the mean effect (average measure of shielding or deshielding contribution) upon the predicted chemical shifts are shown for each descriptor in the model. All four of the models contain topological, geometrical, and electronic parameters, highlighting the importance of using all three fundamental classes of structural descriptors. Several of the newly developed descriptors are included in the four models, demonstrating the utility of these parameters for characterizing chemical shifts in aromatic compounds. Five of the ten descriptors appearing in model 1 encode structural features involving hydrogen atoms and only two of the descriptors explicitly characterize electronic effects.
The non-ring-bridging carbon equation (model 3) shares these properties, as only one electronic measure is present and five of the nine parameters, including CB08 1, are related to hydrogen. Model 4, however, contains three electronic descriptors, while hydrogen atom influence among the descriptors is not present. Thus, the modeling of chemical shifts for ring-bridging carbons requires the encoding of structural features quite different from those of their nonbridging counterparts. It is also noteworthy that all four models contain descriptors based upon molecular connectivity.
A summary of several important statistics for each of the four atom subgroups and their corresponding regression models appears in Table V. The low, high, mean, and standard deviation of the chemical shift values are shown for each group of n carbons. The standard errors of models 1, 3, and 4 are on the order of 1 ppm, while that of the methyl carbons is only 0.49 ppm. The multiple correlation coefficients for models 1-3 indicate a high degree of fit between predicted and observed chemical shifts, and the F values of all the models are well above their respective statistical cutoffs. The results for the ring-bridging carbons (group 4) indicate difficulty in modeling the chemical shifts for this atom subset. The narrow range of chemical shift values (10.50 ppm) for these 53 atoms requires that descriptors used to model the shifts be extremely sensitive to subtle structural differences within a molecule. Further work will be required to develop new structural parameters to more fully characterize the local chemical environment of atoms of this type.
Generation of Complete Simulated Spectra. To further investigate the utility of calculated regression equations, the predicted chemical shifts from different atom subsets were combined to yield complete simulated 13C NMR spectra.
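The combination step can be pictured as routing each carbon to the regression model for its atom subset and pooling the predicted shifts. The sketch below is a minimal illustration of that idea only; the function name, the subset labels, the single descriptor `x`, and the two toy linear models are all placeholders and do not correspond to the equations in Table IV.

```python
def assemble_spectrum(atoms, models):
    """Build a simulated spectrum from per-subset shift models.

    atoms  : list of (subset_label, descriptors) pairs, one per unique carbon
    models : dict mapping subset_label -> callable(descriptors) -> shift in ppm
    Returns the predicted shifts sorted in ascending order.
    """
    return sorted(models[subset](desc) for subset, desc in atoms)

# Placeholder models with made-up coefficients, for illustration only.
demo_models = {
    "ring":   lambda d: 128.0 + 9.0 * d["x"],
    "methyl": lambda d: 20.0 + 2.0 * d["x"],
}
demo_atoms = [
    ("ring",   {"x": 0.0}),
    ("ring",   {"x": 1.0}),
    ("methyl", {"x": 0.5}),
]
spectrum = assemble_spectrum(demo_atoms, demo_models)
```

Swapping the `models` dictionary is all that is needed to compare alternative groupings of the ring carbons, for example one model for all ring carbons versus separate models for bridging and nonbridging carbons.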
Table IV. Chemical Shift Models for Aromatic Compounds

Model 1: All Aromatic Ring Carbons (subset 1)
p    descriptor^a   mean       SD^b      coefficient      mean effect, ppm
1    CHCG 1         -0.0193    0.0558    55.1 ± 7.0       -1.06 ± 0.14
2    HRDB 2         0.0900     0.0779    83.1 ± 3.2       7.48 ± 0.29
3    NNHY 1         1.66       1.23      13.7 ± 0.8       22.7 ± 1.3
4    HXI3 3         0.0124     0.0289    36.4 ± 4.8       0.451 ± 0.060
5    HXT3 2         0.0854     0.0855    9.83 ± 2.97      0.839 ± 0.254
6    ICNC 1         0.717      0.150     -78.8 ± 5.2      -56.5 ± 3.7
7    ICON 3         1.89       0.667     4.55 ± 0.37      8.60 ± 0.70
8    HHT3 2         0.0843     0.0781    30.6 ± 3.8       2.58 ± 0.32
9    AASC 1         4.75       1.46      1.47 ± 0.24      6.98 ± 1.14
10   MPAC 1         -0.0356    0.0205    41.4 ± 7.0       -1.47 ± 0.25
11   intercept                           138              138

Model 2: Methyl Carbons (subset 2)
p    descriptor^a   mean       SD^b      coefficient      mean effect, ppm
1    HXI3 2         0.0605     0.0230    -169 ± 7         -10.2 ± 0.4
2    ARD3 4         0.0397     0.0283    127 ± 5          5.04 ± 0.20
3    MPCG 2         -0.00115   0.01531   79.4 ± 9.2       -0.0913 ± 0.0106
4    NTAC 4         -0.178     0.096     22.7 ± 2.6       -4.04 ± 0.46
5    TCNC 4         3.19       0.46      2.89 ± 0.54      9.22 ± 1.72
6    intercept                           20.4             20.4

Model 3: Nonbridging Ring Carbons (subset 3)
p    descriptor^a   mean       SD^b      coefficient      mean effect, ppm
1    TCNC 2         1.61       0.38      8.33 ± 0.51      13.4 ± 0.8
2    CB08 1         1.12       0.69      4.96 ± 0.29      5.56 ± 0.32
3    HHI3 1         0.0573     0.0503    -45.5 ± 4.0      -2.61 ± 0.23
4    C11D 3         0.126      0.152     -10.8 ± 1.0      -1.36 ± 0.13
5    MPCG 1         -0.00513   0.01700   71.6 ± 11.3      -0.367 ± 0.058
6    HHI3 3         0.0453     0.0979    -13.0 ± 1.4      -0.589 ± 0.063
7    ACON 3         0.416      0.041     14.1 ± 2.9       5.87 ± 1.21
8    AHSC 5         1.71       1.38      0.533 ± 0.096    0.911 ± 0.164
9    AHSC 4         1.34       1.16      0.508 ± 0.131    0.681 ± 0.176
10   intercept                           107              107

Model 4: Bridging Ring Carbons (subset 4)
p    descriptor^a   mean       SD^b      coefficient      mean effect, ppm
1    WAC4 2         -0.229     0.018     -74.1 ± 13.6     17.0 ± 3.1
2    MPAC 1         -0.0190    0.0046    151 ± 43         -2.87 ± 0.82
3    ACON 4         0.404      0.169     -3.20 ± 0.82     -1.29 ± 0.33
4    CXVD 1         1.43       0.20      -1.76 ± 0.71     -2.52 ± 1.02
5    WHK4 1         -0.180     0.051     -7.21 ± 3.32     1.30 ± 0.60
6    intercept                           120              120

^a Descriptor definition ("heavy atoms" denote all non-hydrogen atoms). Topological: ACON 3, the molecular connectivity index computed over bonds three bonds from the carbon center divided by the number of bonds three bonds away; ACON 4, the molecular connectivity index computed over bonds four bonds from the carbon center divided by the number of bonds four bonds away; CB08 1, the number of carbons of bonding configuration type 8 located one bond from the carbon center; ICNC 1, the valence-corrected molecular connectivity index computed over bonds one bond from the carbon center; ICON 3, the molecular connectivity index computed over bonds three bonds from the carbon center; NNHY 1, the total number of hydrogen atoms attached to heavy atoms one bond from the carbon center; TCNC 2, the sum of the valence-corrected molecular connectivity index terms from one through two bonds from the carbon center; TCNC 4, the sum of the valence-corrected molecular connectivity index terms from one through four bonds from the carbon center. Electronic: CHCG 1, the Hückel charge on the carbon center; MPAC 1, the most positive Abraham/Smith charge among heavy atoms one bond from the carbon center; MPCG 1, the most positive σ charge among heavy atoms one bond from the carbon center; MPCG 2, the most positive σ charge among heavy atoms two bonds from the carbon center; NTAC 4, the sum of the Abraham/Smith charges for heavy atoms four bonds from the carbon center; WAC4 2, the sum of the Abraham/Smith charges, each divided by the number of bonds from the carbon center squared, for all heavy atoms from one through four bonds away; WHK4 1, the sum of the Hückel charges, each divided by the number of bonds from the carbon center, for all heavy atoms from one through four bonds away. Geometrical: AASC 1, the number of heavy atoms contained in a spherical shell 2.3-3.1 Å from the carbon center; AHSC 4, the number of hydrogen atoms contained in a spherical shell 3.0-3.5 Å from the carbon center; AHSC 5, the number of hydrogen atoms contained in a spherical shell 3.5-4.3 Å from the carbon center; ARD3 4, the sum of the inverse cubed throughspace distances from the carbon center to heavy atoms four bonds away; C11D 3, the inverse cubed throughspace distance from the carbon center to the closest carbon atom of bonding configuration type 11; CXVD 1, the van der Waals energy due to interactions between the carbon center and other heavy atoms; HHI3 1, the average of the sum of the inverse cubed throughspace distances from the hydrogens attached to the carbon center to hydrogens three bonds away; HHI3 3, the average of the sum of the inverse cubed throughspace distances from the hydrogens attached to the carbon center to hydrogens five bonds away; HHT3 2, the average of the sum of the inverse cubed throughspace distances from the hydrogens attached to the carbon center to all hydrogens from three to four bonds away; HRDB 2, the sum of the inverse cubed throughspace distances from the carbon center to hydrogens attached to heavy atoms two bonds away; HXI3 2, the average of the sum of the inverse cubed throughspace distances from hydrogens attached to the carbon center to heavy atoms four bonds away; HXI3 3, the average of the sum of the inverse cubed throughspace distances from hydrogens attached to the carbon center to heavy atoms five bonds away; HXT3 2, the average of the sum of the inverse cubed throughspace distances from hydrogens attached to the carbon center to all heavy atoms from three to four bonds away. ^b SD = standard deviation.

Two schemes were used to construct simulated spectra for the 30 reference set compounds and to evaluate the need for separate treatment of ring-bridging and nonbridging aromatic ring carbons. Models 1 and 2 were used in combination as model set A, while model set B comprised models 2, 3, and 4. The residual mean square (rms) error between each simulated spectrum and its corresponding observed spectrum was computed for all 30 compounds in the training set. The mean rms error for model set A was 1.16 ppm, while model set B yielded an average rms spectral error of 1.13 ppm.
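A spectral error of this kind can be sketched as below. Two assumptions are made that the text does not spell out: "rms" is taken here as the conventional root-mean-square difference, and simulated and observed peaks are paired by sorting both lists, which requires equal peak counts. The function names and the example peak lists are hypothetical and are not data from the paper.

```python
import math

def rms_error(simulated, observed):
    """Root-mean-square difference (ppm) between two peak lists.

    Peaks are paired after sorting, an assumption made for this sketch.
    """
    if len(simulated) != len(observed):
        raise ValueError("peak lists must have the same length")
    sim, obs = sorted(simulated), sorted(observed)
    squared = [(s - o) ** 2 for s, o in zip(sim, obs)]
    return math.sqrt(sum(squared) / len(squared))

def mean_rms(spectrum_pairs):
    """Average rms error over (simulated, observed) pairs, one per compound."""
    errors = [rms_error(sim, obs) for sim, obs in spectrum_pairs]
    return sum(errors) / len(errors)

# Made-up peak lists for illustration.
sim_demo = [20.0, 126.0, 130.0]
obs_demo = [21.0, 125.0, 130.0]
single = rms_error(sim_demo, obs_demo)
average = mean_rms([(sim_demo, obs_demo), (obs_demo, obs_demo)])
```

Computing `mean_rms` once per model set gives a single figure of merit per scheme, which is how the 1.16 and 1.13 ppm values above are compared.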
A computer-assisted library searching procedure was employed to further evaluate spectral simulation accuracy. The spectral library consisted of 261 spectra of a wide variety of six-membered ring aromatic compounds. It included the 74 compounds used in this study, along with related substituted benzenes and PAHs containing various combinations of alkyl,
ANALYTICAL CHEMISTRY, VOL. 62, NO. 17, SEPTEMBER 1, 1990
1889
Table V. Summary of Chemical Shift Data and Model Statistics

Observed chemical shifts (ppm)
set    low       high      mean      SDᵃ
1      118.30    150.60    129.10    4.57
2       13.70     27.20     20.29    3.11
3      118.30    150.60    128.40    4.92
4      124.90    135.40    131.50    1.66

Model statistics
model    n      dᵇ    R        R(adj)ᶜ    Fᵈ     sᵉ
1        231    10    0.965    0.963      294    1.23
2         36     5    0.989    0.988      271    0.49
3        178     9    0.968    0.966      275    1.27
4         53     5    0.845    0.830       23    0.93

ᵃ SD = standard deviation. ᵇ d = number of descriptors in the model. ᶜ R(adj) = multiple correlation coefficient, adjusted for degrees of freedom. ᵈ F = F value for statistical significance of the model. ᵉ s = standard error of the estimate, ppm.
hydroxy, and halogen substituents. Also included were the spectra of 49 PCBs. The search was performed such that the top five spectral matches were retrieved, and it utilized the Euclidean distance metric to assess the degree of similarity between simulated and observed spectra. In addition, the procedure allowed for the comparison of a simulated spectrum containing n peaks with all observed spectra having n, n + 1, n + 2, or n + 3 peaks. This provides for the evaluation of spectral prediction accuracy in cases where the simulated spectrum is not complete, due to (actual) 13C NMR resonances not included in development of the parametric model equations. These peak difference corrections were employed for compounds 7-9, 17, 18, and 24, since the carbon atoms found in the ethyl, isopropyl, and tert-butyl side chains had been removed from consideration prior to model generation. Twenty-eight of the 30 reference set compounds were retrieved as the best spectral match using model set A. Compound 9 (1,3-di-tert-butylbenzene) was selected as the fifth best match to its simulated spectrum, while dibenz[a,h]anthracene (30) was chosen as the second closest match to its corresponding predicted spectrum. Model set B generated simulated spectra which allowed 27 of 30 compounds to be selected as the top match. Compounds 7, 9, and 30 were chosen as the third, second, and second best match to their corresponding predicted spectra, respectively. Thus, the simulated spectrum for a compound was almost always more similar to the authentic spectrum for that molecule than to the spectra of structurally similar compounds. As was the case in an earlier study (22), the library search results for many compounds degraded if the number of spectral peaks excluded was more than that required to compensate for atoms not included in the modeling process.
Partial Simulated Spectra. Earlier simulation work has addressed the importance and use of partial simulated spectra (20, 22).
It was of interest to investigate further the use of simulated subspectra by applying them to the aromatic compounds in this study. The chemical shifts of aromatic ring carbons generally are in the 110-160 ppm range (relative to Me4Si), while methyl carbon resonances in aromatic molecules fall in a quite different region of the spectrum, usually in the 10-35 ppm range. To more fully determine the accuracy of the models developed for the aromatic ring carbons, without the influence of methyl carbon information, simulated subspectra were created, such that the methyl carbon resonances were excluded. Library searching, using an abbreviated model set A (model 1 only), correctly retrieved 20 of the 30 reference compounds as the top spectral match, using the appropriate peak difference corrections, as needed. Twenty-nine of 30 were among the top five spectral matches. The modified model set B (models 3 and 4 only) selected 19 of 30 compounds as the closest spectral match, and 28 of 30 among the best five spectral matches. Thus, it is evident that while the aromatic ring carbon models yield acceptable predicted subspectra, the methyl carbon model equations play an important role in generating accurate simulated spectra for this class of molecules.
Spectral Simulation for Prediction Set Compounds.
Table VI. Residual Mean Square Spectral Errors for Prediction Compounds Having Unique Substituents

            rms spectral error, ppm
compound    model set A    model set B
38          2.25           1.56
39          1.57           1.95
40          2.53           1.61
41          3.00           1.41
58          1.65           1.78
59          1.49           1.76
Simulated 13C NMR spectra were generated for the 44 prediction compounds. These molecules had not been used in the formation of the regression models and therefore served to test the external predictive ability of the model equations. The mean rms spectral error for the 44 prediction compounds, using model set A, was 1.63 ppm. Compound 70 (4,5-dimethylphenanthrene) had an extremely large rms error of 10.97 ppm, indicating great difficulty in predicting the spectrum for this compound. By use of model set B, an average rms error of 1.36 ppm was obtained, and the rms error for compound 70 was only 2.62 ppm. Library searching was performed for the prediction compounds, using the same spectral library described above, and employing peak difference corrections for compounds 38-45, 57-59, and 68. Compounds 39 and 40 required corrections for n + 4 peaks and would therefore not be expected to yield satisfactory spectral matches. They did not. Model set A retrieved 30 of 44 compounds as the best spectral match, selecting 38 of 44 among the top five matches. Model set B chose 41 of 44 compounds within the top five matches, retrieving 28 of 44 compounds as the top selection. Thus, the calculated chemical shift models yielded simulated spectra of sufficient accuracy to allow for the correct identification of the majority of external prediction compounds. This supports the use of this spectrum simulation approach as an aid for solving structure elucidation problems. The chemical shifts for the methyl carbon atoms in compound 56, not reported in the literature, were predicted by using model 2. Chemical shift values of 22.96 and 15.63 ppm were predicted for the methyl carbons in the 1 and 2 positions, respectively. The rms spectral errors for compounds 71 and 74, which have ring systems not represented in the reference set, were 1.00 and 1.19 ppm, respectively, using model set A, and were 0.59 and 1.33 ppm, respectively, employing model set B.
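The library-search evaluation applied to the training and prediction compounds (Euclidean distance, top five matches, and a tolerance for library spectra with extra peaks) can be sketched as follows. The text does not state how peaks are paired when the library spectrum has more peaks than the simulated one; this sketch assumes an order-preserving pairing that minimizes the distance, found by dynamic programming. The function names, the `max_extra_peaks` parameter, and the dictionary-based library are hypothetical.

```python
import math


def spectral_distance(simulated, library_peaks):
    """Smallest Euclidean distance between the sorted simulated peaks and
    an order-preserving selection of equally many library peaks.

    Assumption: unmatched library peaks are simply skipped at no cost.
    """
    sim, lib = sorted(simulated), sorted(library_peaks)
    n, m = len(sim), len(lib)
    if m < n:
        return math.inf
    # prev[j]: minimal sum of squares pairing the first i-1 simulated
    # peaks with peaks chosen from the first j library peaks.
    prev = [0.0] * (m + 1)
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(i, m + 1):
            pair = prev[j - 1] + (sim[i - 1] - lib[j - 1]) ** 2
            skip = cur[j - 1]  # leave library peak j unmatched
            cur[j] = min(pair, skip)
        prev = cur
    return math.sqrt(prev[m])


def library_search(simulated, library, max_extra_peaks=0, top=5):
    """Return the names of the `top` closest library spectra having between
    n and n + max_extra_peaks peaks, where n = len(simulated)."""
    n = len(simulated)
    hits = []
    for name, peaks in library.items():
        if n <= len(peaks) <= n + max_extra_peaks:
            hits.append((spectral_distance(simulated, peaks), name))
    hits.sort()
    return [name for _, name in hits[:top]]
```

Under these assumptions, a compound counts as correctly retrieved when its own library entry is the first name returned, and as a top-five match when it appears anywhere in the returned list.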
Table VI presents the rms spectral errors, using both model sets, for prediction of 38-41, 58, and 59, which contain alkyl substituents not present in any of the reference structures. These results suggest that both model sets are useful in generating reasonably accurate simulated spectra for compounds containing structural features not included in the modeling process. It appears that the models are more sensitive to the introduction of new alkyl substituents than to changes in the aromatic ring structure of the molecules. Simulated subspectra were generated for the external prediction compounds, leaving out the methyl carbon resonances. Library searches were performed, yielding 32 of 44 and 37 of 44 compounds chosen among the five best matches, and only 17 and 18 structures were selected as the top spectral matches, using abridged model sets A and B, respectively. Clearly, the methyl carbon models appear to be very important in generating accurate simulated spectra for external prediction, or unknown, compounds.
Comparison of Both Atom Subsetting Schemes. Both model sets A and B performed about equally well in terms of rms spectral prediction error and library search results for the reference set compounds. Model set A slightly outperformed set B in library searches but had a larger average rms spectral error. Separating the ring carbon atoms into ring-bridging and non-ring-bridging groups (model set B) yielded more accurate simulated spectra (and subspectra) for the prediction set molecules than did treating all the ring carbons together (model set A). Using model set A requires the use of only 2 equations and the calculation of 15 descriptors, while model set B uses 3 models and 19 total parameters. Thus, it appears that treating ring-bridging and nonbridging aromatic ring carbons separately during model formation improves the external predictive ability of the equations, even though either approach works equally well for fitting observed chemical shift data.

Table VII. Residual Mean Square Spectral Errors for Individual Compound Classes

Training Set
model set    benzenes (9)ᵃ     naphthalenes (9)    anthracenes (6)    H- or -CH3 (24)
A            1.24ᵇ (1.30)ᶜ     1.04 (1.12)         1.28 (1.35)        1.00 (1.08)
B            1.23 (1.35)       0.90 (0.97)         1.01 (1.07)        0.98 (1.07)

Prediction Set
model set    benzenes (15)     naphthalenes (14)   anthracenes (9)    H- or -CH3 (32)
A            1.53 (1.52)       1.38 (1.43)         1.36 (1.44)        1.52 (1.59)
B            1.61 (1.65)       1.26 (1.25)         1.11 (1.16)        1.18 (1.20)

ᵃ The number of compounds in each category. ᵇ The residual mean square spectral error (ppm) using complete simulated spectra. ᶜ The residual mean square spectral error using partial simulated spectra, where the methyl carbon resonances are excluded.
Spectral Prediction Accuracy for Specific Compound Groups. Further characterization of the chemical shift models requires an examination of the variation of spectral prediction accuracy for groups of compounds containing specified ring skeletal structures. Also, it is important to evaluate the effect of the presence of non-methyl alkyl substituents upon spectrum simulation accuracy. Table VII presents a breakdown of the average rms spectral error for the benzenes, naphthalenes, and anthracenes, along with that for the methyl and nonsubstituted aromatic compounds. Results for both model sets, using full and partial (methyl carbon resonances withheld) simulated spectra, are presented for both the reference and prediction compounds. The rms errors for the phenanthrenes and molecules with larger PAH ring systems are not shown, since the number of compounds in these categories is too few to provide statistically meaningful results. Model set A appears to work slightly better than set B for alkyl-substituted benzenes, especially for external prediction. This is interesting, since there are no ring-bridging carbons in benzenes, yet model 1, which includes ring-bridging carbons, performs slightly better than models 3 and 4 for predicting the shifts of these carbons. Model set B yields lower mean rms spectral prediction errors for both reference and prediction naphthalenes and anthracenes. The reduction in rms error when model set B is used in place of set A is about the same for both the training and prediction sets for both groups of molecules. When both model sets are applied to only methyl- or unsubstituted compounds, model set B yields more accurate predicted shifts, with the difference being more pronounced for the external prediction compounds. Further examination of Table VII reveals that the naphthalenes have the lowest mean rms spectral prediction error for the training set, while the anthracenes are the most accurately predicted external prediction compounds. Also, comparing the mean rms error for each subgroup of compounds to that of all 30 reference or 44 prediction molecules shows that, in general, spectral prediction accuracy falls off for benzenes but improves for naphthalenes and anthracenes. This suggests that future simulation studies involving diverse sets of aromatic molecules consider treating molecules with benzene ring systems separately from their fused-ring counterparts. A similar comparison for methyl- and unsubstituted aromatics shows that the effects upon chemical shifts due to the presence of non-methyl alkyl substituents may not have been encoded as completely as those for methyl substitution. In addition, model set B appears to be more greatly influenced by the presence of non-methyl alkyl substituents than equation set A. This indicates that increased caution should be exercised when selecting and applying these regression models to compounds containing large alkyl substituents.
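The class-by-class comparison above reduces to simple differences between the Table VII entries. The short sketch below tabulates the change in mean rms spectral error (complete simulated spectra) on going from model set A to model set B; the values are transcribed from Table VII, while the dictionary layout and function name are illustrative only.

```python
# Mean rms spectral errors (ppm), complete simulated spectra, from Table VII.
TABLE_VII = {
    "training": {
        "benzenes": {"A": 1.24, "B": 1.23},
        "naphthalenes": {"A": 1.04, "B": 0.90},
        "anthracenes": {"A": 1.28, "B": 1.01},
        "H- or -CH3": {"A": 1.00, "B": 0.98},
    },
    "prediction": {
        "benzenes": {"A": 1.53, "B": 1.61},
        "naphthalenes": {"A": 1.38, "B": 1.26},
        "anthracenes": {"A": 1.36, "B": 1.11},
        "H- or -CH3": {"A": 1.52, "B": 1.18},
    },
}


def set_b_minus_set_a(table):
    """Change in mean rms error (ppm) when set B replaces set A.

    Negative values mean that model set B is the more accurate one.
    """
    return {
        subset: {
            group: round(errors["B"] - errors["A"], 2)
            for group, errors in groups.items()
        }
        for subset, groups in table.items()
    }


if __name__ == "__main__":
    for subset, groups in set_b_minus_set_a(TABLE_VII).items():
        for group, delta in groups.items():
            print(f"{subset:10s} {group:13s} {delta:+.2f} ppm")
```

The differences for naphthalenes (-0.14 and -0.12 ppm) and anthracenes (-0.27 and -0.25 ppm) are nearly the same in the training and prediction sets, whereas for methyl- or unsubstituted compounds the improvement grows from -0.02 to -0.34 ppm, in line with the discussion above.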
CONCLUSIONS
This work has shown the successful development of accurate predictive model equations and simulated 13C NMR spectra for a variety of alkyl-substituted benzenes and polycyclic aromatic compounds. Several new types of structural descriptors were developed and were shown to be useful in characterizing structural attributes relating to the conjugated π systems present in these molecules. Alternate approaches for the treatment of aromatic ring carbons in the model generation process were tested and compared both in terms of fit to existing chemical shift data and for utility in external prediction. It appears that treating bridging and nonbridging ring carbons separately yields more accurate simulation results for prediction purposes. The use of partial simulated spectra was further investigated and was used to assist in the selection of appropriate model sets, as well as to help characterize the accuracy of individual ring carbon equations. The spectral prediction accuracy for various subsets of the compounds in the reference and prediction sets was examined and used to help evaluate the models and provided insight regarding the future development and application of predictive chemical shift models for these types of compounds. The next phase of this work involves the treatment of increasingly more diverse collections of aromatic molecules, containing various combinations of alkyl, hydroxy, and halogen substituents, along with different ring systems. Further work in this area will also include the development and improvement of descriptors to more completely describe carbon atom structural environments in aromatic molecules. New descriptors have already been created to characterize carbons in molecules simultaneously containing a variety of heteroatoms and substituent types, and these will require evaluation.
Of particular interest is the continued development of new electronic structural parameters, and work is currently in progress aimed at increasing the number and types of electronic features included in the calculation of such descriptors.
ACKNOWLEDGMENT
Debra S. Egolf, of Marietta College, is gratefully acknowledged for developing the computer programs used to calculate the electronic descriptors based upon extended Hückel calculations and for her comments regarding this project. Steven L. Dixon is also acknowledged for implementing the software used to generate the partial atomic charges used to develop several of the new descriptors presented in this study.
LITERATURE CITED
(1) Ewing, D. F. Org. Magn. Reson. 1979, 12, 499-524.
(2) Johnels, D.; Edlund, U.; Johansson, E.; Wold, S. J. Magn. Reson. 1983, 55, 316-321.
(3) Rudolf, M.; Jordis, U. Chemom. Intell. Lab. Syst. 1989, 5, 323-327.
(4) Newmark, R. A. Comput. Chem. 1986, 10, 223-228.
(5) Nelson, G. L.; Williams, E. A. Prog. Phys. Org. Chem. 1976, 12, 229-342.
(6) Bloor, J. E.; Breen, D. L. J. Phys. Chem. 1968, 72, 716-722.
(7) Sardella, D. J. J. Am. Chem. Soc. 1976, 98, 2100-2104.
(8) Martin, G. J.; Martin, M. L.; Odiot, S. Org. Magn. Reson. 1975, 7, 2-17.
(9) Kalchhauser, H.; Robien, W. J. Chem. Inf. Comput. Sci. 1985, 25, 103-108.
(10) Milne, G. W. A.; Zupan, J.; Heller, S. R.; Miller, J. A. Org. Magn. Reson. 1979, 12, 289-296.
(11) Grant, D. M.; Paul, E. G. J. Am. Chem. Soc. 1964, 86, 2984-2989.
(12) Lindeman, L. P.; Adams, J. Q. Anal. Chem. 1971, 43, 1245-1252.
(13) Ejchart, A. Org. Magn. Reson. 1980, 13, 368-371.
(14) Ejchart, A. Org. Magn. Reson. 1981, 15, 22-24.
(15) Bernassau, J. M.; Fetizon, M.; Maia, E. A. J. Phys. Chem. 1986, 90, 6129-6134.
(16) Bastard, J.; Bernassau, J. M.; Bertranne, M.; Maia, E. R. Magn. Reson. Chem. 1988, 26, 992-1002.
(17) Small, G. W.; Jurs, P. C. Anal. Chem. 1983, 55, 1121-1127.
(18) Small, G. W.; Jurs, P. C. Anal. Chem. 1983, 55, 1128-1134.
(19) Small, G. W.; Jurs, P. C. Anal. Chem. 1984, 56, 2307-2314.
(20) Egolf, D. S.; Jurs, P. C. Anal. Chem. 1987, 59, 1586-1593.
(21) Egolf, D. S.; Brockett, E. B.; Jurs, P. C. Anal. Chem. 1988, 60, 2700-2706.
(22) Sutton, G. P.; Jurs, P. C. Anal. Chem. 1989, 61, 863-871.
(23) Ranc, M. L.; Jurs, P. C. Anal. Chem. 1989, 61, 2489-2496.
(24) Egolf, D. S. Computer-Aided Carbon-13 Nuclear Magnetic Resonance Spectrum Simulation Investigations. Ph.D. Dissertation, The Pennsylvania State University, University Park, PA, 1988.
(25) McIntyre, M. K.; Small, G. W. Anal. Chem. 1987, 59, 1805-1811.
(26) Small, G. W.; McIntyre, M. K. Anal. Chem. 1989, 61, 666-674.
(27) Barber, A. S.; Small, G. W. Anal. Chem. 1989, 61, 2658-2664.
(28) Breitmaier, E.; Voelter, W. Carbon-13 NMR Spectroscopy, 3rd ed.; VCH: New York, 1987.
(29) Wilson, N. K.; Stothers, J. B. J. Magn. Reson. 1974, 15, 31-39.
(30) Dalling, D. K.; Ladner, K. H.; Grant, D. M.; Woolfenden, W. R. J. Am. Chem. Soc. 1977, 99, 7142-7150.
(31) Ernst, L.; Mannschreck, A. Chem. Ber. 1977, 110, 3258-3265.
(32) Kitching, W.; Bullpitt, M.; Gartshore, D.; Adcock, W.; Khor, T. C.; Doddrell, D.; Rae, I. D. J. Org. Chem. 1977, 42, 2411-2418.
(33) Caspar, M. L.; Stothers, J. B.; Wilson, N. K. Can. J. Chem. 1975, 53, 1958-1969.
(34) Gobert, F.; Combrisson, S.; Platzer, N.; Ricard, M. Org. Magn. Reson. 1976, 8, 293-298.
(35) Bullpitt, M.; Kitching, W.; Adcock, W.; Doddrell, D. J. Organomet. Chem. 1976, 116, 161-185.
(36) Stothers, J. B.; Tan, C. T.; Wilson, N. K. Org. Magn. Reson. 1977, 9, 408-413.
(37) Hansen, P. E.; Poulsen, O. K.; Berg, A. Org. Magn. Reson. 1975, 7, 475-477.
(38) Ozubko, R. S.; Buchanan, G. W.; Smith, I. C. P. Can. J. Chem. 1974, 52, 2493-2501.
(39) Jones, D. W.; Shaw, J. D. Magn. Reson. Chem. 1985, 23, 787-789.
(40) Johnson, L. F.; Jankowski, W. C. Carbon-13 NMR Spectra; J. Wiley and Sons: New York, 1972.
(41) Hansen, P. E. Org. Magn. Reson. 1979, 12, 109-142.
(42) Bullpitt, M.; Kitching, W.; Adcock, W.; Doddrell, D. J. Organomet. Chem. 1976, 116, 187-198.
(43) Berger, S.; Zeller, K. P. Org. Magn. Reson. 1978, 11, 303-307.
(44) Storek, W.; Sauer, J.; Stoder, R. Z. Naturforsch. 1979, 34A, 1334-1343.
(45) Brugger, W. E.; Jurs, P. C. Anal. Chem. 1975, 47, 781-783.
(46) Stuper, A. J.; Jurs, P. C. J. Chem. Inf. Comput. Sci. 1976, 16, 99-105.
(47) Rohrbaugh, R. H.; Jurs, P. C. UDRAW; Quantum Chemistry Exchange, Program 300, 1988.
(48) Stuper, A. J.; Brugger, W. E.; Jurs, P. C. Computer Assisted Studies of Chemical Structure and Biological Function; Wiley-Interscience: New York, 1979; pp 83-90.
(49) Abraham, R. J.; Griffiths, L.; Loftus, P. J. Comput. Chem. 1982, 3, 407-416.
(50) Abraham, R. J.; Smith, P. E. J. Comput. Chem. 1988, 9, 288-297.
(51) Small, G. W.; Jurs, P. C. Anal. Chem. 1984, 56, 1314-1323.
(52) Draper, N. R.; Smith, H. Applied Regression Analysis, 2nd ed.; Wiley-Interscience: New York, 1981.
(53) Neter, J.; Wasserman, W.; Kutner, M. H. Applied Linear Statistical Models, 2nd ed.; Richard D. Irwin: Homewood, IL, 1985.
(54) Belsley, D. A.; Kuh, E.; Welsch, R. E. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity; Wiley-Interscience: New York, 1980.
(55) Allen, D. M. Technical Report No. 23, 1971; Department of Statistics, University of Kentucky, Lexington, KY.
(56) Snee, R. D. Technometrics 1977, 19, 415-427.
RECEIVED for review March 8, 1990. Accepted May 17, 1990. This work was supported by the National Science Foundation under Grant CHE-8815785. The Sun 4/110 Workstation was purchased with partial financial support of the National Science Foundation. Portions of this paper were presented at the 41st Annual Pittsburgh Conference and Exposition on Analytical Chemistry and Applied Spectroscopy, New York, NY, March 1990.
CORRESPONDENCE
Chromatographic Detection of Interaction between Polyoxyethylene and Nitrous Acid
Sir: Multidentate cyclic and acyclic complexing agents have been well-investigated from various points of view (1-4). Most researchers have focused on the selectivity toward cations (1-3). It is well-known that the selectivity of these complexing agents originates in a match of the cavity size with the crystalline size of a cation, the nature of donor atoms, facility in building up the coordination shell at the optimum distance required by a cation, etc. Developing new compounds showing unique selectivity according to these bases is still a topic of fundamental importance. On the other hand, despite the requirements, only a few compounds have been known to be selective toward anionic compounds (4, 5). Continued efforts are being made to seek anion-selective agents. Polyoxyethylenes (POE), which form complexes with some metal cations similar to crown ethers, have been regarded as acyclic cation-selective complexing agents (6). The reaction of POEs with anionic compounds has not been reported except for the case where a POE-metal complex forms an ion pair with a counteranion (6). In this contribution, I would like to show that POE interacts with nitrous acid. This interaction results in unique selectivity in chromatography and solvent extraction.
EXPERIMENTAL SECTION
The chromatographic system was composed of a computer-controlled pump, CCPM or CCPD (Tosoh Co.), a Rheodyne sample injection valve equipped with a 100-μL sample loop, a column oven (CO-8000, Tosoh), a conductometric detector (CM-8000, Tosoh), a UV-visible detector (UV-8000, Tosoh), and a refractive index detector (830-RI, Jasco). A separation column was an Inertsil ODS 2-T (particle size, 5 μm; 4.6 mm i.d. × 150