
Anal. Chem. 1988, 60, 2700-2706

Simulation of Carbon-13 Nuclear Magnetic Resonance Spectra of Methyl-Substituted Norbornan-2-ols

Debra S. Egolf, Elizabeth B. Brockett, and Peter C. Jurs*

152 Davey Laboratory, Chemistry Department, Penn State University, University Park, Pennsylvania 16802

13C NMR spectra were simulated for a set of 42 methyl-substituted norbornan-2-ol compounds using linear equations that relate chemical shift values to descriptors encoding atom-based structural features. These equations were generated through a multiple linear regression procedure. The bridged backbone structure of this class of compounds is rigid but highly strained, which in turn affects the electromagnetic environments of these molecules. We developed new torsional descriptors in an attempt to partially encode the effect of strain on chemical shift resonance. These descriptors, as well as several distance, σ charge, van der Waals, and simple topological descriptors, were particularly useful in our chemical shift model equations. The simulated spectra were determined to be excellent representations of the measured spectra as judged through direct spectral comparisons as well as library search results.

Carbon-13 nuclear magnetic resonance spectroscopy (13C NMR) is a useful analytical tool for the structure elucidation of organic compounds due to the direct relationship between a carbon atom's local structural environment and its chemical shift value. However, because this relationship is not completely understood, 13C NMR spectral interpretation can be complex. Spectrum identification is often assisted through computer-aided searching of spectral libraries; however, appropriate reference spectra must be available for comparison. Other computer-based methods (1) useful in spectral interpretation include artificial intelligence, pattern recognition, and spectral simulation. Spectral simulation techniques enable the construction of approximate spectra for structures whose actual spectra are unavailable. The simulation approach can be implemented by using the following procedure: the chemist proposes candidate structures as potential genuine representations of the "unknown" compound, simulates their spectra, and compares these spectra with the actual measured spectrum. The method of spectral simulation under investigation in this research involves the development of linear parametric equations that relate calculated numerical atom-based structural descriptors to measured chemical shift values. These model equations have the simple, linear form

$$S = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_d X_d$$

where $S$ is the predicted chemical shift of a given carbon atom, the $X_i$ are the descriptor values, the $b_i$ are the coefficients determined through a multiple linear regression analysis of a set of observed chemical shift values ($b_0$ being the intercept), and $d$ indicates the number of descriptors in the model. This parametric approach to spectral simulation was developed and utilized by Grant and Paul (2) and Lindeman and Adams (3) in their studies of linear and branched alkanes. These initial studies involved the calculation of simple topological parameters to be used as descriptions of the local structural environments of carbon atoms.
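
As a minimal sketch of how a model of this linear form could be fitted and applied, the following Python solves the least-squares normal equations directly. It is not the authors' FORTRAN implementation; the function names and the toy descriptor values are hypothetical, and only the form of the equation (an intercept plus a weighted sum of descriptors) comes from the text.

```python
def predict_shift(coeffs, descriptors):
    """Evaluate S = b0 + sum(b_i * X_i) for one carbon atom."""
    intercept, *slopes = coeffs
    return intercept + sum(b * x for b, x in zip(slopes, descriptors))


def fit_linear_model(rows, shifts):
    """Least-squares coefficients [b0, b1, ..., bd] from the normal equations."""
    # A leading 1.0 in every row provides the intercept term b0.
    design = [[1.0] + [float(x) for x in row] for row in rows]
    m = len(design[0])
    # Augmented matrix [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(m)]
        + [sum(r[i] * y for r, y in zip(design, shifts))]
        for i in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(m):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][m] for i in range(m)]


# Toy data (hypothetical): shifts generated exactly by S = 10 + 2*X1 - 3*X2.
toy_descriptors = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
toy_shifts = [10.0, 12.0, 7.0, 9.0, 11.0]
coeffs = fit_linear_model(toy_descriptors, toy_shifts)
```

On the toy data the fit recovers the generating coefficients, and `predict_shift(coeffs, [3, 2])` evaluates the model for a new atom. A production fit would also need to guard against collinear descriptors, which make the normal equations singular.
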
Beierbeck and Saunders (4), followed by Whitesell and co-workers (5, 6), used conformer probabilities based upon steric interactions and rotational potentials to incorporate geometrical information into the estimation of chemical shift data for acyclic molecules. To implement this approach more extensively and effectively, an interactive computer system was developed (7, 8) that handles the calculation and manipulation of large numbers of structural descriptors. Topological, geometrical, and electronic representations of local atomic environments can be encoded as descriptors. We have demonstrated the successful application of this computational methodology to several structural classes of compounds: cyclohexanols and decalols (9), steroids (10), cyclopentanes and cyclopentanols (11), cyclohexanones and decalones (12), and norbornanols (the data set to be discussed in this paper). In addition, this methodology has been applied to a set of carbohydrate compounds (13). Because the direct cause and effect relationship between 13C NMR chemical shift resonance and chemical structure is unknown, this simulation methodology is designed and used for the approximation of chemical shift values with intramolecular structural features, rather than for the assessment of the quantitative significance of these features in causal relationships. Although certain factors (heteroatom substitution and steric crowding, for instance) clearly influence chemical shift resonance, other structural parameters are necessary for a thorough and accurate characterization of this measure of electromagnetic response. Thus, in an attempt to improve this simulation methodology, new and better means of describing chemical shift data are continually being explored. Topological and electronic descriptions of the atomic environment, as encoded in this simulation methodology, are typically depicted as through-bond relationships, whereas geometrical descriptors generally describe through-space relationships.
Topological descriptors encode counts of heteroatoms and atom valencies as well as more complex branching relationships expressed as weighted paths (14) or connectivity indices (15, 16). Electronic properties of atoms are described using Del Re σ charge calculations (17). Van der Waals energy descriptors can be regarded as electronic in nature. These descriptors, calculated with the parameters used in Allinger's MM2 molecular mechanics program (18), are also geometry dependent. The energies typically describe 1,4 or greater nonbonded atom-atom through-space interactions. Other geometrical descriptors are functions of atom-atom distances or torsional bond relationships. This paper will explore the utility of two new types of torsional descriptors in the simulation of chemical shift data for a set of norbornanol molecules.

EXPERIMENTAL SECTION

Figure 1 shows the 42 methyl-substituted norbornan-2-ol compounds used in this study. A random number generator was employed to divide the compounds into two sets. The set of compounds used in the formation of chemical shift models consists of structures 1-32 and the prediction set is composed of structures 33-42. The NMR spectra for all of the compounds were taken from data published by Stothers et al. (19). These spectra were collected on a Varian XL-100-15 system operating in the Fourier transform mode. Solutions of 10-15% (w/v) in deuteriochloroform

0003-2700/88/0360-2700$01.50/0 © 1988 American Chemical Society

ANALYTICAL CHEMISTRY, VOL. 60, NO. 24, DECEMBER 15, 1988


Figure 1. The structures of the 42 norbornan-2-ols used in the 13C NMR simulation studies. Compounds 1-32 compose the reference set and compounds 33-42 compose the prediction set.

were used and chemical shifts were measured relative to tetramethylsilane to a precision of ±0.05 ppm. The 42 chemical structures were entered into the computer disk files by using the graphical input procedure of the ADAPT software system (20, 21). Molecular modeling of the structures using an interactive molecular mechanics program (22) was performed to obtain approximate three-dimensional atomic coordinates. Following this, the MM2 program developed by Allinger (18, 23) was used to refine these coordinates. The computer programs used in this study were written in FORTRAN and implemented on a PRIME 750 computer operating in the Department of Chemistry at The Pennsylvania State University. Tektronix PLOT-10 software provided the graphics capabilities for these studies.

RESULTS AND DISCUSSION

The norbornane backbone structure, bicyclo[2.2.1]heptane, is a common occurrence in organic molecules. This structure can be characterized as consisting of a cyclohexane ring in the boat conformation with a methylene bridge linking carbons 1 and 4. Significant deviation from ideal sp3 hybridization geometries around the carbon atoms distorts the electronic framework within these molecules, thus inducing strain throughout the structure. To partially relieve this strain, these structures often twist up to 14° (24) such that the atoms in the cyclohexane ring are not completely eclipsed. Comprehensive investigations on the measurement and interpretation of 13C NMR spectral data for norbornane compounds were reported by several researchers (19, 25-27). Two recent books discuss the topic of NMR and stereochemistry as it relates to bicyclic compounds (28, 29).
A comprehensive discussion of NMR chemical shifts and coupling constants for several nuclei in bicyclic [2.2.1] and [2.2.2] compounds is presented in the book by Marchand (28), whereas 13C NMR chemical shifts and substituent effects for several classes of monocyclic and bicyclic compounds are presented in the book by Whitesell and Minton (29). Lippmaa et al. (26) proposed additivity rules to approximate 13C NMR chemical shifts for a series of norbornane compounds. Substituent parameters were derived for the carbon atoms in various bonding environments relative to the substituent groups. Stothers et al. (19) also suggest substituent parameters for norbornane molecules but indicate a poor fit when predicting shifts for carbon atoms when crowded δ interactions are present. According to Marchand (28), the mechanism of transmission of electronic substituent effects in bicyclic systems, as expressed in 13C NMR spectra, is unclear. Either a through-space field effect, a through-bond inductive effect, or a combination of the two is responsible for the transmission of substituent effects. A more general and complete discussion of substituent effects and their structural dependence is presented elsewhere (30). In essence, the degree of substitution in the α and β positions, as well as the orientations of the substituents, gives rise to the observed chemical shift values. Steric interactions of substituents increase the shielding of carbon atoms. Stothers et al. (19) relate this effect to the 1,4 torsional interactions between atoms.

Torsional Descriptors. We created new descriptors to incorporate this torsional effect into chemical shift models. Descriptors that encode counts of angles in standard torsional relationships (gauche, 60°; anti, 180°; eclipsed, 0° and 120°) were already present in our software. However, because the torsional angles in the norbornane structures cover a range of values, two types of continuous-variable descriptors were developed for this specific application. Stothers et al. (19) described a general trend of increased shielding with decreased torsional angle for norbornane compounds. To parametrize this effect, we implemented a steric or energy-type relationship as calculated by using strain functions developed by Wipke et al. (31) rather than assigning a direct linear function. Figure 2 illustrates this sinusoidal energy/torsional angle relationship.
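
The torsional strain descriptors can be illustrated with a short Python sketch. The torsional angle is computed from four atomic coordinates, converted to an energy, and the energies are then summarized per carbon atom. The threefold cosine term and the `barrier` value used here are assumptions chosen only to reproduce the qualitative shape described (eclipsed high, staggered low); they are not the strain functions of Wipke et al. (31), whose low-energy well is stated not to lie at exactly 60°. All function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def torsion_angle(p1, p2, p3, p4):
    """Unsigned torsional angle (degrees, 0-180) for the atom chain 1-2-3-4."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    b2_len = math.sqrt(_dot(b2, b2))
    m1 = _cross(n1, tuple(x / b2_len for x in b2))
    return abs(math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2))))


def strain_energy(angle_deg, barrier=3.0):
    """Assumed threefold cosine term: maximal at 0/120 deg, zero at 60/180 deg."""
    return 0.5 * barrier * (1.0 + math.cos(math.radians(3.0 * angle_deg)))


def torsional_strain_descriptors(angles_deg, barrier=3.0):
    """Total, average, largest, and smallest strain energy over a set of angles."""
    energies = [strain_energy(a, barrier) for a in angles_deg]
    return {
        "total": sum(energies),
        "average": sum(energies) / len(energies),
        "largest": max(energies),
        "smallest": min(energies),
    }
```

In use, `angles_deg` would hold every torsional angle that involves a given carbon atom, taken from the refined three-dimensional coordinates.
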
Eclipsed atom arrangements are weighted as high-energy interactions, while staggered atom arrangements are considered low-energy conformations. Although the low-energy well does not lie at exactly


Figure 2. A plot of the energy barrier as a function of torsional angle.

60°, Whitesell and Hildebrandt (5) point out that perfect gauche relationships do not produce the lowest energy conformations in molecules. The descriptors generated by using this relationship include total, average, largest, and smallest strain energies over all torsional angles for each carbon atom. Several researchers have described the interactions of vicinal hydrogens as affecting 13C NMR chemical shift values (25, 32, 33). Grutzner et al. (25) indicate that successive replacement of α or β hydrogens causes a 3 ppm change in chemical shift value. Whitesell and Minton (33) further assert that only anti-hydrogen interactions cause this downfield shift. Thus, to describe these effects, hydrogen torsional descriptors were developed. Descriptors that encode the counts of standard angles (0°, 60°, 120°, 180°) for vicinal hydrogens were generated, as well as descriptors that encode all torsional angle relationships. For this type of continuous-variable descriptor, the actual degree values of the angles are used for total, average, largest, and smallest angle calculations.

Characterization of the Data Set. All methyl-substituted norbornan-2-ol compounds for which measured 13C NMR chemical shift data were available in the literature were included in this data set. The 42 compounds contain zero to three methyl substituents: 2 compounds have no methyl substituents, 18 are monomethyl, 16 are dimethyl, and 6 are trimethyl. The numbers of statistically possible norbornan-2-ol isomers with zero to three methyl substituents are 2, 22, 110, and 330 compounds, respectively (464 in total). Therefore, this data set shows significant representation of the isomers containing zero or one methyl substituent, whereas only a sampling of all possible dimethyl and trimethyl isomers is present. Thus, 422 theoretical structures remain with no readily available 13C NMR spectra. This structural class illustrates an ideal case for the application of spectral simulation methodology.
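
The quoted isomer counts are consistent with a simple combinatorial reading, which is our assumption and is not stated in the text: 11 hydrogen positions on the norbornan-2-ol skeleton are available for methyl substitution, and the hydroxyl group at C2 can be exo or endo. A short Python check under that assumption:

```python
from math import comb

# Assumed: C-H positions open to methyl substitution
# (C1: 1, C2: 1, C3: 2, C4: 1, C5: 2, C6: 2, C7: 2).
METHYL_SITES = 11
# Assumed: exo or endo hydroxyl at C2.
HYDROXYL_ORIENTATIONS = 2


def isomer_count(n_methyl):
    """Number of norbornan-2-ol isomers bearing n_methyl methyl groups."""
    return HYDROXYL_ORIENTATIONS * comb(METHYL_SITES, n_methyl)


counts = [isomer_count(k) for k in range(4)]   # zero to three methyl groups
total_possible = sum(counts)
unmeasured = total_possible - 42               # 42 compounds have measured spectra
```

Under these assumptions `counts` reproduces the four values given in the text and `unmeasured` reproduces the 422 structures without available spectra.
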
A high degree of structural similarity is present within this set of compounds. For all but two of the compounds, at least one other geometrical isomer is present in the data set. For example, eight isomers of 5,6-dimethyl substitution (compounds 12-16, 36-38) exhibit identical topological properties. However, chemical shift data are highly geometry dependent; topologically equivalent atoms differ by as much as 15.4 ppm (atom 6 in compounds 13 and 15) in chemical shift. Thus, parametric model equations must be very sensitive to the geometrical differences between topologically equivalent molecules. The torsional angle, through-space distance, and van der Waals descriptors are useful for the elucidation of these differences.

Review of a Qualitative Model. Univariate linear regression models derived by Li and Chesnut (34) relate van der Waals energies to chemical shift data for 57 carbon atoms from a set of norbornane structures. These models have the form

$$\delta_i = b_k + c_{\mathrm{vdw}} E_{\mathrm{vdw},i}$$

where $\delta_i$ is the observed chemical shift for nucleus $i$, $k$ is the class of the nuclei, $b_k$ is the class or bonding constant, $c_{\mathrm{vdw}}$ is the slope, and $E_{\mathrm{vdw},i}$ is the local van der Waals energy of the nucleus. The five atom classes in this example include primary carbons,

secondary carbons at position 7, other secondary carbons, tertiary bridging carbons, and other tertiary carbons. The bonding constant was calculated for each class while a constant slope, $c_{\mathrm{vdw}}$, was maintained. A root mean square error of 3.5 ppm is calculated for the 57 carbon nuclei. Somewhat better fits of the data can be obtained if the slope is allowed to vary for the individual classes. These results indicate a strong correlation between van der Waals energy and chemical shift resonance. This relationship is further illustrated for several alkane classes and is demonstrated for other resonating nuclei as well (34, 35). The researchers' goal was to obtain a good qualitative, physical interpretation of chemical shift resonance, rather than an optimized quantitative relationship, which can be achieved through multivariate approaches. In general, chemical shift is dependent upon both diamagnetic and paramagnetic contributions. The paramagnetic term is the major contributor to shift changes; therefore, the van der Waals effect on the paramagnetic term can be used to explain the observed correlation of properties. The van der Waals energy is composed of attractive and repulsive terms. The attractive potential causes an increase in shielding, typically observed in γ atom relationships. On the other hand, repulsive interactions are deshielding and can be observed as a β effect. These deshielding effects are generally larger than shielding effects because repulsive interactions are much greater than the attractive interactions (35). Commonly smaller in magnitude than γ effects, δ effects are usually shielding as well. However, especially in the syn-diaxial case, shift effects cannot completely be attributed to van der Waals interactions (34). Most of the observed substituent effects for the norbornane carbon atoms can be readily explained by using this qualitative, physically interpretable, linear model.

Construction of Quantitative Models.
Although qualitative interpretations of chemical shift data are valuable for our understanding of chemical shift mechanisms, our goal is to derive linear models that yield accurate simulated spectra. Therefore, numerous combinations of parameters in linear equations were computationally evaluated in order to obtain effective simulations. To begin the investigation, the 13C NMR spectra for the 32 compounds of the reference set were entered and stored in computer disk files. Due to the asymmetric nature of 2-hydroxylnorbornane compounds, each carbon atom in a given structure of the data set gives rise to a distinct chemical shift value. Thus, the atom list is composed of all 274 carbon atoms present in these structures. The dependent variable for the regression analyses was formed from the chemical shift values associated with these carbon atoms. In order for the chemical shift models to be focused on the description of small chemical shift differences between atoms with similar chemical environments, the atom list can be subdivided into atom groups based on a broad atom classification property such as carbon atom connectivity (2). This property divides the atom list into the following five groups: (1) primary, (2) secondary, (3A) tertiary without an attached hydroxyl group, (3B) tertiary with an attached hydroxyl group, and (4) quaternary carbon atoms. These groups contain 50, 95, 82, 30, and 17 atoms, respectively. A sixth group (4A) was created by using just the 15 carbon atoms present in quaternary group 4 that do not have an attached hydroxyl group. Compounds 31 and 32 possess the only two quaternary atoms with an attached hydroxyl substituent. The reasons for the inclusion of group 4A will be discussed below. Topological, geometrical, and electronic descriptors were calculated for all of the carbon atoms. For each atom group, the descriptors were statistically evaluated in order to eliminate those descriptors that contained minimal or redundant information. Chemical shift models were developed for all six atom groups using the remaining descriptors and the dependent variable. The "best" model for each group, as deduced from a variety of statistical tests (36-39), is presented in Table I. The number of parameters in the model, the descriptor labels, and the mean, standard deviation, regression coefficient, and mean effect on the predicted chemical shifts are given for the descriptors of each atom group. The statistical significance of each descriptor in the models was determined by calculating its t value (an absolute value ≥ 4 is sufficient to ensure statistical significance). All of the variables in the models met this statistical criterion. Observation of these models shows the importance of torsional angle type descriptors. Torsional information is present in all six models shown here. Models for groups 1 and 4 each contain a descriptor that encodes the number of 60° torsional angles in which each carbon atom is involved. The models for groups 3A and 3B each include one of the hydrogen torsional descriptors, whereas the models for groups 2, 3A, and 4A each include one of the torsional strain descriptors. Other properties prevalent in the models include van der Waals energies; hydrogen, oxygen, and lone pair distances; σ charges; and atom connectivities.

Model Evaluation. A more detailed inspection of the descriptors in these chemical shift models reveals some interesting trends. The information encoded in the descriptors should give some indication of the important interactions that affect chemical shift resonance for carbon atoms in these types of structural environments. The descriptors listed for each model in Table I are ranked in order of relative importance. As indicated in Li and Chesnut's model, van der Waals energy is correlated with chemical shift resonance.
Likewise, in our first three model equations, van der Waals descriptors (CHVD 1, COET 1 (40), OSTR 1, CXVD 1, COEL 1 (40), HXVD 1, and TCVD 1) play key roles. These descriptors primarily encode information associated with γ or greater interactions between atoms. Two classes of through-space distance descriptors are represented in the first five model equations: continuous-variable distance descriptors (HRD3 2, LPD3 3 (40), HH13 1, HX13 2, HLD3 3 (40), HH13 3, and ARD3 1) and discrete-variable shell count descriptors (HSHL 5, HSHL 3, and ASHL 1). HRD3 2 and LPD3 3 encode relationships between γ atoms (or lone pairs attached to γ atoms), while HX13 2 and HLD3 3 encode γ information from the perspective of the α hydrogen. HH13 3 encodes both γ and δ information and HH13 1 describes the relationships of vicinal hydrogen atoms. Additionally, higher values of HH13 3 indicate the close approach of hydrogen atoms, as well as numerous H-H interactions. ARD3 1 (in the model for group 4 carbons) simply provides bond length information for the four atoms attached to the quaternary carbon; primarily, this differentiates the atoms with and without attached hydroxyl groups. The shell count descriptors encode the number of hydrogens or non-hydrogens located in a region of space bounded by spherical shells at predetermined radii from the carbon atom. In the model for group 2 carbons, HSHL 3 appears to be related to the number of hydrogens located γ-gauche to the carbon, while HSHL 5 primarily includes the number of more distant hydrogens located δ or, as methyl hydrogens, ε to the carbon. Similarly, for the carbons of group 3A, HSHL 5 gives a count of distant hydrogen atoms. ASHL 1, for the carbons of group 3B, is related to the number of methyl groups located γ in the endo position on carbon 6 or in the syn position on carbon 7. The torsional descriptors were described earlier as pertaining to 1,4 torsional relationships between either the carbon


center and another non-hydrogen atom or between vicinal hydrogen atoms. Models for groups 1, 2, 3A, 4, and 4A incorporate the first type of torsional descriptor, while the models for groups 3A and 3B include the hydrogen torsional descriptors. TTAS 1, in the model for group 2, indicates differences between secondary atoms of the bridging methylene and secondary atoms in the less strained backbone cyclohexane ring, whereas ATAS 1, in the model for group 3A, differentiates broadly between the two general types of tertiary atoms and provides information relative to connectivity and orientations of γ substituents. In the model for group 4A, ATAS 1 gives some measure of congestion near the carbon atom and that atom's location within the molecule. NTOR 60 also appears to provide information regarding location. The hydrogen torsional descriptors furnish information regarding relative orientation of substituents (as well as the hydrogen atoms themselves) and the number of substituents geminal to each hydrogen. The remaining descriptors include topological (NNOX 3, ICNC 1, and AVC3 2) and charge descriptors (TOCG 3 and MPCG 2). NNOX 3 provides a count of the number of oxygens located γ to the carbon (in this case, the descriptor can only take on the values of 0 or 1). AVC3 2 encodes the number of tertiary carbon atoms located β to the carbon center. ICNC 1 encodes the connectivity of the bonds attached to the carbon center. The latter two descriptors give a direct measure of crowding near the carbon center. For example, ICNC 1 is the descriptor with the highest influence in the model for group 3A carbons and appears to differentiate between tertiary bridging carbons and all other tertiary carbons. The charge descriptors, TOCG 3 and MPCG 2, encode the total charge γ and the most positive charge β to the carbon atoms, respectively. These charge descriptors provide significant structural information.
For example, TOCG 3 in the model for group 3B carbon atoms indicates the number of γ carbon atoms as well as the degree of methyl substitution at carbon 5. Effects observed experimentally are apparently reflected in these chemical shift models as well. In general, γ relationships between atoms appear to have a significant influence in these model equations. In particular, all of the descriptors in the model for the primary carbons, group 1, encode information associated predominantly with γ interactions between atoms. Also, interpretation of the information contributed by ASHL 1 (and to a lesser extent HH13 3) in the model for the tertiary carbons with an attached hydroxyl substituent, group 3B, agrees with an analysis of δ interactions presented by Stothers et al. (19). They indicate that syn-axial substitution (such as shown with the hydroxyl and methyl substituents in compound 15, at carbons 2 and 6) or similar interaction (in compound 11, atoms 2 and 7) produces a large deshielding effect in the chemical shifts of the carbons bearing the substituents, as well as in the shift of the methyl carbon in the δ relationship. Finally, the vicinal hydrogen relationship to chemical shift, as described earlier, is apparent in models that include the hydrogen torsional descriptors and/or HH13 1. A summary of the model statistics is presented in Table II. The low, high, and mean observed chemical shift values are given for each atom group with an associated standard deviation. The statistics included are n, the number of atoms in each group; d, the number of variables (descriptors) in the regression model; R, the multiple correlation coefficient for the predicted versus observed chemical shifts; R(adj), R adjusted for the degrees of freedom; s, the standard error of estimate for the predicted chemical shifts (in parts per million); and F, the F value for the statistical significance of the regression model.
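
The statistics listed above can be computed from observed and predicted shifts as sketched below in Python. The formulas are the standard textbook definitions for a least-squares model with an intercept; the paper does not spell out its exact expressions, so treat them as assumptions. The function names are hypothetical.

```python
import math


def model_statistics(observed, predicted, d):
    """Return n, d, R, R(adj), s, and F for a d-descriptor linear model."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual sum of squares
    sst = sum((o - mean_obs) ** 2 for o in observed)              # total sum of squares
    dof = n - d - 1                                               # residual degrees of freedom
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / dof
    return {
        "n": n,
        "d": d,
        "R": math.sqrt(r2),
        "R_adj": math.sqrt(r2_adj),
        "s": math.sqrt(sse / dof),
        "F": ((sst - sse) / d) / (sse / dof),
    }


def adjusted_r(r, n, d):
    """Adjust a multiple correlation coefficient R for n atoms and d descriptors."""
    return math.sqrt(1.0 - (1.0 - r * r) * (n - 1) / (n - d - 1))
```

With these definitions, `adjusted_r(0.993, 50, 7)` rounds to 0.992 and `adjusted_r(0.987, 95, 10)` rounds to 0.985, in agreement with the R(adj) values reported for groups 1 and 2.
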


Table I. Chemical Shift Models

p   desc^a     mean      std dev   coefficient        mean effect, ppm

Primary Carbons (Group 1)
1   CHVD 1     0.311     0.326     13.1 ± 0.7         4.06 ± 0.23
2   HRD3 2     0.147     0.062     33.5 ± 3.3         4.93 ± 0.49
3   LPD3 3     0.0289    0.0379    -51.1 ± 7.9        -1.48 ± 0.22
4   COET 1     0.0196    0.0540    34.1 ± 3.4         0.669 ± 0.066
5   OSTR 1     0.0207    0.0173    -66.7 ± 11.0       -1.38 ± 0.23
6   NTOR 60    0.640     0.875     -0.836 ± 0.177     -0.535 ± 0.113
7   TOCG 3     0.427     0.209     5.01 ± 1.20        2.14 ± 0.51
8   intercept                      11.2               11.2

Secondary Carbons (Group 2)
1   HH13 1     0.124     0.034     -217 ± 11          -26.9 ± 1.3
2   CHVD 1     0.239     0.202     7.95 ± 1.16        1.90 ± 0.28
3   NNOX 3     0.505     0.503     -5.69 ± 0.87       -2.87 ± 0.44
4   CXVD 1     0.261     0.199     19.8 ± 2.5         5.16 ± 0.65
5   MPCG 2     0.0379    0.0475    49.9 ± 8.5         1.89 ± 0.32
6   HX13 2     0.0451    0.0186    -301 ± 26          -13.6 ± 1.2
7   TTAS 1     10.4      4.6       0.657 ± 0.094      6.84 ± 0.98
8   HSHL 5     3.98      2.04      1.13 ± 0.14        4.48 ± 0.56
9   HSHL 3     3.16      1.05      -2.22 ± 0.31       -7.03 ± 0.98
10  COEL 1     0.0118    0.1009    -13.2 ± 2.3        -0.156 ± 0.027
11  intercept                      63.6               63.6

Tertiary Carbons without Attached Hydroxyl (Group 3A)
1   ICNC 1     1.18      0.08      -66.3 ± 2.1        -78.2 ± 2.5
2   HLD3 3     0.0195    0.0333    -85.8 ± 3.6        -1.67 ± 0.07
3   HXVD 1     0.0549    0.2783    6.68 ± 0.76        0.367 ± 0.042
4   HSHL 5     2.24      1.46      0.839 ± 0.096      1.88 ± 0.21
5   SHTA 1     38.5      19.0      0.0376 ± 0.0086    1.45 ± 0.33
6   TCVD 1     -0.0520   0.3501    8.60 ± 1.23        -0.447 ± 0.064
7   ATAS 1     2.86      0.82      2.71 ± 0.46        7.75 ± 1.33
8   intercept                      111                111

Tertiary Carbons with Attached Hydroxyl (Group 3B)
1   HH13 1     0.222     0.052     -72.0 ± 3.2        -16.0 ± 0.7
2   ASHL 1     1.30      0.47      3.36 ± 0.52        4.34 ± 0.68
3   HH13 3     0.0475    0.0354    -71.6 ± 7.1        -3.40 ± 0.34
4   AHTA 1     58.3      10.0      0.0996 ± 0.0168    5.81 ± 0.98
5   TOCG 3     0.140     0.082     17.4 ± 3.0         2.44 ± 0.42
6   intercept                      83.8               83.8

Quaternary Carbons (Group 4)
1   ARD3 1     1.10      0.03      434 ± 21           478 ± 23
2   NTOR 60    2.12      0.99      -2.73 ± 0.58       -5.79 ± 1.22
3   intercept                      -422               -422

Quaternary Carbons without Attached Hydroxyl (Group 4A)
1   ATAS 1     2.59      1.04      5.15 ± 0.35        13.3 ± 0.9
2   AVC3 2     0.933     0.458     -3.81 ± 0.79       -3.56 ± 0.73
3   intercept                      34.3               34.3

^a Descriptor definitions: AHTA 1, sum of vicinal hydrogen torsional angles of hydrogens attached to the carbon center divided by the number of angles; ARD3 1, sum of inverse cubed through-space distances from the carbon center to heavy atoms attached to the carbon center; ASHL 1, number of heavy atoms located from 2.7 to 3.4 Å from the carbon center; ATAS 1, sum of strain energies of torsional angles involving the carbon center divided by the number of angles; AVC3 2, number of tertiary atoms located two bonds from the carbon center; CHVD 1, van der Waals energy due to interactions between the carbon center and hydrogens in the molecule; COEL 1, van der Waals energy due to interactions between the carbon center and oxygen atoms ≥3 bonds away; COET 1, van der Waals energy due to interactions between the carbon center and oxygen atoms ≥2 bonds away; CXVD 1, van der Waals energy due to interactions between the carbon center and other heavy atoms; HH13 1, average of sum of inverse cubed through-space distances from the hydrogens attached to the carbon center to hydrogens two bonds from the carbon center; HH13 3, average of sum of inverse cubed through-space distances from hydrogens attached to the carbon center to hydrogens four bonds from the carbon center; HLD3 3, average of sum of inverse cubed through-space distances from hydrogens attached to the carbon center to lone pairs attached to oxygens four bonds away; HRD3 2, sum of inverse cubed through-space distances from the carbon center to hydrogens three bonds away; HSHL 3, number of hydrogens located from 2.4 to 3.2 Å from the carbon center; HSHL 5, number of hydrogens located from 3.6 to 5.0 Å from the carbon center; HX13 2, average of sum of inverse cubed through-space distances from the hydrogens attached to the carbon center to heavy atoms three bonds from the carbon center; HXVD 1, van der Waals energy due to interactions between hydrogens attached to the carbon center and heavy atoms in the molecule; ICNC 1, corrected molecular connectivity index computed for bonds attached to the carbon center; LPD3 3, sum of inverse cubed through-space distances from the carbon center to lone pairs attached to oxygens three bonds away; MPCG 2, most positive σ charge of atoms located two bonds from the carbon center; NNOX 3, number of oxygen atoms located three bonds from the carbon center; NTOR 60, number of 60° torsional angles involving the carbon center; OSTR 1, van der Waals energy of the oxygen atom divided by the distance from the oxygen to the carbon center; SHTA 1, smallest vicinal hydrogen torsional angle of hydrogens attached to the carbon center; TCVD 1, total van der Waals energy of the carbon center; TOCG 3, sum of the absolute values of σ charges for atoms three bonds away from the carbon center; TTAS 1, sum of strain energies of torsional angles involving the carbon center.
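
The "mean effect" column of Table I appears to be the product of each regression coefficient and the mean value of its descriptor. This is our reading of the tabulated numbers, not a definition given in the text. The Python sketch below checks that reading for the primary carbon model (group 1), using values transcribed from Table I; the variable names are hypothetical.

```python
# (descriptor, mean, coefficient, reported mean effect in ppm) for group 1, Table I
GROUP1 = [
    ("CHVD 1", 0.311, 13.1, 4.06),
    ("HRD3 2", 0.147, 33.5, 4.93),
    ("LPD3 3", 0.0289, -51.1, -1.48),
    ("COET 1", 0.0196, 34.1, 0.669),
    ("OSTR 1", 0.0207, -66.7, -1.38),
    ("NTOR 60", 0.640, -0.836, -0.535),
    ("TOCG 3", 0.427, 5.01, 2.14),
]
INTERCEPT = 11.2

# Assumed definition: mean effect = coefficient * descriptor mean.
mean_effects = {name: mean * coef for name, mean, coef, _ in GROUP1}

# Intercept plus all mean effects gives the model's prediction at the mean
# descriptor values, which for a least-squares fit is the mean observed shift.
predicted_mean_shift = INTERCEPT + sum(mean_effects.values())
```

Each computed product agrees with the reported mean effect to within rounding, and `predicted_mean_shift` comes out near 19.6 ppm, the mean observed shift listed for group 1 in Table II.
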

The R values for all of the models are very high, indicating good fits to the observed data. The standard errors for three of the models fall below 1.0 ppm, while five of the six models

have errors below 1.5 ppm. The model standard error of 2.28 ppm for quaternary carbon atom group 4 signals that the range of the chemical shift values for this subset is much too large

ANALYTICAL CHEMISTRY, VOL. 60, NO. 24, DECEMBER 15, 1988


Table II. Summary of Model Statistics

         observed chemical shifts, ppm
group    low     high    mean    std dev    n     d     R       R(adj)   S      F
1        10.2    33.2    19.6     5.9       50     7    0.993   0.992    0.76   421
2        18.3    48.7    33.4     7.0       95    10    0.987   0.985    1.19   314
3A       24.1    52.9    42.3     6.4       82     7    0.991   0.991    0.88   598
3B       69.0    86.2    77.0     4.1       30     5    0.982   0.979    0.86   127
4        33.2    77.9    48.0    12.3       17     2    0.985   0.984    2.28   224
4A       33.2    49.4    44.0     5.5       15     2    0.975   0.973    1.33   114

for small shift differences to be adequately described with a two-variable model. However, the chemical shifts can be predicted to within 1.33 ppm by removing the only two quaternary carbon atoms with attached hydroxyl substituents and forming a new group (group 4A). The F values are excellent for all of the models.

Chemical shifts for each atom group were simulated with these linear models. To investigate the practicality of using these simulated shifts to represent real data, they can be merged to form complete simulated spectra for the reference compounds. Several tests can then be applied to evaluate the similarity of these spectra to their corresponding measured spectra. Therefore, the simulated chemical shifts from the models for atom groups 1, 2, 3A, 3B, and 4 were assembled for each compound to form a complete spectrum.

One measure of similarity is determined through calculation of a residual mean square (rms) error for the one-to-one comparison of a simulated spectrum with its corresponding observed spectrum. The mean rms error for the 32 reference compounds is 1.05 ppm, with low and high errors of 0.50 and 1.59 ppm, respectively.

A second measure of the quality of simulated spectra can be determined by performing a library search. In the search procedure, each simulated spectrum is compared to all spectra contained in the spectral library to determine the most similar measured spectrum. The spectral library, compiled from data in the literature, is composed of 521 spectra from several chemical classes, including norbornanes and norbornanols, cyclic and acyclic alkanes and alcohols, cyclic ketones, steroids, PCBs, and an assortment of small molecules. Various metrics can be used to measure spectral similarity; however, the squared Euclidean distance metric is sufficient in this case. Sorted chemical shift values of each simulated and library spectrum were compared, and the five library spectra having the smallest squared Euclidean distances for each simulated spectrum were recorded. For each of the 32 reference compounds, the library spectrum that was retrieved with the smallest squared Euclidean distance, when compared with the simulated spectrum, was its corresponding observed spectrum. Thus, in the library search, the correct spectrum was always retrieved as the most similar spectrum to the simulated spectrum.

The simulated chemical shifts generated from the model for quaternary atom group 4A can replace the shifts simulated with the model for atom group 4 in the simulated spectra. In this case, a mean rms error of 0.98 ppm for the observed versus simulated spectra of the 32 reference compounds, and high and low errors of 1.33 and 0.50 ppm, respectively, were calculated. Recall that the simulated spectra for compounds 31 and 32 are incomplete when atom subset 4A is used because no chemical shifts were simulated for their quaternary carbon atoms. Again a library search was performed to find the best spectral match for each simulated spectrum. For all 30 complete simulated spectra, the corresponding observed spectra were determined to be the best match.

Simulation of Prediction Set Spectra. Finally, to test the applicability of the chemical shift models to compounds similar to those of the reference set, chemical shifts can be simulated for the prediction set compounds. The 13C NMR chemical shifts for the external prediction compounds were entered and stored. Compounds 33-42 contain 88 carbon atom centers that were divided into five subsets for shift simulation. These subsets contain 18 primary, 27 secondary, 28 tertiary without hydroxyl, 10 tertiary with hydroxyl, and 5 quaternary carbon centers. None of the quaternary carbon atoms has attached hydroxyl substituents; thus, either quaternary model can be used to simulate their chemical shift data. The descriptors present in the six models were calculated for the prediction atom list and used to generate simulated spectra for the 10 compounds. These spectra have a mean rms value of 1.67 ppm when the simulated shifts generated by using model 4 are used for the quaternary carbon atoms and 1.62 ppm when shifts from model 4A are used. The low rms error of 0.97 ppm for compound 43 is calculated in both cases, while the high rms errors are 2.66 ppm (compound 41) and 2.42 ppm (compound 39) when shifts from models 4 and 4A, respectively, are applied. The increases in rms error over those for the reference compounds are not unusual as neither the structural nor chemical shift data for these compounds was used in the construction of the linear regression models. Library searches were performed for the simulated spectra of the prediction compounds. All of the simulated spectra were determined to be most similar to their corresponding observed spectra.
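The two similarity tests used in this section (a one-to-one rms comparison and a library search over sorted shifts with a squared Euclidean distance) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' software. The function names, the toy library, and all shift values are hypothetical. The rms error is assumed here to be the square root of the mean squared shift difference over matched carbon atoms, and spectra with different numbers of peaks are assumed never to match, a detail the text does not specify.

```python
import math


def rms_error(simulated, observed):
    """Root of the mean squared shift difference (ppm) over atom-matched shifts.

    Assumption: both lists give shifts for the same carbon atoms in the same order.
    """
    if len(simulated) != len(observed):
        raise ValueError("spectra must list the same carbon atoms")
    squared = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return math.sqrt(sum(squared) / len(squared))


def squared_euclidean(spectrum_a, spectrum_b):
    """Squared Euclidean distance between two spectra after sorting their shifts."""
    if len(spectrum_a) != len(spectrum_b):
        # Assumption: spectra with different peak counts are treated as non-matching.
        return math.inf
    return sum((a - b) ** 2 for a, b in zip(sorted(spectrum_a), sorted(spectrum_b)))


def library_search(simulated, library, n_hits=5):
    """Return up to n_hits (name, distance) pairs, closest library spectrum first."""
    scored = [(name, squared_euclidean(simulated, shifts))
              for name, shifts in library.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:n_hits]


# Made-up shifts (ppm) for illustration only; these are not data from the paper.
library = {
    "compound_A": [20.1, 24.9, 35.2, 44.0, 76.8],
    "compound_B": [18.0, 30.5, 38.9, 47.3, 80.2],
    "compound_C": [21.0, 22.5, 40.1],
}
simulated_A = [19.6, 25.4, 35.0, 44.9, 77.3]

hits = library_search(simulated_A, library)
print("best match:", hits[0][0])
print("rms error vs. compound_A (ppm):",
      round(rms_error(simulated_A, library["compound_A"]), 2))
```

In this sketch a simulation counts as successful when the first hit is the observed spectrum of the same compound, which mirrors the criterion applied to the reference and prediction compounds.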

CONCLUSIONS

We were able to develop linear model equations that relate 13C NMR chemical shift data to atom-based structural properties for the set of methyl-substituted norbornan-2-ol reference compounds described in this paper and, subsequently, use these models to simulate chemical shift data for both the reference and prediction compounds. These simulated spectra provide excellent representations of the actual measured spectra. Furthermore, the chemical shift models contain information consistent with other researchers' interpretations of the relationship between local atomic environments and chemical shift differences for compounds with the norbornane backbone. In addition, the torsional descriptors proved valuable in the discrimination of the geometrical structural environments of topologically identical carbon atoms for all six atom subsets. This allowed accurate identification of all 42 simulated spectra when compared with measured spectra. Therefore, spectra can also be simulated for those mono-, di-, and trimethyl compounds for which measured 13C NMR spectra do not currently exist. The accuracy of those simulations should approximate that of the simulated spectra for the 10 prediction compounds in this study. These chemical shift models may be applicable to more highly methylated norbornan-2-ol compounds; however, we are unable to verify whether this extrapolation of our results is warranted. The quaternary model developed for the carbon atoms without an attached hydroxyl substituent was found to more accurately simulate chemical shift data than the model developed for all quaternary atoms. However, the simulated data


from either model can be used in conjunction with the simulated shifts obtained from the other linear models to generate high-quality spectra for the compounds of this data set. Finally, these high-quality simulated spectra indicate that the effects of atom strain are suitably encoded by the model equations. These simulated spectra appear to be unique representations of the electromagnetic environments for the compounds of this structural class.

ACKNOWLEDGMENT

We wish to acknowledge G. W. Small, University of Iowa, for providing the software for oxygen and lone pair distance and van der Waals descriptor calculations.

LITERATURE CITED

(1) Small, G. W. Anal. Chem. 1987, 59, 535A-546A.
(2) Grant, D. M.; Paul, E. G. J. Am. Chem. Soc. 1964, 86, 2984-2990.
(3) Lindeman, L. P.; Adams, J. Q. Anal. Chem. 1971, 43, 1245-1252.
(4) Beierbeck, H.; Saunders, J. K. Can. J. Chem. 1980, 58, 1258-1265.
(5) Whitesell, J. K.; Hildebrandt, B. J. Org. Chem. 1985, 50, 4975-4978.
(6) Whitesell, J. K.; LaCour, T.; Lovell, R. L.; Pojman, J.; Ryan, P.; Yamada-Nosaka, A. J. Am. Chem. Soc. 1988, 110, 991-996.
(7) Smith, D. H.; Jurs, P. C. J. Am. Chem. Soc. 1978, 100, 3316-3321.
(8) Small, G. W.; Jurs, P. C. Anal. Chem. 1983, 55, 1121-1127.
(9) Small, G. W.; Jurs, P. C. Anal. Chem. 1983, 55, 1128-1134.
(10) Small, G. W.; Jurs, P. C. Anal. Chem. 1984, 56, 2307-2314.
(11) Egolf, D. S.; Jurs, P. C. Anal. Chem. 1987, 59, 1586-1593.
(12) Sutton, G. P.; Jurs, P. C., submitted for publication in Anal. Chem.
(13) McIntyre, M. K.; Small, G. W. Anal. Chem. 1987, 59, 1805-1811.
(14) Randić, M. J. Chem. Inf. Comput. Sci. 1984, 24, 164-175.
(15) Randić, M. J. Am. Chem. Soc. 1975, 97, 6609-6615.
(16) Kier, L. B.; Hall, L. H. J. Pharm. Sci. 1976, 65, 1806-1809.
(17) Del Re, G. J. Chem. Soc. 1958, 4031-4040.
(18) Burkert, U.; Allinger, N. L. Molecular Mechanics; ACS Monograph 177; American Chemical Society: Washington, DC, 1982.
(19) Stothers, J. B.; Tan, C. T.; Tan, K. C. Can. J. Chem. 1976, 54, 1211-1221.
(20) Brugger, W. E.; Jurs, P. C. Anal. Chem. 1975, 47, 781-783.
(21) Stuper, A. J.; Jurs, P. C. J. Chem. Inf. Comput. Sci. 1976, 16, 99-105.
(22) Stuper, A. J.; Brugger, W. E.; Jurs, P. C. Computer Assisted Studies of Chemical Structure and Biological Function; Wiley-Interscience: New York, 1979; pp 83-90.
(23) Clark, T. A Handbook of Computational Chemistry: A Practical Guide to Chemical Structure and Energy Calculations; Wiley-Interscience: New York, 1985; Chapters 1 and 2.
(24) Altona, C.; Sundaralingam, M. J. Am. Chem. Soc. 1970, 92, 1995-1999.
(25) Grutzner, J. B.; Jautelat, M.; Dence, J. B.; Smith, R. A.; Roberts, J. D. J. Am. Chem. Soc. 1970, 92, 7107-7120.
(26) Lippmaa, E.; Pehk, T.; Paasivirta, J.; Belikova, N.; Platé, A. Org. Magn. Reson. 1970, 2, 581-604.
(27) Lippmaa, E.; Pehk, T.; Belikova, N. A.; Bobyleva, A. A.; Kalinichenko, A. N.; Ordubadi, M. D.; Platé, A. F. Org. Magn. Reson. 1976, 8, 74-78.
(28) Marchand, A. P. Stereochemical Applications of NMR Studies in Rigid Bicyclic Systems; Methods in Stereochemical Analysis; Marchand, A. P., Ed.; Verlag Chemie International: Deerfield Beach, FL, 1982; Vol. 1.
(29) Whitesell, J. K.; Minton, M. A. Stereochemical Analysis of Alicyclic Compounds by C-13 NMR Spectroscopy; Chapman and Hall: London, 1987.
(30) Duddeck, H. In Topics in Stereochemistry; Eliel, E. L., Wilen, S. H., Allinger, N. L., Eds.; Wiley-Interscience: New York, 1986; Vol. 16, pp 219-324.
(31) Wipke, W. T.; Dyott, T. M.; Verbalis, J. G. Abstracts of Papers; 161st National Meeting of the American Chemical Society, Los Angeles, CA; American Chemical Society: Washington, DC, 1971; CHLT 22.
(32) Jackman, L. M.; Kelly, D. P. J. Chem. Soc. B 1970, 102-110.
(33) Whitesell, J. K.; Minton, M. A. J. Am. Chem. Soc. 1987, 109, 225-228.
(34) Li, S.; Chesnut, D. B. Magn. Reson. Chem. 1986, 24, 93-100.
(35) Li, S.; Chesnut, D. B. Magn. Reson. Chem. 1985, 23, 625-638.
(36) Draper, N. R.; Smith, H. Applied Regression Analysis, 2nd ed.; Wiley-Interscience: New York, 1981.
(37) Belsley, D. A.; Kuh, E.; Welsch, R. E. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity; Wiley-Interscience: New York, 1980.
(38) Allen, D. M. Technical Report No. 23, 1971; Department of Statistics, University of Kentucky, Lexington, KY.
(39) Snee, R. D. Technometrics 1977, 19, 415-427.
(40) Small, G. W., The University of Iowa, personal communication, 1987.


RECEIVED for review July 28, 1988. Accepted September 23, 1988. This work was supported by the National Science Foundation under Grant CHE-8503542. The PRIME 750 computer was purchased with partial financial support of the National Science Foundation. Portions of this paper were presented under the auspices of the COMP division of the ACS at the 3rd Chemical Congress of North America & 195th National Meeting of the American Chemical Society, Toronto, Ontario, Canada, June 1988.