Simulation of carbon-13 nuclear magnetic resonance spectra of

Jun 1, 1991 - Kyle L. Jensen, Abigail S. Barber, and Gary W. Small
Anal. Chem. 1991, 63, 1081-1090


Simulation of Carbon-13 Nuclear Magnetic Resonance Spectra of Polycyclic Aromatic Compounds

Kyle L. Jensen, Abigail S. Barber, and Gary W. Small*

Department of Chemistry, The University of Iowa, Iowa City, Iowa 52242

Spectra-structure relationships are developed that allow carbon-13 nuclear magnetic resonance chemical shifts to be simulated in a set of diverse polycyclic aromatic compounds. Employing 33 compounds that encompass 24 different aromatic ring backbones, six linear models are generated that allow complete spectra to be simulated to an average error of 0.68 ppm. Successful model generation is found to be dependent upon grouping atoms of similar chemical environments together and the use of structural parameters based on Hückel molecular orbital and molecular mechanics calculations. The computed models are evaluated by use of a separate prediction set of 11 compounds not included in the model generation work. The average spectral prediction error for these compounds is 1.02 ppm, even though seven of the 11 compounds contain ring backbones not found among the 33 compounds used in computing the models.

0003-2700/91/0383-1081$02.50/0 © 1991 American Chemical Society

ANALYTICAL CHEMISTRY, VOL. 63, NO. 11, JUNE 1, 1991

INTRODUCTION

Spectrum simulation techniques for carbon-13 nuclear magnetic resonance spectroscopy (13C NMR) have potential use in the solution of structure elucidation problems and in the verification of chemical shift assignments. These methods allow the chemical shifts of carbon atoms to be estimated, given only the chemical structure.

The two most widely used approaches to predicting 13C NMR chemical shifts are database retrieval methods (1) and empirical modeling techniques (2-4). The database techniques are based on the ability to encode the chemical environment of a carbon atom in a form that can be compared to each member of a library of similarly encoded environments and their associated chemical shifts. In a prediction, each estimated chemical shift is taken as the experimentally observed shift associated with the library environment that most closely matches that of the carbon whose chemical shift is to be predicted. The principal drawback to this approach is that the prediction accuracy depends directly on the presence in the library of the appropriate environments. An exact match of the environments must be obtained if an accurate prediction is to be made. In this scheme, there is no capability to use the library data to interpolate the correct chemical shifts.

The modeling techniques attempt to overcome this limitation by establishing empirical relationships between features of the chemical environment and the corresponding chemical shift. Structural features are encoded into numerical parameters for use in the model. The models derived through this procedure can then be employed to predict chemical shifts of carbons not included in the actual model development. While these methods can interpolate chemical shifts effectively, they are ultimately limited in accuracy by the difficulty in determining and encoding the structural changes that are most significant in inducing small changes in chemical shifts. One approach to overcoming this limitation lies in the use of computer-based techniques to calculate detailed parameters that encode steric or electronic structural information. Parameters of this type have been used in modeling several different chemical systems (5-9).

Figure 1. Structural backbones of the compounds used in the model development and evaluation.

Recently, work in our laboratory has sought to broaden the applicability of the modeling approach through the use of molecular mechanics and Hückel molecular orbital calculations

Table I. Compounds Used for Model Development and Testing

no.  name                              lit. ref  backbone no.(a)

Modeling
1    anthracene                        11        1
2    phenanthrene                      11        2
3    1,8-dimethylphenanthrene          15        2
4    1,4,6-trimethylphenanthrene       15        2
5    3-methylphenanthrene              15        2
6    pyrene                            11        3
7    chrysene                          11        4
8    1-methylchrysene                  11        4
9    2-methylchrysene                  11        4
10   3-methylchrysene                  11        4
11   benz[a]anthracene                 11        5
12   7-methylbenz[a]anthracene         13        5
13   3,9-dimethylbenz[a]anthracene     12        5
14   7,12-dimethylbenz[a]anthracene    12        5
15   benzo[c]chrysene                  11        6
16   picene                            12        7
17   dibenz[a,h]anthracene             11        8
18   dibenzo[b,def]chrysene            12        9
19   benzo[rst]pentaphene              12        10
20   coronene                          12        11
21   triphenylene                      13        12
22   perylene                          12        13
23   benzo[ghi]perylene                11        14
24   dibenzo[def,p]chrysene            12        15
25   benzo[e]pyrene                    11        16
26   fluoranthene                      11        17
27   benzo[ghi]fluoranthene            11        18
28   indeno[4,3,2,1]chrysene           12        19
29   benzo[k]fluoranthene              11        20
30   benzo[e]acephenanthrylene         11        21
31   naphth[2,3-~]aceanthrylene        12        22
32   indeno[1,2,3-cd]fluoranthene      12        23
33   biphenyl                          14        24

Testing
34   1,8-dimethylanthracene            16        1
35   1-methylphenanthrene              17        2
36   4-methylphenanthrene              17        2
37   6-methylchrysene                  11        4
38   benzo[b]chrysene                  11        25
39   naphtho[2,3-&]fluoranthene        12        30
40   benzo[g]chrysene                  18        27
41   dibenzo[def,mno]chrysene          11        28
42   naphtho[1,2,3,4-def]chrysene      12        29
43   benzo[def]chrysene                11        26
44   dibenz[a,e]aceanthrylene          12        31

(a) Refers to backbones displayed in Figure 1.

in the design of new structural parameters for use in modeling chemical shifts in aromatic systems. In an initial study, a set of models was developed to predict chemical shifts in substituted benzenes, naphthalenes, and anthracenes to an accuracy of 0.5 ppm (10). In the work presented here, chemical shift modeling capabilities for aromatic systems are enhanced significantly to include 31 different ring system backbones. The computed models are evaluated, and their predictive ability is assessed.

EXPERIMENTAL SECTION

A total of 33 polycyclic aromatic compounds were used for the development of spectrum simulation models, and a set of 11 polycyclic aromatic compounds, not included in the model development, were utilized to determine the predictive ability of the generated models. Broad-band-decoupled 13C NMR chemical shifts, taken from nine literature sources, were used. Compound names, for both the model generation and prediction sets, are listed in Table I along with a corresponding identification number. The structural backbones of these compounds (excluding substituents) are shown in Figure 1.


Experimental conditions are described below for the collection of the spectral data used in this work. CDCl3 was used as the solvent unless otherwise noted. All chemical shifts were referenced to Me4Si as an internal or external standard, unless otherwise noted.

A total of 34 of the spectra used were taken from the two-volume collection edited by Karcher et al. (11, 12). Spectra for compounds 2, 6, 23, 25, and 26 were collected by use of a Bruker WP80 spectrometer operating at 20 MHz. Concentrations were in the range 20435 g/L, except for compound 23, for which a saturated solution was used. A Varian SC-300 spectrometer operating at 75.5 MHz was used to collect the spectra for the methylated chrysenes. For compounds 8-10 and 37, the concentration range was 12-40 g/L. Spectra for compounds 1, 7, 11, 15, 17, 27, 29, and 30 were obtained by use of a JEOL PFT-100 spectrometer operating at 25 MHz. The concentration for these compounds ranged from 9 g/L to a saturated solution. A JEOL GX-270 spectrometer operating at 67.8 MHz was used in the collection of spectra of compounds 13, 14, 16, 18-20, 22, 24, 28, 31, 32, 38, 39, and 41-44. For these compounds, a concentration range of 4-65 mg/mL was reported. For compound 16, a spectrum was obtained at a concentration of 8 mg/mL with DMSO as the solvent.

The spectra of compounds 12 and 21 were taken from the work of Ozubko, Buchanan, and Smith (13). The spectra were obtained by use of a Varian XL-100-15 spectrometer operating at 25.2 MHz. A concentration range of 0.09-0.40 M was reported.

The spectrum of biphenyl (compound 33) was taken from the Sadtler 13C NMR Spectral Library (14). A Digilab FTNMR-3 spectrometer was used, and the solution concentration was 28% (w/w).

Three spectra, compounds 3-5, were obtained from work by Letcher (15). A JEOL FX-90Q Fourier transform spectrometer operating at 22.53 MHz was used to obtain the spectra for these compounds. The concentration for each compound was 0.1 M.

The spectrum of 1,8-dimethylanthracene (compound 34) was obtained from a study by Wolfenden and Grant (16). This spectrum was obtained with a Varian XL-100 Fourier transform NMR spectrometer operating at 15.1 MHz. No solution concentration was reported for this compound.

Spectra for two of the methylated phenanthrenes (35 and 36) were taken from the work of Stothers et al. (17). A Varian XL-100-15 spectrometer operating at 25.2 MHz was used. The concentration of 4-methylphenanthrene was reported as 5-10% (w/v). No concentration was reported for the spectrum of 1-methylphenanthrene.

The spectrum of compound 40, benzo[g]chrysene, was obtained from the work of Bax et al. (18). A Nicolet ET-NMR spectrometer operating at 500 MHz was used. A solution concentration of 15 mg/2.5 mL of solvent was reported.

The computer software used in this work was written in FORTRAN 77 and implemented on a Prime 9955 interactive computer system operating in the Gerard P. Weeg Computing Center at the University of Iowa. The manipulation of spectral data, calculation and evaluation of chemical structural parameters, and generation and testing of regression models made use of a set of software tools developed by Small and Jurs (19). The MM2(87) molecular mechanics software was obtained from the Quantum Chemistry Program Exchange (Department of Chemistry, Indiana University, Bloomington, IN) and was implemented without modification to the force field parameters. Bar charts and principal component plots were generated by use of the TELLAGRAF interactive graphics system (Computer Associates International, Inc., Garden City, NY) with a Hewlett-Packard 7475A digital plotter as the output device. Plots of chemical structures made use of the EASYCAD 2 graphics package (Evolution Computing, Tempe, AZ) with a Hewlett-Packard LaserJet Series II printer as the output device.

RESULTS AND DISCUSSION

Overview of Spectrum Simulation Methodology. The spectrum simulation methodology used in this work is based on the construction of linear models that relate numerical structural parameters to 13C NMR chemical shifts. The models have the form


Figure 2. (Top) Principal components score plot depicting the environments of the 392 topologically unique atoms in compounds 1-33: (circles) group I; (squares) group II; (triangles) group III. (Middle) Separation of the group II ring atoms into two groups: (squares) group IV; (circles) group V. (Bottom) Separation of the group III junction atoms into three groups: (circles) group VI; (squares) group VII; (triangles) group VIII. The horizontal axis in each panel is the first principal component.
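A score plot of the kind shown in Figure 2 can be obtained by projecting each atom's environment vector onto its first two principal components. The following is a minimal sketch under stated assumptions, not the software used in this work: the six-dimensional vectors are synthetic stand-ins (the actual topological encoding is not reproduced here), and the function name `pca_scores` is hypothetical.

```python
import numpy as np

def pca_scores(vectors, n_components=2):
    """Project row vectors onto their leading principal components.

    Returns (scores, explained), where explained is the fraction of the
    total variance captured by the retained components.
    """
    X = np.asarray(vectors, dtype=float)
    centered = X - X.mean(axis=0)               # remove the column means
    # SVD of the centered data: the rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T     # coordinates for the score plot
    variance = s ** 2                           # proportional to component variances
    explained = variance[:n_components].sum() / variance.sum()
    return scores, explained

# Synthetic stand-in for 392 six-dimensional environment vectors: three
# clusters that differ mainly along two directions (an assumption made
# purely for this demonstration).
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0, 0, 0],
                    [40, 5, 0, 0, 0, 0],
                    [20, 30, 0, 0, 0, 0]], dtype=float)
labels = rng.integers(0, 3, size=392)
env = centers[labels] + rng.normal(scale=1.0, size=(392, 6))

scores, explained = pca_scores(env)
print(scores.shape, round(explained, 3))
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives the two-dimensional view; `explained` indicates how much of the variance in the original six dimensions that view retains.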

$$S = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n \tag{1}$$

where n + 1 terms are summed to estimate S, the chemical shift of a specific carbon atom whose predicted chemical shift is desired. The $X_i$ terms in eq 1 encode aspects of the chemical environment of the carbon atom in a manner that is linearly related to the chemical shift, whereas the $b_i$ weight the individual $X_i$ terms. Regression analysis techniques are used to compute the $b_i$ and to select the $X_i$ that are most significant in modeling the chemical shifts.

The generation of models of this type requires a set of representative carbon atoms and associated chemical shifts. For the computed models to be precise, two conditions must be met. First, the chemical shifts used in the model generation must be accurate, and the shift assignments must be correct. Second, the carbon atoms used must span the range of chemical environments over which the generated model is expected to be used. Stated differently, the model must be defined in such a manner that the chemical shift predictions are interpolations rather than extrapolations. If these conditions are met, the computed model can be applied with confidence to the prediction of chemical shifts of atoms in structures not included in the model generation step.

Two factors are most important in determining the absolute accuracy of predictions made with chemical shift models of the type described above. First, the $X_i$ used must adequately describe the important factors in the structural environment that induce changes in chemical shifts. Often, subtle structural changes can induce relatively large changes in observed chemical shifts. Second, the correct selection must be made of which atoms to group together in the generation of chemical shift models. For example, in a set of aromatic compounds, can all ring carbons be modeled together, or must ring-bridging carbons be modeled separately? Previous studies have revealed that if prediction errors of 1 ppm or less are desired, the proper atom subsetting strategy must be found. Both of these factors are examined below in the generation of chemical shift models for the polycyclic aromatic compounds.

Selection of Compounds.
In selecting compounds for this study, an attempt was made to assemble a diverse set of aromatic structural backbones from among the compounds with available spectral data. An emphasis was placed on compounds with multiple fused rings. Toward this end, 33 compounds with 24 different ring backbones were selected for the model development work. Additionally, a set of 11 compounds spanning an additional seven ring backbones were selected for use in evaluating the computed models. Two structures (1 and 34) from our previous study (10) of smaller ring compounds were included (one in model development and one in model evaluation) in order to verify that the computed models were also compatible with smaller ring systems. The complete set of benzenes, naphthalenes, and anthracenes studied previously was not combined with the current set of compounds because of software limits on the total number of atoms that can be studied together.

Assembly of Data for Modeling and Testing. The structures of the 44 compounds in Table I were entered into computer disk files for subsequent use in model development and testing. The entry of the chemical structures made use of a graphical procedure developed by Brugger and Jurs (20). To obtain approximate three-dimensional atomic coordinates for the compounds, molecular mechanics calculations were performed. A force field described by Stuper et al. (21) was used to compute initial coordinates of the hydrogen-suppressed structures. Hydrogens were then attached to the structures by direct calculation. Final modeling was performed by use of the MM2(87) force field developed by Allinger et al. (22). This molecular mechanics procedure has been validated with many of the same structures used here (22). Since the computed atomic coordinates allow steric factors to be encoded, significant effort was expended in obtaining accurate and consistent atomic coordinates for compounds 1-44.

Model Development. Selection of Atom Subsets.
The 33 compounds used in the model development work contained 392 topologically unique carbon atoms. Separation of these atom centers into groups for model development was accomplished by use of a semiautomated procedure developed by Small and Jurs (23). In this procedure, the environment of each carbon atom center is described by six numeric values. These values represent topological information about the chemical environment at the atom of interest and at one to five bond steps away from that atom. The six values can be represented as a six-dimensional vector. In order to visually inspect the data for relationships existing among the atoms, a principal components analysis (PCA) is performed. The PCA calculation effectively reduces the dimensionality of the data, while preserving the relationships that exist among the data points. As a result of the procedure, these interpoint relationships can be represented accurately in two dimensions. Reduction to a two-dimensional space is feasible, since greater than 95% of the variance in the six-dimensional data is encoded in the two-dimensional representation.

The upper plot in Figure 2 is a principal components score plot depicting the relationships existing among the environments of the 392 carbon atoms in compounds 1-33. Atom centers in similar chemical environments form clusters in the plot. Three distinct groupings of atoms can be observed, indicated in the plot by circles, squares, and triangles. Atoms to be used for model development were separated into the following groups: group I (circles), 14 methyl atoms; group II (squares), 234 nonbridging ring atoms; group III (triangles), 144 ring-bridging atoms. The corresponding chemical shift ranges for the atom groups were 14.1-27.3, 119.1-136.6, and 122.6-142.1 ppm, respectively.

Models were generated and found to be inadequate in explaining the chemical environments in groups II and III. For these atom groups, a large number of parameters would have been needed to encode all of the chemical environments present. This was not feasible because of statistical constraints. Therefore, the groups were subdivided into smaller sets of similar atoms to try to improve the model performance.

The polycyclic aromatic hydrocarbons used in this study represent significantly more varied and complex aromatic carbon environments than were used in the previous modeling study (10). The diversity in the structures may be in part responsible for the lack of precision in the models for the group II and III atoms. The procedure described above for grouping atoms in similar carbon chemical environments can be used to display this diversity. PCA was performed individually on the six-dimensional environment vectors representing the group II and III carbons. The center plot in Figure 2 displays the results for group II, while the lower plot in the figure depicts the results for group III.

Figure 3. Examples of atoms in groups IV-VIII.

On the basis of the center plot in Figure 2, the 234 group II nonbridging ring carbons were divided into two groups: group IV (squares), 158 ring atoms two bonds away from a ring junction atom, and group V (circles), 76 ring atoms one bond away from a ring junction atom. In the figure, the group V atoms define two distinct clusters. The two clusters correspond to (1) atoms one bond away from a single ring-bridging atom and (2) atoms one bond away from two ring-bridging atoms. For group IV, the chemical shift range was

Table II. Summary of Model Statistics

group   n(a)   p(b)   R(c)    s(d)    F(e)
I       14     2      0.989   0.588   251
IV      158    11     0.951   0.985   125
V       76     6      0.967   0.624   165
VI      23     4      0.972   1.094   76.6
VII     66     7      0.937   1.434   59.7
VIII    55     5      0.901   0.815   42.5

(a) Number of chemical shifts used to define model. (b) Number of parameters in computed model. (c) Correlation coefficient. (d) Standard error of estimate in chemical shift units (ppm). (e) F value for significance of the model.
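The F values in Table II can be cross-checked against the other columns. For a least-squares model with an intercept, the overall F statistic is commonly computed as F = (R^2/p) / ((1 - R^2)/(n - p - 1)). The sketch below applies that standard formula to the tabulated n, p, and R; it is an assumption that this is the form used to produce the table, and small differences are expected because R is reported to only three decimals.

```python
def overall_f(r, n, p):
    """Overall regression F statistic from the multiple correlation
    coefficient r, n observations, and p predictors (the intercept is
    part of the model but is not counted in p)."""
    r2 = r * r
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

# (group, n, p, R, F as tabulated), transcribed from Table II.
table_ii = [
    ("I",    14,  2, 0.989, 251.0),
    ("IV",  158, 11, 0.951, 125.0),
    ("V",    76,  6, 0.967, 165.0),
    ("VI",   23,  4, 0.972, 76.6),
    ("VII",  66,  7, 0.937, 59.7),
    ("VIII", 55,  5, 0.901, 42.5),
]

for group, n, p, r, f_reported in table_ii:
    f_calc = overall_f(r, n, p)
    print(f"group {group:>4}: F from R = {f_calc:6.1f}, tabulated = {f_reported}")
```

Agreement to within a few percent for every group is what one would expect if the columns are mutually consistent.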

119.1-135.2 ppm. Analogously, for group V, the range was 124.8-136.6 ppm. Examples of these atoms are shown in Figure 3.

The 144 ring-bridging atoms were divided into the three groups shown in the lower plot in Figure 2. Group VI (circles) was defined by the inner-ring junction atoms shared by more than two rings and atoms within a four-sided bay region. This group contained 23 atoms. Group VII (squares) comprised 66 atoms within a three-sided bay region, and group VIII (triangles) was defined by the 55 ring junction atoms shared by two rings and not located in a bay region. Groups VI-VIII had chemical shift ranges of 122.6-135.6, 124.46-142.1, and 127.8-135.1 ppm, respectively. Examples of atoms from groups VI-VIII are also shown in Figure 3.

Removal of Atoms in Anomalous Environments. The procedure used to divide the atom centers into groups with similar chemical environments is based only on topological considerations. The six-dimensional vectors describing two atoms may be similar, but the geometric and electronic environments of the atoms may be quite different. In the present study, three atoms were incorrectly judged to be similar to the others used in a specific group. The presence of these atoms significantly degraded the model performance.

The structure of biphenyl (compound 33) at the minimum energy configuration calculated by using the MM2(87) force field has one phenyl ring rotated by 90° from the other ring. All other structures in the study were planar or nearly planar. The biphenyl bridging atom was removed from the modeling procedure because of the unique environment surrounding that atom. The remaining ring atoms for biphenyl were well modeled.

Additionally, two topologically unique atoms in benzo[ghi]fluoranthene (compound 27) were removed from the model development. Figure 4 shows the structure of benzo[ghi]fluoranthene and the two deleted atom centers. Although this structure appears to be topologically similar to indeno[4,3,2,1]chrysene (compound 28), the chemical shifts are quite different for corresponding atoms in the two compounds. For example, the chemical shifts of the atoms labeled 2a and 2a' in the figure differ by 5.5 ppm. Analogously, the shifts of atoms 10a and 12a differ by 8.5 ppm. The presence of both sets of atoms served to confound the model development.

Lastly, for the methylated phenanthrenes, compounds 3-5, only the methyl shifts were used in the model generation process, as the ring and bridging atoms did not have unique assignments. It was judged preferable to delete the atoms rather than to make arbitrary assignments.

Structural Parameters. A pool of structural parameters was computed for use in modeling the chemical shifts of carbons in compounds 1-33. As in past studies, steric parameters were used on the basis of the modeled atomic coordinates. Three specific types of steric parameters were evaluated: (1) sums of inverse through-space distances from the carbon center of interest to other atoms in the structure, (2) counts of the number of atoms residing in spherical shells

Figure 4. Differences in chemical shifts induced by insertion of an additional aromatic ring into a structural backbone. The structures depicted are benzo[ghi]fluoranthene (27) and indeno[4,3,2,1]chrysene (28). The chemical shifts of atoms 10a and 12a differ by 8.5 ppm, even though the additional ring is inserted four bonds away. Atoms 2a and 2a' are located five bonds from the point of structural change, yet their chemical shifts differ by 5.5 ppm.
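The first two kinds of steric parameter, inverse-distance sums and spherical-shell counts, are simple functions of the modeled atomic coordinates. The sketch below is illustrative only: the function names are hypothetical, the weighting exponent of 2 is an assumption (the exponent is a free argument), and the coordinates are an idealized planar benzene ring with 1.40 Å C-C bonds rather than MM2(87) output.

```python
import numpy as np

def inverse_distance_sum(target, others, exponent=2.0):
    """Sum of 1/d**exponent from the target position to each other atom.
    The exponent is an assumed weighting; larger values damp long-range
    contributions more strongly."""
    d = np.linalg.norm(np.asarray(others, float) - np.asarray(target, float), axis=1)
    return float(np.sum(d ** -exponent))

def shell_count(target, others, r_inner, r_outer):
    """Number of atoms with r_inner <= distance < r_outer from the target."""
    d = np.linalg.norm(np.asarray(others, float) - np.asarray(target, float), axis=1)
    return int(np.count_nonzero((d >= r_inner) & (d < r_outer)))

# Idealized benzene carbons on a regular hexagon (assumed geometry, in Å).
angles = np.deg2rad(60.0 * np.arange(6))
carbons = np.column_stack([1.40 * np.cos(angles),
                           1.40 * np.sin(angles),
                           np.zeros(6)])
target, others = carbons[0], carbons[1:]

inv_sum = inverse_distance_sum(target, others)    # ortho, meta, and para carbons
n_shell = shell_count(target, others, 2.7, 3.4)   # only the para carbon at 2.80 Å
print(round(inv_sum, 4), n_shell)
```

In practice the same two functions would be applied to a chosen subset of atoms, for example only hydrogens a fixed number of bonds away, which requires the bond graph in addition to the coordinates.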

radiating outward from the carbon center of interest, and (3) van der Waals energy parameters based on nonbonded interactions between the carbon center of interest and other atoms in the structure. Additionally, as in past studies focusing on saturated systems, the σ charge computation developed by Del Re (24) was found useful. The σ charge parameters were computed on the basis of the partial charges of atoms residing at fixed numbers of bond steps away from the carbon center of interest.

In saturated chemical systems, structural changes rarely have a significant influence on the chemical shifts of atoms five or more bond steps from the change. In aromatic systems, however, the long-range effects of structural changes are more pronounced. In effect, the π system provides a pathway over which the electronic effects of structural changes are transmitted. An example of these long-range interatomic effects can be seen in the two structures in Figure 4. This example shows the tremendous difference that the addition of a phenyl group can have upon the chemical shifts of aromatic carbons five bonds away.

To encode these long-range effects, several parameters based on Hückel molecular orbital (HMO) theory were developed for the modeling process. Originally developed during our work on aromatic systems of one to three rings (10), these parameters were also found to be highly useful with more complex ring systems. The parameters found useful in this work included calculations based on superdelocalizability, autopolarizability, and free valence. These parameters can be calculated for each atom center in the molecule. Autopolarizability is a measure of the way the π charge changes with the electronegativity of the corresponding atom. The other two HMO parameters used, superdelocalizability and free valence, are measures of chemical reactivity. In defining the HMO-based structural parameters, the autopolarizability, superdelocalizability, or free valence values were used at fixed bond steps away from the carbon center of interest or at ortho, meta, or para positions in the ring system relative to the carbon of interest. For the latter calculation, the extended definitions of ortho, meta, and para ring positions developed in our earlier study (10) were used.

Calculation of Models. The selection of specific structural parameters for use in modeling the six atom groups and the calculation of the model coefficients were performed by use of forward stepwise multiple linear regression (25). Regression subsetting techniques of this type attempt to find the best combination of model variables from among a pool of candidate variables. An important concern in using these techniques is the chance that a subset of variables may be found that exhibits a random high correlation with the dependent variable of observed chemical shifts. While a total of 40-60 structural parameters were investigated for each of the six atom groups, we minimized chance correlations by including only 25-30 variables at a time in the variable pool used by


Table III. Summary of Computed Models

no. coeff t value(a) 1 -129.2

16.5

Par-

no.

Group I Model sum of l / ( d i ~ t a n c e from ) ~ ~ a-He to hydrogens 2 from three to five bonds away

coeff -130.6

t valuea

11.0

P " most negative u charge among non-H atoms two bonds from the target carbond

intercept 29.2 1 -4.440

8.26

11.95

5.63

2

3 -5.172

1.06

4

4.76

7.098

5 52.54 6 -73.15

1 -43.93

2

-34.34

3

-17.64

4 -352.5

7.60 11.9

Group IV Model no. of tertiary carbons two bonds from the target 7 carbon van der Waals energy based on interactions between the target carbon and hydrogens 8 from three to seven bonds away sum of l / ( d i ~ t a n c e from ) ~ a-H to non-H atoms from three to four bonds away 9 van der Waals energy based on interactions between the a-H and non-H atoms from three 10 to seven bonds away sum of l / ( d i ~ t a n c e from ) ~ a-H to hydrogens from three to four bonds away 11 sum of I/(di~tance)~ from a-H to hydrogens from three to five bonds away intercept

88.23 -122.6 -2.839 1032 1.194

8.80

sum of l / ( d i s t a n ~ e from ) ~ target

carbon to hydrogens four bonds away 8.11 sum of l / ( d i s t a n ~ e from ) ~ target carbon to hydrogens three bonds away 7.03 no. of non-H atoms from 2.7-3.4 A from the target carbon 6.25 most positive u charge among non-H atoms bonded to the target carbon 3.26 no. of hydrogens from 2.4-3.2 A from the target carbon

143.1

Group V Model most negative u charge among non-H atoms 5 92.14 bonded to the target carbon 7.40 sum of l / ( d i s t a n ~ e from ) ~ a-H to hydrogens 6 -6.992 three bonds away 1.71 most positive u charge among non-H atoms five bonds from the target carbon 5.99 most positive u charge among non-H atoms intercept 131.36 three bonds from the target carbon

3.94

sum of l / ( d i ~ t a n c e from ) ~ a-H to hydrogens six bonds away van der Waals energy based on interactions between the a-H and all atoms from three to seven bonds away

3.40

sum of autopolarizabilities of atoms meta to the target carbon

-8.634

4.32

-0.6487

2.96

11.08

2.41

sum of free valencies of atoms ortho to the target carbon sum of superdelocalizabilities of atoms meta to the target carbon minimum superdelocalizability among atoms bonded to the target carbon

4.84

5.37

Group VI Model 1 58.93

2

-106.1

3

10.45

9.33

non-H atoms two bonds away 7.63 sum of l / ( d i s t a n ~ e )from ~ target carbon to non-H atoms three bonds away 7.13 maximum autopolarizability among atoms five bonds away from the target carbon

1 -10.33

7.93

2

14.40

6.55

3

-1.297

7.77

4

-17.97

4.74

1 8.095

s u m of l/(distance)* from target carbon to

12.4

2

-1.284

3.02

3

0.6367

3.35

4

-1.39

intercept 85.57

Group VII Model sum of autopolarizabilities of atoms para to the 5 target carbon sum of autopolarizabilities of atoms ortho to the 6 target carbon no. of non-H atoms from 4.8 to 5.4 Å from the 7 target carbon van der Waals energy based on interactions between the target carbon and hydrogens intercept from three to seven bonds away

121.6

Group VIII Model van der Waals energy based on interactions 4 -7.325 between the target carbon and non-H atoms from three to seven bonds away sum of autopolarizabilities among atoms para to 5 -4.420 the target carbon sum of superdelocalizabilities of atoms three bonds from the target carbon intercept 123.6

2.65 2.56

minimum autopolarizability among atoms five bonds from the target carbon minimum free valence among atoms bonded to the target carbon

a t value for the significance of the coefficient. A t value ≥4.0 is highly significant.
b Interatomic distances are computed with the approximate atomic coordinates obtained from the molecular mechanics calculations. The inverse form of the sum and the weighting exponent serve to decrease the contribution of long-range interactions.
c An α-H is defined as a hydrogen attached to the target carbon.
d The target carbon is the carbon atom whose chemical shift is being simulated.

the stepwise regression. Given the sizes of the six atom groups, this procedure is compatible with the guidelines recommended by Topliss and Edwards (26) in their study of chance correlations in regression analysis. For each atom group, several combinations of structural parameters produced models with acceptable statistics. In the selection of a single model for each group, the standard error of estimate was considered, along with the number of terms in the model and the degree of collinearity existing among the independent variables. The condition number computed from the set of independent variables defining the model was used in the assessment of collinearity (27).

Table II summarizes the statistics describing the models selected as the overall best for the six atom groups. In each case, the correlation coefficient is greater than 0.9. Additionally, the standard errors of estimate for the models are on the order of 1 ppm. A total of 32 different structural parameters were used across the six models. Included were 16 steric parameters, 10 HMO-based parameters, five σ charge parameters, and one simple topological parameter (count of the number of tertiary carbons two bonds from the carbon center of interest). The

Figure 5. Bar graph depicting the average spectral prediction errors for the compounds used in the model generation (1-33). Horizontal axis: compound number.

Figure 6. Bar graph depicting the average spectral prediction errors for the eleven compounds (34-44) used in the model testing.

parameters used in the six models are summarized in Table III, along with the corresponding regression coefficients and t values for the significance of the coefficients.

An inspection of Table III reveals that, to a large degree, the three ring junction models (groups VI-VIII) rely upon the structural parameters derived from the HMO calculations. Of the 16 parameters used across the three models, 11 were HMO-based parameters. Correspondingly, none of the three models made use of σ charge parameters. A similar result was obtained in our initial study focusing on benzenes, naphthalenes, and anthracenes (10). The inclusion of information about the π system is critical in obtaining good models for the bridging atoms. The nonbridging ring carbons can be modeled effectively without parameters describing the π system, however.

Steric parameters were most important in the group IV and V models, accounting for 12 of the 17 parameters used. Four of the remaining parameters encoded information about the σ charges on atoms around the carbon atom of interest. The importance of the steric parameters is understandable, given the occurrence of three-sided and/or four-sided bay regions in the majority of the compounds studied. Nonbonded interactions are extremely significant in the bay regions.

The structural parameters used in the six models combine to describe the chemical environments of the carbon atoms studied. The nature of the parameters used makes it difficult to discuss individual parameters in terms of encoding shielding or deshielding information. Thus, it is important to validate the models on statistical grounds and through application to compounds not included in the generation of the models.

Model Evaluation. Simulated spectra for compounds 1-33 were assembled by combining the predicted chemical shifts produced by the six computed models. The simulated spectra were then compared to the corresponding experimentally observed spectra.
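As an illustration of how a simulated spectrum can be assembled from per-group regression models, the following minimal Python sketch evaluates a linear model (intercept plus coefficient-weighted structural parameters) for each carbon and then averages the absolute differences against observed shifts. The function names, the group labels "A" and "B", and all numerical values are hypothetical placeholders, not entries from Table III, and the mean absolute difference is only an assumed reading of "average spectral prediction error".

```python
from typing import Dict, List, Sequence, Tuple

# A linear regression model: (intercept, one coefficient per structural parameter).
Model = Tuple[float, Sequence[float]]


def predict_shift(model: Model, descriptors: Sequence[float]) -> float:
    """Evaluate intercept + sum(coeff_i * descriptor_i) for one carbon atom."""
    intercept, coeffs = model
    if len(coeffs) != len(descriptors):
        raise ValueError("descriptor count does not match the model")
    return intercept + sum(c * x for c, x in zip(coeffs, descriptors))


def simulate_spectrum(models: Dict[str, Model],
                      atoms: List[Tuple[str, Sequence[float]]]) -> List[float]:
    """Predict one shift per atom, using the model of the atom's group."""
    return [predict_shift(models[group], x) for group, x in atoms]


def average_prediction_error(simulated: Sequence[float],
                             observed: Sequence[float]) -> float:
    """Mean absolute difference between paired shifts (assumed definition)."""
    if len(simulated) != len(observed) or not simulated:
        raise ValueError("spectra must be non-empty and of equal length")
    return sum(abs(s - o) for s, o in zip(simulated, observed)) / len(simulated)


# Hypothetical two-group example; the numbers are illustrative only.
models = {"A": (100.0, [2.0, -1.0]), "B": (120.0, [0.5])}
atoms = [("A", [1.0, 3.0]), ("B", [4.0])]
spectrum = simulate_spectrum(models, atoms)
error = average_prediction_error(spectrum, [99.5, 121.0])
```

Keeping the predicted shifts in atom order lets each one be paired directly with its observed counterpart when the assignment is known.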
For the 33 compounds, the mean of the average spectral prediction errors was 0.68 ppm. Figure 5 is a bar graph depicting the average prediction errors for each of the compounds used in the model generation.

The simulated spectra were further evaluated by performing a library search against a collection of 563 experimentally observed spectra, including the spectra of the 44 compounds in Table I. For this test, the 28 complete simulated spectra were compared to each of the library spectra. Due to the deletion of atoms, the simulated spectra of compounds 3-5, 27, and 33 were incomplete and were not included in the test. For each spectral comparison, chemical shifts were sorted from smallest to largest, and the sum of squared differences was computed between the corresponding shifts in the two sorted lists. The library spectrum producing the smallest comparison score was judged most similar to each simulated spectrum. In 25 of the 28 cases, the corresponding experimental spectrum was found most similar to the simulated spectrum. This represents the best possible result. In the remaining three cases, the correct experimental spectrum was observed as the second nearest match. When combined with the small spectral prediction errors, these results confirm the high accuracy of the computed chemical shift models.

Perhaps the best method to evaluate the predictive ability of a model is to use the model to predict chemical shifts for carbon atoms not included in the model development. A total of 11 compounds, whose shifts were known but not included in the model generation, were used as "unknowns". As noted previously, seven of the 11 compounds contained aromatic ring backbones not represented among the compounds used in the model generation. For the prediction compounds, 34-44, the topologically unique carbon atoms were placed appropriately into groups I and IV-VIII, and the previously computed models were used to predict the chemical shifts. Complete spectra were then assembled from the individual predicted chemical shifts. Since the actual chemical shifts of these compounds were known, it was possible to compare the predicted and known shifts. For the 11 prediction compounds, the average error between the predicted and observed spectra was 1.02 ppm. Figure 6 depicts the average prediction error for each of the 11 compounds.

A spectral library search was also performed. For nine of the 11 compounds, the predicted spectrum was found to be the nearest match to the observed spectrum. The remaining compounds, 35 and 41, were found as the second and fourth nearest matches, respectively. Compound 41 had an average prediction error of 1.64 ppm, and was therefore matched incorrectly with other compounds in the spectral library. The ring backbone for this compound was not well represented among the structures used in the model development. Another molecule with a high standard error (1.59 ppm) was 4-methylphenanthrene (36). There may be a large amount of steric interaction between the methyl group at the 4-position and other hydrogen atoms from the bay region of the molecule. This chemical environment was not well encoded by the parameters that were available for describing the structure.

To provide a representative sampling of the 11 compounds in the test set, the simulated and observed spectra for six compounds are plotted in Figure 7. The spectra corresponding to compounds 35, 36, 38, 39, 41, and 43 are shown.
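The library comparison described above (sort both shift lists, sum the squared differences between corresponding entries, and take the smallest score as the nearest match) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' program: the function names and the three-entry library are hypothetical, and treating spectra with different numbers of shifts as non-comparable is an assumption, since the text states only that incomplete simulated spectra were left out of the test.

```python
from typing import Dict, List, Sequence, Tuple


def comparison_score(simulated: Sequence[float],
                     library_entry: Sequence[float]) -> float:
    """Sum of squared differences between the two sorted shift lists."""
    if len(simulated) != len(library_entry):
        # Assumption: spectra with unequal line counts are not comparable.
        return float("inf")
    pairs = zip(sorted(simulated), sorted(library_entry))
    return sum((s - o) ** 2 for s, o in pairs)


def rank_library(simulated: Sequence[float],
                 library: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (name, score) pairs ordered from nearest to farthest match."""
    scored = [(name, comparison_score(simulated, shifts))
              for name, shifts in library.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical library of observed spectra (shifts in ppm, illustrative only).
library = {
    "x": [120.0, 130.0, 125.0],
    "y": [121.0, 126.0, 131.5],
    "z": [110.0, 140.0],
}
ranking = rank_library([130.5, 120.5, 125.5], library)
```

The position of the correct compound in the returned ranking corresponds to the "nearest match", "second nearest match", and so on reported in the text.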

Figure 7. Simulated and observed spectra for compounds 35, 36, 38, 39, 41, and 43. Each panel pairs a simulated spectrum with the corresponding observed spectrum. Apparent discrepancies between the number of lines in the simulated and observed spectra are due to limited resolution in the plots.

Lack of plot resolution has led to some overlap of lines. Therefore, some simulated and observed spectra may appear to have different numbers of spectral lines. Compounds 38 and 39 have the smallest prediction errors among the 11 compounds, while compounds 36 and 41 have the largest prediction errors. The simulated spectra of compounds 35 and 43 are intermediate in terms of prediction errors.

CONCLUSIONS

The results presented in this work indicate clearly that highly accurate chemical shift models can be generated across a range of aromatic structural backbones. In both the model generation and prediction tests, differences between simulated and observed chemical shifts were seldom greater than 1.0 ppm. The simulated spectra were distinctive enough to be retrieved correctly by a library search in 34 of 39 cases. In four of the five remaining cases, the simulated spectrum was found as the second closest match to the experimentally observed spectrum. These results are particularly significant, given that seven of the 11 compounds in the prediction set contained ring backbones not included among the compounds used in the model generation work. Furthermore, the performance of the computed models was not severely affected by a small number of available chemical shifts for two of the atom groups (I and VI). The successful simulation of the spectra of the 11 test compounds validates the chemical shift models and the procedures used in the selection of variables.

The success of this work was keyed by two factors. First, it was essential to find the proper atom subsetting procedure. The topological environment vector approach was found to be particularly useful in the investigation of possible subsetting strategies. In considering the subsets chosen, it could be argued that the use of six separate atom groups demonstrates a lack of generality in the modeling approach to spectrum simulation. We feel, however, that it is a significant achievement to use only six atom groups to model the diversity of aromatic carbon atom environments in compounds 1-33.

The second key to the success of this work was the availability of the HMO-based structural parameters for use in obtaining accurate models for the ring-bridging atoms in groups VI-VIII. No acceptable models could be obtained for these atom groups without the HMO-based parameters. While the HMO calculation is admittedly not a state-of-the-art method for computing precise electronic structural parameters, it is computationally rapid and it does provide useful descriptive information for aromatic systems of the type studied here.

LITERATURE CITED

(1) Bremser, W. Anal. Chim. Acta 1978, 103, 355-365.
(2) Grant, D. M.; Paul, E. G. J. Am. Chem. Soc. 1964, 86, 2984-2990.
(3) Lindeman, L. P.; Adams, J. Q. Anal. Chem. 1971, 43, 1245-1252.
(4) Ewing, D. F. Org. Magn. Reson. 1979, 12, 499-524.
(5) Smith, D. H.; Jurs, P. C. J. Am. Chem. Soc. 1978, 100, 3316-3321.
(6) Small, G. W.; Jurs, P. C. Anal. Chem. 1983, 55, 1128-1134.
(7) Gasteiger, J.; Saller, H. Angew. Chem., Int. Ed. Engl. 1985, 24, 687-689.
(8) Small, G. W.; McIntyre, M. K. Anal. Chem. 1989, 61, 666-674.
(9) Sutton, G. P.; Jurs, P. C. Anal. Chem. 1990, 62, 1884-1891.
(10) Barber, A. S.; Small, G. W. Anal. Chem. 1989, 61, 2658-2664.
(11) Spectral Atlas of Polycyclic Aromatic Compounds; Karcher, W., Fordham, R. J., Dubois, J. J., Glaude, P. G. J. M., Ligthart, J. A. M., Eds.; D. Reidel: Dordrecht, The Netherlands, 1983; Vol. 1.
(12) Spectral Atlas of Polycyclic Aromatic Compounds; Karcher, W., Ed.; Kluwer Academic: Dordrecht, The Netherlands, 1988; Vol. 2.
(13) Ozubko, R. S.; Buchanan, G. W.; Smith, I. C. P. Can. J. Chem. 1974, 52, 2493-2501.
(14) The Sadtler Standard Spectra, 13C NMR; Sadtler Research Laboratories: Philadelphia, 1978; Vol. 2, p 274c.
(15) Letcher, R. M. Org. Magn. Reson. 1981, 16, 220-223.
(16) Woolfenden, W. R.; Grant, D. M. J. Am. Chem. Soc. 1966, 88, 1496-1502.
(17) Stothers, J. B.; Tan, C. T.; Wilson, N. K. Org. Magn. Reson. 1977, 9, 408-413.
(18) Bax, A.; Ferretti, J. A.; Nashed, N.; Jerina, D. M. J. Org. Chem. 1985, 50, 3020-3034.
(19) Small, G. W.; Jurs, P. C. Anal. Chem. 1983, 55, 1121-1127.
(20) Brugger, W. E.; Jurs, P. C. Anal. Chem. 1975, 47, 781-784.
(21) Stuper, A. J.; Brugger, W. E.; Jurs, P. C. Computer-Assisted Studies of Chemical Structure and Biological Function; Wiley-Interscience: New York, 1979; pp 83-90.
(22) Sprague, J. T.; Tai, J. C.; Yuh, Y.; Allinger, N. L. J. Comput. Chem. 1987, 8, 581-603.
(23) Small, G. W.; Jurs, P. C. Anal. Chem. 1984, 56, 1314-1323.
(24) DeRe, G. J. Am. Chem. Soc. 1975, 97, 8809-8815.
(25) Draper, N. R.; Smith, H. Applied Regression Analysis, 2nd ed.; Wiley-Interscience: New York, 1981; Chapter 8.
(26) Topliss, J. G.; Edwards, R. P. J. Med. Chem. 1979, 22, 1238-1244.
(27) Belsley, D. A.; Kuh, E.; Welsch, R. E. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity; Wiley-Interscience: New York, 1980; Chapter 3.

RECEIVED for review October 15, 1990. Accepted February 19, 1991. Funding for this work was provided by the Shell Development Co., Houston, TX.