Signal number prediction in carbon-13 nuclear magnetic resonance

Signal number prediction in carbon-13 nuclear magnetic resonance spectrometry. Craig A. Shelley and Morton E. Munk. Anal. Chem., 1978, 50 (11), pp 1...

ANALYTICAL CHEMISTRY, VOL. 50, NO. 11, SEPTEMBER 1978

perform qualitative analysis of the petroleum, the absorption bands due to the dispersant must be eliminated as well. Since dispersants and petroleum are both complex mixtures of hydrocarbons, chemical separation of these two mixtures is extremely difficult, if not altogether impossible. To determine the feasibility of separating these two mixtures spectrally, a mixture of Kuwait oil and dispersant was dissolved in CS2. The top spectrum in Figure 7 is of the mixture. The concentrations of the oil and the dispersant were both known. A CS2 solution of a known amount of dispersant was also prepared. The middle spectrum is the result of subtracting the spectrum of the dispersant from that of the oil/dispersant mixture, using the 1118 cm-1 band of the dispersant for scaling. The bottom spectrum is of Kuwait oil. Comparison of the lower two spectra shows them to be almost identical, indicating a successful separation of the oil and dispersant spectra. The fingerprint of the oil is free from interfering dispersant bands and can be used for qualitative analysis.

When subtracting the dispersant from the mixture, the scaling factor is calculated so that the intensity of the dispersant bands in the pure solution matches that of the dispersant bands in the mixture. The scaling factor therefore provides a means to determine the amount of a component in a mixture, provided that Beer's law is obeyed. The concentration of dispersant in CS2 was known. By multiplying the scaling factor obtained from the subtraction in Figure 7 by the known concentration, a concentration of 540-545 ppm was obtained. The concentration of the dispersant in the mixture was known to be 542 ppm. Knowing the concentration of dispersant in the mixture, we can now determine the amount of hydrocarbon due to petroleum by measuring the infrared band at 2930 cm-1 and subtracting the dispersant value from the value for total hydrocarbons.
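The scaling and quantitation steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' software: both absorbance spectra are assumed to share one wavenumber grid, only the dispersant is assumed to absorb at the index chosen for scaling (the role played by the 1118 cm-1 band), and Beer's law with equal path lengths is assumed when converting the scaling factor into a concentration. The function names are hypothetical.

```python
def subtract_component(mixture, reference, band):
    """Scale `reference` to match `mixture` at index `band`, then subtract.

    Returns (difference spectrum, scaling factor). Assumes absorbance spectra
    on a common grid and that only the reference component absorbs at `band`.
    """
    k = mixture[band] / reference[band]
    difference = [m - k * r for m, r in zip(mixture, reference)]
    return difference, k


def concentration_from_scaling(k, reference_concentration):
    """Beer's law: absorbance is proportional to concentration (same path length)."""
    return k * reference_concentration
```

With a hypothetical reference solution of 600 ppm and a fitted scaling factor of 0.9, `concentration_from_scaling` would report 540 ppm for the dispersant in the mixture.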

ACKNOWLEDGMENT

We thank John M. Cece, the contract monitor, and Mason P. Wilson, the project director, for their support. We also express our appreciation to Paul Aceto and Mark Ahmadjian for their help with the interfacing, and to Paul Aceto for the assembly language software.

RECEIVED for review April 10, 1978. Accepted June 29, 1978. This research was supported by the Environmental Control Division, Department of Energy (Contract No. E (11-1)-4047).

Signal Number Prediction in Carbon-13 Nuclear Magnetic Resonance Spectrometry

Craig A. Shelley and Morton E. Munk*

Department of Chemistry, Arizona State University, Tempe, Arizona 8528

The simulation of spectroscopic data is a major component in the design of a computer model of the structure elucidation process. This paper describes an artificial intelligence procedure (CARBON-13) which predicts the number of signals in the broad-band decoupled 13C NMR spectrum of a compound from the molecule's graph. CARBON-13 perceives structurally unique carbon atoms using topology and rules to detect some diastereotopically related carbon atoms. The program's reliability was evaluated using a file of 184 structurally diverse molecules. The results demonstrate that signal number prediction of 13C NMR spectra is both feasible and useful.

A major problem confronting the chemist is the need to rapidly and reliably deduce the structure of an unknown organic compound. An early step in the process is the reduction of chemical and spectroscopic data to structural information. Where feasible, the family of related molecules consistent with this derived structural information is constructed, a process generally facilitated by the chemist's intuition. However, that task is at best tedious, time-consuming, and subject to human error. Three versatile computer programs, STR-3 (1), its successor ASSEMBLE (2), and CONGEN (3), have been described that construct all molecules compatible with given structural information and thus relieve the chemist of that role.

0003-2700/78/0350-1522$01.00/0

A comparison of the predicted spectra of each of the structures constructed with the observed spectra follows and provides the basis for retrospective pruning and ranking, which yields the smallest possible set of compatible molecules arranged in order of decreasing probability. This sequence is also amenable to computer automation. As a step toward achieving this latter goal, CARBON-13, a procedure for predicting the number of signals expected in the broad-band decoupled 13C NMR spectrum of a given molecule, has recently been designed. This paper describes the program's implementation, prediction results, limitations, and applications.

CARBON-13 is one component of a network of computer programs (CASE) designed to model the entire process of structure elucidation as widely practiced by the organic and natural products chemist (4). At its present stage of development, CASE can accelerate, and make more reliable, the entire process of determining the structure of multifunctional molecules, e.g., biomolecules. CASE consists of three major program modules: chemical and spectral data interpretation, molecule assembly, and spectrum simulation. CARBON-13 is an operating component of the last module.

At present, no completely general method exists for simulating complete 13C spectra. Empirical additivity rules have been developed to simulate spectra for acyclic hydrocarbons and a variety of monofunctional structural classes (5). Many of these same additivity rules have also been used to automate the structure elucidation process for exactly delimited monofunctional compound classes (6-9), e.g., acyclic amines. In general, within the given class of compounds, these programs are highly successful and are capable of identifying the unknown itself or a very limited number of consistent structures. 13C NMR library search procedures may be considered to possess limited spectrum simulation capabilities. Of course, the size of the reference library and its breadth of structural variation affect performance. As an example, the RACES program (10) can predict chemical shifts based on a library of atom-centered fragments (structural fragments viewed as the addition of a limited number of concentric layers of neighboring atoms to a central atom). Schwenzer and Mitchell have also described a program which automatically determines atom-centered fragments and the corresponding chemical shifts for a file of compounds with assigned 13C NMR chemical shifts (11).

Table I. Structural Characteristics of File Compounds

structure type                          no. of compounds
acids                                    2
alcohols                                 2
aldehydes                                1
alkaloids                               22
amides                                   1
amines                                  11
amino alcohols                           1
aromatic compounds (functionalized)      6
azo compounds                            1
carbohydrates                            7
cyclic compounds                        26
  monocyclic, multifunctional            2
  polycyclic, multifunctional           24
esters                                   6
ethers                                   4
halides                                  2
hydrocarbons                            22
imides                                   2
ketones                                 13
lactones                                 1
nucleosides                              5
oximes                                   1
phenols                                  4
phosphate esters                         2
phosphines                               3
quinones                                 4
steroids                                 8
sulfides                                 1
terpenes                                26

© 1978 American Chemical Society

EXPERIMENTAL

The data set used in this study consists of 184 structures, stored as connection tables (computer-compatible listings of the atoms by elemental type, the atoms to which each is bonded, and the bond multiplicity), together with the number of experimentally observed signals in the broad-band decoupled 13C NMR spectrum of each. The compounds represent a broad spectrum of structure types, emphasizing multifunctional molecules with complex carbon/heteroatom skeletons (12). To ensure the diversity of the data set, all of the compounds (114) in the Johnson and Jankowski compilation (13) are included that contain at least ten carbon atoms but not more than 50 atoms, only covalent bonds, and only oxygen, nitrogen, sulfur, phosphorus, and halogen atoms with a valence of four or less. This file was expanded with 70 additional compounds, largely naturally occurring, taken directly from the primary literature. Table I describes the structural characteristics of the molecules in the data base. The "mean compound" in the file contains 14.99 carbon atoms, 2.39 oxygen atoms, 0.56 nitrogen atom, 0.02 sulfur atom, 0.03 phosphorus atom, and 0.17 halogen atom. The "mean compound" has 18.15 nonhydrogen atoms and 6.27 sites of unsaturation (rings plus multiple bonds). The number of carbon atoms in the file compounds ranges from a minimum of ten to a maximum of 30.
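The "sites of unsaturation" statistic is the usual rings-plus-multiple-bonds count derived from a molecular formula. A minimal Python sketch, assuming standard valences (carbon 4, nitrogen 3, hydrogen and halogen 1); divalent oxygen and sulfur drop out of the count, and phosphorus in higher valence states is not handled by this simple formula. The function name is hypothetical.

```python
def sites_of_unsaturation(c, h, n=0, x=0):
    """Rings plus multiple bonds for a formula with c C, h H, n N, x halogen.

    Divalent O and S are omitted because they do not change the count.
    Raises ValueError when the formula cannot correspond to a neutral,
    closed-shell molecule under the assumed valences.
    """
    twice = 2 * c + 2 + n - h - x
    if twice < 0 or twice % 2:
        raise ValueError("formula is inconsistent with standard valences")
    return twice // 2
```

For the C13H18O2 compound of the Applications section this gives 5, consistent with one carboxyl C=O plus a benzene ring (one ring and three C=C).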


THEORY AND IMPLEMENTATION

CARBON-13 is an artificial intelligence simulation procedure that attempts to parallel the chemist's reasoning, i.e., a set of rules for simulating spectra has been formulated. If the program makes a simulation error, the rules can be altered to correct the mistake. In principle, the number of signals in a broad-band decoupled spectrum is equal to the number of anisochronous (magnetically nonequivalent) carbon atoms. Thus, peak prediction reduces to the perception of the number of structurally unique carbon atoms. The uniqueness of an atom is a function of its environment, which is defined using both topological and stereochemical properties. The topological properties consist of the connectivity of the constituent atoms and the atomic labels, e.g., "nitrogen". The stereochemical properties of a molecule are, in general, a function of the arrangement of the constituent atoms in space and can be related to its symmetry properties.

Complications can arise in the case of some conformationally mobile molecules. For example, the ambient-temperature 13C NMR spectrum of trans-decalin displays 3 signals, as predicted from topological and stereochemical properties; that of cis-decalin also shows 3 signals, 2 fewer than predicted (14). As the temperature is lowered, the number of signals in the spectrum of cis-decalin increases to the expected 5, as the rate of chair-chair ⇌ chair-chair interconversion slows on the NMR time scale. The spectrum of trans-decalin is not temperature-dependent, consistent with its conformational rigidity. (It is interesting to note that, at room temperature, topological properties alone correctly predict the number of observed signals for each isomer.) Thus, a complete consideration of the relationship of stereochemical properties to 13C NMR spectra must include conformational effects and energy barriers.

Two conditions relate to carbon atom equivalence. First, topologically distinct atoms must be nonequivalent. Second, topologically identical atoms may be nonequivalent when stereochemical properties are considered. For example, the geminal methyl groups of 2-methyl-2-butene are topologically identical but diastereotopic to one another, and therefore nonequivalent.

The above analysis of atom uniqueness suggested to us that the perception of topological symmetry can form the basis for peak number prediction. An efficient procedure for topological symmetry perception has already been written as part of ASSEMBLE (15) and was adapted for use in CARBON-13. This procedure consists of two steps. First, each nonhydrogen atom is assigned to a class by using its elemental type and the number of two-electron covalent bonds joining it to adjacent nonhydrogen atoms. Second, further partitioning is achieved by examining the class membership of adjacent atoms. This latter step is repeated until no additional classes result.

For some polycyclic compounds, e.g., 1, initially associating a cycle property list with each atom is necessary to correctly identify topological symmetry (16).

[Structure 1]

For each atom in a molecule of n atoms, the cycle property list comprises the number of distinct elementary cycles in which the atom occurs, starting with those of length 3 and increasing to those of length n - 1. However, such compounds are rarely encountered, and efficiency considerations therefore favor the original procedure (15).

It is well known that diastereotopically related geminal methyl groups are a commonly encountered structural feature of organic molecules. This is confirmed by the data file used in this study, 20% of whose compounds contain this moiety. Therefore, a few simple rules were devised for incorporation into CARBON-13 to provide for the recognition of diastereotopically related geminal methyl groups. Diastereotopically related geminal methyl groups have their origin in two structural characteristics: first, the presence of chirality in the molecule and, second, geometric constraints. In the latter case, an isopropylidene group that is part of an unsymmetrically substituted carbon-carbon double bond, as in 2-methyl-2-butene, is a common source of such diastereotopism. Incorporating rules for program recognition of diastereotopic geminal methyls requires, at a minimum, specific definitions of both chiral centers and unsymmetrically substituted double bonds. CARBON-13 perceives only carbon atoms with four topologically distinct attachments as chiral centers; other sources of molecular chirality are not perceived at this time. Likewise, the symmetry of substitution at the double bond is at present based solely on topology.

In practice, a diastereotopic relationship between geminal methyl groups is a necessary but not sufficient condition for anisochrony. For example, when a chiral center is the origin of the diastereotopism, the magnitude of the chemical shift difference between the methyl groups depends in part on the spatial relationship between them and the chiral center, e.g., the distance along the molecular backbone. The data file contains 28 compounds with diastereotopic geminal methyl groups attached to tetracoordinate carbon. The effect of the distance between the diastereotopic methyl groups and the chiral center, i.e., the smallest number of bonds separating the two, was examined. Of the 28 occurrences, magnetic equivalence (i.e., overlapping signals) was observed in only three cases. The bond separation was 5 in two of these compounds and 7 in the third.
However, there are four compounds with anisochronous geminal methyl groups that are five bonds removed from the chiral center. For this reason, the maximum bond separation was set at five, i.e., all geminal methyls within 5 bonds of a chiral center are considered by CARBON-13 to be anisochronous; those further removed are considered to be isochronous. It is also conceivable that diastereotopic geminal methyl groups arising from an unsymmetrically substituted carbon-carbon double bond (isopropylidene group) may be isochronous if the topological difference between the substituents on the adjacent carbon atom is small. Of the 6 compounds in the data base with isopropylidene groups possessing structurally dissimilar substituents on the adjacent carbon atom, each displays anisochronous methyl groups; therefore, all such geminal methyl groups are at present predicted by the program to be anisochronous. CARBON-13, requiring 3.7K words, is coded in FORTRAN and runs on the Arizona State University UNIVAC 1110 computer. The program and data set are available from the authors on request.
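The two geminal-methyl rules can be sketched in Python on top of a topological class assignment (one class index per nonhydrogen atom, as the two-step partition yields). This is an illustrative reconstruction, not CARBON-13's FORTRAN code: the function names are hypothetical, hydrogens are assumed implicit on neutral tetravalent carbon, and the bond separation is assumed to be measured from a methyl carbon to the chiral center. Each class of methyl carbons judged anisochronous adds one signal to the topological count.

```python
from collections import deque


def bond_distances(nbrs, start):
    """Smallest number of bonds from `start` to each reachable atom (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        a = queue.popleft()
        for b in nbrs[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist


def tss_signal_count(atoms, bonds, classes, max_bonds=5):
    """Carbon class count plus one per class of diastereotopic geminal methyls.

    atoms: element symbols of the nonhydrogen atoms.
    bonds: (i, j, order) tuples between nonhydrogen atoms.
    classes: topological class index of each atom.
    """
    n = len(atoms)
    nbrs = [[] for _ in range(n)]
    order = {}
    for i, j, k in bonds:
        nbrs[i].append(j)
        nbrs[j].append(i)
        order[frozenset((i, j))] = k

    def bo(a, b):
        return order[frozenset((a, b))]

    def hydrogens(a):
        # Implicit hydrogens on a neutral, tetravalent carbon (assumption).
        return 4 - sum(bo(a, b) for b in nbrs[a])

    def is_chiral(a):
        # Only a carbon with four topologically distinct attachments counts.
        if atoms[a] != "C" or any(bo(a, b) != 1 for b in nbrs[a]):
            return False
        heavy = [classes[b] for b in nbrs[a]]
        return hydrogens(a) <= 1 and len(set(heavy)) == len(heavy)

    chiral = [a for a in range(n) if is_chiral(a)]
    split = set()  # classes of geminal methyls judged anisochronous

    for c in range(n):
        if atoms[c] != "C":
            continue
        methyls = [m for m in nbrs[c]
                   if atoms[m] == "C" and len(nbrs[m]) == 1 and bo(c, m) == 1]
        if len(methyls) != 2 or classes[methyls[0]] != classes[methyls[1]]:
            continue  # no topologically equivalent geminal methyl pair on c
        m = methyls[0]
        if all(bo(c, b) == 1 for b in nbrs[c]):
            # Rule 1: a chiral center within max_bonds of the methyl carbon.
            dist = bond_distances(nbrs, m)
            if any(x in dist and dist[x] <= max_bonds for x in chiral):
                split.add(classes[m])
        else:
            # Rule 2: isopropylidene on an unsymmetrically substituted C=C.
            for x in nbrs[c]:
                if atoms[x] == "C" and bo(c, x) == 2:
                    attached = [classes[y] for y in nbrs[x] if y != c]
                    attached += ["H"] * hydrogens(x)
                    if len(set(attached)) == 2:
                        split.add(classes[m])

    carbon_classes = {classes[a] for a in range(n) if atoms[a] == "C"}
    return len(carbon_classes) + len(split)
```

For 2-methyl-2-butene, whose two geminal methyls share one topological class, the isopropylidene rule raises the four topological carbon classes to 5. For an illustrative chiral case such as 3-methyl-2-butanol, the carbinol carbon two bonds away triggers the chirality rule unless `max_bonds` is set below 2.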

RESULTS AND DISCUSSION

As described earlier, in principle, the number of signals expected in the "ideal" 13C NMR spectrum of a compound of known structure and stereochemistry is accurately predictable from its topology and symmetry. In practice, approximations are made in the interest of minimizing computational time. These approximations, discussed below, may lead to predicted values that are less than theoretical, but never greater. Real-world NMR spectrometers, however, are also less than ideal and often reveal fewer than the theoretically expected number of signals; that is, structurally distinct carbon atoms may appear in the spectrum as if they were equivalent. In the data file of 184 spectra used in this study, approximately one third display the overlap of signals of nonidentical carbon atoms.

In this study, the quality of the prediction results was examined with and without the incorporation of the rules for the detection of diastereotopically related geminal methyl groups. In the absence of these rules, anisochronous diastereotopic carbon atoms would lead to low peak number predictions; however, it was possible that these errors could in part be balanced by the relatively common occurrence of overlapping signals. It also appeared advisable to establish whether a limited concentric environment about an atom could serve to identify topological equivalence for purposes of peak prediction. Although the observed chemical shift of a carbon atom is a function of the complete molecular environment, it is most profoundly influenced by a more limited concentric sphere about the atom. Thus, the useful Lindeman-Adams correlation for acyclic hydrocarbons (5), an empirical additivity correlation, takes into account δ atoms at most. Table II compares predicted with observed peak numbers as the radius of the sphere examined by the program is varied. The column headed TSO records the results derived from an examination of topological symmetry only. For comparison, the results obtained by including the rules for identifying diastereotopic geminal methyl groups are summarized in the adjacent column headed TSS. Table II also records the central processor (CPU) time per molecule for each run. As expected, it increases as the concentric environment is expanded, and the search for diastereotopic geminal methyl groups adds 15-20% to the CPU time required.
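The TSO column rests on the two-step topological partition described under Theory and Implementation. The sketch below is a minimal Python illustration, not the FORTRAN code of CARBON-13: atoms are given as element symbols and bonds as (i, j, order) tuples over nonhydrogen atoms (aromatic rings in a Kekulé form), and the hypothetical parameter `max_radius` caps the number of neighbor-refinement passes, on the assumption that each pass extends the examined sphere by one bond.

```python
def topological_classes(atoms, bonds, max_radius=None):
    """Two-step topological partition of the nonhydrogen atoms.

    atoms: element symbols; bonds: (i, j, order) tuples with integer orders.
    max_radius: number of neighbor-refinement passes (None = until stable).
    """
    n = len(atoms)
    nbrs = [[] for _ in range(n)]
    bond_count = [0] * n  # two-electron bonds to nonhydrogen neighbors
    for i, j, order in bonds:
        nbrs[i].append(j)
        nbrs[j].append(i)
        bond_count[i] += order
        bond_count[j] += order

    def relabel(labels):
        # Map each distinct label to a small integer class index.
        index = {label: k for k, label in enumerate(sorted(set(labels)))}
        return [index[label] for label in labels]

    # Step 1: elemental type plus number of bonds to nonhydrogen atoms.
    classes = relabel([(atoms[i], bond_count[i]) for i in range(n)])

    # Step 2: split classes by the classes of adjacent atoms. Each pass looks
    # one bond further out; n passes are always enough to reach a stable state.
    passes = n if max_radius is None else max_radius
    for _ in range(passes):
        refined = relabel([(classes[i], tuple(sorted(classes[j] for j in nbrs[i])))
                           for i in range(n)])
        if len(set(refined)) == len(set(classes)):
            break  # no additional classes: the partition is stable
        classes = refined
    return classes


def predicted_signals(atoms, bonds, max_radius=None):
    """Number of distinct carbon classes, i.e., a topology-only (TSO-like) count."""
    classes = topological_classes(atoms, bonds, max_radius)
    return len({cls for cls, element in zip(classes, atoms) if element == "C"})
```

Under these conventions, 3-undecanone gives 9 carbon classes with a radius of 2 (C-6, C-7, and C-8 merge) and 11 with a radius of 3, while the decalin skeleton gives 3 classes and 2-methyl-2-butene gives 4, one short of its 5 carbons because the diastereotopic methyls are not separated by topology alone.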
In considering prediction errors, i.e., the difference between predicted and observed values (Δ in Table II), a distinction must be made between program errors and "other errors." CARBON-13 is designed to predict the number of signals expected in an "ideal" spectrum. However, because errors beyond the control of the program ("other errors") occur anyway, CARBON-13 incorporates approximations and is, by design, not rigorous. Program errors, i.e., discrepancies between predicted and theoretical values, are of two types: (1) stereochemically distinct atoms not perceived by the program at this time, and (2) topological and/or stereochemical differences within the purview of the program that are not perceived because the distinguishing structural features lie beyond the user-defined sphere radius and bond separation limits. Thus, program errors lead to predicted values that are less than theoretically expected values but, because of "other errors", not necessarily less than experimentally observed values. (In rare cases of compounds with one or more pairs of topologically equivalent asymmetric carbon atoms (17), equivalent geminal methyl groups may be reported by the program to be diastereotopic, leading to a prediction one greater than the theoretical value.) At present, CARBON-13's perception of a diastereotopic relationship between carbon atoms is limited to the geminal methyl groups described earlier; other structural features giving rise to diastereotopic atoms are not discerned. For example, in the achiral molecule 2-methyladamantane, compound 2 (18),

[Structure 2: 2-methyladamantane]

the diastereotopic methylene and methine carbon atoms are not recognized.


Adjustments in the user-defined parameters (the sphere radius used in topological symmetry perception and the bond separation limit used in diastereotopic methyl group recognition) can correct errors that result from taking too confined a view of the atoms in a molecule. For example, in 1-hydroxyphenazine (3) (19),

[Structure 3: 1-hydroxyphenazine]

all carbon atoms are anisochronous, but the 7 and 8 positions are equivalent when the concentric environment examined is limited to γ neighbors (a sphere radius of 3).

"Other errors" arise from (1) constitutionally similar but topologically different atoms with the same observable chemical shift, a matter of instrument resolution; (2) purely accidental isochrony; and (3) conformational effects. The first two give rise to spectra that display fewer than the theoretically expected number of signals. Thus, CARBON-13 with the maximum sphere radius set at 3 correctly predicts 11 signals for 3-undecanone (4),

[Structure 4: 3-undecanone, CH3CH2C(=O)(CH2)7CH3]

[Structure 5]

but only 9 appear because some topologically distinct methylene signals overlap (13). Purely accidental isochrony is a surprisingly common source of error: β-pinene (5) exhibits two such occurrences even though it contains only 10 distinct carbon atoms (13). Conformational effects, where they operate, do not lead to "errors" in the same sense as (1) and (2) above, because the spectra produced (ignoring overlapping signals) are correct. It is just that the consequence of conformational mobility, if any, for signal number is difficult to predict, because it depends not only on the nature of the conformational change but also on the energy barrier to rotation about the pertinent bonds. Thus, without a calculation of the energy barrier to the chair-chair ⇌ chair-chair interconversion of cis-decalin, it is not possible to predict in advance, on topological and symmetry grounds alone, that at room temperature cis-decalin will display 3 and not 5 signals. Because the barrier is low in this case, a time-averaged spectrum is obtained in which "equivalence" is conferred on atoms that are diastereotopically related in the static conformer. Likewise, without a calculation of the energy barrier, it is not possible to predict that the two methyl signals of N,N-dimethylformamide will not be time-averaged, i.e., that the barrier to rotation about the amide C-N bond is high enough to keep the methyl groups diastereotopically related on the NMR time scale (13). This is not an isolated occurrence. There are numerous examples of atropisomerism (molecular asymmetry that results from restriction of free rotation (20)) as the origin of anisochronous atoms. The bulky chlorine atoms of compound 6 (21)

[Structure 6]

restrict rotation sufficiently to differentiate carbon atoms 2′ and 6′, and 3′ and 5′, in the NMR spectrum.


Consider the quality of the peak simulation data summarized in Table II. With the limited stereochemical perception rules enabled and the maximum sphere radius set at 3, CARBON-13 generated signal numbers that agree exactly with observation for 118 (64%) of the 184 spectra on file. That number rises only to 120 (65%) when the topological symmetry of the entire molecule is considered, i.e., with n set to infinity. If a prediction error of ±1 is permitted (Δ from +1 to -1), 168 of the 184 compounds (91%) are correctly predicted with n set at 3. However, that number drops to 161 (88%) when the topological symmetry of the entire molecule is considered (n = ∞). For a maximum sphere radius less than 3, the proportion of low predictions rises rapidly: as a smaller concentric environment is examined, a higher degree of topological symmetry is spuriously perceived, and fewer distinct carbon atoms are predicted. It is interesting to note that for a maximum sphere radius of 3 the error bias is toward high predictions, i.e., chance overlap of peaks appears to be more probable than program error. That bias increases further for values of n greater than 3. For a sphere radius greater than 5, i.e., n = 6, the fraction of low predictions drops below 2% (3 compounds). Each of these three low predictions arises from failure of the program to detect diastereotopic carbon atoms. Without the rules for detection of diastereotopic geminal methyl groups, the proportion of low predictions rises to 13% (24 compounds) for both n = 6 and n = ∞. For n = 3, the number of low predictions drops from 40 (22%) to 20 (11%) when the stereochemical rules are incorporated. Finally, the prediction errors appear to increase with molecular size and complexity.
Using a maximum sphere radius of 3 and a prediction error range of ±1, the results for the 114 compounds taken from the Johnson and Jankowski compilation (13) were compared to those for the 70 compounds, largely natural products, taken from the primary literature. The former gave correct answers, within the error limit set, for 108 compounds (95%); the latter, for 60 compounds (87%). Taken together, these data suggest that for optimum overall program performance and efficiency, the concentric environment examined about each carbon atom may be limited to γ neighboring atoms, i.e., the maximum sphere radius set at 3. It would appear that for the data file used, this value of n provides the best neutralization of countervailing errors. (However, this parameter and the bond separation value described earlier can be user-defined.) Therefore, expanding CARBON-13's ability to perceive a wider range of diastereotopically related carbon atoms, although feasible except for complex conformationally mobile systems, would not appear to offer advantages in terms of program performance and would certainly reduce program efficiency.
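The percentages quoted above follow from simple tallies of Δ = predicted - observed over the file. A minimal Python sketch of that bookkeeping; the function name and the toy data in the example are hypothetical, not taken from the paper's data set.

```python
def summarize_errors(predicted, observed, tolerance=1):
    """Tally prediction errors, with delta = predicted - observed per compound."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    deltas = [p - o for p, o in zip(predicted, observed)]
    counts = {
        "exact": sum(d == 0 for d in deltas),
        "within_tolerance": sum(abs(d) <= tolerance for d in deltas),
        "low": sum(d < 0 for d in deltas),   # program predicts too few signals
        "high": sum(d > 0 for d in deltas),  # e.g., chance overlap in the spectrum
    }
    percents = {key: round(100 * value / len(deltas)) for key, value in counts.items()}
    return counts, percents
```

Applied to the counts reported above, 118 exact predictions out of 184 round to 64%, and 168 predictions within ±1 round to 91%.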

[Figure 1. Computer output, CARBON-13: predicted number of peaks and number of compounds for the candidate structures, with the user-entered minimum and maximum numbers of peaks used for pruning.]

APPLICATIONS

The major emphasis of CARBON-13 and the entire CASE network is the structure elucidation of multifunctional molecules of practical and theoretical importance. Throughout the evolution of CASE, actual structure elucidation problems, as well as "simulated" problems taken from the chemical literature, have played an important role in program development and refinement. A simple but informative example illustrates an application of CARBON-13.

The structural information that served as initial input to ASSEMBLE was derived from the infrared (IR) and proton magnetic resonance (PMR) spectra of the compound in question (C13H18O2). The IR spectrum was computer-interpreted (22) and suggested the presence of a saturated carboxylic acid moiety and a benzene residue, each with a high confidence level. Structural information from the PMR spectrum was chemist-derived and limited to information of generally high reliability, in this case information pertaining to methyl groups. A total of three C-methyl groups were indicated: two as part of an isopropyl group and a third attached to a methine carbon. With the molecular formula and the above structural information alone, ASSEMBLE generated only 15 valid structures for C13H18O2.

The broad-band decoupled 13C NMR spectrum of the compound displayed 9 signals. A call to CARBON-13 produced the number of signals expected in the spectrum of each of the 15 molecules constructed by ASSEMBLE. The actual computer printout for this problem is shown in Figure 1. As indicated, the range of acceptable signal numbers to be applied in the pruning step is user-defined. In problem solving, a range of ±1 relative to the observed number of signals is normally set. As the output shows, no structures are predicted to have 9 signals, the minimum being 10. Setting the acceptable signal number limit at no greater than 10 prunes all but 3 structures from the list. These three most likely compounds, shown as the actual computer printout in Figure 2, are presented to the chemist as equally ranked structures. The correct structure is among them and is easily differentiated by the chemist.

Each of the three structures contains a program-identified chiral center, but in each, the chiral center is separated from the diastereotopic geminal methyls of the isopropyl group by more than 5 bonds, the "normal" bond separation limit. As a result, CARBON-13 treats the isopropyl methyls of all three candidates as isochronous and therefore predicts 10 signals instead of the theoretical value of 11. The fact that only 9 signals are observed again indicates chance overlap.

[Figure 2. Final output of CASE, applications problem (C13H18O2).]
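The pruning step amounts to a range filter on predicted signal counts. A minimal Python sketch, assuming a hypothetical mapping from candidate identifiers to predicted counts; the counts in the example are invented for illustration and are not the 15 structures of Figure 1.

```python
def prune(candidates, observed, tolerance=1, min_peaks=None, max_peaks=None):
    """Keep candidates whose predicted signal count lies in the acceptable range.

    candidates: mapping of structure id -> predicted number of signals.
    The range defaults to observed +/- tolerance; min_peaks and max_peaks,
    when given, override either end (user-defined limits).
    """
    lo = observed - tolerance if min_peaks is None else min_peaks
    hi = observed + tolerance if max_peaks is None else max_peaks
    return [sid for sid, n in candidates.items() if lo <= n <= hi]
```

With 9 observed signals and the default ±1 window, only candidates predicted to show 8 to 10 signals survive, which mirrors the "no greater than 10" cut described above when no candidate is predicted below 10.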

CONCLUSIONS

The study described herein demonstrates both the feasibility and the value of predicting the number of signals expected in the 13C NMR spectrum of a given molecular structure. The topological symmetry-based procedure is significantly improved by the incorporation of some simple rules for the perception of stereochemistry. Further refinement of this perception ability is not expected to reduce prediction errors. Although the effective concentric environment of an atom varies with the nature of the molecular structure, a single value of 3 for the maximum sphere radius gives generally satisfactory predictions. At present, chance overlap of signals is the largest source of error. CARBON-13 is efficient enough to make predictions for even an extremely large number of compounds without excessive computer time; a file of 1000 molecules is estimated to run in about 45 s. Clearly, it is desirable to extend the predictive capability of CARBON-13 to both off-resonance spectra and chemical shift data. These extended applications require a more extensive data base and are under consideration. Each new development of CARBON-13 is viewed in terms of the overall objectives of the CASE system.

ACKNOWLEDGMENT

The authors are indebted to J. Devens Gust for many helpful discussions and to Charles R. Snelling, Jr., for technical assistance. We thank Edward C. Olson, Head, Physical and Analytical Chemistry Research, The Upjohn Company, Kalamazoo, Mich., for the spectroscopic data on the application described. We also acknowledge with gratitude the drawing program provided by Raymond E. Carhart, Stanford University.

LITERATURE CITED

(1) B. D. Cox, Computer Program STR-3, Ph.D. Dissertation, Arizona State University, Tempe, Ariz., 1973.
(2) C. A. Shelley, T. R. Hays, M. E. Munk, and R. V. Roman, Anal. Chim. Acta, in press.
(3) R. E. Carhart, D. H. Smith, H. Brown, and C. Djerassi, J. Am. Chem. Soc., 97, 5755 (1975).
(4) C. A. Shelley, H. B. Woodruff, C. R. Snelling, and M. E. Munk, ACS Symposium Series, No. 54, 1977, pp 92-10?.
(5) F. W. Wehrli and T. Wirthlin, "Interpretation of Carbon-13 NMR Spectra", Heyden, New York, N.Y., 1976, p 41.
(6) S. Ochiai, Y. Hirota, Y. Kudo, and S. Sasaki, Jpn. Anal., 22, 399 (1973).
(7) R. E. Carhart and C. Djerassi, J. Chem. Soc., Perkin Trans. 2, 1753 (1973).
(8) A. L. Burlingame, R. V. McPherson, and D. M. Wilson, Proc. Natl. Acad. Sci. U.S.A., 70, 3419 (1973).
(9) S. Sasaki, Y. Hirota, and S. Ochiai, Jpn. Anal., 23, 1184 (1974).
(10) B. A. Jezl and D. L. Dalrymple, Anal. Chem., 47, 203 (1975).
(11) G. M. Schwenzer and T. M. Mitchell, ACS Symposium Series, No. 54, 1977, pp 58-76.
(12) A listing of molecular structures is available from the authors and is included in the Ph.D. dissertation of C. A. Shelley, "Computer-Assisted Structure Elucidation", Arizona State University, Tempe, Ariz., 1978.
(13) L. F. Johnson and W. C. Jankowski, "Carbon-13 NMR Spectra", Wiley-Interscience, New York, N.Y., 1972.
(14) F. W. Wehrli and T. Wirthlin, "Interpretation of Carbon-13 NMR Spectra", Heyden, New York, N.Y., 1976, p 207.
(15) C. A. Shelley and M. E. Munk, J. Chem. Inf. Comput. Sci., 17, 110 (1977).
(16) The revised program is available from the authors and will form the subject of a forthcoming manuscript by C. A. Shelley, M. E. Munk, and R. V. Roman.
(17) E. Eliel, "Stereochemistry of Carbon Compounds", McGraw-Hill, New York, N.Y., 1962, pp 26-30.
(18) K. Mlinarić-Majerski, Z. Majerski, and E. Pretsch, J. Org. Chem., 41, 686 (1976).
(19) E. Breitmaier and U. Hollstein, J. Org. Chem., 41, 2104 (1976).
(20) M. Hanack, "Conformational Theory", Academic Press, New York, N.Y., 1965, p 9.
(21) M. K. Eberle and L. Brzechffa, J. Org. Chem., 41, 3775 (1976).
(22) H. B. Woodruff and M. E. Munk, J. Org. Chem., 42, 1761 (1977).

RECEIVED for review December 8, 1977. Accepted June 26, 1978. This paper was presented in part at the 173rd National American Chemical Society Meeting, Division of Computers in Chemistry, New Orleans, La., March 23, 1977. Financial support by the National Institute of General Medical Sciences (GM 21703) and the Arizona State University Computer Center is gratefully acknowledged.