8. CHEMICS: A Computer Program System for Structure Elucidation of Organic Compounds

TOHRU YAMASAKI, HIDETSUGU ABE, YOSHIHIRO KUDO, and SHIN-ICHI SASAKI

Downloaded by TUFTS UNIV on November 21, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0054.ch008

Miyagi University of Education, Aoba, Sendai 980 Japan

There have been many articles concerned with computer programs for structure elucidation of organic compounds by analyzing chemical spectra. The methodologies and techniques employed for this purpose can be classified into two categories: one is the identification of unknown compounds by retrieval from filed spectra (1,2), and the other is the generation of structural formulas based on the analytical results of spectral data and other chemical evidence (3,4,5). As reported previously, our integrated computer system for structure elucidation of organic compounds, named CHEMICS, stands mainly on the latter methodology (6). IR and 1H NMR spectral data of an organic compound are analyzed, and plausible structural formulas consistent with the analytical results are generated. Since generation of the correct structure is the major premise of this system, rather ample allowance for elucidation of partial structures is made during data analysis. Thus, an excessive number of candidate structures (informational homologues) is generated upon occasion.

In order to prevent this undesirable situation, two different strategies are considered to be practical: 1) application of the file retrieval method as a complement to the data analysis, and 2) introduction of other kinds of information sources and/or more precise spectral data analysis. The former solution has already been actualized as CHEMICS-F, as shown in Fig. 1 (7). For the latter strategy, several trials have been made at our laboratory, for example, quantitative analysis of IR spectra (8), spectral simulation of NMR (1H, ALACON) (9), analysis of nuclear double resonance data (1H{1H}, NMDR) (10), and prediction of NMR spectra (11). In this paper we describe the incorporation of 13C NMR spectral analysis into CHEMICS to extend its capabilities.

Figure 1. Block diagram of CHEMICS-F. Dashed arrow means off-line routine. (Blocks: input of molecular formula, NMR, IR, MS, and UV data; DATA FILE; SEARCH; ANALYSIS; 'components'; STRUCTURE GENERATOR; candidate structure; matching result; OUTPUT of plausible structure.)

Smith; Computer-Assisted Structure Elucidation ACS Symposium Series; American Chemical Society: Washington, DC, 1977.


General feature of 13C NMR spectral data analysis

Recently, 13C NMR spectroscopy has been effectively employed for structure elucidation of organic compounds. Here we intend to introduce the spectral data as a new information source because of its generally applicable nature. The entire system is shown in Fig. 2. The program for analysis of 13C NMR spectra (ASSINC) is composed of the following four elements, as shown in Fig. 2:

a) DATA INPUT
b) PRIMARY ANALYSIS
c) SECONDARY ANALYSIS
d) CHEMICAL SHIFT TABLE

The idea of ASSINC is much the same as that of the 1H NMR data analysis of the system CHEMICS (ASSIN) (6), in which knowledge obtained by analyzing spectral data of unknown compounds is represented as a group of substructures named 'components'. According to this idea, 189 kinds of 'components' are defined in advance for ASSINC, as shown (partially) in Table I, instead of the 179 'components' of the former edition. Each 'component' is defined by the adjacent atoms and/or functional groups bonded to it.

DATA INPUT. Input data for 13C NMR data analysis consist of the positions and intensities of every signal and their multiplicities. We use the example of structure 1, C9H14O, whose spectrum is shown in Fig. 3. Both card and paper tape image data are acceptable. Even if the multiplicities are not available, ASSINC can analyze the rest of the data and will offer usable answers for successive routines. In such a case, however, some ambiguities cannot be avoided.

PRIMARY ANALYSIS. The block diagram of the primary analysis routine is shown in Fig. 2. As shown in this figure, it consists of two major parts. One is the allocation of carbons to each spectral signal and the other is the examination of the presence of 'components'.


Figure 2. Flow chart of 13C NMR spectral data analysis. (Blocks: data analysis of IR and 1H NMR; 13C NMR spectrum with off-resonance multiplicity; primary analysis: allocation of carbon atoms, selection of 'components' by chemical shift table; secondary analysis: making set of 'components'; structure generation.)

NO   POSITION(ppm)   INTENSITY   MULTIPLICITY
1        24.4           1679          Q
2        28.3           4549          Q
3        33.5            895          S
4        45.2           2380          T
5        50.8           2119          T
6       125.4           2494          D
7       159.9           1084          S
8       199.2            861          S

Figure 3. 13C NMR data of compound 1


Allocation of carbons. The first step of the primary analysis is the allocation of the proper number of carbons to each signal. However, it must be emphasized that the process is not aimed at obtaining the explicit solution for all cases, but at gathering as much useful information as possible. It is well known that signal intensities are not always proportional to the numbers of carbons contributing to the signals in 13C NMR spectra, mainly because of the presence of the nuclear Overhauser effect (NOE) (12). However, it can be assumed that the signal intensities of protonated carbons are proportional to the amount of carbons because of their almost complete enhancement by the NOE. The allocation of carbon numbers is based on this assumption.

The block diagram of the routine for the allocation of carbons is shown in Fig. 4. By utilizing the multiplicity data, the input signals are classified into two categories, namely, signals assigned to protonated carbons and those assigned to non-protonated carbons. Allocation of carbons for the signals is performed separately for each category. At first, the allocation is tried for signals assigned to protonated carbons. The amount of carbons (AOC) corresponding to this category is limited to the range of $R_1$ to $R_2$ defined by equation (1).

$$
\begin{aligned}
R_1 &= \left(\text{whole carbon number of the molecule}\right) - \left(\text{number of signals of non-protonated carbons}\right) \\
R_2 &= \left(\text{number of signals assigned to protonated carbons}\right)
\end{aligned}
\tag{1}
$$
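The classification by multiplicity and the range of equation (1) can be sketched as follows. This is only an illustrative sketch and not the ASSINC code: the names `Signal`, `split_by_protonation` and `aoc_range` are hypothetical, and the rule that a singlet (S) marks a non-protonated carbon while D, T and Q mark protonated carbons is the usual off-resonance convention assumed here. The data are those listed in Fig. 3 for compound 1.

```python
from typing import NamedTuple


class Signal(NamedTuple):
    """One 13C NMR signal (hypothetical container, not an ASSINC structure)."""
    number: int
    position_ppm: float
    intensity: int
    multiplicity: str  # off-resonance multiplicity: "S", "D", "T" or "Q"


# 13C NMR data of compound 1 (C9H14O) as listed in Fig. 3.
SIGNALS = [
    Signal(1, 24.4, 1679, "Q"),
    Signal(2, 28.3, 4549, "Q"),
    Signal(3, 33.5, 895, "S"),
    Signal(4, 45.2, 2380, "T"),
    Signal(5, 50.8, 2119, "T"),
    Signal(6, 125.4, 2494, "D"),
    Signal(7, 159.9, 1084, "S"),
    Signal(8, 199.2, 861, "S"),
]


def split_by_protonation(signals):
    """Assumed rule: singlets are non-protonated, D/T/Q signals are protonated."""
    protonated = [s for s in signals if s.multiplicity != "S"]
    non_protonated = [s for s in signals if s.multiplicity == "S"]
    return protonated, non_protonated


def aoc_range(total_carbons, signals):
    """Equation (1): return (R2, R1), the lower and upper bound of the AOC."""
    protonated, non_protonated = split_by_protonation(signals)
    r1 = total_carbons - len(non_protonated)
    r2 = len(protonated)
    return r2, r1


if __name__ == "__main__":
    low, high = aoc_range(9, SIGNALS)
    print(list(range(low, high + 1)))  # candidate AOC values
```

For compound 1 the sketch gives the candidate AOC values 5 and 6 for the protonated category.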

After estimation of the AOC, the number of carbons for each signal (CNS) is evaluated by means of equation (2), and a set of CNS values is obtained for each AOC value. However, if any one of the CNS values in the set is greater than 0.3 and less than 0.7, the set is abandoned to avoid an error.

$$
\mathrm{CNS}_i = \frac{\mathrm{INT}_i}{\sum_{j=1}^{\mathrm{AOS}} \mathrm{INT}_j} \times \mathrm{AOC}
\tag{2}
$$

CNS_i: carbon number allocated to signal "i"
INT_i: intensity of signal "i"
AOC: amount of corresponding carbons
AOS: amount of corresponding signals

Figure 4. Procedure for the allocation of carbons to each signal. (Flow chart: estimation of the AOC range for the protonated category; calculation of CNS sets for each AOC; evaluation of the CNS sets; estimation of the AOC for the non-protonated category corresponding to the protonated AOC; estimation of CNS.)

The allocation process for the signals assigned to non-protonated carbons is the following step. At this stage, the AOC value is estimated on the basis of the remaining carbons which were not consumed at the preceding stage. As the result of solving equation (2), the sets of CNS values which correspond to non-protonated carbons are obtained. Here, it is assumed that the weakest intensity of the signal is shared with a unit number (1, 2, 3, ...) of carbons. Consequently, the allocated numbers, namely, a set of entire CNS, are acquired for each input signal. If there is more than one solution for this problem, any one of them could be chosen as a correct set of allocated numbers for the signals.

The application of the procedure to the spectrum of compound 1 is described below. The input signals shown in Fig. 3 are classified into either the protonated or the non-protonated category: signals number 1, 2, 4, 5 and 6 are grouped into the former, and 3, 7 and 8 into the latter. Through the procedure for the protonated category the AOC is appraised as 5 and 6, because $R_1$ is calculated as 6 (9 - 3) and $R_2$ is equal to 5. The corresponding sets of the CNS are shown below, where each integer value enclosed in parentheses is the allocated number of carbons.

signal number      1          2          4          5          6
AOC=5           0.63 (*)   1.72 (2)   0.90 (1)   0.80 (1)   0.94 (1)
AOC=6           0.76 (1)   2.06 (2)   1.08 (1)   0.96 (1)   1.13 (1)

Since it is impossible to allocate carbons to signal number 1 in the first set, this set is abandoned. Therefore only one solution is derived, from the case where the AOC is equal to 6. At the following stage, the AOC for the non-protonated category is fixed to 3, and so each residual signal must be allocated to one carbon individually. The final result of the allocated numbers is as follows:

signal number       1   2   3   4   5   6   7   8
allocated number    1   2   1   1   1   1   1   1
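The worked example for the protonated category can be reproduced with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and applying the 0.3 to 0.7 rejection window to the fractional part of each CNS value (rather than to the value itself) is an assumption that generalizes the stated rule; for the data of compound 1 both readings give the same outcome.

```python
import math

# Intensities of the protonated signals of compound 1 (signal number -> intensity, Fig. 3).
PROTONATED_INTENSITIES = {1: 1679, 2: 4549, 4: 2380, 5: 2119, 6: 2494}


def cns_values(intensities, aoc):
    """Equation (2): CNS_i = INT_i / sum_j(INT_j) * AOC."""
    total = sum(intensities.values())
    return {number: value / total * aoc for number, value in intensities.items()}


def is_ambiguous(cns):
    """Assumption: the 0.3-0.7 window is tested on the fractional part of CNS."""
    fraction = cns - math.floor(cns)
    return 0.3 < fraction < 0.7


def allocate(intensities, aoc):
    """Return {signal: allocated carbons}, or None when the set is abandoned."""
    values = cns_values(intensities, aoc)
    if any(is_ambiguous(v) for v in values.values()):
        return None
    return {number: round(v) for number, v in values.items()}


if __name__ == "__main__":
    for aoc in (5, 6):
        print(aoc, allocate(PROTONATED_INTENSITIES, aoc))
```

With AOC = 5 the set is rejected because of signal 1, and with AOC = 6 the signals 1, 2, 4, 5 and 6 receive 1, 2, 1, 1 and 1 carbons, in agreement with the table above.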


Examination of the presence of 'components'. Now we have confirmed two kinds of information about a given set of 13C NMR spectral data: the amount of carbons assigned to each signal and the nature of the carbons (protonated or non-protonated). By considering this information, the possible presence of each 'component' is examined, and those which are inconsistent with the information are abandoned. The presence of each 'component' is judged by its chemical shift range (refer to Table I); in other words, if there are no signals within the chemical shift range corresponding to a 'component', it is judged not to be present in the sample compound. As shown in Fig. 5, twenty-nine components survive for compound 1 through the primary analysis.

The result of the primary analysis is represented by a matrix named the NM matrix, in which each row corresponds to a survived 'component' and each column to a signal of the given 13C NMR spectrum. Each matrix element indicates the maximum number of carbons of the 'component' that can be assigned to the corresponding signal. Elements with the value -1 indicate that the corresponding 'components' were not assigned to those signals.
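A minimal sketch of how such an NM matrix could be built is given below. It is not the ASSINC implementation: the `Signal` and `Component` containers are hypothetical, the expected multiplicity "Q" for the methyl-type components is an assumption, and the rule that a matching element equals the number of carbons allocated to the signal is inferred from the rows printed in Fig. 5. The shift ranges of components 12, 16, 33 and 38 are the values read from Table I.

```python
from typing import NamedTuple


class Signal(NamedTuple):
    position_ppm: float
    multiplicity: str  # off-resonance multiplicity
    carbons: int       # number of carbons allocated in the preceding step


class Component(NamedTuple):
    number: int
    low_ppm: float
    high_ppm: float
    multiplicity: str  # assumed multiplicity of the component's carbon


# Signals of compound 1 with the allocated carbon numbers.
SIGNALS = [
    Signal(24.4, "Q", 1), Signal(28.3, "Q", 2), Signal(33.5, "S", 1),
    Signal(45.2, "T", 1), Signal(50.8, "T", 1), Signal(125.4, "D", 1),
    Signal(159.9, "S", 1), Signal(199.2, "S", 1),
]

# A few methyl-type components with ranges as read from Table I.
COMPONENTS = [
    Component(12, 6.80, 32.61, "Q"),
    Component(16, 10.43, 21.53, "Q"),
    Component(33, 7.06, 33.08, "Q"),
    Component(38, 22.22, 28.15, "Q"),
]


def nm_row(component, signals):
    """Allocated carbons where shift and multiplicity match, otherwise -1."""
    return [
        s.carbons
        if component.low_ppm <= s.position_ppm <= component.high_ppm
        and s.multiplicity == component.multiplicity
        else -1
        for s in signals
    ]


def nm_matrix(components, signals):
    """Keep only the components that match at least one signal."""
    rows = {}
    for component in components:
        row = nm_row(component, signals)
        if any(value > 0 for value in row):
            rows[component.number] = row
    return rows


if __name__ == "__main__":
    for number, row in nm_matrix(COMPONENTS, SIGNALS).items():
        print(number, row)
```

Under these assumptions component 16 is dropped because no signal falls in its range, and component 38 matches signal 1 but not signal 2, as in the corresponding row of Fig. 5.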

SECONDARY ANALYSIS. At the first step of this routine, a set of 'components' which is consistent with the molecular formula is selected from the survived 'components'. One of the five sets which were finally generated for compound 1 is shown in Fig. 6. As described before, in the primary analysis each of the signals is treated as if it were independent of the others, and the 'components' which can be assigned to at least one signal survive without any further examination. However, it is necessary to examine whether the set is consistent with the given spectrum or not; in other words, it should be confirmed that all 'components' of the set are fully consistent with the input spectrum, with neither excess nor deficiency. To make this examination, the selective NM matrix is made for the set by extracting the rows corresponding to the selected 'components' from the NM matrix shown in Fig. 5. This selective matrix is shown in Fig. 6. As shown in Fig. 7, this matrix N is converted into another matrix X by substituting the positive elements by variables ($x_{ij}$) and the negative elements by zeros. A set of simultaneous linear equations is made from X and two constant vectors C and D, representing carbon numbers in the 'components' and allocated carbon numbers, respectively.

Figure 5. Survived components of compound 1 through 13C NMR data analysis. (Listing of NO, CMP, SUB/STRUCTURE and the NM matrix for the 29 survived components; the CMP numbers are 10, 11, 12, 14, 17, 33, 38, 40, 106, 107, 108, 109, 118, 143, 144, 145, 146, 153, 172, 173, 174, 175, 177, 182, 184, 185, 186, 187, 188.)

number   component   substructure          number of carbons   selective NM matrix
1        12          GEM-DIMETHYL- (C)     2#                   1  2 -1 -1 -1 -1 -1 -1
2        33          CH3- (D)              1                    1  2 -1 -1 -1 -1 -1 -1
3        106         -CH2- (C)(K)          1                   -1 -1 -1  1  1 -1 -1 -1
4        107         -CH2- (C)(D)          1                   -1 -1 -1  1  1 -1 -1 -1
5        118         -CH=                  1                   -1 -1 -1 -1 -1  1 -1 -1
6        143         -C=                   1                   -1 -1 -1 -1 -1 -1  1 -1
7        172         -CO- (C)(D)           1                   -1 -1 -1 -1 -1 -1 -1  1
8        188         -C- (C)               1                   -1 -1  1 -1 -1 -1 -1 -1
allocation                                                      1  2  1  1  1  1  1  1

# Only methyl carbon is considered in gem-dimethyl group.

Figure 6. Selective components for the fifth set of compound 1

$$
X = \begin{pmatrix}
x_{11} & x_{12} & 0 & 0 & 0 & 0 & 0 & 0 \\
x_{21} & x_{22} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & x_{34} & x_{35} & 0 & 0 & 0 \\
0 & 0 & 0 & x_{44} & x_{45} & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & x_{56} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & x_{67} & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & x_{78} \\
0 & 0 & x_{83} & 0 & 0 & 0 & 0 & 0
\end{pmatrix},
\qquad
\begin{aligned}
C &= (2\ 1\ 1\ 1\ 1\ 1\ 1\ 1) \\
D &= (1\ 2\ 1\ 1\ 1\ 1\ 1\ 1)
\end{aligned}
$$

$$
\sum_{j} x_{ij} = c_i, \qquad \sum_{i} x_{ij} = d_j
\qquad\text{i.e.}\qquad
X I^{\mathsf{T}} = C^{\mathsf{T}}, \quad I X = D
$$

Figure 7. Representation of simultaneous linear equations for the fifth set of compound 1. N, X, C and D mean the selective NM matrix, the selective NM matrix with its positive elements replaced by $x_{ij}$, the modified 'component' vector and the allocation vector, respectively; I means the unit row vector having eight elements.

The number of equations is the number of 'components' in the set plus that of the signals. The equations carry the restriction that each variable $x_{ij}$ must lie between zero and the value of the corresponding selective matrix element. To solve these simultaneous equations is the major function of this routine. When no solution is obtained, the set is judged to be inappropriate, and when a solution is given, the set is sent to the following routine (the structure generator).

At the final stage of the spectral analysis, five sets of components generated from the twenty-nine components are selected as plausible ones for compound 1. The five sets are shown as follows; the numeral in parentheses gives the number of times the component occurs in the set:

NO. 1   10 (1), 38 (1), 107 (1), 109 (1), 118 (1), 143 (1), 189 (1)
NO. 2   10 (1), 40 (1), 106 (1), 107 (1), 118 (1), 143 (1), 189 (1)
NO. 3   12 (1), 38 (1), 107 (2), 118 (1), 143 (1), 189 (1)
NO. 4   10 (1), 33 (1), 106 (1), 109 (1), 118 (1), 143 (1), 172 (1), 189 (1)
NO. 5   12 (1), 33 (1), 106 (1), 107 (1), 118 (1), 143 (1), 172 (1), 189 (1)
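The consistency test of the secondary analysis amounts to asking whether integers $x_{ij}$ exist with $0 \le x_{ij} \le n_{ij}$, row sums equal to C and column sums equal to D. The source does not say how ASSINC solves these equations; the sketch below simply uses a small backtracking search, which is an assumption chosen for clarity. The function name `solve_allocation` is hypothetical, and the matrix and vectors are those of the fifth set of compound 1 (Figs. 6 and 7).

```python
def solve_allocation(n, c, d):
    """Search an integer matrix X with 0 <= x_ij <= n_ij (x_ij = 0 where
    n_ij < 0), row sums equal to c and column sums equal to d.
    Returns X as a list of rows, or None when no solution exists."""
    rows, cols = len(c), len(d)
    cells = [(i, j) for i in range(rows) for j in range(cols) if n[i][j] > 0]
    x = [[0] * cols for _ in range(rows)]
    row_left, col_left = list(c), list(d)

    def backtrack(k):
        if k == len(cells):
            # Every carbon of every component and signal must be used up.
            return not any(row_left) and not any(col_left)
        i, j = cells[k]
        upper = min(n[i][j], row_left[i], col_left[j])
        for value in range(upper, -1, -1):
            x[i][j] = value
            row_left[i] -= value
            col_left[j] -= value
            if backtrack(k + 1):
                return True
            row_left[i] += value
            col_left[j] += value
        x[i][j] = 0
        return False

    return [row[:] for row in x] if backtrack(0) else None


# Selective NM matrix, component vector C and allocation vector D of set 5.
N = [
    [1, 2, -1, -1, -1, -1, -1, -1],
    [1, 2, -1, -1, -1, -1, -1, -1],
    [-1, -1, -1, 1, 1, -1, -1, -1],
    [-1, -1, -1, 1, 1, -1, -1, -1],
    [-1, -1, -1, -1, -1, 1, -1, -1],
    [-1, -1, -1, -1, -1, -1, 1, -1],
    [-1, -1, -1, -1, -1, -1, -1, 1],
    [-1, -1, 1, -1, -1, -1, -1, -1],
]
C = [2, 1, 1, 1, 1, 1, 1, 1]
D = [1, 2, 1, 1, 1, 1, 1, 1]

if __name__ == "__main__":
    solution = solve_allocation(N, C, D)
    print("consistent" if solution is not None else "inappropriate set")
```

For set 5 a solution exists, so under this reading the set would be passed on to the structure generator; a set whose component carbons cannot be matched to the allocated signals returns `None`.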

The overall process by which the 189 components are reduced to 29 by means of the examination of the molecular formula followed by the successive analyses of IR, 1H NMR and 13C NMR is shown in Fig. 8. In Fig. 8, the numerals 108, 105, 59 and 29 in parentheses indicate the numbers of components surviving the successive restrictions of molecular formula, IR, 1H NMR and 13C NMR, respectively. Only five sets of components that do not contradict the molecular formula and the given NMR spectrum are picked up from the twelve selected components. Finally, the structure generator (13) is applied to generate the structures from each set of components, so that 3, 1, 2, 3 and 3 structures are produced for sets 1, 2, 3, 4 and 5, respectively. These structures are shown in Fig. 9 as informational homologues for the input molecular formula and chemical spectra. The underlined one is the structure of compound 1.

189 COMPONENTS

Molecular Formula (C9H14O)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 21 22 23 24 25 27 29 30 31 32 33 34 38 39 40 41 42 43 44 45 46 49 52 53 55 57 58 59 60 61 67 71 76 79 80 82 84 85 86 87 88 90 92 93 99 100 101 102 104 105 106 107 108 109 110 113 114 115 116 117 118 126 127 136 137 138 139 140 141 142 143 144 145 146 153 165 172 173 174 175 176 177 182 183 184 185 186 187 188 (108)

IR Spectral Data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 21 22 23 24 25 27 29 30 31 32 33 34 38 39 40 41 42 43 44 45 46 49 52 53 55 57 58 59 60 61 67 71 76 79 80 82 84 85 86 87 88 90 92 93 99 100 101 102 104 105 106 107 108 109 110 114 115 116 117 118 136 137 138 139 140 141 142 143 144 145 146 153 165 172 173 174 175 176 177 182 183 184 185 186 187 188 (105)

1H NMR Spectral Data
7 9 10 11 12 13 14 15 16 17 18 33 34 38 39 40 41 43 44 45 46 67 71 76 79 80 85 86 87 88 104 106 107 108 109 116 117 118 141 142 143 144 145 146 153 165 172 173 174 175 176 177 182 183 184 185 186 187 188 (59)

13C NMR Spectral Data
10 11 12 14 17 33 38 40 106 107 108 109 118 143 144 145 146 153 172 173 174 175 177 182 184 185 186 187 188 (29)

selected 'components'      10  12  33  38  40 106 107 109 118 143 172 189   generated structures
set of 'components' #1      1   0   0   1   0   0   1   1   1   1   0   1   3
set of 'components' #2      1   0   0   0   1   1   1   0   1   1   0   1   1
set of 'components' #3      0   1   0   1   0   0   2   0   1   1   0   1   2
set of 'components' #4      1   0   1   0   0   1   0   1   1   1   1   1   3
set of 'components' #5      0   1   1   0   0   1   1   0   1   1   1   1   3

Figure 8. Feature of reducing the number of components through consecutive analyses of compound 1

PREPARATION OF CHEMICAL SHIFT TABLE. The chemical shift range for the signal of each 'component' used in the analysis described in the previous section was determined in the following way. The components which contain carbon atoms are 177 out of the entire 189. For those 'components', their chemical shift values in various kinds of compounds were collected from several sources (14,15,16). The collected data for 'component' no. 25 of methyl carbons, as an example, are shown in Fig. 10. By using these data, the chemical shift range for the 'component' is obtained as follows.

i. An assumed region of the mean value ($\mu$) is calculated by means of a common statistical procedure.
ii. An arbitrary value ($\mu'$) in the region is picked up.
iii. The standard deviation ($\sigma$) for the $\mu'$ is calculated.
iv. Whether all the collected data for the 'component' are within the range between $\mu' - 3\sigma$ and $\mu' + 3\sigma$ is examined.
v. If not, the $\mu'$ is updated and procedures iii and iv are repeated; if so, the values $\mu' - 3\sigma$ and $\mu' + 3\sigma$ are determined as the upper and lower limits of the shift of the 'component', respectively.

The assumed region of the mean value of component 25 was calculated as 19.15 - 24.48 ppm based on various kinds of data sources, as shown in Fig. 10. Here, the apparent mean value of these collected data is 21.8 ppm, and this is the initial value of $\mu'$. Some sample data often fall outside the normal Gaussian distribution; therefore the standard deviation has to be considered separately on the higher magnetic field side ($\sigma_H$) and on the lower magnetic field side ($\sigma_L$) of $\mu'$ when determining the standard deviation for $\mu'$. The $\mu'$ is renewed by 'flip-flop' until the range from $\mu' - 3\sigma_H$ to $\mu' + 3\sigma_L$ includes the whole sampling data. In the case of component 25, the mean value is finally found to be 21.4 ppm, with $\sigma_H = 2.05$ and $\sigma_L = 1.39$. The upper and lower limits of the shift determined in this manner are 15.21 - 25.53 ppm, which is registered in Table I. This procedure is applied to all 'components', and the chemical shift table is obtained as shown in Table I.
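The range estimation can be sketched in code, with the caveat that the source specifies neither the formula for the one-sided deviations nor the 'flip-flop' update rule. In the sketch below both are assumptions: $\sigma_H$ and $\sigma_L$ are taken as root-mean-square deviations of the samples below and above $\mu'$, and the trial values of $\mu'$ are scanned over the assumed region starting nearest to the sample mean. All function names are hypothetical, and the sample list in the example is made up for illustration, not data from Fig. 10.

```python
import math


def one_sided_sigmas(data, mu):
    """Assumed definition: RMS deviation of the samples below mu (high field
    side, sigma_h) and above mu (low field side, sigma_l)."""
    below = [x - mu for x in data if x < mu]
    above = [x - mu for x in data if x > mu]
    sigma_h = math.sqrt(sum(v * v for v in below) / len(below)) if below else 0.0
    sigma_l = math.sqrt(sum(v * v for v in above) / len(above)) if above else 0.0
    return sigma_h, sigma_l


def limits(mu, sigma_h, sigma_l):
    """Shift range mu - 3*sigma_h ... mu + 3*sigma_l."""
    return mu - 3 * sigma_h, mu + 3 * sigma_l


def shift_range(data, region, step=0.01):
    """Scan trial means inside the assumed region (nearest to the sample mean
    first) and return (mu, low, high) for the first one whose limits include
    every sample, or None if no trial mean succeeds."""
    region_low, region_high = region
    mean = sum(data) / len(data)
    count = int(round((region_high - region_low) / step))
    trials = sorted((region_low + k * step for k in range(count + 1)),
                    key=lambda m: abs(m - mean))
    for mu in trials:
        low, high = limits(mu, *one_sided_sigmas(data, mu))
        if all(low <= x <= high for x in data):
            return mu, low, high
    return None


if __name__ == "__main__":
    # Reported values for component 25: gives roughly 15.25 and 25.57 ppm,
    # close to the registered 15.21 - 25.53 ppm (the reported mean and
    # deviations are presumably rounded).
    print(limits(21.4, 2.05, 1.39))
    # Made-up sample shifts, for illustration only.
    print(shift_range([18.0, 20.0, 21.0, 21.5, 22.0, 23.0, 24.5], (19.0, 24.0)))
```

Treating the two sides of $\mu'$ separately lets the range be asymmetric, which is the point of the $\sigma_H$ / $\sigma_L$ distinction in the text.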

1

H

R

Result and

The

1

L

L

Discussion

r e s u l t o b t a i n e d f o r twenty two

compounds by

Smith; Computer-Assisted Structure Elucidation ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Downloaded by TUFTS UNIV on November 21, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0054.ch008

8.

YAMASAKI E T A L .

Structure

of Organic

121

Compounds

obtained shift

chemical

range

assumed r e g i o n ^

mean

of

value

-

"

CH

> ( C ) 3

iteration sample



mean

1

I

'

15.0

Figure

10.

' —

1

— I —

1

20.0

Estimation



1



1



1



1



1



1

1—

t i m e s = 46

=

value

21.4

1

25.0

chemical

of C NMR chemical shift range of component 13

= 32

amount

shift

#25

Smith; Computer-Assisted Structure Elucidation ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

122

COMPUTER-ASSISTED

Table I

Downloaded by TUFTS UNIV on November 21, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0054.ch008

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 #

ELUCIDATION

Components and t h e i r appearance range of T3c NMR chemical

NO

STRUCTURE

shift

COMPONENT TERT-BUTYLTERT-BUTYLTERT-BUTYLTERT-BUTYLTERT-BUTYLTERT-BUTYLGEM-DIMETHYLGEM-DIMETHYLGEM-DIMETHYLGEM-DIMETHYLGEM-DIMETHYLGEM-DIMETHYLCH3-COCH3-COCH3-COCH3-COCH3-COCH3-COISO-PROPYLISO-PROPYLISO-PROPYLISO-PROPYLISO-PROPYLISO-PROPYLISO-PROPYLCH30CH30CH30CH30CH30CH30CH3CH3CH3CH3C0CH3C0CH3COCH3COCH3COCH3CO-

SHIFT (0># (Y) (K) (D) (T) (C) CO) (Y) (K) CD) (T) (C) (0) (Y) (K) (D) (T) (C) (0) (A) (Y) CIO CD) CT) CO CO) CY) CK) CD) CT) CO CY) CD) CT) CO) CY) CO CD) CT) CO

RANGE (ppm)

26,02 *«•* 31.13 24.47 * * * * 33,57 2 5 . 4 8 *•»» 3 4 , 0 4 2 8 . 2 3 * * * * 36.78 25.48 **** 34.04 23.65 * * « * 32.97 27.42 * * * * 32.95 10.72 36.27 36.27 10.12 14.72 36.27 36.27 10,12 6.80 * * * * 3 2 . 6 1 4,58 * * * * 3 2 . 0 1 4 . 5 8 *#•* 3 2 , 0 1 5.25 ***» 1 5 . 5 0 10.43 21.53 4.58 32.01 9.92 12.97 25.83 15.09 16.63 25.83 25.45 20.95 15,09 **** 23.87 16.33 25.83 15.09 25.83 15.21 25.53 52.88 61.61 54.59 57,92 50.34 52.53 56.68 * * « * 61.51 52.88 61.51 60.60 49.95 7.26 26.10 7.06 • ••• 3 3 . 0 8 -2.49 **** 8.49 19.81 * * * * 23.39 22.95 31.79 22,95 33.92 22.22 28,15 8.49 -2,49 2 0 . 8 0 #*»» 3 0 . 0 1

means t h e a d j a c e n t atom o r f u n c t i o n a l g r o u p , t h e y a r e , s a t u r a t e d oxygen (O), a r o m a t i c c a r b o n ( Y ) , c a r b o n y l carbon(K), o l e f i n i c carbon(D), a c e t y l e n i c carbon(T), and s a t u r a t e d c a r b o n ( C ) , r e s p e c t i v e l y .

Smith; Computer-Assisted Structure Elucidation ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Table II. Results obtained for several compounds by CHEMICS

                                          number of 'informational homologues'
no.  compound                    C    H   through IR, 1H NMR   through IR, 1H NMR,
                                          analysis             13C NMR analysis
 1   α-Methyltetrahydrofuran     5   10   10                   1
 2   p-Quinone                   6    4   589                  2
 3   2-Methylpentane             6   14   3                    1
 4   3-Methylpentane             6   14   3                    1
 5   2,3-Dimethylbutane          6   14   4                    1
 6   3-Heptanone                 7   14   3                    2
 7   2-Heptanone                 7   14   4                    1
 8   m-Xylene                    8   10   40                   21
 9   Ethylbenzene                8   10   5                    5
10   Cyclohexyl acetate          8   14   161                  1
11   2-Octanol                   8   18   38                   1
12   Coumarin                    9    6   834                  116
13   Isophorone                  9   14   1895                 12
14   Diisobutyl ketone           9   18   30                   1
15   n-Nonanol                   9   20   24                   1
16   Dicyclopentadiene          10   12   1729                 41
17   Verbenone                  10   14   53274<               42
18   Camphor                    10   16   3253                 75
19   n-Decanol                  10   22   50                   1
20   2-Cyclohexylcyclohexanone  12   20   2109                 147
21   β-Ionone                   13   20   57827<               481
22   Methyl myristate           15   30   4767                 1

Table III. Results obtained by utilizing various combinations of information sources

compound       C    H    O   analytical mode*   number of 'informational homologues'
3-Heptanone    7   14    1   P                  3
                             C                  6
                             P+C                3
                             C+O                2
                             P+C+O              2
2-Octanol      8   18    1   P                  38
                             C                  41
                             P+C                13
                             C+O                1
                             P+C+O              1
Isophorone     9   14    1   P                  1895
                             C                  27
                             P+C                27
                             C+O                12
                             P+C+O              12

* P, C, and O mean an analysis of 1H NMR, 13C NMR and/or off-resonance spectra, respectively.


means of the old system and the new system are presented in Table II. The correct structure has always been generated among the plausible structures. The numbers of informational homologues obtained by means of ASSINC are reduced to 19.7 percent (in simple average) or 3.1 percent (in weighted average) of those obtained by means of the old system, where only IR and 1H NMR data were analyzed. As a result of the addition of 13C NMR spectral data analysis, it becomes possible to decrease remarkably the numbers of informational homologues. For example, the number was reduced from 4767 to one for compound 22, as shown in Table II.

Table III shows the number of informational homologues of several compounds obtained by utilizing various combinations of information sources, namely, 1H NMR, 13C NMR, 1H NMR plus 13C NMR, 13C NMR plus its off-resonance data, and 1H NMR plus 13C NMR plus off-resonance data. As shown in this table, the number of informational homologues and the number of the component sets both decrease as new information sources are added.

In conclusion, the numbers of the 'informational homologues' and of the 'component' sets are satisfactorily reduced by consecutive analyses. As mentioned above, the efforts to reduce the excessive 'components' bear good fruit, i.e., the number of the produced sets is less than ten in all cases.
Therefore, the information about the connectivities between all the 'components' in a set becomes important data to include in a future system. That kind of information will work effectively to reduce the excessive 'informational homologues', and nuclear magnetic resonance techniques will give such information.
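The distinction between the simple and the weighted average quoted for Table II can be illustrated with a minimal Python sketch. The definitions used here are assumptions, since the text does not spell them out: the simple average is taken as the mean of the per-compound ratios (new count divided by old count), and the weighted average as the ratio of the summed counts. The four rows are an illustrative subset of Table II, so the printed values are not expected to reproduce the 19.7 and 3.1 percent figures; the names `Row`, `simple_average_ratio`, and `weighted_average_ratio` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    compound: str
    old: int  # informational homologues from IR + 1H NMR analysis
    new: int  # informational homologues after adding 13C NMR analysis


# Illustrative subset of Table II (compounds 1, 9, 11 and 22).
ROWS = [
    Row("alpha-Methyltetrahydrofuran", 10, 1),
    Row("Ethylbenzene", 5, 5),
    Row("2-Octanol", 38, 1),
    Row("Methyl myristate", 4767, 1),
]


def simple_average_ratio(rows):
    """Mean of per-compound new/old ratios (assumed 'simple average')."""
    return sum(r.new / r.old for r in rows) / len(rows)


def weighted_average_ratio(rows):
    """Sum of new counts over sum of old counts (assumed 'weighted average')."""
    return sum(r.new for r in rows) / sum(r.old for r in rows)


if __name__ == "__main__":
    print(f"simple average:   {100 * simple_average_ratio(ROWS):.2f} percent")
    print(f"weighted average: {100 * weighted_average_ratio(ROWS):.2f} percent")
```

Under these assumed definitions, compounds with very large old counts dominate the weighted average, while a compound with no reduction (such as ethylbenzene, 5 to 5) pulls the simple average upward. This is consistent with the weighted figure in the text being much smaller than the simple one.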


This work was supported in part by a Scientific Research Grant from the Ministry of Education, Japan.

Literature Cited

(1) Schwarzenbach, R., Meili, J., Koenitzer, H. and Clerc, J.T., Org. Mag. Resonance, (1976), 8, 11
(2) Bremser, W., Klier, M. and Meyer, E., ibid, (1975), 7, 97
(3) Carhart, R.E., Smith, D.H., Brown, H. and Djerassi, C., J. Am. Chem. Soc., (1975), 97, 5755
(4) Beech, G., Jones, R.T. and Miller, K., Anal. Chem., (1976), 46, 714
(5) Gray, N.A.B., ibid, (1975), 47, 2426
(6) Sasaki, S. et al, Mikrochimica Acta (Wien), (1971), 726
(7) Sasaki, S., CHEMICS-F in "Information Chemistry", p227, The University of Tokyo Press, Tokyo, 1975, and the detail of CHEMICS-F will be reported in the near future.
(8) Miyashita, Y. and Sasaki, S., Jpn. Chem. Soc. Meeting, (1975), I, 174
(9) Yamasaki, T. and Sasaki, S., Jpn. Anal., (1975), 213
(10) unpublished
(11) Ochiai, S., Hirota, Y., Kudo, Y. and Sasaki, S., Jpn. Anal., (1973), 22, 399
(12) Stothers, J.B., "Carbon-13 NMR Spectroscopy", Academic Press, New York, 1972
(13) Kudo, Y. and Sasaki, S., J. Chem. Inf. Comput. Sci., (1976), 16, 43
(14) Beach, L.B., "API 44 Selected 13C NMR Spectral Data", API Research Project 44 Publication, Texas, 1975
(15) Nuclear Magnetic Resonance Spectral Search System, NIH/EPA, USA
(16) Johnson, L.F. and Jankowski, W.C., "Carbon-13 NMR Spectra", Wiley Interscience, New York, 1972
