8. CHEMICS: A Computer Program System for Structure Elucidation of Organic Compounds
TOHRU YAMASAKI, HIDETSUGU ABE, YOSHIHIRO KUDO, and SHIN-ICHI SASAKI
Downloaded by TUFTS UNIV on November 21, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0054.ch008
Miyagi University of Education, Aoba, Sendai 980 Japan
Smith; Computer-Assisted Structure Elucidation ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Many articles have described computer programs that elucidate the structures of organic compounds by analyzing chemical spectra. The methods and techniques used for this purpose fall into two categories. One is the identification of unknown compounds by retrieval of filed spectra (1,2). The other is the generation of structural formulas from the analytical results of spectral data and other chemical evidence (3,4,5). As reported previously, our integrated computer system for structure elucidation of organic compounds, named CHEMICS, rests mainly on the latter methodology (6). The system analyzes the IR and 1H NMR spectral data of an organic compound and generates plausible structural formulas consistent with the analytical results. Because generation of the correct structure is the major premise of this system, the data analysis makes rather ample allowance in elucidating partial structures. As a result, an excessive number of candidate structures (informational homologues) is sometimes generated. Two strategies are considered practical for preventing this: 1) application of the file retrieval method as a complement to the data analysis, and 2) introduction of other kinds of information sources and/or more precise spectral data analysis. The former solution has already been realized as CHEMICS-F, as shown in Fig. 1 (7). For the latter strategy, our laboratory has made several trials, for example:
- quantitative analysis of IR spectra (8);
- spectral simulation of 1H NMR (ALACON) (9);
- analysis of nuclear double resonance data (1H{1H}, NMDR) (10);
- prediction of NMR spectra (11).

In this paper we describe the incorporation of 13C NMR spectral analysis into CHEMICS to extend its capabilities.

Figure 1. Block diagram of CHEMICS-F (blocks: input of Mol. Form., NMR, IR, MS and UV data; DATA FILE SEARCH giving matching compounds; ANALYSIS giving 'components'; STRUCTURE GENERATOR giving candidate structures; matching result; OUTPUT of plausible structures). Dashed arrow means off-line routine.
General feature of 13C NMR spectral data analysis

Recently, 13C NMR spectroscopy has been used effectively for the structure elucidation of organic compounds. We introduce 13C NMR spectral data as a new information source because the data apply to organic compounds in general. Fig. 2 shows the entire system. The program for analysis of 13C NMR spectra (ASSINC) consists of the following four elements:
a) DATA INPUT
b) PRIMARY ANALYSIS
c) SECONDARY ANALYSIS
d) CHEMICAL SHIFT TABLE

The idea behind ASSINC is much the same as that of the 1H NMR data analysis in CHEMICS (ASSIN) (6). In ASSIN, the knowledge obtained by analyzing the spectral data of an unknown compound is represented as a group of substructures named 'components'. Accordingly, ASSINC predefines 189 kinds of 'components', partially listed in Table I, in place of the 179 'components' of the former edition. Each 'component' is defined by its adjacent atoms and/or the functional groups bonded to it.
DATA INPUT. The input data for the 13C NMR analysis consist of the position, intensity and multiplicity of every signal. We use the example of structure 1, C9H14O, whose spectrum is shown in Fig. 3. Both card and paper-tape image data are acceptable. If the multiplicities are not available, ASSINC can still analyze the rest of the data and offers usable answers to the successive routines. In that case, however, some ambiguities cannot be avoided.

PRIMARY ANALYSIS. Fig. 2 shows the block diagram of the primary analysis routine. The routine has two major parts:
- the allocation of carbons to each spectral signal;
- the examination of the presence of 'components'.
Figure 2. Flow chart of 13C NMR spectral data analysis. The 13C NMR spectrum and its off-resonance multiplicities feed the primary analysis, which allocates carbon atoms and selects 'components' by the chemical shift table. The secondary analysis then makes sets of 'components' for structure generation. The IR and 1H NMR data analysis also enters the process.

Figure 3. 13C NMR data of compound 1

NO   POSITION (ppm)   INTENSITY   MULTIPLICITY
1     24.4            1679        Q
2     28.3            4549        Q
3     33.5             895        S
4     45.2            2380        T
5     50.8            2119        T
6    125.4            2494        D
7    159.9            1084        S
8    199.2             861        S
Allocation of carbons. The first step of the primary analysis allocates the proper number of carbons to each signal. The process is not aimed at obtaining an explicit solution in all cases; its aim is to gather as much useful information as possible. In 13C NMR spectra, signal intensities are well known not to be always proportional to the numbers of carbons contributing to the signals, mainly because of the nuclear Overhauser effect (NOE) (12). However, protonated carbons receive almost complete NOE enhancement, so their signal intensities can be assumed proportional to the number of carbons. The allocation of carbon numbers is based on this assumption. Fig. 4 shows the block diagram of the allocation routine.

Using the multiplicity data, the routine classifies the input signals into two categories: signals assigned to protonated carbons and signals assigned to non-protonated carbons. Carbons are allocated separately for each category. The protonated category is treated first. The amount of carbons (AOC) in this category is limited to the range R2 ≤ AOC ≤ R1, where R1 and R2 are defined by equation (1):

$$
\begin{aligned}
R_1 &= (\text{whole carbon number of the molecule}) - (\text{number of signals assigned to non-protonated carbons})\\
R_2 &= \text{number of signals assigned to protonated carbons}
\end{aligned}
\qquad (1)
$$

Both bounds follow from the fact that every signal must account for at least one carbon.
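The classification step and equation (1) can be sketched in Python. This is a minimal illustration, not the ASSINC code. The `Signal` record and the function names are hypothetical. We assume that the off-resonance codes Q, T and D mark protonated carbons and S marks non-protonated ones, as in Fig. 3.

```python
from dataclasses import dataclass

# Off-resonance multiplicities: Q = CH3, T = CH2, D = CH, S = no attached H.
PROTONATED = {"Q", "T", "D"}

@dataclass
class Signal:
    number: int         # signal number as listed in Fig. 3
    shift_ppm: float    # position (ppm)
    intensity: float    # peak intensity (arbitrary units)
    multiplicity: str   # "Q", "T", "D" or "S"

def classify(signals):
    """Split signals into the protonated and non-protonated categories."""
    protonated = [s for s in signals if s.multiplicity in PROTONATED]
    non_protonated = [s for s in signals if s.multiplicity not in PROTONATED]
    return protonated, non_protonated

def aoc_range(total_carbons, protonated, non_protonated):
    """Equation (1): return (R2, R1), the bounds on the protonated AOC."""
    r1 = total_carbons - len(non_protonated)  # each non-protonated signal keeps >= 1 C
    r2 = len(protonated)                      # each protonated signal has >= 1 C
    return r2, r1

# 13C NMR data of compound 1 (C9H14O), transcribed from Fig. 3.
COMPOUND_1 = [
    Signal(1, 24.4, 1679, "Q"), Signal(2, 28.3, 4549, "Q"),
    Signal(3, 33.5, 895, "S"), Signal(4, 45.2, 2380, "T"),
    Signal(5, 50.8, 2119, "T"), Signal(6, 125.4, 2494, "D"),
    Signal(7, 159.9, 1084, "S"), Signal(8, 199.2, 861, "S"),
]

prot, nonprot = classify(COMPOUND_1)
print([s.number for s in prot], [s.number for s in nonprot])  # [1, 2, 4, 5, 6] [3, 7, 8]
print(aoc_range(9, prot, nonprot))                             # (5, 6)
```

For compound 1 this gives R2 = 5 and R1 = 9 - 3 = 6, the values used in the worked example later in this section.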
After the AOC has been estimated, the number of carbons allocated to each signal (CNS) is evaluated by equation (2). This yields one set of CNS values for each AOC value. If any CNS value in a set is greater than 0.3 and less than 0.7, the set is abandoned to avoid an error.

$$\mathrm{CNS}_i = \frac{\mathrm{INT}_i}{\sum_{j=1}^{\mathrm{AOS}} \mathrm{INT}_j} \times \mathrm{AOC} \qquad (2)$$

CNS_i: carbon number allocated to signal i
INT_i: intensity of signal i
AOC: amount of corresponding carbons
AOS: amount of corresponding signals

Figure 4. Procedure for the allocation of carbons to each signal. For the protonated category: estimation of the AOC range, calculation of the CNS sets for each AOC, and evaluation of the CNS sets. For the non-protonated category: estimation of the AOC corresponding to each protonated AOC, followed by calculation and evaluation of the CNS sets.
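Equation (2) and the 0.3-0.7 rejection rule can be sketched in Python for the protonated category of compound 1. The function names are hypothetical. We apply the 0.3-0.7 test to the fractional part of each CNS value. This is our interpretation: it coincides with the literal rule for values below 1, and it gives the same outcome for compound 1.

```python
def cns_set(intensities, aoc):
    """Equation (2): share AOC carbons in proportion to signal intensity."""
    total = sum(intensities)
    return [inten / total * aoc for inten in intensities]

def is_ambiguous(value, low=0.3, high=0.7):
    """Rejection test, applied here to the fractional part (an assumption)."""
    frac = value - int(value)
    return low < frac < high

def allocate_protonated(intensities, aoc_candidates):
    """Return {AOC: rounded allocation} for every AOC whose CNS set survives."""
    accepted = {}
    for aoc in aoc_candidates:
        cns = cns_set(intensities, aoc)
        if any(is_ambiguous(v) for v in cns):
            continue  # the whole set is abandoned
        accepted[aoc] = [round(v) for v in cns]
    return accepted

protonated_intensities = [1679, 4549, 2380, 2119, 2494]  # signals 1, 2, 4, 5, 6
for aoc in (5, 6):
    print(aoc, [f"{v:.2f}" for v in cns_set(protonated_intensities, aoc)])
# 5 ['0.63', '1.72', '0.90', '0.80', '0.94']
# 6 ['0.76', '2.06', '1.08', '0.96', '1.13']
print(allocate_protonated(protonated_intensities, range(5, 7)))  # {6: [1, 2, 1, 1, 1]}
```

The printed values reproduce the CNS table given for compound 1. The AOC = 5 set is dropped because of 0.63 for signal 1.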
The allocation for the signals assigned to non-protonated carbons is the next step. At this stage, the AOC is estimated from the carbons not consumed at the preceding stage. Solving equation (2) then gives the sets of CNS values for the non-protonated carbons. Here, the weakest signal is assumed to correspond to a unit number (1, 2, 3, ...) of carbons. This yields allocated numbers, that is, a complete set of CNS values, for all input signals. If the problem has more than one solution, any one of them can be chosen as the set of numbers allocated to the signals.

The procedure is applied to the spectrum of compound 1 as follows. The input signals in Fig. 3 are classified into the protonated or non-protonated category. Signals 1, 2, 4, 5 and 6 form the former category, and signals 3, 7 and 8 the latter. For the protonated category, the candidate AOC values are 5 and 6, because R1 is calculated as 6 (9 - 3) and R2 equals 5. The corresponding CNS sets are shown below. Each integer in parentheses is the allocated number of carbons.

signal number   1          2          4          5          6
AOC = 5         0.63 (*)   1.72 (2)   0.90 (1)   0.80 (1)   0.94 (1)
AOC = 6         0.76 (1)   2.06 (2)   1.08 (1)   0.96 (1)   1.13 (1)
Carbons cannot be allocated to signal 1 in the first set, because 0.63 lies between 0.3 and 0.7, so this set is abandoned. Only one solution therefore remains: the case where the AOC equals 6. At the following stage, the AOC for the non-protonated category is fixed at 3 (9 - 6). Each of the three residual signals must therefore be allocated one carbon. The final allocation is:

signal number   allocated number
1               1
2               2
3               1
4               1
5               1
6               1
7               1
8               1
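The non-protonated step can be sketched in Python. The text does not spell out how the assumption "the weakest signal is shared by 1, 2, 3, ... carbons" enters the calculation. This sketch is therefore our hypothetical reading. The weakest signal is given k carbons for k = 1, 2, ... The other signals are scaled relative to it and rounded. A trial is kept only if no value is ambiguous and the total equals the carbons left over from the protonated category. The function name is also hypothetical.

```python
def allocate_non_protonated(intensities, remaining_carbons, low=0.3, high=0.7):
    """Hypothetical non-protonated allocation; returns every acceptable set."""
    weakest = min(intensities)
    solutions = []
    for k in range(1, remaining_carbons + 1):  # carbons assumed for the weakest signal
        scaled = [k * inten / weakest for inten in intensities]
        if any(low < v - int(v) < high for v in scaled):
            continue  # ambiguous rounding, as in the protonated step
        rounded = [round(v) for v in scaled]
        if sum(rounded) == remaining_carbons:
            solutions.append(rounded)
    return solutions

protonated = {1: 1, 2: 2, 4: 1, 5: 1, 6: 1}   # accepted AOC = 6 set for compound 1
remaining = 9 - sum(protonated.values())     # 3 carbons left for signals 3, 7, 8
choices = allocate_non_protonated([895, 1084, 861], remaining)
allocation = dict(protonated)
allocation.update(zip([3, 7, 8], choices[0]))  # any solution may be chosen
print(choices)                                 # [[1, 1, 1]]
print([allocation[n] for n in range(1, 9)])    # [1, 2, 1, 1, 1, 1, 1, 1]
```

With three carbons for three signals, only k = 1 survives. The result agrees with the final table above.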
Examination of the presence of 'components'. Two kinds of information about the given 13C NMR spectral data have now been established: the amount of carbons assigned to each signal, and the nature of those carbons (protonated or non-protonated). Using this information, the routine examines the possible presence of each 'component' and abandons those inconsistent with it. Presence is judged by the chemical shift range of each 'component' (see Table I). If no signal lies within the chemical shift range of a 'component', that 'component' is judged absent from the sample compound. As shown in Fig. 5, twenty-nine components survive the primary analysis for compound 1. The result of the primary analysis is represented by a matrix named the NM matrix. Each row corresponds to a surviving 'component', and each column to a signal of the given 13C NMR spectrum. Each matrix element gives the maximum number of carbons of the 'component' that can be assigned to the corresponding signal. An element of -1 means that the corresponding 'component' cannot be assigned to that signal.
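The following Python sketch shows how one row of the NM matrix could be formed. It uses the allocation of compound 1 and the rule that an element is the allocated number of carbons when the signal lies in the component's shift range, and -1 otherwise. The 15.21-25.53 ppm range for component 25 is the value quoted in the chemical shift table section. Restricting that methyl component to quartet (Q) signals is our assumption about how the "nature of carbons" enters the test. The 60-70 ppm range is purely hypothetical, and the function names are illustrative.

```python
def nm_row(shift_range, signals, allocation, allowed_multiplicities=None):
    """One NM-matrix row: allocated carbons where the component may sit, else -1."""
    low, high = shift_range
    row = []
    for number, shift, mult in signals:
        in_range = low <= shift <= high
        nature_ok = allowed_multiplicities is None or mult in allowed_multiplicities
        row.append(allocation[number] if in_range and nature_ok else -1)
    return row

def present(row):
    """A component survives only if at least one signal can take it."""
    return any(v != -1 for v in row)

# (signal number, shift in ppm, multiplicity) for compound 1, from Fig. 3
SIGNALS = [(1, 24.4, "Q"), (2, 28.3, "Q"), (3, 33.5, "S"), (4, 45.2, "T"),
           (5, 50.8, "T"), (6, 125.4, "D"), (7, 159.9, "S"), (8, 199.2, "S")]
ALLOCATION = {1: 1, 2: 2, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1}

row_25 = nm_row((15.21, 25.53), SIGNALS, ALLOCATION, allowed_multiplicities={"Q"})
print(row_25, present(row_25))  # [1, -1, -1, -1, -1, -1, -1, -1] True

row_hyp = nm_row((60.0, 70.0), SIGNALS, ALLOCATION)  # hypothetical range, no signal inside
print(present(row_hyp))                               # False
```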
SECONDARY ANALYSIS. The first step of this routine selects, from the surviving 'components', a set of 'components' consistent with the molecular formula. Fig. 6 shows one of the five sets finally generated for compound 1. In the primary analysis, each signal is treated as if it were independent of the others. Any 'component' that can be assigned to at least one signal therefore survives without further examination. It is then necessary to examine whether the set as a whole is consistent with the given spectrum. In other words, every 'component' of the set must be confirmed to be fully consistent with the input spectrum, with neither excess nor deficiency. For this examination, a selective NM matrix is made for the set by extracting the rows of the selected 'components' from the NM matrix shown in Fig. 5. This selective matrix is shown in Fig. 6. As shown in Fig. 7, this matrix N is converted into another matrix X by replacing the positive elements with variables $x_{ij}$ and the negative elements with zeros. A set of simultaneous linear equations is then made from X and two constant vectors C and D. C represents the carbon numbers in the 'components', and D the allocated carbon numbers. The number of equations is the number of 'components' in the set plus the number of signals. The equations carry the restriction $0 \le x_{ij} \le n_{ij}$, where $n_{ij}$ is the corresponding element of the selective matrix. Solving these simultaneous equations is the major function of this routine. When no solution is obtained, the set is judged inappropriate. When a solution exists, the set is sent to the next routine, the structure generator.

Figure 5. Survived components of compound 1 through 13C NMR data analysis. The figure shows the NM matrix: its rows are the 29 surviving components 10, 11, 12, 14, 17, 33, 38, 40, 106, 107, 108, 109, 118, 143, 144, 145, 146, 153, 172, 173, 174, 175, 177, 182, 184, 185, 186, 187 and 188, and its columns are the eight signals of Fig. 3.

Figure 6. Selective NM matrix for the fifth set of compound 1. The rows are components 12 GEM-DIMETHYL-(C), 33 CH3-(D), 106 -CH2-(C)(K), 107 -CH2-(C)(D), 118 -CH=, 143 -C=, 172 -CO-(C)(D) and 188. The figure also gives the number of carbons of each component and the allocation number of each signal. Only the methyl carbons are considered in the gem-dimethyl group.

Figure 7. Representation of the simultaneous linear equations for the fifth set of compound 1:

$$X I^{\mathsf{T}} = C \quad (a) \qquad\qquad I X = D \quad (b)$$

N, X, C and D denote the selective NM matrix, the selective NM matrix with its positive elements replaced by $x_{ij}$, the modified 'component' vector and the allocation vector, respectively. I is a unit row vector with eight elements.

At the final stage of the spectral analysis, five sets of components, generated from the twenty-nine components, are selected as plausible for compound 1. The five sets are listed below; the numeral in parentheses gives the number of each component in the set:
No. 1: 10 (1), 38 (1), 107 (1), 109 (1), 118 (1), 143 (1), 189 (1)
No. 2: 10 (1), 40 (1), 106 (1), 107 (1), 118 (1), 143 (1), 189 (1)
No. 3: 12 (1), 38 (1), 107 (2), 118 (1), 143 (1), 189 (1)
No. 4: 10 (1), 33 (1), 106 (1), 109 (1), 118 (1), 143 (1), 172 (1), 189 (1)
No. 5: 12 (1), 33 (1), 106 (1), 107 (1), 118 (1), 143 (1), 172 (1), 189 (1)
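The feasibility test of the secondary analysis, X I^T = C and I X = D with 0 ≤ x_ij ≤ n_ij, can be sketched in Python. The source says only that these simultaneous equations are solved. Here we swap in a standard technique, max-flow (Edmonds-Karp): edges run from a source to each component (capacity c_i), from component to signal (capacity n_ij where n_ij > 0), and from each signal to a sink (capacity d_j). With integer capacities, this finds an integer solution exactly when one exists. The function name is hypothetical. The matrix N below is our reading of the partially legible Fig. 6 for the fifth set and should be treated as illustrative.

```python
from collections import deque

def feasible_assignment(n, c, d):
    """Return an integer X with row sums c, column sums d, 0 <= x_ij <= n_ij, or None."""
    rows, cols = len(n), len(n[0])
    if sum(c) != sum(d):
        return None
    size = rows + cols + 2
    source, sink = rows + cols, rows + cols + 1
    cap = [[0] * size for _ in range(size)]
    for i in range(rows):
        cap[source][i] = c[i]
        for j in range(cols):
            if n[i][j] > 0:            # -1 means the component cannot use signal j
                cap[i][rows + j] = n[i][j]
    for j in range(cols):
        cap[rows + j][sink] = d[j]
    flow = [[0] * size for _ in range(size)]
    total = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = [-1] * size
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(size):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            break
        push, v = float("inf"), sink   # bottleneck capacity along the path
        while v != source:
            push = min(push, cap[parent[v]][v] - flow[parent[v]][v])
            v = parent[v]
        v = sink
        while v != source:
            u = parent[v]
            flow[u][v] += push
            flow[v][u] -= push
            v = u
        total += push
    if total != sum(c):
        return None                    # set judged inappropriate
    return [[flow[i][rows + j] for j in range(cols)] for i in range(rows)]

N = [
    [1, 2, -1, -1, -1, -1, -1, -1],    # 12  GEM-DIMETHYL-(C)
    [1, 2, -1, -1, -1, -1, -1, -1],    # 33  CH3-(D)
    [-1, -1, -1, 1, 1, -1, -1, -1],    # 106 -CH2-(C)(K)
    [-1, -1, -1, 1, 1, -1, -1, -1],    # 107 -CH2-(C)(D)
    [-1, -1, -1, -1, -1, 1, -1, -1],   # 118 -CH=
    [-1, -1, -1, -1, -1, -1, 1, -1],   # 143 -C=
    [-1, -1, -1, -1, -1, -1, -1, 1],   # 172 -CO-(C)(D)
    [-1, -1, 1, -1, -1, -1, -1, -1],   # component on the quaternary signal 3
]
C = [2, 1, 1, 1, 1, 1, 1, 1]   # carbons per component (methyls only for gem-dimethyl)
D = [1, 2, 1, 1, 1, 1, 1, 1]   # allocated carbons per signal
X = feasible_assignment(N, C, D)
print(X is not None)           # True: the set is passed to the structure generator
```

For this set, more than one X satisfies the equations: the two methyl-type components can share signals 1 and 2 in different ways. The routine needs only to know that at least one solution exists.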
Fig. 8 shows the overall process by which the 189 components are reduced to 29: examination of the molecular formula followed by successive analyses of IR, 1H NMR and 13C NMR. In Fig. 8, the numerals 108, 105, 59 and 29 in parentheses give the numbers of components surviving the successive restrictions of molecular formula, IR, 1H NMR and 13C NMR, respectively. From the twelve 'components' selected in Fig. 8, only five sets consistent with the molecular formula and the given NMR spectrum are picked up. Finally, the structure generator (13) is applied to each set of components. It produces 3, 1, 2, 3 and 3 structures for sets 1, 2, 3, 4 and 5, respectively. These structures are shown in Fig. 9 as informational homologues for the input molecular formula and chemical spectra. The underlined structure is that of compound 1.
Figure 8. Feature of reducing the number of components through consecutive analyses of compound 1. Starting from 189 components, restriction by the molecular formula (C9H14O) leaves 108, IR spectral data leave 105, 1H NMR spectral data leave 59, and 13C NMR spectral data leave 29. The sets of selected 'components' and the numbers of generated structures are:

set   10 12 33 38 40 106 107 109 118 143 172 189   generated structures
#1     1  0  0  1  0   0   1   1   1   1   0   1   3
#2     1  0  0  0  1   1   1   0   1   1   0   1   1
#3     0  1  0  1  0   0   2   0   1   1   0   1   2
#4     1  0  1  0  0   1   0   1   1   1   1   1   3
#5     0  1  1  0  0   1   1   0   1   1   1   1   3

PREPARATION OF CHEMICAL SHIFT TABLE. The chemical shift range of the signal of each 'component', as used in the analysis described in the previous section, was determined as follows. Of the 189 components, 177 contain carbon atoms. For these 'components', chemical shift values in various kinds of compounds were collected from several sources (14,15,16). As an example, Fig. 10 shows the data collected for the methyl carbons of 'component' no. 25. From such data, the chemical shift range of a 'component' is obtained by the following steps:
i. An assumed region of the mean value (μ) is calculated by a common statistical procedure.
ii. An arbitrary value (μ′) in this region is picked.
iii. The standard deviation (σ) about μ′ is calculated.
iv. The routine checks whether all the data collected for the 'component' lie within the range from μ′ - 3σ to μ′ + 3σ.
v. If not, μ′ is updated and steps iii and iv are repeated. If so, the values μ′ - 3σ and μ′ + 3σ are adopted as the limits of the chemical shift range of the 'component'.
For component 25, the assumed region of the mean value was calculated as 19.15-24.48 ppm from the various data sources shown in Fig. 10. The apparent mean of these data is 21.8 ppm, which serves as the initial value of μ′. The data of some samples often deviate from a normal Gaussian distribution. The standard deviation about μ′ is therefore considered separately on the higher magnetic field side (σ_H) and the lower magnetic field side (σ_L). The value of μ′ is renewed by 'flip-flop' until the range from μ′ - 3σ_H to μ′ + 3σ_L includes all of the sampled data. For component 25, the mean value is finally found to be 21.4 ppm, with σ_H = 2.05 and σ_L = 1.39. The limits of the shift determined in this manner are 15.21-25.53 ppm, and this range is registered in Table I. These limits correspond to μ′ ≈ 21.36 ppm before rounding: 21.36 - 3 × 2.05 = 15.21 and 21.36 + 3 × 1.39 = 25.53. The same procedure is applied to all 'components' to obtain the chemical shift table shown in Table I.
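The iterative range estimate with separate high-field and low-field deviations might be sketched in Python as follows. This is only our reading of the procedure. The text does not say how σ_H and σ_L are computed or how 'flip-flop' moves μ′. We therefore assume one-sided root-mean-square deviations, and a search that alternates above and below the apparent mean in fixed steps within the assumed region. The function names and the 0.01 ppm step are hypothetical choices.

```python
import math

def one_sided_sigma(data, center, below):
    """Root-mean-square deviation of the samples on one side of `center`."""
    side = [x - center for x in data if (x < center if below else x > center)]
    if not side:
        return 0.0
    return math.sqrt(sum(dev * dev for dev in side) / len(side))

def shift_range(data, region, step=0.01):
    """Hypothetical sketch of the range estimate for one 'component'.

    sigma_H is taken on the high-field (smaller ppm) side of mu' and sigma_L
    on the low-field side. Trial means are visited alternately above and below
    the apparent mean ('flip-flop'), staying inside the assumed region, until
    [mu' - 3 sigma_H, mu' + 3 sigma_L] contains every sample.
    Returns (mu', sigma_H, sigma_L, low_limit, high_limit) or None.
    """
    lo_region, hi_region = region
    start = sum(data) / len(data)  # apparent mean = first trial mu'
    max_steps = math.ceil((hi_region - lo_region) / step) + 1
    for m in range(max_steps + 1):
        trials = (start,) if m == 0 else (start + m * step, start - m * step)
        for mu in trials:
            if not lo_region <= mu <= hi_region:
                continue
            sigma_h = one_sided_sigma(data, mu, below=True)
            sigma_l = one_sided_sigma(data, mu, below=False)
            low, high = mu - 3 * sigma_h, mu + 3 * sigma_l
            if low <= min(data) and high >= max(data):
                return mu, sigma_h, sigma_l, low, high
    return None

print(shift_range([20.0, 21.0, 22.0], region=(20.5, 21.5)))
# (21.0, 1.0, 1.0, 18.0, 24.0)
```

Applying it to component 25 would require the 32 collected shift values behind Fig. 10, together with the assumed region (19.15, 24.48). Those individual values are not reproduced in the paper.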
Figure 10. Estimation of 13C NMR chemical shift range of component #25. The plot covers about 15.0-25.0 ppm and marks the assumed region of the mean value, the final mean value of 21.4 ppm and the obtained chemical shift range. Sample amount = 32; iteration times = 46.

Table I. Components and their appearance range of 13C NMR chemical shift. This excerpt covers components 1-40: tert-butyl, gem-dimethyl, CH3-CO, iso-propyl, CH3O, CH3 and CH3CO groups, each listed with several kinds of adjacent atom and with its shift range in ppm. For example, the range for component 25 is 15.21-25.53 ppm.
# The letters in parentheses denote the adjacent atom or functional group: saturated oxygen (O), aromatic carbon (Y), carbonyl carbon (K), olefinic carbon (D), acetylenic carbon (T) and saturated carbon (C).

Table II. Results obtained for several compounds by CHEMICS

no.  compound                    C   H   IR, 1H NMR   IR, 1H NMR, 13C NMR
 1   α-Methyltetrahydrofuran      5  10        10         1
 2   p-Quinone                    6   4       589         2
 3   2-Methylpentane              6  14         3         1
 4   3-Methylpentane              6  14         3         1
 5   2,3-Dimethylbutane           6  14         4         1
 6   3-Heptanone                  7  14         3         2
 7   2-Heptanone                  7  14         4         1
 8   m-Xylene                     8  10        40        21
 9   Ethylbenzene                 8  10         5         5
10   Cyclohexyl acetate           8  14       161         1
11   2-Octanol                    8  18        38         1
12   Coumarine                    9   6       834       116
13   Isophorone                   9  14      1895        12
14   Diisobutylketone             9  18        30         1
15   n-Nonanol                    9  20        24         1
16   Dicyclopentadiene           10  12      1729        41
17   Verbenone                   10  14    53274<        42
18   Camphor                     10  16      3253        75
19   n-Decanol                   10  22        50         1
20   2-Cyclohexylcyclohexanone   12  20      2109       147
21   β-Ionone                    13  20    57827<       481
22   Methyl myristate            15  30      4767         1

C and H give the numbers of carbon and hydrogen atoms in the molecular formula. The last two columns give the number of 'informational homologues' obtained through IR and 1H NMR analysis, and through IR, 1H NMR and 13C NMR analysis.

Table III. Results obtained by utilizing various combinations of information sources. The table gives the numbers of 'informational homologues' and of component sets for 3-heptanone (C7H14O), 2-octanol (C8H18O) and isophorone (C9H14O) under the analytical modes* P, C, P+C, C+O and P+C+O.
* P, C and O denote an analysis of 1H NMR, 13C NMR and/or 13C NMR off-resonance spectra, respectively.

Result and Discussion

Table II presents the results obtained for twenty-two compounds by the old and new systems. The correct structure was always generated among the plausible structures. ASSINC reduces the number of informational homologues to 19.7 percent (simple average) or 3.1 percent (weighted average) of the number obtained by the old system, which analyzed only IR and 1H NMR data. Adding 13C NMR spectral data analysis thus decreases the number of informational homologues remarkably. For example, Table II shows a reduction from 4767 to one for compound 22. Table III shows the numbers of informational homologues of several compounds obtained with various combinations of information sources:
- 1H NMR;
- 13C NMR;
- 1H NMR plus 13C NMR;
- 13C NMR plus its off-resonance data;
- 1H NMR plus 13C NMR plus off-resonance data.

As the table shows, both the number of informational homologues and the number of component sets decrease as new information sources are added. In conclusion, consecutive analyses reduce the numbers of 'informational homologues' and 'component' sets satisfactorily. The efforts to reduce the excessive 'components' have borne good fruit: fewer than ten sets are produced in all cases.
Therefore, information about the connectivities between all the 'components' in a set becomes important data to include in a future system. Such information should work effectively to reduce the excessive 'informational homologues', and nuclear magnetic resonance techniques will provide it.
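The large gap between the two averages quoted for Table II (19.7 versus 3.1 percent) follows from how each one is computed. The text does not define them, so the sketch below assumes that the simple average is the mean of the per-compound ratios (new count / old count) and that the weighted average weights each ratio by the old count, which reduces to total new / total old. All counts are hypothetical except the 4767-to-1 pair reported for compound 20; the function names are illustrative only.

```python
"""Simple versus weighted average of candidate-count reductions.

Hypothetical sketch: the definitions of both averages are assumed, and all
counts except the 4767 -> 1 pair (compound 20) are invented for illustration.
"""
from typing import List, Tuple

# (homologues from the old IR/1H NMR system, homologues from the new system)
Pair = Tuple[int, int]


def simple_average_percent(pairs: List[Pair]) -> float:
    """Mean of the per-compound ratios new/old, expressed in percent."""
    return 100.0 * sum(new / old for old, new in pairs) / len(pairs)


def weighted_average_percent(pairs: List[Pair]) -> float:
    """Per-compound ratios weighted by the old count, in percent.

    Weighting each ratio new/old by old collapses to total new / total old,
    so compounds with many candidates dominate the result.
    """
    total_old = sum(old for old, _ in pairs)
    total_new = sum(new for _, new in pairs)
    return 100.0 * total_new / total_old


# Hypothetical counts; only (4767, 1) comes from the text (compound 20).
counts: List[Pair] = [(4, 2), (10, 5), (4767, 1)]

if __name__ == "__main__":
    print(f"simple average:   {simple_average_percent(counts):.1f} %")
    print(f"weighted average: {weighted_average_percent(counts):.2f} %")
```

With these invented counts the simple average stays near 33 percent while the weighted average falls below 0.2 percent, because the one compound with thousands of candidates dominates the totals. This matches the direction of the reported figures but is not a reconstruction of Table II.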
This work was supported in part by a Scientific Research Grant from the Ministry of Education, Japan.
Literature Cited
(1) Schwarzenbach, R., Meili, J., Koenitzer, H. and Clerc, J.T., Org. Mag. Resonance, (1976), 8, 11
(2) Bremser, W., Klier, M. and Meyer, E., ibid., (1975), 7, 97
(3) Carhart, R.E., Smith, D.H., Brown, H. and Djerassi, C., J. Am. Chem. Soc., (1975), 97, 5755
(4) Beech, G., Jones, R.T. and Miller, K., Anal. Chem., (1976), 46, 714
(5) Gray, N.A.B., ibid., (1975), 47, 2426
(6) Sasaki, S. et al., Mikrochimica Acta (Wien), (1971), 726
(7) Sasaki, S., CHEMICS-F in "Information Chemistry", p. 227, The University of Tokyo Press, Tokyo, 1975; the details of CHEMICS-F will be reported in the near future.
(8) Miyashita, Y. and Sasaki, S., Jpn. Chem. Soc. Meeting, (1975), I, 174
(9) Yamasaki, T. and Sasaki, S., Jpn. Anal., (1975), 213
(10) Unpublished
(11) Ochiai, S., Hirota, Y., Kudo, Y. and Sasaki, S., Jpn. Anal., (1973), 22, 399
(12) Stothers, J.B., "Carbon-13 NMR Spectroscopy", Academic Press, New York, 1972
(13) Kudo, Y. and Sasaki, S., J. Chem. Inf. Comput. Sci., (1976), 16, 43
(14) Beach, L.B., "API 44 Selected ¹³C NMR Spectral Data", API Research Project 44 Publication, Texas, 1975
(15) Nuclear Magnetic Resonance Spectral Search System, NIH/EPA, USA
(16) Johnson, L.F. and Jankowski, W.C., "Carbon-13 NMR Spectra", Wiley-Interscience, New York, 1972