1 Computer-Assisted Structure Identification of Unknown Mass Spectra
Downloaded by 80.82.77.83 on May 25, 2018 | https://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0054.ch001
R. VENKATARAGHAVAN, H. E. DAYRINGER, G. M. PESYNA, B. L. ATWATER, I. K. MUN, M. M. CONE, and F. W. McLAFFERTY
Department of Chemistry, Cornell University, Ithaca, NY 14853
Mass spectrometry has become a routine technique for structure identification in a number of applications (1). Gas chromatograph/mass spectrometer/computer (GC/MS/COM) systems capable of producing a mass spectrum every second are commercially available (2). Voluminous amounts of data are generated with such systems using subnanogram amounts of sample. For full utilization of this highly specific information it is essential to employ computer techniques.

Such computer-aided structure identification from mass spectrometric data has taken two distinct directions (3). The first utilizes "retrieval" systems which compare the unknown data to a library of reference spectra to report compounds with a high degree of similarity. A number of techniques have been employed for the retrieval approach (3). The second approach involves interpretive schemes that attempt to identify part or all of the unknown structure from correlations of mass spectral fragmentation behavior. Pattern recognition (4) and artificial intelligence (5) are examples of such schemes that have been employed for interpreting mass spectral data of specific classes of compounds.

We will describe here a retrieval Probability Based Matching (PBM) system (6, 7) and an interpretive Self-Training Interpretive and Retrieval System (STIRS) (8-11) developed for the analysis of low resolution mass spectra. Both these systems are available on a computer network (TYMNET) from an IBM-370/168 computer system at Cornell University to outside users.

Probability Based Matching System

It has been shown that to increase the relevancy of information retrieved from a library of data it is essential to attach proper weighting to the contents of the system (12).
The PBM system employs a probability weighting to both the mass and abundance data (6, 7). The abundance values are weighted according to a log normal distribution (13) and the masses are given a uniqueness value based on their occurrence probability in a mass spectral data base of 18,806 different compounds (14). The PBM system also uses a reverse search strategy, independently proposed by Abramson (15), which is valuable in identifying components of a mixture. This technique demands that the peaks of the reference spectrum be present in the unknown, but not that all peaks of the unknown be present in the reference. The degree of match of the reference to the unknown is indicated with a confidence index K, based on the statistical probability that this degree of match occurred by coincidence; details of the method have been described elsewhere (6, 7).

Smith; Computer-Assisted Structure Elucidation ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

A statistical evaluation of PBM's performance was made using "unknown" mass spectra, for each of which at least one other spectrum of the same compound was present in the data base. Low and high molecular weight sets, each of ~400 unknown spectra removed at random from the data base, were run through the PBM system, and the results evaluated using recall and reliability as measures of performance. Recall (RC) is defined as the proportion of relevant spectra actually retrieved and reliability (RL) is the proportion of retrieved spectra which are actually relevant.
In addition to these terms it is desirable to express the performance of automated systems in terms of false positives (FP), the proportion of spectra predicted incorrectly (16).

$$\mathrm{RC} = I_c / P_c \tag{1}$$
$$\mathrm{RL} = I_c / (I_c + I_f) \tag{2}$$
$$\mathrm{FP} = I_f / P_f \tag{3}$$
where $I_c$ = number of correct predictions, $P_c$ = total possible number of correct predictions, $I_f$ = number of false predictions, and $P_f$ = total possible number of false predictions. At the 50% recall level the reliabilities for the low and high molecular weight sets were 65% and 42%, counting as correct only predicted structures which are identical to the unknown. Invariably retrieval systems predict similar structures in addition to the identical structure. In the evaluation of PBM results four classes of similarity were defined: I, identical compound or stereoisomer; II, class I or a ring position isomer; III, class II or a homolog; IV, class III or an isomer of class III compound formed by moving only one carbon atom. It was found that when class IV type compounds were accepted as correct predictions the reliability of the system increased to 95% at the same recall level.

Recently, it has been found that the performance of PBM for the identification of components in a mixture can be enhanced (17) by incorporating a spectrum subtraction procedure similar to the one proposed by Hites and Biemann (18). The method subtracts the reference compound matched by PBM with the highest confidence index (or any other in the list of predicted spectra) from the unknown spectrum and matches the residual peaks against the
reference file by PBM. This operation is particularly valuable for identifying a minor component missed by the reverse search procedure when there is substantial overlap in the spectra of the major and minor components, or when the amount of the latter falls outside the limits set for "percent component" or "percent contamination".
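The subtraction step described above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: spectra are represented as dictionaries mapping m/e to abundance, the function name `subtract_reference` is hypothetical, and the scaling rule (the largest factor that leaves no peak of the unknown negative) is a simple stand-in, not the published PBM procedure. The example spectra are invented.

```python
def subtract_reference(unknown, reference, scale=None):
    """Subtract a scaled reference spectrum from an unknown spectrum.

    Spectra are dicts mapping m/e to abundance. If no scale is given,
    the largest factor that keeps every residual peak non-negative is
    used (an assumed rule, chosen only for this illustration).
    """
    if scale is None:
        # Ratio unknown/reference for each reference peak found in the unknown.
        ratios = [unknown[m] / a for m, a in reference.items()
                  if a > 0 and m in unknown]
        scale = min(ratios) if ratios else 0.0
    residual = {}
    for mass, abundance in unknown.items():
        left = abundance - scale * reference.get(mass, 0.0)
        if left > 1e-9:  # drop peaks fully accounted for by the reference
            residual[mass] = left
    return residual


# Invented mixture: a major component plus half-strength minor component.
major = {91: 100, 92: 60}
unknown = {91: 110, 92: 60, 105: 50, 120: 15}
residual = subtract_reference(unknown, major)
print(residual)
```

The residual spectrum would then be searched against the reference file again, which is how a minor component hidden under the major one could surface.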
Self-Training Interpretive and Retrieval System
The STIRS system is an interpretive scheme that trains itself for the identification of different structural features in an unknown by utilizing specific classes of mass spectral data (8). Table I shows the fifteen data classes used; although these have been selected for their structural significance, there are no predesignated correlations of specific spectral data with corresponding structures. For each unknown spectrum the system matches its data in each class against the corresponding class data of all reference spectra and computes a match factor (MF) indicating the degree of similarity. In each data class the fifteen reference compounds of highest MF values are saved. If a particular substructure(s) is found in a significant proportion of these compounds, its presence in the unknown is probable. Absence of a substructure is not predicted, as the mass spectral features of one substructure can be made negligible by the presence of a more powerful fragmentation-directing group. The data base for the system includes information from 29,468 different organic compounds containing the common elements H, C, N, O, F, Si, P, S, Cl, Br, and/or I. All structures of these compounds have been coded in Wiswesser Line Notation (WLN) to facilitate computer handling of structure data.
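The selection step described above (keep the reference compounds of highest MF in a data class, then see which substructures recur among them) can be sketched in Python as follows. This is only an illustration under stated assumptions: the match factors are taken as given numbers, since the text does not say how MF is computed; the function name `substructure_fractions`, the compound labels, and the substructure labels are all hypothetical.

```python
from collections import Counter


def substructure_fractions(match_factors, substructures, top_n=15):
    """Fraction of the top_n best-matching compounds containing each substructure.

    match_factors: {compound: MF value} for one data class (assumed given).
    substructures: {compound: set of substructure labels}.
    """
    # Keep the top_n reference compounds of highest match factor.
    top = sorted(match_factors, key=match_factors.get, reverse=True)[:top_n]
    counts = Counter(label for c in top for label in substructures.get(c, ()))
    return {label: n / len(top) for label, n in counts.items()}


# Invented example, using top_n=3 instead of 15 to keep it short.
match_factors = {"A": 900, "B": 850, "C": 700, "D": 100}
substructures = {
    "A": {"phenyl", "ester"},
    "B": {"phenyl"},
    "C": {"phenyl", "hydroxyl"},
    "D": {"alkyl"},
}
fractions = substructure_fractions(match_factors, substructures, top_n=3)
print(fractions)
```

A substructure found in a large fraction of the saved compounds would be reported as probably present; nothing is concluded from a low fraction, in keeping with the statement that absence is not predicted.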
To utilize the information provided by the STIRS system, the results for each data class are examined and the common structural features identified. To aid this process, in a recently implemented system (9), the computer examines the data for the presence of 179 frequently found substructures (19). The probability for the presence in the unknown of each substructure is predicted using a random drawing model. Knowing the frequency of occurrence of a specific substructure in the file, this method indicates the probability that the prediction of its presence in the unknown occurred at random. From this probability the confidence for each prediction is calculated. For example, in the STIRS data base the phenyl substructure is found to be present in 28% of the compounds. Statistically on the average this substructure would occur in 4 of any 15 compounds in the data base, including the top 15 compounds selected in a STIRS data class. On the other hand if phenyl is found in 10 of the 15 compounds, the probability that this occurred by chance is only 1 in 113, so that the confidence in the phenyl prediction is >99%, or a false positives value of <1%.
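The reasoning in the phenyl example can be explored with a short Python sketch. It assumes a plain binomial model (each of the 15 compounds independently contains the substructure with probability 0.28); the text does not spell out the random drawing model, so this is an assumption, and the value it produces need not equal the "1 in 113" figure quoted above. The function name `chance_probability` is hypothetical.

```python
from math import comb


def chance_probability(k, n=15, p=0.28):
    """Probability that at least k of n compounds contain a substructure
    when each contains it independently with probability p (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


# Expected number of phenyl-containing compounds among any 15.
expected = 15 * 0.28

# Chance of finding phenyl in 10 or more of the 15 under the assumed model.
p_chance = chance_probability(10)
confidence = 1 - p_chance

print(f"expected count: {expected:.1f}")
print(f"chance probability: {p_chance:.5f}, confidence: {confidence:.3%}")
```

Under this assumed model the expected count of about 4 and a confidence above 99% both agree with the text, even though the exact chance probability depends on the model used.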
Table I. Mass Spectral Data Classes Used in STIRS

Data Class    Description, maximum number of peaks
1             Ion series (14 amu separation)
2-4           Characteristic ions [...] 250)
5C            Five [...] 16-20, 30-38, 44-51, 59-65, 72-76
6C            Five [...] 26-28, 39-42, 52-56, 62-70, 80-84
7, 8          Secondary neutral losses from most abundant odd-mass (MF7) and even-mass (MF8) loss
11            [...] 175)
The system has been extensively tested for each of the 179 substructures by selecting 373 compounds at random from the data base (every 50th compound in the Registry data) (20). If the data set did not contain at least 30 compounds with a particular substructure, the required additional compounds containing the substructure were selected at random. If fewer than 30 compounds with a given substructure were available, all of them were selected. System performance in each data class was evaluated by computing recall and reliability terms for each substructure. In contrast to equation 2, the reliability term in this case included a false positive factor, being set equal to RC/(RC + FP), such that the values reflect the system performance averaged for compounds containing and not containing the substructure. This reliability term led to substantial confusion, so that we now feel that it is better to report performance of the system in terms of recall and false positives (16), as discussed for PBM (equations 1 and 3).

Analysis of the data shows that although individual data classes are good for specific substructure identification, the best performance is found in the overall match factor (Table I) results. This is due to the fact that the overall match factor data combine the information derived from the individual data classes. The overall match factor, MF11.0, which combines ion series, characteristic ions, and neutral loss data, has been found to give the most reliable information on the different substructure possibilities in an unknown compound. For the 179 substructures tested, the MF11.0 gave a recall of 49% at 1.9% false positive level. A number of improvements have been made to the characteristic ion data classes (10) and the primary neutral losses (11); the overall match factors MF11.1 and MF11.2 have been found to give an average recall of 47% and 32.1%, respectively, at [...]
PBM Results on Unknown and Residual Spectra from [...] (table not recoverable; listed compounds include isopropylbenzene and 1-methyl-2-ethylbenzene, with "% Component" values)

Figure 1. STIRS results for the mass spectrum of n-propyl p-hydroxybenzoate. Panels: data class 3B, m/e 89-158; data class 5 (neutral losses), losses of 0-64. Axis: m/e.
[...] when a reference spectrum of this compound, n-propyl p-hydroxybenzoate, was not in the data base. Results for three of the 15 data classes illustrate the "self-training" feature by which STIRS indicates structural features of the unknown. Data class 2A utilizes the largest peaks in the low mass region of the spectrum (m/e 6-88); these fragment ions are more often formed by secondary reactions of higher energy requirements, and so are indicative of gross, rather than specific, structural features. Thus all of the spectra found of highest MF2A values contained a phenyl group, although the phenyl rings in these compounds contain a rather wide variety of substituents. The experienced mass spectrometrist probably would have inferred the presence of phenyl from the "aromatic ion series" in this region; however, STIRS was not trained specifically to recognize these features, but instead indicated the presence of phenyl by finding that such compounds matched these data the most closely. Data class 3B covers a higher mass range, whose fragment peaks should be indicative of more specific structural features. Again all compounds of highest MF3B values contain the phenyl group, but almost all of them also contain an aryl hydroxy group (not ortho) and a carbonyl.
Note that the latter is contained in carboxyl, ester, and keto functionalities; because STIRS is designed to provide positive information, data class 3B thus indicates the presence of HO-phenyl-CO-. Data class 5 employs "neutral loss" information, the differences in mass between the observed fragment ion and the molecular ion, which in this case is assumed to be m/e 180. Cleavage of the molecular ion gives two fragments, only one of which holds the positive charge, and thus the neutral fragment lost generally contains the more electronegative functionalities. Illustrating this, when the masses representing the most common neutral losses of this unknown were matched against the whole reference file, the compounds of highest MF5 values were found to be mainly propyl esters. To reiterate, STIRS was not preprogrammed to recognize propyl esters from their common losses of 41, 42, and 59 mass units; STIRS in effect trains itself to recognize the propyl ester functionality by finding that these data of the unknown were matched best by propyl esters in the file. Note also that the compounds found by MF5 did not contain a particularly significant number of phenyl groups; the different data classes of STIRS have been selected to be sensitive to different functionalities.
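The "neutral loss" data used by data class 5 can be made concrete with a small Python sketch: each loss is the molecular ion mass minus a fragment peak mass. The peak list below is hypothetical, chosen only so that the losses of 41, 42, and 59 mentioned above appear with the assumed molecular ion at m/e 180; the function name `neutral_losses` and the 0-64 window are illustrative assumptions.

```python
def neutral_losses(peaks, molecular_ion, max_loss=64):
    """Sorted mass differences between the molecular ion and each fragment
    peak, keeping only losses in the range 0..max_loss."""
    losses = {molecular_ion - m for m in peaks}
    return sorted(loss for loss in losses if 0 <= loss <= max_loss)


# Hypothetical peak list for an unknown with an assumed molecular ion at m/e 180.
peaks = [180, 139, 138, 121, 93, 65]
losses = neutral_losses(peaks, molecular_ion=180)
print(losses)  # losses of 87 and 115 fall outside the 0-64 window
```

These loss values, rather than the fragment masses themselves, are what would be matched against the corresponding data of the reference spectra.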
STIRS has been designed as an aid to the interpreter; if the interpreter now adds up the mass of di-substituted phenyl (76), hydroxyl (17), and propyl ester (87), he can note that the sum corresponds to the supposed molecular weight, 180, indicating that all of the functionalities of the unknown molecule have been identified by these three data classes of STIRS.

STIRS: Unknown Terpene. The mass spectrum of 12β-acetoxysandaracopimar-15-en-8β,11α-diol was examined
by
STIRS, omitting all spectra of this compound from the reference file. The nine structures of highest "overall match factor" (MF11.0) values are shown in Figure 2. If the identity of this molecule had been totally unknown to the interpreter, these MF11.0 selections should have indicated at least the general structural features of the molecule. Thus all of the compounds of Figure 2 have either three or four fused rings and all have the three fused six-membered rings that are actually present in the unknown. The four tricyclic compounds closely resemble the correct structure in having methyl groups in the 4, 4, 10, and 13 positions, hydroxy at 8, and vinyl at 13. Note that three of the steroids contain a 5-hydroxy group, which can be viewed as corresponding to the correct 8-hydroxy position by "flipping" the structures, with their acetoxy groups then at least present in the ring corresponding to the ring containing the acetoxy group in the unknown. The presence of hydroxyl and acetoxy groups is indicated by the fact that eight of the nine compounds contain hydroxyls and seven contain acetoxy groups; only two contain more than one hydroxyl group, while none contain more than one acetoxy.
However, the compound does give a molecular ion, so that it should be possible for the interpreter to infer correctly that the unknown contains one acetoxy and two hydroxyl groups after deducing the tricyclic system with the other substituents. Also, the steroid selected as the seventh compound has a 4-gem-dimethyl group. For this unknown STIRS can thus give fair confidence in all of the structure assignments except the position of the acetoxy and one of the hydroxy groups; there is even some indication of their positions, as in the majority of selected structures of Figure 2 these substituents are on the exterior ring bearing the bridgehead hydroxyl.

PBM/STIRS Examination of Unknown Spectra of Fatty Acid Esters. In an early classic case of natural product structure determination by mass spectrometry (25) a compound isolated as the methyl ester from butterfat was identified to be methyl 3,7,11,15-tetramethylhexadecanoate. The original published (25) spectrum (omitted from the reference file) was run through PBM and STIRS to give the results shown in Table IV. PBM correctly identified the compound as methyl phytanoate, retrieving the two reference spectra of this compound in the PBM reference file; note that the third selection is a much poorer match. The substructures identified by STIRS MF11.0 and 11.1 are correct, although the acetate substructure indicated by MF11.2 is not (Table IV). The best-matching compounds found by STIRS MF11.0 are all methyl esters of long-chain fatty acids, and all but one have a methyl group in the three position. The positions of the other methyl groups were not found by STIRS, consistent with the rather small effect of such methyl groups on the mass spectra.
12β-Acetoxysandaracopimar-15-en-8β,11α-diol (spectrum not in file)
MF11.0 Best Matches: [structures not recoverable]

Figure 2. Best-matching compounds and their MF11.0 values found in the STIRS examination of 12β-acetoxysandaracopimar-15-en-8β,11α-diol