Computer-Assisted Structure Identification of Unknown Mass Spectra


Downloaded by 80.82.77.83 on May 25, 2018 | https://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0054.ch001

R. VENKATARAGHAVAN, H. E. DAYRINGER, G. M. PESYNA, B. L. ATWATER, I. K. MUN, M. M. CONE, and F. W. McLAFFERTY Department of Chemistry, Cornell University, Ithaca, NY 14853

Smith; Computer-Assisted Structure Elucidation ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Mass spectrometry has become a routine technique for structure identification in a number of applications (1). Gas chromatograph/mass spectrometer/computer (GC/MS/COM) systems capable of producing a mass spectrum every second are commercially available (2). Such systems generate voluminous amounts of data from subnanogram amounts of sample. For full utilization of this highly specific information it is essential to employ computer techniques. Computer-aided structure identification from mass spectrometric data has taken two distinct directions (3). The first utilizes "retrieval" systems, which compare the unknown data to a library of reference spectra and report compounds with a high degree of similarity. A number of techniques have been employed for the retrieval approach (3). The second approach involves interpretive schemes that attempt to identify part or all of the unknown structure from correlations of mass spectral fragmentation behavior. Pattern recognition (4) and artificial intelligence (5) are examples of such schemes that have been employed for interpreting mass spectral data of specific classes of compounds. We describe here a retrieval system, Probability Based Matching (PBM) (6, 7), and an interpretive system, the Self-Training Interpretive and Retrieval System (STIRS) (8-11), both developed for the analysis of low resolution mass spectra. Both systems are available to outside users over a computer network (TYMNET) from an IBM-370/168 computer system at Cornell University.

Probability Based Matching System

It has been shown that to increase the relevancy of information retrieved from a library of data it is essential to attach proper weighting to the contents of the system (12). The PBM system applies a probability weighting to both the mass and abundance data (6, 7). The abundance values are weighted according to a log normal distribution (13), and the masses are given a uniqueness value based on their occurrence probability in a mass spectral data base of 18,806 different compounds (14). The PBM system also uses a reverse search strategy, independently proposed by Abramson (15), which is valuable in identifying components of a mixture. This technique demands that the peaks of the reference spectrum be present in the unknown, but not that all peaks of the unknown be present in the reference. The degree of match of the reference to the unknown is indicated with a confidence index K, based on the statistical probability that this degree of match occurred by coincidence; details of the method have been described elsewhere (6, 7).

A statistical evaluation of PBM's performance was made using "unknown" mass spectra, for each of which at least one other spectrum of the same compound was present in the data base. Low and high molecular weight sets, each of ~400 unknown spectra removed at random from the data base, were run through the PBM system, and the results were evaluated using recall and reliability as measures of performance. Recall (RC) is defined as the proportion of relevant spectra actually retrieved, and reliability (RL) is the proportion of retrieved spectra which are actually relevant.
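The reverse search idea described in this section can be illustrated with a minimal sketch. It is not the PBM algorithm: the function names, the dictionary representation of a spectrum, and the example peaks are all assumptions made for illustration, and the score is a simple peak-presence fraction rather than the confidence index K.

```python
def reverse_match_fraction(reference, unknown, min_abundance=0.0):
    """Return the fraction of reference peaks that also appear in the unknown.

    Spectra are dicts mapping integer mass (m/e) to abundance. Peaks of the
    unknown that are absent from the reference are ignored, which is the
    defining property of a reverse search.
    """
    ref_masses = [m for m, a in reference.items() if a > min_abundance]
    if not ref_masses:
        return 0.0
    found = sum(1 for m in ref_masses if unknown.get(m, 0.0) > min_abundance)
    return found / len(ref_masses)


def forward_match_fraction(reference, unknown, min_abundance=0.0):
    """Fraction of unknown peaks found in the reference (conventional direction)."""
    return reverse_match_fraction(unknown, reference, min_abundance)


# Invented spectra for illustration only (not taken from any reference file).
component_a = {43: 100.0, 58: 40.0, 71: 15.0}
component_b = {51: 20.0, 77: 100.0, 105: 60.0}
mixture = {**component_a, **component_b}

reverse_score = reverse_match_fraction(component_a, mixture)
forward_score = forward_match_fraction(component_a, mixture)
```

For this invented mixture, `reverse_score` is 1.0 while `forward_score` is 0.5: the extra peaks contributed by the second component penalize only the forward comparison, which is why the reverse direction suits mixtures.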
In addition to these terms it is desirable to express the performance of automated systems in terms of false positives (FP), the proportion of spectra predicted incorrectly (16).

$$\mathrm{RC} = \frac{I_c}{P_c} \tag{1}$$

$$\mathrm{RL} = \frac{I_c}{I_c + I_f} \tag{2}$$

$$\mathrm{FP} = \frac{I_f}{P_f} \tag{3}$$

where $I_c$ = number of correct predictions, $P_c$ = total possible number of correct predictions, $I_f$ = number of false predictions, and $P_f$ = total possible number of false predictions.

At the 50% recall level the reliabilities for the low and high molecular weight sets were 65% and 42%, counting as correct only predicted structures which are identical to the unknown. Invariably, retrieval systems predict similar structures in addition to the identical structure. In the evaluation of PBM results four classes of similarity were defined: I, identical compound or stereoisomer; II, class I or a ring position isomer; III, class II or a homolog; IV, class III or an isomer of a class III compound formed by moving only one carbon atom. It was found that when class IV type compounds were accepted as correct predictions the reliability of the system increased to 95% at the same recall level.

Recently, it has been found that the performance of PBM for the identification of components in a mixture can be enhanced (17) by incorporating a spectrum subtraction procedure similar to the one proposed by Hites and Biemann (18). The method subtracts the spectrum of the reference compound matched by PBM with the highest confidence index (or any other in the list of predicted spectra) from the unknown spectrum and matches the residual peaks against the reference file by PBM. This operation is particularly valuable for identifying a minor component missed by the reverse search procedure when there is substantial overlap in the spectra of the major and minor components, or when the amount of the latter falls outside the limits set for "percent component" or "percent contamination".
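The subtraction step can be sketched as follows. The chapter does not give the scaling rule, so this sketch assumes the reference is scaled by the largest factor that leaves no negative residual at any reference peak; the function name, the `noise_floor` parameter, and the example abundances are hypothetical.

```python
def subtract_spectrum(unknown, reference, noise_floor=1.0):
    """Subtract a scaled reference spectrum from the unknown.

    Spectra are dicts mapping mass (m/e) to abundance. The scale factor is
    the smallest unknown/reference abundance ratio over the reference peaks
    (an assumed rule, chosen so that no residual goes negative). Residual
    peaks at or below noise_floor are discarded.
    """
    ratios = [unknown.get(m, 0.0) / a for m, a in reference.items() if a > 0]
    scale = min(ratios) if ratios else 0.0
    residual = {}
    for mass, abundance in unknown.items():
        remaining = abundance - scale * reference.get(mass, 0.0)
        if remaining > noise_floor:
            residual[mass] = remaining
    return residual


# Invented abundances: a major component plus a weaker, overlapping one.
major_reference = {91: 100.0, 106: 30.0}
mixture = {91: 103.0, 105: 30.0, 106: 30.0, 120: 7.5}

residual = subtract_spectrum(mixture, major_reference)
```

The residual keeps the peaks at 105 and 120 and the small excess at 91, and would then be matched against the reference file in a second pass.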


Self-Training Interpretive and Retrieval System

The STIRS system is an interpretive scheme that trains itself for the identification of different structural features in an unknown by utilizing specific classes of mass spectral data (8). Table I shows the fifteen data classes used; although these have been selected for their structural significance, there are no predesignated correlations of specific spectral data with corresponding structures. For each unknown spectrum the system matches its data in each class against the corresponding class data of all reference spectra and computes a match factor (MF) indicating the degree of similarity. In each data class the fifteen reference compounds of highest MF values are saved. If a particular substructure(s) is found in a significant proportion of these compounds, its presence in the unknown is probable. Absence of a substructure is not predicted, as the mass spectral features of one substructure can be made negligible by the presence of a more powerful fragmentation-directing group. The data base for the system includes information from 29,468 different organic compounds containing the common elements H, C, N, O, F, Si, P, S, Cl, Br, and/or I. All structures of these compounds have been coded in Wiswesser Line Notation (WLN) to facilitate computer handling of structure data.

To utilize the information provided by the STIRS system, the results for each data class are examined and the common structural features identified. To aid this process, in a recently implemented system (9), the computer examines the data for the presence of 179 frequently found substructures (19). The probability for the presence in the unknown of each substructure is predicted using a random drawing model. Knowing the frequency of occurrence of a specific substructure in the file, this method indicates the probability that the prediction of its presence in the unknown occurred at random. From this probability the confidence for each prediction is calculated. For example, in the STIRS data base the phenyl substructure is found to be present in 28% of the compounds. Statistically, on the average this substructure would occur in 4 of any 15 compounds in the data base, including the top 15 compounds selected in a STIRS data class. On the other hand, if phenyl is found in 10 of the 15 compounds, the probability that this occurred by chance is only 1 in 113, so that the confidence in the phenyl prediction is >99%, or a false positives value of <1%.
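The chapter does not state the exact form of the random drawing model. As one plausible reading, the sketch below assumes that each of the 15 selected compounds independently contains the substructure with its file frequency, so that the count follows a binomial distribution. This is an assumption for illustration; its tail probability need not agree with the 1-in-113 figure quoted above, and the function names are hypothetical.

```python
from math import comb


def expected_count(n_selected, file_frequency):
    """Average number of selected compounds containing the substructure."""
    return n_selected * file_frequency


def chance_probability(n_found, n_selected, file_frequency):
    """Probability that at least n_found of n_selected randomly drawn
    compounds contain the substructure (binomial model, an assumption)."""
    p = file_frequency
    return sum(
        comb(n_selected, k) * p**k * (1 - p) ** (n_selected - k)
        for k in range(n_found, n_selected + 1)
    )


def confidence(n_found, n_selected, file_frequency):
    """Confidence in the prediction: one minus the chance probability."""
    return 1.0 - chance_probability(n_found, n_selected, file_frequency)


# Phenyl example from the text: 28% file frequency, 10 of the top 15.
phenyl_expected = expected_count(15, 0.28)
phenyl_chance = chance_probability(10, 15, 0.28)
phenyl_confidence = confidence(10, 15, 0.28)
```

Under this assumed model the expected count is about 4 of 15, matching the text, and finding 10 of 15 still corresponds to a confidence above 99%.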


Table I. Mass Spectral Data Classes Used in STIRS

Data Class    Description, maximum number of peaks
1             Ion series (14 amu separation)
2-4           Characteristic ions [...] 250)
5C            Five [...]: 16-20, 30-38, 44-51, 59-65, 72-76
6C            Five [...]: 26-28, 39-42, 52-56, 62-70, 80-84
7, 8          Secondary neutral losses from most abundant odd-mass (MF7) and even-mass (MF8) loss
11            [...] 175)


The system has been extensively tested for each of the 179 substructures by selecting 373 compounds at random from the data base (every 50th compound in the Registry data) (20). If the data set did not contain at least 30 compounds with a particular substructure, the required additional compounds containing the substructure were selected at random. If fewer than 30 compounds with a given substructure were available, all of them were selected. System performance in each data class was evaluated by computing recall and reliability terms for each substructure. In contrast to equation 2, the reliability term in this case included a false positive factor, being set equal to RC/(RC + FP), such that the values reflect the system performance averaged for compounds containing and not containing the substructure. This reliability term led to substantial confusion, so that we now feel that it is better to report performance of the system in terms of recall and false positives (16), as discussed for PBM (equations 1 and 3).

Analysis of the data shows that although individual data classes are good for specific substructure identification, the best performance is found in the overall match factor (Table I) results. This is because the overall match factor combines the information derived from the individual data classes. The overall match factor, MF11.0, which combines ion series, characteristic ions, and neutral loss data, has been found to give the most reliable information on the different substructure possibilities in an unknown compound. For the 179 substructures tested, MF11.0 gave a recall of 49% at the 1.9% false positive level. A number of improvements have been made to the characteristic ion data classes (10) and the primary neutral losses (11); the overall match factors MF11.1 and MF11.2 have been found to give an average recall of 47% and 32.1%, respectively, at [...]
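The performance terms used in this evaluation are simple ratios, and a short sketch may make the difference between the two reliability definitions concrete. The function names and the prediction counts below are hypothetical; only the 49% recall and 1.9% false positive values for MF11.0 come from the text.

```python
def recall(correct, possible_correct):
    """Equation 1: RC = I_c / P_c."""
    return correct / possible_correct


def reliability(correct, false):
    """Equation 2: RL = I_c / (I_c + I_f)."""
    return correct / (correct + false)


def false_positives(false, possible_false):
    """Equation 3: FP = I_f / P_f."""
    return false / possible_false


def stirs_reliability(rc, fp):
    """Reliability term used in the earlier STIRS evaluation: RC / (RC + FP)."""
    return rc / (rc + fp)


# Hypothetical counts: 30 of 60 possible correct predictions,
# 10 false predictions out of 500 possible false predictions.
rc = recall(30, 60)
rl = reliability(30, 10)
fp = false_positives(10, 500)

# MF11.0 values reported in the text: 49% recall at 1.9% false positives.
mf11_term = stirs_reliability(0.49, 0.019)
```

Because RC/(RC + FP) mixes two rates with different denominators, it can be high even when equation 2 would give a modest value, which may help explain the confusion the authors mention.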

[Table: PBM Results on Unknown and Residual Spectra from [...]; columns include Compound and % Component. Table body not recoverable.]

Figure 1. STIRS results for the mass spectrum of n-propyl p-hydroxybenzoate. [Panels: Data class 3B, m/e 89-158; Data class 5 (neutral losses), losses of 0-64.]

[...] when a reference spectrum of this compound, n-propyl p-hydroxybenzoate, was not in the data base. Results for three of the 15 data classes illustrate the "self-training" feature by which STIRS indicates structural features of the unknown. Data class 2A utilizes the largest peaks in the low mass region of the spectrum (m/e 6-88); these fragment ions are more often formed by secondary reactions of higher energy requirements, and so are indicative of gross, rather than specific, structural features. Thus all of the spectra found of highest MF2A values contained a phenyl group, although the phenyl rings in these compounds contain a rather wide variety of substituents. The experienced mass spectrometrist probably would have inferred the presence of phenyl from the "aromatic ion series" in this region; however, STIRS was not trained specifically to recognize these features, but instead indicated the presence of phenyl by finding that such compounds matched these data the most closely.

Data class 3B covers a higher mass range, whose fragment peaks should be indicative of more specific structural features. Again all compounds of highest MF3B values contain the phenyl group, but almost all of them also contain an aryl hydroxy group (not ortho) and a carbonyl. Note that the latter is contained in carboxyl, ester, and keto functionalities; because STIRS is designed to provide positive information, data class 3B thus indicates the presence of HO-phenyl-CO-.

Data class 5 employs "neutral loss" information, the differences in mass between the observed fragment ion and the molecular ion, which in this case is assumed to be m/e 180. Cleavage of the molecular ion gives two fragments, only one of which holds the positive charge, and thus the neutral lost generally contains the more electronegative functionalities. Illustrating this, when the masses representing the most common neutral losses of this unknown were matched against the whole reference file, the compounds of highest MF5 values were found to be mainly propyl esters. To reiterate, STIRS was not preprogrammed to recognize propyl esters from their common losses of 41, 42, and 59 mass units; STIRS in effect trains itself to recognize the propyl ester functionality by finding that these data of the unknown were matched best by propyl esters in the file. Note also that the compounds found by MF5 did not contain a particularly significant number of phenyl groups; the different data classes of STIRS have been selected to be sensitive to different functionalities.
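The neutral loss data used by data class 5 can be derived from a spectrum with a few lines of code. This is a minimal sketch: the function name is hypothetical, the 0-64 loss window is taken from the Figure 1 panel title and treated here as an assumption, and the fragment masses are chosen only to reproduce the losses of 41, 42, and 59 named in the text.

```python
def neutral_losses(molecular_ion, fragment_masses, max_loss=64):
    """Return the sorted neutral losses (molecular ion mass minus fragment
    mass) that fall within 0..max_loss mass units."""
    losses = {molecular_ion - mass for mass in fragment_masses}
    return sorted(loss for loss in losses if 0 <= loss <= max_loss)


# Assumed molecular ion at m/e 180, as in the text. The fragment masses are
# illustrative; the fragment at 93 gives a loss of 87, outside the window.
fragments = [139, 138, 121, 93]
losses = neutral_losses(180, fragments)
```

Here `losses` is `[41, 42, 59]`; it is this list of mass differences, not the fragment masses themselves, that would be matched against the reference file.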
STIRS has been designed as an aid to the interpreter; if the interpreter now adds up the masses of di-substituted phenyl (76), hydroxyl (17), and propyl ester (87), he can note that the sum corresponds to the supposed molecular weight, 180, indicating that all of the functionalities of the unknown molecule have been identified by these three data classes of STIRS.

STIRS: Unknown Terpene. The mass spectrum of 12β-acetoxysandaracopimar-15-en-8β,11α-diol was examined by STIRS, omitting all spectra of this compound from the reference file. The nine structures of highest "overall match factor" (MF11.0) values are shown in Figure 2. If the identity of this molecule had been totally unknown to the interpreter, these MF11.0 selections should have indicated at least the general structural features of the molecule. Thus all of the compounds of Figure 2 have either three or four fused rings, and all have the three fused six-membered rings that are actually present in the unknown. The four tricyclic compounds closely resemble the correct structure in having methyl groups in the 4, 4, 10, and 13 positions, hydroxy at 8, and vinyl at 13. Note that three of the steroids contain a 5-hydroxy group, which can be viewed as corresponding to the correct 8-hydroxy position by "flipping" the structures, with their acetoxy groups then at least present in the ring corresponding to the ring containing the acetoxy group in the unknown. The presence of hydroxyl and acetoxy groups is indicated by the fact that eight of the nine compounds contain hydroxyls and seven contain acetoxy groups; only two contain more than one hydroxyl group, while none contain more than one acetoxy. However, the compound does give a molecular ion, so that it should be possible for the interpreter to infer correctly that the unknown contains one acetoxy and two hydroxyl groups after deducing the tricyclic system with the other substituents. Also, the steroid selected as the seventh compound has a 4-gem-dimethyl group. For this unknown STIRS can thus give fair confidence in all of the structure assignments except the position of the acetoxy and one of the hydroxy groups; there is even some indication of their positions, as in the majority of selected structures of Figure 2 these substituents are on the exterior ring bearing the bridgehead hydroxyl.

PBM/STIRS Examination of Unknown Spectra of Fatty Acid Esters. In an early classic case of natural product structure determination by mass spectrometry (25), a compound isolated as the methyl ester from butterfat was identified as methyl 3,7,11,15-tetramethylhexadecanoate. The original published (25) spectrum (omitted from the reference file) was run through PBM and STIRS to give the results shown in Table IV. PBM correctly identified the compound as methyl phytanoate, retrieving the two reference spectra of this compound in the PBM reference file; note that the third selection is a much poorer match. The substructures identified by STIRS MF11.0 and 11.1 are correct, although the acetate substructure indicated by MF11.2 is not (Table IV). The best-matching compounds found by STIRS MF11.0 are all methyl esters of long-chain fatty acids, and all but one have a methyl group in the three position. The positions of the other methyl groups were not found by STIRS, consistent with the rather small effect of such methyl groups on the mass spectra.

Figure 2. Best-matching compounds and their MF11.0 values found in the STIRS examination of 12β-acetoxysandaracopimar-15-en-8β,11α-diol (spectrum not in file).