Computer-Assisted Structure Elucidation

7

Interactive Structure Elucidation

C. A. SHELLEY, H. B. WOODRUFF, C. R. SNELLING, and M. E. MUNK

Downloaded by NATL UNIV OF SINGAPORE on May 5, 2018 | https://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0054.ch007

Department of Chemistry, Arizona State University, Tempe, AZ 85281

The topic of computer-assisted structure elucidation cuts across disciplinary boundaries as well as the traditional boundaries within the discipline of chemistry itself. As a result, scientists with varied backgrounds and interests have been attracted to it. We entered the area through the door marked "natural products chemists." It was our own work in structure elucidation (1-8) that led us to the computer and the belief that the process as practiced by the natural products chemist is amenable to computer modeling. It is quite evident that our efforts bear the imprint of our background.

While it is true that no two natural products chemists practice the science and art of structure elucidation in exactly the same way, certain common features may be discerned (Figure 1). Three integral components of the process are:

1. the reduction of chemical and physical data to their structural implications;
2. the familiar partial structure, an expression comprised of known structural fragments and unaccounted-for atoms that summarizes the status of the structure problem at any given stage;
3. the design of new experiments, guided by the partial structure, or some or all of the molecular structures compatible with it.

The final solution of the problem may be described as the cyclic process that leads to the reduction of the number of structural fragments and atoms in the partial structure to one.

Description of CASE

In developing a computer model of the structure

Smith; Computer-Assisted Structure Elucidation ACS Symposium Series; American Chemical Society: Washington, DC, 1977.


elucidation process as we envisioned it, we first focused our attention on two of its major components:

1. the expansion of a partial structure to all molecular structures consistent with it and any other information available to the chemist, and
2. the reduction of chemical and spectroscopic data to their structural implications.

The current status of CASE, our acronym for computer-assisted structure elucidation, is summarized in Figure 2. CASE is a network of computer programs designed to accelerate and make more reliable the entire process of structure elucidation. The system is highly interactive and is continually evolving.

The task of reducing chemical and spectroscopic data to structural information is presently shared by the chemist and the computer. An infrared interpreter designed specifically for application to multifunctionalized molecules is at an advanced stage of development and fully operational. If the interpreter is to be of value to the natural products chemist in solving actual structure elucidation problems, several criteria must be met. The program must be able to make decisions concerning the presence or absence of a large number of functional groups. It is insufficient simply to distinguish esters from nonesters. Rather, one should be able to make a more specific distinction (e.g., saturated esters vs. unsaturated esters vs. lactones). In addition, the program must be able to interpret the relatively complex spectra of compounds found in nature.

Our program tests for the presence or absence of 169 chemical functionalities. It has been tested on over 500 spectra of varying complexity with a high degree of success. It is an artificial intelligence program that attempts to parallel the chemist's reasoning in interpreting an infrared spectrum as much as possible.

The chemist uses an empirical approach to interpret infrared spectra. A set of guidelines for interpreting infrared spectra is determined. These guidelines may result from observation of a sufficient number of spectra and/or from reading textbooks and learning from the observations of others. An initial set of guidelines for identifying saturated carboxylic acids might be to look for a broad, medium to strong peak centered around 3000 cm⁻¹, a strong carbonyl peak near 1715 cm⁻¹, and a broad, medium intensity peak in the vicinity of 920 cm⁻¹. Any compounds with spectra which followed these guidelines would be interpreted


Figure 1. "Manual" structure elucidation. Diagram labels: chemical and spectroscopic data, molecular formula, partial structure, chemist, compatible structures, unique structure, experimental design and execution.

Figure 2. CASE network. Diagram labels: molecular formula, chemical data, automated interpreter, compatible structures, truncated list of structures.

as containing a carboxylic acid functionality. Similar initial sets of guidelines must be established for all other functionalities. Next the program is tested on some infrared spectra. Any time the program makes an erroneous interpretation, it is necessary to alter the guidelines to correct the mistake. As long as a chemist can do a better job of interpreting a spectrum than the program, the program can be altered by using this new information so that it will do a better job of interpretation in the future. This ability to alter the program to correct mistakes is a major advantage of artificial intelligence programming. As with other components of CASE, the infrared interpreter is continually evolving.

The computer-derived analysis of the infrared spectrum can be reviewed by the chemist prior to automatic encoding, or the chemist can be bypassed, with the automatically encoded functional group information being sent directly to the molecule assembler.

The development of programs for the automated interpretation of other spectroscopic information is at an earlier stage. A preliminary investigation using pattern recognition techniques to aid in the interpretation of ¹³C-NMR spectra has yielded promising results. At the present time, much of the remaining interpretation of spectral data is chemist-derived.
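The guideline approach described above for saturated carboxylic acids can be illustrated with a minimal Python sketch. It is not the authors' interpreter: the `Peak` record, the helper names, and the tolerance windows (200, 15, and 30 cm⁻¹) are assumptions chosen only for illustration; the three target positions come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    position: float        # wavenumber in cm^-1
    intensity: str         # "weak", "medium", or "strong"
    broad: bool = False    # True for a broad band


def has_peak(peaks, center, tolerance, intensities, broad=None):
    """Return True if some peak lies within `tolerance` of `center`,
    has an allowed intensity, and (optionally) the required breadth."""
    for p in peaks:
        if abs(p.position - center) > tolerance:
            continue
        if p.intensity not in intensities:
            continue
        if broad is not None and p.broad != broad:
            continue
        return True
    return False


def looks_like_saturated_carboxylic_acid(peaks):
    """Apply the three guidelines quoted in the text.
    The tolerance windows are illustrative assumptions."""
    oh_stretch = has_peak(peaks, 3000, 200, {"medium", "strong"}, broad=True)
    carbonyl = has_peak(peaks, 1715, 15, {"strong"})
    oh_bend = has_peak(peaks, 920, 30, {"medium"}, broad=True)
    return oh_stretch and carbonyl and oh_bend
```

When such a rule misfires on a test spectrum, the correction step described in the text corresponds to editing the windows or adding a further condition to the rule.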
CASE also incorporates programs for the automated interpretation of chemical information. As an example, CASE accepts the number of moles of periodate consumed by a compound and uses the information to constrain the molecule assembler. Thus, only molecules consistent with the periodate information are assembled.

The molecule assembler accepts both the computer-derived and user-derived structural information, and, given the molecular formula, constructs all structural isomers compatible with the input. A nonredundant listing of the compatible molecules is presented to the chemist in the conventional structural language.

The list of compatible molecules usually can be further truncated by comparing certain predicted spectroscopic properties for each molecule constructed with the observed spectroscopic properties of the unknown. The tasks of predicting, comparing, and ranking are assigned to the spectrum simulator. These programs are also at an early stage of development. One operational component called PEAK will be illustrated later. Given a molecular structure, PEAK predicts the number of signals expected in the ¹³C-NMR spectrum.
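One way to estimate the number of ¹³C-NMR signals from a structure is to count classes of topologically equivalent carbon atoms. The sketch below does this by iterative neighbourhood refinement on the hydrogen-suppressed graph. It is an assumption about how such a prediction could be made, not a description of PEAK itself; the function names and the input format (a list of element symbols plus `(i, j, bond_order)` triples) are hypothetical. The method ignores stereochemistry and accidental signal overlap, and simple refinement of this kind can fail to separate atoms in some highly regular graphs.

```python
def _rank(labels):
    """Map arbitrary sortable labels to small integers."""
    order = {label: k for k, label in enumerate(sorted(set(labels)))}
    return [order[label] for label in labels]


def count_carbon_signals(elements, bonds):
    """Count topologically distinct carbon atoms.

    elements: element symbol of each heavy atom, e.g. ["C", "C", "O"]
    bonds:    (i, j, order) triples between heavy atoms
    """
    n = len(elements)
    neighbors = [[] for _ in range(n)]
    for i, j, order in bonds:
        neighbors[i].append((j, order))
        neighbors[j].append((i, order))

    # Initial class: element symbol plus number of heavy-atom neighbours.
    colors = _rank([(elements[i], len(neighbors[i])) for i in range(n)])
    while True:
        # Refine each class by the sorted classes of the atom's neighbours.
        labels = [
            (colors[i],
             tuple(sorted((order, colors[j]) for j, order in neighbors[i])))
            for i in range(n)
        ]
        new_colors = _rank(labels)
        if len(set(new_colors)) == len(set(colors)):
            break          # no class was split, so the partition is stable
        colors = new_colors
    return len({colors[i] for i in range(n) if elements[i] == "C"})
```

For cyclohexane (six carbons in a ring of single bonds) the function returns 1, and for propane it returns 2.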


The molecule assembler (Figure 3) is unique in approach and in its simplest form was designed to expand exhaustively the conventional partial structure into all structural isomers consistent with it. The algorithm on which the assembler is based is recursive. By concentrating more on partial structure expansion than on molecular formula expansion, greater compactness and efficiency were achieved.

From the start it was recognized that a broadly applicable molecule assembler must be capable of utilizing more information than just the conventional partial structure, because the chemist generally has more information than can be expressed by the conventional partial structure. Thus, in Figure 3 the term "partial structure" is used in a broader context. The partial structure includes the molecular formula and computer-derived and/or chemist-derived structural fragments. Atoms in these fragments must not duplicate one another; that is, the fragments must be nonoverlapping. The partial structure also includes supplementary information that cannot be expressed in terms of nonoverlapping structural fragments.

Communication with CASE is achieved by means that mimic the natural language of the chemist. The molecular formula is input in standard format.
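The idea of recursively expanding nonoverlapping fragments into every consistent structure can be sketched in a few lines of Python. This is a toy under stated assumptions, not the CASE assembler: each fragment is reduced to a label and a count of residual valences, bonds are added one unit at a time from the lowest-numbered unsaturated fragment, and duplicates are removed afterwards by a brute-force canonical form over all relabelings (practical only for a handful of fragments). The names `assemble`, `_canonical`, and `_connected` are hypothetical.

```python
from itertools import permutations


def _connected(n, bonds):
    """True if the bonded fragments form a single connected structure."""
    adjacency = {i: set() for i in range(n)}
    for i, j in bonds:
        adjacency[i].add(j)
        adjacency[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == n


def _canonical(labels, bonds):
    """Smallest (labels, bonds) description over all renumberings."""
    n = len(labels)
    best = None
    for perm in permutations(range(n)):
        new_labels = [None] * n
        for old, new in enumerate(perm):
            new_labels[new] = labels[old]
        new_bonds = tuple(sorted(
            (min(perm[i], perm[j]), max(perm[i], perm[j]), order)
            for (i, j), order in bonds.items()
        ))
        candidate = (tuple(new_labels), new_bonds)
        if best is None or candidate < best:
            best = candidate
    return best


def assemble(fragments, max_order=3):
    """fragments: list of (label, residual_valence) pairs.
    Returns one bond dictionary {(i, j): order} per distinct structure."""
    labels = [label for label, _ in fragments]
    n = len(fragments)
    results = {}

    def expand(free, bonds):
        unsaturated = [i for i in range(n) if free[i] > 0]
        if not unsaturated:
            if _connected(n, bonds):
                results.setdefault(_canonical(labels, bonds), dict(bonds))
            return
        i = unsaturated[0]                 # always extend the first open atom
        for j in unsaturated[1:]:
            if bonds.get((i, j), 0) >= max_order:
                continue
            free[i] -= 1
            free[j] -= 1
            bonds[(i, j)] = bonds.get((i, j), 0) + 1
            expand(free, bonds)
            bonds[(i, j)] -= 1             # undo the bond before trying the next
            if bonds[(i, j)] == 0:
                del bonds[(i, j)]
            free[i] += 1
            free[j] += 1

    expand([valence for _, valence in fragments], {})
    return list(results.values())
```

For example, `assemble([("CH3", 1), ("CH", 3), ("CH2", 2), ("Cl", 1), ("Cl", 1)])` yields two structures, corresponding to 1,1- and 1,2-dichloropropane. Here duplicates are filtered only after assembly; the text describes CASE as avoiding most of them during assembly instead.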
The versatile linear code designed for structural fragment input is illustrated in Figure 4. Thus, lines 1, 3, 5, 7 and 9 represent, respectively, a 1-hydroxyethyl group with a single residual valence at the 1-position, a carbonyl group with doubly deficient carbon atom, a carbonyl group joined to oxygen with valence deficiencies at carbon and single bonded oxygen, a carboxyl group, and a cyclohexane ring with each carbon atom doubly valence deficient. In the latter example, note that ring designation is achieved by labeling one carbon atom with a number (1 in this case) and forming a bond between a carbon five atoms removed and the label.

The linear code may be further elaborated by adding atom and fragment tags to describe the local environment of atoms in a structural fragment without concern for overlapping atoms. For example, the valence deficient carbon atom of the 1-hydroxyethyl group is required to bond to a methine carbon by the addition of the atom tag (line 2). Since the information is provided by means of the tag, the chemist need not be concerned whether that methine carbon duplicates a similar group in another structural fragment.
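The untagged forms of the linear code (lines 1, 3, 5, 7 and 9) are simple enough to parse mechanically. The following Python sketch reads such a string and reports the residual valence of each atom. It is an illustration under assumptions: the grammar is inferred from the five examples in the text, only C, N, and O are handled, atom tags are not supported, and `parse_fragment` is a hypothetical name.

```python
import re

VALENCE = {"C": 4, "N": 3, "O": 2}
BOND_ORDER = {"-": 1, "=": 2}
# Optional ring label ("1:"), element symbol, optional hydrogens ("H", "H3").
ATOM = re.compile(r"^(?:(\d+):)?([A-Z][a-z]?)(?:H(\d*))?$")


def parse_fragment(code):
    """Parse e.g. 'CH3-CH-OH' or '1:C-C-C-C-C-C-1'.
    Returns (atoms, bonds, residual_valences)."""
    tokens = re.split(r"([-=])", code.replace(" ", ""))
    atoms, bonds, labels = [], [], {}
    pending = None                       # bond order awaiting its second atom
    for token in tokens:
        if token in BOND_ORDER:
            pending = BOND_ORDER[token]
            continue
        if token.isdigit():              # ring closure back to a labelled atom
            bonds.append((len(atoms) - 1, labels[token], pending))
            pending = None
            continue
        match = ATOM.match(token)
        if match is None:
            raise ValueError(f"unrecognised atom token: {token!r}")
        label, element, h = match.groups()
        hydrogens = 0 if h is None else int(h or 1)
        atoms.append({"element": element, "hydrogens": hydrogens})
        if label is not None:
            labels[label] = len(atoms) - 1
        if pending is not None:
            bonds.append((len(atoms) - 2, len(atoms) - 1, pending))
            pending = None
    # Residual valence = standard valence - hydrogens - bond orders used.
    residual = [VALENCE[a["element"]] - a["hydrogens"] for a in atoms]
    for i, j, order in bonds:
        residual[i] -= order
        residual[j] -= order
    return atoms, bonds, residual
```

Applied to `"CH3-CH-OH"`, the residual valences are `[0, 1, 0]`, matching the single residual valence at the 1-position described in the text; `"1:C-C-C-C-C-C-1"` gives six carbons with a residual valence of 2 each.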


Figure 3. Molecule assembler. Diagram labels: molecular formula, structural fragments, and supplementary information (input); nonredundant list of compatible structures (output).

 1. CH3-CH-OH
 2. CH3-CH-OH
 3. O=C
 4. O=C
 5. O=C-O
 6. O=C-O
 7. O=C-OH
 8. O=C-OH
 9. 1:C-C-C-C-C-C-1
10. 1:C-C-C-C-C-C-1

PARTIAL STRUCTURE: MOLECULAR FORMULA; STRUCTURAL FRAGMENTS; SUPPLEMENTARY INFORMATION (MULTIPLE BONDS, RINGS, SUBSTRUCTURE CONTROL, AUTOMATED CHEMICAL CONSTRAINTS, AUTOMATED SPECTROSCOPIC CONSTRAINTS)

Figure 4. Constraints


Statement 4 restricts the environment of the carbonyl carbon atom with three atom tags: the first prohibits its presence as part of a 3-5 membered ring, i.e., the carbonyl group is unstrained (R for ring, 03 for minimum ring size, 05 for maximum ring size, 0 for minimum number of such rings, 0 for maximum number of such rings), and the second and third require the carbonyl to be flanked by a methine and a methylene group, respectively.

Statement 6 specifies a five-membered lactone with unsaturation conjugated to the alcohol oxygen. The environment of the carbonyl carbon atom is disclosed by three atom tags: the first requires that it join to carbon, the second designates that it must be part of a five-membered ring (the absence of minimum and maximum values for the number of such groups implies a minimum of 1), and the third precludes the presence of α,β-unsaturation to that carbon atom (= is the double bond symbol, 0 for minimum number of double bonds, 0 for maximum number). The two atom tags for the alcohol oxygen require that it bond to carbon and the presence of α,β-unsaturation.

Statement 8 represents a carboxylic acid function that cannot bear α,β-unsaturation. Statement 10 carries a fragment tag that instructs the program to forbid the formation of internal bonds in a fragment. Thus, no molecular structures containing multiple bonds or bridges in the cyclohexane ring will be assembled.
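The ring atom tag described for statement 4 has a fixed layout that lends itself to a small parser. The sketch below assumes the tag body is the letter R, a two-digit minimum ring size, a two-digit maximum ring size, and optionally one digit each for the minimum and maximum number of such rings; the single-digit width of the count fields and the names `RingTag`, `parse_ring_tag`, and `ring_tag_satisfied` are assumptions. Following the text, omitted counts are read as "at least one".

```python
import re
from typing import NamedTuple, Optional


class RingTag(NamedTuple):
    min_size: int
    max_size: int
    min_count: int
    max_count: Optional[int]   # None means no upper limit was given


_RING_TAG = re.compile(r"^R(\d{2})(\d{2})(?:(\d)(\d))?$")


def parse_ring_tag(tag):
    """Parse a tag body such as 'R030500' or 'R0505'."""
    match = _RING_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a ring tag: {tag!r}")
    lo, hi, count_min, count_max = match.groups()
    if count_min is None:
        # No counts supplied: the text reads this as a minimum of 1.
        return RingTag(int(lo), int(hi), 1, None)
    return RingTag(int(lo), int(hi), int(count_min), int(count_max))


def ring_tag_satisfied(tag, ring_sizes):
    """ring_sizes: sizes of all rings that contain the tagged atom."""
    count = sum(tag.min_size <= size <= tag.max_size for size in ring_sizes)
    if count < tag.min_count:
        return False
    return tag.max_count is None or count <= tag.max_count
```

With `parse_ring_tag("R030500")`, an atom in a six-membered ring satisfies the tag while an atom in a five-membered ring does not, which is the "unstrained carbonyl" condition of statement 4.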
Some structural information is not specific to atoms or fragments and must be treated separately. Thus, Multiple Bonds designates the number, either exact or a range, and kinds of multiple bonds allowed; Rings permits the same control over rings. Substructure Control can be used to require the presence or absence of any specified structural fragment using the same linear code described. Automated Chemical and Spectroscopic Constraints instruct the molecule assembler to generate only those structures consistent with the applied constraints. For example, the applied constraint may be the number of moles of periodate that an unknown compound consumes or the number of signals in the ¹³C-NMR spectrum of the unknown. These applications will be illustrated later.

The information contained in "partial structure" (Figure 3) is used in a way that maximizes the efficiency of the molecule assembler. The goal is to preclude the assembly of an invalid molecule rather than reject it after assembly. Thus, in most cases, atom and fragment tags, and supplementary information


constrain the assembly process. In other cases, retrospective searching is required.

The molecule assembler is constrained in other ways as well.

1. A user-defined library of highly strained or chemically unstable moieties constrains the assembly of molecules containing these structural features, e.g., a cyclopropanone ring.
2. The assembly of duplicate structures is minimized. A number of techniques are incorporated into the program to eliminate duplicates prospectively. The most important technique involves the perception of topological symmetry (11) at each step of the assembly process. In this way, only one member of a group of topologically equivalent valence-deficient atoms is permitted to initiate bond formation.

In spite of the heuristics used, the total exclusion of duplicates cannot always be assured. Those duplicates still formed can be eliminated retrospectively. A newly designed and highly efficient canonical naming algorithm performs this remaining task. The algorithm also recognizes resonance forms and includes only one member on the list of valid structures.

Applications

In the discussion of applications, our purpose is to provide an overview of the system network, its interactive nature, and its scope. CASE was originally developed on real world problems under study in our own laboratory and in a couple of other natural products laboratories. In more recent years "simulated" real world problems, that is, problems taken from the chemical literature, have also played an important role. Such problems, although admittedly somewhat contrived, provided the necessary breadth or depth exactly when needed in the program development and testing. Specific kinds of problem situations are difficult to produce on demand with slower moving real world structure problems.

Faulkner et al. (12) recently isolated a halogenated monoterpene from a sea hare and reported the structure (A). The assignment was made in part on the chemical and spectroscopic data reported in the paper and in part by analogy to a related known compound. The structural information derived from the chemical and spectroscopic data is shown in the I/O printout of


the initial run of the problem (Figure 5). The molecular formula is input as C10H13B2X3, where X, a standard halogen symbol, is Cl in this problem, and B is user-defined as Br. Since B is not a standard molecular formula symbol, the computer asks the user to specify its valence. Next the nonoverlapping fragments are listed. In order they are: an allyl chloride fragment with all carbon atoms having residual valence, a vinyl bromide unit, a bromomethylene unit, and two CH3 groups each attached to a quaternary carbon. The structure constraining device, Substructure Control, is used to limit the number of CH3 groups to the known value of 2.

At this point we are ready to start the molecule assembler. Since we have no estimate of the total number of structural isomers consistent with this input, we set some arbitrary limit on molecule assembly, 20 in this case, to make certain that the problem strategy is indeed sound before giving free rein to the molecule assembler. The first 20 structures were examined and we found that we could easily constrain the molecule assembler with 2 additional pieces of information because we saw a conjugated diene in some structures and a gem-dimethyl group in some structures. One of the authors of the paper, Dr. Ireland, indicated the availability of evidence, not in the paper, to exclude a conjugated diene. In addition, the geminal methyl groups should have been excluded by us based on the published PMR information. These two constraints were added by calling Substructure Control (Figure 6). Sixteen structures were generated. Of the 16, 8 had geminal chlorine atoms. Ireland believed these structures to be unlikely on the basis of indirect chemical evidence. The remaining 8 candidates could be pruned to 4 structures on the basis of some of the mass spectroscopic data. The structure assigned (A) is one of the four.
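The formula input in this example, with its user-defined symbol B, suggests a small worked calculation. The Python sketch below parses a formula such as C10H13B2X3, asks for the valence of any nonstandard symbol through a dictionary, and computes the number of rings plus multiple bonds from the usual valence sum, 1 + Σ n(v − 2)/2. The table of standard valences and the function names are assumptions made for illustration; the source does not describe how CASE handles this internally.

```python
import re

# Standard symbols assumed here; X is the generic halogen of the text.
STANDARD_VALENCE = {"C": 4, "H": 1, "N": 3, "O": 2, "X": 1}


def parse_formula(formula, user_valences=None):
    """Return (atom counts, valence table) for a formula like 'C10H13B2X3'.
    Symbols outside the standard table must appear in user_valences."""
    valences = dict(STANDARD_VALENCE)
    valences.update(user_valences or {})
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol not in valences:
            raise KeyError(f"valence of {symbol!r} must be supplied by the user")
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts, valences


def rings_plus_multiple_bonds(counts, valences):
    """Number of rings plus multiple-bond equivalents implied by the formula."""
    total = sum(n * (valences[symbol] - 2) for symbol, n in counts.items())
    if total % 2:
        raise ValueError("formula has an odd total valence")
    return total // 2 + 1
```

For C10H13B2X3 with B given a valence of 1, the result is 2, so every candidate structure must contain two rings or double bonds in total (a triple bond counting as two). For actinobolamine, C9H15N1O3, the same calculation gives 3.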


In our own structure work on the antibiotic actinobolin (5), which is a compound unrelated in structural type to known antibiotics, an earlier version of CASE was used. The structure study was initiated by an examination of a degradation product,


TITLE: FAULKNER JOC,41,2461(1976).
ENTER THE MAXIMUM NUMBER OF STRUCTURES TO BE GENERATED: 20
MOLECULAR FORMULA: C10H13B2X3
WHAT IS THE VALENCE OF A B ATOM?: 1
FRAGMENT(S): CH=CH-CH-X
FRAGMENT(S): CH=CH-B
FRAGMENT(S): CH2-B
FRAGMENT(S): CH3 CH3
FRAGMENT(S): ;
CONSTRAINT(S): SUBSTR
FRAGMENT(S): CH3 ;
MINIMUM: 2
MAXIMUM: 2
CONSTRAINT(S): ;
COMMAND: GENERATE

20 STRUCTURES GENERATED

Figure 5

TITLE: FAULKNER JOC,41,2461(1976).
MOLECULAR FORMULA: C10H13B2X3
WHAT IS THE VALENCE OF A B ATOM?: 1
FRAGMENT(S): CH=CH-CH-X
FRAGMENT(S): CH=CH-B
FRAGMENT(S): CH2-B
FRAGMENT(S): CH3 CH3<CH0>
FRAGMENT(S): ;
CONSTRAINT(S): SUBSTR
FRAGMENT(S): CH3 ;
MINIMUM: 2
MAXIMUM: 2
CONSTRAINT(S): SUBSTR
FRAGMENT(S): C=C-C=C ;
MINIMUM: 0
MAXIMUM: 0
CONSTRAINT(S): SUBSTR
FRAGMENT(S): CH3-C-CH3 ;
MINIMUM: 0
MAXIMUM: 0
CONSTRAINT(S): ;
COMMAND: GENERATE

16 STRUCTURES GENERATED

Figure 6


actinobolamine (B), containing 9 of the original 13 carbon atoms (2). That problem has been rerun on the current version of CASE.

The IR interpreter program reports the presence of alcohol and ketone with high confidence levels, 4 and 3, respectively (Figure 7). Subclass information, with confidence levels of 2, is of insufficient certainty for use by the molecule assembler. The same is true of other major classes of functional groups listed. Note that the program considers only functional groups consistent with the molecular formula. In addition, atoms of functional groups receiving a confidence level of 4 are automatically subtracted from the molecular formula and are not considered further.

The I/O printout shown in Figure 8 includes an unstrained ketone carbonyl, i.e., not part of a 3-5 membered ring, flanked by α-carbons bearing four readily exchangeable protons, a secondary hydroxyl group, a secondary amine attached to two different methine carbons, and a 1-hydroxyethyl group. The Multiple Bond Constraints DOUBLE and TRIPLE preclude the generation of double and triple bonds. Substructure Control restricts assembly of ketal and aminal linkages, and also the number of CH3 groups to one. The consumption of two moles of periodate by actinobolamine is important structural information, the significance of which is automatically considered by calling PERIODATE and giving the molar uptake. This information constrains the molecule assembler itself; the search for compatibility is not done retrospectively.

CASE produced five structures consistent with the available evidence (Figure 9). Armed with the assurance that no valid structure had been overlooked, an examination of these five structures provided invaluable guidance in the design of the minimum number of experiments to assign the structure of actinobolamine correctly. All of the experiments were spectroscopic in nature and led to the correct structure (B).
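The way interpreter confidence levels feed the molecule assembler can be sketched as two small filters. This is a hedged illustration only: the cutoff of 3 for passing a class to the assembler is an assumption drawn from the text's statement that levels 4 and 3 are high and level 2 is insufficient, and the atom composition assigned to each functional group (`group_atoms`) is hypothetical. The function names are likewise invented for the example.

```python
def select_for_assembler(report, threshold=3):
    """Keep functional-group classes whose confidence level is high enough.
    report maps class name -> confidence level (0 to 4)."""
    return [name for name, level in report.items() if level >= threshold]


def subtract_definite_groups(formula_counts, report, group_atoms):
    """Remove the atoms of every class reported at level 4 from the formula.

    formula_counts: element -> count for the whole molecule
    group_atoms:    class name -> element -> count (assumed compositions)
    """
    remaining = dict(formula_counts)
    for name, level in report.items():
        if level != 4:
            continue
        for element, n in group_atoms[name].items():
            if remaining.get(element, 0) < n:
                raise ValueError(f"{name} is inconsistent with the formula")
            remaining[element] -= n
    return remaining
```

With a report of `{"ALCOHOL": 4, "KETONE": 3, "LACTAM": 2}`, only the alcohol and ketone pass the filter, and subtracting one O and one H for the alcohol from C9H15N1O3 leaves C9H14N1O2 for the remaining analysis.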

Structure (B): actinobolamine.


ACTINOBOLIN, C13H20N2O6, treated with H3O+, gives ACTINOBOLAMINE, C9H15NO3.

IR INTERPRETER OUTPUT

EACH CLASS HAS A CONFIDENCE LEVEL OF 0-4.
4 - DEFINITELY PRESENT
3 - HIGH PROBABILITY
2 - MEDIUM PROBABILITY
1 - LOW PROBABILITY
0 - DEFINITELY ABSENT

Classes and subclasses listed include: 1. ALCOHOL (4); 2. PRIMARY; 5. SEC. IN RING; 6. KETONE (3); 7. SATURATED; 8. LACTAM; 9. 5 MEMBER W/O NH; 10. CARBAMATE; 11. PRIMARY; 12. SECONDARY; 13. TERTIARY; 14. AMINE; 15. SECONDARY; 16. TERTIARY; 17. C=C (NON-AROMATIC); 18. CHR=CR2; 19. METHYL; 20. GEM DIMETHYL; 21. NITRO GROUP; 22. SATURATED; 23. PYRROLE; 24. ETHER; 25. SATURATED; 26. UNSATURATED; 27. ACETAL; 28. KETAL.

Figure 7


TITLE: ACTINOBOLAMINE
MOLECULAR FORMULA: C9H15N1O3
FRAGMENT(S): [entries partly illegible; legible fragments include O=C<R030500>, OH and CH3-CH-OH]
CONSTRAINT(S): DOUBLE
CONSTRAINT(S): TRIPLE
CONSTRAINT(S): SUBSTR
FRAGMENT(S): O-C-O
MINIMUM: 0 MAXIMUM: 0
CONSTRAINT(S): SUBSTR
FRAGMENT(S): O~C-N
MINIMUM: 0 MAXIMUM: 0
CONSTRAINT(S): SUBSTR
FRAGMENT(S): CH3
MINIMUM: 1 MAXIMUM: 1
CONSTRAINT(S): PERIODATE
COMMAND: GENERATE

5 STRUCTURES GENERATED

Figure 8

Figure 9. Program CASE-draw, Arizona State University, actinobolamine


Coronatine is a toxin produced by a microorganism of the Pseudomonas genus. Its structure was reported early this year (13). A degradation product, coronafacic acid, played a key role, and although some chemical and spectroscopic data and their structural significance are reported in the paper, the final determination of the degradation product was by x-ray. We decided to see how close to the actual structure the reported chemical and spectroscopic information would have taken the authors. The computer input is shown in Figure 10. Coronafacic acid is C12H16O3. It contains a cyclopentanone ring with three readily exchangeable hydrogen atoms, an unstrained α,β-unsaturated carboxylic acid moiety and an ethyl group that is not part of a propyl group. There are no additional multiple bonds and only a single methyl group. CASE assembled 88 structures; thus, the chemical and spectroscopic evidence brought the authors to within 88 structures of the correct one.

One last simple but informative example illustrates one application of the spectrum simulator (Figure 11). The monoterpene cineole, C10H18O, was recently examined by 13C-NMR. Off-resonance and broad-band proton decoupled spectra reveal quaternary carbon bearing ether oxygen, at least one methine carbon, two methylene carbons and two methyl carbons, and no unsaturated carbons. The 13C-NMR evidence is compatible with 458 structural isomers according to CASE. If PEAK is called (Figure 12), the number of CNMR signals expected for each of the 458 compounds is predicted. Those structures not conforming to the observed number, 7 in this case, are rejected. In this way the list of 458 structures is pruned to 38. Of the 38 structures, only 5 conform to the isoprene rule. Peak prediction is based on molecular topology, but the determination of class equivalence in this case considers only neighboring atoms no more than three bonds removed. Since a perfect match between prediction and observation cannot be expected for each and every structure examined by PEAK, the pruning step of PEAK can compare the actual number of observed signals to a range of predicted values, generally the actual number plus or minus one. Thus, if PEAK is set at 7 with a range of plus or minus one, the list of 458 structures is reduced to 144. Of the 144, only 19 comply with the isoprene rule.
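The signal-count pruning described for PEAK can be illustrated with a minimal sketch. It is not the CASE implementation: the function names (`build_molecule`, `count_carbon_signals`, `prune_by_signal_count`), the hydrogen-annotated graph representation, and the use of three rounds of neighbor-label refinement to approximate "equivalence out to three bonds" are all assumptions made for illustration. The test structure is the known connectivity of 1,8-cineole.

```python
from collections import defaultdict


def build_molecule(atoms, bonds):
    """atoms: {index: (element, attached_H)}; bonds: iterable of (i, j) pairs."""
    neighbors = defaultdict(list)
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    return atoms, neighbors


def count_carbon_signals(molecule, depth=3):
    """Count carbon classes that differ within `depth` bonds (assumed scheme)."""
    atoms, neighbors = molecule
    # Start from (element, hydrogen count), then fold in neighbor labels once
    # per round so that each label describes the environment `depth` bonds out.
    labels = {i: (element, h) for i, (element, h) in atoms.items()}
    for _ in range(depth):
        labels = {
            i: (labels[i], tuple(sorted(labels[j] for j in neighbors[i])))
            for i in atoms
        }
    return len({labels[i] for i, (element, _) in atoms.items() if element == "C"})


def prune_by_signal_count(candidates, observed, tolerance=0):
    """Keep names of candidates whose predicted count is within the tolerance."""
    return [
        name
        for name, molecule in candidates
        if abs(count_carbon_signals(molecule) - observed) <= tolerance
    ]


# 1,8-Cineole, C10H18O: atoms 1-10 are carbons, atom 11 is the ether oxygen.
CINEOLE = build_molecule(
    {1: ("C", 0), 2: ("C", 2), 3: ("C", 2), 4: ("C", 1), 5: ("C", 2),
     6: ("C", 2), 7: ("C", 3), 8: ("C", 0), 9: ("C", 3), 10: ("C", 3),
     11: ("O", 0)},
    [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (1, 7),
     (1, 11), (11, 8), (8, 4), (8, 9), (8, 10)],
)

print(count_carbon_signals(CINEOLE))
print(prune_by_signal_count([("1,8-cineole", CINEOLE)], observed=7))
```

Under these assumptions the sketch predicts 7 signals for cineole, so the structure survives pruning at the observed count of 7. Passing `tolerance=1` corresponds to the "plus or minus one" range mentioned in the text.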


TITLE: CORONAFACIC ACID
MOLECULAR FORMULA: C12H16O3
FRAGMENT(S): 1:C(=O)-CH-C-C-CH2-1
FRAGMENT(S): CH<R030500><H00>=C<H00>-C(=O)-OH
FRAGMENT(S): CH3-CH2
FRAGMENT(S):
CONSTRAINT(S): DOUBLE 0 0
CONSTRAINT(S): SUBSTR
FRAGMENT(S): CH3
MINIMUM: 1 MAXIMUM: 1
CONSTRAINT(S):
COMMAND: GENERATE

88 STRUCTURES GENERATED

Figure 10
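The SUBSTR entry in the coronafacic acid input bounds how often a substructure may occur (CH3: minimum 1, maximum 1). A minimal sketch of that bound check follows. It is a simplification, not the CASE matcher: it handles only single-atom fragments identified by element and attached hydrogen count, and the names `count_atom_type` and `satisfies_substr` and both candidate atom inventories are hypothetical.

```python
def count_atom_type(atoms, element, hydrogens):
    """Count atoms of a given element carrying a given number of hydrogens."""
    return sum(1 for e, h in atoms if (e, h) == (element, hydrogens))


def satisfies_substr(atoms, element, hydrogens, minimum, maximum):
    """True if the occurrence count lies within [minimum, maximum]."""
    return minimum <= count_atom_type(atoms, element, hydrogens) <= maximum


# Two hypothetical C12H16O3 candidates, given as (element, attached_H) lists.
# Candidate A carries an ethyl group (one CH3); candidate B carries a
# gem-dimethyl group instead (two CH3), which the constraint should reject.
CANDIDATE_A = [("C", h) for h in (0, 2, 2, 1, 0, 1, 1, 2, 1, 0, 2, 3)] + [
    ("O", 0), ("O", 0), ("O", 1)]
CANDIDATE_B = [("C", h) for h in (0, 2, 2, 1, 0, 1, 0, 2, 1, 0, 3, 3)] + [
    ("O", 0), ("O", 0), ("O", 1)]

for name, atoms in (("A", CANDIDATE_A), ("B", CANDIDATE_B)):
    print(name, satisfies_substr(atoms, "C", 3, minimum=1, maximum=1))
```

Setting both bounds to 0, as in the O-C-O entry of the actinobolamine input, forbids the fragment outright; a general version would need a real substructure matcher for multi-atom fragments.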

TITLE: CINEOLE
MOLECULAR FORMULA: C10H18O1
FRAGMENT(S): C-O-C
FRAGMENT(S): CH
FRAGMENT(S): CH2 CH2
FRAGMENT(S): CH3 CH3
CONSTRAINT(S): DOUBLE 0 0
CONSTRAINT(S): TRIPLE 0 0
CONSTRAINT(S):
COMMAND: GENERATE

458 STRUCTURES GENERATED

Figure 11

TITLE: CINEOLE
MOLECULAR FORMULA: C10H18O1
FRAGMENT(S): C-O-C
FRAGMENT(S): CH
FRAGMENT(S): CH2 CH2
COMMAND: GENERATE

38 STRUCTURES GENERATED

Figure 12


Summary

In summary, CASE is a highly interactive network of computer programs for reliably and efficiently assisting the chemist in the conversion of chemical and spectroscopic data to molecular structure. Communication is in the conventional language of the chemist and program execution is sufficiently rapid to make problem solving a highly conversational process. CASE is designed to grow and expand, and we are confident it will be more powerful tomorrow than it is today.

Literature Cited

1. Stevens, Calvin L., Taylor, K. Grant, Munk, Morton E., Marshall, W. S., Noll, Klaus, Shah, G. D., Shah, L. G. and Uzu, K., J. Med. Chem. (1965), 8, 1.
2. Munk, Morton E., Sodano, Charles S., McLean, Robert L. and Haskell, Theodore H., J. Am. Chem. Soc. (1967), 89, 4158.
3. Munk, Morton E., Nelson, Denny B., Antosz, Frederick J., Herald, Jr., Delbert L. and Haskell, Theodore H., J. Am. Chem. Soc. (1968), 90, 1087.
4. Nelson, D. B., Munk, M. E., Gash, K. B. and Herald, Jr., D. L., J. Org. Chem. (1969), 34, 3800.
5. Antosz, F. J., Nelson, D. B., Herald, Jr., D. L. and Munk, M. E., J. Am. Chem. Soc. (1970), 92, 4933.
6. Nelson, D. B. and Munk, M. E., J. Org. Chem. (1970), 35, 3832.
7. Nelson, D. B. and Munk, M. E., J. Org. Chem. (1971), 36, 3456.
8. Bognar, R., Sztaricskai, F., Munk, M. E. and Tamas, J., J. Org. Chem. (1974), 39, 2971.
9. Woodruff, H. B. and Munk, M. E., J. Org. Chem. (1977), 42, 0000.
10. Woodruff, H. B. and Munk, M. E., Anal. Chim. Acta/Computer Techniques and Optimization, in press.
11. Shelley, C. A. and Munk, M. E., J. Chem. Inf. Comput. Sci. (1977), 17, 0000.
12. Ireland, C., Stallard, M. O., Faulkner, D. J., Finer, J. and Clardy, J., J. Org. Chem. (1976), 41, 2461.
13. Ichihara, A., Shiraishi, K., Sato, H., Sakamura, S., Nishiyama, K., Sakai, R., Furusaki, A. and Matsumoto, T., J. Am. Chem. Soc. (1977), 99, 636.
