Designing an Expert System for Organic Synthesis - ACS Symposium

Chapter 9

Designing an Expert System for Organic Synthesis: The Need for Strategic Planning

Peter Y. Johnson¹, Dene Burnstein², John Crary³, Martha Evans², and Tunghwa Wang²

¹Department of Chemistry, Illinois Institute of Technology, Chicago, IL 60077
²Department of Computer Science, Illinois Institute of Technology, Chicago, IL 60077
³G. D. Searle & Company, Skokie, IL 60077

Downloaded by NORTH CAROLINA STATE UNIV on January 17, 2013 | http://pubs.acs.org Publication Date: September 1, 1989 | doi: 10.1021/bk-1989-0408.ch009

SYNLMA, an expert system for organic synthesis, with a theorem prover as its inference engine and NCI's XTCHEM as its user interface, uses a retrosynthetic approach to find reaction pathways and generate a problem-solving tree representing the alternative designs it has explored. Presently, the system is capable of handling compounds of the order of complexity of Darvon, Ibuprofen, and the bicyclic system, cocaine. The combinatorial explosion that results from the input of larger target molecules has convinced us of the need for strategic planning during the synthesis process. We have developed a three-stage approach to aid SYNLMA in the planning process. The first stage identifies abstracted potential starting materials or name reaction derived synthons using graph overlay techniques to compare them with complex substructures in the target molecule. The second stage involves using "PMCD" strategies to define graphical paths between the target and abstracted synthons or starting materials. The leaf nodes on this path represent "chemical islands" which are then connected by general reaction rules. The third stage defines the tree by supplying specific reaction rules.

SYNLMA is an expert system designed to produce reaction pathways for organic synthesis problems. Many groups have worked on the organic synthesis problem, in the main using conventional programming techniques (1-7). What makes the SYNLMA system unique is the partitioning of the system into independent units consisting of a

0097-6156/89/0408-0102$06.75/0 © 1989 American Chemical Society

In Expert System Applications in Chemistry; Hohne, B., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1989.


chemical knowledge base, a user interface and a reasoning component. Our choice of a theorem prover as the inferencing engine adds strength and flexibility to the system design. Separation of the reasoning component from other components of the system has proved to have many advantages (8-10). It has allowed us to experiment with different representations of chemical knowledge without major changes in the overall system. The design has also allowed us to easily add or delete knowledge from our data base, and has given us the potential to interface with a wide variety of commercial data bases such as those provided by the Institute of Scientific Information and Chemical Abstracts. Presently SYNLMA has been applied to the syntheses of small compounds of the order of complexity of Darvon (1, Fig. 4) (10), Ibuprofen (2, Fig. 3) (11), and the bicyclic compound cocaine (3, Fig. 6) using a data base of several hundred selected reaction rules and nearly fifty selected starting materials. Those rules and starting materials needed to duplicate the published syntheses of 1, 2, and 3 were included as a subset of our total reaction rule data base. Attempts to synthesize larger molecules, or interface with commercial data bases, have resulted, for a number of reasons, in a combinatorial explosion, clearly illustrating the need for strategic planning.
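The growth behind this combinatorial explosion can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that every compound in the problem-solving tree can be disconnected by the same number of reaction rules (`branching`) and that the tree is explored to a fixed `depth`. Neither figure comes from the chapter, and real trees are far less uniform.

```python
def tree_size(branching, depth):
    """Total nodes in a tree where every node has `branching` children,
    counted from the root (level 0) down to level `depth`."""
    return sum(branching ** level for level in range(depth + 1))


# Hypothetical branching factors: a larger rule base means more matching
# rules per compound, and the node count grows geometrically with depth.
for b in (2, 5, 10):
    print(f"branching={b:2d}  depth=6  nodes={tree_size(b, 6)}")
```

Under these assumptions, adding reaction rules (a larger branching factor) or attempting larger targets (a deeper tree) multiplies the work rather than adding to it, which is the motivation for planning before searching.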
We are now in the process of adding planning intelligence to the system so that it will be more efficient in developing reaction pathways. The goal is to have SYNLMA more closely model the thought processes of the synthetic chemist. Our flexible design is enabling us to experiment with a new three-stage planning strategy using the inferencing capabilities of the theorem prover, while keeping the chemical representation scheme and user interface intact.

Description of the Present SYNLMA System

To initiate a chemical synthesis, a user of SYNLMA first interacts with the front end of the system, a series of Pascal programs called XTSYN, based on the National Cancer Institute XTCHEM structure input package (12). XTSYN was developed by John Crary (11) on an IBM/AT with an 80287 math coprocessor and a Hercules graphics board. The user enters a graphical representation of the target molecule at the keyboard. The system converts this graphical representation into connect table format and stores it in a file. At the user's request XTSYN will convert the connect table representation into clause format, which can also be stored in a file and which serves as input to SYNLMA.
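The conversion from connect table to clause form can be pictured with a small sketch. The chapter does not give XTSYN's actual clause syntax, so the `atom(...)` text format, the function name `connect_table_to_clauses`, and the layout of the connect table are all assumptions made for illustration, and the sketch is written in Python rather than the Pascal of the original system. It follows the chapter's description of the clause list as one clause per atom that is bonded to at least two other atoms.

```python
from collections import defaultdict


def connect_table_to_clauses(atoms, bonds, min_neighbors=2):
    """atoms: {atom_id: element}; bonds: [(id_a, id_b, bond_order)].

    Returns one hypothetical clause string per atom that has at least
    `min_neighbors` bonded neighbors, describing that atom's environment.
    """
    neighbors = defaultdict(list)
    for a, b, order in bonds:
        neighbors[a].append((b, order))
        neighbors[b].append((a, order))

    clauses = []
    for atom_id in sorted(atoms):
        env = sorted(neighbors[atom_id])
        if len(env) < min_neighbors:
            continue  # terminal atoms appear only inside their neighbors' clauses
        parts = ", ".join(f"{atoms[n]}{n}:{order}" for n, order in env)
        clauses.append(f"atom({atoms[atom_id]}{atom_id}; {parts})")
    return clauses


# Heavy-atom skeleton of acetone: C1-C2(=O3)-C4
acetone_atoms = {1: "C", 2: "C", 3: "O", 4: "C"}
acetone_bonds = [(1, 2, 1), (2, 3, 2), (2, 4, 1)]
print(connect_table_to_clauses(acetone_atoms, acetone_bonds))
```

Only the carbonyl carbon qualifies here, because it is the only atom with two or more neighbors in the skeleton.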
XTSYN also has the capability to perform the reverse process of converting clause representations into connect table form, which can then be used to generate a graphical display of a given molecule. This capability is particularly useful during a run. Subgoal compounds generated as clauses and incorporated in the problem solving tree can be displayed on the screen in graphical format for user inspection. A complex data structure in the form of a doubly linked list is used by XTSYN to convert the molecule representations from one form to another (11).

In order to produce solutions to a given problem, the theorem prover must be provided with a theorem to be proved and a set of axioms, all in clause form. In the case of SYNLMA, the target compound becomes the theorem to be proved. It is converted into a clause list which consists of individual clauses representing the chemical environment of each atom in the compound bonded to at least two other


atoms. Axioms containing chemical knowledge in the form of functional groups, reaction rules, and starting materials are also represented in clause format (10). The target is decomposed into simpler precursor compounds by pattern matching with the appropriate reaction rules chosen on the basis of functional group identification and priority information. These less complex compounds are then considered as subgoals by the system. The process of decomposition continues until the bottom level precursors are available compounds or other user defined constraints are satisfied. This backward reasoning process, called retro-synthetic analysis by chemists, is often called backward chaining by computer scientists.

The Multi-Layered Design Approach. The present system is built of several layers as shown in Fig. 1. The top layer of SYNLMA directs the synthetic process, calling the middle layer to perform one-step reactions. The bottom layer is a custom-made theorem prover modeled after "ITP," an interactive theorem prover. "ITP" itself is built upon a package of Pascal routines called Logic Machine Architecture (LMA), which implements a resolution based theorem prover. Inferencing rules, based on classical logic techniques, are continuously applied to existing clauses to generate new information.
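The backward reasoning process described above, in which a target is decomposed until every precursor is an available compound, can be sketched in a few lines. This is a simplified illustration, not SYNLMA's implementation: molecules are plain name strings matched exactly, whereas the real system matches clause patterns containing variables through the theorem prover. The function name `retrosynthesize`, the rule tuples, and the toy molecule names are hypothetical.

```python
def retrosynthesize(target, rules, starting_materials, max_depth=5):
    """Backward chaining from `target` to starting materials.

    rules: list of (rule_name, product, precursors) tuples.
    Returns a list of routes; each route is a list of
    (product, rule_name, precursors) steps.
    """
    if target in starting_materials:
        return [[]]      # available compound: nothing left to make
    if max_depth == 0:
        return []        # user-defined depth limit reached on this branch

    routes = []
    for rule_name, product, precursors in rules:
        if product != target:
            continue     # the rule's product side must match the goal
        partial = [[(target, rule_name, precursors)]]
        for precursor in precursors:
            sub_routes = retrosynthesize(
                precursor, rules, starting_materials, max_depth - 1
            )
            # Every precursor must be solvable; an empty sub_routes
            # list empties `partial` and discards this rule.
            partial = [done + sub for done in partial for sub in sub_routes]
        routes.extend(partial)
    return routes


toy_rules = [
    ("R1", "T", ("A", "B")),   # T can be made from A and B
    ("R2", "B", ("C",)),       # B can be made from C
    ("R3", "T", ("D",)),       # T can be made from D, but D is unavailable
]
for route in retrosynthesize("T", toy_rules, {"A", "C"}):
    print(route)
```

In this toy data set only the route through R1 and R2 survives, because precursor D is neither a starting material nor the product of any rule.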
Both ITP and LMA were designed and implemented by the theorem proving group at Argonne National Laboratory (13-14).

The top layer of SYNLMA maintains a set of complex data structures which represent synthesis information found by the lower layers. One of these structures, the problem solving tree, is a representation of the pathways SYNLMA has generated for the synthesis of the target compound. The nodes of the tree represent molecules; the arcs represent reaction rule information. The root node is the target compound. The leaf nodes are the starting materials. Another data structure managed by the top layer is the molecular hash table, which contains information for all of the molecules found so far in the development of the problem solving tree. Only one entry is made for a molecule in the molecular hash table. This entry contains two vectors that uniquely identify that molecule so that it is not processed more than once. A reaction rule hash table is kept for all the reactions referred to in the problem solving tree. Associated with each reaction are parameters indicating yield, experimental difficulty, cost, and safety.
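A minimal sketch of how a problem solving tree and a molecular hash table might cooperate is given below. The chapter does not say how the two identifying vectors are built, so a plain string key stands in for them here. The class names `Node` and `MoleculeTable`, the method `register`, and the toy rule and molecule names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    molecule: str                                  # key into the molecule table
    children: list = field(default_factory=list)   # arcs: (rule_name, [Node, ...])


class MoleculeTable:
    """One entry per distinct molecule, so nothing is processed twice."""

    def __init__(self):
        self._entries = {}

    def register(self, key, info=None):
        """Return True the first time a molecule is seen, False afterwards."""
        if key in self._entries:
            return False
        self._entries[key] = info or {}
        return True

    def __len__(self):
        return len(self._entries)


table = MoleculeTable()
root = Node("target")
table.register("target")

work = []   # molecules still waiting to be expanded
for rule, precursors in [("rule_a", ["X", "Y"]), ("rule_b", ["Y", "Z"])]:
    root.children.append((rule, [Node(p) for p in precursors]))
    # Y is reached by two different rules but is queued only once.
    work.extend(p for p in precursors if table.register(p))

print(work, len(table))
```

The tree can still show Y under both arcs, while the table guarantees that Y is expanded a single time.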
In addition to these data structures, a work list of all the molecules waiting to be processed is kept by the top layer in linked list form, sorted in order of the values calculated for them by the evaluation function. The molecule with the lowest value is chosen to be processed first. These values are calculated by an evaluation function containing heuristic information which takes into account the complexity of the molecule, the functional groups it contains, and its position in the problem solving tree.

In order to derive precursor compounds, SYNLMA must search the reaction rule data base and match the goal compound with the product side of a reaction rule. This process begins at the top layer, which builds the problem solving tree. It calls the middle layer to add a new branch to the tree. The middle layer is designed as a network of what we call environment pairs and serves as the interface between the chemical knowledge data base and the inference engine. Each pair consists of a) a call to the theorem prover with all the necessary information


Figure 1: Layer Structure of SYNLMA. Top Layer - Problem Solving; Middle Layer - Network of Environments; Bottom Layer - Theorem Prover, Logic Machine Architecture (LMA).


it needs, such as clause lists, terminating conditions and strategies, and b) a utility program that extracts and stores information from the call to the theorem prover and also gathers information from the data base. Information found in one call to the theorem prover is made available for further calls. Fig. 2 shows the environment network of the middle layer, and its component environment pairs.

The first environment pair, 1A and 1B, is responsible for finding and storing functional group information. When the middle layer is called to add a new branch to the tree, the first step is a call from 1A to the bottom layer, the theorem prover, which generates functional group information for each goal compound. The second half of the pair, 1B, collects and stores this information.

Once the functional group information is available, the next environment is called and 2A starts to search, by priority, a subset of the reaction rule files containing only those reactions that pertain to these functional groups. To make this search efficient, our initial approach has been to partition the reaction rule data base into subsets (chapters) ordered by unique functional group numbers. Priority for calling the functional group chapters has been set by the first author.
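The chapter-based partitioning of the rule data base can be sketched as a lookup keyed by functional group, visited in priority order. The group names, rule names, and priority numbers below are invented for illustration, since the chapter states only that chapters are ordered by functional group number and that priorities were assigned by hand. `candidate_rules` is a hypothetical helper.

```python
def candidate_rules(functional_groups, chapters, priority):
    """Yield (group, rule) pairs from only the chapters that pertain to the
    functional groups found in the goal compound, highest priority first.

    Lower numbers mean higher priority; groups without a priority go last.
    """
    ordered = sorted(
        functional_groups,
        key=lambda g: (priority.get(g, float("inf")), g),
    )
    for group in ordered:
        for rule in chapters.get(group, []):
            yield group, rule


# Hypothetical chapters of the reaction rule data base.
chapters = {
    "ketone": ["grignard_addition", "hydride_reduction"],
    "ester": ["fischer_esterification"],
    "alkene": ["wittig"],
}
priority = {"ester": 1, "ketone": 2, "alkene": 3}

# Only the ester and ketone chapters are searched; the alkene chapter is skipped.
for group, rule in candidate_rules({"ketone", "ester"}, chapters, priority):
    print(group, rule)
```

The saving comes from never opening chapters for functional groups that the goal compound does not contain.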
As soon as a match is found between the product side of a reaction rule and the goal compound, the utility program in the pair, environment 2B, stores this information. There may be several rules whose product portion matches with the goal compound. All these potential reactions are stored for consideration by the third environment pair.

The third pair constructs new subgoal compounds from the reactant half of a reaction rule. It accomplishes this by substituting known atoms from the subgoal compound molecule for the variables in the reaction rule.

The fourth environment pair processes the newly generated subgoals. The theorem prover is called to check each subgoal for chemical feasibility, presence or absence in the list of available starting compounds, and to discover whether it has been previously generated by the system. This information guides the utility program so that it can insert each subgoal into the appropriate data structures maintained by the top layer.

During the course of the building of the problem solving tree, the chemist can extract subgoal molecules from the nodes of the tree. The subgoals, which are generated as clause lists, can be passed to XTSYN, which will display them on the screen in a graphical format and store them in XTCHEM connect table form for future display.
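The work list kept by the top layer, from which the molecule with the lowest evaluation value is taken first, behaves like a priority queue. The sketch below uses Python's `heapq` module in place of the sorted linked list of the original system. The weights inside `evaluate` are invented, because the chapter names the factors considered (molecular complexity, functional groups, position in the tree) but not how they are combined.

```python
import heapq
import itertools


def evaluate(n_atoms, n_functional_groups, depth):
    """Hypothetical heuristic: lower scores are processed first."""
    return 1.0 * n_atoms + 2.0 * n_functional_groups + 0.5 * depth


class WorkList:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # tie-breaker keeps insertion order

    def push(self, molecule, n_atoms, n_functional_groups, depth):
        score = evaluate(n_atoms, n_functional_groups, depth)
        heapq.heappush(self._heap, (score, next(self._counter), molecule))

    def pop(self):
        score, _, molecule = heapq.heappop(self._heap)
        return molecule, score

    def __len__(self):
        return len(self._heap)


work_list = WorkList()
work_list.push("subgoal_A", n_atoms=12, n_functional_groups=2, depth=1)
work_list.push("subgoal_B", n_atoms=6, n_functional_groups=1, depth=2)
work_list.push("subgoal_C", n_atoms=9, n_functional_groups=0, depth=2)

while len(work_list):
    print(work_list.pop())
```

A heap gives the same lowest-value-first behavior as a sorted list while keeping insertion cheap. It is a substitution made for this sketch, not a description of SYNLMA's own data structure.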
Currently the search process terminates when any of the following conditions occur: 1) there is no more memory left in the computer, 2) the work list is empty, with no more molecules to process, 3) the user-specified upper limits on the height and/or depth of the problem solving tree are exceeded, 4) the user-defined maximum number of solutions is reached.

Scope of the Present System

With the present design, a data base of fifty selected starting materials, and two hundred selected reaction rules, SYNLMA is currently able to generate synthetic trees, often in a very naive or inefficient manner, for molecules of the size and complexity of Darvon, Ibuprofen,


Figure 2: A Network of Environment Pairs. Recoverable panel labels: Find and Store Functional Group Information (Environments 1A, 1B); Find and Store Reaction Rules (Environments 2A, 2B); Generate New Subgoals (Environments 3A, 3B); Insert Subgoals in Problem Solving Tree (Environments 4A, 4B). Each environment is labeled Theorem Prover or Utility; branch labels include Matched and Not Matched.


and cocaine. A pruned sample problem solving tree showing plausible routes to Ibuprofen is displayed in Fig. 3.

The most serious potential problem is combinatorial explosion. These synthetic trees represent limiting cases for system resources on a VAX-11/750 running the UNIX operating system. It is clear that attempts to synthesize more complex molecules, or interface with data bases containing thousands of starting materials and reaction rules, will result in a combinatorial explosion.

Errors in pruning also cause significant problems. Omitted pruned paths generally resulted from our not using reaction rule constraints or from nonselective and/or non-intelligent use of the rules. This is one reason why none of SYNLMA's paths represent published syntheses of Ibuprofen (15) in spite of the fact that the requisite rules were in the data base. On the positive side, the synthetic paths to Ibuprofen discovered by SYNLMA are straightforward and would probably work as shown.

SYNLMA, in its present form, is chemically unsophisticated. It does not have the reaction insights, information on structural limitations, and planning strategies that the expert can call into play during the course of solving a synthesis problem.
For example, when planning the synthesis of a C25 n-alkane, which contains a long linear chain of repeating -(CH2)n- groups, a chemist, hoping to minimize the synthetic steps in the synthesis, would typically start the search for precursor synthons having approximately half the chain length and containing appropriate bond-making functional groups. One of the SYNLMA solutions to this problem was a step-wise synthesis of the entire chain, one methylene unit at a time, using a nonselective bond-making reaction such as a carbene insertion reaction. Clearly no knowledgeable chemist would take this approach! This same nonselective carbene reaction was used as part of the SYNLMA solution to a suggested synthesis of Darvon as shown in Fig. 4. This reaction and several others were removed from our reaction rule data base in order to prevent their nonselective use.

As one can see, the nature and selection of reaction rules has placed limitations on SYNLMA. The reaction rule data base not only contains the rule itself, but also "must have-must not have" information/constraints concerning functional group incompatibility (10).
These mandated constraints, often invoked in lieu of selectivity knowledge, protected us from incorrect use of some reactions, but, in numerous cases, also caused SYNLMA to eliminate potentially useful reaction rules, rules that a chemist might have considered in spite of the constraints. For example, a chemist might be happy to sacrifice one equivalent of a cheap Grignard reagent to a compound containing both a ketone and an alcohol in order to have the second equivalent add to the ketone. More insidious to us was the inability, with constraints on, to use many double addition reactions required to make bifunctional cocaine starting materials. Some examples are shown in Fig. 5. Tetrabromide 17 was not considered as a potential starting material since the first bromine addition to give 16 was not allowed. The reaction rule says you cannot add bromine to a nonconjugated alkene if there is another alkene present. In the second example, the selective constraints prevent SYNLMA from adding hydride or Grignard reagents arbitrarily to the carbonyl of its choice to give alcohols 20 or 21 when reacting with dione 18.
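The effect of the "must have-must not have" constraints on the Grignard example can be sketched as a simple set test. The names `ReactionRule` and `rule_applies`, and the choice of an alcohol as the incompatible group, are illustrative assumptions. The chapter does not show how constraints are actually encoded in clause form, and the sketch further assumes that switching constraints off relaxes only the "must not have" part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReactionRule:
    name: str
    must_have: frozenset = frozenset()       # groups the substrate needs
    must_not_have: frozenset = frozenset()   # groups treated as incompatible


def rule_applies(rule, groups_present, constraints_on=True):
    """Decide whether a rule may be used on a molecule with these groups."""
    if not rule.must_have <= groups_present:
        return False    # the reacting group itself is missing
    if constraints_on and rule.must_not_have & groups_present:
        return False    # an incompatible group is present
    return True


grignard = ReactionRule(
    "grignard_addition_to_ketone",
    must_have=frozenset({"ketone"}),
    must_not_have=frozenset({"alcohol"}),
)
hydroxy_ketone = frozenset({"ketone", "alcohol"})

print(rule_applies(grignard, hydroxy_ketone))                        # constraints on
print(rule_applies(grignard, hydroxy_ketone, constraints_on=False))  # constraints off
```

With constraints on, the rule is rejected outright even though a chemist might accept the loss of one equivalent of reagent. With constraints off, the rule is admitted but nothing guards against genuinely incompatible cases, which is the trade-off the text describes.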


Figure 4: Darvon Synthesis, Nonselective Carbene Insertion


Figure 5: Reaction Rule Constraints. (Recoverable panel label: Example 2.)


This protection against nonselectivity also stops addition of two equivalents of reagent to 18 to give diol 21, a chemically reasonable route to this compound. It is clear now that we could rewrite the reaction rules to invoke subrules or layers of qualifiers as a means of effecting reaction selectivity, but checking every qualifier of every rule called would slow SYNLMA significantly. This level of consideration would more reasonably be done after several strategies had been chosen for further investigation. Our experiments with reaction taxonomies are discussed later in this paper. Without the constraints SYNLMA finds more paths but is less efficient in its generation of viable synthetic pathways.

While having too many "chemical restrictions," the reaction rules have no "structural restrictions." In Fig. 6, which shows the first retro-synthetic steps SYNLMA considered for cocaine synthesis, we see that four Bredt's rule violations, enamines 27a,b and 28a,b, were accepted as subgoals. While discussing Fig. 6, it should be noted that structures 22, 23, 24, and 26 are not allowed when constraints are on. Structures 23, 24, 27, and 28 are typical of current SYNLMA output. When it finds a reaction rule, it applies the rule exhaustively.
Structures 29 and 30 are not synthetically demodulated and represent wasted CPU time. Finally, one second generation structure, 31, is shown because it represents an interesting variation of an N-oxide ene cycloaddition reaction that has been used to synthesize tropanol (16), the basic cocaine ring system.

From the above examples, it is clear we need to build effective planning strategies into SYNLMA and restructure our data base of chemical information. This will improve the efficiency of our system and make it a viable assistant to the synthetic organic chemist.

Modeling Strategic Planning For The Synthesis Process

Our new system design involves planning and organizing the synthesis process so that it closely models the human expert's approach. How do chemists deal with a complicated organic synthesis problem? They seem to organize their work into three successive stages which we call the tree-definition, tree-building, and tree-verification stages (17-18). We are now in the process of redesigning and upgrading SYNLMA to reflect this new approach.

In the first stage, the tree-definition stage, the main thrust is to identify potential starting materials and/or methodologies by noting resemblances between the target compound and (1) classes of available starting materials or (2) substructures or superstructures produced by name reactions such as the Fischer indole synthesis. Examples of these two approaches are shown in Fig. 7 (19) and Fig. 8 (20) respectively for the synthesis of the alkaloid ibogamine, 34. In the substructure driven approach to ibogamine, the indole substructure 34 (Fig. 7) is recognized as an abstracted starting material. As outlined below in the discussion of the Tree Definition Stage, the abstracted starting materials are linked to increasingly specific, less abstract potential starting materials (see Fig. 9).
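The link from an abstracted starting material to increasingly specific ones can be pictured as a walk down a class hierarchy. The sketch below is only an illustration: the dictionary layout, the function `specific_materials`, and the particular compounds listed under each class are assumptions and are not taken from the chapter or from the catalog it cites.

```python
# Hypothetical hierarchy: each abstract class maps to more specific entries.
hierarchy = {
    "indole": [
        "N-substituted indole",
        "2-substituted indole",
        "3-substituted indole",
    ],
    "2-substituted indole": ["2-methylindole"],
    "3-substituted indole": ["3-methylindole", "indole-3-carboxaldehyde"],
}


def specific_materials(abstract_class, hierarchy):
    """Depth-first walk returning the most specific entries under a class.

    A class with no recorded children is treated as its own leaf.
    """
    children = hierarchy.get(abstract_class)
    if not children:
        return [abstract_class]
    leaves = []
    for child in children:
        leaves.extend(specific_materials(child, hierarchy))
    return leaves


print(specific_materials("indole", hierarchy))
```

Recognizing the abstract class first and expanding it only on demand keeps the early planning stage small, since specific candidates are enumerated just for classes that already resemble the target.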
The identification of a potential starting material drives the retrosynthetic analysis in a manner which preserves that component, generating in Fig. 7 the chemical island 35. In the methodology


Figure 6: Syntheses of Cocaine: SYNLMA Initial Retro-synthetic Paths. (Recoverable structure label: 31.)


Figure 7: Ibogamine synthesis: Substructure driven. (Labels in the figure: structure 32; Aldrich Catalog; 1 of thousands of 6 membered rings.)


Figure 8: Ibogamine Synthesis. Methodology Driven. (Labels in the figure: Fischer indole synthesis; Ibogamine, 34; starting materials.)

Figure 9: Indole Substitution Patterns Found in Aldrich Catalog. (Groups shown: N-substituted, ca. 20; 2-substituted, ca. 35; 3-substituted, ca. 55; 4-substituted indoles, ca. 25; 5-substituted, ca. 30; 6-substituted; natural products; carbazole; other saturated or unsaturated ring systems, with X = C or combinations of C or N.)


driven approach to ibogamine, the indole substructure 34 (Fig. 8) is recognized as the abstracted end result of a named organic reaction. In this case the retrosynthetic analysis using the named organic reaction will indicate that 41 and 42 are potential synthetic subgoals for further synthesis.

The discovery of these resemblances gives a tentative shape to the problem solving tree. This is the highest level planning stage where chemists use induction to limit the search for starting materials and to determine where to focus their deductive processes, in many cases making what has been called "the intuitive leap" (21). Resemblances between the target and starting materials or methodology-produced intermediates can be described in terms of chemical substructures or superstructures and translated into graph overlay techniques which can be implemented by SYNLMA. During this stage the system can fill in the root node and many of the leaf nodes in the problem solving tree. The leaf nodes are the classes of starting materials or synthons which have been identified by the system as having significant or strategic structure in common with the target compound.

During the second stage, the tree-building stage, a collection of crude, imprecisely defined problem solving trees will be generated.
The chemist goes through an analogous stage. Once an analysis of the target has been completed, rough synthesis outlines/routes are constructed, usually reflecting the individual's knowledge, creativity, and prejudices. As part of the process, the chemist will often insert, potentially at any node along the path, intermediate structures which are expected to be convertible to the target or to higher level intermediates in the pathway. These intermediate structures also have a reasonable chance of being synthesized from some starting materials available, in the abstract (21). In summary, these intermediate compounds, which can be considered "chemical islands", have a structure which is related both to the abstracted starting materials or methodology derived synthons and to the target molecule. They can be represented as leaf nodes along a crudely defined synthetic pathway. Getting from starting materials or synthons to these "chemical islands" and then to the target may involve several multi-step synthesis processes, which can be filled in through successive tree-building stages, each providing a skeleton plan for the next stage and each using more detailed or selective reaction rules.
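The refinement of a crude pathway by inserting a "chemical island" can be pictured with a small sketch. The Python below is illustrative only: the `Node` class, the `insert_island` and `paths` helpers, and the node labels are assumptions made for this example and are not SYNLMA's own representation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a crude problem-solving tree (hypothetical structure)."""
    label: str
    kind: str  # "target", "island", or "starting_material"
    children: list = field(default_factory=list)


def insert_island(parent, child, island_label):
    """Place a chemical island between parent and child, refining the plan."""
    island = Node(island_label, "island", [child])
    parent.children = [island if c is child else c for c in parent.children]
    return island


def paths(node, prefix=()):
    """Yield every root-to-leaf path as a tuple of node labels."""
    prefix = prefix + (node.label,)
    if not node.children:
        yield prefix
    for c in node.children:
        yield from paths(c, prefix)


# First pass: the target is linked directly to an abstracted starting material.
target = Node("ibogamine (34)", "target")
indoles = Node("indole class", "starting_material")
target.children.append(indoles)

# Later pass: an island is inserted, giving a skeleton plan for the next stage.
insert_island(target, indoles, "chemical island (35)")
```

Each call to `insert_island` lengthens a path by one node, which mirrors the idea that every pass supplies a skeleton for a more detailed one.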
What is required to implement this approach is a new organization, or taxonomy, of reaction rules for SYNLMA, ranging from the very general to the more specific. The more general rules, which represent multi-step reactions or processes, are applied in the earlier planning stages. The more specific single-step reactions are applied later. During this second stage we will model the expert's approach to initial path generation and "chemical island" derivation by having SYNLMA call a version of Ugi and Gasteiger's "Principle of Minimum Chemical Distance" (PMCD) program which offers a computer-assisted combinatorial solution for connecting two graphs (22). Incorporation of this strategy will help SYNLMA choose efficient connective paths between target and generalized (abstract) starting materials or synthons. The nodes along PMCD paths connecting target graph and synthon or starting material graphs represent basic chemical structures. Addition of bond-making functional groups to last


disconnection points on these basic structures will constitute our first attempts at generating "chemical islands". The PMCD program, which is based on minimum structure change, has been used to demonstrate that many classic syntheses closely follow the most efficient graph representation disconnections between target and starting material.

For the chemist, the third stage in the synthesis process is usually a detailed analysis. During this stage all the steps are filled in, bridging the chemical islands to the target and to the starting materials. Factors such as yield, cost, and safety are considered at this point. For SYNLMA this phase will result in the completion of the problem solving tree using single step reaction rules chosen on the basis of functional group information. The system will have to examine adjacent nodes of the tree, find appropriate single step reaction rules and check cost and yield factors. Structural information will be incorporated into the reaction rules as constraints.

System Implementation

The new system resembles SYNLMA in overall structure; we have continued to use the three layered approach. The bottom layer is much like the bottom layer of the old system; that is, a custom built theorem prover calling LMA routines to do much of its work.
Argonne Laboratory is in the process of updating LMA, with particular emphasis on speeding it up. Any improvements made by the Argonne group will be incorporated into the new system. The middle layer continues to be a network of environment pairs, but with the additional pairs needed for graph overlay and PMCD implementation. The top layer is being entirely reorganized into three stages, the tree-definition, tree-building and tree-verification stages described above.

The Tree-Definition Stage. Often a chemist will choose a set of appropriate starting materials by noticing resemblances between the target molecule and classes of available compounds, be they starting materials or name reaction synthons. Resemblances between the target and starting materials can be described in terms of chemical substructures or superstructures. If we visualize a chemical structure as a graph, resemblances in the form of substructures or superstructures can be revealed by the overlaying of one graph on top of another.

To implement a substructure identification process in SYNLMA we are adding to our knowledge base a group of chemically meaningful substructures. The substructures data base will be culled from the Aldrich Chemical Co. catalog and the 500 Organic Name Reactions listed in the Merck Index.
These will be stored in clause form and arranged in a hierarchical format according to chemical complexity. The middle layer of SYNLMA will now have several additional environment pairs to handle the new planning strategies. One such environment pair will consist of a call to the theorem prover to find candidate substructures in the target molecule using its pattern matching algorithms. As in the case of Wipke's abstracted structures, exact functional group bonding details will be ignored at this time. In


order to use this substructure information to find potential starting materials, we are organizing the starting materials by generalized classes. Each class is represented by a pattern clause containing the substructure which defines that class, but with variables representing side chains and non-structure bonds. The generalized structures will have pointers to more specific patterns which will in turn have pointers to the unique structures listed in our reference sources.

We expect to identify about two hundred generic classes. In the case of potential ibogamine starting materials (see Fig. 7), nearly 120 of the 14,000 compounds listed in the Aldrich catalog have the indole moiety as a substructure. The indole rings listed in the Aldrich catalog can be grouped according to their five substitution sites containing significant members. Fig. 9 shows the breakdown of the indole problem from most general to individual structures. (Some compounds contain multiple substitutions. These are multiply counted, once for each substitution position.) Once the theorem prover has recognized the substructures in the goal compound, the system will search the abstracted starting material data base for classes of starting materials containing those substructures.
When an abstracted starting material is recognized, it will point to a more specific possibility. The same type of pattern matching will be implemented for the methodology driven synthons data base. As shown in Fig. 8 for ibogamine, upon recognition of the indole substructure as the product of a name reaction, in this case the Fischer indole synthesis (or one of the other 13 name reactions leading to indole synthesis listed in the Merck Index), the program will construct the structures needed to perform the name reaction. Name reaction precursors will become "chemical islands" or new targets. The utility portion of the environment pair will then store the substructure information and the potential synthons or starting materials chosen.

As outlined above, the starting materials and synthons data base will be organized for efficient search by general structural types (graphs) at the tree definition stage. These general structure types can be organized as linked lists of related structures headed by a general pattern clause representing that particular class of starting materials or synthons.
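The organization just described, a general pattern heading a list of increasingly specific structures, can be sketched in a few lines. This is only an illustration under stated assumptions: the compound entries are made up for the example rather than taken from the Aldrich catalog, the names `ENTRIES`, `build_class` and `retrieve` are hypothetical, and simple set membership stands in for the graph overlay match that the theorem prover would perform.

```python
from collections import defaultdict

# Illustrative entries only (not taken from the Aldrich catalog):
# compound name -> substituted positions on the indole nucleus.
ENTRIES = {
    "1-methylindole": {"N"},
    "3-methylindole": {"3"},
    "5-methoxyindole": {"5"},
    "5-bromo-3-methylindole": {"3", "5"},
}


def build_class(head_pattern, entries):
    """Head pattern -> more specific patterns -> individual structures."""
    specific = defaultdict(list)
    for name in sorted(entries):
        for position in sorted(entries[name]):
            # A multiply substituted compound is listed once per position.
            specific[f"{position}-substituted {head_pattern}"].append(name)
    return {"head": head_pattern, "specific": dict(specific)}


def retrieve(cls, target_substructures):
    """Follow the pointers only if the head pattern was found in the target."""
    if cls["head"] not in target_substructures:
        return []
    return sorted({n for names in cls["specific"].values() for n in names})


indole_class = build_class("indole", ENTRIES)
```

Because multiply substituted compounds appear under each of their positions, the per-position lists sum to more entries than there are distinct compounds, which is the multiple counting noted for Fig. 9.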
Once a match has been made with the pattern at the head of the list using the graph overlay techniques, the other structurally related potential starting materials on the list could be retrieved through the use of pointers.

The current system recognizes functional groups and ring structures. The new system will recognize larger substructures. For example, using this new approach the system would be able to identify the following as a substructure of DARVON:

       1$  5$
       |   |
C6H5 - C - C - C6H5
       |   |
       3$  7$

where 1$, 3$, 5$ and 7$ are variables representing various side chains. From the general pattern class the system could choose diphenyl ethane, stilbene or diphenyl acetylene as potential starting materials to be examined during the later tree-verification stage. The tree-definition stage is complete when: a) all significant


substructures in the present goal compound have been identified and b) classes of potential starting materials and/or synthons have been found. Significance is defined by a graph complexity algorithm, which counts the incidence of nodes (the number of arcs entering a node) and gives preference to those with high incidence values.

Other research groups have used substructure search as a method for selecting suitable starting materials (21). Having the theorem prover as our reasoning component makes this task easier for us to implement because of the theorem prover's ability to use pattern matching to identify the substructures and then match them with pattern clauses representing classes of starting materials. In addition, it is easier for us to represent abstractions of substructures by the use of clauses containing variables which substitute for atoms and side chains.

The Tree-Building Stage. In this stage we begin to sketch out the shape of the problem solving tree and construct pathways from the target molecule to starting materials. Our plan includes the modeling of the "Principle of Minimum Chemical Distance" (PMCD), developed by J. Gasteiger and coworkers (22).
The use of the PMCD will help the system devise "chemical islands"; these are compounds which are structurally related to both the target and starting materials. Our implementation of the PMCD will differ from that of the Gasteiger group in that we represent chemical knowledge in terms of clauses rather than matrices.

To determine the minimum chemical distance, the largest ensembles (sets) of largest substructures in the target molecule must be identified. The next step is to find the largest substructures common to both the potential starting materials (subgoals) and target compound using the unification routines embedded in the theorem prover. Some of this work has been done in the tree-definition stage. From the trace of the proof we can discover where the target and starting material (subgoal) structures differ. A utility program paired with a call to the theorem prover can calculate the necessary "chemical distances" between target and subgoal from this information. These tell us, in graph form, which bonds need to be made and which need to be broken to produce, in a retrosynthetic sense, subgoals which can lead to targets. At this point we can construct the "island" molecules between the target and starting materials that will satisfy the PMCD.
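The bookkeeping behind such a distance calculation can be sketched as follows. This is a simplified illustration, not the clause-based implementation planned for SYNLMA and not the matrix formulation of the Gasteiger group: it assumes the atom-to-atom mapping between target and subgoal is already fixed (shared atom labels), whereas finding the mapping that minimizes the distance is the combinatorial core of the PMCD. The function names and the toy bond lists are hypothetical.

```python
def bond(a, b):
    """Order-independent key for the bond between atoms a and b."""
    return frozenset((a, b))


def chemical_distance(target, subgoal):
    """Sum of bond-order differences under a fixed atom mapping."""
    keys = set(target) | set(subgoal)
    return sum(abs(target.get(k, 0) - subgoal.get(k, 0)) for k in keys)


def bond_changes(target, subgoal):
    """Retrosynthetic direction: bonds to break in, and add to, the target."""
    keys = set(target) | set(subgoal)
    to_break = {k for k in keys if target.get(k, 0) > subgoal.get(k, 0)}
    to_make = {k for k in keys if subgoal.get(k, 0) > target.get(k, 0)}
    return to_break, to_make


# Toy graphs: bond key -> bond order. Atom labels a-d are placeholders.
target = {bond("a", "b"): 1, bond("b", "c"): 1, bond("c", "d"): 1}
subgoal = {bond("a", "b"): 1, bond("c", "d"): 2}

distance = chemical_distance(target, subgoal)
to_break, to_make = bond_changes(target, subgoal)
```

In this toy case the b-c bond is cut and the c-d bond order is raised, so two unit changes separate the graphs; a smaller total over all candidate subgoals would mark the preferred disconnection.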
Our strategy is to have SYNLMA choose, based on PMCD information, the bonds to break to generate the "chemical islands." The bonds to be broken in the target molecule will lead to subgoals marked with reactive centers at the positions where the bond was previously attached. Ultimately SYNLMA will select functional groups to be placed at the reactive centers that would allow simple functional group interconversion and/or bond making reaction rules to "chemically" reconstruct the bonds suggested for breaking at the retrosynthetic planning stage.

As can be seen then, the "chemical island" formed as a result of the bond breaking will contain the major substructures found in the target and starting materials. The islands will be represented as abstractions of intermediate compounds having marked reactive sites; that is, substructures common to the target and starting materials will be present and other groups such as side chains will be represented as variables. The next task is to


construct bridges between the floating islands and the target. This may require many passes if the compound to be synthesized is very large and complex. Essentially each pass will make a general plan for the next, more specific pass. During each pass the PMCD will be used as a screening criterion to "turn off" random or inefficient synthetic paths, thus keeping the tree from combinatorial explosion.

To implement such a scheme we are organizing our reaction rule data base into a taxonomy of reaction rules headed by (1) bond making/breaking reactions and (2) functional group interconversion reactions as shown in Fig. 10. This organization is modeled after the systems developed by Sacerdoti (23) and Stefik (24). At the top level of the taxonomy reactions are entered in as abstract a form as possible. These will often be multi-step reactions. Lower in the taxonomy the reaction rules are still in somewhat general form, but they have more chemical and structural detail. Gross functional group incompatibility and structural requirements (e.g., Bredt's rule information) will be included at this level. Finally, specific single-step reactions and their limitations are entered at the lowest levels.
These would be similar to the reaction rules in the present version of SYNLMA which consist of a goal pattern (product) and a subgoal pattern (reactants). The theorem prover will choose a particular reaction rule, whether multi-step or single-step, when it finds a match between the compound to be synthesized and the goal portion of the reaction rule (10).

The top of Fig. 10 shows a fragment of a reaction type taxonomy. Each reaction type can be further subdivided according to structure reactivity differences. The bottom of Fig. 10 shows a fragment of a structure type taxonomy for substitution reactions.

By use of the taxonomy of reaction rules the system only searches a small relevant portion of the data base at each stage. In the early planning stages only the most general of the bond making/breaking reactions and functional group interconversions will be required; those at the head of the reaction taxonomy. As the synthesis progresses, the more specific rules are applied. For example, the system might choose the general category of substitution reactions for functional group interconversion at the beginning of a search. Further passes will involve the choice of the type of substitution reaction, for example SN1 or SN2. Finally, a specific reaction rule will be needed for the type chosen.
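The way a taxonomy confines each pass to a small part of the rule base can be shown with a nested dictionary. The sketch below is an assumption-laden illustration: the entries are drawn loosely from the text and Fig. 10, their exact arrangement is a guess, and `TAXONOMY` and `candidates` are hypothetical names rather than parts of SYNLMA.

```python
# Each key is a rule category; its value holds the more specific categories.
TAXONOMY = {
    "bond making/breaking": {
        "carbon-carbon": {
            "organometallic": {"Grignard": {}, "organocuprate": {}},
        },
    },
    "functional group interconversion": {
        "oxidation": {},
        "reduction": {},
        "substitution": {
            "SN1": {},
            "SN2": {"at primary carbon": {}, "at secondary carbon": {}},
        },
    },
}


def candidates(taxonomy, path=()):
    """Rule categories visible at one planning pass: the children of `path`."""
    node = taxonomy
    for name in path:
        node = node[name]
    return sorted(node)


# Successive passes descend one level at a time.
first_pass = candidates(TAXONOMY)
second_pass = candidates(TAXONOMY, ("functional group interconversion",))
third_pass = candidates(
    TAXONOMY, ("functional group interconversion", "substitution")
)
```

Only the children of the category chosen in the previous pass are examined, so the number of rules considered at any one time stays small even as the rule base grows.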
Presently, the reaction rules in SYNLMA are stored in files indexed by the functional groups involved in the reaction. In the new system, the reaction rule data base will be larger and more complex. The partitioning of this data base is crucial to our ability to solve complex problems. Our plans are to organize the partitioning in terms of significant substructures. The design must also allow easy progress from the more general reaction types to the more specific. We believe the best approach would be to store the taxonomy in tree form with a secondary index of substructure pointers.

The Tree-Verification Stage. In this final stage the system attempts to refine the tree produced by the previous stage using the detailed analysis approach that SYNLMA uses in its unsophisticated version. In fact the latter form of SYNLMA can be used by making only slight changes to the logic. In the tree verification stage we examine each


Taxonomy of Reaction Rules

1. Bond making/breaking reactions
   A. Carbon-carbon making/breaking reactions (selected name reactions, some generic reactions): aldol; malonic ester (multistep, expanding to individual steps); organometallic (Grignard, organocuprate); addition (1,2-addition, 1,4-addition); Michael; substitution; elimination; carbene; etc.
   B. Carbon-heteroatom or heteroatom-heteroatom bond making/breaking reactions: substitution, etc.; Hofmann rearrangement; Gabriel amine synthesis (multistep, expanding to individual steps)
   C. Substructure making/breaking reactions (usually multibond forming): Fischer indole synthesis, etc.
2. Functional group interconversion reactions (many name reactions, all generic reactions): Fischer; Wolff-Kishner; oxidation; substitution; reduction; etc.

Typical taxonomy shown for substitution:
   substitution bimolecular
      at allylic, benzylic carbon
      at primary carbon (at normal primary; at neopentyl carbon)
      at secondary carbon
      at tertiary carbon
      at phenyl, vinyl carbon
      etc.

Figure 10: Reaction Taxonomy


pair of adjacent nodes in the tree, treating the upper node as a goal and the lower node as starting material. Because the distance between the two nodes is very small at this point, it is feasible to make a thorough examination of their best connection paths. The final listing of all node connecting paths results in a complete synthesis.

During this stage starting materials need to be identified as specific compounds rather than general classes of compounds. The starting material data base will have to be organized at this stage for easy access to this information. The organization will be based on functional group and substructure information.

The system can evaluate the proposed reaction pathways by use of the PMCD in this stage. A rough evaluation of the cost of a particular path can also be made in terms of the ranking of the general applicability of the reaction and the number of steps required. An industrial chemist who needed to make a more accurate determination of cost and efficacy would have to proceed with a literature search at this point.

It is interesting to note that our approach would allow a chemist, in SYNLMA's interactive mode, to insert a "chemical island" or structure in order to force or guide its use in a synthesis pathway.
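The pairwise examination of adjacent nodes and the rough cost estimate can be sketched as below. Everything here is hypothetical: the `connections` table, its step counts and applicability ranks, and the convention that a lower rank means a more generally applicable reaction are assumptions made for illustration, not values or rules from SYNLMA.

```python
def adjacent_pairs(path):
    """(goal, starting material) pairs for consecutive nodes, target first."""
    return list(zip(path, path[1:]))


def rough_cost(path, connections):
    """Total step count and the worst applicability rank along the path."""
    total_steps, worst_rank = 0, 0
    for pair in adjacent_pairs(path):
        steps, rank = connections[pair]
        total_steps += steps
        worst_rank = max(worst_rank, rank)
    return total_steps, worst_rank


# One pathway from the tree, listed from the target down to a leaf.
path = ["target", "island", "starting material"]

# Hypothetical results of the detailed search between each adjacent pair:
# (number of single-step reactions, applicability rank of the least general).
connections = {
    ("target", "island"): (2, 1),
    ("island", "starting material"): (3, 2),
}

cost = rough_cost(path, connections)
```

Competing pathways could then be compared on these two numbers before any literature search is undertaken.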
Interfacing with Commercial Data Bases

For SYNLMA to be of practical use to an organic chemist, it must be able to interface with large commercial data bases. We plan to continue building interfaces to the ISI and CAS (Chemical Abstracts) data bases. We are also interested in the machine-readable forms of the Aldrich Chemical catalog, the Merck Index, and the Beilstein Collection, among others. ISI has added several useful features that make it especially attractive for us to use. For example, one can search the data base using what are called generic substructures, which may represent many actual compounds. A user can retrieve a specific compound by identifying the groups desired as side chains. Darc-Chemlink allows off-line phrasing of substructure queries on a PC, making chemical structure searching easier for both the experienced and the infrequent online user. Both systems should be accessible through the Chemical Abstracts Connection Table Format. We are in the process of moving to this format ourselves for internal use. This move will require a change in only one program, the program that converts connection table information to clause form.
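The conversion from connection table to clause form can be pictured with a minimal sketch. The input layout (a list of element symbols plus a list of bonds) is a simplification and is not the Chemical Abstracts Connection Table Format. The `atom(...)` and `bond(...)` predicate names are hypothetical and are not SYNLMA's actual clause vocabulary.

```python
def connection_table_to_clauses(mol_id, atoms, bonds):
    """Turn a simplified connection table into ground unit clauses (strings).

    atoms: list of element symbols; atom numbers are 1-based positions
    bonds: list of (atom_i, atom_j, bond_order) tuples
    """
    clauses = []
    for index, element in enumerate(atoms, start=1):
        clauses.append(f"atom({mol_id}, a{index}, {element.lower()}).")
    for i, j, order in bonds:
        lo, hi = sorted((i, j))  # canonical order so a bond is written one way
        clauses.append(f"bond({mol_id}, a{lo}, a{hi}, {order}).")
    return clauses


# Heavy-atom skeleton of ethanol, C-C-O, with single bonds.
ethanol = connection_table_to_clauses(
    "m1", ["C", "C", "O"], [(1, 2, 1), (3, 2, 1)]
)
```

Because everything downstream sees only the clauses, a change of input format touches only a converter of this kind, which is the isolation the text relies on.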
Summary

Our initial research efforts were directed toward the development of an expert system that could solve chemical synthesis problems using a theorem prover as its inference engine. We have been successful in implementing such a system that can carry out syntheses of simple molecules such as Darvon, Ibuprofen, and the bicyclic cocaine. To develop a system capable of handling more complex molecules with access to large commercial data bases we are augmenting our initial design to include the planning strategies used by human experts.

The new plan involves a three-stage approach; the three stages defined as tree-definition, tree-building and tree-verification. In the tree-definition stage graph overlay techniques will be used to do substructure searches which will allow for wiser choices of starting materials. The tree-building stage involves use of a taxonomy of reaction rules which will help guide the system through successive layers of detail appropriate to each pass. Gasteiger's PMCD will be used to evaluate the feasibility of proposed intermediate compounds. At the tree-verification phase actual details will be filled in. In each stage the theorem prover plays a significant role in the derivation of new chemical information for the system.

Incorporation of our new strategies, induction to guide deduction to determine how to attack the problem, and planning at successive levels of generalization to manage complexity, will add sophistication to our system. The new system will be much smarter than SYNLMA. Instead of trying to build bigger and bigger trees, it will build better trees.

Literature Cited

1. Bersohn, M. ACS Symposium. 1977, 61, 128.
2. Gelernter, H.; Sanders, A.; Larsen, D.; Agarwal, K.; Boivie, R.; Spritzer, G.; Searleman, J. Science. 1977, 197, 1041.
3. Corey, E.; Pensak, P.A. J. Am. Chem. Soc., 1974, 96, 7724-37.
4. Wipke, W.; Grund, P.; Grabowski, Z.; Huff, P.; Smith, G.; Andose, J.; Rhodes, J. J. Chem. Inf. Comp. Sci. 1980, 20, 88.
5. Salatin, T.; Jorgenson, W. J. Org. Chem. 1980, 45, 2043.
6. Vernin, G.; Chanon, M. Computer Aids to Chemistry. Ellis Horwood Limited, West Sussex, England, 1986.
7. Funatsu, K.; Sasaki, S. Tetrahedron Comput. Method. 1988, 1, (1), 39-51.
8. Wang, T.; Ehrlich, S.; Evens, M.; Gough, A.; Johnson, P. Proc. Conference on Intelligent Systems and Machines, 1984, 176-181.
9. Wang, T.; Burnstein, I.; Ehrlich, S.; Evens, M.; Gough, A.; Johnson, P. Proc. 1985 Conference on Intelligent Systems and Machines, 1985.
10. Wang, T.; Burnstein, I.; Corbett, M.; Evens, M.; Gough, A.; Johnson, P. ACS Symposium, Artificial Intelligence Applications in Chemistry. T. Pierce and B. Hohne, Eds.; 1986, 244-257.
11. Crary, J. M.S. Thesis, Illinois Institute of Technology, 1988.
12. Zehnacker, M.; Brennan, R.; Milne, G. W.; Miller, J.; Hammell, M. J. Chem. Inf. and Comput. Sci., 1986, 26, 193-197, and refs. cited therein.
13. Lusk, E.; McCune, W.; Overbeek, R. Proc. Sixth International Conference on Automated Reasoning. D. Loveland, Ed.; Computer Science Lecture Notes, #138, Springer-Verlag: New York, 1982, 85-108.
14. Lusk, E.; McCune, W.; Overbeek, R. Proc. Sixth International Conference on Automated Reasoning. D. Loveland, Ed.; Computer Science Lecture Notes, #138, Springer-Verlag: New York, 1982, 70-84.
15. For examples of Ibuprofen syntheses see Pinhey, J. and Rowe, B., Tet. Let., 1980, 21, 965, and refs. cited therein.
16. Tufariello, J.; Mullen, G. J. Amer. Chem. Soc., 1978, 100, 3638.
17. Bindra, J.; Bindra, R. Creativity in Organic Synthesis, vol. 1, Academic Press, New York, 1975.


18. Warren, S. Organic Synthesis: The Disconnection Approach. John Wiley and Sons, New York, 1982.
19. Sallay, S. J. Amer. Chem. Soc., 1967, 89, 6762.
20. Nagata, W.; Hirai, S.; Kawata, K.; Okumura, T. J. Amer. Chem. Soc., 1967, 89, 5046.
21. Wipke, W.; Rogers, D. J. Chem. Inf. Comput. Sci., 1984, 24, 71-81.
22. Jochum, C.; Gasteiger, J.; Ugi, I. Angew. Chem. Int. Ed. Engl., 1980, 19, 495-505.
23. Sacerdoti, E. A Structure of Plans and Behavior, Elsevier North Holland, New York, 1977.
24. Stefik, M. Artificial Intelligence. 1981, 16, 111-140.

RECEIVED June 26, 1989

In Expert System Applications in Chemistry; Hohne, B., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1989.
