Chapter 9
Designing an Expert System for Organic Synthesis
The Need for Strategic Planning
Peter Y. Johnson¹, Dene Burnstein², John Crary³, Martha Evans², and Tunghwa Wang²
Downloaded by NORTH CAROLINA STATE UNIV on January 17, 2013 | http://pubs.acs.org Publication Date: September 1, 1989 | doi: 10.1021/bk-1989-0408.ch009
¹Department of Chemistry, Illinois Institute of Technology, Chicago, IL 60077
²Department of Computer Science, Illinois Institute of Technology, Chicago, IL 60077
³G. D. Searle & Company, Skokie, IL 60077
SYNLMA, an expert system for organic synthesis, with a theorem prover as its inference engine and NCI's XTCHEM as its user interface, uses a retrosynthetic approach to find reaction pathways and generate a problem-solving tree representing the alternative designs it has explored. Presently, the system is capable of handling compounds of the order of complexity of Darvon, Ibuprofen, and the bicyclic system, cocaine. The combinatorial explosion that results from the input of larger target molecules has convinced us of the need for strategic planning during the synthesis process. We have developed a three-stage approach to aid SYNLMA in the planning process. The first stage identifies abstracted potential starting materials or name reaction derived synthons using graph overlay techniques to compare them with complex substructures in the target molecule. The second stage involves using "PMCD" strategies to define graphical paths between the target and abstracted synthons or starting materials. The leaf nodes on this path represent "chemical islands" which are then connected by general reaction rules. The third stage defines the tree by supplying specific reaction rules.

0097-6156/89/0408-0102$06.75/0 © 1989 American Chemical Society
In Expert System Applications in Chemistry; Hohne, B., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1989.

SYNLMA is an expert system designed to produce reaction pathways for organic synthesis problems. Many groups have worked on the organic synthesis problem, in the main using conventional programming techniques (1-7). What makes the SYNLMA system unique is the partitioning of the system into independent units consisting of a
chemical knowledge base, a user interface and a reasoning component. Our choice of a theorem prover as the inferencing engine adds strength and flexibility to the system design. Separation of the reasoning component from other components of the system has proved to have many advantages (8-10). It has allowed us to experiment with different representations of chemical knowledge without major changes in the overall system. The design has also allowed us to easily add or delete knowledge from our data base, and has given us the potential to interface with a wide variety of commercial data bases such as those provided by the Institute of Scientific Information and Chemical Abstracts.

Presently SYNLMA has been applied to the syntheses of small compounds of the order of complexity of Darvon (1, Fig. 4) (10), Ibuprofen (2, Fig. 3) (11), and the bicyclic compound cocaine (3, Fig. 6) using a data base of several hundred selected reaction rules and nearly fifty selected starting materials. Those rules and starting materials needed to duplicate the published syntheses of 1, 2, and 3 were included as a subset of our total reaction rule data base. Attempts to synthesize larger molecules, or interface with commercial data bases, have resulted, for a number of reasons, in a combinatorial explosion, clearly illustrating the need for strategic planning.
We are now in the process of adding planning intelligence to the system so that it will be more efficient in developing reaction pathways. The goal is to have SYNLMA more closely model the thought processes of the synthetic chemist. Our flexible design is enabling us to experiment with a new three-stage planning strategy using the inferencing capabilities of the theorem prover, while keeping the chemical representation scheme and user interface intact.

Description of the Present SYNLMA System

To initiate a chemical synthesis, a user of SYNLMA first interacts with the front end of the system, a series of Pascal programs called XTSYN, based on the National Cancer Institute XTCHEM structure input package (12). XTSYN was developed by John Crary (11) on an IBM/AT with an 80287 math coprocessor and a Hercules graphics board. The user enters a graphical representation of the target molecule at the keyboard. The system converts this graphical representation into connect table format and stores it in a file. At the user's request XTSYN will convert the connect table representation into clause format, which can also be stored in a file and which serves as input to SYNLMA.
XTSYN also has the capability to perform the reverse process of converting clause representations into connect table form, which can then be used to generate a graphical display of a given molecule. This capability is particularly useful during a run. Subgoal compounds generated as clauses and incorporated in the problem solving tree can be displayed on the screen in graphical format for user inspection. A complex data structure in the form of a doubly linked list is used by XTSYN to convert the molecule representations from one form to another (11).

In order to produce solutions to a given problem, the theorem prover must be provided with a theorem to be proved and a set of axioms, all in clause form. In the case of SYNLMA, the target compound becomes the theorem to be proved. It is converted into a clause list which consists of individual clauses representing the chemical environment of each atom in the compound bonded to at least two other
atoms. Axioms containing chemical knowledge in the form of functional groups, reaction rules, and starting materials are also represented in clause format (10). The target is decomposed into simpler precursor compounds by pattern matching with the appropriate reaction rules chosen on the basis of functional group identification and priority information. These less complex compounds are then considered as subgoals by the system. The process of decomposition continues until the bottom level precursors are available compounds or other user defined constraints are satisfied. This backward reasoning process, called retro-synthetic analysis by chemists, is often called backward chaining by computer scientists.

The Multi-Layered Design Approach. The present system is built of several layers as shown in Fig. 1. The top layer of SYNLMA directs the synthetic process, calling the middle layer to perform one-step reactions. The bottom layer is a custom-made theorem prover modeled after "ITP," an interactive theorem prover. "ITP" itself is built upon a package of Pascal routines called Logic Machine Architecture (LMA), which implements a resolution based theorem prover. Inferencing rules, based on classical logic techniques, are continuously applied to existing clauses to generate new information.
Both ITP and LMA were designed and implemented by the theorem proving group at Argonne National Laboratory (13-14).

The top layer of SYNLMA maintains a set of complex data structures which represent synthesis information found by the lower layers. One of these structures, the problem solving tree, is a representation of the pathways SYNLMA has generated for the synthesis of the target compound. The nodes of the tree represent molecules; the arcs represent reaction rule information. The root node is the target compound. The leaf nodes are the starting materials. Another data structure managed by the top layer is the molecular hash table, which contains information for all of the molecules found so far in the development of the problem solving tree. Only one entry is made for a molecule in the molecular hash table. This entry contains two vectors that uniquely identify that molecule so that it is not processed more than once. A reaction rule hash table is kept for all the reactions referred to in the problem solving tree. Associated with each reaction are parameters indicating yield, experimental difficulty, cost, and safety.
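The one-entry-per-molecule behavior of the molecular hash table can be illustrated with a small sketch. The chapter does not say what the two identifying vectors contain, so the key used here (a vector of element counts plus a vector of sorted element/degree pairs) is purely an assumption, as are the names `molecule_key` and `MoleculeTable`. This simple key is independent of atom numbering but is not a true canonical form; distinct isomers can collide.

```python
from collections import Counter

def molecule_key(atoms, bonds):
    """Build a two-vector key (assumed form): element counts and (element, degree) pairs."""
    degree = Counter()
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1
    element_vector = tuple(sorted(Counter(atoms.values()).items()))
    environment_vector = tuple(sorted((atoms[i], degree[i]) for i in atoms))
    return element_vector, environment_vector

class MoleculeTable:
    """Hypothetical stand-in for the molecular hash table: one entry per key."""
    def __init__(self):
        self._entries = {}

    def add(self, name, atoms, bonds):
        """Return True if the molecule is new, False if it was already recorded."""
        key = molecule_key(atoms, bonds)
        if key in self._entries:
            return False
        self._entries[key] = name
        return True

    def __len__(self):
        return len(self._entries)

# Heavy-atom skeletons only; atom numbers are arbitrary labels.
ethanol_a = ({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3)])
ethanol_b = ({1: "O", 2: "C", 3: "C"}, [(1, 2), (2, 3)])   # same molecule, renumbered
dimethyl_ether = ({1: "C", 2: "O", 3: "C"}, [(1, 2), (2, 3)])

table = MoleculeTable()
first = table.add("ethanol", *ethanol_a)
repeat = table.add("ethanol again", *ethanol_b)
other = table.add("dimethyl ether", *dimethyl_ether)
```

Here the renumbered ethanol is rejected as a duplicate, while dimethyl ether, which has the same atoms but different connectivity, receives its own entry.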
In addition to these data structures, a work list of all the molecules waiting to be processed is kept by the top layer in linked list form, sorted in order of the values calculated for them by the evaluation function. The molecule with the lowest value is chosen to be processed first. These values are calculated by an evaluation function containing heuristic information which takes into account the complexity of the molecule, the functional groups it contains, and its position in the problem solving tree.

In order to derive precursor compounds, SYNLMA must search the reaction rule data base and match the goal compound with the product side of a reaction rule. This process begins at the top layer, which builds the problem solving tree. It calls the middle layer to add a new branch to the tree. The middle layer is designed as a network of what we call environment pairs and serves as the interface between the chemical knowledge data base and the inference engine. Each pair consists of a) a call to the theorem prover with all the necessary information
Figure 1: Layer Structure of SYNLMA. Top Layer: Problem Solving; Middle Layer: Network of Environments; Bottom Layer: Theorem Prover, Logic Machine Architecture (LMA).
it needs such as clause lists, terminating conditions and strategies, and b) a utility program that extracts and stores information from the call to the theorem prover and also gathers information from the data base. Information found in one call to the theorem prover is made available for further calls. Fig. 2 shows the environment network of the middle layer, and its component environment pairs.

The first environment pair, 1A and 1B, is responsible for finding and storing functional group information. When the middle layer is called to add a new branch to the tree, the first step is a call from 1A to the bottom layer, the theorem prover, which generates functional group information for each goal compound. The second half of the pair, 1B, collects and stores this information.

Once the functional group information is available, the next environment is called and 2A starts to search, by priority, a subset of the reaction rule files containing only those reactions that pertain to these functional groups. To make this search efficient, our initial approach has been to partition the reaction rule data base into subsets (chapters) ordered by unique functional group numbers. Priority for calling the functional group chapters has been set by the first author.
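The chapter-based partitioning described above can be sketched as a dictionary keyed by functional group, searched in priority order. The rule names, the functional groups, and the priority values below are invented for illustration; the actual chapter numbering and priorities used in SYNLMA are not given in the text.

```python
from collections import defaultdict

# Hypothetical rule records; "group" is the functional group found in the rule's product.
RULES = [
    {"name": "ester_from_acid", "group": "ester"},
    {"name": "alcohol_from_ketone", "group": "alcohol"},
    {"name": "alcohol_from_ester", "group": "alcohol"},
    {"name": "amine_from_amide", "group": "amine"},
]

# Hypothetical priorities: a lower number means the chapter is searched earlier.
PRIORITY = {"amine": 0, "ester": 1, "alcohol": 2}

def build_chapters(rules):
    """Partition the rule list into chapters, one per functional group."""
    chapters = defaultdict(list)
    for rule in rules:
        chapters[rule["group"]].append(rule["name"])
    return chapters

def candidate_rules(chapters, groups_found, priority):
    """List rules only from chapters matching the goal's groups, in priority order."""
    ordered = sorted(groups_found, key=lambda g: priority.get(g, len(priority)))
    return [name for g in ordered for name in chapters.get(g, [])]

chapters = build_chapters(RULES)
candidates = candidate_rules(chapters, {"alcohol", "amine"}, PRIORITY)
print(candidates)
```

Only the chapters for groups actually present in the goal compound are opened, so a goal containing an alcohol and an amine never touches the ester chapter.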
As soon as a match is found between the product side of a reaction rule and the goal compound, the utility program in the pair, environment 2B, stores this information. There may be several rules whose product portion matches with the goal compound. All these potential reactions are stored for consideration by the third environment pair.

The third pair constructs new subgoal compounds from the reactant half of a reaction rule. It accomplishes this by substituting known atoms from the subgoal compound molecule for the variables in the reaction rule.

The fourth environment pair processes the newly generated subgoals. The theorem prover is called to check each subgoal for chemical feasibility, presence or absence in the list of available starting compounds, and to discover whether it has been previously generated by the system. This information guides the utility program so that it can insert each subgoal into the appropriate data structures maintained by the top layer.

During the course of the building of the problem solving tree, the chemist can extract subgoal molecules from the nodes of the tree. The subgoals, which are generated as clause lists, can be passed to XTSYN, which will display them on the screen in a graphical format and store them in XTCHEM connect table form for future display.
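The interplay between the top layer's work list and the one-step expansion performed by the middle layer can be sketched as a best-first search. This is only a toy under stated assumptions: molecules are plain strings, the rule table maps a product directly to its precursor sets (standing in for the theorem prover's pattern matching), and the `score` function, `max_depth`, and `max_nodes` (a crude stand-in for running out of memory) are all hypothetical.

```python
import heapq

def best_first(target, rules, starting_materials, score, max_depth=5, max_nodes=100):
    """Expand the lowest-scoring molecule first; return molecule -> precursor sets."""
    seen = {target}                          # plays the role of the molecular hash table
    work = [(score(target, 0), 0, target)]   # work list ordered by evaluation value
    tree = {}
    while work and len(seen) <= max_nodes:
        _, depth, molecule = heapq.heappop(work)
        if molecule in starting_materials or depth >= max_depth:
            continue                         # leaf node: nothing further to expand
        for precursors in rules.get(molecule, []):
            tree.setdefault(molecule, []).append(precursors)
            for p in precursors:
                if p not in seen:            # each molecule is processed only once
                    seen.add(p)
                    heapq.heappush(work, (score(p, depth + 1), depth + 1, p))
    return tree

# Toy data: "T" is the target; "B" and "S1" are available starting materials.
rules = {"T": [("A", "B"), ("C",)], "A": [("S1",)], "C": [("T",)]}
starting = {"B", "S1"}
tree = best_first("T", rules, starting, score=lambda mol, depth: len(mol) + depth)
print(tree)
```

The loop stops when the work list is empty or the node cap is hit, and the depth limit keeps any branch from growing indefinitely. The rule that regenerates "T" from "C" is recorded in the tree, but "T" is not queued again because it is already in the seen set.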
Currently the search process terminates when any of the following conditions occur: 1) there is no more memory left in the computer, 2) the work list is empty, with no more molecules to process, 3) the user-specified upper limits on the height and/or depth of the problem solving tree are exceeded, 4) the user-defined maximum number of solutions is reached.

Scope of the Present System
With the present design, a data base of fifty selected starting materials, and two hundred selected reaction rules, SYNLMA is currently able to generate synthetic trees, often in a very naive or inefficient manner, for molecules of the size and complexity of Darvon, Ibuprofen,
Figure 2: A Network of Environment Pairs. Environments 1A/1B: Find and Store Functional Group Information; Environments 2A/2B: Find and Store Reaction Rules (branches labeled Matched, Not Matched, More); Environments 3A/3B: Generate New Subgoals; Environments 4A/4B: Insert Subgoals in Problem Solving Tree. Each pair is made up of a Theorem Prover environment and a Utility environment.
and cocaine. A pruned sample problem solving tree showing plausible routes to Ibuprofen is displayed in Fig. 3.

The most serious potential problem is combinatorial explosion. These synthetic trees represent limiting cases for system resources on a VAX-11/750 running the UNIX operating system. It is clear that attempts to synthesize more complex molecules, or interface with data bases containing thousands of starting materials and reaction rules, will result in a combinatorial explosion.

Errors in pruning also cause significant problems. Omitted pruned paths generally resulted from our not using reaction rule constraints or from nonselective and/or non-intelligent use of the rules. This is one reason why none of SYNLMA's paths represent published syntheses of Ibuprofen (15) in spite of the fact that the requisite rules were in the data base. On the positive side, the synthetic paths to Ibuprofen discovered by SYNLMA are straightforward and would probably work as shown.

SYNLMA, in its present form, is chemically unsophisticated. It does not have the reaction insights, information on structural limitations, and planning strategies that the expert can call into play during the course of solving a synthesis problem.
For example, when planning the synthesis of a C25 n-alkane which contains a long linear chain of repeating (CH2)n groups, a chemist, hoping to minimize the synthetic steps in his synthesis, would typically start the search for precursor synthons having approximately half the chain length containing appropriate bond making functional groups. One of the SYNLMA solutions to this problem was a step-wise synthesis of the entire chain, one methylene unit at a time, using a nonselective bond-making reaction such as a carbene insertion reaction. Clearly no knowledgeable chemist would take this approach! This same nonselective carbene reaction was used as part of SYNLMA's suggested synthesis of Darvon as shown in Fig. 4. This reaction and several others were removed from our reaction rule data base in order to prevent their nonselective use.

As one can see, the nature and selection of reaction rules has placed limitations on SYNLMA. The reaction rule data base not only contains the rule itself, but also "must have-must not have" information/constraints concerning functional group incompatibility (10).
These mandated constraints, often invoked in lieu of selectivity knowledge, protected us from incorrect use of some reactions, but, in numerous cases, also caused SYNLMA to eliminate potentially useful reaction rules, rules that a chemist might have considered in spite of the constraints. For example, a chemist might be happy to sacrifice one equivalent of a cheap Grignard reagent to a compound containing both a ketone and an alcohol in order to have the second equivalent add to the ketone. More insidious to us was the inability, with constraints on, to use many double addition reactions required to make bifunctional cocaine starting materials. Some examples are shown in Fig. 5. Tetrabromide 17 was not considered as a potential starting material since the first bromine addition to give 16 was not allowed. The reaction rule says you cannot add bromine to a nonconjugated alkene if there is another alkene present. In the second example, the selective constraints prevent SYNLMA from adding hydride or Grignard reagents arbitrarily to the carbonyl of its choice to give alcohols 20 or 21 when reacting with dione 18.
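The effect of the "must have-must not have" constraints can be illustrated with a short sketch of the Grignard case just described. The rule record, its field names, and the `constraints_on` switch are hypothetical; the clause-level encoding SYNLMA actually uses is described in reference (10) and is not reproduced here. Matching the rule's product pattern against the goal is assumed to happen elsewhere.

```python
def rule_allowed(rule, groups, constraints_on=True):
    """Check a rule's functional group constraints against the groups in a molecule."""
    if not constraints_on:
        return True                      # constraints off: every matched rule is kept
    has_required = rule["must_have"] <= groups
    has_forbidden = bool(rule["must_not_have"] & groups)
    return has_required and not has_forbidden

# Hypothetical rule: Grignard addition needs a ketone and is blocked by a free alcohol.
GRIGNARD_ADDITION = {
    "name": "grignard_addition_to_ketone",
    "must_have": {"ketone"},
    "must_not_have": {"alcohol"},
}

hydroxy_ketone = {"ketone", "alcohol"}
simple_ketone = {"ketone"}

print(rule_allowed(GRIGNARD_ADDITION, simple_ketone))                         # True
print(rule_allowed(GRIGNARD_ADDITION, hydroxy_ketone))                        # False
print(rule_allowed(GRIGNARD_ADDITION, hydroxy_ketone, constraints_on=False))  # True
```

With constraints on, the hydroxy ketone is rejected even though a chemist might accept the loss of one equivalent of reagent; with constraints off, the rule is kept but nothing guards against genuinely incompatible uses.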
Figure 4: Darvon Synthesis, Nonselective Carbene Insertion
Figure 5: Reaction Rule Constraints. Panel label: Example 2.
This protection against nonselectivity also stops addition of two equivalents of reagent to 18 to give diol 21, a chemically reasonable route to this compound. It is clear now that we could rewrite the reaction rules to invoke subrules or layers of qualifiers as a means of effecting reaction selectivity, but checking every qualifier of every rule called would slow SYNLMA significantly. This level of consideration would more reasonably be done after several strategies had been chosen for further investigation. Our experiments with reaction taxonomies are discussed later in this paper. Without the constraints SYNLMA finds more paths but is less efficient in its generation of viable synthetic pathways.

While having too many "chemical restrictions," the reaction rules have no "structural restrictions." In Fig. 6, which shows the first retro-synthetic steps SYNLMA considered for cocaine synthesis, we see that four Bredt's rule violations, enamines 27a,b and 28a,b, were accepted as subgoals. While discussing Fig. 6, it should be noted that structures 22, 23, 24, and 26 are not allowed when constraints are on. Structures 23, 24, 27, and 28 are typical of current SYNLMA output. When it finds a reaction rule, it applies the rule exhaustively.
Structures 29 and 30 are not synthetically demodulated and represent wasted CPU time. Finally, one second generation structure, 31, is shown because it represents an interesting variation of an N-oxide ene cycloaddition reaction that has been used to synthesize tropanol (16), the basic cocaine ring system.

From the above examples, it is clear we need to build effective planning strategies into SYNLMA and restructure our data base of chemical information. This will improve the efficiency of our system and make it a viable assistant to the synthetic organic chemist.

Modeling Strategic Planning For The Synthesis Process
Our new system design involves planning and organizing the synthesis process so that it closely models the human expert's approach. How do chemists deal with a complicated organic synthesis problem? They seem to organize their work into three successive stages which we call the tree-definition, tree-building, and tree-verification stages (17-18). We are now in the process of redesigning and upgrading SYNLMA to reflect this new approach.

In the first stage, the tree-definition stage, the main thrust is to identify potential starting materials and/or methodologies by noting resemblances between the target compound and (1) classes of available starting materials or (2) substructures or superstructures produced by name reactions such as the Fischer indole synthesis. Examples of these two approaches are shown in Fig. 7 (19) and Fig. 8 (20) respectively for the synthesis of the alkaloid ibogamine, 34. In the substructure driven approach to ibogamine, the indole substructure 34 (Fig. 7) is recognized as an abstracted starting material. As outlined below in the discussion of the Tree Definition Stage, the abstracted starting materials are linked to increasingly specific, less abstract potential starting materials (see Fig. 9).
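One way to read "noting resemblances" between a target and an abstracted starting material is as a search for the starting material's skeleton inside the target's graph. The sketch below does this by brute force over atom assignments. It is an assumption-laden illustration, not the graph overlay method used in SYNLMA: the function name `overlay`, the atom/bond encoding, and the toy molecules are all invented, bond orders are ignored, and the exhaustive search is only practical for very small graphs.

```python
from itertools import permutations

def overlay(pattern_atoms, pattern_bonds, target_atoms, target_bonds):
    """Return a mapping of pattern atoms onto target atoms, or None if none fits."""
    target_bond_set = {frozenset(bond) for bond in target_bonds}
    pattern_ids = list(pattern_atoms)
    for choice in permutations(target_atoms, len(pattern_ids)):
        mapping = dict(zip(pattern_ids, choice))
        # Every pattern atom must land on a target atom of the same element.
        if any(pattern_atoms[p] != target_atoms[t] for p, t in mapping.items()):
            continue
        # Every pattern bond must be present between the mapped target atoms.
        if all(frozenset((mapping[a], mapping[b])) in target_bond_set
               for a, b in pattern_bonds):
            return mapping
    return None

# Toy target skeleton: O(4)-C(1)-C(2)-N(3); the pattern is a single C-N fragment.
target_atoms = {1: "C", 2: "C", 3: "N", 4: "O"}
target_bonds = [(1, 2), (2, 3), (1, 4)]
amine_fragment = ({"a": "C", "b": "N"}, [("a", "b")])
thiol_fragment = ({"a": "C", "b": "S"}, [("a", "b")])

match = overlay(*amine_fragment, target_atoms, target_bonds)
miss = overlay(*thiol_fragment, target_atoms, target_bonds)
print(match, miss)
```

A successful mapping marks the target atoms that would be preserved as a "chemical island" during retrosynthetic analysis; a realistic implementation would need a proper subgraph isomorphism algorithm and attention to bond order and ring membership.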
The identification of a potential starting material drives the retrosynthetic analysis in a manner which preserves that component, generating in Fig. 7 the chemical island 35. In the methodology
Figure 6: SYNLMA Syntheses of Cocaine: Initial Retro-synthetic Paths
Figure 7: Ibogamine synthesis: Substructure driven. [Scheme labels: compound 32; Aldrich Catalog, 1 of thousands of 6 membered rings.]
Figure 8: Ibogamine Synthesis. Methodology Driven. [Scheme labels: Fischer indole synthesis; ibogamine, 34; compounds 40, 41, and 42; starting materials.]

Figure 9: Indole Substitution Patterns Found in Aldrich Catalog. [Classes shown, each with an approximate ("ca.") compound count: N-substituted, 2-substituted, 3-substituted, 4-substituted, 5-substituted, and 6-substituted indoles; natural products; carbazoles; other saturated or unsaturated systems, where X = C or combinations of C or N.]
driven approach to ibogamine, the indole substructure 34 (Fig. 8) is recognized as the abstracted end result of a named organic reaction. In this case the retrosynthetic analysis using the named organic reaction will indicate that 41 and 42 are potential synthetic subgoals for further synthesis.

The discovery of these resemblances gives a tentative shape to the problem solving tree. This is the highest level planning stage where chemists use induction to limit the search for starting materials and to determine where to focus their deductive processes, in many cases making what has been called "the intuitive leap" (21). Resemblances between the target and starting materials, or methodology-produced intermediates can be described in terms of chemical substructures or superstructures and translated into graph overlay techniques which can be implemented by SYNLMA. During this stage the system can fill in the root node and many of the leaf nodes in the problem solving tree. The leaf nodes are the classes of starting materials or synthons which have been identified by the system as having significant or strategic structure in common with the target compound.

During the second stage, the tree-building stage, a collection of crude, imprecisely defined problem solving trees will be generated.
The chemist goes through an analogous stage. Once an analysis of the target has been completed, rough synthesis outlines/routes are constructed, usually reflecting the individual's knowledge, creativity, and prejudices. As part of the process, the chemist will often insert, potentially at any node along the path, intermediate structures which are expected to be convertible to the target or higher level intermediates in the pathway. These intermediate structures also have a reasonable chance of being synthesized from some starting materials available, in the abstract (21). In summary, these intermediate compounds, which can be considered "chemical islands", have a structure which is related to the abstracted starting materials or methodology derived synthons and the target molecule. They can be represented as leaf nodes along a crudely defined synthetic pathway. To reach from starting materials or synthons to these "chemical islands" and then to the target may involve several multi-step synthesis processes which can be filled in through successive tree-building stages, each providing a skeleton plan for the next stage and each using more detailed or selective reaction rules.
What is required to implement this approach is a new organization, or taxonomy of reaction rules for SYNLMA, ranging from the very general to the more specific. The more general rules, which represent multi-step reactions or processes, are applied in the earlier planning stages. The more specific single-step reactions are applied later. During this second stage we will model the expert's approach to initial path generation and "chemical island" derivation by having SYNLMA call a version of Ugi and Gasteiger's "Principle of Minimum Chemical Distance" (PMCD) program which offers a computer-assisted combinatorial solution for connecting two graphs (22). Incorporation of this strategy will help SYNLMA choose efficient connective paths between target and generalized (abstract) starting materials or synthons. The nodes along PMCD paths connecting target graph and synthon or starting material graphs represent basic chemical structures. Addition of bond-making functional groups to last
disconnection points on these basic structures will constitute our first attempts at generating "chemical islands". The PMCD program, which is based on minimum structure change, has been used to demonstrate that many classic syntheses closely follow the most efficient graph representation disconnections between target and starting material.

For the chemist, the third stage in the synthesis process is usually a detailed analysis. During this stage all the steps are filled in, bridging the chemical islands to the target and to the starting materials. Factors such as yield, cost, and safety are considered at this point. For SYNLMA this phase will result in the completion of the problem solving tree using single-step reaction rules chosen on the basis of functional group information. The system will have to examine adjacent nodes of the tree, find appropriate single-step reaction rules, and check cost and yield factors. Structural information will be incorporated into the reaction rules as constraints.

System Implementation

The new system resembles SYNLMA in overall structure; we have continued to use the three layered approach. The bottom layer is much like the bottom layer of the old system; that is, a custom built theorem prover calling LMA routines to do much of its work.
Argonne Laboratory is in the process of updating LMA, with particular emphasis on speeding it up. Any improvements made by the Argonne group will be incorporated into the new system. The middle layer continues to be a network of environment pairs, but with the additional pairs needed for graph overlay and PMCD implementation. The top layer is being entirely reorganized into three stages, the tree-definition, tree-building and tree-verification stages described above.

The Tree-Definition Stage. Often a chemist will choose a set of appropriate starting materials by noticing resemblances between the target molecule and classes of available compounds, be they starting materials or name reaction synthons. Resemblances between the target and starting materials can be described in terms of chemical substructures or superstructures. If we visualize a chemical structure as a graph, resemblances in the form of substructures or superstructures can be revealed by the overlaying of one graph on top of another.

To implement a substructure identification process in SYNLMA we are adding to our knowledge base a group of chemically meaningful substructures. The substructures data base will be culled from the Aldrich Chemical Co. catalog and the 500 Organic Name Reactions listed in the Merck Index.
These will be stored in clause form and arranged in a hierarchical format according to chemical complexity.

The middle layer of SYNLMA will now have several additional environment pairs to handle the new planning strategies. One such environment pair will consist of a call to the theorem prover to find candidate substructures in the target molecule using its pattern matching algorithms. As in the case of Wipke's abstracted structures, exact functional group bonding details will be ignored at this time. In
order to use this substructure information to find potential starting materials, we are organizing the starting materials by generalized classes. Each class is represented by a pattern clause containing the substructure which defines that class, but with variables representing side chains and non-structure bonds. The generalized structures will have pointers to more specific patterns which will in turn have pointers to the unique structures listed in our reference sources. We expect to identify about two hundred generic classes. In the case of potential ibogamine starting materials (see Fig. 7), nearly 120 of the 14,000 compounds listed in the Aldrich catalog have the indole moiety as a substructure. The indole rings listed in the Aldrich catalog can be grouped according to their five substitution sites containing significant members. Fig. 9 shows the breakdown of the indole problem from most general to individual structures. (Some compounds contain multiple substitutions. These are multiply counted, once for each substitution position.)

Once the theorem prover has recognized the substructures in the goal compound, the system will search the abstracted starting material data base for classes of starting materials containing those substructures.
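The graph overlay idea can be illustrated with a minimal sketch. It is not SYNLMA's method: SYNLMA relies on the theorem prover's pattern matching over clauses, whereas the sketch below assumes a brute-force search over small labeled graphs. The function name `overlay`, the `WILDCARD` symbol (standing in for pattern variables that represent side chains), and the example molecule are all hypothetical.

```python
from itertools import permutations

WILDCARD = "*"  # hypothetical stand-in for a pattern variable (any atom or side chain)

def overlay(pattern_atoms, pattern_bonds, target_atoms, target_bonds):
    """Try to lay the pattern graph over the target graph.

    Atoms are element symbols indexed by position; bonds are pairs of indices.
    Returns a dict mapping pattern index -> target index, or None if no overlay exists.
    Brute force over atom assignments, so only suitable for very small graphs.
    """
    target_edges = {frozenset(b) for b in target_bonds}
    size = len(pattern_atoms)
    for combo in permutations(range(len(target_atoms)), size):
        atoms_ok = all(
            pattern_atoms[i] in (WILDCARD, target_atoms[combo[i]])
            for i in range(size)
        )
        bonds_ok = all(
            frozenset((combo[a], combo[b])) in target_edges
            for a, b in pattern_bonds
        )
        if atoms_ok and bonds_ok:
            return dict(enumerate(combo))
    return None

# Target: a carbon (index 1) bonded to a carbon, a nitrogen, and an oxygen.
target_atoms = ["C", "C", "N", "O"]
target_bonds = [(0, 1), (1, 2), (1, 3)]

# Pattern: N - C - (anything)
match = overlay(["N", "C", WILDCARD], [(0, 1), (1, 2)], target_atoms, target_bonds)
```

Here `match` maps the pattern nitrogen to target atom 2 and the pattern carbon to target atom 1, with the wildcard bound to one of the remaining neighbors. A pattern containing an element absent from the target yields `None`.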
If/when an abstracted starting material is recognized, it will point to a more specific possibility. The same type of pattern matching will be implemented for the methodology driven synthons data base. As shown in Fig. 8 for ibogamine, upon recognition of the indole substructure as the product of a name reaction, in this case the Fischer indole synthesis (or one of the other 13 name reactions leading to indole synthesis listed in the Merck Index), the program will construct the structures needed to perform the name reaction. Name reaction precursors will become "chemical islands" or new targets. The utility portion of the environment pair will then store the substructure information and the potential synthons or starting materials chosen.

As outlined above, the starting materials and synthons data base will be organized for efficient search by general structural types (graphs) at the tree definition stage. These general structure types can be organized as linked lists of related structures headed by a general pattern clause representing that particular class of starting materials or synthons.
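One way to picture this organization is the sketch below, in which a general pattern class points to more specific patterns, which in turn point to individual compounds. The class name `PatternClass`, its fields, and the compound names are illustrative assumptions; SYNLMA stores this information as clauses, and the entries shown are not taken from the authors' data base.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PatternClass:
    """Hypothetical node in the starting-material hierarchy."""
    name: str
    compounds: List[str] = field(default_factory=list)            # specific catalog entries
    children: List["PatternClass"] = field(default_factory=list)  # more specific patterns

    def retrieve(self) -> List[str]:
        """Follow the pointers from this pattern down to every specific compound."""
        found = list(self.compounds)
        for child in self.children:
            found.extend(child.retrieve())
        return found

# Illustrative entries only.
indole = PatternClass("indole", children=[
    PatternClass("2-substituted indole", compounds=["2-methylindole"]),
    PatternClass("3-substituted indole",
                 compounds=["3-methylindole", "indole-3-carboxaldehyde"]),
])

def candidates(matched_class: str, classes: List[PatternClass]) -> List[str]:
    """Return all compounds under the class whose head pattern was matched."""
    for cls in classes:
        if cls.name == matched_class:
            return cls.retrieve()
    return []

indole_candidates = candidates("indole", [indole])
```

Matching only the head pattern keeps the search small: the specific compounds are touched only after a general class has been selected.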
Once a match has been made with the pattern at the head of the list using the graph overlay techniques, the other structurally related potential starting materials on the list could be retrieved through the use of pointers.

The current system recognizes functional groups and ring structures. The new system will recognize larger substructures. For example using this new approach the system would be able to identify the following as a substructure of DARVON:

           1$  5$
           |   |
    C6H5 - C - C - C6H5
           |   |
           3$  7$

where 1$, 3$, 5$ and 7$ are variables representing various side chains. From the general pattern class the system could choose diphenyl ethane, stilbene or diphenyl acetylene as potential starting materials to be examined during the later tree-verification stage. The tree-definition stage is complete when: a) all significant
substructures in the present goal compound have been identified and b) classes of potential starting materials and/or synthons have been found. Significance is defined by a graph complexity algorithm, which counts the incidence of nodes (the number of arcs entering a node) and gives preference to those with high incidence values.

Other research groups have used substructure search as a method for selecting suitable starting materials (21). Having the theorem prover as our reasoning component makes this task easier for us to implement because of the theorem prover's ability to use pattern matching to identify the substructures and then match them with pattern clauses representing classes of starting materials. In addition, it is easier for us to represent abstractions of substructures by the use of clauses containing variables which substitute for atoms and side chains.

The Tree-Building Stage. In this stage we begin to sketch out the shape of the problem solving tree and construct pathways from the target molecule to starting materials. Our plan includes the modeling of the "Principle of Minimum Chemical Distance" (PMCD), developed by J. Gasteiger and coworkers (22).
The use of the PMCD will help the system devise "chemical islands"; these are compounds which are structurally related to both the target and starting materials. Our implementation of the PMCD will differ from that of the Gasteiger group in that we represent chemical knowledge in terms of clauses rather than matrices.

To determine the minimum chemical distance, the largest ensembles (sets) of largest substructures in the target molecule must be identified. The next step is to find the largest substructures common to both the potential starting materials (subgoals) and target compound using the unification routines embedded in the theorem prover. Some of this work will already have been done in the tree-definition stage. From the trace of the proof we can discover where the target and starting material (subgoal) structures differ. A utility program paired with a call to the theorem prover can calculate the necessary "chemical distances" between target and subgoal from this information. These tell us, in graph form, which bonds need to be made and which need to be broken to produce, in a retrosynthetic sense, subgoals which can lead to targets. At this point we can construct the "island" molecules between the target and starting materials that will satisfy the PMCD.
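The bookkeeping described here can be sketched as follows, under two simplifying assumptions that are ours and not the authors': the atom-to-atom correspondence between target and subgoal is already known (the PMCD proper searches for the correspondence that minimizes the distance), and the "distance" is a plain count of bond-order changes rather than the full definition used in reference (22). The function name `bond_changes` and the atom labels are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

Bonds = Dict[FrozenSet[str], int]  # {pair of atom labels: bond order}

def bond_changes(target: Bonds, subgoal: Bonds) -> Tuple[Bonds, Bonds, int]:
    """Compare two structures that share atom labels.

    Returns (to_break, to_make, distance), read in the retrosynthetic
    direction target -> subgoal: to_break holds bond orders lost from the
    target, to_make holds bond orders gained in the subgoal.
    """
    to_break: Bonds = {}
    to_make: Bonds = {}
    for pair in set(target) | set(subgoal):
        diff = target.get(pair, 0) - subgoal.get(pair, 0)
        if diff > 0:
            to_break[pair] = diff
        elif diff < 0:
            to_make[pair] = -diff
    distance = sum(to_break.values()) + sum(to_make.values())
    return to_break, to_make, distance

def b(x: str, y: str) -> FrozenSet[str]:
    return frozenset((x, y))

# Target skeleton: C1-C2-C3-O4 (an alcohol, heavy atoms only).
target = {b("C1", "C2"): 1, b("C2", "C3"): 1, b("C3", "O4"): 1}
# Subgoal: C1-C2 fragment plus a C3=O4 carbonyl fragment.
subgoal = {b("C1", "C2"): 1, b("C3", "O4"): 2}

to_break, to_make, distance = bond_changes(target, subgoal)
```

For this pair the comparison reports one bond to break (C2-C3), one bond order to add (C3-O4), and a distance of 2. Among competing subgoals, a smaller distance would mark the more economical disconnection.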
Our strategy is to have SYNLMA choose, based on PMCD information, the bonds to break to generate the "chemical islands." The bonds to be broken in the target molecule will lead to subgoals marked with reactive centers at the positions where the bond was previously attached. Ultimately SYNLMA will select functional groups to be placed at the reactive centers that would allow simple functional group interconversion and/or bond making reaction rules to "chemically" reconstruct the bonds suggested for breaking at the retrosynthetic planning stage.

As can be seen then, the "chemical islands" formed as a result of the bond breaking will contain the major substructures found in the target and starting materials. They will be represented as abstractions of intermediate compounds having marked reactive sites; that is, substructures common to the target and starting materials will be present and other groups such as side chains will be represented as variables. The next task is to
construct bridges between the floating islands and the target. This may require many passes if the compound to be synthesized is very large and complex. Essentially each pass will make a general plan for the next, more specific pass. During each pass the PMCD will be used as a screening criterion to "turn off" random or inefficient synthetic paths, thus keeping the tree from combinatorial explosion.

To implement such a scheme we are organizing our reaction rule data base into a taxonomy of reaction rules headed by (1) bond making/breaking reactions and (2) functional group interconversion reactions as shown in Fig. 10. This organization is modeled after the systems developed by Sacerdoti (23) and Stefik (24). At the top level of the taxonomy reactions are entered in as abstract a form as possible. These will often be multi-step reactions. Lower in the taxonomy the reaction rules are still in somewhat general form, but they have more chemical and structural detail. Gross functional group incompatibility and structural requirements (e.g., Bredt's rule information) will be included at this level. Finally, specific single-step reactions and their limitations are entered at the lowest levels.
These would be similar to the reaction rules in the present version of SYNLMA, which consist of a goal pattern (product) and a subgoal pattern (reactants). The theorem prover will choose a particular reaction rule, whether multi-step or single-step, when it finds a match between the compound to be synthesized and the goal portion of the reaction rule (10). The top of Fig. 10 shows a fragment of a reaction type taxonomy. Each reaction type can be further subdivided according to structure reactivity differences. The bottom of Fig. 10 shows a fragment of a structure type taxonomy for substitution reactions.

By use of the taxonomy of reaction rules the system only searches a small relevant portion of the data base at each stage. In the early planning stages only the most general of the bond making/breaking reactions and functional group interconversions will be required, that is, those at the head of the reaction taxonomy. As the synthesis progresses, the more specific rules are applied. For example, the system might choose the general category of substitution reactions for functional group interconversion at the beginning of a search. Further passes will involve the choice of the type of substitution reaction, for example SN1 or SN2. Finally, a specific reaction rule will be needed for the type chosen.
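The general-to-specific search can be sketched with a small tree, assuming a hypothetical `RuleNode` class and a handful of category names adapted from Fig. 10 and the substitution example above. This is only an illustration of why each pass inspects a small part of the rule base; it is not the authors' data structure.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RuleNode:
    """Hypothetical taxonomy node: a reaction category or a specific rule."""
    name: str
    children: List["RuleNode"] = field(default_factory=list)

def candidates(root: RuleNode, chosen_path: List[str]) -> List[str]:
    """Return the rule names one level below the choices made in earlier passes."""
    node = root
    for name in chosen_path:
        node = next(c for c in node.children if c.name == name)
    return [c.name for c in node.children]

taxonomy = RuleNode("reaction rules", [
    RuleNode("bond making/breaking"),
    RuleNode("functional group interconversion", [
        RuleNode("oxidation"),
        RuleNode("reduction"),
        RuleNode("substitution", [
            RuleNode("SN1"),
            RuleNode("SN2", [
                RuleNode("SN2 at primary carbon"),
                RuleNode("SN2 at secondary carbon"),
            ]),
        ]),
    ]),
])

first_pass = candidates(taxonomy, [])
later_pass = candidates(taxonomy, ["functional group interconversion", "substitution"])
```

The first pass sees only the two top-level categories; a later pass, having committed to substitution, sees only SN1 and SN2 and never touches the oxidation or reduction rules.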
Presently, the reaction rules in SYNLMA are stored in files indexed by the functional groups involved in the reaction. In the new system, the reaction rule data base will be larger and more complex. The partitioning of this data base is crucial to our ability to solve complex problems. Our plans are to organize the partitioning in terms of significant substructures. The design must also allow easy progress from the more general reaction types to the more specific. We believe the best approach would be to store the taxonomy in tree form with a secondary index of substructure pointers.

The Tree-Verification Stage. In this final stage the system attempts to refine the tree produced by the previous stage using the detailed analysis approach that SYNLMA uses in its unsophisticated version. In fact the latter form of SYNLMA can be used by making only slight changes to the logic. In the tree verification stage we examine each
Taxonomy of Reaction Rules
  1. Bond making/breaking reactions (see below 1.)
  2. Functional group interconversion reactions (see below 2.)

1. Bond making/breaking reactions
  A. Carbon-carbon making/breaking reactions (selected name reactions, some generic reactions)
       aldol
       malonic ester (multistep)
         individual steps
       organometallic
         Grignard
         organocuprate
       addition
         1,2-addition
         1,4-addition
           Michael
       substitution
       elimination
       carbene
       etc.
  B. Carbon-heteroatom or heteroatom-heteroatom bond making/breaking reactions
       Substitution, etc.
       Hofmann rearrangement
       Gabriel amine synthesis (multistep)
         individual steps
  C. Substructure making/breaking reactions (usually multibond forming)
       Fischer indole synthesis, etc.

2. Functional group interconversion reactions (many name reactions, all generic reactions)
       Fischer
       Wolff-Kishner
       oxidation
       reduction
       substitution
       etc.

   Typical taxonomy shown for: Substitution
       substitution bimolecular
         at allylic, benzylic carbon
         at primary carbon
           at normal primary
           at neopentyl carbon
         at secondary carbon
         at tertiary carbon
         at phenyl, vinyl carbon
         etc.

Figure 10: Reaction Taxonomy
pair of adjacent nodes in the tree, treating the upper node as a goal and the lower node as starting material. Because the distance between the two nodes is very small at this point, it is feasible to make a thorough examination of their best connection paths. The final listing of all node connecting paths results in a complete synthesis.

During this stage starting materials need to be identified as specific compounds rather than general classes of compounds. The starting material data base will have to be organized at this stage, for easy access to this information. The organization will be based on functional group and substructure information.

The system can evaluate the proposed reaction pathways by use of the PMCD in this stage. A rough evaluation of the cost of a particular path can also be made in terms of the ranking of the general applicability of the reaction and the number of steps required. An industrial chemist who needed to make a more accurate determination of cost and efficacy would have to proceed with a literature search at this point.

It is interesting to note that our approach would allow a chemist, in SYNLMA's interactive mode, to insert a "chemical island" or structure in order to force or guide its use in a synthesis pathway.
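The pairwise examination and rough costing described in this stage might look like the sketch below. Everything in it is an assumption made for illustration: the `CONNECTIONS` table stands in for whatever the detailed SYNLMA analysis would return for a pair of adjacent nodes, the applicability ranks are invented (1 meaning most generally applicable), and the cost formula simply sums the ranks so that longer or less general routes cost more.

```python
from typing import Dict, List, Optional, Tuple

Step = Tuple[str, int]  # (reaction name, applicability rank; 1 = most general)

# Hypothetical results of the detailed analysis for each (goal, starting material) pair.
CONNECTIONS: Dict[Tuple[str, str], List[Step]] = {
    ("target", "island A"): [("step 1", 1), ("step 2", 2)],
    ("island A", "starting material"): [("step 3", 1)],
}

def verify_path(path: List[str],
                connections: Dict[Tuple[str, str], List[Step]]) -> Optional[List[Step]]:
    """Bridge each adjacent (upper, lower) node pair; fail if any pair has no route."""
    steps: List[Step] = []
    for goal, start in zip(path, path[1:]):
        found = connections.get((goal, start))
        if not found:
            return None  # this pair cannot be connected, so the path is rejected
        steps.extend(found)
    return steps

def rough_cost(steps: List[Step]) -> int:
    """Sum of applicability ranks; every step adds at least 1, so length is penalized too."""
    return sum(rank for _, rank in steps)

route = verify_path(["target", "island A", "starting material"], CONNECTIONS)
cost = rough_cost(route) if route is not None else None
```

With the table above the route has three single steps and a rough cost of 4. A path that includes a pair with no known connection is rejected outright.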
Interfacing with Commercial Data Bases

For SYNLMA to be of practical use to an organic chemist, it must be able to interface with large commercial data bases. Our plans are to continue work on building interfaces to the ISI and CAS (Chemical Abstracts) data bases. We are also interested in the machine readable form of the Aldrich Chemical catalog, the Merck Index, and Beilstein Collection among others.

ISI has added several useful features which make it especially attractive for us to use. For example, one can search the data base using what are called generic substructures, which may represent many actual compounds. A user can retrieve a specific compound by identifying the groups desired as side chains. Darc-Chemlink allows off-line phrasing of substructure queries on a PC, making chemical structure searching easier for both the experienced and infrequent online user. Both systems should be accessible through the Chemical Abstracts Connection Table Format. We are in the process of moving to this format ourselves for internal use. This change will require a change in only one program, the program that converts connection table information to clause form.
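The role of that single conversion program can be sketched as below. Both the input layout (a list of element symbols plus a list of bonds) and the output clause syntax (`atom(...)` and `bond(...)`) are illustrative assumptions; they are neither the Chemical Abstracts Connection Table Format nor SYNLMA's actual clause representation.

```python
from typing import List, Tuple

def connection_table_to_clauses(atoms: List[str],
                                bonds: List[Tuple[int, int, int]]) -> List[str]:
    """Turn a simplified connection table into clause-like strings.

    atoms: element symbols, numbered from 1 in order of appearance
    bonds: (first atom number, second atom number, bond order)
    """
    clauses = [f"atom(a{i}, {element})." for i, element in enumerate(atoms, 1)]
    clauses += [f"bond(a{i}, a{j}, {order})." for i, j, order in bonds]
    return clauses

# Heavy-atom skeleton C-C-O, single bonds only.
clauses = connection_table_to_clauses(["c", "c", "o"], [(1, 2, 1), (2, 3, 1)])
```

Because only this translation layer knows the external format, a change of connection table format leaves the theorem prover and the rule base untouched, which is the point made in the text.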
Summary

Our initial research efforts were directed toward the development of an expert system that could solve chemical synthesis problems using a theorem prover as its inference engine. We have been successful in implementing such a system, which can carry out syntheses of simple molecules such as Darvon, Ibuprofen, and the bicyclic cocaine. To develop a system capable of handling more complex molecules, with access to large commercial data bases, we are augmenting our initial design to include the planning strategies used by human experts. The new plan involves a three-stage approach, with the stages defined as tree-definition, tree-building, and tree-verification. In the tree-definition stage graph overlay techniques will be used to do
substructure searches which will allow for wiser choices of starting materials. The tree-building stage involves use of a taxonomy of reaction rules which will help guide the system through successive layers of detail appropriate to each pass. Gasteiger's PMCD will be used to evaluate the feasibility of proposed intermediate compounds. At the tree-verification phase actual details will be filled in. In each stage the theorem prover plays a significant role in the derivation of new chemical information for the system. Incorporation of our new strategies, induction to guide deduction to determine how to attack the problem, and planning at successive levels of generalization to manage complexity, will add sophistication to our system. The new system will be much smarter than SYNLMA. Instead of trying to build bigger and bigger trees, it will build better trees.
Literature Cited

1. Bersohn, M. ACS Symposium. 1977, 61, 128.
2. Gelernter, H.; Sanders, A.; Larsen, D.; Agarwal, K.; Boivie, R.; Spritzer, G.; Searleman, J. Science. 1977, 197, 1041.
3. Corey, E.; Pensak, P.A. J. Am. Chem. Soc., 1974, 96, 7724-37.
4. Wipke, W.; Grund, P.; Grabowski, Z.; Huff, P.; Smith, G.; Andose, J.; Rhodes, J. J. Chem. Inf. Comp. Sci. 1980, 20, 88.
5. Salatin, T.; Jorgenson, W. J. Org. Chem. 1980, 45, 2043.
6. Vernin, G.; Chanon, M. Computer Aids to Chemistry. Ellis Horwood Limited, West Sussex, England, 1986.
7. Funatsu, K.; Sasaki, S. Tetrahedron Comput. Method. 1988, 1, (1), 39-51.
8. Wang, T.; Ehrlich, S.; Evens, M.; Gough, A.; Johnson, P. Proc. Conference on Intelligent Systems and Machines, 1984, 176-181.
9. Wang, T.; Burnstein, I.; Ehrlich, S.; Evens, M.; Gough, A.; Johnson, P. Proc. 1985 Conference on Intelligent Systems and Machines, 1985.
10. Wang, T.; Burnstein, I.; Corbett, M.; Evens, M.; Gough, A.; Johnson, P. ACS Symposium, Artificial Intelligence Applications in Chemistry. T. Pierce and B. Hohne, Eds.; 1986, 244-257.
11. Crary, J. M.S. Thesis, Illinois Institute of Technology, 1988.
12. Zehnacker, M.; Brennan, R.; Milne, G. W.; Miller, J.; Hammell, M. J. Chem. Inf. and Comput. Sci., 1986, 26, 193-197, and refs. cited therein.
13. Lusk, E.; McCune, W.; Overbeek, R. Proc. Sixth International Conference on Automated Reasoning. D. Loveland, Ed.; Computer Science Lecture Notes, #138, Springer-Verlag: New York, 1982, 85-108.
14. Lusk, E.; McCune, W.; Overbeek, R. Proc. Sixth International Conference on Automated Reasoning. D. Loveland, Ed.; Computer Science Lecture Notes, #138, Springer-Verlag: New York, 1982, 70-84.
15. For examples of Ibuprofen syntheses see Pinhey, J. and Rowe, B., Tet. Let., 1980, 21, 965, and refs. cited therein.
16. Tufariello, J.; Mullen, G. J. Amer. Chem. Soc., 1978, 100, 3638.
17. Bindra, J.; Bindra, R. Creativity in Organic Synthesis, vol. 1, Academic Press, New York, 1975.
18. Warren, S. Organic Synthesis: The Disconnection Approach. John Wiley and Sons, New York, 1982.
19. Sallay, S. J. Amer. Chem. Soc., 1967, 89, 6762.
20. Nagata, W.; Hirai, S.; Kawata, K.; Okumura, T. J. Amer. Chem. Soc., 1967, 89, 5046.
21. Wipke, W.; Rogers, D. J. Chem. Inf. Comput. Sci., 1984, 24, 71-81.
22. Jochum, C.; Gasteiger, J.; Ugi, I. Angew. Chem. Int. Ed. Engl., 1980, 19, 495-505.
23. Sacerdoti, E. A Structure of Plans and Behavior, Elsevier North Holland, New York, 1977.
24. Stefik, M. Artificial Intelligence. 1981, 16, 111-140.

RECEIVED June 26, 1989