An Artificial Intelligence System to Model and Guide Chemical

Jun 1, 1977 - Rutgers University, Computer Science Dept., New Brunswick, N.J. 08903 ... A system for Search Modelling is proposed in this paper which ...
0 downloads 4 Views 2MB Size
7 An Artificial Intelligence System to Model and Guide Chemical Synthesis Planning by Computer: A Proposal N. S. SRIDHARAN

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

Rutgers University, Computer Science Dept., New Brunswick, N.J. 08903

One o f t h e central problems i n a p p l y i n g computer methods t o c h e m i c a l s y n t h e s i s p l a n n i n g is S e a r c h Guidance. The ability o f a c h e m i s t t o g u i d e an interactive computer in its s e a r c h f o r s y n t h e s e s and the possibilities f o r self-guidance in Artificial Intelligence programs a r e b o t h limited by t h e form and content o f the search information that is made available as the exploration p r o c e e d s . A system f o r S e a r c h M o d e l l i n g is proposed in this paper which c a n augment existing systems f o r s y n t h e s i s p l a n n i n g and s e r v e t o gather, a n a l y z e and amplify t h e i n f o r m a t i o n generated d u r i n g controlled exploration. The s e a r c h management model is specified in a s i m p l e descriptive form and two example models a r e i n c l u d e d in t h e p a p e r . The g u i d a n c e o f s e a r c h u s i n g t h e i n f o r m a t i o n g a t h e r e d is specified by a r u l e s e t in t h e s i m p l e s y n t a x o f Condition=>Action pairs. A chemist can interactively m o d i f y this rule set. F u r t h e r , an advanced s e a r c h model is p r e s e n t e d w h i c h , by i n t r o d u c i n g t h e powerful concept o f a P l a n n i n g Space, a l l o w s t h e s e a r c h f o r s y n t h e s e s t o go f o r w a r d , backward and t o leap i n t o t h e m i d d l e under controlled conditions.

148 Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRiDHARAN

Chemical

Synthesis

Planning

by

Computer

149

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

INTRODUCTION Planning chemical synthesis routes for known molecular s t r u c t u r e s i s a r i c h problem area o f f e r i n g a challenge that i s being met in inventive and imaginative ways not only by chemists, but a l s o by computer s c i e n t i s t s and mathematicians. I b r i n g to t h i s task the perspective of a s p e c i a l i s t i n the methods of developing A r t i f i c i a l I n t e l l i g e n c e w i t h i n a computing system, and a p e r s i s t e n t concern for using the mechanisable aspects of human knowledge and human problem s o l v i n g techniques i n the medium of a machine. I b r i n g to t h i s task no e x p e r t i s e i n organic chemistry or i n s y n t h e s i s . But I do have the b e n e f i t of several years of intimate contact with the problem of mechanizing the search for syntheses and with expert chemists, e s p e c i a l l y Prof. W.F. Fowler, who foresaw such p o s s i b i l i t i e s and who worked with us. My a p p l i c a t i o n to t h i s task under the guidance of Prof. Gelernter culminated during 1971 i n the running v e r s i o n of SYNCHEM I , the f i r s t computer program to perform s u c c e s s f u l m u l t i - s t e p synthesis e x p l o r a t i o n s automatically without on-line guidance or intervention. This f i r s t v e r s i o n of SYNCHEM employed several key ideas and techniques that were discovered by others e a r l y i n the research on mechanical problem s o l v i n g . These key ideas w i l l be reviewed below. Since my main i n t e r e s t l i e s i n the d i r e c t i o n of artificial i n t e l l i g e n c e , I have spent the f i v e years f o l l o w i n g my work i n SYNCHEM i n working on the a p p l i c a t i o n of a r t i f i c i a l i n t e l l i g e n c e methods to problems i n Mass Spectrometry with the H e u r i s t i c Dendral p r o j e c t at Stanford U n i v e r s i t y and a l s o with Professor Ivar Ugi at Munich i n a s s i m i l a t i n g h i s a l g e b r a i c approach to the representation of r e a c t i o n s . Upon Todd Wipke's i n v i t a t i o n to p a r t i c i p a t e i n t h i s Symposium I have e l e c t e d to set f o r t h i n a very s p e c i f i c manner my proposals on how I would t a c k l e the task of synthesis planning i n the l i g h t of my current understanding of the advances that have been made i n the methods of a r t i f i c i a l i n t e l l i g e n c e . Thus, I have three aims i n w r i t i n g t h i s paper:

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

150

COMPUTER-ASSISTED

ORGANIC

SYNTHESIS

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

a) Propose a system to augment e x i s t i n g synthesis search systems by focusing on the issues of search management. To t h i s end the concept o f search modelling i s introduced and two search models are presented. b) C l a r i f y the f i n e d i s t i n c t i o n between s e l e c t i o n of transform by Relevance criteria and by Applicability criteria. S e l e c t i o n by A p p l i c a b i l i t y defines what i s u s u a l l y c a l l e d the State Sgace and search i n the s t a t e space u s u a l l y grows the synthesis path uniformly from one end only. S e l e c t i o n by Relevance i s not used by any e x i s t i n g system which when used y i e l d s the powerful Planning Space. The search i n a planning space can take "leaps" along the synthesis sequence. c) I n d i c a t e that a combination of search i n both the s t a t e space and the planning space i s p o s s i b l e and that t h i s a f u n c t i o n of the search model employed. The second search model described i n the paper allows the search for synthesis to go forward, backward and to leap into the middle under c o n t r o l l e d c o n d i t i o n s . I t i s hoped that the advantages of t h i s way o f developing a system w i l l include: upgrading the r o l e of the chemist from one of r a t i n g , pruning and s e l e c t i n g subgoals or precursors to that of g i v i n g i n j u n c t i o n s to the system i n the form of r u l e s added, removed or modified as the search proceeds a few steps at a time; the i n t r o d u c t i o n o f the s t r a t e g i c and judgmental knowledge i n the program i n a manner that decoupled from the knowledge o f r e a c t i o n chemistry; and the a b i l i t y t o experiment e a s i l y with v a r i o u s models of search management that are q u a l i t a t i v e l y d i f f e r e n t from each other. The payoffs foreseen are of genuine importance to those of us concerned with the techniques o f a r t i f i c i a l i n t e l l i g e n c e and when proven p r a c t i c a b l e should be e x c i t i n g and c h a l l e n g i n g to chemists as w e l l . I t must be stated at the outset that these new ideas on search modelling are an outgrowth of my attempts to develop a system for common sense reasoning about human a c t i o n s using a p s y c h o l o g i c a l theory of a c t interpretation [Schmidt, 1976; Sridharan, 1975; 1976]. This system has c a p a b i l i t i e s

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRIDHARAN

Chemical

Synthesis

Planning

by

Computer

151

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

of s e l e c t i n g and applying transforms associated with act names, s t r u c t u r i n g them i n t o a plan and reasoning both forward and backward along the plan s t r u c t u r e . The strategy of act i n t e r p r e t a t i o n i s coded i n the form of r u l e s e t s . The framework adapted for t h i s work is called Meta D e s c r i p t i o n System (MDS) [ S r i n i v a s a n , 1973; 1976], designed and developed by S r i n i v a s a n . I s h a l l explore here i n d e t a i l the use of t h i s framework i n the management of search for chemical s y n t h e s i s . The connections between the p s y c h o l o g i c a l theory and synthesis planning s t r a t e g i e s w i l l be l e f t for l a t e r e x p o s i t i o n . There are three major conceptual approaches [See Sridharan, 1974 f o r a review] that have been taken on the issue of a computer mediated design of chemical synthesis plans and they share a c e n t r a l idea among them - that of searching a space of p o s s i b i l i t i e s i n a systematic manner using e m p i r i c a l knowledge as appropriate. The very power of these approaches stems from the s y s t e m a t i z a t i o n of search of the space of possibilities. For those approaching synthesis as fertile grounds for designing and b u i l d i n g computer programs that solve d i f f i c u l t i n t e l l e c t u a l problems [Sridharan, 1971, 1973, 1974; G e l e r n t e r , 1973, 1976] the main problems are twofold. F i r s t , the a c q u i r i n g and packaging of knowledge of the r e a c t i o n s i n a form s u i t a b l e for use w i t h i n a program has to be done c a r e f u l l y and the success of the program depends upon the correctness and extent of the knowledge base. Second, the techniques of conducting search with an incomplete, uncertain and possibly inconsistent knowledge base have to be customized to the task of chemical s y n t h e s i s . The interactive problem solving concepts developed [Corey, 1969; Wipke, 1973, 1976] are a t t r a c t i v e because of t h e i r thoroughness and the chemist user tends to approach the system with hopes that the d i s c i p l i n e d e x p l o r a t i o n of pathways w i l l ensure him that he has not overlooked any reasonable alternative.

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

152

COMPUTER-ASSISTED

ORGANIC SYNTHESIS

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

The t h i r d approach taken t o computer methods i n chemical synthesis i s one of f o r m a l i z i n g the set of p o s s i b l e r e a c t i o n s [Ugi, 1976], l i m i t i n g the use of e m p i r i c a l knowledge to the s e l e c t i o n r u l e s for these reactions. This method seeks out novel reaction schemes and can suggest to the chemist routes that could not be found by the other methods. However, a reasonable synthesis program i s yet to emerge from t h i s approach. The a l g e b r a i c c h a r a c t e r i z a t i o n of the search space however o f f e r s the p o t e n t i a l of h i g h l y i n t e r e s t i n g s t r u c t u r e s to be defined on the search space and might be very successful i n the long run. A l l of these methods c u r r e n t l y u t i l i z e a structure generally c a l l e d the H e u r i s t i c Method. THE

search Search

HEURISTIC SEARCH METHOD

The H e u r i s t i c Search Method [ N i l s s o n , 1971; Newell & Simon, 1972] i s a usual f i r s t approach t o problem s o l v i n g i f the s p e c i f i c a t i o n of the problem i t s e l f i s given p r e c i s e l y as a Goal S i t u a t i o n , and the s o l u t i o n required i s some sequence of Transforms i . e . Operators that can e f f e c t i v e l y transform the current s i t u a t i o n to the goal s i t u a t i o n . I t i s important that the allowable operators are f i n i t e and are w e l l - d e f i n e d . The problem s o l v i n g procedure involves the use of a Search Model that i s used i n guiding the search for a s o l u t i o n . The h e u r i s t i c nature of the procedure a r i s e s from the use of approximate methods of evaluating progress towards a s o l u t i o n and of assessing the merit and p o t e n t i a l of any p a r t i a l s o l u t i o n sequence. The user gives up in p r i n c i p l e the completeness and o p t i m a l i t y o f the search process gaining i n f a c t more frequent demonstrations o f s u c c e s s f u l problem s o l v i n g [Newell & Simon, 1972]. In chemical synthesis the goal s t a t e i s the target molecular s t r u c t u r e and the operators t o consider for c o n s t r u c t i n g a s o l u t i o n sequence are molecular r e a c t i o n s . The search for a synthesis plan proceeds backwards from the target molecular s t r u c t u r e by considering and applying r e a c t i o n s i n the r e t r o - s y n t h e t i c d i r e c t i o n . The process i s a s e r i a l one and has an "inner loop" that repeatedly asks

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRiDHARAN

Chemical Synthesis Planning by Computer

153

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

i t s e l f the questions "Should the process stop now?" and "What next?". The process can be s u c c e s s f u l , i n t e r e s t i n g and powerful depending on the use of a proper Search Model to answer these questions. The answer to the question "What next?" comes i n two p a r t s : the choice of a molecular s t r u c t u r e that can be s e t up as a subgoal to complete the s y n t h e s i s , and the choice of the operator to t r y on that subgoal. The information a v a i l a b l e to the process i n a r r i v i n g at the answers i s of two kinds, both of which must be included i n the search model. The first kind comprises the catalog of s t a r t i n g m a t e r i a l s , the l i b r a r y of r e a c t i o n s , t e s t s of a p p l i c a b i l i t y of the r e a c t i o n s , the a p r i o r i merit r a t i n g f o r the goodness ( y i e l d , s p e c i f i c T t y etc.) o f the r e a c t i o n and includes i n general information that i s made a v a i l a b l e to the system p r i o r to the a c t u a l statement of the problem to solve. The nature ancf extent oT t h i s pr i o r information determines the s t r u c t u r e of the search possibilities. The other kind includes information that becomes a v a i l a b l e as the search proceeds and constitutes a r i c h body o f information that i s s p e c i f i c to the problem at hand. Only limited examples of the use of the second kind of information can be shown i n current programs, perhaps the c l e a r e s t one i s the placement of p r o t e c t i o n r e a c t i o n s f o r s e n s i t i v e f u n c t i o n a l groups. The manner i n which such problem s p e c i f i c Information i s c o l l e c t e d , organized and used t o guide search determines the character of the a c t u a l search performed for a given problem. THE PROBLEM SOLVING GRAPH MODEL.

AS

A

MINIMAL SEARCH

The Problem S o l v i n g Graph (PSG) [Gelernter, 1962] i s a g r a p h i c a l model of the dynamics of the search that i s conducted on a given problem, and c o n s i s t s of a root node representing the target molecule l i n k e d to a cascade of descendants which are one-level precursors to those at a l e v e l higher i n the PSG. The H e u r i s t i c Evaluation function assigns numerical weights (or i n some cases elements of an ad hoc s c a l e of d i s c r e t e values) t o each node i n the PSG. T y p i c a l l y , the node evaluation i s based not only on p r o p e r t i e s of the s i t u a t i o n designated by the node and

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

154

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

COMPUTER-ASSISTED ORGANIC SYNTHESIS

Q

NODE

A N D - 0 R P R O B L E M SOLVING GRAPH MODEL Figure

1.

And-or

problem

solving

graph

model

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRiDHARAN

Chemical

Synthesis

Planning

by

155

Computer

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

the operator a p p l i e d to generate the node, but a l s o on the e n t i r e path from the goal node to the node i n question. The PSG model i n combination with the h e u r i s t i c value assignments provides the search mechanism ready answers to the "What next?" question. The s t r u c t u r e of the PSG f o r the chemical synthesis problem, shown i n Figure 1, c o n s i s t s of AND-nodes when operators generate m u l t i p l e precursors a l l of which need to be made a v a i l a b l e f o r the r e a c t i o n to be s u c c e s s f u l l y executed, and OR-nodes designating the choice a v a i l a b l e among s e v e r a l operators a p p l i c a b l e to a node. The selection c r i t e r i a that are w e l l entrenched i n the Theory of H e u r i s t i c Search [ S l a g l e , 1971] f o r handling such AND-OR problem s o l v i n g graphs are: At an OR-node s e l e c t the most promising subgoal; At an AND-node s e l e c t subgoals s t a r t i n g from the most l i k e l y to f a i l to the l e a s t l i k e l y to f a i l . INFORMATION-GATHERING HEURISTIC SEARCH.

AS

COMPLEMENTARY

ACTIVITY

TO

The planning a c t i v i t y involves a v a r i e t y of d e c i s i o n s that r e q u i r e information not customarily included i n the i n i t i a l s p e c i f i c a t i o n s of à transform. Such d e c i s i o n s i n v o l v i n g reasoning about the search process beyond the information provided by the PSG model c a l l f o r a more elaborate search model and i s considered i n t h i s s e c t i o n . The PSG model may be viewed as a c o l l e c t i o n ready answers to the f o l l o w i n g set of questions: a) Does a node have subgoals? they?

How

many?

What

of are

b) What i s the s t a t u s of a node? Was a s u c c e s s f u l path found? Was i t t r i e d and f a i l e d on a l l paths? Are there descendant subgoals s t i l l open? c) Does the s i t u a t i o n ( i . e . molecule)

represented

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

in

COMPUTER-ASSISTED ORGANIC SYNTHESIS

156

this node occur elsewhere i n the PSG? Is the s i t u a t i o n c i r c u l a r i . e . c a l l i n g for s y n t h e s i z i n g X to synthesize X higher up? d) At an OR-node what i s the best subgoal to take up? At an AND-node what i s the precursor that should be tackled f i r s t ?

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

Now consider the f o l l o w i n g set of questions that go beyond the information maintained by the PSG model: a) When there i s an operator whose relevance to synthesizing a molecule is clear, but the a p p l i c a b i l i t y c o n d i t i o n s are not s a t i s f i e d , under what set of circumstances should the unsatisfied preconditions be made into subgoals? Under what c o n d i t i o n s should the operator be r e j e c t e d altogether? b) For a given molecule what i s the most s t r a t e g i c sequence for the i n t r o d u c t i o n of f u n c t i o n a l groups? c) When i s the sequence i n t r o d u c t i o n immaterial?

of

functional

group

d) Given a subgoal, can the paths explored for another s t r u c t u r a l l y homologous s t r u c t u r e be considered v a l i d here? e) Given a synthesis route (or partial route) involving a protection/unprotection reaction p a i r , should an attempt be made to d e r i v e a r e v i s e d route not i n v o l v i n g p r o t e c t i o n by resequencing some of the reactions? Answering these questions c a l l s for maintaining a r i c h e r and b e t t e r organized base of information about the search paths than that provided by the PSG and i t s h e u r i s t i c value assignments to the nodes. As an a l t e r n a t i v e to the Heuristic Search process, consider the f o l l o w i n g two-stage process: a) Perform some e x p l o r a t i o n i n the search space (be i t the State Space of the H e u r i s t i c Search described above, or the Planning Space to be discussed below). b) Gather the search information, analyze, amplify and

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRiDHARAN

Chemical

Synthesis

Planning

by

Computer

157

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

use i t to guide f u r t h e r e x p l o r a t i o n . A great deal of f l e x i b i l i t y and i n v e s t i g a t i v e power comes to us i f we separate the t o t a l system i n t o two components g i v i n g explicit charge of the e x p l o r a t i o n by search to one, and the a n a l y s i s and a s s i m i l a t i o n o f the search information to the other. We gain conceptual c l a r i t y i n t h i n k i n g about the r u l e s for search guidance and set about designing novel Search Models with a new ease and v i g o r . I w i l l describe b r i e f l y the information gathering system which has been developed a t Rutgers and show by example a novel form of search model i n the remaining sections of the paper. ΜΕΤΑ-DESCRIPTION SYSTEM: A PARADIGM FOR INFORMATION-GATHERING The s t r u c t u r e of a system described i n the f a c i l i t y of MDS [ S r i n i v a s a n , 1973 & 1976; Sridharan, 1975] concepts i s radically d i f f e r e n t from the procedure based systems to which we are accustomed. I t i s more f a v o r a b l e , t h e r e f o r e , to introduce the system d i r e c t l y by an example. Table I presents the STRUCTURAL DESCRIPTIONS of the c l a s s e s of e n t i t i e s that are involved i n b u i l d i n g the PSG model. The c l a s s PSGRAPH designates the Problem S o l v i n g Graph whose ELEMENTS are NODES. There are two important c l a s s e s that help structure c o l l e c t i o n s of NODES i n t o OR-NODES and AND-NODES. The l a t t e r three c l a s s e s have MERIT and STATUS r e l a t i o n s associated with them, shown i n the Table r e l a t i n g these nodes to INTEGER values for MERIT, and a c l a s s c a l l e d STATUS f o r the STATUS r e l a t i o n . There are three values of STATUS defined as constants v i z . , OPEN, FAILED and SUCCEEDED. The PSGRAPH has a GOAL which i s a NODE and a c o l l e c t i o n of open nodes and f a i l e d nodes. The TRIALNODE designates the node to take up as subgoal when the search process i s set to explore the space again. The PSGRAPH has a r e l a t i o n STATUS which i s intended to i n d i c a t e the c o n d i t i o n s under which the process should terminate. NODES are r e l a t e d t o the OR-NODEs v i a the SUBGOAL r e l a t i o n , the OR-NODEs i n d i c a t e CHOICES of AND-NODEs and the AND-NODES i n turn INCLUDE any number of NODES

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC

158 (*

TABLE

(* (CONSTANTS (CONSTANTS

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

(*

I

*)

CONSTANTS

OF T H E DOMAIN

(YESNO (YES NO))) (STATUS (SUCCEEDED

STRUCTURAL

PSG

FAILED

DESCRIPTIONS

*)

OPEN)))

*)

(TON:

[PSGRAPH

(element N O D E elementof) (goal N O D E goalof) (opennodes NODE opennodeof) (failednodes NODE failednodeof) (status STATUS statusof) (trynode NODE|AND-NODEIOR-NODE trynodeof)])

(TDN:

[NODE (subgoal OR-NODE subgoalof) (subnode NODE subnodeof) (descendant NODE descendantof) (status STATUS) ( s i t u a t i o n MOLECULE) (merit INTEGER meritof) (repeated NODE repeatedby) ( c i r c u l a r NODE)])

(TDN:

[OR-NODE (subgoal AND-NODE) (status STATUS) (merit INTEGER)])

(TDN:

[AND-NODE (subgoal NODE) (status STATUS) (merit INTEGER)])

(TDN:

[MOLECULE ( s t r u c t u r e CHEMICAL-GRAPH) ( a v a i l a b l e YESNO)]) (* SENSE DEFINITIONS *)

(QSCC: [((NODE Ν) | (X elem Ν) (N status OPEN)) PSGRAPH opennodes]) (* Flags s p e c i f i a b l e on the r e l a t i o n s have been l e f t out f o r s i m p l i c i t y *)

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

SYNTHESIS

7.

Chemical

SRIDHARAN

Synthesis

Planning

by

Computer

Table I. Continued I

(QSCC:

[ ( ( N O D E N) PSGRAPH

(P elem Ν )

(N s t a t u s

FAILED))

(QSCC:

[ ( ( S T A T U S S) I (X ( g o a l s t a t u s ) S ) )

failednodes] ) PSGRAPH

status]) (QSCC:

[((STATUS

S) I

((X

(subgoal s t a t u s ) SUCCEEDED) => (X s t a t u s SUCCEEDED)) ( ( ( N O T [X (subgoal s t a t u s ) F A I L E D ] ) AND

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

(NOT

[X S t a t u s S U C C E E D E D ] ) ) (X S t a t u s OPEN))

=>

( ( ( A L L NODE N)(X subgroup Ν) (N s t a t u s

FAILED))

(X S t a t u s F A I L E D ) ) ) OR-NODE

status]) (QSCC:

[((STATUS

S) I

((X

(subgoal s t a t u s ) F A I L E D ) => (X s t a t u s F A I L E D ) ) (((X (subgoal s t a t u s ) OPEN) (NOT [X S t a t u s F A I L E D ] ) ) => (X s t a t u s OPEN)) (((NOT

(NOT

[X S t a t u s

FAILED])

[X S t a t u s O P E N ] ) ) => (X S t a t u s S U C C E E D E D ) ) )

AND-NODE

status]) (QSCC:

[ ( ( N O D E A) I

(A (subgoal

subgoal

subgoal)

X

NODE

subgoalof]) (QSCC:

[ ( ( N O D E A) |

(X s u b g o a l o f A) OR

(X (descendantof s u b g o a l o f ) A ) ) NODE

descendantof]) (QSCC:

[((STATUS

((X

S) I

( s i t u a t i o n a v a i l a b l e ) YES)

=>

(X s t a t u s SUCCEEDED)) ((X (subgoal s t a t u s ) SUCCEEDED) => (X S t a t u s SUCCEEDED)) ((X c i r c u l a r Y) => (X s t a t u s F A I L E D ) ) ) NODE

status]) (QSCC:

[ ( ( N O D E R) I

(X ( s i t u a t i o n

s i t u a t i o n o f ) R))

NODE

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED

160 Table I.

ORGANIC SYNTHESIS

Continued

repeated]) C) I (X repeated C) (X descendantof C))

(QSCC:

[((NODE NODE

(QSCC:

[ ( ( I N T E G E R I)

circular]) |

(X (subgoal merit) I ) (NOT [X (subgoal merit >=) I ] ) )

OR-NODE

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

merit]) (QSCC: [ ( ( I N T E G E R I ) | (X (subgoal merit) I ) (NOT [ I (>= meritof subgoalof) X])) AND-NODE

merit]) (* PRODUCTION RULES GUIDING SEARCH *) [INITIALIZE (IT (PSGRAPH P)) (INPUT (P goal G)) (IR (P goal G)) (IR (P opennodes G))] (G s t a t u s SUCCEEDED) => (OUTPUT G)(HALT) (G Status FAILED) => (OUTPUT G)(HALT) (G element Χ)(X c i r c u l a r Y) => (ASSERT (X Status FAILED)) (NOT [G trynode X]) => (ASSERT (G trynode (G g o a l ) ) ) (G trynode X)(NOT [X subgoal Y]) => (ASSERT (G trynode NIL))(SPROUT X)(EVALUATE X) (G trynode Χ) (X subgoal Y) (X (merit meritof) Y) => (ASSERT (G trynode Y)) (* End of Table I *)

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRiDHARAN

Chemical

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

thus c l e a r l y graph.

Synthesis

exhibiting

Planning

by

Computer

the AND/OR

161

nature

of the

Let us turn our a t t e n t i o n b r i e f l y to the SENSE DEFINITIONS which are s p e c i f i c a t i o n s of the L o g i c a l c o n d i t i o n s that are to be met f o r a s s e r t i n g the various relations and are at the same time s p e c i f i c a t i o n s of the computations to be performed i f the system i s given the r e s p o n s i b i l i t y to f i l l i n values for c e r t a i n r e l a t i o n s . Simple d e f i n i t i o n s are given f o r the OPENNODES and FAILEDNODES r e l a t i o n s of the PSGRAPH c l a s s . The OPENNODES are the set of a l l NODES Ν which are ELEMENTS of X (denoting the PSGRAPH) whose STATUS i s OPEN. The STATUS of the PSGRAPH i s defined to be the status of the goal node of the PSGRAPH w r i t t e n [(STATUS S) I (X (goal status) S ) ] . The nature of the AND-NODES and the OR-NODES i s c l e a r l y s p e l l e d out i n the d e f i n i t i o n s for the STATUS r e l a t i o n s on these c l a s s e s . The d e f i n i t i o n f o r the status of the AND-NODE may be paraphrased i n t o E n g l i s h as f o l l o w s : a) I f X includes a node whose status then the status of X i s also FAILED; b) I f X status i s not FAILED and X OPEN node then the status of X i s OPEN;

i s FAILED

includes

f i n a l l y , c) I f X i s neither OPEN nor FAILED i t must be SUCCEEDED.

an

then

The information displayed i n Table I i s the description the user provides to the Information-gathering system as the s p e c i f i c a t i o n of the Search Model. The system accepts the S t r u c t u r a l D e s c r i p t i o n s and sets up Data Structures and access functions f o r each of the r e l a t i o n s . The Sense D e f i n i t i o n s are analyzed t o compile a Network of Information Flow [Sridharan, 1976] that p r e s c r i b e s the data flow paths when a new piece of information i s made a v a i l a b l e to t h i s system. This much could be termed the "compile-time" a c t i v i t y of the system.

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC SYNTHESIS

162

(* TABLE I I *) (* STRUCTURAL DESCRIPTIONS *) (TDN: [SEARCHGRAPH (* This i s the search graph of FLEXI) (elements NODE) (goal NODE) (trynode RNODEISNODEIDNODEIFNODE)])

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

(TDN: [SNODE (* State Space Structure (Backward Search)) (node NODE snode) (subgoal OR-NODE) (merit MERIT) (status STATUS) (subgoal NODE) ( c i r c u l a r SNODE) (features FEATURE) (descendant SNODE)]) (TDN: [RNODE (* Planning Space Structure) (node NODE mode) (redgoal RNODE) ( d i f g o a l DNODE) (merit INTEGER meritof) (status STATUS) (relevantfeatures FEATURE)]) (TDN: [DNODE (* State Space Structure (Forward Search)) (node NODE dnode) (tonode NODE) (fromnode NODE) ( d i f f e r e n c e s FEATURE) (merit MERIT) (status STATUS)])

(TDN: [FNODE (* Non-Goal-Directed Forward Search) (node NODE fnode)

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRIDHARAN

Table II.

Chemical

Synthesis

Planning

by

Computer

163

Continued

(transform TRANSFORM appliedto) (reactivegroup FEATURE) (canproduce FNODE canbeproducedfrom) (merit MERIT meritof) (status STATUS)]) (TDN: [TRANSFORM

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

(* One i s needed f o r every growth) (appliedto FNODE transform) (product FNODE) (reagents MOLECULE)]) (* SAMPLE PRODUCTION RULES *) (ADDED (SNODE S)) => (ASSERT ((S node) fnode NIL)) (* No forward e x p l o r a t i o n i f S was given as a r e t r o s y n t h e t i c goal) (ADD (FNODE D))=> (FILLIN (INSTANTIATE (DNODE (fromnode &(N node)) (tonode &(N (canbeproducedfrom node dnode tonode)))))) (* I f Ν was generated by forward e x p l o r a t i o n t r y a s s e r t i n g a DNODE subgoal i f the information needed i s there) (* SENSE DEFINITIONS *) (QSCC: [((DNODE D) | (D tonode (X goalnode))) RNODE difgoal]) (QSCC: [((DNODE D) | ((X (node status) SUCCEEDED) => (X status SUCCEEDED)) (((X ( d i f g o a l status) SUCCEEDED) (X (redgoal status) SUCCEEDED)) => (X status SUCCEEDED)) (((X ( d i f g o a l status) FAILED)

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

164

COMPUTER-ASSISTED ORGANIC SYNTHESIS

OR (X (redgoal status) FAILED)) => (X status FAILED))) RNODE status])

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

(QSCC: [((NODE Y) | (X (difnodeof goalnode) Y)) DNODE tonode]) (QSCC: [((STATUS S) | ((X (fromnode canproduce* node) (X tonode)) => (X Status SUCCEEDED)) ( (X (goalnode goalnode descendant node) (X fromnode)) => (X Status SUCCEEDED))) DNODE status] ) (QSCC: [((STATUS S) | ( (X (node status) SUCCEEDED) => (X status SUCCEEDED)) ((X (subgoal status) SUCCEEDED) => (X Status SUCCEEDED))) SNODE status] ) (QSCC: [((STATUS S) | (((X (mode status) SUCCEEDED) OR (X (snode status) SUCCEEDED) OR (X ( s i t u a t i o n a v a i l a b l e ) YES)) => (X s t a t u s SUCCEEDED) ) ) NODE status])

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRiDHARAN

Chemical

Synthesis

Planning

by

Computer

165

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

The planning and problem s o l v i n g i s i n i t i a t e d and c o n t r o l l e d by a s e t of r u l e s that we s h a l l examine p r e s e n t l y . The i n i t i a l i z a t i o n i s s t r a i g h t f o r w a r d and involves c r e a t i n g an instance of PSGRAPH and f i l l i n g i t s goal node. This i n d i c a t e s that i t i s the s p e c i f i c a t i o n of the goal that t r i g g e r s the process. The s p e c i f i c a t i o n of the goal node involves submitting the s t r u c t u r e of the molecule to be synthesized. Turning our a t t e n t i o n away from the Rule s e t that c o n t r o l s the problem s o l v i n g process, l e t us consider the information gathering a c t i v i t y caused by the a d d i t i o n of a subgoal node f o r some operator explored by the search component of the system. This may be s p e c i f i e d as a conjunction of precursor molecules which are INCLUDED i n an AND-NODE. Consider the a c t i o n taken when one of the precursors i s asserted as a SUBGOAL of the GOAL node. The check of the c o n d i t i o n f o r the (NODE subgoalof NODE) r e l a t i o n i n d i c a t e s that t h i s AND-NODE needs to be introduced as a choice i n the OR-NODE pointed to by the GOAL node [ t h i s i s i n d i c a t e d by the expression (A (subgoal choice includes) X ) ] . The consequent a s s e r t i o n of the CHOICE r e l a t i o n flows along i t s data flow path to the STATUS relation of the OR-NODE i n question* It is appropriate to point out here that t h i s data flow l i n k was "compiled" when the d e f i n i t i o n of STATUS was scanned and i t was e s t a b l i s h e d then that any additions/changes to the CHOICE of an OR-NODE was to take e f f e c t i n turn on the STATUS r e l a t i o n . A v e r b a l d e s c r i p t i o n such as t h i s one cannot describe a l l the data flow that takes place but h o p e f u l l y the above explanation conveys the concept of the data flow and consequent " informations-gather ing" proceeding as per the s t r u c t u r a l and sense d e f i n i t i o n s of the Search Model given by the user f o r t h i s problem domain. At t h i s p o i n t , i f the reader w i l l grant that as the r e s u l t s o f the e x p l o r a t i o n conducted by the search component gets fed t o the Information-Gathering component the r e q u i s i t e search model w i l l be created or updated as appropriate, we can turn our a t t e n t i o n back again to the Search Guidance Rules. The Rules are w r i t t e n i n what i s known i n the computer science l i n g o as the "Production Rule Form" [Davis & King, 1975].

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

166

COMPUTER-ASSISTED ORGANIC SYNTHESIS

The production system takes a sequence c o n d i t i o n a l a c t i o n s p e c i f i c a t i o n s of the form:

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

RULE :

of

(Condition to check)=>(Action to take)

In applying one of these r u l e s the l e f t - h a n d side of the Rule i s f i r s t tested against the current s t a t e of the model and i f the t e s t i s s a t i s f i e d the a c t i o n s of the r i g h t side of the r u l e are performed. There are a v a r i e t y of r u l e sequencing methods conceivable, but we s h a l l use only the simplest of them here. The c o n t r o l s t a r t s from the beginning of the r u l e sequence and t r i e s each r u l e and c y c l e s back to the f i r s t r u l e a f t e r the l a s t r u l e i s t r i e d . The execution of a (HALT) i n some a c t i o n component of a r u l e terminates the e n t i r e process. The r u l e set for the PSG model i s very simple. I n i t i a l l y , the status of the PSGRAPH i s examined to see i f the process should terminate. The r u l e s w r i t t e n here s p e c i f y that i f the status of the PSGRAPH i s FAILED or SUCCEEDED then the graph i s output and the process h a l t s . Otherwise, I f there i s a node marked as a p o s s i b l e node to sprout, the goal node i s picked f i r s t . On the other hand, the presence of a trynode which has no subgoals i n d i c a t e s that the s e l e c t i o n i s completed and the a c t i o n to take i s to SPROUT the trynode and EVALUATE it. The responsibility of SPROUT is to choose a transformation, t e s t i t s a p p l i c a t b i l i t y and give the set of precursors to Modelling System. The Modelling System then updates the model and posts the tree h i e r a r c h y , the c i r c u l a r i t y and s t a t u s r e l a t i o n s . The evaluation a l s o submits i t s merit r a t i n g of the nodes to the Modelling System which i n t u r n , reassigns the merits of the a f f e c t e d AND-NODES and OR-NODES. The i m p l i c a t i o n s of the new information so entered are followed by the Modelling System and the information needed by the r u l e set i s provided i n a ready form. The control repeatedly flows in a SELECT-SPROUT-EVALUATE loop u n t i l h a l t e d . The r u l e set i s f l e x i b l e and can be changed i f a s p e c i f i c synthesis problem suggests a d i f f e r e n t form of c o n t r o l . I t i s not d i f f i c u l t to enter s y n t a c t i c guidance based on the model graph using the distance of a node from the g o a l , the number of conjuncts at an

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRiDHARAN

Chemical Synthesis Phnning by Computer

167

AND-NODE e t c . . I t i s also p o s s i b l e to guide the search based on chemical information, f o r example, to disregard subgoals involving a seven-membered heteratomic r i n g . I t i s conceivable that when the system runs i n t e r a c t i v e l y the guidance w i l l be changed and experimented with as the search proceeds.

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

SEARCH IN A PLANNING SPACE The s t r u c t u r e of the search space i s determined not only by the c o l l e c t i o n of transforms a v a i l a b l e to the system but also by the r u l e s f o r s e l e c t i n g the transforms to be t r i e d f o r any subgoal. There are two basic ways f o r s e l e c t i n g transforms. a) S e l e c t i o n by A p p l i c a b i l i t y . I f the transforms are selected because they guarantee that the target molecular s t r u c t u r e w i l l be produced upon their a p p l i c a t i o n , the required precursors become subgoals. This i s the method of s e l e c t i n g transforms by t h e i r a p p l i c a b i l i t y i n the r e t r o s y n t h e t i c d i r e c t i o n . The synthesis sequence grows a step at a time ensuring that the target molecular s t r u c t u r e w i l l be a product of the r e a c t i o n sequence developed so f a r , once the precursors are made available. The space of p o s s i b i l i t i e s determined by this criterion of a p p l i c a b i l i t y i s termed the STATE SPACE. b

) S e l e c t i o n by Relevance. In some cases a transform i s selected because i t produces a molecule only s i m i l a r to the target molecular s t r u c t u r e but not e x a c t l y the same. When such a transform i s applied to a target s t r u c t u r e T, i t may synthesize some s t r u c t u r e S that i s s i m i l a r to T, from a set of precursors P. This breaks the o r i g i n a l problem of synthesis i n t o two subproblems, i) Synthesize P. S(P) i i ) Transform S==>T. TR(S,T) The transform selected s p e c i f i e s a r e a c t i o n that converts Ρ to S, and t h i s transform w i l l , i n general, c o n s t i t u t e an intermediate step i n the synthesis sequence. The space of p o s s i b i l i t i e s determined by the c r i t e r i o n of relevance i s termed the PLANNING SPACE and i n several s i t u a t i o n s the search toward a

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC SYNTHESIS

168

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

Scheme I.

Reaction used in

transform

a.

TRANSFORM

SYNTHESIZE

b.

TRANSFORM

SYNTHESIZE Scheme II.

Pfonning

space

maneuvers

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRiDHARAN

Chemical

Synthesis

Planning

by

Computer

169

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

s o l u t i o n can be b r i e f e r i n t h i s space than i n the State Space. This d e f i n i t i o n of a Planning Space i s a v a r i a t i o n of the concept introduced i n the General Problem Solver system (GPS) [Ernst, 1969]. The d i s t i n c t i o n between the S e l e c t i o n of transforms by A p p l i c a b i l i t y and by Relevance i s an important one when considering the s t r a t e g i e s one might employ to search f o r synthesis sequences. The search i n a Planning Space has the c h a r a c t e r i s t i c that the search "leaps" i n t o some intermediate point i n the synthesis sequence and e s t a b l i s h e s an " i s l a n d " and the s o l u t i o n search could then proceed from the i s l a n d to the target molecule i n the forward d i r e c t i o n or from the i s l a n d backward i n the r e t r o s y n t h e t i c d i r e c t i o n toward a v a i l a b l e molecules. The s i g n i f i c a n c e of t h i s a b i l i t y to leap has been explored i n other task areas than synthesis search and has been found to be a powerful t o o l i n converging on s o l u t i o n s r a p i d l y . I t s u t i l i t y f o r synthesis search remains to be shown and for now can be i l l u s t r a t e d only i n terms of examples. The f o l l o w i n g sketch of an example i s o f f e r e d to i l l u s t r a t e the idea of planning space. The example has not been checked by any chemist and thus i t s chemical correctness cannot be assured. Wipke [Wipke, 1976] uses the example of a r e a c t i o n that synthesizes an a l c o h o l group i n 1,4 r e l a t i o n to an e l e c t r o n withdrawing group, say C=0, by the opening of epoxide by a s t a b i l i z e d i o n . Consider the target s t r u c t u r e shown i h Scheme I . The c r i t e r i o n of a p p l i c a b i l i t y would require that the target s t r u c t u r e contain the -C(OH)-C-COsubstructure and would not be a p p l i c a b l e i n the State Space search. In search conducted i n the Planning Space, i f the transform i n d i c a t e d i s considered relevant to the target s t r u c t u r e (e.g., i f the presence of the a l c o h o l and an unsubstituted 1,4 carbon i s s u f f i c i e n t ) then t h i s transform may be used. The o r i g i n a l synthesis problem i s replaced with two subproblems shown i n Scheme I I .

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC SYNTHESIS

170

The above example could be s u c c e s s f u l l y completed by working forward reducing the d i f f e r e n c e between S and T, and working backward i n s y n t h e s i z i n g P. COMBINING SEARCH IN PLANNING AND STATE SPACE The search i n a planning space should be conducted by t a k i n g repeatedly the synthesize S type problem for each Ρ s p l i t t i n g i t each time i n t o a 'Synthesize and a 'Transform' type problem, d e f e r r i n g a l l the Transform TR type problems t i l l some a v a i l a b l e molecular s t r u c t u r e i s reached along a path. This w i l l generate a S k e l e t a l Plan where many of the intermediate steps are l a c k i n g i n d e t a i l but each one i s given as a TR type problem. I f an e v a l u a t i o n function can designed for these s k e l e t a l plans much useless search can be avoided by using the planning space.

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

1

The search i n the s t a t e space i s conservative and takes small steps attempting to make steady progress towards completing a set of s o l u t i o n s . This can cause e i t h e r aimless wandering because small changes i n the merit values assigned to subgoals cause no s i g n i f i c a n t s h i f t s of a t t e n t i o n or because the changes i n the merit values cause l a r g e abrupt changes i n behaviour. This has been given the graphic name of the Mesa Phenomenon by Minsky [Minsky, 1963]. The planning space s t r u c t u r e s the solution sequendes q u i t e d i f f e r e n t l y and causes a more g o a l - d i r e c t e d search to proceed. The method of transform s e l e c t i o n allows the program to "leap" i n t o the s o l u t i o n sequence and decide upon one of the intermediate r e a c t i o n s and permits the s o l u t i o n to grow i n both d i r e c t i o n s . It appears that a j u d i c i o u s combination of both the State Space and Planning Space search methods might be able to overcome some of the d i f f i c u l t i e s found i n each of the methods [Amarel, 1969] With these c o n s i d e r a t i o n s i n view the next s e c t i o n introduces a framework i n which to combine the two spaces, l e t t i n g the chemist user supply the search guidance r u l e s customized to the p a r t i c u l a r problem at hand. FLEXI:

A FLEXIBLE ADVANCED SEARCH MODEL

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

7.

SRiDHARAN

Chemical

Synthesis

Planning

by

Computer

171

Given a c o l l e c t i o n of nodes designating molecules there are the f o l l o w i n g types of tasks one can generate (Table I I ) : a) For a given node whose molecule i s not yet synthesized, develop a synthesis by working backwards in the s t a t e space using a p p l i c a b l e transforms. b) For a given node whose molecule i s not yet synthesized, develop a synthesis by problem reduction using relevant transforms. c) For a given ordered p a i r of nodes develop a synthesis route that transforms one molecule to the other. d) For a given node execute a b r i e f non-goal-directed exploration forwards using reactions i n their conventional d i r e c t i o n s . A search model i s introduced here, c a l l e d FLEXI, in which four types of s t r u c t u r e s are used to symbolize the above four categories of tasks and these four nodes form the BUILDING BLOCKS of the search management model. Figure 2 shows that a NODE designates a molecule and i n d i v i d u a l l y can be set up as an SNODE for searching r e t r o s y n t h e t i c sequence, or as an RNODE f o r search for a planning route. The FNODE i s used t o s e t up the node f o r forward e x p l o r a t i o n without a goal guidance. The sprouting of an SNODE generates a piece of the f a m i l i a r AND/OR problem s o l v i n g graph and the status r e l a t i o n s on SNODE, OR-NODE and AND-NODE are posted s i m i l a r to that given e a r l i e r . The sprouting of an RNODE R l , see Figure 3, c o n s t i t u t e s a step i n the planning space and generates two tasks by problem reduction - an RNODE R2 and a DNODE D l . R2 c a l l s for the synthesis of a molecule by further problem reduction and the DNODE sets up a problem of transforming one molecule i n t o another. The transform used i n the sprouting of R l i s used to e s t a b l i s h that i t canproduce the fromnode N3 of the DNODE from the node N2 of R2. This i s e x h i b i t e d i n Figure 3. The new task set up i n the RNODE could of course be r e s t r u c t u r e d as an SNODE task by some r u l e i n the production system. The RNODE w i l l be considered successful as soon as the NODE connected t o i t has

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

172

COMPUTER-ASSISTED ORGANIC

SYNTHESIS

FNODE

Ο

fnode situation

NODE

MOLECULE

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

RNODE

SNODE Figure

2.

NODE

building

block

RNODE Rl node

redgoah

\difgoal

(RNODE R2 (

)DNODE D1

-K

) NODE N1 fnode

\snode

Ç)FNODE F3

(

)SNODE S2

SNODE SI

Figure

3.

The RNODE

building,

block

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRiDHARAN

Chemical Synthesis Planning by Computer

173

SUCCEEDED, whether by i t s RNODE or the SNODE tasks or even by i t s designating a molecule which i s a v a i l a b l e i n the Catalog of S t a r t i n g Compounds.

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

Under c e r t a i n circumstances a synthesis f o r the fromnode N3 of the DNODE could be attempted independently and i t s success w i l l d i s a l l o w the RNODE R2 from further c o n s i d e r a t i o n during search. In that case, the success of the DNODE i s s u f f i c i e n t to guarantee the success of RNODE R l and thereby of NODE Nl. The DNODE can succeed e i t h e r when a path i s found by working forwards from FNODE F2 t o F3 or by working backwards from SNODE S2 t o SI. Figure 4 i l l u s t r a t e s the canproduce relation among p a i r s of FNODES. Working forwards from F l i f a molecule M2 r e s u l t s from a transformation i n v o l v i n g Ml then the corresponding FNODE F l can produce F2. The s t r u c t u r e s e x h i b i t e d here are only the b u i l d i n g blocks of the search model. By s u i t a b l y c o n t r o l l i n g and guiding the e x p l o r a t i o n , the search can take on great v a r i e t y t r a v e r s i n g the s t a t e space forwards or backwards or t r a v e r s i n g the planning space. The r u l e s e t can be made to contain broad i n j u n c t i o n s such as "Do not explore a node i n the forward d i r e c t i o n i f i t was created by i n s t a n t i a t i n g a SNODE, i . e . a node to proceed i n the r e t r o s y n t h e t i c d i r e c t i o n " ' or "When you add an FNODE immediately i n s t a n t i a t e a revised DNODE type task" as shown i n Figure 5. Of course, these two r u l e s are used here only as examples and i n given s i t u a t i o n s one might wish to advise the system otherwise. The e s s e n t i a l point i s that a f l e x i b l e form of search guidance s p e c i f i c a t i o n i s a v a i l a b l e and can be used t o bring t o bear on a given problem a wide v a r i e t y of h i n t s , suggestions and advice that would be d i f f i c u l t i n a standard H e u r i s t i c Search program. Overcoming some d i f f i c u l t i e s of H e u r i s t i c Search. One cause of the Mesa Phenomenon i n the case of chemical synthesis i s the use of f u n c t i o n a l group substitution reactions while working i n the

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED

Ο

node

ORGANIC SYNTHESIS

FNODE F2 product

\

canproduce transform

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

FNODE Fl Figure

4.

a p p l i e d

FNODE

t 0

building

T

R

A

N

S

F

0

R

M

block

( A D D E D ( S N O D E S)) => ( A S S E R T ((S node) fnode N I L ) ) (* No forward exploration if S was given as a retrosynthetic goal) (ADDED (FNODE D)) => (FILLIN (INSTANTIATE ( D N O D E (fromnode &(N node)) (tonode & ( N (canbeproducedfrom node dnode tonode ) ) ) ) ) ) (* If Ν was generated by forward exploration try asserting a D N O D E subgoal if the information needed is there)

Figure

5.

Sample guidance

rules

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRiDHARAN

Chemical

Synthesis

Vlanning

by

Computer

175

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

retrosynthetic direction. I f a molecule containing several f u n c t i o n a l groups i s selected f o r sprouting then performing f u n c t i o n a l group s u b s t i t u t i o n s often yields precursors that have nearly the same merit as the target molecule. The presence of several f u n c t i o n a l groups only aggravates the s i t u a t i o n . Within FLEXI t h i s c l a s s of r e a c t i o n s could be used mainly f o r transforming a molecule to another when they are s t r u c t u r a l l y s i m i l a r , i . e . a DNODE type task which might be c a r r i e d out most favorably using f u n c t i o n a l group s u b s t i t u t i o n s . Thus, avoiding the use of these reactions i n the r e t r o s y n t h e t i c d i r e c t i o n should prevent the problem. As further s p e c i f i c problems are i s o l a t e d and solved, the framework of FLEXI may help us to submit the proper r u l e s of search guidance to the system. CONCLUSION The H e u r i s t i c Search method i s b a s i c a l l y very simple. I t involves a s e r i a l processor working backwards that s e l e c t s subgoals and transforms by asking "What next?". A good implementation of the h e u r i s t i c method includes a SEARCH MODEL which i s a symbolic representation of the progress of search. The Problem Solving Graph that i s commonly used to guide search i s a u s e f u l but l i m i t e d search model. Symbolization i s the key to Reasoning and the computer can reason only about things i t can handle s y m b o l i c a l l y . Furthermore, the richness of the search model c o n t r i b u t e s to the conduct of an i n t e l l i g e n t search. In t h i s paper a search modelling system i s described by examples. This system allows the user to describe rather than to program the search model and to associate c o n s t r a i n t s that govern the growth of the model. The system provides a Rule Language based on the user described search model and the user may then p r e s c r i b e r u l e s i n the form of Production Rules. The r u l e s the user w r i t e s can be general, being v a l i d over a wide v a r i e t y of task s p e c i f i c a t i o n s , or can be b i t s of advice and h i n t s about a given problem. The paper concludes by showing how some of the standard d i f f i c u l t i e s with h e u r i s t i c search such as g e t t i n g locked into a plateau can be overcome by s u i t a b l e

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC SYNTHESIS

176

techniques o f s e a r c h m o d e l l i n g . The m o d e l l i n g i d e a s shown h e r e c a n be used t o c o n t r o l and specify p r o t e c t i o n r e a c t i o n s and f o r sequencing f u n c t i o n a l group i n t r o d u c t i o n . I t i s also possible to carry out different s t y l e s o f e x p l o r a t i o n v a r y i n g t h e emphasis on t h e d i r e c t i o n a l i t y and space i n w h i c h t h e s e a r c h i s conducted. The applicability o f t h e model t o complement a h e u r i s t i c search process i s now made c l e a r , however i t s a c t u a l use i n c h e m i c a l s y n t h e s i s planning awaits the p a r t i c i p a t i o n of a chemist collaborator !

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

ACKNOWLEDGMENT I wish t o thank Prof. Srinivasan for h i s stimulating discussions of h i s modelling ideas r e p r e s e n t e d i n MDS w i t h o u t which my u n d e r s t a n d i n g o f MDS would be meager. A s u b s e t o f MDS adequate t o d e a l w i t h PSG and FLEXI i s b e i n g programmed and made a v a i l a b l e b u t so f a r no s y n t h e s i s problems have been t r i e d o u t i n t h e s e frameworks. The p r e s e n t system i s implemented i n t h e language FUZZY [ L e F a i v r e , 1974]. I look forward t o the continued cooperation of Prof. L e F a i v r e i n t h e f u t u r e and h i s h e l p i n t h e p a s t i s hereby g r a t e f u l l y acknowledged. I w i s h t o thank P r o f . S a u l Amarel f o r h i s e x p e r t a d v i c e upon r e a d i n g t h i s paper.

REFERENCES 1.

C h a r l e s Schmidt [1976] U n d e r s t a n d i n g Human Action: R e c o g n i z i n g t h e P l a n s and M o t i v e s o f Other P e r s o n s , i n Cognition and Social B e h a v i o r , J. Carroll & J. Payne (editors), Lawrence Earlbaum P r e s s , 1976.

2.

N.S. S r i d h a r a n [1975] The Architecture o f BELIEVER: A System for Intepreting Human Actions. T e c h n i c a l Report RUCBM-TR-46, Department o f Computer S c i e n c e , University, New Brunswick N J .

3.

Rutgers

N.S. S r i d h a r a n [1976] The Architecture of BELIEVER-Part II. The Frame and Focus Problems in AI. T e c h n i c a l Report RUCBM-TR-47, Department o f Computer S c i e n c e , R u t g e r s University, New Brunswick N J .

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

7. SRIDHARAN

Chemical Synthesis Planning by Computer

177

4.

Chitoor Srinivasan [1973] The Architecture o f a Coherent I n f o r m a t i o n System. Advanced P a p e r s o f t h e T h i r d International Joint C o n f e r e n c e on Artificial Intelligence, S t a n f o r d , 1973.

5.

Chitoor Srinivasan [1976] An I n t r o d u c t i o n t o t h e M e t a - D e s c r i p t i o n System. T e c h n i c a l Report SOSAP-TR-18, Department o f Computer Science, Rutgers University, New Brunswick NJ.

6.

N.S. S r i d h a r a n [1974] A Heuristic Program t o D i s c o v e r S y n t h e s e s f o r Complex O r g a n i c M o l e c u l e s . P r o c e e d i n g s o f the IFIP74 (International Federation for Information Processing) C o n g r e s s , S t o c k h o l m , August 1974.

7.

N.S. S r i d h a r a n [1971] An Application of Artificial Intelligence t o the D i s c o v e r y o f Complex O r g a n i c S y n t h e s i s . D o c t o r a l Dissertation, Department o f Computer S c i e n c e , SUNY a t Stony B r o o k , 1971.

8.

N.S. S r i d h a r a n [1973] Search Strategies f o r the t a s k o f O r g a n i c C h e m i c a l S y n t h e s i s . Advanced Papers o f the T h i r d International Joint Conference on Artificial Intelligence, Stanford, 1973.

9.

H. G e l e r n t e r [1973] The D i s c o v e r y o f O r g a n i c S y n t h e t i c Routes by Computer. T o p i c s i n C u r r e n t C h e m i s t r y , Volume 41, S p r i n g e r - V e r l a g , 1973.

10.

H e r b e r t G e l e r n t e r [1976] Empirical E x p l o r a t i o n s o f SYNCHEM, An Application o f Artificial Intelligence t o the Problem o f Computer-Directed Organic S y n t h e s i s D i s c o v e r y . This volume. 11.

E . J . Corey & W.T. Wipke [1969] C o m p u t e r - a s s i s t e d D e s i g n o f Complex O r g a n i c S c i e n c e , Volume 166 (178), 1969.

Synthesis,

12.

Todd Wipke [1973] C o m p u t e r - A s s i s t e d Three D i m e n s i o n a l S y n t h e t i c Analysis. I n Computer R e p r e s e n t a t i o n and M a n i p u l a t i o n o f C h e m i c a l I n f o r m a t i o n . W.T. Wipke et. al. (editors), John W i l e y (New York) 1974.

13.

Todd Wipke [1976]

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

178

COMPUTER-ASSISTED ORGANIC SYNTHESIS S E C S — S i m u l a t i o n and evaluation o f C h e m i c a l S y n t h e s i s : S t r a t e g y and P l a n n i n g . T h i s volume.

14.

I v a r U g i [1976] The S y n t h e t i c D e s i g n Program MATSYN as a p a r t o f MATCHEM, A System o f Computer Programs f o r t h e D e d u c t i v e Solution o f C h e m i c a l P r o b l e m s . T h i s volume.

15.

Nils Nilsson [1971] Problem S o l v i n g Methods in McGraw Hill (New York) 1971.

Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

16.

Artificial

A l l e n N e w e l l & H e r b e r t Simon [1972] Human Problem Solving. Prentice-Hall 1972.

Intelligence.

(New J e r s e y )

17.

H e r b e r t G e l e r n t e r [1962] Machine-Generated Problem S o l v i n g Graphs. P r o c e e d i n g s o f a Symposium on t h e M a t h e m a t i c a l Theory o f Automata Polytechnic Institute o f B r o o k l y n P r e s s (New York) 1962. 18. James Slagle [1971] Artificial Intelligence: The Heuristic Programming Approach. M c G r a w - H i l l , New Y o r k , 1971. 19.

R a n d a l l D a v i s and J o n a t h a n K i n g [1975] An Overview o f P r o d u c t i o n Systems, S t a n f o r d Artificial Intelligence L a b o r a t o r y Memo AIM-271, October 1971.

20.

George E r n s t & Allen N e w e l l [1969] GPS: A Case Study in Generality and Problem S o l v i n g . Academic P r e s s (New York), 1969.

21.

M a r v i n M i n s k y [1963] S t e p s Toward Artificial Intelligence. I n Computers and Thought, by Edward Feigenbaum and Julian Feldman (editors), M c G r a w - H i l l (New York) 1963.

22.

S a u l Amarel [1969] P r o b l e m - S o l v i n g and D e c i s i o n - M a k i n g by Computer: An Overview. In Cognition: A Multiple V i e w , G a r v i n (editor), S p a r t a n Books (Washington) 1970.

23.

R i c h a r d L e F a i v r e [1974] F u z z y P r o b l e m - S o l v i n g , Ph.D. Dissertation, Computer S c i e n c e Department, University o f W i s c o n s i n , Madison, 1974. The R e p r e s e n t a t i o n o f F u z z y Knowledge, J o u r n a l o f C y b e r n e t i c s , volume 4, p57-66, 1974.

Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.