1. LHASA: Logic and Heuristics Applied to Synthetic Analysis

Downloaded by TOBB UNIV ECON & TECH on December 20, 2014 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

DAVID A. PENSAK, Central Research and Development Dept., E. I. du Pont de Nemours and Co., Wilmington, Del. 19898
E. J. COREY, Dept. of Chemistry, Harvard University, Cambridge, Mass. 02138

In Computer-Assisted Organic Synthesis; Wipke, W., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Despite the wealth of knowledge about various chemical reactions, there exists no formal framework of interrelationships to guide the chemist in the synthesis of even moderately complex molecules. The LHASA (Logic and Heuristics Applied to Synthetic Analysis) project is an attempt to codify and organize the techniques used in organic synthesis. One important aspect of the project has been the writing of a general purpose computer program which will aid the laboratory chemist and will employ both the basic and more complex techniques for synthetic design as elucidated by this study. The program (hereafter also called LHASA) is intended to propose a variety of synthetic routes to whatever molecule it is given. The responsibility for final evaluation of the merit of the routes lies with the chemist. The program is to be an adjunct to the laboratory chemist as much as any analytical tool.

Since LHASA is incapable of proposing any routes the chemist could not have thought of by himself, i.e., it does not suggest new reactions that have never been tried, there needs to be some justification for the massive effort involved in writing the program. It is well known that humanity and creativity bring with them certain unavoidable shortsightedness and prejudices. There will be particular reactions with which each chemist is most familiar and it is to these that he will first look to find his synthetic route(s). It is precisely this failure to consider all possible routes that makes a program like LHASA both useful and necessary. Computers are well known for their ability to perform rote tasks a great number of times without complaint. Examination of potential synthetic pathways may be broken down into sufficiently small steps as to be amenable to computer implementation.

How should one go about designing a synthesis? One of the most basic techniques is to work backwards, the final product or target molecule being the ultimate goal. This target is examined to find any or all compounds which can be transformed into it in a single chemical step. Each of these precursors may then be similarly analyzed until satisfactory starting materials are obtained. This method of analysis is called retrosynthetic (or, equivalently, antithetic). A structural modification that is being performed in the retrosynthetic direction is called a transform and is graphically depicted as a double arrow.

[Structure diagram: example of a transform, drawn with a double arrow]

When applied in its most general form, retrosynthetic analysis could be applied to every precursor of the target molecule and then, in turn, to each of the new structures. The graphical representation of such an analysis is called a synthesis tree. (A complex example is shown below.) It is worthwhile to note that structures tend to be less complex the further away they are from the target molecule and that no constraints are placed on the choice of reactions. The starting materials are the last to be generated, thereby maintaining flexibility in the choice of route until the end of the analysis.

[Figure: synthesis tree as displayed by LHASA. Menu labels on the display: WRITE-THRU, MAKE, KILL, SAVE, RESTORE, TYPE NODE (-LINEAGE, -FAMILY), GET NODE (-LINEAGE, -FAMILY), SCOPE1, PROCESS, RESTART, MENU2, -ALL]
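The backward expansion into a synthesis tree can be illustrated with a short Python sketch. This is only an illustration of the idea, not LHASA's FORTRAN implementation: the transform table, the structure names, and the functions `build_tree` and `count_routes` are all hypothetical.

```python
from dataclasses import dataclass, field
from math import prod

# Hypothetical transform table: each structure maps to the precursor sets
# that yield it in a single synthetic step (names are toy labels).
TRANSFORMS = {
    "target": [("enone", "diene"), ("keto-alcohol",)],
    "keto-alcohol": [("ketone", "aldehyde")],
}
STARTING_MATERIALS = {"enone", "diene", "ketone", "aldehyde"}


@dataclass
class Node:
    structure: str
    # One entry per applicable transform; each entry lists the precursor
    # nodes that the transform generates.
    branches: list = field(default_factory=list)


def build_tree(structure, depth=0, max_depth=5):
    """Expand a structure backwards until starting materials are reached."""
    node = Node(structure)
    if structure in STARTING_MATERIALS or depth >= max_depth:
        return node
    for precursors in TRANSFORMS.get(structure, []):
        node.branches.append(
            [build_tree(p, depth + 1, max_depth) for p in precursors]
        )
    return node


def count_routes(node):
    """Count complete routes; dead ends (no transform applies) count as 0."""
    if node.structure in STARTING_MATERIALS:
        return 1
    return sum(prod(count_routes(p) for p in branch) for branch in node.branches)


tree = build_tree("target")
print(count_routes(tree))
```

Because the starting materials are generated last, every branch stays open until the leaves are reached, which is the flexibility the text describes.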

Of considerable importance to the way a synthesis is analyzed is the detailed plan of execution. The ordering of the addition (or removal) of individual functionalities, the manipulation of stereocenters, and the closure of rings can be of crucial importance in terms of interferences or competing reactions. The procedures for choosing the sequence in which these discrete steps are performed are called strategies. At present there are three broad categories of synthetic strategy that LHASA is capable of employing. They are

1) Opportunistic or Functional Group Based Strategies
2) Strategic Bond Disconnections for Polycyclic Targets
3) Strategies Based on Structural Features
   a) Appendages
   b) Rings (small, common, medium-large)
   c) Masked Functionality

The aim of the LHASA project has been the creation of a computer program which employs the strategies gleaned from the study of synthetic design. Such a program now exists, though it is continually undergoing modification and expansion as new strategies are elucidated and implemented. The remainder of this paper describes the overall organization of the LHASA program and the implementation of these strategies. Particular attention is paid to those aspects of LHASA which are of special interest to synthetic chemists or to computer scientists working in chemical areas.

ORGANIZATION OF LHASA

The LHASA program is exceedingly complex - about 400 subroutines, 30,000 lines of FORTRAN code and a data base of over 600 common chemical reactions. To describe it in detail is well beyond the scope of this paper. A general overview is relevant as it puts the functions of the data base in a reasonable perspective. Figure 1 shows a global view of LHASA. (1)

(1) See for example Corey, E. J., W. J. Howe and D. A. Pensak, J. Amer. Chem. Soc., 96, 7724 (1974). Corey, E. J., Quart. Rev. Chem. Soc., 25, 455 (1971). Corey, E. J., W. T. Wipke, R. D. Cramer III and W. J. Howe, J. Amer. Chem. Soc., 94, 421 (1972). Corey, E. J., W. L. Jorgensen, J. Amer. Chem. Soc., 98, 189 (1976).

[Figure 1: block diagram of LHASA. Block labels: EXECUTIVE, GRAPHICS, PERCEPTION, CHEMISTRY PACKAGES, STRUCTURE MANIPULATION, TRANSFORM EVALUATION, CHEMISTRY DATA BASE. Caption text as extracted: "Sample recognition questions"]

GRAPHICS

There is no doubt that the part of LHASA which makes the most immediate impact on chemists is the graphics. The chemist may draw in the structure that he is interested in analyzing using standard chemical conventions, and all communication from the program to him is via structural diagrams. This manner of interfacing with the chemist-user was chosen because it has been shown that the rate at which he can assimilate chemical data is maximized if it is in the notation with which he is most familiar.

As the chemist sits facing the CRT (cathode ray tube) through which he communicates with LHASA, on his right is a digitizing data tablet. This is a device which measures the two dimensional coordinates of the stylus or pen which is used for inputting structures. As the stylus is moved along the surface of the tablet, LHASA is told by the tablet where the stylus touched the surface and where it had moved to when it was lifted. A line is displayed on the CRT between these two points exactly like a bond being drawn on a sheet of paper. The LHASA graphics routines recognize that two atoms are required to make this bond and make appropriate internal entries in the program data base. Conforming to standard structure conventions, an atom is carbon unless otherwise indicated and sufficient hydrogens are assumed to fill out valence. In short, LHASA graphics let the chemist enter the structure exactly as he would draw it on a sheet of paper.

When a multiple bond is desired, it is necessary only to trace over the single bond one (or two) additional times. LHASA responds by redrawing the bond as double (or triple). Indicating stereochemistry is essentially the same: the appropriate indicator (wedged or dotted) is chosen and the desired bond is traced with the stylus (see below).

[Figure: structure input display. Menu labels as far as they can be read: STORE, END, REPLAY, SCAN, SCOPE2, PROCESS, DRAW, MOVE, DELETE, WIPE, STEREO, MENU2, UP, DOWN, LEFT, RIGHT, SMALL, SINGLE-CRT, ATOMNO, BONDNO, and atom symbols H, O, N, C, P, S, F, Cl, Br, I, X]
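The input conventions described above (atoms default to carbon, retracing a bond raises its order, hydrogens are implied by valence) can be sketched as a small connection table. This is an illustrative Python sketch only; the class `Sketch`, its method names, and the `VALENCE` table are assumptions and do not reflect LHASA's internal data structures.

```python
# Assumed normal valences for a few elements (illustrative only).
VALENCE = {"C": 4, "N": 3, "O": 2, "H": 1}


class Sketch:
    """Toy connection table built up from stylus strokes."""

    def __init__(self):
        self.atoms = []   # element symbols; list index is the atom number
        self.bonds = {}   # frozenset({i, j}) -> bond order (1, 2 or 3)

    def add_atom(self, element="C"):
        # An atom is carbon unless otherwise indicated.
        self.atoms.append(element)
        return len(self.atoms) - 1

    def trace_bond(self, i, j):
        # The first stroke creates a single bond; tracing over it again
        # raises the order to double, then triple (capped at 3).
        if i == j:
            return
        key = frozenset((i, j))
        self.bonds[key] = min(self.bonds.get(key, 0) + 1, 3)

    def implicit_hydrogens(self, i):
        # Enough hydrogens are assumed to fill out the valence.
        used = sum(order for key, order in self.bonds.items() if i in key)
        return max(VALENCE[self.atoms[i]] - used, 0)


s = Sketch()
a, b = s.add_atom(), s.add_atom()
s.trace_bond(a, b)
s.trace_bond(a, b)   # traced a second time: now a double bond
print(s.bonds[frozenset((a, b))], s.implicit_hydrogens(a))
```

Only connectivity and bond order are stored here, which matches the statement that the drawn geometry does not matter to the analysis.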

Considerable effort was expended to ensure that no artistic talent is required in structural input. A reasonable amount of inaccuracy is permitted in pointing to an atom or bond. LHASA determines what was intended and acts accordingly. There is no need to worry about consistency or precision of bonds or angles. With the one exception of indicating cis-trans isomerism around double bonds, it makes no difference how distorted the structure is when input. LHASA works solely from information about connectivity. This flexibility is quite useful for drawing structures in a particular conformation. The program will process it correctly and display all offspring in the same conformation.

The only other times that the chemist must communicate with LHASA are when decisions are to be made about which structure is to be analyzed and what method is to be used. In all cases a table of choices (or menu) is displayed and the chemist merely points to the one (or ones) that he wishes. A sample menu is shown below.

[Figure: sample menu. Entries as extracted: SINGLE GROUP, GROUP PAIR, DEBUG, FULL SEARCH, TREE, BOND MODE, EXIT, NARROW MODE, SUBGOALS, MANUAL MODE, KEY SUBSTRUCTURES, SEQUENTIAL FGI, DIELS-ALDER, ISOLATED STRAT BOND, APPENDAGE CHEMISTRY, ROBINSON k+2, RING APPNDG ONLY, DISCONNECTIVE, ROBINSON X-3, BRANCH APPNDG ONLY, RECONNECTIVE, UNMASKING, SMALL RINGS, PERCEPTION ONLY, STEREOSPECIFIC C=C]

At no time is he forced to learn any special command formats or memorize lists of options. The LHASA graphics and control structures were specifically designed to be as easy and natural for the chemist to use as possible.

PERCEPTION

Inherent in a structural diagram is a wealth of information - rings, functional groups, stereochemistry, etc. To be as effective as possible, LHASA must recognize all of these and utilize them in its planning processes. The procedures by which these are recognized are called perception.

By virtue of having very sophisticated perception routines, LHASA can avoid forcing the chemist to input any artificial (to him) information. An unexpected adjunct of this has been a guarantee of perceptual completeness. For example, consider the following structure (the non-indole portion of the alkaloid ajmaline).

[Structure diagram: the non-indole portion of ajmaline]

There are three six-membered rings, one five-membered ring and one seven-membered ring, yet few chemists perceived all of them. If the structure were redrawn as

[Structure diagram: the same structure redrawn]

a different, though still incomplete, set of rings is recognized by the human. The point of this example is that LHASA must perceive rings solely on the basis of connectivity, not how the structure is drawn. The program would be useless if it missed syntheses based on cyclic substructures because the chemist had failed to indicate all the rings in the molecule.

RING PERCEPTION

Many researchers have attacked the problem of finding the set of cycles in a graph. (2) Their work has primarily been directed towards identifying the smallest set of rings in the network. For chemical purposes, however, this is not sufficient. For example, the structure below

[Structure diagram: Diels-Alder addition forming a six-membered ring]

is best synthesized by the Diels-Alder addition shown, but the six-membered ring formed is not part of the minimal cyclic basis of the molecule. It is necessary, therefore, to redefine our problem as that of finding the set of cycles in a graph which are of chemical significance.

For synthetic purposes, rings must be split into two classes - real and pseudo. For each bond in a molecule, the smallest ring containing that bond is called a real ring. It is quite possible that the number of real rings will be greater than the cyclic order of the molecule (number of bonds - number of atoms + 1). Pseudo rings are the pairwise envelopes of real rings with the restriction that the size of the envelope be seven or less. Real rings are useful because the chemistry of a bond is best reflected by the size of the smallest ring containing it - for example, the fusion bond in the structure below.

(2) See for example Paton, K., Commun. Assoc. Comput. Mach., 12, 514 (1969).


Pseudo rings are useful because they are the rings which are often formed in the construction of bridged molecules.
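The definitions of real and pseudo rings can be made concrete with a short Python sketch. It is a minimal illustration under stated assumptions, not the LHASA algorithm: the smallest ring through a bond is found by a breadth-first search that avoids that bond (ties between equally small rings are broken arbitrarily), and the "envelope" of two real rings is taken here to be the symmetric difference of their bond sets, which is an interpretation of the text. The function names and the norbornane-like test skeleton are hypothetical.

```python
from collections import deque
from itertools import combinations


def smallest_ring(adj, u, v):
    """Bonds of one smallest ring containing bond u-v, or None if acyclic."""
    parent = {u: None}
    queue = deque([u])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            # Skip the bond u-v itself and atoms already reached.
            if (a == u and b == v) or b in parent:
                continue
            parent[b] = a
            if b == v:
                ring = {frozenset((u, v))}
                while parent[b] is not None:
                    ring.add(frozenset((b, parent[b])))
                    b = parent[b]
                return frozenset(ring)
            queue.append(b)
    return None


def real_rings(bonds):
    """Set of real rings: the smallest ring found for each ring bond."""
    adj = {}
    for i, j in bonds:
        adj.setdefault(i, []).append(j)
        adj.setdefault(j, []).append(i)
    rings = {smallest_ring(adj, i, j) for i, j in bonds}
    rings.discard(None)
    return rings


def pseudo_rings(rings, max_size=7):
    """Pairwise envelopes of real rings that share a bond (assumed definition)."""
    found = set()
    for r1, r2 in combinations(rings, 2):
        envelope = r1 ^ r2
        if r1 & r2 and 0 < len(envelope) <= max_size and envelope not in rings:
            found.add(envelope)
    return found


# Bicyclo[2.2.1]heptane skeleton: bridgeheads 0 and 3, one-atom bridge 6.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 3)]
atoms = {a for bond in bonds for a in bond}
cyclic_order = len(bonds) - len(atoms) + 1

real = real_rings(bonds)
pseudo = pseudo_rings(real)
print(cyclic_order, sorted(len(r) for r in real), sorted(len(r) for r in pseudo))
```

In this skeleton the two five-membered rings are the real rings, and the six-membered ring that a chemist would draw first appears only as their envelope, i.e. as a pseudo ring.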


STRATEGIC BONDS - CYCLIC

There are usually certain bonds in a molecule whose disconnection in the retrosynthetic direction leads to a significant simplification of the cyclic structure. These are termed strategic bonds. Since these have been described in detail elsewhere, we shall consider them here only briefly. (3)

The first premise of strategic bonds is that the chemical activity of a bond is a direct function of the size of the smallest ring containing it. This leads to the requirement that a strategic bond must be in a ring of five, six, or seven members and not in or exo to a cyclopropyl ring. A strategic bond must also be in the ring (if any) with the maximum number of bridges on it. This ensures that its disconnection will simplify the cyclic network as much as possible. The structure below shows the power of this heuristic.

[Structure diagram: bridged polycyclic example]

(3) Corey, E. J., W. J. Howe, H. W. Orf, D. A. Pensak and G. A. Petersson, J. Amer. Chem. Soc., 97, 6116 (1975).


Other restrictions on strategic bonds prevent them from being aromatic and from leaving chiral side chains. These requirements are also based on current chemical technique.


FUNCTIONAL GROUP PERCEPTION

Functionality in molecules is well defined. There is no argument about whether a group is a ketone or not. The problem in LHASA is to perceive all groups and juxtapositions of groups which can be chemically meaningful. To this end, a context dependent grammar has been developed to unambiguously represent the physical domain of a group and the site(s) at which it can be expected to react. This grammar defines which atoms in the group are considered origins. Each functional group is characterized by at least one carbon atom which is its attachment to the rest of the molecule. This is called an origin atom and it is around these origins that many of the data tables in LHASA are organized. As an example, an alpha-beta unsaturated ketone has two olefinic positions with significantly different affinity to electrophilic reagents. To consider the double bond as one group with constant reactivity is an unreasonable simplification, but recognizing it as two origin atoms, each with olefinic character, makes suitable differentiation possible.

To accomplish this recognition efficiently, a table driven approach was chosen since the types of groups to be recognized change from time to time, depending on the needs of those using the program (sixty-four different groups are currently recognized by LHASA, see Table 1). The table consists of a series of questions which can be answered with either a "yes" or a "no". Depending on the answer, the type of the next question to be asked is specifically determined.

Table 1

KETONE          ALDEHYDE        ACID            ESTER
AMIDE*1         AMIDE*2         AMIDE*3         ISOCYANATE
ACID*HALIDE     THIOESTER       AMINE*3         AZIRIDINE
AMINE*2         AMINE*1         NITROSO         DIAZO
HALOAMINE       HYDRAZONE       OXIME           IMINE
THIOCYANATE     ISOCYANIDE      NITRILE         AZO
HYDROXYLAMINE   NITRO           AMINEOXIDE      THIOL
EPISULFIDE      SULFIDE         SULFOXIDE       SULFONE
C*SULFONATE     LACTAM          PHOSPHINE       PHOSPHONATE
EPOXIDE         ETHER           PEROXIDE        ALCOHOL
NITRITE         O*SULFONATE     FLUORIDE        CHLORIDE
BROMIDE         IODIDE          DIHALIDE        TRIHALIDE
ACETYLENE       OLEFIN          HYDRATE         HEMIKETAL
KETAL           HEMIACETAL      ACETAL          AZIDE
DISULFIDE       ALLENE          LACTONE         VINYLW
VINYLD          ESTERX          AMIDZ

For example, suppose a carbon-nitrogen triple bond has been found. The questions would look like

    C≡N-C ?       yes -->  isocyanide
      | no
    C-S-C≡N ?     yes -->  thiocyanate
      | no
      v
    nitrile

The actual data table which drives this recognition process is reproduced below. Each row gives a label followed by its two successor entries (LOC or NULL).

A22   LOC A23   LOC A24
A23   NULL      NULL
A24   LOC A25   LOC A26
A25   NULL      NULL
A26   LOC A27   LOC A28
A27   LOC A25   LOC A28
A28   LOC A29   LOC A32
A29   LOC A30   NULL
A30   LOC A31   LOC A31
A31   NULL      NULL
A32   LOC A33   LOC A33
A33   LOC A34   LOC A43
A34   LOC A35   LOC A35
A35   LOC A36   LOC A37
A36   NULL      NULL
A37   LOC A38   NULL
A38   LOC A39   LOC A39
A39   LOC A40   LOC A40
A40   LOC A41   LOC A42
A41   NULL      NULL
A42   NULL      NULL
A43   LOC A44   LOC A45
A44   NULL      NULL
A45   LOC A46   LOC A66
A46   LOC A47   LOC A47
A47   LOC A48   LOC A49
A48   NULL      NULL
A49   LOC A50   LOC A57
A50   LOC A51   NULL
A51   LOC A52   LOC A52
A52   LOC A53   LOC A53
A53   LOC A54   LOC A56
A54   LOC A55   LOC A55
A55   LOC A56   LOC A56
A56   NULL      NULL
A57   LOC A58   NULL
A58   LOC A59   LOC A59
A59   LOC A60   LOC A60
A60   LOC A61   LOC A64
A61   LOC A62   LOC A62
A62   LOC A63   LOC A63
A63   NULL      NULL
A64   LOC A65   LOC A63
A65   NULL      NULL

Operation column, in the order extracted (its row-by-row alignment with the labels above could not be recovered): SHIFT + IF CARBON*COUNT IS TWO, IDENTIFIED AS KETONE, IF HYDROGEN*COUNT IS TWO, IDENTIFIED AS ALDEHYDE, IF HYDROGEN*COUNT IS ONE, IF CARBON*COUNT IS ONE, SEARCH FOR C**N, SHIFT + SEARCH FOR C*N NONORIGIN ENTRY, IDENTIFIED AS ISOCYANATE, ENTRY BOND*SHARED, SEARCH FOR C*O, BOND*SHARED SHIFT +, IF HYDROGEN*COUNT IS ONE, IDENTIFIED AS ACID, SEARCH FOR C*O SHIFT + NONORIGIN BOND*SHARED SHIFT +, IF IN RING OF ANY SIZE, IDENTIFIED AS LACTONE, IDENTIFIED AS ESTER, SEARCH FOR C*X, IDENTIFIED AS ACID*HALIDE, SEARCH FOR C*N, BOND*SHARED SHIFT +, IF HYDROGEN*COUNT IS TWO, IDENTIFIED AS AMIDE*1, SHIFT + IF IN RING OF ANY SIZE, SHIFT + SEARCH FOR C*N SHIFT + NONORIGIN SHIFT + BOND*SHARED, SEARCH FOR C*N SHIFT + NONORIGIN BOND*SHARED, IDENTIFIED AS LACTAM, IDENTIFIED AS LACTAM, BOND*SHARED SHIFT + NONORIGIN SHIFT + SEARCH FOR C*N BOND*SHARED SHIFT + NONORIGIN, IDENTIFIED AS AMIDE*3, IF HYDROGEN*COUNT IS ONE, IDENTIFIED AS AMIDE*2
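The table-driven scheme can be sketched in Python as a small interpreter that follows "yes" and "no" links until it reaches an identification. This is a simplified illustration, not LHASA's table format: the row labels, the fact names such as `n_attached_to_carbon`, and the function `recognize` are hypothetical, and the SHIFT, SEARCH, NONORIGIN and BOND*SHARED operations of the real table are not modeled.

```python
# Each row: label -> (kind, argument, next label if yes, next label if no).
# The rows encode the carbon-nitrogen triple bond example from the text.
TABLE = {
    "Q1": ("test", "n_attached_to_carbon", "R1", "Q2"),
    "R1": ("identify", "ISOCYANIDE", None, None),
    "Q2": ("test", "c_attached_to_sulfur", "R2", "R3"),
    "R2": ("identify", "THIOCYANATE", None, None),
    "R3": ("identify", "NITRILE", None, None),
}


def recognize(table, start, facts):
    """Follow yes/no links from `start` until a group is identified."""
    label = start
    while label is not None:
        kind, argument, if_yes, if_no = table[label]
        if kind == "identify":
            return argument
        # A missing fact is treated as the answer "no".
        label = if_yes if facts.get(argument, False) else if_no
    return None  # the chain ended on a NULL link without an identification


print(recognize(TABLE, "Q1", {"c_attached_to_sulfur": True}))
```

Because the questions live in data rather than in code, recognizing a new group only requires adding rows, which is the reason the text gives for choosing a table-driven design.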


FUNCTIONAL GROUP REACTIVITY

Frequently, during the experimental realization of a synthetic plan, certain functional groups will interfere with the performance of desired reactions. When this happens, it becomes necessary to protect the offending group (reversibly modify it to some other functionality that is stable to the reaction conditions). The extension of computer assisted synthetic analysis to sophisticated levels necessitates the detection of possible interferences. Such situations must be presented to the chemist in a generally useful manner.

This problem has been attacked in LHASA by the separation of functional groups into different classes based on their electronic and steric environment. At the same time a library of standard reagents (currently 60) has been prepared containing the stability of each of the classes of functional groups to each reagent. By this mechanism the program can decide whether groups of an identical or similar type will interfere with the transform. For example, in the structure below it is possible to selectively hydrogenate bond A in the presence of B.

From a computational point of view, functional group reactivity is straightforward. Associated with each group origin is a number which unambiguously defines the environment of the origin. The factors encoded include steric hindrance and accessibility, strain, and electronic environment. It is important to note that a group can be in several subclasses simultaneously, all of which must be encoded into the one number. This number is used to assign reactivity levels to each origin relative to each reagent.


At the time of attempted reaction execution, the program reads the desired conditions from the data base. The reactivity levels of all non-participating groups are examined. If all are less reactive than the participating group(s), then the transform is allowed to proceed. If this is not the case, the offending group(s) is examined to see if it is generally protectable. If it is, then a solid rectangle is drawn around the group as the transform is displayed to the chemist. Unstable, unprotectable groups are graphically indicated by a dashed box around their bonds.
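The screening step just described can be sketched as follows. This is a minimal Python illustration under assumptions: reactivity levels are taken to be plain integers (higher means more reactive toward the reagent), a single participating level is used, and the function name `screen_transform` and the group names are hypothetical.

```python
def screen_transform(participating_level, bystanders):
    """Flag non-participating groups that would interfere with a transform.

    bystanders: iterable of (name, reactivity_level, protectable) tuples.
    Returns a dict mapping each offending group to the way it is marked
    on the display; an empty dict means the transform may proceed as is.
    """
    marks = {}
    for name, level, protectable in bystanders:
        if level < participating_level:
            continue  # less reactive than the reacting group: no interference
        # Offending group: solid box if protectable, dashed box otherwise.
        marks[name] = "solid box" if protectable else "dashed box"
    return marks


groups = [("ester", 2, True), ("aldehyde", 7, True), ("epoxide", 6, False)]
print(screen_transform(5, groups))
```

Note that the sketch only marks offending groups; as the text explains, choosing a specific protecting group is left to the chemist.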

LHASA does not try to assign specific protecting groups. There is just so much chemical detail that would have to be programmed that the interactive aspect of LHASA would be severely degraded. An additional problem entails evaluating when in the synthetic route it would be best to protect and then deprotect the group(s). The program currently deals with the synthesis tree on a node by node basis. A global optimization of the individual steps in the tree is one additional level of sophistication which has not yet been attempted.

APPENDAGE BASED STRATEGIES

The vast majority of multistep syntheses involve either the disconnection, reconnection, or modification of what are loosely called 'appendages'. One particularly useful retrosynthetic strategy consists of fragmenting a ring and then disconnecting the resulting appendages, as shown below.


Similarly, strategies involving reconnection of appendages are exceptionally useful in stereospecific synthesis. These reconnections have also proven valuable in the synthesis of medium size rings.

[Structure diagram: reconnective examples]

It is important to note that all stereochemistry in these examples is perceived by LHASA and used in its strategies.

There are two classes of appendages - ring appendages and branch appendages. A ring appendage is a group of atoms attached to a ring that is not in a ring itself. A branch appendage may only originate on a non-ring atom and must have three or more attachments other than hydrogens. Non-terminal olefins and acetylenes are also considered as origins of branch appendages for chemical reasons. Significant in the use of appendages is the combinatorial problem of determining identicality of appendages. This has been solved quite elegantly by Jorgensen. (4)

Appendage based strategies may be divided into disconnective and reconnective. The latter may be further partitioned into ring appendage-ring appendage, ring appendage-ring, and acyclic reconnections. LHASA currently knows about twenty different classes of reconnective transforms - a small subset of the group pair chemistry data base (vide infra). When in a mode where these transforms are specifically being executed, they are empowered to make several small structural modifications to achieve the desired reattachment.

(4) Corey, E. J., W. L. Jorgensen, J. Amer. Chem. Soc., 98, 189 (1976).

THE CHEMISTRY PACKAGES OF LHASA

Functional Group Based Transforms

As we have already seen, LHASA has a wide variety of strategies which it can employ, either of its own volition or by directive of the chemist-user. In order to facilitate the use of these strategies, the chemical data base in LHASA is broken down into several separate categories: two group transforms, one group transforms, functional group interchange, functional group addition and ring oriented transforms. This section will describe each of these and give brief examples of their use.

Two group transforms are keyed specifically by two functional groups with a path of predetermined length between them. Examples of these are shown below.

One group transforms are similar but are keyed by one specific group with an associated path (not as chemically meaningful as above). Examples are shown below.

The world of chemistry would indeed be rosy if there were always precise matches against this data base. Unfortunately, this is not often the case. Frequently, one of the groups (in a two group situation) does match, but the other one does not. If the incorrect group could readily be converted into a matching group, then the transform would become acceptable. In the Aldol Condensation, for example, if the molecular fragment present were the ether, the match would not be found, yet the etherification of the alcohol can often be quite straightforward. If the performance of the Aldol is considered a chemical 'goal', then the conversion of the ether to the alcohol is a 'subgoal'. In this case, the subgoal consisted of modifying a group, or Functional Group Interchange (FGI).

A more complicated case would exist if the second group necessary to key the transform were totally absent. In the example below, the only functional substructure capable of keying a transform would be the olefin. To perform the Aldol transform it would be necessary to add (in the retrosynthetic direction) the alcohol group. Such subgoals are called Functional Group Addition (FGA).

Obviously one does not want to always introduce all possible functional groups at all available positions or do indiscriminate group conversions without some guiding purpose or strategy. As such, FGIs and FGAs are only executed in response to a request from a higher level chemistry package.

Subgoal requests can be combined and mixed according to the situation. Next to be added is FGI followed by INTRO, since not all groups may be INTRO'ed.
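The goal and subgoal mechanism described above can be illustrated with a small Python sketch. It is not LHASA code: the group names, the `INTERCHANGE` and `ADDABLE` tables, and the function `plan_two_group` are hypothetical, chosen only to show a two group transform that requests at most one FGI or FGA when its keying groups do not match exactly.

```python
from typing import List, Optional, Tuple

# Hypothetical tables for illustration only; not taken from the LHASA data base.
# INTERCHANGE maps a group that is present to the groups it may be converted
# into by a functional group interchange (FGI).
INTERCHANGE = {"ether": {"alcohol"}, "nitro": {"amine"}}
# ADDABLE lists groups that may be introduced by a functional group addition (FGA).
ADDABLE = {"alcohol"}


def plan_two_group(required: Tuple[str, str],
                   present: Tuple[Optional[str], Optional[str]]) -> Optional[List[str]]:
    """Return the steps needed to key a two group transform, or None.

    `required` is the pair of keying groups; `present` is what the target
    has at the two ends of the path (None means no group at that end).
    At most one subgoal (FGI or FGA) is requested, mirroring the limit
    the text describes for the one group and two group chemistries.
    """
    steps: List[str] = []
    for need, have in zip(required, present):
        if have == need:
            continue                   # this end already matches
        if steps:
            return None                # a subgoal has already been used
        if have is None and need in ADDABLE:
            steps.append(f"FGA: add {need}")
        elif have is not None and need in INTERCHANGE.get(have, set()):
            steps.append(f"FGI: {have} -> {need}")
        else:
            return None                # no subgoal can repair the mismatch
    steps.append("apply transform")
    return steps
```

With these assumed tables, a ketone paired with an ether yields an FGI request followed by the transform, a ketone with nothing at the other end yields an FGA request, and a pair needing two repairs is rejected.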

Ring Oriented Transforms

It was recognized that the inclusion of chemistry packages whose sole purpose was the construction of rings was of great significance to LHASA type analyses. These transforms could not and should not be keyed by the presence or absence of any particular functionality. Since they had specific long range goals, they were given considerable power in the type and number of subgoals that they could request. This is in contrast to the two group or one group chemistries, where only one FGI or FGA could be performed before the final disconnection.

Four ring forming transforms have been considered at length by the LHASA development group: the Diels-Alder addition, the Robinson Annelation, the Simmons-Smith reaction, and iodo-lactonization. The first three of these have been fully implemented in LHASA, and the fourth is completely flow charted and awaits only coding into the chemistry data base language.

All ring chemistry tables are organized into what are called binary search trees. Queries are posed about the existence of certain structural features. Each of these questions is answerable with a yes or a no. Based on the answer, one of two different follow-up questions is selected. Embedded within the table may be requests for subgoals, either those already in the FGI or FGA tables or special reactions which are needed only for these transforms and are not of general synthetic interest.

The first step in implementation of a ring transform is the preparation of a chemical flow chart. This defines all the questions about the structure and describes in a graphic representation the synthetic steps that will be taken. It is quite straightforward for a chemist having no familiarity with LHASA to read and make use of these charts. A number of graduate students and postdoctoral fellows in the Corey group at Harvard University made significant input to the chemistry in the tables without ever having to worry about the computer implementation.

The accompanying example shows some of the synthetic routes generated by the Diels-Alder transform for the indicated precursor. It is important to note that while some of the chemistry may look somewhat naive, it can be quite thought provoking.
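Returning to the table organization described above, the tree of yes/no questions can be sketched in Python. This is only an illustration under stated assumptions: the `Question` class, the dictionary used as a stand-in for a perceived structure, and the three sample questions are hypothetical and far simpler than any real ring transform table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Structure = Dict[str, bool]   # hypothetical stand-in for a perceived molecule


@dataclass
class Question:
    """One yes/no query; each answer leads to another node or to an outcome."""
    text: str
    ask: Callable[[Structure], bool]
    yes: Union["Question", str]
    no: Union["Question", str]


def evaluate(node: Union[Question, str], s: Structure, trail: List[str]) -> str:
    """Walk the tree, recording each question and answer, until an outcome."""
    while isinstance(node, Question):
        answer = node.ask(s)
        trail.append(f"{node.text} {'yes' if answer else 'no'}")
        node = node.yes if answer else node.no
    return node


# Illustrative questions only; outcomes may be a disconnection, a subgoal
# request embedded in the table, or a rejection of the transform.
tree = Question(
    "six-membered ring?", lambda s: s.get("six_ring", False),
    yes=Question(
        "ring olefin present?", lambda s: s.get("ring_olefin", False),
        yes="disconnect",
        no="request subgoal: introduce olefin",
    ),
    no="transform not applicable",
)
```

Because every node has exactly two successors, a flow chart drawn by a chemist maps directly onto such a structure, which is consistent with the text's remark that the charts can be prepared without knowledge of the computer implementation.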

It is clear that designing a synthesis of a ring with so many stereocenters presents a formidable challenge for most synthetic chemists. It is fair to say that the ring transforms are a generalization of the concepts derived for the group oriented chemistries. Work is currently underway to generalize this still further, to permit generation of arbitrarily complex molecular patterns, always specifiable in a notation easily readable by the chemist.

CHMTRN - CHEMICAL DATA BASE LANGUAGE

The chemical transforms are the heart and soul of LHASA. Without good chemistry in the data base, all the sophisticated perception would be essentially useless. The first requirement in the design of the data base was that it be modifiable without having to recompile any other part of the program. The second requirement was that it require no knowledge of FORTRAN or of how LHASA is organized on a subroutine by subroutine basis. The third requirement was that the data base be easily readable by chemists with no training in LHASA and modifiable after only a little introduction to the language.

To meet these conditions a special chemical programming language, CHMTRN (Chemical Translator), was developed. (If this name conflicts with or duplicates that of some other chemical program, I apologize. The duplication is unintentional.) By use of a special assembler, TBLTRN (Table Translator, written by Dr. Donald E. Barth), it was possible to convert the CHMTRN tables into specially encoded FORTRAN BLOCK DATA statements which could be loaded with LHASA or read in at run time.

The basic approach of CHMTRN is that there are keywords (currently several hundred) that have specific numerical values assigned to them. All of the keywords that are typed on a single line in the data base are logically 'or'ed together into one computer word. (The translation and combination are handled by TBLTRN.) Each such line is called a qualifier, as it limits or modifies the scope of the transform. LHASA contains an interpreter called EVLTRN (Evaluate Transform) which decodes the bit patterns and performs the requested queries about the current structure or performs a specified operation. As an example, consider a line from the tables which says

    SUBTRACT 20 FOR EACH PRIMARY HALIDE ALPHA TO CARBON*1 OFFPATH ONRING

This qualifier decrements the base rating of the transform for each primary halide that is on a cyclic carbon which is not a part of the path keying the transform. From this example, it is possible to see how densely the chemical data is packed (one qualifier takes up only one computer word - 32 bits). There is a target to be searched for (the halide), a domain or location to which the search is restricted (alpha to the carbon but on a ring and off the path), and an iteration command indicating that the operation (the subtraction) is to be performed for each occurrence.

CHMTRN has several other constructions worthy of mention. The first is the ability to make modifications to the structure according to results of qualifier evaluations. One can say, for example,

    ATTACH AN ALCOHOL TO CARBON*2 CIS TO CARBON*4

This command also shows just one case where stereochemical considerations can be included.

Complete block structuring (as in PL/I or ALGOL) has been incorporated.
This is useful where a complex series of queries should be applied in certain repeatable circumstances, for example,

    FOR EACH KETONE ANYWHERE DO
    BEGIN
        ...
    END

All qualifiers between the BEGIN and END are executed for each KETONE. These blocks may be nested to any desired depth. Similarly, IF-THEN-ELSE constructions are also allowed.

Software subroutines have also been implemented in CHMTRN/EVLTRN. Suppose there is one group of qualifiers which often needs to be applied to different locations in the molecule at varying times. This can be handled by the construction

    CALL FGIW AT CARBON*3 AND BOND*2

In the subroutine FGIW these arguments are addressable as SPECIFIED*ATOM and SPECIFIED*BOND. It is permissible to apply stereochemical constraints to arguments at the time of execution of the CALL. There is no practical limit on the depth of subroutine calls. Subroutines may also return a value to indicate whether or not they succeeded in the task they were assigned to do.

The Robinson Annelation transform has received detailed examination by the LHASA group. One subroutine in the table specifically checks to see if there is any functionality alpha to the ketone in the cyclohexane and, if there is, removes it by exchanging it for something non-offensive. This subroutine is reproduced below as an example of the CHMTRN language.

    ...THIS SUBROUTINE IS CALLED TO CLEAR AWAY ANY UNDESIRABLE FUNCTIONALITY
    ...ALPHA TO A KETONE ON THE RING
    ALPHCHK  IF NO HYDROGEN ON THE SPECIFIED ATOM THEN GO TO 19
             IF THERE IS NOT A WITHDRAWING GROUP ON THE SPECIFIED ATOM THEN GO TO 18
             IF THE SPECIFIED ATOM IS THE SAME AS CARBON*2 THEN RETURN SUCCESS
             IF BOND*5 IS A FUSION*BOND THEN RETURN SUCCESS
             IF THERE IS NOT A NITRO ON THE SPECIFIED ATOM THEN GO TO 18
             EXCHANGE THE GROUP FOR AN AMINE
             IF SUCCESSFUL THEN GO TO J2 OTHERWISE RETURN FAIL
    18       IF THERE IS A HALIDE ON THE SPECIFIED ATOM THEN GO TO J2
             IF THERE IS A KETONE ON THE SPECIFIED ATOM THEN GO TO J2
             IF THERE IS A WITHDRAWING GROUP ON THE SPECIFIED ATOM THEN GO TO J2
             IF THERE IS A DONATING GROUP ON THE SPECIFIED ATOM THEN GO TO J2
             IF THERE IS AN OLEFIN ALPHA TO THE SPECIFIED ATOM THEN GO TO J2
             IF THERE IS A FUNCTIONAL GROUP ON THE SPECIFIED ATOM THEN RETURN FAIL
             IF THERE IS NOT A FUNCTIONAL GROUP ALPHA TO THE SPECIFIED ATOM THEN RETURN SUCCESS
             EXCHANGE THE GROUP FOR A WITHDRAWING GROUP
             IF SUCCESSFUL THEN GO TO J2 OTHERWISE RETURN FAIL
    19       IF THE SPECIFIED ATOM IS A QUATERNARY*CENTER THEN RETURN FAIL
             IF THERE IS AN ALCOHOL ON THE SPECIFIED ATOM THEN GO TO J2
             IF THERE IS A HALIDE ON THE SPECIFIED ATOM THEN GO TO J2
             IF THERE IS NOT AN ETHER ON THE SPECIFIED ATOM THEN RETURN FAIL
    J2       CALL DET THE GROUP AT THE SPECIFIED ATOM AND GO TO RET
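The one-word qualifier encoding described earlier, in which the keywords on a line are OR'ed into a single 32-bit word, can be sketched in Python. The field layout and the keyword values below are invented for illustration; the actual CHMTRN bit assignments are not given in the text, and the atom reference (ALPHA TO CARBON*1) is left out for brevity.

```python
# Hypothetical bit layout for a 32-bit qualifier word (an assumption):
#   bits  0-7   numeric operand (e.g. the 20 in SUBTRACT 20)
#   bits  8-11  operation
#   bits 12-19  target group
#   bits 20-31  one-bit flags (iteration, domain restrictions)
OPERATION_MASK = 0xF << 8
TARGET_MASK = 0xFF << 12

KEYWORDS = {
    "SUBTRACT": 0x1 << 8,
    "ADD":      0x2 << 8,
    "HALIDE":   0x05 << 12,
    "KETONE":   0x06 << 12,
    "FOR EACH": 1 << 20,
    "PRIMARY":  1 << 21,
    "OFFPATH":  1 << 22,
    "ONRING":   1 << 23,
}


def assemble(keywords, operand=0):
    """OR the keyword values (and a small operand) into one 32-bit word."""
    if not 0 <= operand < 256:
        raise ValueError("operand must fit in 8 bits")
    word = operand
    for kw in keywords:
        word |= KEYWORDS[kw]
    return word & 0xFFFFFFFF


def decode(word):
    """Recover the operand and the set of keywords from a qualifier word."""
    found = set()
    for kw, value in KEYWORDS.items():
        if value & OPERATION_MASK:
            matched = (word & OPERATION_MASK) == value   # exact field match
        elif value & TARGET_MASK:
            matched = (word & TARGET_MASK) == value      # exact field match
        else:
            matched = bool(word & value)                 # single flag bit
        if matched:
            found.add(kw)
    return word & 0xFF, found


line = ["SUBTRACT", "FOR EACH", "PRIMARY", "HALIDE", "OFFPATH", "ONRING"]
word = assemble(line, operand=20)
```

The role played here by `assemble` corresponds loosely to TBLTRN and that of `decode` to EVLTRN, although the real programs emit FORTRAN BLOCK DATA and interpret the word against a molecular structure rather than returning a keyword set.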

Two examples of how these constructs are applied together will demonstrate their utility and flexibility. A number of reactions, such as Michael addition, depend on the conformation of the intermediate enolate for their specificity. It is possible to make initial queries about the structure, generate the enolate, ask about it, then generate the final precursor and ask questions about it. At each stage of this process, it is possible to detect a fatal condition and terminate evaluation of the transform. This same language is being used successfully to calculate preferred conformations of cyclohexanes for evaluation of regiospecificity and in functional group reactivity analysis.

ORGANIZATION OF THE DATA BASE

The one group, two group, and subgoal tables are queried very frequently during a typical analysis session. A data structure has been developed which is extremely efficient for these searches, the retrieval time being independent of either the size of the data table or the number of successful hits in the table. Because of the general applicability of this technique, we shall describe it in more detail, using the two group tables as an example.

At present there are sixty-four different functional groups which are capable of keying a transform in some manner. This can either be as the first keying group or the second group (for example, both a ketone and an olefin key the Aldol Condensation).

The keying mechanism must point to the applicable transform regardless of the ordering of the groups, and it must also handle situations where the keying groups are the same.

The first element in the representation is a 'directory set': a Boolean set with a bit on for each group that can participate in any transform at the desired path length. If the group is not marked in this set, then there is no need to further interrogate the table; there will not be any acceptable entries.

For each group that does participate in transforms, there are two additional multi-word sets. A bit is on for each transform in which the group takes part: in the first set if it is the first keying group and in the second if it is the second. Logically 'AND'ing the first group's first set with the second group's second set yields a set with bits on for only those transforms keyed by that pair of groups. These bit positions are used as indexes into a table of addresses of the qualifiers for those applicable transforms. While it sounds rather complicated, it really is not. What has been done is to generate an external addressing structure at assembly time.

[Figure: overall structure of the two group transform table. Labels recoverable from the original: Address of Path 0 Directory; Address of Transform Address Table; Count of Special Sets for this Table; Address of Path 1 Directory. Directory: Group Participation Set; Special Strategic Sets (Symmetrical Transforms, Reconnective Transforms, Subgoal Transforms, Simplifying Transforms, Disconnective Transforms); Group-transform sets (1st group as GROUP*1, 1st group as GROUP*2, 2nd group as GROUP*1, 2nd group as GROUP*2). Transform Address Table, pointing to the First Transform's Qualifiers, Second Transform's Qualifiers, Third Transform's Qualifiers, and Remaining Qualifiers.]

The figure above shows the overall structure of the table. It should be noted that if you wish to restrict your search to, for example, those transforms which break carbon-carbon bonds, all that is necessary is to define, at assembly time, a set to indicate this characteristic and indicate which transforms are applicable. At run time, 'AND'ing this set with otherwise allowed transforms applies the restriction in parallel. This technique of generating an external addressing structure, when coupled with Boolean operations, is quite powerful and useful.
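The lookup scheme just described can be sketched in Python, using integers as bit sets. This is a minimal illustration for a single path length, not the LHASA implementation: the function names, the transform entries other than the ketone/olefin keying of the Aldol mentioned in the text, and the carbon-carbon flags are all assumptions.

```python
# Python integers serve as arbitrarily long bit sets in this sketch.
# Entries other than the ketone/olefin aldol pairing are placeholders.
TRANSFORMS = [
    # (name, first keying group, second keying group, breaks a C-C bond?)
    ("aldol",             "ketone",  "olefin",  True),
    ("example-B",         "ketone",  "alcohol", False),
    ("example-symmetric", "alcohol", "alcohol", True),
]


def build_tables(transforms):
    """'Assembly time': build the directory set and per-group transform sets."""
    groups = sorted({g for _, a, b, _ in transforms for g in (a, b)})
    group_bit = {g: 1 << i for i, g in enumerate(groups)}
    directory = 0
    as_first = {g: 0 for g in groups}
    as_second = {g: 0 for g in groups}
    cc_breaking = 0
    for i, (_, a, b, breaks_cc) in enumerate(transforms):
        directory |= group_bit[a] | group_bit[b]
        as_first[a] |= 1 << i          # group a is the first keying group
        as_second[b] |= 1 << i         # group b is the second keying group
        if breaks_cc:
            cc_breaking |= 1 << i      # special set defined at assembly time
    return group_bit, directory, as_first, as_second, cc_breaking


def lookup(tables, g1, g2, restrict=None):
    """'Run time': return names of transforms keyed by the pair (g1, g2)."""
    group_bit, directory, as_first, as_second, _ = tables
    # Directory check: if either group keys nothing, stop immediately.
    if not (group_bit.get(g1, 0) & directory and group_bit.get(g2, 0) & directory):
        return []
    # Try both orderings so the result does not depend on group order.
    hits = (as_first[g1] & as_second[g2]) | (as_first[g2] & as_second[g1])
    if restrict is not None:
        hits &= restrict               # apply a restriction set in parallel
    return [TRANSFORMS[i][0] for i in range(len(TRANSFORMS)) if hits >> i & 1]


tables = build_tables(TRANSFORMS)
```

Passing the carbon-carbon set as `restrict` filters every candidate with a single AND, which is the sense in which the text says the restriction is applied in parallel.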

HOW DO YOU ADD TO THE DATA BASE

The criticism has often been levelled at LHASA that it takes considerable time to add to the data base. The Diels-Alder table, for example, took almost six man-months to prepare and debug. This section will describe the process of building a sophisticated data table like the Diels-Alder or the Robinson Annelation and demonstrate that the sophistication of the results obtained is a direct function of the exhaustiveness and specificity of the tables. (It is worthwhile to point out, however, that there are often times when naive chemistry proposed by LHASA, in situations where it was not originally envisioned, has turned out to be exceptionally interesting.)

All the ring transform packages in LHASA employ binary search techniques. This means that all structural questions are to be answered with a yes or a no. Preparation of the sequence of questions relating to straightforward chemical situations poses no real problems. It is the identification and resolution of the extraordinary cases that are difficult. For example, in a Robinson disconnection for the sequence below, the geminal dimethyl substitution is a formidable problem.

It is up to the chemist designing the tables first to perceive that this situation might occur. Second, he must decide whether he wishes to have the tables salvage the difficult situation and, if he does, he has to determine manually what kind of chemistry should be attempted.

The above example is a clear black or white situation. Unless the dimethyl substituent is removed, the transform just cannot proceed. The grey areas cause just as much of a dilemma for the chemist. In Marshall's synthesis of isonootkatone two possible stereoisomers could have resulted.

A priori prediction of the stereochemical course of a reaction, even knowing the three dimensional structure of the reagents, is quite difficult, if not impossible. When preparing a section of the table dealing with such an ambiguity the chemist is faced with two alternatives: disregard stereochemistry entirely (and make sure that LHASA does not imply that any stereochemistry is being specified) or go into the laboratory and run an experiment. The recognition of this situation can often get the table writing chemist thinking and has sometimes even suggested specific reactions that should be run.

As we have been adding to the data base at Du Pont (to the one and two group tables), the question has often been raised, "How much detail should we go into in the qualifiers?" This is somewhat of a dilemma, since many of the industrial reactions we are dealing with have only been considered for a limited number of substrates. It is not clear whether qualifiers should be incorporated that restrict the transforms to only those cases where they are known to work, or whether only those cases which are known to fail should be specifically excluded. Both ways take an immense amount of literature work to do consistently.

We all know that butadiene can be dimerized under catalytic conditions to a wealth of different products. Addition of alkyl substituents changes the product mix and introduces a variety of different stereoisomers as well. What happens if we put functional groups on butadiene? Do all the reactions still proceed, do you get any new ones, etc.? We do not know, and we seriously doubt whether many experiments have ever been run on such systems. The second alternative above has been chosen: qualifiers are being used only to exclude reactions known not to work. As such, a lot of naive chemistry comes out of our version of LHASA, some of it exceptionally poor, but at least the analyses are reasonably comprehensive.

In summary, adding simple reactions to LHASA is simple. Incorporating sophisticated reactions can be as complicated as you wish to make it. (Work is currently underway to prepare a general package of subgoal transforms which will serve to remove interferences, relieving the chemist from having to work them out separately for each super transform.)

CONCLUSION

Why is Du Pont interested in LHASA? The program was clearly designed with carbocyclic natural products synthesis in mind, an area in which the Company has only limited interest.

- Our attempt to add industrial synthetic chemistry to LHASA is forcing us to organize our thoughts along lines not heretofore followed. We are being required to look at our reactions in terms of their known and unknown generality. This in and of itself is highly beneficial.
- Our pharmaceutical and agrichemical chemists have been using the natural products aspects of LHASA to generate new ideas, often not of industrial synthetic merit but certainly of interest when looking for ways to make derivatives, especially commonality of routes.
- Lastly, we have already seen that some of our industrial knowledge is turning out to be useful to organic synthesis in other areas.
For example, very few chemists outside of those who are actually using it daily are aware that, given suitable catalysts, butadiene can dimerize into the compound below, a quite attractive (and inexpensive) starting material for prostaglandin synthesis.

Our hope is that LHASA will help us to ensure that we have considered all reasonable routes to our major products.

ACKNOWLEDGMENTS

The LHASA project has been in existence since 1968. During the years a number of extraordinarily able graduate students and postdoctoral fellows came to work with Professor Corey, almost all laboratory synthetic chemists. Whether it is the infectious enthusiasm for the project or just an enjoyment of using the computer as a research tool, not one alumnus of the LHASA project has abandoned his involvement with computers and returned to bench chemistry. They are (with current locations):

W. J. Howe - Upjohn
D. E. Barth - Harvard Business School
W. L. Jorgensen - Purdue
R. D. Cramer - Smith Kline and French
W. Todd Wipke - Santa Cruz
G. A. Petersson - Wesleyan
W. Vinson - Harvard
H. W. Orf - Harvard

ABSTRACT

Design of complex organic syntheses is a task well suited to computer implementation. For a molecule of moderate size the number of potential synthetic pathways is extremely large. Furthermore, the number of useful laboratory reactions is growing explosively. The LHASA program is a tool for synthetic chemists to aid in choosing the most reasonable routes to any desired molecule without exhaustive enumeration. The basic structure of the program and the chemistry it employs are discussed, giving special consideration to the strategies employed in selection of routes.
