16 Syntheses of Drugs Proposed by a Computer Program

MALCOLM BERSOHN

Downloaded by UNIV OF ROCHESTER on June 12, 2018 | https://pubs.acs.org Publication Date: November 28, 1979 | doi: 10.1021/bk-1979-0112.ch016

University of Toronto, Toronto, Canada M5S 1A1

0-8412-0521-3/79/47-112-341$05.00/0 © 1979 American Chemical Society

Olson and Christoffersen; Computer-Assisted Drug Design ACS Symposium Series; American Chemical Society: Washington, DC, 1979.

The Computer and the Synthetic Organic Chemist

There are now three ways in which the organic chemist can use a computer to aid him in devising a synthesis. The most direct and simple mode is to query a data base about the existence of a reaction for transforming a given substructure into another given substructure. The system available commercially from Derwent Publications Ltd. (1), as well as those available inside pharmaceutical companies (2), are examples. By means of such systems, synthetic organic chemists are finally realizing the dream of keeping up to date on the best way to do reactions and the possibilities for transformations. The records in such on line data bases are complete with literature references, yields, solvent, temperature, catalyst and certain interfering groups.

The next mode of operation for the synthetic chemist using a computer is to use a program such as the one written by Wipke (3,4), which, given a product substance, generates the structures of reactants that might reasonably be expected to give rise to it. The ability of such a program to store these reactants and, on command from the user, to use them as products from which to generate reactants, means that the user can devise a synthesis while being on line to the computer.

The third mode of operation for the synthetic chemist using a computer is to use a noninteractive program to derive a synthesis for a given substance, e.g. a drug. The first such program (5,6) was operating early in 1970. There has been another attempt (7) but it appears that the program at the University of Toronto is the only such program still operating. The chemist using the Toronto program specifies the maximum number of steps acceptable for a synthesis, the minimum overall yield, and, optionally, the starting material. The starting material can be suggested specifically, i.e. by providing the exact structure, or else more generally, by providing the desired starting skeleton, without restricting oneself to particular functional groups. If the chemist does not specify a starting material, the program will decide the starting material for itself. The backward search from the goal molecule to an acceptably simple molecule is guided by the restriction that it must be compatible with the starting structure decided upon. At present, our definition of "compatibility" means that the skeleton of substructures which are present in both the starting material and the goal molecule may not be altered. At input time, the user also defines to the program what he means by an acceptably simple starting material by providing the maximum number of chiral centers, functional groups and rings that an unknown starting molecule may have. The user may also specify the maximum number of consecutive facilitating reactions which he will accept in the synthesis.

Building Reactions and Facilitating Reactions

Somewhat arbitrarily we may divide all reactions into the two categories of building reactions and facilitating reactions. A building reaction joins two hitherto unattached carbon atoms or completes a heterocyclic ring not used as a protection for some functionality or introduces functionality at a previously unfunctionalized site. All other reactions are facilitating reactions.
The use of facilitating reactions may reflect the inherent limitations of known organic chemistry: it is not possible, for example, to join ethanol to propanol via a carbon-carbon bond in a single reaction. It may also reflect our ignorance or neglect of some efficient process in favor of a more familiar reaction for which the molecule at hand does not yet have the desired functionality. Consequently, to ask the program to restrict the number of successive facilitating reactions that may be performed is to raise the level of efficiency demanded of the synthesis solutions that are suggested. The ideal synthesis would of course consist only of building reactions.
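The distinction between the two categories can be illustrated with a small sketch. It is not taken from the Toronto program: the `Step` record, its three boolean fields and the sample route are hypothetical names chosen here to mirror the three clauses of the definition of a building reaction. The sketch classifies each step and measures the longest run of consecutive facilitating reactions, which is the quantity the user is allowed to limit.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    """One reaction in a route (hypothetical record, for illustration only)."""
    name: str
    joins_carbons: bool = False        # joins two hitherto unattached carbon atoms
    closes_heterocycle: bool = False   # completes a heterocyclic ring that is not a protecting group
    adds_functionality: bool = False   # functionalizes a previously unfunctionalized site


def is_building(step: Step) -> bool:
    """A step is a building reaction if any clause of the definition holds."""
    return step.joins_carbons or step.closes_heterocycle or step.adds_functionality


def longest_facilitating_run(route: List[Step]) -> int:
    """Length of the longest stretch of consecutive facilitating reactions."""
    longest = current = 0
    for step in route:
        current = 0 if is_building(step) else current + 1
        longest = max(longest, current)
    return longest


# Hypothetical four step route: only the alkylation is a building reaction.
route = [
    Step("protect ketone"),
    Step("alkylate", joins_carbons=True),
    Step("deprotect ketone"),
    Step("reduce ketone"),
]
print(longest_facilitating_run(route))
```

A route made only of building reactions, the ideal case, would give a longest run of zero.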

Pruning the Combinatorial Thicket

Those familiar with the numbers involved might even doubt that a program could ever generate a suitable answer to a nontrivial synthetic problem. The number of possible incomplete pathways is such that no program could consider a significant fraction of the pathways even if it devoted a year to the task. Suppose for example that the synthesis is to take ten steps and there are, on the average, 40 conceivable reactions which could give rise to each molecule considered. Then the number of molecules in the problem is about 40**10, i.e. roughly 10**16. Even with current fast hardware, such as the IBM 3033, the generation of one molecular structure takes a millisecond or so; we would need over 300,000 years of computer time to inspect the whole of the synthetic graph, thus discovering all the feasible ten step pathways. The thicket is impenetrable indeed, if we are to do it blindly.

The chemist himself in his thinking suggests the solution. He does not attempt to consider all pathways; something familiar strikes his attention and he pursues a fairly straight path backward from the goal substance to a suitable starting material. The Toronto program attempts to do the same. A breadth-first approach, which generates all possible immediate predecessors of each molecule examined, is hopeless; there are too many possibilities. The "something familiar" recognized by the chemist and by our program can be of two kinds:

1. A substance known to be available looks like a subset or a superset of the molecular structure of the goal molecule. Following this discovery the program will use the available substance as a required starting point and will only attempt to build the parts of the goal molecule that do not exist in the starting material.

2. A substructure of the goal molecule is identical to or derivable from the substructure produced by some key building reaction. If the exact product substructure is present the reaction will be simulated, and the reactant(s) generated then present an easier problem of synthesis, closer to a satisfactory starting material. If the substructure present is derivable from the product substructure of the key building reaction then there must be recognition routines in the program which call in the deriving reactions.
For example, a secondary alcohol next to a branched carbon atom suggests that the alcohol may have been a ketone which was alkylated. The reaction of reduction of the ketone is thus suggested. Currently a very major task in developing the program is filling the repertory with recognition routines which can "see" building reaction product substructures after one, two and three steps of alteration.

We can, evidently, distinguish three kinds of facilitating reactions. First, there are those concerned with protecting and deprotecting. Secondly, there are those which bring about unreversed changes in order to enable a reaction to take place. These changes are required in order to change a particular building reaction product substructure into the substructure at hand. A third category consists of the reactions which alter substructures so that the molecule at hand will be closer to the structure of a selected starting material. A sophisticated theory of synthesis optimization should be able to place separate limits on the numbers of these three kinds of facilitating reactions used in a synthesis. In any case a limit on the total number of facilitating reactions that can be employed in succession means an avoidance of high priced building reactions, ones that can only be performed with drastic amounts of protection and/or functionality alteration.
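The size of the unpruned search described at the start of this section can be checked with a short calculation. The sketch below assumes only the figures quoted in the text (ten steps, 40 candidate reactions per molecule, about a millisecond per structure); the function name `tree_size` is an illustrative choice, and the count is for a full tree in which every molecule has all 40 predecessors.

```python
BRANCHING = 40                 # conceivable reactions per molecule (figure from the text)
DEPTH = 10                     # number of steps in the synthesis (figure from the text)
SECONDS_PER_STRUCTURE = 1e-3   # "a millisecond or so" per generated structure
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def tree_size(branching: int, depth: int) -> int:
    """Number of molecules below the goal in a full tree of the given depth."""
    return sum(branching ** level for level in range(1, depth + 1))


molecules = tree_size(BRANCHING, DEPTH)
years = molecules * SECONDS_PER_STRUCTURE / SECONDS_PER_YEAR
print(f"{molecules:.2e} molecules, about {years:.1e} years of computer time")
```

The total is dominated by the deepest level, 40**10, so the result is a little over 10**16 molecules and on the order of 300,000 years, far beyond what any blind search could cover.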
The user of the program described here indicates at input time the limit on the number of sequential facilitating reactions which he wishes to impose on any solution to the problem.

Examples of Output from the Program


I. Depo-Provera

Depo-Provera, 6α-methyl-17α-acetoxyprogesterone, as the only thoroughly tested and widely used injectable contraceptive, seemed to have enormous practical importance. Because of the close structural resemblance to progesterone, the latter was chosen as the recommended starting material. The program suggests an eight step method according to the accompanying scheme. I asked a class of eleven graduate students to produce a short synthesis of the goal substance, starting from progesterone. None of them was successful, because the students did not know how to solve the problem of introducing the methyl group at C-6, i.e. the gamma alkylation problem. Evidently the performance of the program was superior to that of the graduate students for this problem, but since the repertory is at this writing still only about three hundred reactions, the program will fail to find syntheses for many substances with which the graduate students will have no trouble. The most important reason for the superiority of the program in this case is that it was aware of the reaction developed by Yoshikoshi and his group, the formylation of a dienol ether (8).

It appears that the total recall of all reactions in its repertory is responsible for much of the power of the program. A very simple case of this is the synthesis of 1,4-benzenediamine. The program, given this problem and told that the starting material must have no functional groups (other than a benzene ring, which is not counted as a functional group in this context), produced an unexpected result. I expected the familiar six step route via the nitration of acetanilide. Instead the program recommended a two step route: oxidation of o-xylene to phthalic acid and then treatment of cupric phthalate with ammonia under pressure to give 1,4-benzenediamine. Obviously the replacement of aromatic carboxyl by hydrogen and the simultaneous amination of the ortho position is an unfamiliar reaction (9), which, however, is just as acceptable to the program as a familiar reaction. With a more ample repertory one may anticipate many such surprising but presumably efficient syntheses (Figure 1).

We observe parenthetically that to a first approximation no long synthesis proceeds as planned. One may ask what is the use of a computer program to generate synthetic plans when any plan from whatever source will have to be revised when exposed to laboratory practice. The answer is that it is more convenient to revise (or in the worst case abandon) an eight step synthesis than to do the same for a twelve step synthesis.


Figure 1. Synthesis of Depo-Provera proposed by the program

II. PGF2α

The prostaglandin structure is a ring with two side chains. A simple overall strategy is to take the ring and the side chains as already built and try to connect the side chains to the ring. A slight modification of this idea is to introduce a single functional carbon atom at the position of a chain and then to add the rest of the side chain to this functionality. In other words, the side chain minus the atom connecting to the ring may be taken as already synthesized. This is the strategy successfully used by Stork and Isobe (10) in the briefest of all syntheses of PGF2α. The most ingenious part is the stereoselective conjugate addition of the vinyl cuprate to the cyclopentenone followed by the trapping of the resultant anion with formaldehyde, all in one step. The program, taught this unusual reaction, was able to generate the following seven step synthesis (Figure 2). The synthesis recommended by the program differs in some small respects from the one actually achieved by Stork and Isobe. The most important point of difference is that lithium in ammonia is used to reduce the ketone at C-9. It may certainly be questioned whether the Li/NH3 reduction would be as stereoselective as the method actually used by Stork, i.e. Brown's lithium trisiamylborohydride reagent (11).

Figure 2. Seven step synthesis of PGF2α generated by the program

A Complete List of the Tree-pruning Devices Used in the Program

1. Avoid duplicate molecules in the tree. The program is unable to detect all duplicate molecules. This would be possible if we were prepared to store two hundred thousand structures onto a disk in a single run of the program. That would raise the price of a computation to a prohibitive level. In the author's experience, the most common cause of duplicate generation of the same molecule is the occurrence of pairs of "commuting reactions." Given that A and B are synthetic reactions it frequently happens that the effect of A then B is the same as the effect of B then A. Such duplicate generations of the same molecule are avoided entirely by the following device in the program.

Assume that there exists a substructure X which is transformed into substructure X' by reaction A. Similarly there is a substructure Y which is transformed into substructure Y' by reaction B. We further assume that X and Y have no atoms in common in the molecules of interest. Suppose that a molecule contains both substructure X and Y and reactions A and B are used successively to produce a molecule with substructures X' and Y'. In many cases the order of performance of A and B is immaterial. In other words, neither substructure X nor substructure X' interferes with reaction B and neither Y nor Y' precludes reaction A. A typical A and B pair are hydrolysis of an ester and hydrogenation of an isolated carbon-carbon double bond. In such a case, as we progress backward from the molecule with substructures X' and Y' to the molecule with substructures X and Y we can reach the latter along two paths, i.e. A then B or B then A. For brevity we can describe the situation as XY→X'Y→X'Y' and XY→XY'→X'Y'.

Before we call in reaction B on product XY' to generate reactant XY we should first inquire whether the pathway from XY to X'Y' through X'Y has previously been considered by the program. Since we do not retain any molecular structure not in the direct line of ascent to the goal molecule from the current molecule XY', this means that we cannot ask the simple question "Has X'Y been generated previously?" Instead we have to get at this information indirectly. We can ask whether the reaction B has been performed to produce the molecule X'Y'. For each molecule whose structure is still stored in the memory, a record is kept of all the reactions simulated which give rise to the molecule as a product. For each of these reactions the particular instance in the molecule of the substructure produced is also noted. Hence we can get a definite answer as to whether B has previously been used as a reaction in which X'Y' is the product. If it was not, we have no duplication problem here. If B was simulated to produce X'Y' then we have to ask whether A is interfered with by Y. If so, then there was no sequence AB. If not, then there must have been a sequence AB; consequently we should avoid reaction B preceding A.
In real life, in the program, this means that we have to ask these questions before using any reaction except those that produce the goal molecule directly.
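The two questions asked before reaction B is called in can be sketched in a few lines. This is an illustration under stated assumptions, not the program's own data structure: `StoredMolecule`, `should_skip_b` and the `interferes` table are hypothetical names, and the record of reactions kept with each stored molecule is modeled as a set of (reaction, substructure instance) pairs.

```python
from dataclasses import dataclass, field
from typing import Callable, Set, Tuple

Use = Tuple[str, str]   # (reaction name, instance of the substructure it produced)


@dataclass
class StoredMolecule:
    """A molecule still held in memory on the line of ascent to the goal."""
    label: str
    produced_by: Set[Use] = field(default_factory=set)  # reactions already simulated with this molecule as product


def should_skip_b(x_prime_y_prime: StoredMolecule, use_of_b: Use,
                  reaction_a: str, group_y: str,
                  interferes: Callable[[str, str], bool]) -> bool:
    """Decide whether undoing B on XY' would only regenerate XY a second time.

    XY' was reached from X'Y' by undoing A. The other route to XY runs
    through X'Y, which is no longer stored, so it is inferred indirectly.
    """
    if use_of_b not in x_prime_y_prime.produced_by:
        return False   # B was never simulated on X'Y', so X'Y was never made
    if interferes(group_y, reaction_a):
        return False   # Y blocks A, so the step from X'Y back to XY never happened
    return True        # the sequence through X'Y exists: avoid B preceding A


def interferes(group: str, reaction: str) -> bool:
    """Hypothetical interference table; a real repertory would supply this."""
    return (group, reaction) in {("aldehyde", "Grignard addition")}


goal_side = StoredMolecule("X'Y'")
hydrogenation = ("hydrogenation", "Y' instance 1")

before = should_skip_b(goal_side, hydrogenation, "ester hydrolysis", "isolated C=C", interferes)
goal_side.produced_by.add(hydrogenation)   # B is simulated on X'Y', giving X'Y
after = should_skip_b(goal_side, hydrogenation, "ester hydrolysis", "isolated C=C", interferes)
print(before, after)
```

The check costs only a set lookup per candidate reaction, which is what allows it to stand in for a stored catalogue of every molecule generated.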


There exist, of course, such sequences of "commuting" reactions as ABC, BCA and CAB. We have not programmed the elimination of these triple occurrences yet but are planning to do so.

2. Limitation on the number of steps. At input time the user specifies the maximum number of steps for an acceptable synthesis. This cuts off the bottom of the tree (problem "trees" grow downward). This device is sine qua non but by itself it is still quite a weak constraint.

3. Minimum requirement for the overall yield. This is also specified at input time by the user.

4. Minimum requirement for the yield of the last three steps. As everyone knows who has done syntheses experimentally, the amount of material that has to be carried through the various steps is most important. If there are low yielding steps they should be near the beginning of the synthesis rather than near the end. This factor is not necessarily reflected in the overall yield. Consequently, in the program itself there is the requirement that the overall yield of the last three steps must be at least 35%. This means that the middle of the tree is pruned rather drastically. The user does not specify the 35% parameter. Obviously it could be altered in minutes but most users have no objection to this value so we do not burden the user with this decision.

5. Limitation on the use of facilitating reactions. Building reactions can be used wherever appropriate but facilitating reactions can only be used if they were called for specifically. The "call" can come from the failure of a building reaction to proceed on account of interference with the reaction by some functional group in the molecule. In such a case the group must be protected and, since we are proceeding backwards from the goal molecule, the reaction to remove protection from this functional group is called in. If the number of different kinds of groups to be protected exceeds the limit on the number of successive facilitating reactions imposed by the user then none of the protections are effected. Sometimes facilitating reactions are called in by reaction product derivation routines. An example is the change of functionality required to obtain a six membered ring at hand from one which is the product of a plausible Diels-Alder reaction. This single heuristic has resulted in a drastic reduction of the number of reactions simulated by the program in investigating a problem. For example, the Depo-Provera synthesis was found after simulating only 280 reactions and the total number of reactions simulated was only 1385. This was obtained using input requirements of 8 for the upper limit on the number of steps, 3 for the upper limit on the number of consecutive facilitating reactions and 5% for the lower bound on the overall yield of the synthesis.
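Devices 2 to 5 are all bound checks on a route, and their combined effect can be sketched as a single predicate. The code below is an illustration only: the `Limits` record, the `(yield, is_building)` step tuples and the sample routes are assumptions made here, and the yields are invented numbers chosen to show why the last-three-steps rule is not implied by the overall yield.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Tuple

# A step is (fractional yield, is_building); steps are listed in forward order.
Step = Tuple[float, bool]


@dataclass(frozen=True)
class Limits:
    max_steps: int                       # device 2, set by the user
    min_overall_yield: float             # device 3, set by the user
    max_consecutive_facilitating: int    # device 5, set by the user
    min_last_three_yield: float = 0.35   # device 4, fixed inside the program


def acceptable(route: List[Step], limits: Limits) -> bool:
    """Return True if the route passes devices 2, 3, 4 and 5."""
    if len(route) > limits.max_steps:
        return False
    if prod(y for y, _ in route) < limits.min_overall_yield:
        return False
    if prod(y for y, _ in route[-3:]) < limits.min_last_three_yield:
        return False
    run = 0
    for _, is_building in route:
        run = 0 if is_building else run + 1
        if run > limits.max_consecutive_facilitating:
            return False
    return True


# Input requirements quoted in the text for the Depo-Provera run: 8 steps, 5% yield, 3 facilitating.
limits = Limits(max_steps=8, min_overall_yield=0.05, max_consecutive_facilitating=3)

poor_step_first = [(0.4, True)] + [(0.8, True)] * 7   # invented yields
poor_step_last = [(0.8, True)] * 7 + [(0.4, True)]    # same overall yield, low step at the end
print(acceptable(poor_step_first, limits), acceptable(poor_step_last, limits))
```

Both sample routes have the same overall yield, but only the one with the low yielding step at the beginning survives, which is the behavior device 4 is meant to enforce.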
Before introducing this heuristic we had been used to generating 20,000 to 200,000 molecular structures in a run. The exact number of reactants generated by the program in investigating a synthetic problem of any complexity will vary with almost any slight change in the program. It is increased by adding reactions to the program and of course it is decreased by refining the scope and limitations of any reaction in the program.

6. Requirement for compatibility with a starting structure. This is the requirement that the skeleton of substructures which are present in both the starting material and the goal molecule may not be altered. This pruning device is optional. Ordinarily it is not used when the program thinks backward from the goal molecule without any particular direction in mind. Any substance which satisfies the input restrictions on the number of chiral centers, functional groups and rings is considered to be available and the program declares a synthesis to be found if it reaches an available substance in its backward search. For complex structures, this is an extremely important tree pruning method.

When we are thinking in the forward direction about a synthesis the requirement of compatibility with a starting structure is automatically satisfied. As soon as we start to think backward from the goal molecule, the question of the description of the starting material arises. At first thought it seems far more elegant to ignore this question; we simply work backward and as soon as we reach an acceptably simple molecule or an available molecule, we have completed a synthesis plan. However, this often results in unintelligent behavior. To see this, let us consider a string A-B-C-D, where A, B, C and D are structural fragments. Suppose that on the list of available substances are A-B and C-D. Now the program "knows" that A-B and C-D are available but it accesses this knowledge only under a single fixed circumstance, i.e. a new reactant has been generated in the course of simulating a reaction and the program compares the structure of this reactant with those on its list of available substances. But the program does not keep its attention on the availability of A-B and C-D when it is selecting a reaction to simulate. For example, it may generate A-B-C from A-B-C-D. Then it may generate A-B from A-B-C and thus arrive at a two step process. But in fact the one step process of joining A-B to C-D may exist and the program will not immediately find it.
If the planner in the program decides that the synthesis must proceed from A-B and C-D then the only problem remaining is to see if the bond joining B to C in the goal molecule can be made. If so, then the one step synthesis is immediately uncovered. We encountered analogous advantages of deciding on a starting skeleton in more complex circumstances. If we are seeking a total synthesis of a steroid, for example, and we have the B/C ring system already formed in the starting material, then the problems to be solved are the addition of the A and D rings, not the formation of the B and C rings. However, when thinking backward from the goal molecule, without a definite starting skeleton in view, a program can easily lose its way, now adding pieces of the A ring, then adding pieces of the C ring, etc. There tends to emerge, after much searching and many steps down the synthetic tree, a highly branched structure that has many chiral centers and is as difficult to make as a steroid. Such was the usual experience on asking the program to synthesize a steroid. With a suitable upper limit on the number of chiral centers required for a starting material that is not on the available substance list, the program does not actually propose such a synthesis, but investigation showed that these were the lines along which much of the usually fruitless search was proceeding.

To avoid this there was incorporated into the program a planning routine which at the outset of the problem solving process decides on a starting skeleton. When the planner is given control, no reaction simulated may build this skeleton. The planning routine uses the starting skeleton for a given number of simulated reactions, e.g. 32,000, and then chooses another starting skeleton and so on until an acceptable synthesis is found or the time allowed for the job is exhausted. The first choice for the starting skeleton is that input by the user. The second choice is that imposed by a required starting substance. The third choice consists of a particular ring of the structure of the goal molecule.

Which ring should the starting material contain? The algorithmic answer to this question ultimately must be provided by many experienced synthetic organic chemists. Corey (12) provided a first answer to this question for the case of bridged ring systems. However, subsequently Johnson's very efficient synthesis of longifolene (13), which made two rings at a time, violated Corey's preliminary rules by making a bond that was not "strategic" at the crucial step. Evidently the theory needs refinement. In particular, there is no theory about an unbridged system. For example, we cannot say whether the most convergent synthesis of a steroid starts with the A and B rings or the B and C rings, etc. Presumably the answer depends on what functionality is around the skeleton, which methyl groups and C-17 substituents are present, etc. It may even depend on what reactions have been invented, e.g. microbial oxidation at C-11. Most practicing chemists are highly likely to conclude that general principles are not discoverable because they do not exist. The planning routine described here takes for the time being an ad hoc approach of trying everything, i.e. each ring is tried separately as a starting skeleton and then all possible pairs of rings are used. As the general principles emerge they will be included, to replace the ad hoc approach.

7. Requirement for successively better syntheses. After finding an n step synthesis the program changes its requirements so that any further solutions can be no longer than n-1 steps and must have a higher overall yield than that already achieved in the n step synthesis.

Acknowledgement: The author gratefully acknowledges financial support for this work from the National Science and Engineering Council of Canada.
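The order in which the planner of device 6 tries starting skeletons, and the tightening rule of device 7, can be sketched as follows. This is a minimal illustration, not the program's code: the function names, the representation of a skeleton as a tuple of ring labels and the steroid example are assumptions made here.

```python
from itertools import combinations
from typing import Iterator, Optional, Sequence, Tuple


def skeleton_schedule(rings: Sequence[str],
                      user_skeleton: Optional[str] = None,
                      required_start: Optional[str] = None) -> Iterator[Tuple[str, ...]]:
    """Yield starting skeletons in the order the planner is described as trying them."""
    if user_skeleton is not None:
        yield (user_skeleton,)          # first choice: skeleton input by the user
    if required_start is not None:
        yield (required_start,)         # second choice: skeleton of a required starting substance
    for ring in rings:
        yield (ring,)                   # then each ring of the goal molecule on its own
    yield from combinations(rings, 2)   # then all possible pairs of rings


def tighten(found_steps: int, found_yield: float) -> Tuple[int, float]:
    """Bounds for further solutions once an n step synthesis has been found.

    Returns the new maximum number of steps (n - 1) and the overall yield
    that any later solution must strictly exceed.
    """
    return found_steps - 1, found_yield


steroid_rings = ["A", "B", "C", "D"]
schedule = list(skeleton_schedule(steroid_rings))
print(len(schedule), schedule[:6])
print(tighten(8, 0.12))
```

A driver built on this schedule would give each skeleton a fixed budget of simulated reactions (32,000 is the figure mentioned in the text) before moving to the next one.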

ABSTRACT

A report is given about a program that proposes many-step syntheses for a given organic compound, without interacting with the user after the initial description of the problem. The problem description includes the structure of the desired goal molecule, complete with chirality, the maximum number of steps for an acceptable synthesis and the minimum overall yield required. Optionally, the problem description may also include the specification of acceptable starting materials. These may be specifically stated, or generally indicated, i.e. with a given skeleton. The maximum number of rings, functional groups and chiral centers for an acceptably simple starting material may also be given. An eight step synthesis of the contraceptive Depo-Provera proposed by the program, which starts from progesterone, is presented. A short prostaglandin synthesis, similar to one reported in the literature, was also proposed by the program. Various ways of managing the combinatorial explosion problem are described. Basically we want to maximize the number of building reactions and minimize the number of facilitating reactions. The facilitating reactions should not be suggested unless they actually facilitate a specific building reaction. There must also be an upper limit on the number of sequential facilitating reactions that may be performed without doing any building reactions.

Literature Cited

1. Finch, A.F., to be published in J. Chem. Inf. & Comp. Sci., (1979)
2. Fugmann, R., "The IDC System" in "Chemical Information Systems", Ash, J.E. and Hyde, E. editors, John Wiley & Sons, New York, (1975), pp 195 et seq.
3. Corey, E.J. and Wipke, W.T., Science, (1969), 166, 1978
4. Wipke, W.T., Dolata, D., Huber, M. and Buse, C., in this symposium volume, and references therein
5. Bersohn, M., Bull. Chem. Soc. Japan, (1972), 45, 1897
6. Bersohn, M., Esack, A. and Luchini, J., Computers & Chemistry, (1978), 2, 105
7. Gelernter, H.L., Sanders, A.F., Larsen, C.L., Agarwal, K.K., Boivie, R.H., Spritzer, G.A., Searleman, J.E., Science, (1977), 197, 1041
8. Kato, M., Kurihara, H., Kosugi, H., Watanabe, M., Asuka, S. and Yoshikoshi, A., J. Chem. Soc. Perkin I, (1977), 2433
9. Arzoumanidis, G.G. and Rauch, F.C., Chem. Commun., (1973), 666
10. Stork, G. and Isobe, M., JACS, (1975), 97, 6260
11. Brown, H.C. and Krishnamurthy, S., JACS, (1972), 94, 7159
12. Corey, E.J., Q. Rev. Chem. Soc., (1971), 25, 455
13. Volkmann, R.A., Andrews, G.C. and Johnson, W.S., JACS, (1975), 97, 4777

Received June 8, 1979.
