3 An Organic Chemist's View of Formal Languages H. W. WHITLOCK, JR.
Downloaded by CORNELL UNIV on August 27, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003
Dept. of Chemistry, University of Wisconsin, Madison, Wisc. 53706
Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

It seems generally recognized that, except for the most difficult of cases, the introduction of mathematics tends to obscure problems rather than make their solution easier. Be that as it may, we are quite intrigued by the relation between formal languages and chemical structures, particularly when considering the area of computerization of organic synthesis. The purpose of this paper is to outline some of our thoughts on the above subject, showing that one can apply common facts derived from language theory to questions of chemical interest. For the chemist readers we will define languages and grammars. We will then discuss a number of "theorems" of organic chemistry (formal proof will not be attempted). Finally, we will point out that a certain subset of organic synthesis, the Functional Group Switching Problem, is amenable to attack by viewing it as a problem in the context of formal languages.

Molecules as Strings of Symbols

As we will see, the tenets of language theory assume that one is dealing with strings of symbols: one-dimensional linear arrays. If we wish to analyze molecules according to this theory it behooves us to inquire as to what extent we may consider molecules to be linear entities. We immediately note that molecules, in particular the set of all structures containing rings, are inherently nonlinear in nature. On the other hand we recognize that we can represent anything in a linear manner. If we make the rather interesting equation of representation with structure then it follows that structures, as represented, may be linear. Now one can carry this as far as one likes, but it seems to the author that for most purposes one should restrict one's attention to linear representations that make a reasonable amount of intuitive chemical sense. For this reason we will not consider the case of cyclic structures.

What types of molecules can be naturally represented in a linear manner? Clearly straight-chained structures can. The representation of n-hexane, CH3CH2CH2CH2CH2CH3, is in fact what one thinks of when considering this molecule (representation equals structure). In particular this "structure" is intuitively a string of six symbols, namely CH3, CH2, CH2, CH2, CH2, CH3.
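The reading of a line notation as a string over a chosen symbol set can be made concrete with a small tokenizer. The sketch below is illustrative only: the name `tokenize` and the greedy longest-match strategy are assumptions introduced here, not something taken from the paper, and subscripts are written inline (CH3, CH2).

```python
def tokenize(line_notation, vocabulary):
    """Split a line notation into symbols by greedy longest match.

    `vocabulary` plays the role of the symbol set; this helper is a
    hypothetical illustration, not the author's program.
    """
    symbols = sorted(vocabulary, key=len, reverse=True)
    tokens = []
    pos = 0
    while pos < len(line_notation):
        for sym in symbols:
            if line_notation.startswith(sym, pos):
                tokens.append(sym)
                pos += len(sym)
                break
        else:
            # No symbol of the vocabulary matches here.
            raise ValueError(f"no symbol matches at position {pos}")
    return tokens


HEXANE = "CH3CH2CH2CH2CH2CH3"
print(tokenize(HEXANE, {"CH3", "CH2"}))
# ['CH3', 'CH2', 'CH2', 'CH2', 'CH2', 'CH3']
```

Changing the vocabulary changes the segmentation of the same string, which is why the symbol set has to be stated before a line notation can be treated as a sentence.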
Similarly ethyl crotonate is CH3CH=CHCOOCH2CH3. This example points up two differences between a linear notation of a structure and the actual structure itself. The above is a string of symbols, so we have to specify what symbols are involved. Possibilities for ethyl crotonate are: CH3, CH, =, CH, CO, OCH2, CH3; CH3, CH=CH, COO, CH2, CH3; and CH3, CH=CHCO, O, CH2CH3. Since in general one will attach meanings to the symbols involved, these different representations may have different meanings. For example, the last above might be described as sequentially a methyl, an αβ-unsaturated carbonyl, a dicoordinate oxygen, and an ethyl. This is a perfectly good definition of this molecule and, as a string, is somewhat more informative than a mere tabulation of the individual parts. The second major difference between linear notation and actual structures lies in the observation that strings have an inherent ordering from left to right while structures do not. This leads to a many-into-one mapping of ordinary line notations into structures.

Having seen that the ordinary line notation of unbranched structures may be more or less equated with the structure itself, we next turn to the case of branched structures. Just as an unbranched structure corresponds to a string of symbols, a branched structure has as its counterpart a tree. The nodes of the tree are part structure symbols and the edges are bonds.* Thus 2,2,4-trimethylhexane may be represented by a number of trees, one of which is

[Tree diagram of 2,2,4-trimethylhexane.]

*Multiple bonds may be represented several ways, for example as a nonstructural node.
One can define "global" part structures as was done above for ethyl crotonate, and one notes that again there will be many tree representations per structure. The point of this, of course, is that, as is well known(1), trees (in particular binary trees) may be represented as lists. The above tree has a list representation, CH2(C(CH3)(CH3)CH3)CH(CH2CH3)CH3. Now this doesn't look too appealing to the chemist's trained eye, but CH3C(CH3)(CH3)CH2CH(CH3)CH2CH3 does. This is a list representation of the tree

[Tree diagram.]

and is suspiciously close to our usual line notation of branched structures. We will show below that the set of all acyclic structures, wherein by "structure" we mean that intuitively well defined structural notation used by organic chemists, comprises a context free language. But first we must define the concept of languages and grammars.

Grammars and Languages

The following is a very brief introduction to the subject. We restrict ourselves to those aspects that are directly related to the problem of applying the theory of languages to organic chemistry. For a more complete introduction the reader is referred to a number of excellent texts.(2-5)

As conceived by Chomsky(6) the following are the central aspects of this subject.

Symbols. There is some (normally finite) set of symbols from which strings (sentences) are made. This set (V_T, the terminal vocabulary) would be {CH3, CH2} for the unbranched alkanes above and would be {CH3, CH2, CH, C, (, )} for the linear representation of branched alkanes. As a grammar embodies the concept of derivation of some sentence in a language, there is also defined a set of symbols (V_N, the nonterminal vocabulary). These are used in derivations but do not appear in the final sentences of the language defined by the grammar of interest. The basic act of derivation involves replacement of a nonterminal symbol in a string by a string. For example the string CH3CH(R)R may be turned into the string CH3CH(CH2R)R by replacing the nonterminal R by the string CH2R. On the other hand it might be changed into CH3CH(CH3)R by replacing R by CH3, depending on what our rules are for effecting these changes. Finally, there is some unique member of V_N, the "start" symbol, from which all sentences may be derived.
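The basic act of derivation, replacing one nonterminal in a string by a string, can be sketched in a few lines. The representation of a sentential form as a list of symbols and the helper name `apply_production` are assumptions made for this illustration.

```python
def apply_production(form, lhs, rhs, occurrence=0):
    """Replace one occurrence of the nonterminal `lhs` in `form` by `rhs`.

    `form` is a list of symbols; `occurrence` selects which instance of
    `lhs` is rewritten (0 = leftmost). Hypothetical helper for illustration.
    """
    positions = [i for i, sym in enumerate(form) if sym == lhs]
    if occurrence >= len(positions):
        raise ValueError(f"{lhs!r} does not occur {occurrence + 1} times")
    i = positions[occurrence]
    return form[:i] + list(rhs) + form[i + 1:]


form = ["CH3", "CH", "(", "R", ")", "R"]
step1 = apply_production(form, "R", ["CH2", "R"])  # R -> CH2 R
step2 = apply_production(form, "R", ["CH3"])       # R -> CH3
print("".join(step1))  # CH3CH(CH2R)R
print("".join(step2))  # CH3CH(CH3)R
```

Which of the two results is obtained depends only on which rule is chosen, matching the remark that the outcome depends on what the rules for effecting these changes are.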
Productions. A production is just a rule for making the above changes. Replacement of R by CH2R is symbolized by R → CH2R; replacement of R by CH3, by R → CH3. As conventionally treated, application of productions is permissive in the sense that there are no rules stating what production of some set must be applied to a given string. The problem of determining what series of productions will turn the unique start symbol S into some specified sentence then becomes an occasionally intricate puzzle.

The general form of a production is αXβ → αωβ, where X is some nonterminal symbol and α, β, and ω are arbitrary strings. Productions of this form with no restrictions on α, β, and ω are of type 0. Those with the restriction that ω not be the empty string are of type 1. An equivalent statement is that a type 1 production may be of the form αγβ → αωβ, length(ω) ≥ length(γ), γ being an arbitrary string containing at least one nonterminal symbol. The presence of the context α and β leads to the term context sensitive in describing these productions. Productions of the form X → ω are of type 2. The absence of a context, α and β, leads to the term context free for this type of production. Finally, the simplest type of production, type 3 or regular, is restricted to be of either the form X → a (X in V_N, a in V_T) or X → aY (X and Y in V_N, a in V_T).
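The four production types can be told apart mechanically from the shape of the two sides. The sketch below is a hedged illustration: the function name `production_type` is hypothetical, type 1 is tested with the length criterion quoted above, and an erasing rule X → (empty) is reported as type 0.

```python
def production_type(lhs, rhs, nonterminals):
    """Return the most restrictive type (3, 2, 1 or 0) that lhs -> rhs fits.

    `lhs` and `rhs` are lists of symbols; `nonterminals` is the set V_N.
    Illustrative helper only.
    """
    if not any(sym in nonterminals for sym in lhs):
        raise ValueError("left side needs at least one nonterminal")
    single = len(lhs) == 1
    if single and len(rhs) == 1 and rhs[0] not in nonterminals:
        return 3                      # X -> a
    if (single and len(rhs) == 2 and rhs[0] not in nonterminals
            and rhs[1] in nonterminals):
        return 3                      # X -> aY
    if single and len(rhs) >= 1:
        return 2                      # X -> w, w not empty
    if len(rhs) >= len(lhs):
        return 1                      # string does not shrink
    return 0                          # anything else


V_N = {"R"}
print(production_type(["R"], ["CH2", "R"], V_N))               # 3
print(production_type(["R"], ["CH", "(", "R", ")", "R"], V_N))  # 2
```

The check returns the lowest-complexity class a rule belongs to; by the inclusions among the types, a rule reported as type 3 is also of types 2, 1 and 0.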
Note that all context sensitive productions are of type 0, all context free productions are context sensitive, and all regular productions are context free. Note also the direct analogy between productions as defined above and chemical reactions. The context sensitive production CH=CH CHOH → CH=CH CO is equally viewed as a reaction. Application of this production to the string CH3CHOHCH2CH2CH=CHCHOHCH3 may produce the new string CH3CHOHCH2CH2CH=CHCOCH3 but not CH3COCH2CH2CH=CHCHOHCH3. Clearly if structures are equated with strings, chemical reactions have as their counterpart productions. The term "context sensitive" has very similar meanings in both cases. Along these lines the production CH(OCH3)2 → CHO is of type 0 as it leads to a decrease in the length of the string.* The context free production CHO → CH(R)OH corresponds to Grignard addition to an aldehyde, while examples of regular productions are CH2OH → CH2Br, CHOH → CO, etc. Just as type 0 productions lead to a richer language, the analogous chemical reactions lead to a richer, more complex chemistry as we go from the simple regular "functional group switching" reactions to those involving blocking and deblocking reactions.
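The selectivity of the context sensitive production CH=CH CHOH → CH=CH CO can be checked by listing every place where its left side matches. This is a minimal sketch; the symbol segmentation of the diol and the helper name `rewrite_sites` are assumptions made for the example.

```python
def rewrite_sites(tokens, lhs, rhs):
    """Return every string obtainable by one application of lhs -> rhs.

    The rule fires only where the whole left side, context included,
    occurs as a contiguous run of symbols. Illustrative helper.
    """
    n = len(lhs)
    results = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == lhs:
            results.append(tokens[:i] + rhs + tokens[i + n:])
    return results


diol = ["CH3", "CHOH", "CH2", "CH2", "CH=CH", "CHOH", "CH3"]
products = rewrite_sites(diol, ["CH=CH", "CHOH"], ["CH=CH", "CO"])
print(["".join(p) for p in products])
# ['CH3CHOHCH2CH2CH=CHCOCH3']
```

Only the alcohol adjacent to the double bond is rewritten; the isolated CHOH has no CH=CH to its left, so the second product named in the text is never produced.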
*Assuming our symbols are CH, OCH3, (, ), 2, CHO.

Grammar. A grammar is just a defined set of nonterminal and terminal symbols, a specified member of V_N (the start symbol) and a set of productions. Viewing reactions as productions we may define a chemistry as a set of molecular parts, a specified starting material, and a set of reactions. This assumes that the chemistry can be thrown into the proper grammatical form as discussed above.

Languages. For a defined grammar, its attendant language is the set of all strings (over V_T) that can be generated by repeated application of the grammar's productions, starting with the start symbol. Pursuing our analogy between grammars and synthesis, the language defined by some chemistry (chemical grammar) is the set of all molecules that can be synthesized from the specified starting material by repeated application of the reactions: a language of synthesizable structures for that chemistry. Just as grammars may be of type 0, 1, 2 or 3 according to the most complex type of production present, a language is of type 0 if specified by a type 0 grammar, etc. Note that while a grammar is an exact (although sometimes opaque) definition of a language, a language does not in general specify a unique grammar.

Chemical questions which are a direct transliteration of their corresponding language theory counterparts are:

Given two chemical grammars, do they define the same language of synthesizable structures?

Given a particular chemical grammar (chemistry) of, say, type 0, involving various blocking-deblocking sequences, is there a simpler chemistry that defines the same language of synthesizable structures?

Given a chemistry and a particular molecule, is the molecule a member of the chemistry's language (can it be synthesized)?

Since this membership question becomes more and more complicated as we go from type 3 to type 0 languages, can we place some sort of upper and lower limits on the complexity of this problem within the context of language types?

These questions will be dealt with below.

Examples

1) Grammar 1
V_T = {CH3, CH2, CH4}
V_N = {ALKANE, R}
Start symbol = ALKANE
Productions:
P1.1  ALKANE → CH4
P1.2  ALKANE → CH3 R
P1.3  R → CH3
P1.4  R → CH2 R
This is an example of a structural grammar. The productions correspond to rules for generating n-alkane structures rather than to chemical reactions. The language specified by this grammar is the set of all n-alkanes. Derivation of n-butane is achieved thusly:

ALKANE →(P1.2) CH3R →(P1.4) CH3CH2R →(P1.4) CH3CH2CH2R →(P1.3) CH3CH2CH2CH3
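That Grammar 1 generates exactly the n-alkanes can be explored by enumerating its sentences up to a chosen length. The breadth-first search below is a sketch under stated assumptions: the encoding of the productions as tuples and the function name `sentences` are choices made for this example.

```python
from collections import deque

# Grammar 1, productions P1.1 to P1.4, as (left side, right side) pairs.
PRODUCTIONS = [
    ("ALKANE", ("CH4",)),
    ("ALKANE", ("CH3", "R")),
    ("R", ("CH3",)),
    ("R", ("CH2", "R")),
]
NONTERMINALS = {"ALKANE", "R"}


def sentences(max_symbols):
    """List all sentences of Grammar 1 with at most `max_symbols` symbols."""
    found = []
    queue = deque([("ALKANE",)])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, or None for a sentence.
        idx = next((i for i, s in enumerate(form) if s in NONTERMINALS), None)
        if idx is None:
            found.append("".join(form))
            continue
        for lhs, rhs in PRODUCTIONS:
            if lhs == form[idx]:
                new_form = form[:idx] + rhs + form[idx + 1:]
                if len(new_form) <= max_symbols:
                    queue.append(new_form)
    return found


print(sentences(4))
# ['CH4', 'CH3CH3', 'CH3CH2CH3', 'CH3CH2CH2CH3']
```

The length bound is what makes the search terminate; no production of Grammar 1 shortens a sentential form, so discarding forms that are already too long loses nothing.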
The sentence CH3CH(CH3)CH3 is not in this language, nor are CH3CH2OH or CH3R (this latter is a sentential form). Since all productions are of type 3 (X → a or X → aY) the set of all n-alkanes comprises a regular language. The membership question for regular languages is exceedingly simple. From the regular grammar above we may construct the algorithm or "machine" shown in Figure 1. Rules for constructing machines such as this from regular grammars are described elsewhere.(7) One starts in the start state and makes state transitions as one reads the string of interest from left to right. If one is in the accept state when no more symbols are left, the string is a member of the language. If not, not. Note that the machine requires only a finite amount of memory; hence the term finite state machine.

Figure 1. Finite state machine for recognizing members of the language defined by grammar 1. The start state is that labelled ALKANE. The accept state is that one labelled F.

2) Grammar 2
V_T = {CH3, CH2, OH, MgBr, Br}
V_N = {S, OH, Br, MgBr}
Start symbol = S
Productions:
P2.1  S → CH3OH
P2.2  OH → Br
P2.3  Br → MgBr
P2.4  MgBr → CH2OH
This is a chemical grammar, the productions corresponding to reactions.* The language is the set of all 1-alkanols, 1-alkylmagnesium bromides and 1-bromoalkanes.

*The reader will note that V_N and V_T are not disjoint as they are supposed to be. This was done for clarity's sake as this grammar can be easily rewritten to conform to the standard definition.
Although the productions are not all of the regular form X → a or X → aY, the grammar is easily rewritten to conform to this format. The language is thus a regular language and a finite state machine for recognizing members of the language is easily constructed. Note that a derivation of a sentence corresponds directly to its synthesis. Derivation of ethyl bromide proceeds as:

S →(P2.1) CH3OH →(P2.2) CH3Br →(P2.3) CH3MgBr →(P2.4) CH3CH2OH →(P2.2) CH3CH2Br
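A finite state machine for the language of Grammar 2 can be written as a transition table. The machine below is a sketch constructed for this illustration, not a figure from the paper: the state names START, CHAIN and F are assumptions, and the input is taken to be already split into the symbols of V_T.

```python
# Transition table: (current state, symbol read) -> next state.
TRANSITIONS = {
    ("START", "CH3"): "CHAIN",
    ("CHAIN", "CH2"): "CHAIN",
    ("CHAIN", "OH"): "F",
    ("CHAIN", "Br"): "F",
    ("CHAIN", "MgBr"): "F",
}


def accepts(tokens):
    """True if the symbol sequence is a sentence of Grammar 2's language."""
    state = "START"
    for tok in tokens:
        state = TRANSITIONS.get((state, tok))
        if state is None:          # no transition defined: reject
            return False
    return state == "F"            # must end in the accept state


print(accepts(["CH3", "CH2", "Br"]))   # True  (ethyl bromide)
print(accepts(["CH3", "CH2"]))         # False (no terminal group)
```

The only memory used is the current state, and the string is read once from left to right, so the membership test takes time proportional to the length of the molecule.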
Theorem. The set of all wellformed acyclic structures comprises a deterministic context free language.

We have defined above what we mean by "structures": the ordinary line notation used by organic chemists, wherein we rely on a string representation with branching indicated by parenthesization. The reader will recall that context free grammars allow more complicated nested productions, e.g. S → aSb, than are allowed for by regular productions. Consider grammar 3 below:

Grammar 3
V_T = {CH3, CH2, CH, CH4, (, )}
V_N = {S, R}
Start symbol = S
Productions:
P3.1  S → CH4
P3.2  S → CH3 R
P3.3  R → CH3
P3.4  R → CH2 R
P3.5  R → CH(R)R
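Because the first symbol of each right side of Grammar 3 is distinct, the grammar can be recognized by a simple recursive descent procedure. The parser below is a minimal sketch written for this illustration; the nested tuple used as a parse tree and the names `parse_structure` and `is_member` are assumptions, and the input is taken to be a list of symbols from V_T.

```python
def parse_structure(tokens):
    """Parse a symbol list with Grammar 3; return a nested tuple or raise."""
    pos = 0

    def expect(sym):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != sym:
            raise ValueError(f"expected {sym!r} at position {pos}")
        pos += 1

    def parse_r():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        head = tokens[pos]
        if head == "CH3":                  # P3.3  R -> CH3
            pos += 1
            return ("CH3",)
        if head == "CH2":                  # P3.4  R -> CH2 R
            pos += 1
            return ("CH2", parse_r())
        if head == "CH":                   # P3.5  R -> CH ( R ) R
            pos += 1
            expect("(")
            branch = parse_r()
            expect(")")
            return ("CH", branch, parse_r())
        raise ValueError(f"unexpected symbol {head!r} at position {pos}")

    if tokens == ["CH4"]:                  # P3.1  S -> CH4
        return ("CH4",)
    expect("CH3")                          # P3.2  S -> CH3 R
    tree = ("CH3", parse_r())
    if pos != len(tokens):
        raise ValueError(f"trailing symbols from position {pos}")
    return tree


def is_member(tokens):
    """True if the symbol list is a sentence of Grammar 3."""
    try:
        parse_structure(tokens)
        return True
    except ValueError:
        return False


branched = ["CH3", "CH", "(", "CH", "(", "CH2", "CH3", ")", "CH3", ")", "CH3"]
print(is_member(branched))                          # True
print(is_member(["CH3", "CH", "(", "CH3", "CH3"]))  # False
```

The recursion depth grows with the nesting of the parentheses, so the call stack is the memory that keeps track of which branch is being examined.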
This is just Grammar 1 with the addition of production P3.5. This production is not regular but is context free. It is this production that allows us to generate arbitrarily branched structures such as CH3CH(CH(CH2CH3)CH3)CH3. That the language defined by this grammar is not regular (i.e. cannot be generated by a regular grammar) follows from its being homomorphically equivalent with the set of balanced parentheses. The recognition problem for context free languages is inherently more difficult than that for regular languages in that one needs unlimited memory (essentially for the reason in this case of remembering what branch one is currently examining). A complete grammar that handles a large subset of acyclic structures is presented in Figure 2. The language is not regular but is context free, as can be verified by observing the form of the productions. This grammar is the basis of a deterministic algorithm for parsing (i.e. recognizing) structures input into a LISP organic synthesis program written by P. Blower (8). This grammar generates most common functional groups (those not included, such as -NO2, were left out for nongrammatical reasons) and includes chaining, the representation of repeated subunits by the enclosing of them within brackets. The representations of n-butane, CH3CH2CH2CH3, CH3(CH2)2CH3, C2H5C2H5, CH3CH2C2H5, and others are all accepted by this program, parsed, and converted into internal representations, pointing up the fact that there is a many-into-one mapping of structural representations into structure.

V_T = {CH4 CH2O HCO2H HCN CO2 CO CH3 OH COOH CHO CN Cl Br HO HO2C OHC NC CH2 O O2C H2C CH HC C = ≡ ( ) N}
V_N = {S LMV RMV CHN DB RDB TB RTB SM SRP SLP DV LDV RDV TV LTV QV}
Start symbol: S

PRODUCTIONS:

Structure
S → SM / SLP RMV / (LMV)2 TV RMV / (LMV)3 QV RMV / LDV(RMV)2 / LDV RDB / LTV RTB / LTV(RMV)3 / QV(RMV)4 / (LMV)3 TV / (LMV)4 QV / (LMV)2 QV RDB

Left Monovalent
LMV → SLP / SLP CHN / (LMV)2 TV CHN / (LMV)2 TV / (LMV)3 QV / (LMV)3 QV CHN / (LMV)2 QV DB / LDV DB / LTV TB

Right Monovalent
RMV → SRP / DV RMV / TV(RMV)RMV / TV(RMV)2 / TV RDB / QV(RMV)2 RMV / QV(RMV)3 / QV RTB / QV(RMV)RDB / (CHN)N RMV / QV(RMV)(RMV)RMV / QV(RDB)RMV

Chaining
CHN → DV / DV CHN / TV(RMV) / TV(RMV)CHN / QV TB / QV(RMV)(RMV) / QV(RMV)(RMV)CHN / QV(RMV)2 CHN / TV DB / QV(RMV)DB

Double Bond
DB → = TV / = TV CHN / = QV(RMV) / = QV(RMV)CHN / = QV DB

Right Double Bond
RDB → = RDV / = TV RMV / = QV(RMV)RMV / = QV(RMV)2 / = QV RDB

Triple Bond
TB → ≡ QV / ≡ QV CHN
RTB → ≡ TV / ≡ QV RMV

SM → CH4 / CH2O / HCO2H / HCN / CO2 / CO
SRP → CH3 / OH / CO2H / CHO / CN / Cl / Br
SLP → CH3 / HO / HO2C / OHC / NC / Cl / Br
DV → CH2 / O / CO / CO2 / O2C
LDV → CH2 / H2C
RDV → CH2 / O / CO
TV → CH
LTV → CH / HC
QV → C

Figure 2a. Context free structural grammar for computer input of linear structure notation via teletype. Meanings of nonterminal symbols are as above.

Figure 2b. Parse tree of (CH3)2C=CHCH2OH, a representative terpenoid.

We have seen above a simple synthetic grammar wherein the productions are directly derived from chemical reactions and the resulting language of synthesizable structures is the set of all molecules that can be synthesized from some specified precursor by application of the reactions. In the simple case of 1-alkanols and 1-bromoalkanes the recognition problem is trivial: we can answer it in a time proportional to the length of the molecule. We are naturally curious as to what is the minimally complex grammar needed to mimic organic synthesis of the more conventionally complex type. This is not a silly question for the following two reasons. Although one can quibble as to the extent to which acyclic molecules may be equated with strings, there is no question that the similarity is marked. Moreover, at least from a complexity sense it is clear that algorithms for recognizing regular structures are inherently simpler than those for recognizing context free ones. Similarly the recognition of members of context sensitive languages is more difficult yet. This is true regardless of whether one is talking about the recognition of well formed or synthesizable structures, or whether one is doing this in a formal mathematical sense or via computer programs. Secondly it is not obvious that the question of synthesizability is in fact answerable at all for all well defined organic molecules. By answerable we mean having a recognition procedure that will terminate in some (not necessarily short) period of time with the answer yes or no. The set of context sensitive languages is recognized by the so-called linear bounded automata, and the question of membership of some string in a defined context sensitive language is known to be answerable. Type 0 languages on the other hand are recognized by Turing machines, and the question of membership is not answerable for type 0 languages as a set, although it may be for some subset. Thus if we cannot develop cogent arguments for the sufficiency of context sensitive languages as a model for organic synthesis we are left with the possibility that organic synthesis cannot be "solved" by computer.
No guarantees though, since organic chemistry is not a closed science and what may be an acceptable synthesis under some circumstances will be unacceptable under others. We first show by a counterexample that context free languages are an insufficient model for organic synthesis. We then argue (alas we cannot prove, for the above reasons) that context sensitive languages are a sufficient model.

Theorem. There exists a language of synthesizable structures that is not context free.

It is well known that languages such as ww, where w is some string over V_T, are not context free. For example the set of all alkanes CH3CH(R1)R2 where R1 = R2 is not a context free language. The set of all tertiary alcohols derived by Grignard addition to methyl acetate, represented as CH3COH(R1)R2, R1 = R2, is thus not context free. That it is context sensitive follows from construction of a context sensitive grammar for generating this set (grammar 4, Figure 3).

Grammar 4
V_T = {CH3, CH, OH, CH2, (, )}
V_N = {S, R, Δ, ∇, #, X, Y, Z, {W_i | i ∈ {CH3, CH2, CH, (, )}}}
Start symbol = S
Productions:
S → CH3 CHOH Δ ∇ R #
R → CH3 X | CH2 R | CH(R)R
iX → Xi, i ∈ {CH3, CH2, CH, (, )}
RX → R
XX → X
∇X → ∇Y
Yi → iY
YR → R
YX → X
Y# → Z
iZ → W_i Z i
jW_i → W_i j, j ∈ {CH3, CH2, CH, (, )}
∇W_i → W_i ∇
ΔW_i → Δ i
Δ → (
∇ → ) Z
Language = CH3CHOH(R1)R2, R1 = R2

Figure 3
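Although no context free grammar generates the set with R1 = R2, testing a given string for membership is straightforward once the two groups can be compared directly. The check below is a sketch under stated assumptions: the head symbols CH3 and COH, the symbol-list input and the name `is_symmetric_carbinol` are choices made for this illustration, and the function does not verify that the repeated group is itself a wellformed alkyl group.

```python
def is_symmetric_carbinol(tokens, head=("CH3", "COH")):
    """True if tokens spell  head ( w ) w  for some nonempty symbol string w."""
    h = len(head)
    if tuple(tokens[:h]) != tuple(head):
        return False
    if len(tokens) <= h or tokens[h] != "(":
        return False
    depth = 0
    for i in range(h, len(tokens)):
        if tokens[i] == "(":
            depth += 1
        elif tokens[i] == ")":
            depth -= 1
            if depth == 0:
                # Found the parenthesis closing the first group R1.
                first = tokens[h + 1:i]
                second = tokens[i + 1:]
                return len(first) > 0 and first == second
    return False  # the opening parenthesis was never closed


same = ["CH3", "COH", "(", "CH2", "CH3", ")", "CH2", "CH3"]
different = ["CH3", "COH", "(", "CH3", ")", "CH2", "CH3"]
print(is_symmetric_carbinol(same))       # True
print(is_symmetric_carbinol(different))  # False
```

The comparison `first == second` is the step that needs the whole of the first group to be remembered, which is the kind of cross-reference a context free grammar cannot express.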
Now if we consider further the relationship between derivation of a sentence and its synthesis, especially the relationship between the intermediate sentential forms and the precursor structures in the syntheses, we are led to the following theorem.

Theorem (?). For a chemistry derived from functional group switching reactions, condensation reactions, demasking of masked functional groups, and blocking and deblocking reactions: the language so defined is a context sensitive one (i.e. there exists an equivalent context sensitive grammar that generates the same language). The question of synthesizability is thus answerable.

This follows from the so-called workspace theorem of context sensitive languages.(9) The "proof" of this simply entails stating informally the proof of the workspace theorem, letting synthetic intermediates correspond to individual steps in the derivation of the target molecule (sentence).

1) Consider a synthesis D of the form:

S → X_0 → X_1 → ... → X_n = target

where S is the starting material (symbol), X_n is a sentence in the language of synthesizable structures and X_i → X_(i+1) is some conversion. Assume moreover that there is some estimate SIZE(X_i) of the size of each X_i. We define the complexity C(X_n, D) of the target X_n for this particular synthesis to be max{SIZE(X_i), 0 ≤ i ≤ n}. [...]

[...] → CH3CH2COOH

Assume that the least complex route is

CH3OH → CH3Br → CH3CH(COOCH3)2 → CH3CH2COOH
SIZE: 2, 2, 8, 3

The ratio WS(X_n)/SIZE(X_n) = 2.7. However as the molecules get larger, assuming that the condensing agents (e.g. malonic ester) stay constant in size, this ratio decreases. Thus for

CH3CH2OH → CH3CH2Br → CH3CH2CH(COOCH3)2 → CH3(CH2)2COOH
SIZE: 3, 3, 9, 4

the ratio decreases to 2.25. For syntheses involving condensation reactions it would seem that there will exist some p.

c) The same argument may be applied to blocking-deblocking sequences and to operations involving masked functional groups. It seems inevitable that if we consider blocking groups to incrementally increase the complexity of some precursor, the ratio WS(X_n)/SIZE(X_n) must approach some number as SIZE(X_n) gets larger. In fact the only situation wherein this would not be the case would seem to be one wherein the complexity of some necessary blocking group depended exponentially on the complexity of the intermediate being blocked. This type of exponential blocking group is unknown to organic chemistry and indeed seems foreign to the very concept of isolated and interacting functional groups. Functional groups, even in a relaxed definition, are inherently local affairs.

d) Consideration of a number of published syntheses of complex natural products suggests a p of approximately 1.8, a remarkably low number. The author finds it striking indeed that those syntheses involving the selective reagents characteristic of "synthetic methods" chemistry so much in the vogue lately seem to have a smaller workspace than those carried out in the grand traditional manner.

We conclude that the workspace theorem is probably valid for organic synthesis, although it is certainly true that the above analysis ignores questions dealing with the formal relationship between synthetic schemes and derivations.
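The two ratios quoted for the malonic ester routes can be reproduced from the SIZE values alone. The sketch below is hedged: it assumes that the workspace WS(X_n) is the smallest complexity C(X_n, D) over the routes considered, which is how the phrase "least complex route" is read here, and the function names are hypothetical.

```python
def complexity(sizes):
    """C(X_n, D): the largest SIZE among the compounds of one synthesis D."""
    return max(sizes)


def workspace_ratio(routes):
    """Assumed WS(X_n) / SIZE(X_n), with WS taken as the least complexity.

    `routes` is a list of syntheses of the same target; each synthesis is
    the list of SIZE values of its compounds, target last.
    """
    target_size = routes[0][-1]
    workspace = min(complexity(route) for route in routes)
    return workspace / target_size


# SIZE values as listed under the two schemes in the text.
propionic_acid = [[2, 2, 8, 3]]
butyric_acid = [[3, 3, 9, 4]]
print(round(workspace_ratio(propionic_acid), 2))  # 2.67
print(workspace_ratio(butyric_acid))              # 2.25
```

Because the malonate fragment adds a fixed number of symbols while the target grows, the ratio (SIZE + constant)/SIZE falls toward 1 as the chain lengthens, consistent with the trend described above.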
It would be nice to be able to write a program that would take as input two sets of chemical reactions, the second being the first augmented with some new reagent, and give as output the answer to the question: "Does this new synthetic method allow us to do anything we couldn't do in its absence?" This chemical critique program would be at least useful to editors of chemical journals. We note, alas, that for the set of context sensitive languages the problem of equivalence of two grammars is not answerable. Thus the above undertaking would seem to be a dubious one. Now of course the fact that the equivalence problem is unanswerable for the set of all context sensitive languages does not mean that it is so for every subset, for example augmenting a chemistry containing sodium hydroxide by addition of the reagent potassium hydroxide. On the other hand, we note that the equivalence problem is not answerable for even the simpler set of context free languages, so it seems unlikely that one could write a general program that would compare two chemistries based on context sensitive reactions and that would halt in some finite time with the equivalence answer.
The Functional Group Switching Problem (10)

One frequently has occasion, when devising a synthetic scheme, to adjust functionality in a molecule in a manner that is only indirectly related to the synthetic problem at hand. For example, if one desires the transformation

[reaction scheme not reproduced]

it is necessary to block the ketone toward action of the organometallic reagent employed. Bearing in mind the assumptions built into this problem, we may define a "molecule" as being simply an ordered set of functional groups. Chemically this is equivalent to the idea of a molecule's being a set of functional groups embedded in a static molecular framework. Definition of a reaction as an ordered triplet (reagent, precursor functional group, product functional group) then corresponds to the assumption that reactions only interconvert functional groups; they do not lead to increments of the carbon skeleton or, if they do, it is only to a finite and limited degree. Now this sounds like a rather restricted picture of organic chemistry, and it is, but it is surprisingly close to the way one thinks about blocking group interconversions. We can pose the functional group switching problem within the context of this structural notation in the following manner. What is the shortest sequence of reagents that will transform a defined starting material S = (S_1, S_2, ..., S_n), where S_i is the ith functional group of compound S, into the target T = (T_1, T_2, ..., T_n)? The reagent sequence turns S_1 into T_1, S_2 into T_2, etc. With this definition of the functional group switching problem we are led to the following.

Theorem. The set of all sequences of reagents that effects a stated functional group switching problem comprises a regular language.

The proof of this theorem is fairly obvious and not too interesting. What is interesting is the consequence of the theorem. The proof follows from the recognition that our functional group reaction dictionary is a finite directed graph wherein the nodes are labelled with functional groups and the edges with reagents. A small reaction graph is shown in Figure 4. This is of the same form as the finite state machine in Figure 1 and, if we define a start state (e.g. CO) and an accept state (e.g. CHOAc), it is a finite state machine for recognizing all sequences of reagents that will turn a ketone into a secondary acetate. The reagent sequence (NaBH4, DHP, H3O+, NaOH, Ac2O) is a member of the language so defined while (NaBH4, DHP, H3O+, CrO3/py, Ac2O) is not.
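The membership claim above can be checked with a small sketch of the reaction graph as a finite state machine. The transition table below is an assumption: it is reconstructed from the edge-label sets given with Figure 4 (which list the reagents that leave a group unchanged) together with ordinary functional group chemistry, not copied from the original drawing. The names `REACTIONS`, `apply_reagent`, and `accepts` are likewise illustrative.

```python
# Assumed reaction dictionary: functional group -> {reagent: product group}.
# Any (group, reagent) pair not listed is treated as "no reaction",
# i.e. a self-loop in the graph.
REACTIONS = {
    "CO":     {"NaBH4": "CHOH"},
    "CHOH":   {"DHP": "CHOTHP", "CrO3/py": "CO", "Ac2O": "CHOAc"},
    "CHOTHP": {"H3O+": "CHOH"},
    "CHOAc":  {"NaOH": "CHOH"},
}


def apply_reagent(group, reagent):
    """Return the functional group obtained by treating `group` with `reagent`."""
    return REACTIONS[group].get(reagent, group)


def accepts(sequence, start="CO", accept="CHOAc"):
    """Run the reagent sequence from `start`; True if it ends in `accept`."""
    state = start
    for reagent in sequence:
        state = apply_reagent(state, reagent)
    return state == accept


print(accepts(["NaBH4", "DHP", "H3O+", "NaOH", "Ac2O"]))     # True
print(accepts(["NaBH4", "DHP", "H3O+", "CrO3/py", "Ac2O"]))  # False
```

In the second sequence the chromium reagent returns the alcohol to the ketone, on which acetic anhydride has no effect, so the machine halts in CO rather than CHOAc.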
The relationship between regular languages, regular grammars, and finite state machines is such that most interesting questions dealing with them are answerable. Whether a particular string is in the language is answerable (i.e., does this reagent sequence do the trick?). More interesting, however, is the following, which represents a general solution to the functional group switching problem. Our reaction dictionary defines, for the functional group switching problem S_1 → T_1 (S_1 and T_1 single functional groups), a language L_1 of sufficient synthetic sequences. For a problem S_2 → T_2 a language L_2 is defined. If we want to convert the binary compound (S_1 S_2) into the compound (T_1 T_2), the language for this is the intersection of L_1 and L_2, i.e., those members common to L_1 and L_2 are sequences that convert S_1 into T_1 and S_2 into T_2. It is known that the intersection of two regular languages is itself regular, so the problem of finding the shortest sequence of reagents for effecting (S_1 S_2) → (T_1 T_2) is that of finding the shortest string in a regular language. The details of constructing the algorithm are presented elsewhere (10), but this points up what the author feels to be one of the really pretty aspects of language theory. The conventional proof of the statement that the membership question is answerable for regular languages is constructive in that a provably correct algorithm for doing so is developed. One may then start from this fact and develop more efficient ways of achieving this end. We offer evidence that the linguistic approach to organic synthesis is of more than just idle interest by presenting in Figure 5 some representative problems with their solutions. These examples, although presented in the rather disembodied ordered n-tuplet structural notation, clearly show that the results of this approach represent nonobvious answers to nontrivial problems.

[Figure 4 graph not reproduced. Edge-label sets from the figure:]
a ∈ {DHP, NaBH4, CrO3/py, Ac2O, NaOH}
b ∈ {DHP, NaBH4, CrO3/py, Ac2O, H3O+}
c ∈ {DHP, CrO3/py, Ac2O, H3O+, NaOH}
Figure 4. A small reaction graph for the four functional groups CO (ketone), CHOH (secondary alcohol), CHOTHP (tetrahydropyranyl ether), and CHOAc (secondary acetate)

It would seem that the basic feature of this approach to organic synthesis is applicable to more complicated structural models as long as several conditions are met. Firstly, the synthetic problem of interest must be capable of dissection into some independent subproblems. This is so because the solution procedure involves at least implicit construction of the part solutions and working with their intersection. Secondly, the finiteness of the various sub-reaction dictionaries is important, since one is guided in solving a problem by the exhaustive solution of subproblems. For real molecules reaction dictionaries are infinite. Application of one's chemist's intuition suggests, however, that only a finite part of a reaction dictionary is chemically interesting. While there is an infinite number of structures that can be involved as intermediates in the conversion

[reaction scheme not reproduced]
only a small number are realistic in nature when one has this particular end in mind. One problem of immediate concern is that of automating the process of making the reaction dictionary finite for functionalities, such as ketones, that are destined for annulation. Interaction of the computer with the chemist is clearly necessary in this respect.

A related condition deals with the very state nature of this approach. The problem RCH=CH-COR → RCH=CH-CHOHR is not properly viewed as (CH=CH, CO) → (CH=CH, CHOH), since this entails the incorrect assumption that one is dealing with an isolated double bond and ketone. The intersection of the two reaction dictionaries would incorrectly have "no reaction" for the reagent (CH3)2CuLi. One thus must treat interacting functional groups as larger entities. This is an acceptable price to pay except for two consequences. Reaction dictionaries of aggregate functional groups can be very large, and their generation by hand is a tedious and time-consuming affair. It appears worthwhile to solve this reaction dictionary generation problem by a combination of chemist and computer, with the chemist exercising his judgement in paring the growth of the reaction dictionary and making rather delicate value judgements on exactly what reaction is expected of some aggregate functional group under some set of reaction conditions. (11)

START:    [RCHOEE RCO COOH]
TARGET:   [RCHOEE RCO RCHOH]
SEQUENCE: (Glycol, EVE, RLi, NaBH4, Ac2O, H3O+, EVE, OH-)

START:    [CH2OH RCO COOH]
TARGET:   [CH2OH RCO R2COH]
SEQUENCE: (Glycol, RLi, RMgX, H3O+)

START:    [CH2OH CH2OAc COOMe CH2OEE]
TARGET:   [CH2OAc CH2OH COOMe CH2Br]
SEQUENCE: (CrO3/py, H3O+, TsCl, NaBr, OH-, EVE, NaBH4, Ac2O, CH2N2, H3O+)

START:    [CH2OH CH2OAc COOMe CH2OEE]
TARGET:   [CH2OH CH2OAc COOMe CH2Br]
SEQUENCE: (CrO3/py, H3O+, TsCl, NaBr, NaBH4)

Figure 5. Functional Group Switching problems solved as in Ref. 10. RCHOEE is the ethoxyethyl ether of a secondary alcohol; EVE is ethyl vinyl ether.
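The intersection argument given earlier for the binary compound (S_1 S_2) → (T_1 T_2) can be sketched as a breadth-first search over tuples of functional groups, which amounts to exploring the product of the single-group machines. This is only an illustration of the idea, not the algorithm of Ref. 10. The reagent list and the reaction dictionary are assumptions reconstructed from the Figure 4 legend, and the names `REAGENTS`, `REACTIONS`, `step`, and `shortest_sequence` are hypothetical.

```python
from collections import deque

# Assumed reagent alphabet and reaction dictionary (see Figure 4 legend);
# unlisted (group, reagent) pairs mean "no reaction".
REAGENTS = ["NaBH4", "DHP", "CrO3/py", "Ac2O", "H3O+", "NaOH"]
REACTIONS = {
    "CO":     {"NaBH4": "CHOH"},
    "CHOH":   {"DHP": "CHOTHP", "CrO3/py": "CO", "Ac2O": "CHOAc"},
    "CHOTHP": {"H3O+": "CHOH"},
    "CHOAc":  {"NaOH": "CHOH"},
}


def step(group, reagent):
    """One reagent acting on one isolated functional group."""
    return REACTIONS[group].get(reagent, group)


def shortest_sequence(start, target):
    """Shortest reagent list turning tuple `start` into tuple `target`.

    Each reagent acts on every group of the molecule at once, so a state
    of the product machine is the tuple of current groups. Breadth-first
    search guarantees that the first route found is a shortest one.
    Returns None if the target cannot be reached.
    """
    start, target = tuple(start), tuple(target)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == target:
            return path
        for reagent in REAGENTS:
            nxt = tuple(step(group, reagent) for group in state)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [reagent]))
    return None


# Exchange a ketone and a secondary alcohol within one molecule.
route = shortest_sequence(("CO", "CHOH"), ("CHOH", "CO"))
print(len(route), route)
```

With this assumed dictionary the exchange needs six reagents: the alcohol is blocked, the ketone reduced, the new alcohol blocked with the other protecting group, and then the first block removed, the freed alcohol oxidized, and the second block removed. Because the state space is finite, the search always halts, which is the practical content of the finiteness condition discussed above.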
Literature Cited
(1) McCarthy, J., Abrahams, P. W., Edwards, D. J., Hart, T. P., Levin, M. I., "LISP 1.5 Programmer's Manual", MIT Press (1965).
(2) Hopcroft, J. E., and Ullman, J. D., "Formal Languages and Their Relation to Automata", Addison-Wesley, Reading, MA, 1969.
(3) Ginsburg, S., "The Mathematical Theory of Context Free Languages", McGraw-Hill, New York, 1966.
(4) Kain, R. Y., "Automata Theory: Machines and Languages", McGraw-Hill, New York, 1972.
(5) Aho, A. V., and Ullman, J. D., "The Theory of Parsing, Translation and Compiling", two volumes, Prentice Hall, Englewood Cliffs, N.J., 1973.
(6) Chomsky, N., Handbook of Math. Psych. (19 ), 2, Wiley, New York, pp. 323-418.
(7) Hennie, F. C., "Finite-State Models for Logical Machines", Wiley, New York, 1968.
(8) Blower, P., Ph.D. Thesis, University of Wisconsin-Madison, 1975.
(9) Salomaa, A., "Formal Languages", Academic Press, New York, 1973.
(10) This subject is discussed at length in Whitlock, H. W., Jr., J. Am. Chem. Soc. (1976), 98, 0000.
(11) Support of this work by the National Science Foundation is gratefully acknowledged.
Wipke and Howe; Computer-Assisted Organic Synthesis ACS Symposium Series; American Chemical Society: Washington, DC, 1977.