8 Computer-Assisted Studies of Chemical Structure and Olfactory Quality Using Pattern Recognition Techniques PETER C. JURS and CHERYL L. HAM
Downloaded by NATL UNIV OF SINGAPORE on May 5, 2018 | https://pubs.acs.org Publication Date: March 20, 1981 | doi: 10.1021/bk-1981-0148.ch008
Department of Chemistry, Pennsylvania State University, University Park, PA 16802 WILLIAM E. BRÜGGER International Flavors and Fragrances, Inc., 1515 Highway 36, Union Beach, NJ 07735
The attempt to rationalize the connection between the molecular structures of organic compounds and their biological activities constitutes the field of structure-activity relations (SAR) studies. Correlations between molecular structure and biological activity are important for the development of pharmacological agents, herbicides, pesticides, and chemical communicants (olfactory and gustatory stimulants), and for the investigation of chemical and genetic toxicity. These studies have practical importance because the results can be used to predict the activity of untested compounds, e.g., to design drugs. In addition, SAR studies can direct the researcher's attention to molecular features that correlate highly with biological activity, thus confirming or suggesting mechanisms or further experiments. SAR studies have been used to some extent in the pharmaceutical and agricultural industries, and the methods are beginning to be applied to the important problems of chemical toxicity, mutagenesis, and carcinogenesis. The best way to develop predictive capability is to understand, at the molecular level, the mechanisms that lead to the biological activity of interest. Unfortunately, this knowledge is not yet available for most classes of biologically active compounds. Furthermore, the path that an active compound or its precursors takes through a living system is usually not known. Thus, two choices are presented: study the mechanisms of a very few compounds to develop fundamental information about them, or use empirical, correlative methods to study larger sets of compounds. The latter constitutes the SAR approach to the problem. One has available a set of compounds that have been tested in a standard bioassay, together with the observations that resulted from the tests, and one searches for correlations between the structures of the compounds tested and the biological observations reported. One is actually modelling the entire
0097-6156/81/0148-0143$05.00/0 © 1981 American Chemical Society
Moskowitz and Warren; Odor Quality and Chemical Structure ACS Symposium Series; American Chemical Society: Washington, DC, 1981.
144
ODOR QUALITY AND CHEMICAL STRUCTURE
process of uptake, transport, distribution, metabolism, cell penetration, receptor binding, excretion, etc. The discovery and design of biologically active compounds (drug design) is a field that has undergone widespread, well-documented changes in the past decade (1-8). A host of new techniques and perspectives has evolved. While these techniques have been used largely for the development of pharmaceuticals, they can also be applied to the rationalization of structure-activity relations among sets of toxic, mutagenic, or carcinogenic compounds and to studies of olfactory stimulants. Several approaches to SAR have been reported: the semiempirical linear free energy relationship (LFER), or extrathermodynamic, model proposed by Hansch and coworkers (9,10,11); the additivity, or Free-Wilson, model (12); quantum mechanically based models (13,14); and pattern recognition methods (8,15). The cited reviews describe the progress made with each of these approaches.

Structure-Activity Studies of Olfactory Stimulants
Several theories relating molecular properties to perceived odor quality have been advanced. Examples include the work of Wright (16,17), who links odor quality to molecular vibrations in the far infrared, and of Amoore (18), who links odor quality to molecular shape, size, and electronic nature and who introduced the concept of primary classes. Beets (19) has discussed odor quality in relation to molecular shape as represented by oriented profiles, chirality, and functional groups, and has expanded these discussions in a recently published book (20). Theimer and coworkers (21,22,23) have discussed the importance of molecular cross-sectional areas, free energies of desorption, and chirality in relation to odor. A discussion of musk odor quality and molecular structure has been presented by Teranishi (24). Laffort and coworkers (25) have related odor quality to four molecular properties derived from gas chromatographic retention indices measured on four stationary phases. Focusing on a few molecular parameters at a time, however, does not allow predictions of odor quality for large collections of compounds. Studies have therefore appeared in which diverse sets of molecular parameters have been investigated simultaneously, using methods that can handle many parameters at once, e.g., multiple linear regression analysis. Schiffman (26) used multidimensional scaling techniques to study correlations between 25 physicochemical parameters and the olfactory qualities of 39 odorants. The physicochemical parameters used included molecular size, weight, number of double bonds, functional groups, solubility, and Raman spectral bands. Another study (27) expanded the work to 19 different compounds and reached similar conclusions. Dravnieks (28) used 14 structural features and multiple linear regression analysis to find linear equations that fit measured
8.
JURS ET AL.
145
Computer-Assisted Studies
intensity, threshold, and odor quality data. Dravnieks (29) used molecular weight, 38 attributes derived from Wiswesser Line Notation representations of molecular structures, and combinations of these parameters (118 indices in all) to seek correlations with the odor intensities and vapor pressures of olfactory stimulants. Boelens (30) used multiple linear regression analysis of physicochemical parameters to study a set of compounds with musk and bitter almond odors. Four parameters of the odorants were used in total: 1-octanol/water partition coefficients, gas chromatographic retention indices, and molecular shape and volume parameters. He obtained equations for 16 bitter almond compounds and for 16 musk compounds relating the four parameters to odor quality, with multiple correlation coefficients of 0.95 and 0.93. Greenberg (31) found strong correlations between the 1-octanol/water partition coefficients of odorants and their intensities using multiple linear regression analysis. McGill and Kowalski (32) used pattern recognition methods to investigate relationships between molecular structure and odor quality; the electron donor ability and directed dipole of compounds were found to be related to odor quality. Brügger and Jurs (33) used pattern recognition methods to identify 13 calculated molecular structure descriptors that could classify odorants as musks or nonmusks. A data set of 240 nonmusks and 60 musks was used to derive the classifier.
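Several of the studies above (Dravnieks, Boelens, Greenberg) rest on multiple linear regression, which fits an equation y = b0 + b1x1 + ... + bkxk by least squares and summarizes the quality of fit with the multiple correlation coefficient R, the statistic Boelens reports. The following is a minimal sketch of that calculation in plain Python, not code from any of the cited studies; the function names `fit_mlr` and `solve` are hypothetical, and the small data set is invented so that it obeys y = 1 + 2x1 - 0.5x2 exactly. With Boelens's four parameters, each row of X would simply carry four descriptor values.

```python
def solve(a, b):
    """Solve the square system a.x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by least squares (normal equations).

    Returns (coefficients, R), where R is the multiple correlation coefficient."""
    rows = [[1.0] + list(xi) for xi in X]  # leading 1.0 gives the intercept b0
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    b = solve(xtx, xty)
    fitted = [sum(bi * ri for bi, ri in zip(b, r)) for r in rows]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return b, (1.0 - ss_res / ss_tot) ** 0.5  # R = sqrt(R^2)


# Hypothetical data obeying y = 1 + 2*x1 - 0.5*x2 exactly.
X = [(0, 0), (1, 0), (0, 2), (2, 1), (3, 3)]
y = [1.0, 3.0, 0.0, 4.5, 5.5]
b, R = fit_mlr(X, y)
print([round(v, 6) for v in b], round(R, 6))  # [1.0, 2.0, -0.5] 1.0
```

Solving the normal equations directly is adequate for a handful of well-conditioned descriptors; with many correlated descriptors a QR or singular value decomposition would be the more robust choice.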
The musk/nonmusk classifier was used to predict the odor quality of nine unknown compounds, and all were classified correctly as musk odorants.

Methodology for SAR Studies

The fundamental premises involved in applying pattern recognition methods to SAR studies are as follows.
- Molecular structure and biological activity (olfactory quality) are related.
- The structures of compounds having a particular odor quality, and of compounds of similar structural classes that lack it, can be adequately represented by a set of molecular structure descriptors.
- A relation between structure and activity can be discovered by applying statistical and pattern recognition methods to a set of tested compounds.
- The relation can be extrapolated to untested compounds.
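The four premises can be made concrete with a small sketch. The text does not specify which pattern recognition method is used at this point, so the code assumes a fixed-increment error-correction rule (a perceptron-style linear learning machine), one simple way to find a linear discriminant separating two classes. The functions `train_llm` and `classify` and all descriptor values are hypothetical; each tested compound is a vector of two descriptors labeled +1 (musk) or -1 (nonmusk). Training on the tested compounds stands for the third premise, and classifying untested compounds stands for the fourth.

```python
def train_llm(patterns, labels, max_epochs=100, step=1.0):
    """Fixed-increment error-correction training of a linear discriminant.

    patterns: descriptor vectors; labels: +1 or -1.
    A constant 1.0 is appended to every pattern so the decision surface
    need not pass through the origin. Returns (weights, separated)."""
    aug = [list(p) + [1.0] for p in patterns]
    w = [0.0] * len(aug[0])
    for _ in range(max_epochs):
        errors = 0
        for x, t in zip(aug, labels):
            s = sum(wi * xi for wi, xi in zip(w, x))
            if t * s <= 0:  # wrong side of (or on) the decision surface
                errors += 1
                w = [wi + step * t * xi for wi, xi in zip(w, x)]
        if errors == 0:  # every tested compound is now classified correctly
            return w, True
    return w, False  # no separating surface found within max_epochs


def classify(w, pattern):
    """Assign +1 or -1 by the sign of the discriminant function w.x."""
    s = sum(wi * xi for wi, xi in zip(w, list(pattern) + [1.0]))
    return 1 if s >= 0 else -1


# Hypothetical descriptor vectors for tested compounds.
tested = [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0),   # musks
          (0.5, 1.0), (1.0, 0.5), (0.0, 1.5)]   # nonmusks
labels = [1, 1, 1, -1, -1, -1]

w, separated = train_llm(tested, labels)
print(separated, w)             # True [1.0, 1.5, -3.0]
# Extrapolate to two hypothetical untested compounds.
print(classify(w, (2.8, 3.2)))  # 1  (predicted musk)
print(classify(w, (0.3, 0.8)))  # -1 (predicted nonmusk)
```

The fixed-increment rule is guaranteed to stop only when the training set is linearly separable, which is what the `separated` flag reports. That is one concrete reading of a set of descriptors being "adequate": a set for which a discriminating relation can be found.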
The heart of the approach is finding a set of adequate descriptors for the particular data set under consideration, that is, a set of descriptors for which a discriminating relation can be found. The structure-activity studies described here use the ADAPT (automatic data analysis using pattern recognition techniques) computer software system. This system has been developed
over the period from 1974 to the present. It is fully operational and has been reported in the scientific literature (8,34-36). Research performed with the ADAPT system has also been reported in a number of publications (33,37,42). The ADAPT system currently consists of approximately sixty programs written in FORTRAN and designed to be executed interactively on a minicomputer or a larger timesharing computer. Development at Penn State has been on a MODCOMP II/25 16-bit minicomputer with 65,000 16-bit words of core memory. The system has been designed and implemented to provide the user with all the capabilities necessary to perform SAR studies on sets of up to several hundred compounds at a time. The fundamental steps involved in performing an SAR study using this system are shown in Figure 1. The individual steps are as follows:
(a) Identify, assemble, input, store, and describe a data set of structures for chemicals that have been tested for biological activity.
(b) Develop computer-generated molecular descriptors for each member of the data set. The descriptors may be derived directly from the stored topological representations of the structures, or they may require the development of three-dimensional molecular models.
(c) Using pattern recognition methods, develop classifiers to discriminate between active and inactive compounds based on the sets of molecular descriptors.
(d) Test the predictive ability of these discriminants on compounds of unknown activity.
(e) Systematically reduce the set of molecular structure descriptors employed to the minimum set sufficient to retain discrimination between the active and inactive compounds and to retain high predictive ability.

Entry of Molecular Structures. The ADAPT system includes all the modules necessary to enter, modify, retrieve, and draw the structures of organic molecules. This portion of ADAPT has been operational for several years and has been employed in several published studies. The routines allow convenient, interactive entry of structures by sketching them on the screen of a graphics display terminal. Entry takes from thirty seconds to several minutes per compound, depending on structural complexity. No special techniques beyond those used in sketching molecular structures on a blackboard are needed. Thus, structure files of hundreds of compounds can be entered into ADAPT in reasonable amounts of time. The structure files are stored permanently on disc for further processing by the other modules of ADAPT. Information saved for each compound includes a compressed connection table, ring information, a list of associated numerical
[Figure 1: fundamental steps in performing an SAR study with the ADAPT system]