8 Comparison of the Hansch and Free-Wilson Approaches to Structure-Activity Correlation
Downloaded by UNIV OF CINCINNATI on February 18, 2015 | http://pubs.acs.org Publication Date: August 1, 1974 | doi: 10.1021/ba-1972-0114.ch008
PAUL N. CRAIG Craig Chemical Consulting Services, Inc., Ambler, Pa. 19002
The basic principles on which the Hansch multiple parameter method for structure-activity correlation depends are described. These are compared with the basic features of the Free-Wilson method for assigning additivity constants to structural features of related compounds. An example is given for which the two methods of analysis have led to similar structure-activity relationships. Factors which determine the particular method to use in a new situation are discussed. The Free-Wilson method is presented in considerable operational detail with special emphasis on the detection and avoidance of situations which lead to singularity problems in solution of the matrix. Favorable analyses, which result in additivity constants that can be correlated with known physical constants, may lead to predictions for new compounds not covered in the original matrix.
The two methods of structure-activity correlation which have received the most application in the past decade are the Hansch multiple parameter method, or the so-called extrathermodynamic approach, and the Free-Wilson, or additive model. The basic differences and similarities of these methods are discussed in this presentation.

The Hansch Multiple Parameter Method

Since this entire book has been structured around this method, it will not be discussed in great detail, but the highlights of the method will be summarized. The reader is referred to an excellent review by Hansch (1), which both describes the method and places it in a proper historical perspective.

In Biological Correlations—The Hansch Approach; Van Valkenburg, W.; Advances in Chemistry; American Chemical Society: Washington, DC, 1974.
The multiple parameter approach tests for simple mathematical equations which can relate the biological activities of a series of closely related compounds to one or more physical parameters, which may be measured or calculated for these compounds. The parameters may be used singly or together, or in combination as linear and squared terms which can show parabolic relationships. Many possible combinations of parameters can be considered, and multiple regression analysis is employed to obtain statistical parameters for all combinations of parameters studied. The biological data are assumed to be more variable and less accurately determined than the physical parameters. Hence the biological data are assigned the role of the dependent variable, and the physical parameters are considered as independent variables in the regression. The statistical parameters which result from the regression enable one to reject those relationships which are not statistically significant and to choose, from those equations which do pass standard statistical tests for significance, the ones which best explain the observed biological data. Of course, common sense must still play a role in the evaluation of meaningful equations, but the statistical parameters can be powerful guides, especially in rejecting or questioning the validity of relationships expressed by the mathematical equations.
Typical equations which often are found to relate biological activities with physical parameters include:

$$\log (1/C) = a + b\pi \tag{1}$$

$$\log (1/C) = a + b\pi + c\sigma \tag{2}$$

$$\log (1/C) = a + b\pi - c\pi^{2} + d\sigma \tag{3}$$

$$\log (1/C) = a + b\pi - c\pi^{2} + d\sigma + eE_{s} \tag{4}$$

The symbols π, σ, and E_s refer to the substituent constants for partition, polar, and steric factors (1). In practice, what is sought is usually the simplest equation which is not improved by addition of further terms. By "improvement" is meant a statistically significant reduction in the overall variance. When such an equation is obtained, substitution of the physical parameters for substituents not yet studied can be made, and the equation leads to a prediction of the biological activity of the unprepared compound.
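As an illustration of the regression step, the following is a minimal sketch of fitting an equation of the form of Equation 3 by ordinary least squares and then predicting log 1/C for an unstudied substituent. The substituent constants, the coefficients used to synthesize the data, and the helper name `predict` are all hypothetical and are not taken from the chapter; NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical pi and sigma values for 15 compounds (illustrative only).
pi = np.linspace(-1.0, 2.5, 15)
sigma = np.array([0.54, 0.23, 0.0, -0.27, 0.06, -0.17, 0.78, 0.23,
                  0.0, 0.45, -0.66, 0.50, -0.37, 0.66, 0.12])

# Assumed coefficients used only to synthesize noise-free log 1/C values.
a, b, c, d = 2.0, 1.2, 0.4, 0.8
log_inv_c = a + b * pi - c * pi**2 + d * sigma

# Design matrix with columns 1, pi, pi^2, sigma (the terms of Equation 3).
X = np.column_stack([np.ones_like(pi), pi, pi**2, sigma])
coef, *_ = np.linalg.lstsq(X, log_inv_c, rcond=None)


def predict(pi_new, sigma_new):
    """Predicted log 1/C for a substituent not included in the fit."""
    return coef[0] + coef[1] * pi_new + coef[2] * pi_new**2 + coef[3] * sigma_new


print("fitted coefficients (a, b, pi^2 term, d):", np.round(coef, 3))
print("prediction for pi=0.5, sigma=0.1:", round(float(predict(0.5, 0.1)), 3))
```

Because the fit returns the coefficient of the squared term directly, a negative value there corresponds to the form written in Equation 3. With real biological data the fit would of course not be exact, and the statistical tests described above would decide which terms to retain.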
Additional structure-activity information can be obtained when a parabolic relationship in the partition term is observed, provided the sign of the coefficient for the squared term is negative. An optimal value for the partition factor can be obtained by differentiating the equation with respect to π or P; this results in the optimal P value (log P₀). This can be a valuable guide in the design of new molecules which may differ considerably in structure from those studied in the regression analysis. If a positive coefficient is obtained for the square of the partition term, the equation must be rejected as it implies that any change in partitioning will result in an increased biological activity; such a minimum has never been encountered. Of course this applies to the use of log 1/C as the method for quantitating the biological activities of the compounds under study.

The most common error in application of this method lies in a lack of appreciation of the minimum statistical requirements involved. Thus, one needs to have about five well-chosen compounds for every variable term in a Hansch analysis in order to feel confident about the results. For example, an equation such as Equation 2 above should be derived from 10 or more compounds, and one such as Equation 3, from 15 or more examples. A smaller number of examples per term may lead to useful results, but one cannot often support these results by statistics. A frequent abuse is seen when a large number of variable terms are used in a complex equation (four or more terms) which was derived from only 10 or 12 examples. The statistician would prefer to have 15 to 20 more compounds than the degrees of freedom in the resulting equation; not often is this luxury met.

The Free-Wilson Additivity Model
Unlike the Hansch approach, in this model no assumptions are made concerning physical parameters which may play a role in determining the biological activity. Instead, a series of de novo substituent constants is obtained using only the experimentally obtained biological test data and the following basic assumption: every time a particular substituent group appears at the same place in the molecule, it is assumed that it will play a constant role towards determining the over-all biological activity of the molecule (2). It may contribute to, or detract from, the over-all biological activity, but it must always play the same role.

This basic assumption is checked by means of the statistical parameters which result from solution of the matrix, which expresses the assumption stated above in the following equation for each compound:

$$\text{Biological Activity} = \mu + \sum_{i} G_{i}X_{i}$$

where μ is the average biological activity, and G_iX_i represents the activity contribution for the ith group at the ith position. In structuring the matrix, X becomes 0 or 1, indicating the absence or presence of a particular group at a given position. The matrix thus represents a series of equations in multiple unknowns, one equation for each compound. Its solution gives the values for the de novo substituent constants for every substituent at each position (= G_iX_i). (A more complete discussion is given below.)
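The additive equation can be made concrete with a small sketch. The group names, the constants in `G`, the value of `mu`, and the function name `predicted_activity` are hypothetical and chosen only to show how the 0/1 indicator X selects which constants contribute to a given compound.

```python
# Hypothetical de novo constants for groups at two positions (illustrative only).
mu = 2.50
G = {
    ("R1", "Cl"): 0.30,
    ("R1", "H"): -0.30,
    ("R2", "CH3"): 0.10,
    ("R2", "H"): -0.10,
}


def predicted_activity(compound):
    """Return mu + sum(G_i * X_i), where X_i is 1 if the group is present, else 0."""
    total = mu
    for (position, group), g in G.items():
        x = 1 if compound.get(position) == group else 0
        total += g * x
    return total


example = {"R1": "Cl", "R2": "H"}
print(round(predicted_activity(example), 2))
```

Each compound picks up exactly one constant per position, so the predicted activity is the average plus one contribution from every substituted position.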
If the statistical parameters obtained upon solution of the matrix indicate that the additivity assumption is valid, the de novo constants can then be used to predict the activity of (a) those compounds used in derivation of the constants and (b) all possible combinations of the various groups at each position. This is not much of a saving when one has only two or three positions of a molecule which can be substituted, but in more complex situations this can be a powerful tool. An example is given below, where six different positions of the phenanthrene ring were substituted with three, three, six, three, six, and three substituents, respectively:

[Structure: substituted phenanthrene bearing an HO-CH-CH2-B side chain]

B = 3 groups, R1 = 3 groups, R2 = 3 groups, R3 = 6 groups, R6 = 6 groups, R7 = 3 groups

This represents 3 × 3 × 3 × 6 × 6 × 3 = 2916 possible compounds. A good Free-Wilson analysis was obtained from only 42 of the possible analogs; the preparation of these 42 compounds enables one to predict with fair assurance the approximate antimalarial activity to be expected for almost 2900 unprepared analogs.

The minimum number of compounds required for a Free-Wilson type of analysis will vary depending upon the number of positions substituted and the number of substituents at each position. The formula for the absolute minimum required to permit a solution of the complex set of equations in multiple unknowns is N = 1 + (A − 1) + (B − 1) + (C − 1) + . . . , where A, B, C are the number of substituents at each position. This minimum should be exceeded by 10-20 compounds although useful results can sometimes result from as few as five compounds in excess of the minimum.

The fundamental point which differentiates between the two methods is the following: the Hansch method seeks correlations between variable biological activities and variable physical parameters whereas the Free-Wilson method uses only the biological activities as variable terms, along with exact information as to the presence or absence of each substituent group. Therefore, the de novo substituent constants which result embody all factors, known or unknown, that play roles in determining the biological activity of the particular compounds under study.

It is obvious that there are cases where groups will interact with each other and hence cannot have only additive effects. Fujita and Ban (3) have reported a successful attempt to include possible interactions in a Free-Wilson type of analysis. In a study of a series of substrates for dopamine-β-hydroxylase, a series of dopamine analogs was studied. In addition to the usual Free-Wilson matrix a term was added to allow for the interaction which might be possible from having two hydroxyl groups placed ortho to each other or from a hydroxyl group ortho to a methoxyl group. These terms were added to the regular matrix as a hypothetical new position; the statistical parameters were then compared with those for the conventional run. The addition of the term expressing the ortho relationship of the methoxyl and hydroxyl groups did give a significant improvement to the correlation whereas the term expressing the ortho relationship of two hydroxyl groups did not improve the regular correlation. The value for the interaction term had a negative coefficient, showing that this interaction resulted in a reduction of the biological activity.

Overlaps Between the Two Methods

Cammarata compared the Free-Wilson constants derived for a series of tetracycline analogs with some physical constants, and found a relationship which involved two parameters (4). In as yet unpublished work on antimalarial compounds I have found good linear relationships to exist between certain Free-Wilson substituent constants (S.C.) and Hansch's "pi" or Hammett's "sigma" constants for the same substituents. These relationships are shown in Table I.
Table I. Relationships among Parameters

Group    R3 S.C.    Pi       Sigma (para)    R6 S.C.
CF3       0.332     1.16      0.54            0.476
Br        0.223     0.86      0.23            0.388
Cl        0.0688    0.71      0.23           -0.118
H        -0.257     0         0              -0.431
F        -0.265     0.14      0.06           -0.159
OCH3      -        -0.02     -0.27           -0.510
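The relationships in Table I can be checked numerically. The sketch below computes Pearson correlation coefficients between the two columns of Free-Wilson substituent constants and the pi and sigma (para) values, and also between pi and sigma themselves for these groups. It rests on two assumptions: that the single dash in the first S.C. column belongs to OCH3 (so that column is compared over five groups only), and that NumPy is available. The variable and function names are illustrative.

```python
import numpy as np

# Values transcribed from Table I, in the order CF3, Br, Cl, H, F, OCH3.
pi = np.array([1.16, 0.86, 0.71, 0.0, 0.14, -0.02])
sigma_para = np.array([0.54, 0.23, 0.23, 0.0, 0.06, -0.27])
# Assumption: the first S.C. column has no entry for OCH3 (five values only).
sc_first = np.array([0.332, 0.223, 0.0688, -0.257, -0.265])
sc_last = np.array([0.476, 0.388, -0.118, -0.431, -0.159, -0.510])


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length arrays."""
    return float(np.corrcoef(x, y)[0, 1])


results = {
    "first S.C. vs pi": pearson_r(sc_first, pi[:5]),
    "first S.C. vs sigma": pearson_r(sc_first, sigma_para[:5]),
    "last S.C. vs pi": pearson_r(sc_last, pi),
    "last S.C. vs sigma": pearson_r(sc_last, sigma_para),
    "pi vs sigma": pearson_r(pi, sigma_para),
}
for label, r in results.items():
    print(f"{label}: r = {r:.2f}")
```

The last entry is worth noting: when pi and sigma are themselves strongly correlated over the chosen groups, a good correlation with one of them cannot be distinguished from a good correlation with the other.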
[Figure 1 plot: sigma (vertical axis) against pi (horizontal axis) for aromatic substituents. Source: Journal of Medicinal Chemistry]
Figure 1. Two-dimensional plot of pi vs. sigma constants for aromatic substituents. The constants for the particular groups listed in Table I lie essentially on a straight line.

The substituent constants at R3 and R6 correlate quite well with both the pi and sigma (para) constants, as shown by correlation coefficients ranging from 0.89 to 0.99. In hindsight, one would expect that the Hansch method should give good results using either pi or sigma constants as parameters. Actually, Hansch and I had previously obtained such correlations from the same set of data, and these results represent agreement between the two different methods. However, we were unable to decide between partition or polar factors as either one alone gave good correlations. The problem was found to reside in the particular choice of substituent groups studied; a two-dimensional plot of pi vs. sigma constants for aromatic substituents shows that the particular groups studied (see above list) lie essentially on a straight line. Therefore these groups cannot lead to a choice between the two factors. This two-dimensional plot is reproduced in Figure 1; a more complete discussion of this "covariance" problem has been published (5).

In deciding which of these methods to apply to a given set of data, the choice will usually depend upon the number and type of analogs which have been prepared. One should have data for at least five more compounds than the minimum required for solution of the Free-Wilson matrix. In addition, one should have two or more examples for each group at every position, if possible, to increase the confidence with which one can apply the results. To apply the Free-Wilson method, one must have a series of closely related structures whereas the Hansch method may be applied to series of compounds with quite different structure, provided one has data for one or more physical parameters for all of the compounds in question. When one has only 8-12 compounds, only the Hansch method may be used. Free intended this approach to be used as a guide to proper experimental design for the chemist, in planning his choice of analogs for preparation. By advance planning one can avoid the situation described above where one has only one example of a substituent at a particular position.

Detailed Discussion of the Free-Wilson Additivity Model
In 1964, Free and Wilson introduced the method for structure-activity correlation which is based upon the assumption that each substituent group in a molecule at a specific position makes a constant additive contribution towards the overall biological activity of the molecule. One sets up a series of equations, one per compound, which mathematically expresses this concept. Solution of these multiple equations in multiple unknowns is obtained by the method of least squares, using computer techniques, and the resulting statistical parameters allow one to judge whether or not the original assumption of additivity was valid for the particular set of biological data studied. If additivity is confirmed, the model may be used to predict approximate bioactivities for those combinations of substituent groups which have not been prepared. The ranges of substituent values at each position help identify those positions in the molecule which are most sensitive to change in substituent; these are the positions where favorable groups may be expected to increase activity appreciably. In addition, the relative activities of the substituent groups at a particular position often suggest relationships (such as with pi and sigma above) which can lead to extrapolations to suggest groups not originally studied which may be worth preparing.

Additivity: The Basic Premise. The concept of additivity of substituent group contributions is merely an expression of the medicinal chemist's intuition which has so successfully led to the development of useful therapeutic agents in the past 70 years. However, it is such a basic concept that the chemist must question it and must consider its implications. There are certainly cases where synergistic effects are seen (3), and these are not compatible with the concept of additivity. Many successful studies have confirmed the general concept, and the method has a built-in check in that the statistical parameters (F value, R², standard deviation) allow one to check the validity of the original assumption with regard to the actual set of biological data under consideration (6). Therefore, one is not without guidelines in using this technique, and the process can be helpful even if additivity is not confirmed, as that in itself can provide useful information.

A major check of the analysis is provided when the activities of each of the compounds are calculated using the derived substituent constants. The deviations between the observed and calculated bioactivities can point out particular compounds which are poorly calculated. In an as yet unpublished case the large deviation for one compound was used to question the structure, and indeed, a rearrangement had occurred during its preparation which resulted in an incorrect assignment of its structure. The more usual cause of large deviations is in the biological test results as these are often difficult to quantitate reproducibly. Of course, a large deviation may also point towards other causes of non-additivity such as chelation effects, hydrogen bonding, or special steric considerations.

The general problem of lack of accurate reproducibility of the biological test data usually results in an unavoidable standard deviation of about 0.20 to 0.25 log 1/C units. Thus "additivity" really means that slight variations of this magnitude from strict additivity could not be detected.

It should be pointed out here that the Hansch method, too, assumes that each substituent plays a constant and additive role from compound to compound, and it, too, is limited by the almost irreducible standard deviation of about 0.2 or 0.25 log 1/C units (8).
In a recently published paper, Cammarata treats the relationships and assumptions involved in the Hansch and Free-Wilson methods from a systematic point of view (8).

Procedure. The Free-Wilson method is most useful when three or more positions of a molecule are subjected to variation; although one can apply the method to cases where only two positions are involved, simple intuition can do about as well in such simple cases. As the complexity of the structural changes increases, this method becomes more and more valuable, and in very complicated systems with substituents at many positions, it can be extremely helpful. The following treatment will illustrate the method in general terms. The following generic formula represents a family of compounds, all having a common structure X which is substituted at R1, R2, and R3:

[Structure: common nucleus X bearing substituents R1, R2, and R3]

Let us assume that there are 15 compounds in this series; at R1 we have four different groups, at R2, five groups, and at R3, five groups. Comparative biological data are available for all fifteen compounds. The biological data may be expressed in quantal units (e.g., 0, 1, 2, 3, 4), or in activities relating them to a standard agent, or most conveniently, in terms of log 1/C values where C is the molar concentration of test compound which causes a standard effect. In the case of whole animal studies, C is usually expressed as moles/kg test animal. By use of log 1/C, positive increases in value mean increased biological activity. A structural matrix is assembled in Table II.

Table II. Structural Matrix

                         R1           R2             R3
Compound Number       A B C D     E F G H I     J K L M N     Log 1/C
 1                    1 0 0 0     1 0 0 0 0     1 0 0 0 0     1.02
 2                    1 0 0 0     0 1 0 0 0     1 0 0 0 0     3.01
 3                    1 0 0 0     0 0 1 0 0     0 1 0 0 0     2.53
 4                    0 1 0 0     0 0 0 1 0     0 1 0 0 0     3.21
 5                    0 1 0 0     0 0 0 1 0     0 0 1 0 0     3.10
 6                    0 1 0 0     0 0 0 0 1     0 0 1 0 0     2.89
 7                    0 1 0 0     1 0 0 0 0     0 0 0 1 0     1.98
 8                    1 0 0 0     0 1 0 0 0     0 0 0 0 1     2.45
 9                    0 0 1 0     1 0 0 0 0     0 0 0 0 1     3.05
10                    0 0 1 0     0 1 0 0 0     1 0 0 0 0     2.70
11                    0 0 1 0     0 0 1 0 0     0 0 0 1 0     2.40
12                    0 0 0 1     0 0 1 0 0     0 1 0 0 0     2.78
13                    0 0 0 1     0 0 0 0 1     0 0 0 0 1     2.98
14                    0 0 0 1     0 0 0 0 1     0 0 0 1 0     1.80
15                    0 0 0 1     0 0 0 1 0     0 0 1 0 0     3.15
Number of examples    4 4 3 4     3 3 3 3 3     3 3 3 3 3
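A structural matrix of this kind is straightforward to assemble programmatically. The sketch below encodes, for each of the 15 compounds, which group sits at R1, R2, and R3 (read from Table II), builds the 0/1 indicator matrix, and recovers the "Number of examples" row as column sums. The string encoding and variable names are illustrative choices, and NumPy is assumed to be available.

```python
import numpy as np

groups = list("ABCDEFGHIJKLMN")

# Group present at each position for compounds 1-15, read from Table II.
r1 = "AAABBBBACCCDDDD"
r2 = "EFGHHIEFEFGGIIH"
r3 = "JJKKLLMNNJMKNML"

# One row per compound, one 0/1 column per group.
X = np.array([[1 if g in trio else 0 for g in groups]
              for trio in zip(r1, r2, r3)])

counts = X.sum(axis=0)  # number of examples of each group
print(dict(zip(groups, counts.tolist())))
```

Checking the column sums before any regression is a cheap way to spot groups that occur only once, and each row should contain exactly one group per position.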
Compound 1 has group A at position R1, group E at position R2, and group J at position R3. Thus the matrix defines the exact structure of each compound. The only variable numbers are the log 1/C values, which are experimentally determined and whose variability is the reason why so many compounds are required in excess of the theoretical number required to achieve a solution. No assumptions are made about the physical parameters which may be involved, and only the exact structural data and the biological test data are used to carry out this analysis. This matrix serves as input to a computer for solution of the following equation by least squares:

$$\log (1/C) = \mu + A + B + C + D + E + F + G + H + I + J + K + L + M + N$$

In this equation, μ is the over-all average log 1/C value for the 15 compounds, and A through N are numerical coefficients, positive and negative, which represent the contributions of each group (A through N) to the biological activity; these constants (de novo substituent constants, valid only for this set of data) represent the group contributions to the over-all biological activity.
basic
a s s u m p t i o n of a d d i t i v i t y d e m a n d s
that the f o l l o w i n g
relationships h o l d f o r this set of d a t a : 4 A + 4 £ + 3 C + 4Z) = 0 ; or A SE
+
=
- B
3F + 3 C + 3 # + 3 / = 0 ; or E=
3 J + ZK
+
3 L + SM
- 3 / 4 C —D - F
+ 3N = 0 ; or J =
- G
- K
(at R i )
- H - L
- I
- M
(at R ) 2
- N
(for R ) 3
These are called the restrictive equations; because of these relationships between the variables there are really only three unknown terms at R1 and four each at R2 and R3. Hence there are 3 + 4 + 4 = 11 unknown terms to be solved, plus one more term for the over-all average, μ. There must be a minimum of 12 compounds to permit a solution.
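The count of unknowns and the effect of the restrictive equations can be sketched with the Table II data. In the code below, the first group at each position (A, E, J) is eliminated by substituting its restrictive equation, and the remaining 12 unknowns are obtained with an ordinary least-squares routine. This is only one way of imposing the restrictions, assuming NumPy; the encoding of the table and all variable names are illustrative.

```python
import numpy as np

groups = list("ABCDEFGHIJKLMN")
positions = ["ABCD", "EFGHI", "JKLMN"]  # groups at R1, R2, R3

# Table II: group at each position for compounds 1-15, and log 1/C.
r1, r2, r3 = "AAABBBBACCCDDDD", "EFGHHIEFEFGGIIH", "JJKKLLMNNJMKNML"
y = np.array([1.02, 3.01, 2.53, 3.21, 3.10, 2.89, 1.98, 2.45,
              3.05, 2.70, 2.40, 2.78, 2.98, 1.80, 3.15])

X = np.array([[float(g in trio) for g in groups] for trio in zip(r1, r2, r3)])
n = X.sum(axis=0)  # number of examples of each group

# Minimum number of compounds: 1 + (4 - 1) + (5 - 1) + (5 - 1).
n_min = 1 + sum(len(p) - 1 for p in positions)

# Substitute e.g. A = -(4B + 3C + 4D)/4: drop A's column and subtract the
# weighted A column from the columns of the other groups at that position.
cols, names = [np.ones(len(y))], ["mu"]
for p in positions:
    ref = groups.index(p[0])
    for g in p[1:]:
        k = groups.index(g)
        cols.append(X[:, k] - (n[k] / n[ref]) * X[:, ref])
        names.append(g)
D = np.column_stack(cols)

beta, *_ = np.linalg.lstsq(D, y, rcond=None)
coef = dict(zip(names, beta))

# Recover the eliminated constants from the restrictive equations.
for p in positions:
    ref = p[0]
    coef[ref] = -sum(n[groups.index(g)] * coef[g] for g in p[1:]) / n[groups.index(ref)]

print("minimum compounds:", n_min, "| compounds available:", len(y))
print({k: round(float(v), 3) for k, v in coef.items()})
```

With the restrictions imposed in this way, the fitted μ coincides with the plain average of the 15 log 1/C values, as the definition of μ requires. Only three compounds remain in excess of the 12 unknowns, so the individual constants from this hypothetical data set carry little statistical weight.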
H o w e v e r , since t h e b i o l o g i c a l test results are the d e p e n d e n t v a r i a b l e s , a n d e a c h of t h e b i o l o g i c a l test results has a degree of u n c e r t a i n t y , or v a r i a b i l i t y , to its v a l u e , a n u m b e r of a d d i t i o n a l c o m p o u n d s
is r e q u i r e d
to g i v e a g o o d degree of assurance for the results, i.e., to increase the statistical significance of the d e r i v e d values for each t e r m (these are the so-called de novo substituent c o n s t a n t s ) .
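The counting argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `min_compounds` and the dictionary layout are invented for this example and are not part of any Free-Wilson program.

```python
# Number of distinct groups at each position in the hypothetical example:
# R1 carries A-D, R2 carries E-I, R3 carries J-N.
groups_per_position = {"R1": 4, "R2": 5, "R3": 5}


def min_compounds(groups_per_position):
    """Return (independent group terms, minimum number of compounds).

    One restrictive equation per position removes one unknown there;
    the over-all average mu adds one more term to be solved.
    """
    independent = sum(n - 1 for n in groups_per_position.values())
    return independent, independent + 1


independent, minimum = min_compounds(groups_per_position)
excess = 15 - minimum  # the example series contains 15 compounds
print(independent, minimum, excess)  # 11 12 3
```

The excess of three compounds over the minimum is what the significance discussion in the text turns on.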
It is desirable to have at least five and, preferably, 10 or more compounds in excess of the minimum required for solution. Our hypothetical sample case has only three more compounds than the 12 required and so could not be expected to give significant results.

The use of a special multiple regression analysis computer program which can include the restrictive equations properly leads directly to the proper solution. To use a standard multiple regression analysis program one must incorporate the restrictive conditions as follows. Recalling that A = −B − 3/4 C − D, the entire column for the A term is removed from the matrix; similarly, the columns for E and J are removed. Now we are left with a matrix which contains 3, 4, and 4 variables at R₁, R₂, and R₃, respectively. When introducing the expression for the compounds which contained the terms A, E, or J, one introduces the equivalent values in the same way as illustrated in Table III.

Table III. Contracted Matrix

Compound
Number   Log 1/C    B      C      D    F    G    H    I    K    L    M    N
   1      1.02     -1    -0.75   -1   -1   -1   -1   -1   -1   -1   -1   -1
   2      3.01     -1    -0.75   -1    0    1    0    0   -1   -1   -1   -1
   3      2.53     -1    -0.75   -1    0    0    1    0    0    1    0    0
   4      3.21      0     1       0    0    0    1    0    0    1    0    0
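The substitution that produces the rows of Table III can be sketched as follows. This is a minimal sketch, not the program used by the author: the names `SUBSTITUTIONS`, `COLUMNS`, and `contracted_row` are hypothetical, and the group assignments passed in (for example A, G, J for Compound 2) are inferred from the pattern of entries in the table rather than stated in the text.

```python
from fractions import Fraction

# Restrictive equations from the text, solved for the removed groups:
#   A = -B - (3/4)C - D,   E = -F - G - H - I,   J = -K - L - M - N
SUBSTITUTIONS = {
    "A": {"B": Fraction(-1), "C": Fraction(-3, 4), "D": Fraction(-1)},
    "E": {g: Fraction(-1) for g in "FGHI"},
    "J": {g: Fraction(-1) for g in "KLMN"},
}
COLUMNS = list("BCDFGHIKLMN")  # columns kept in the contracted matrix


def contracted_row(groups):
    """Return the contracted-matrix row for a compound given its three groups."""
    row = {c: Fraction(0) for c in COLUMNS}
    for g in groups:
        if g in SUBSTITUTIONS:
            # A removed group is replaced by its restrictive-equation terms.
            for col, coeff in SUBSTITUTIONS[g].items():
                row[col] += coeff
        else:
            row[g] += 1
    return [row[c] for c in COLUMNS]


# Assumed composition of Compound 2 (inferred from Table III): A, G, J.
print([float(x) for x in contracted_row("AGJ")])
```

A compound bearing none of A, E, or J, such as the assumed C, H, L combination of Compound 4, keeps its ordinary 0/1 entries.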
Solution of the original matrix by a standard program would give incorrect statistical parameters if the restrictive equations were entered directly into the matrix. Solution of the contracted matrix in Table III by a standard program gives exactly the same results as are obtained from the original matrix by the special program of Free and co-workers. However, the substituent constants for A, E, and J must be calculated by use of the restrictive equations when the contracted matrix is used.

The least squares solution gives the values of μ and the substituent constants A through N. In addition, the F value for the over-all regression is calculated, as is the correlation coefficient, R. The term R² (the fraction of the variance explained) is expressed by (model sum of squares)/(total sum of squares), and when this value is 80% or somewhat greater, the original assumption of additivity of substituent group effects is considered to be supported. The remaining variability of 10-20% is the residual variation due to biological test variability and is almost inescapable.
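The two summary statistics named above follow directly from the analysis-of-variance sums of squares. The sketch below assumes the hypothetical example of 15 compounds and 11 independent group terms; the sums of squares are made-up numbers chosen only to show the arithmetic, and the function names are illustrative.

```python
def r_squared(model_ss, total_ss):
    """Fraction of the total variation accounted for by the additive model."""
    return model_ss / total_ss


def overall_f(model_ss, total_ss, df_model, n_compounds):
    """Return (F ratio for the over-all regression, error degrees of freedom)."""
    df_error = (n_compounds - 1) - df_model
    error_ss = total_ss - model_ss
    return (model_ss / df_model) / (error_ss / df_error), df_error


# Hypothetical sums of squares, for illustration only.
model_ss, total_ss = 9.0, 10.0
r2 = r_squared(model_ss, total_ss)
f_value, df_error = overall_f(model_ss, total_ss, df_model=11, n_compounds=15)
print(round(r2, 2), df_error, round(f_value, 2))  # 0.9 3 2.45
```

With only three degrees of freedom left for error, even a model that accounts for 90% of the variation yields a modest F ratio.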
The F value offers an additional check of the significance of the results; this value can be compared with the decision statistic "F" value from tables to test the validity of using the derived substituent constants to calculate the biological activities of the compounds used in the Free-Wilson analysis. If the F test fails, it means that one cannot explain the different biological activities of the compounds by use of the additivity equation; in such a case one is as well off using μ as the calculated activity, a sad state of affairs indeed. In our hypothetical example, the F value is the one for $F_{11,3}$. These subscripts derive from the analysis of variance for the regression, where 3 + 4 + 4 = 11 degrees of freedom are attributable to the regression, and 15 − 1 = 14 represents the total degrees of freedom in the model. Then 14 − 11 = 3 degrees of freedom are attributable to the error term. If the observed F value exceeds the tabular value of 8.76 ($F_{11,3}$ at the 5% level) or 27.13 ($F_{11,3}$ at the 1% level), the derived solution is significant at or above those levels.

In preparing the matrix for a Free-Wilson analysis, one must be careful to avoid situations which lead to singularities and hence cannot give a unique solution.
The problem to be avoided is illustrated in Table IV in its simplest form.

Table IV. Situations Leading to Singularities

Compound                  R₁               R₂
Number   Log 1/C     A    B    C      D    E    F
   1       1.2       1    0    0      1    0    0
   2       1.4       0    1    0      0    1    0
   3       1.6       0    0    1      0    1    0
   4       0.9       0    1    0      0    0    1
   5       0.6       0    0    1      0    1    0
   6       2.1       0    1    0      0    0    1
 Total               1    3    2      1    3    2
Compound 1 in the matrix carries two substituents, A and D, each of which occurs nowhere else in the set. This is readily seen as a violation of the medicinal chemist's cardinal rule of making only one change at a time. It is no more possible for the computer to assign values to A and D than it would be for a chemist who has changed two groups at once to ascribe the resulting change in biological activity to either one of the groups when he has no other examples of the effects of either group. To detect such problems in advance of the computer run one should always prepare the full structure matrix, and each group which appears but once should be checked to be sure that the particular compound bearing that substituent has no other substituent which occurs only in that compound.
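The check described above is easy to automate. The following is a minimal sketch, assuming the full structure matrix of Table IV is held as a list of 0/1 rows; the names `MATRIX`, `GROUPS`, and `unique_group_clashes` are hypothetical.

```python
# Full structure matrix of Table IV: one row per compound, columns A-F.
GROUPS = ["A", "B", "C", "D", "E", "F"]
MATRIX = [
    [1, 0, 0, 1, 0, 0],  # compound 1
    [0, 1, 0, 0, 1, 0],  # compound 2
    [0, 0, 1, 0, 1, 0],  # compound 3
    [0, 1, 0, 0, 0, 1],  # compound 4
    [0, 0, 1, 0, 1, 0],  # compound 5
    [0, 1, 0, 0, 0, 1],  # compound 6
]


def unique_group_clashes(matrix, groups):
    """Return {compound number: groups} for every compound that carries two
    or more groups which each occur only once in the whole matrix."""
    totals = [sum(col) for col in zip(*matrix)]
    clashes = {}
    for number, row in enumerate(matrix, start=1):
        singles = [g for g, x, t in zip(groups, row, totals) if x == 1 and t == 1]
        if len(singles) > 1:
            clashes[number] = singles
    return clashes


print(unique_group_clashes(MATRIX, GROUPS))  # {1: ['A', 'D']}
```

An empty result rules out only this simplest kind of singularity; it does not guarantee a unique solution.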
More complex singularities, which have the same mathematical basis, are shown in Tables V and VI; these are encountered less often, but should be checked for by careful study of the original matrix. In Table VI, although there are three examples of A and D, they occur in exactly the same set of compounds, and this is equivalent to the other cases which lead to singularities. This explains why Hudson, Bass, and Purcell (7) obtained two different solutions to a matrix by making different substitutions using the restrictive equations. Their presentation of the matrix used is not in its most expanded form but is contracted by application of the restrictive equations; this makes it difficult to visualize the singularity problems.
When their matrix is rewritten in the complete form, it becomes readily apparent that their Compounds 3 and 4 are involved in a singularity of the type illustrated in Table IV; their Compounds 5 and 6 have singularity problems of the type illustrated in Table I. Removal of these four compounds from the matrix leads to a simpler matrix with no such problems, and identical substituent constants result regardless of how the restrictive equations are used; i.e., a unique solution is obtained.

Table V. Complex Singularities^a

Compound                  R₁               R₂
Number   Log 1/C     A    B    C      D    E    F
   1^a     1.2       1    0    0      1    0    0
   2^a     0.7       1    0    0      1    0    0
   3       0.9       0    1    0      0    0    1
   4       0.8       0    1    0      0    1    0
   5       1.1       0    0    1      0    0    1
   6       0.3       0    0    1      0    1    0
 Total               2    2    2      2    2    2

^a Cases leading to singularities.

Use of the special program developed by Free, which will accept the entire matrix, avoids this problem as the existence of a singularity prevents any solution from being obtained. Unfortunately, use of a standard regression program with application of the restrictive conditions, as already discussed, can force a solution which is one of a family of possible solutions; these are not unique, and cannot be relied upon. The technique of studying the original matrix in its entirety, to avoid singularities, will pinpoint these problems.
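One way to study the whole matrix at once is a rank check, sketched below. This is a standard linear-algebra test offered as an assumption-laden illustration, not the procedure of the author or of the special program of Free: a unique solution requires the indicator matrix, with a column of ones added for μ, to have rank 1 plus the number of independent group terms. The names `rank`, `has_unique_solution`, `TABLE_V`, and `REDUCED` are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix by Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def has_unique_solution(matrix, groups_per_position):
    """True if mu and all independent group terms are determined."""
    needed = 1 + sum(n - 1 for n in groups_per_position)
    with_mu = [[1] + list(row) for row in matrix]  # column of ones for mu
    return rank(with_mu) == needed


# Structure matrix of Table V (columns A-F); A and D occur in the same compounds.
TABLE_V = [
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 1, 0],
]
print(has_unique_solution(TABLE_V, [3, 3]))  # False

# Dropping Compounds 1 and 2 leaves only groups B, C (R1) and E, F (R2).
REDUCED = [[row[1], row[2], row[4], row[5]] for row in TABLE_V[2:]]
print(has_unique_solution(REDUCED, [2, 2]))  # True
```

Exact fractions are used so that the rank is not affected by rounding.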
Elimination of one or more compounds from the matrix will correct the problem.

If hydrogen is considered as one of the substituents at R₁, the resulting value of μ (the average biological activity) must be considered to be the biological activity of a hypothetical compound where R₁ is a nonentity. In this case, the substituent constant for hydrogen must be added for R₁ to obtain the value of the "unsubstituted molecule" where R₁ = H.

Table VI. Complex Singularities^a

Compound                  R₁               R₂
Number   Log 1/C     A    B    C      D    E    F
   1^a     0.9       1    0    0      1    0    0
   2^a     0.7       1    0    0      1    0    0
   3       1.3       0    1    0      0    1    0
   4^a     1.2       1    0    0      1    0    0
   5       1.1       0    1    0      0    0    1
   6       0.3       0    0    1      0    1    0
 Total               3    2    1      3    2    1

^a Cases leading to singularities.
To avoid this, and to place the resulting substituent constants on a scale relative to hydrogen equal to zero, Cammarata (4) used the technique of setting the substituent constant for hydrogen as zero by not including hydrogen as one of the groups in the matrix; e.g., in Table V, neither A, B, C, nor D, E, or F is hydrogen. In this case, the compound where R₁ and R₂ are hydrogen would be entered as follows: log 1/C = 000000. An additional advantage of this approach is that now one need not add the restrictive equation since one variable (H) has been removed from the matrix for each position. Thus a standard multiple regression program can now be directly employed, taking care to avoid the singularity problem, of course.
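The row coding described here can be sketched as follows. The column layout and the name `reference_row` are assumptions made for illustration; only the convention that hydrogen has no column, so that the all-hydrogen compound is entered as 000000, comes from the text.

```python
def reference_row(substituents, columns):
    """Indicator row in which hydrogen (the reference) has no column."""
    present = {(pos, grp) for pos, grp in substituents.items() if grp != "H"}
    return [1 if col in present else 0 for col in columns]


# Hypothetical layout: three non-hydrogen groups at each of R1 and R2.
COLUMNS = [("R1", "A"), ("R1", "B"), ("R1", "C"),
           ("R2", "D"), ("R2", "E"), ("R2", "F")]

print(reference_row({"R1": "H", "R2": "H"}, COLUMNS))  # [0, 0, 0, 0, 0, 0]
print(reference_row({"R1": "B", "R2": "H"}, COLUMNS))  # [0, 1, 0, 0, 0, 0]
```

Under this coding the constant term of the regression is the calculated activity of the unsubstituted compound, and each group constant is expressed relative to hydrogen.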
Should hydrogen not occur as a substituent at one of the positions, one may arbitrarily set another substituent group at zero as was illustrated above for hydrogen.

Predictive Use of Free-Wilson Substituent Constants.
The newly found substituent constants are lined up in decreasing order of activity for each position. A prediction may be made for all possible compounds arising from chemically allowable combinations of one group at each position. In these predictions it must be remembered that constants which were obtained for groups which occurred only once or twice in the matrix usually have a low degree of significance. Here one should be guided by reference to a t test value to establish the significance of the difference between the substituent constants at each position (6).
Of course, it is possible to build the calculations for all possible combinations of the substituents at each position into a computer program, and this results in a listing of predicted biological activities for all possible compounds. Such predictions must be used with caution but can be good guides when the statistical parameters are favorable. One must avoid the prediction of extreme activity for unusually highly substituted analogs, which might be almost impossible to prepare. For the model illustrated in Table II, 4 × 5 × 5 or 100 analogs are exemplified; from a study of 20 to 25 compounds we should have a good idea of the activities to be expected for the other 75 to 80 analogs. The higher the standard deviation for the analysis, the lower the accuracy of the predicted values will be. For typical biological tests, variation of from one-half to twice the observed activity is about average.
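A listing of this kind can be produced with a short program. The sketch below is illustrative only: the value of μ and every substituent constant are made-up numbers, not results from the chapter, and the names `MU`, `CONSTANTS`, and `predict_all` are hypothetical.

```python
from itertools import product
import math

# Hypothetical de novo constants, for illustration only.
MU = 2.0
CONSTANTS = {
    "R1": {"A": 0.4, "B": -0.1, "C": 0.2, "D": -0.5},
    "R2": {"E": 0.3, "F": 0.0, "G": -0.2, "H": 0.1, "I": -0.2},
    "R3": {"J": 0.5, "K": -0.3, "L": 0.1, "M": -0.1, "N": -0.2},
}


def predict_all(mu, constants):
    """Predicted log 1/C for every combination of one group per position."""
    positions = list(constants)
    predictions = {}
    for combo in product(*(constants[p] for p in positions)):
        total = sum(constants[p][g] for p, g in zip(positions, combo))
        predictions[combo] = mu + total
    return predictions


predictions = predict_all(MU, CONSTANTS)
best = max(predictions, key=predictions.get)
print(len(predictions), best, round(predictions[best], 2))

# A factor of two in activity corresponds to about 0.30 log unit.
print(round(math.log10(2), 2))
```

With four, five, and five groups the listing has 4 × 5 × 5 = 100 entries, and a spread of one-half to twice the observed activity corresponds to roughly ±0.30 in log 1/C.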
Extrapolations beyond the confines of the exact substituent groups studied may be made if, after a study of the list of substituent group values at each position, correlations with physical constants are noticed. Such correlations may be observed by graphical procedures, or by the use of regression analysis. The use of new groups not yet studied can now be suggested by reference to tables of physical constants. Extrapolation beyond the limits of the range of values for those groups already studied should be carefully considered, and the resulting predictions should not be expected to be very accurate. However, by this method a Free-Wilson analysis can lead to suggestions for new compounds in a manner similar to use of the Hansch method.
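The regression step mentioned above can be sketched with a one-variable least squares fit. All numbers here are made up for illustration: `physical` stands for any tabulated physical constant of four groups at one position, `de_novo` for the substituent constants found for the same groups, and `fit_line` is a hypothetical helper.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up values: a physical constant and the de novo constant for four groups.
physical = [0.0, 0.5, 1.0, 1.5]
de_novo = [-0.3, -0.1, 0.1, 0.3]

slope, intercept = fit_line(physical, de_novo)

# Estimate for a new group whose physical constant lies inside the studied range.
new_group_physical = 1.2
estimate = slope * new_group_physical + intercept
print(round(slope, 2), round(intercept, 2), round(estimate, 2))
```

Keeping the new value inside the range already covered follows the caution in the text about extrapolating beyond the groups studied.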
Literature Cited

1. Hansch, C., Accounts Chem. Res. (1969) 2, 232.
2. Free, S. M., Jr., Wilson, J. W., J. Med. Chem. (1964) 7, 395.
3. Fujita, T., Ban, T., J. Med. Chem. (1971) 14, 148.
4. Cammarata, A., Yau, S. J., J. Med. Chem. (1970) 13, 93.
5. Craig, P. N., J. Med. Chem. (1971) 14, 680.
6. Snedecor, G. W., "Statistical Methods," Iowa State University Press, Ames, Iowa, 1966.
7. Hudson, D. R., Bass, G. E., Purcell, W. P., J. Med. Chem. (1970) 13, 1184.
8. Cammarata, A., J. Med. Chem. (1972) 15, 573.

RECEIVED June 17, 1971. Work done at Smith Kline and French Laboratories, Philadelphia, Pa.