Quantitative structure-activity models. Conditions for application and

Journal of Medicinal Chemistry, 1970, Vol. 13, No. 6 1185

The different segments are those portions of the general structure that change between any two compounds in the series. A number of different substituents appear at each segment; thus, for each molecule, one will have a different linear equation for the activity. The activity contributions of the substituents are the unknowns in a system of independent, linear, inhomogeneous equations which are set up for the series of compounds under consideration. An arbitrary assumption of the Free-Wilson model is that the total contribution of the substituents of a given segment over the entire series is zero. For example, at segment k one might have four substituents (e.g., H, CH3, C2H5, C3H7) with the corresponding activity contributions $S_a$, $S_b$, $S_c$, and $S_d$ and thus require that

$$\sum_{i=1}^{m} (A_i S_a + B_i S_b + C_i S_c + D_i S_d) = 0 \qquad (2)$$

where the summation index i runs over the m compounds in the series and the coefficients, $A_i$, $B_i$, $C_i$, and $D_i$, are either 1 or 0, depending on whether the corresponding substituent is present in the ith molecule. From this treatment one can see that the substituent contributions at a segment are not linearly independent; one is algebraically dependent and can be expressed as a linear combination of the others. This algebraic relationship is referred to as a symmetry equation. Equation 2 can be solved for one of the substituent contributions, e.g.,

$$S_d = -\frac{\sum_{i=1}^{m} (A_i S_a + B_i S_b + C_i S_c)}{\sum_{i=1}^{m} D_i} \qquad (3)$$

Thus, for a compound which contains the substituent with contribution $S_d$, the linear equation is written in terms of $S_a$, $S_b$, and $S_c$, instead of $S_d$, using eq 3. In this way, the assumption of symmetry is incorporated into the solution. Under these conditions, a series of m compounds with p segments and a total of n substituents transforms into a system of m equations with n - p independent variables, or unknowns. To solve this system of equations, one must have m ≥ n - p. The solution yields the activity contributions of the n - p substituents explicitly treated. The symmetry equations can then be used to obtain the p remaining contributions. As Free and Wilson show, once a solution has been found for a system of equations, the substituents at each segment can then be ranked according to their individual contributions and the values of the substituent activity contributions can be used to predict the activities of untested compounds.

Statistical Analysis. To determine the success or failure of any series of compounds to fit this additive
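The substitution of the symmetry equations into the linear system can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' program: the function name `free_wilson_fit`, the five-substituent series, and the noise-free activities are all hypothetical, and the overall mean term in the model is assumed from the usual Free-Wilson formulation because the activity equation itself does not survive in this text.

```python
import numpy as np

def free_wilson_fit(indicators, segments, activity):
    """Least-squares Free-Wilson fit with the symmetry equations built in.

    indicators : (m, n) array of 0/1 flags, one column per substituent
    segments   : list of lists of column indices; the LAST index in each
                 list is treated as the dependent substituent (arbitrary)
    activity   : (m,) observed biological responses
    Returns (mu, S): the mean term and all n substituent contributions.
    """
    X = np.asarray(indicators, dtype=float)
    y = np.asarray(activity, dtype=float)
    m, n = X.shape
    counts = X.sum(axis=0)          # occurrences of each substituent

    columns = [np.ones(m)]          # mean term (an assumption, see above)
    free_idx = []
    for seg in segments:
        *free, dep = seg
        for j in free:
            # eq 2 solved for the dependent contribution (eq 3):
            #   S_dep = -sum_j (counts[j] / counts[dep]) * S_j
            columns.append(X[:, j] - X[:, dep] * counts[j] / counts[dep])
            free_idx.append(j)

    A = np.column_stack(columns)    # m equations, (n - p) unknowns + mean
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)

    S = np.zeros(n)
    S[free_idx] = coef[1:]
    for seg in segments:            # recover the p dependent contributions
        *free, dep = seg
        S[dep] = -sum(counts[j] * S[j] for j in free) / counts[dep]
    return coef[0], S


# Hypothetical series: segment 1 carries substituents a, b, c (columns 0-2),
# segment 2 carries d, e (columns 3-4); six compounds, every combination.
indicators = [
    [1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 1],
]
segments = [[0, 1, 2], [3, 4]]
# Synthetic, noise-free activities built as 1.0 + S(segment 1) + S(segment 2)
activity = [1.9, 1.1, 1.2, 0.4, 1.1, 0.3]

mu, S = free_wilson_fit(indicators, segments, activity)
print("mean term:", round(float(mu), 3))
print("contributions:", np.round(S, 3))
```

Here m = 6, n = 5, and p = 2, so the requirement m ≥ n - p is met. Coefficients of the form -counts[j]/counts[dep] appear in the transformed design matrix wherever the dependent substituent is present, which is why such a matrix contains non-integer entries even though the raw indicators are 0 or 1.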

fact, meet these requirements. First, the independent variables are fixed variates and the dependent variables are randomly produced.

(5) R. L. Anderson and T. A. Bancroft, "Statistical Theory in Research," McGraw-Hill Book Company, Inc., New York, N. Y., 1952, p 168.

In the Free-Wilson model, it is obvious that the substituent groups are fixed for any series of compounds tested and, since biological responses are not determined a priori to experimentation, they may be considered randomly produced. Second, for any fixed set of independent variables, the dependent variables associated with this set are normally and independently distributed. If one were to measure repeatedly the biological response of any one compound (assuming identical experimental conditions), this set of responses would indeed be normally distributed with no one measurement affecting any of the others (independence). If one uses an already averaged value for the biological response, this requirement is still met since the sample means of a normally distributed population are also normally distributed.⁶ Finally, for any set of independent variables, the variance of the dependent variables must be the same. Since there is an underlying normal population of biological responses for each set of substituents, the total population of the biological responses will be normally distributed; therefore, the variances can be considered equal for all of the responses.

After performing a regression analysis using the Free-Wilson model, several statistics indicative of the "goodness" of fit are appropriate for consideration; among these are the multiple correlation coefficient, the overall F value for the test of coefficient significance, and the explained variance. The multiple correlation coefficient, R, gives an indication of the degree of correspondence between the experimentally observed biological responses and those calculated with the proposed linear equation resulting from the regression analysis (R = 1.0 indicates perfect correlation). This correlation coefficient is usually used in terms of its square, $R^2$, because of the similarity in formulas with other statistics.
The mathematical formula for $R^2$ is $\sum \hat{y}^2 / \sum y^2$, where $\hat{y}$ = (calculated response - mean response), and $y$ = (observed response - mean response). As such, $R^2$ is interpreted as the fraction of the sum of squares of the deviations of observed responses from the mean response that is attributable to the regression. The F value is the decision statistic of the F test of significance. The overall F test with this model is a test of the null hypothesis that all of the substituent coefficients (activity contributions) are equal to zero; in other words, the mean biological response would be as good an estimate of the actual response as the response calculated from the linear regression equation. Thus, this value, after tabular interpretation, indicates the significance of the substituent contributions to the activity in a series of compounds. The basic assumption that validates the use of the F test is that the dependent variables are normally and independently distributed, which in fact holds true for the biological responses in the Free-Wilson model. The formula for the F statistic is $[\sum \hat{y}^2/(k-1)] / [\sum d^2/(n-k)]$, where $d$ = (observed response - calculated response), $k$ = total number of variables (or unknowns) used in the regression, and $n$ = total number of data points (compounds) used in the regression. The corresponding level of significance for an F statistic can be found in any table of F distribution values under $(k-1)$ and $(n-k)$ degrees of freedom. The explained variance gives the fraction of the variance of the biological responses which is attributed to the linear relationship of those substituent contributions, or unknowns, included in the analysis. The formula for calculating this quantity is $1 - [\sum d^2/(n-k)] / [\sum y^2/(n-1)]$ (where d, k, [...]

(6) R. R. Sokal and F. J. Rohlf, "Biometry. The Principles and Practice of Statistics in Biological Research," W. H. Freeman and Company, San Francisco, Calif., 1969, p 130.
(7) G. W. Snedecor and W. G. Cochran, "Statistical Methods," 6th ed, The Iowa State University Press, Ames, Iowa, 1967, pp 385-387, 400-402.

[...] elements or initial matrix coefficients are extremely small. The coefficients of the matrix to be solved in this example (cross product matrix) are:

[Cross-product matrix: entries not recoverable from the source.]
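The three goodness-of-fit statistics defined in the Statistical Analysis section ($R^2$, the overall F value, and the explained variance) can be computed directly from the observed and calculated responses. The following is a minimal sketch, assuming NumPy is available; the function name `goodness_of_fit` and the five data points are hypothetical and serve only to exercise the formulas, with a straight-line least-squares fit (k = 2) standing in for a Free-Wilson regression.

```python
import numpy as np

def goodness_of_fit(observed, calculated, k):
    """Return (R^2, F, explained variance) for a least-squares fit.

    observed   : measured responses
    calculated : responses calculated from the regression equation
    k          : total number of variables (unknowns) in the regression
    """
    obs = np.asarray(observed, dtype=float)
    calc = np.asarray(calculated, dtype=float)
    n = obs.size                              # number of data points
    mean = obs.mean()

    ss_reg = np.sum((calc - mean) ** 2)       # sum of y-hat squared
    ss_tot = np.sum((obs - mean) ** 2)        # sum of y squared
    ss_res = np.sum((obs - calc) ** 2)        # sum of d squared

    r2 = ss_reg / ss_tot
    f_value = (ss_reg / (k - 1)) / (ss_res / (n - k))
    explained = 1.0 - (ss_res / (n - k)) / (ss_tot / (n - 1))
    return r2, f_value, explained


# Hypothetical data, fitted by ordinary least squares (slope + intercept)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
observed = np.array([1.0, 2.1, 2.9, 4.2, 4.8])
slope, intercept = np.polyfit(x, observed, 1)
calculated = intercept + slope * x

r2, f_value, explained = goodness_of_fit(observed, calculated, k=2)
print(f"R^2 = {r2:.4f}, F = {f_value:.1f}, explained variance = {explained:.4f}")
```

The F value would then be compared with a tabulated F distribution at (k - 1) and (n - k) degrees of freedom, as the text describes. Note that the explained variance is always somewhat smaller than $R^2$ because it corrects both sums of squares for their degrees of freedom.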


TABLE II. FREE-WILSON DESIGN FOR SELECTED CHLOROQUINE DERIVATIVES EVALUATED AGAINST P. gallinaceum: SUBSTITUTIONAL VARIATION AT SEVEN RING POSITIONS

Series I1

X, = H (A) or CK, (B) X I = H (C), CH, (D) Xj = NH(CH,),N(C,H,), (E), NHCH,CHOHCH,N( C2Hj)L (F), NHCH[CH,N(CHJJn (G).

NHCH,CH(+

OCH,)CH,CH,N(C,H,) (H).

NHCH,CH(-@C~)CH,CH,N(C,H,I,

[Table body: indicator-variable columns for the lettered substituents; entries not recoverable from the source.]

a See ref 9.

input data for the series is given in Table I. This is a system of 14 equations and 10 unknowns with 4 dependent substituents (corresponding to 4 segments). The symmetry equations (eq 2) for the segments are

[Symmetry equations: not recoverable from the source.]


indicators. It is also advisable, with any series of compounds, to check the stability of the system by changing dependent substituents at various segments, solving the system of equations again, and comparing the two sets of solution values. With an unstable system of equations, there is no unique set of solution coefficients; thus, the substituent contributions are unreliable and no sound conclusion can be reached about the resulting connection between changes in structure and changes in activity.

Acknowledgment. The authors would like to express their gratitude to Mr. Walter Lafferty of the University of Tennessee Medical Units Biometric Computer Center for fruitful discussions during the early stages of this work.

Structure-Activity Correlations for Anticonvulsant Drugs

ERIC J. LIEN

School of Pharmacy, University of Southern California, University Park, Los Angeles, California 90007

Received April 10, 1970

The anticonvulsant activity of series of drugs in mice and in rats against electroshock and pentylenetetrazole-induced seizures has been found to be highly correlated with the log P values of the drugs, where P is the 1-octanol-water partition coefficient. From the data on hand, linear dependence on log P is found for the antielectroshock test in mice and the pentylenetetrazole protection test in rats, where the slope of the regression line associated with log P is about 0.6 ± 0.2. Parabolic dependence on log P is found for the antielectroshock activity in rats with an optimum lipophilic character (log P0) of 1.75.

It was estimated that more than 20,000 compounds had been screened for anticonvulsant action in the last 10 years,¹ but many of them were not active or had very low activity. The need for better anticonvulsants to cope with epileptic seizures is reflected by continuous publications in this field. Unfortunately, not only is the mechanism of anticonvulsant action unknown, but also few guidelines are available to help medicinal chemists in searching for better and safer anticonvulsants. The “common denominator” of clinically useful anticonvulsants has been known for some time.² However, no quantitative correlation of the relative potency of these drugs with the chemical structure has been satisfactory. Recently Andrews examined the anticonvulsant activity of a number of potent anticonvulsants and tried to correlate it with the atomic charges of the so-called “biological active center” obtained from MO calculations and with the dipole moments of the drugs.⁴ No significant correlation was obtained. The H-bonding atoms, although common to all the drugs studied, were not proven responsible for variations in activity. In view of the fact that the anticonvulsant activity was studied in vivo and that the availability of the drug at the biophase and the receptor site must be considered before any meaningful structure-activity correlation can be obtained,⁵ the author wishes to show that the variation in the anticonvulsant activity of series of potent drugs in 4 different tests can be correlated satisfactorily with log P (P = 1-octanol-H2O partition coefficient).

Methods

The antisupramaximal-electroshock data in mice, the atomic charge and the dipole moments were taken from Andrews' paper.⁴ The antielectroshock data in rats and in mice were from the work of Chen and Ensor.⁶ The data of pentylenetetrazole protection were from the report of Swinyard.⁷ For the details of the biological tests the original articles should be consulted. The log P values of 4 compounds were experimentally determined by Hansch's group and the others were calculated from the log P values of the parent molecules and the π constants of the substituents⁸⁻¹¹ (see Table I). The following log P or π values were used in the calculation of the log P values: $\pi$ of oxazolidine-2,4-dione = $\pi_{\mathrm{OCO}} + \pi_{\mathrm{CH_2CON}}$ = (-1.14) + (-0.79) = -1.93; $\pi_{\mathrm{CH_2CO}}$ = -0.55; $\pi_{\mathrm{NHCONH_2}}$ = -1.01; $\pi_{\mathrm{Br}}$ (aliphatic) = 0.60; $\pi_{\mathrm{hydantoin}}$ = log P of 3-ethyl-5-phenylhydantoin - ($\pi_{\mathrm{Et}} + \pi_{\mathrm{Ph}}$) = 1.53 - (1.00 + 1.77) = -1.24; $\pi_{\mathrm{succinimide}}$ = log P of 2-ethyl-2-phenylglutarimide - ($\pi_{\mathrm{Et}} + \pi_{\mathrm{Ph}}$ + 1/6 cyclohexane) = 1.90 - (1.00 + 1.77) - 1/6(2.51) = -1.29; $\pi_{\mathrm{Ph}}$ = 1.77 (on the heterocyclic ring); $\pi_{\mathrm{Ph}}$ = 2.13 (for terminal substituents); $\pi_{\mathrm{Me}}$ (on N) = 0.56; (on C) = 0.50. The equations correlating the antielectroshock and the antipentylenetetrazole activity in mice and rats with the physicochemical constants (see Table II) were derived via the method of least squares using an IBM 360/65 computer.

(1) R. K. Richards, Clin. Pharmacol. Ther., 10, 602 (1969).
(2) T. C. Daniels and E. C. Jorgensen in “Textbook of Organic Medicinal and Pharmaceutical Chemistry,” C. O. Wilson, O. Gisvold, and R. F. Doerge, Ed., 5th ed, Lippincott Co., Philadelphia, Pa., 1966, p 403.
(3) W. C. Cutting, “Handbook of Pharmacology,” 4th ed, Appleton-Century-Crofts, New York, N. Y., 1969, p 669.
(4) P. R. Andrews, J. Med. Chem., 12, 761 (1969).
(5) E. J. Lien, J. Amer. Pharm. Educ., 33, 368 (1969).
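The additive arithmetic behind the fragment values quoted in the Methods can be checked with a short script. This is a sketch for verification only: the numerical constants are the ones given in the text, while the variable names, the helper `estimate_log_p`, and the second example compound are hypothetical and are not taken from Table I.

```python
# Constants quoted in the Methods text
PI_ET = 1.00                      # ethyl
PI_PH_RING = 1.77                 # phenyl on the heterocyclic ring
LOGP_3_ETHYL_5_PHENYLHYDANTOIN = 1.53
LOGP_2_ETHYL_2_PHENYLGLUTARIMIDE = 1.90
CYCLOHEXANE_TERM = 2.51           # value used in the "1/6 cyclohexane" correction

# Fragment values derived exactly as written in the text
pi_oxazolidinedione = (-1.14) + (-0.79)
pi_hydantoin = LOGP_3_ETHYL_5_PHENYLHYDANTOIN - (PI_ET + PI_PH_RING)
pi_succinimide = (
    LOGP_2_ETHYL_2_PHENYLGLUTARIMIDE
    - (PI_ET + PI_PH_RING)
    - CYCLOHEXANE_TERM / 6.0
)

def estimate_log_p(parent_term, substituent_pis):
    """Additive estimate: parent (ring) term plus the substituent pi values."""
    return parent_term + sum(substituent_pis)

# Round trip: rebuilding the measured compound must return its own log P
roundtrip = estimate_log_p(pi_hydantoin, [PI_ET, PI_PH_RING])

# Hypothetical illustration only: a hydantoin carrying two ring phenyls
two_phenyl = estimate_log_p(pi_hydantoin, [PI_PH_RING, PI_PH_RING])

print("pi(oxazolidine-2,4-dione):", round(pi_oxazolidinedione, 2))
print("pi(hydantoin):", round(pi_hydantoin, 2))
print("pi(succinimide):", round(pi_succinimide, 2))
print("round-trip log P:", round(roundtrip, 2))
print("two-phenyl estimate (hypothetical):", round(two_phenyl, 2))
```

The derived values agree with those stated in the text to two decimal places, which also confirms the placement of the plus signs in the fragment equations.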

Results and Discussion

The equations obtained from the regression analysis are summarized in Table II. The results are not presented where no better correlation coefficient than 0.85 could be obtained. From eq 1-3 it is clear that neither the dipole moment nor the charge on the “biological activity center” (EHT, CNDO/2) can account for the variations in the anticonvulsant activity (r < 0.4).

(6) G. Chen and C. R. Ensor, Arch. Neurol. Psychiat., 63, 56 (1950).
(7) E. A. Swinyard, J. Amer. Pharm. Ass., 38, 201 (1949).
(8) T. Fujita, J. Iwasa, and C. Hansch, J. Amer. Chem. Soc., 86, 5175 (1964).
(9) J. Iwasa, T. Fujita, and C. Hansch, J. Med. Chem., 8, 150 (1965).
(10) C. Hansch, private communication.
(11) C. Hansch, A. R. Steward, S. M. Anderson, and D. Bentley, J. Med. Chem., 11, 1 (1968).
