ES Critical Review: Structure-activity relationships. Quantitative

Structure-activity relationships: Quantitative techniques for predicting the behavior of chemicals in the ecosystem


Nagamany Nirmalakhandan
Richard E. Speece
Drexel University
Philadelphia, Pa. 19104

Environ. Sci. Technol., Vol. 22, No. 6, 1988
0013-936X/88/0922-0606$01.50/0 © 1988 American Chemical Society

Quantitative Structure-Activity Relationships (QSARs) are used increasingly to screen and predict the toxicity and the fate of chemicals released into the environment. The impetus to use QSAR methods in this area has been the large number of synthetic chemicals introduced into the ecosystem via intensive agriculture and industrialization. Bioconcentration tests, for example, have been estimated to cost $6000-$10,000 for each chemical, and acute toxicity tests cost $2000-$3000 for each test (1). Because about 1000-1600 new chemicals are evaluated each year, regulatory decisions involving such testing must have a firm technical basis and justification (2). A recent report of the National Research Council found that in only 20% of all cases can health hazard assessments be satisfactorily performed (3). This problem is compounded by the complex chain of processes undergone by the chemicals when they are released into the environment, resulting in various degrees of transformations, accumulations, and exposures. Additional difficulties are posed by the lack of fundamental methods for calculating bioconcentration factors, biodegradation rates, or LC50 values (LC50 is a measure of the concentration of a chemical necessary to inhibit bioactivity by 50%).

Because of the costly and time-consuming nature of environmental fate testing, QSARs have been effectively used to screen large classes of chemical compounds and flag those that appear to warrant more thorough testing. In discussing the preliminary screening and priority setting in chemical testing, Koch and Nagel recognized that “fundamental connections and relationships do exist between molecular structures and biological activity and toxicity as well as the environmental fate of chemicals” (4). Koch and Nagel explain that, because chemical reactivity and biological activity are related to molecular structure and physicochemical properties, the interaction between environmental chemicals and biological as well as nonbiological target molecules depends on fundamental regularities of chemical reactions. The distribution and transformation processes, which are of major importance to the ecotoxicological fate of chemicals, can be described approximately by using regularities such as Henry’s law. Thus it is not surprising that relationships have been recognized among hydrophobic-lipophilic, electronic, and steric properties, between quantum mechanics-related parameters and toxicity, and between environmental fate parameters such as sorption and bioconcentration tendency (4).

A screening tool such as QSAR conserves the available financial resources for exhaustive testing, yet provides a methodology that helps to ensure that questionable compounds are more thoroughly tested. From another perspective, there is the pressing need to extract the maximum amount of information from the exhaustive test results, and QSAR models are most appropriate for designing and developing such data banks.

The abundance of research work reported in the pharmaceutical and medicinal chemistry literature clearly indicates the acceptance and success of QSAR techniques in these areas. The rapid growth of QSAR in such applications may be ascribed to several causes:

The evaluation of a large number of potentially toxic compounds could be based on a selected lead compound using different substituents. For example, with 90 common substituents, 729,000 derivatives are possible for just 3-position substitution on a benzene ring. The cost of testing derivatives is estimated by EPA to be $2500-4500 for acute dermal toxicity, $20,000-25,000 for 14-day inhalation studies, $300,000-400,000 for two-year dietary studies, and $1,000,000-1,500,000 for two-year inhalation studies (5).

There is a lack of detailed knowledge about the mechanisms and the many stages and steps involved in the drug-receptor complex interaction, although this understanding is now growing. In addition, there is a lack of information about the nature of the transport process involved from the portal of entry to the receptor site.

The competitive nature of the drug industry demands quick and reliable methods for hastening the drug development process.

There are natural limitations on experimentation with living organisms in the laboratory.

These same principles apply both to the ecotoxicology of chemical compounds released into the environment and to the estimation of physicochemical properties. Thus these principles have had a strong impact on the environmental community, which is now capitalizing on major advances in QSAR methodology.

This paper traces the development of QSAR studies through various disciplines and summarizes its accomplishments. It is not our intention to review the large body of literature on successful QSAR applications, but to discuss the fundamentals and advantages of QSAR. Selected QSAR models from the fields of pharmacology, ecotoxicology, and physicochemical property estimation and their application in modeling the ecosystem are presented with a view to arouse interest in, and increase the use of, QSAR techniques in environmental applications.

Historical background

The recognition that certain properties of a compound depend on its chemical structure dates back to ancient times. The poisonous character of lead acetate and lead nitrate was attributed to lead and not to the “acidic component” long before Arrhenius postulated his theory of ionization in the 1880s. The biological activity of salt was attributed to the acidic component rather than to the whole molecule (6). As early as 1878, Langley observed

the opposing actions of drugs and proposed that “there is some substance in the nerve ending with which the drug combined” (7). This laid the foundation for the now-famous receptor concept used to explain biological response to drugs and toxicants. Ehrlich coined the word receptor for chemical groups in biological systems that could be stimulated by combining them with certain complementary groups in drugs and toxicants (8). For example, he showed that the mercapto group was the arsenic receptor in trypanosomes and that blocking of this receptor by arsenic killed the organism (8). This concept of group interaction aroused the interest of many pharmacologists and paved the way for structure-activity relationship (SAR) studies. What initially began as a study of various groups of chemicals and their effects on biological activity later emerged as a broad-based “respectable field of research” referred to as the “exploration of the inner space” by Hansch (9), who has done pioneering work in this area for more than two decades.

The beginning of systematic SAR studies could be traced back to Richet, Meyer, and Overton in the late nineteenth century. Richet studied the toxicity of alcohols and ethers and suggested that “the more soluble they are, the less toxic they are” (10). The work of Meyer (11) and Overton (12) clearly showed that fat-water partition coefficients could be used to explain the narcotic action of simple organic compounds.

The first major step in SAR development took place in the 1930s, when Louis Hammett proposed the Hammett sigma constant, σx, to assign numerical values for the electronic effects of substitution on an aromatic ring (13). With benzoic acid as the reference compound, this electronic parameter is defined by the equation:

$$\sigma_x = \log (K_x / K_H) \qquad (1)$$

where Kx and KH are the ionization constants for the x-substituted and unsubstituted benzoic acid, respectively. The underlying assumption in this concept is the absence of steric effects. Hammett found that equilibrium constants for a variety of reactions showed a linear relationship with σx. Ormerod (14) and Hansen (15) were the first to use σx successfully in SAR studies, the former in explaining hydrolysis and the latter in relating toxicity to mosquito larvae.

Taft made the next significant step in QSAR development during the 1950s (16). He extended Hammett’s idea to aliphatics by introducing a steric parameter, Es, defined by:

$$\delta E_s = \log (K_x / K_H) \qquad (2)$$

where Kx and KH are hydrolysis rate constants for the x-substituted derivative and unsubstituted parent compound, respectively; δ was chosen according to the system being studied. For the acid hydrolysis of esters, δ was fixed at 1.00, and methyl was chosen as the reference system (i.e., Es for CH3 equals 0.000). Es proved to be a very effective molecular descriptor for improving correlations for which σx was not adequate. For example, Fukuto studied inhibition of fly brain cholinesterase, for which Es gave better correlation along with σx (17).

During the 1960s, Hansch and co-workers combined the concepts of Hammett and Taft to derive a hydrophobic parameter for substituents that was related more closely to biological activities. Known as log p, this is the most popular and commonly encountered descriptor in QSAR studies. The concepts proposed by Hammett and Taft were very satisfactory for clean and homogeneous aqueous solvent systems; however, Hansch and co-workers modified them so that they could be applied to drugs in heterogeneous systems such as blood, tissue, and proteins (3). They postulated that, before it could take part in a reaction, a drug had to bind to certain target locations in the living material. To account for this drug-ligand binding capability, they proposed an analogous hydrophobic parameter, πx, defined as:

$$\pi_x = \log p_x - \log p_H \qquad (3)$$

where px and pH are the oil-water partitioning coefficients of the x-substituted compound and of the unsubstituted parent compound, respectively. An octanol-water system often is used to represent the fatty and the aqueous phases. In correlation equations, log pH could be combined with the constant term, and log p could be the free variable rather than π.

Calculating the octanol-water partitioning coefficient, p

p = concentration in octanol / concentration in water

Estimation by contribution method (a):

$$\log p_{R\text{-}X} = \log p_R + \pi_X$$

Estimation by fragment method (b):

$$\log p = \sum_n a_n \cdot f_n$$

Selected fragment constants

Group contributions
Fragment    f
CH3         0.89
CH2         0.66
CH          0.43
C           0.20

Bond contributions for different bonds
Bond type       f
Single-chain    -0.12
Single-ring     -0.09
Chain-branch    -0.13
Group-branch    -0.22

Group contributions for polar groups
Group    f (c)    f (d)
-Br      0.20     1.09
-Cl      0.06     0.94
-F       -0.38    0.37
-I       0.60     1.35
>N-      -2.16    -1.17
-NO2     -1.26    -0.02
-O-      1.81     0.57
-S-      -0.79    0.03
-NH-     -2.11    -1.03
-NH2     -1.54    -1.00
-OH      -1.64    -0.40
-CN      1.26     0.34

Sample calculations of log p by fragment constant method

CH3(CH2)3CH3:
log p = 2 f(CH3) + 3 f(CH2) + 3 f(bonds)
      = 2(0.89) + 3(0.66) + 3(-0.12) = 3.40 (Experimental value = 3.39)

CH3CH(CH3)2:
log p = 3 f(CH3) + f(CH) + 2 f(bond) + f(branch)
      = 3(0.89) + (0.43) + 2(-0.12) + (-0.13) = 2.73 (Experimental value = 2.76)

C(CH3)4:
log p = 4 f(CH3) + f(C) + 3 f(bond) + 2 f(branch)
      = 4(0.89) + (0.20) + 3(-0.12) + 2(-0.13) = 3.14 (Experimental value = 3.1)

CH3CH(OH)CH3:
log p = 2 f(CH3) + f(CH) + f(OH) + 2 f(bond) + f(group branch)
      = 2(0.89) + (0.43) + (-1.64) + 2(-0.12) + (-0.22) = 0.11 (Experimental value = 0.05)

(a) R-X denotes an X-substituted parent compound; πX is the contribution of substituent X to log p of the substituted compound.
(b) a_n is the number of fragments of type n in the molecule; f_n is the contribution of the fragment to log p.
(c) For groups in nonaromatic compounds.
(d) For groups attached to one aromatic ring.
(e) For groups attached to two aromatic rings.

Linear free-energy relationships. Having proposed the hydrophobic parameter to portray drug-ligand interactions, Hansch formulated a generalized

biological activity model known as the Linear Free-Energy Relationship (LFER) Model (18). This model is based on the assumption that the effect of substituents on the strength of interactions between the drug and the receptor is an additive combination of the hydrophobic, electronic, steric, and dispersive factors.

A typical model for the chemical concentration, C, required to produce a desired biological response is given by:

$$\log (1/C) = a + b \cdot \log p - c \cdot (\log p)^2 + d \cdot E_s + e \cdot \mathrm{MR} + f \cdot \sigma \qquad (4)$$

where MR is the molar refractivity. The reciprocal of the concentration reflects the fact that higher potency is associated with lower dosage. The (log p)² term with the negative coefficient reflects the commonly observed optimum lipophilicity. Using this model, researchers found that biological activities could be quantitatively related to molecular descriptors, and SAR emerged as Quantitative Structure-Activity Relationship studies.

Topological approach. While Hansch and co-workers continued to work on substituent-activity relationships, another group of researchers began to study the topographical aspect of molecular data. The fundamental idea that the structure of a chemical influences its properties was applied to the geometry of molecular structure, as opposed to its structural components. Some of the proposed topological indices are the Wiener Path Number (19), Altenburg Polynomial Index (20), Gordon and Scantlebury Index (21), Z index of Hosoya (22), and Randic’s Branching Index (23). All these indices attempted to encode information related to the bonding structure and atom content at various levels. Randic’s index recently has been formalized and extensively developed by Kier and Hall under the name of molecular connectivity indices, χ (24, 25).

Electronic descriptors. Another type of descriptor, derived from the electronic configuration of the molecule, began to appear in QSAR studies in the late 1970s. Steric parameters, van der Waals radii, and intermolecular forces are some of the proposed electronic parameters. Although they are expected to depict the characteristics of the molecule accurately, only a few correlations have been reported in the literature, possibly because of the difficult nature of the calculation methods.

Molar refractivity (MR), which is directly connected with the electronic configuration of the molecule, has been very successful in developing high-quality QSAR models and even has been used to calibrate the suitability of other descriptors. For instance, in the development of molecular connectivity indices, the assignment of empirical valence values for heteroatoms such as Cl, F, and S was made by calibrating with molar refractivity, so that consistency with those of C, N, and O could be maintained (24). The use of MR in biological activity correlations originated from the suggestion that polarizability (as measured by MR) is an important aspect in the drug-ligand interaction. The strength of binding between the two was expected to relate to the resulting activity.

The results of these studies may be considered the foundation of today’s QSAR studies. Many pharmacists, chemists, and ecotoxicologists have used these and similar descriptors in various applications, as can be seen from the increasing number of reports in the literature. The initial development and intensive application of QSAR occurred in the area of drug design and later was followed by toxicity and toxicity inhibition studies.

One of the main factors that accelerated the growth in QSAR studies is the tremendous concurrent growth in computing power in terms of speed, memory, efficient algorithms for statistical analysis, interactive graphic capability, and accuracy. One important aspect of computer use in QSAR studies is the ability to work in mathematical domains in many dimensions, far in excess of the maximum of three that the unaided human mind can comprehend. With this advantage, new methods and techniques have begun to appear in QSAR studies. Pattern recognition and cluster analysis are some of the new techniques that complement more conventional statistical approaches. These techniques can handle multidimensional problems and present the results in a more comprehensive manner.

Calculating connectivity indices, χ

Algorithm:

$${}^{m}\chi_t = \sum_{j=1}^{n_m} \prod_{i=1}^{m+1} (\delta_i)_j^{-0.5}$$

where m = order of index (0, 1, 2, 3 ...); t = type of subgraph (path, cluster, and so on); n_m = number of subgraphs of type t; and δi = nodal value at node i. Nodal values δ are assigned by two schemes: by assigning the number of bonds at a node to that node or by assigning the valence value at a node to that node.

Sample calculation of connectivity index for isopentane, CH3CH(CH3)CH2CH3

Hydrogen-suppressed structure with node identification numbers: C1-C2(-C5)-C3-C4

Identifying subgraphs:
            Order 1 (m = 1)       Order 2 (m = 2)               Order 3 (m = 3)
Path        1-2; 2-5; 2-3; 3-4    1-2-3; 1-2-5; 5-2-3; 2-3-4    1-2-3-4; 5-2-3-4
Cluster     None                  None                          1-2-5-3

Calculating first-order path connectivity index, ¹χp:
Subgraph    Nodes    Nodal values    Subgraph contribution
1-2         1, 2     1, 3            (1 × 3)^-0.5 = 0.57
2-5         2, 5     3, 1            (3 × 1)^-0.5 = 0.57
2-3         2, 3     3, 2            (3 × 2)^-0.5 = 0.41
3-4         3, 4     2, 1            (2 × 1)^-0.5 = 0.71
Total contribution = 2.26

Calculating third-order cluster index, ³χc:
Subgraph    Nodes         Nodal values    Subgraph contribution
1-2-5-3     1, 2, 5, 3    1, 3, 1, 2      (1 × 3 × 1 × 2)^-0.5 = 0.41
Total contribution = 0.41

Connectivity indices have correlated well with many physical properties such as density, boiling point, molar refractivity, polarizability, total surface area, structural flexibility, and chromatographic retention indices.

Source: Reference 25.
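The isopentane calculation in the connectivity-index box can be reproduced with a short script. What follows is only a sketch under stated assumptions: the function names are hypothetical, the nodal value is taken to be the number of bonds at each node of the hydrogen-suppressed graph (the first of the two schemes in the box), and the third-order cluster is taken to be a central node with three neighbors.

```python
from itertools import combinations
from math import prod

# Hydrogen-suppressed graph of isopentane, numbered as in the box:
# C1-C2(-C5)-C3-C4
BONDS = [(1, 2), (2, 5), (2, 3), (3, 4)]


def neighbours(bonds):
    """Map each node to the list of nodes it is bonded to."""
    table = {}
    for a, b in bonds:
        table.setdefault(a, []).append(b)
        table.setdefault(b, []).append(a)
    return table


def nodal_values(bonds):
    """Assumed scheme: nodal value = number of bonds at the node."""
    return {node: len(nbrs) for node, nbrs in neighbours(bonds).items()}


def chi_path_1(bonds):
    """First-order path index: one term (d_a * d_b)^-0.5 per bond."""
    delta = nodal_values(bonds)
    return sum((delta[a] * delta[b]) ** -0.5 for a, b in bonds)


def chi_cluster_3(bonds):
    """Third-order cluster index: one term per centre-plus-three-neighbours subgraph."""
    delta = nodal_values(bonds)
    total = 0.0
    for centre, nbrs in neighbours(bonds).items():
        for trio in combinations(nbrs, 3):
            total += (delta[centre] * prod(delta[n] for n in trio)) ** -0.5
    return total


print(round(chi_path_1(BONDS), 3))
print(round(chi_cluster_3(BONDS), 3))
```

The unrounded first-order sum is about 2.27; the 2.26 quoted in the box results from adding the four terms after each has been cut to two decimals. The cluster value agrees with the box at 0.41.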

The QSAR methodology

The main objective of a QSAR study is to develop a quantitative relationship between given properties of a set of chemicals in terms of their molecular descriptors. The success of the study will depend, therefore, on the quality of the data set as well as on the suitability of the descriptor(s) chosen for that study. The better the representation of any given molecule of the data set and the entire data set itself, the better the results will be. For example, if a descriptor can quantify all the “branchedness” in a molecule but cannot discriminate between other members of the data set, then such a descriptor cannot be expected to perform well in QSAR studies.

A preliminary and essential step in a QSAR study is to evaluate the data base to identify any outliers and hidden patterns, trends, and major groupings. Outliers can be attributed to experimental errors or, more importantly, to certain members of the data base exhibiting mechanistic behavior so different that the outlier cannot belong to the bulk of the data. A simple method based on the jackknife test has been proposed by Hawkins and colleagues to identify outliers (26).

Many pattern recognition (PR) methods recently have become available for use in QSAR studies. Jurs and co-workers have discussed various aspects of PR and demonstrated many applications of PR to studies of mutagenic compounds (27, 28). Dunn and Wold have developed PR models based on principal component analysis and applied them to studies of carcinogenicity (29) and antimalarial activity (30). In some cases, a QSAR model for the activity was established after separating the data into active and nonactive groups by PR techniques.

Cluster analysis is another useful PR technique that can provide valuable insights into QSAR studies. Its objective is to group similar members of a data matrix in a hierarchical manner to produce a tree cluster or box cluster. At any level of clustering, the points within a cluster will be similar to each other in terms of the variables used for clustering and will be different from all other members of the data set. Figure 1 shows this method, by which 53 petrochemicals were classified into toxic and nontoxic groups using the second-order connectivity index, ²χv.

The recommendations and conclusions of many QSAR practitioners demonstrate that several overall criteria can be set for a robust QSAR model. Such a model has a high adjusted r², which means it has a low standard error, s, preferably comparable with the experimental uncertainty; does not contradict common physical, chemical, or biological knowledge; follows theoretical considerations; contains subsets of the data set also satisfied by the model with comparable coefficients and signs; is able to predict correctly when a testing data set is used; and contains a restricted number of descriptors relative to the number of cases.

Descriptors used in QSAR studies. The success of QSAR methods depends on two factors: the training data set obtained by testing a group of chemicals and the descriptors obtained from the molecular structure or from some easily measurable or calculable property of the chemicals. The suitability of the descriptors is an important factor in the development of sound QSAR models. The most commonly used descriptors in environmental QSAR studies are log p, connectivity indices, and, more recently, the solvatochromic parameters (see boxes). MR is another descriptor that has been used to improve correlations.
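As a concrete illustration of how a descriptor such as log p can be obtained from structure alone, the fragment-constant sums in the log p box can be reproduced in a few lines of Python. This is a minimal sketch, not the authors' procedure: the dictionary and function names are hypothetical, the constants are the ones quoted in the box, and the group-branch factor of -0.22 is the value that appears in the last sample calculation of that box.

```python
# Fragment constants and bond/branch factors as quoted in the log p box.
FRAGMENT = {"CH3": 0.89, "CH2": 0.66, "CH": 0.43, "C": 0.20, "OH": -1.64}
FACTOR = {"bond": -0.12, "chain_branch": -0.13, "group_branch": -0.22}


def log_p(fragments, factors):
    """Sum a_n * f_n over fragments and over bond/branch factors."""
    total = sum(count * FRAGMENT[name] for name, count in fragments.items())
    total += sum(count * FACTOR[name] for name, count in factors.items())
    return round(total, 2)


# The four sample calculations from the box, written as fragment counts.
pentane = log_p({"CH3": 2, "CH2": 3}, {"bond": 3})
isobutane = log_p({"CH3": 3, "CH": 1}, {"bond": 2, "chain_branch": 1})
neopentane = log_p({"CH3": 4, "C": 1}, {"bond": 3, "chain_branch": 2})
alcohol = log_p({"CH3": 2, "CH": 1, "OH": 1}, {"bond": 2, "group_branch": 1})

for name, value in [("pentane", pentane), ("isobutane", isobutane),
                    ("neopentane", neopentane), ("alcohol", alcohol)]:
    print(name, value)
```

Keeping the constants in lookup tables mirrors the additive form log p = Σ a_n f_n: adding a new fragment type only requires a new table entry, not new code.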
The refractive index of a molecule encodes information about the electronic configuration of the molecule and is related to the deformation of electronic arrangement under an electric field such as the one induced by the presence of another molecule. Thus MR is expected to correlate well with many properties of the molecule and its activity when interacting with other molecules. MR is correlated to many physical-chemical properties, some of which could be theoretically derived. For example, the electronic polarization in a molecule, δ, is directly related to MR by:

$$\delta = \frac{3}{4\pi}\,\mathrm{MR} \qquad (5)$$

MR can be calculated using the equation

$$\mathrm{MR} = \frac{n^2 - 1}{n^2 + 2} \cdot \frac{\mathrm{MW}}{d} \qquad (6)$$

where MW = molecular weight, d = density, and n = refractive index. Although n and d depend on the physical state, MR does not, and it is a molecular property. Its determination, however, involves experimental measurement of two more parameters, d and n, of which the latter is relatively difficult. MR has been used, both by itself and in combination with other descriptors, in ecological SAR studies in toxicity, sediment sorption coefficient, and bioconcentration.

QSAR studies in pharmacology. Pharmacologists and medicinal chemists were quick to realize the advantages of QSAR techniques and to make full use of them. Initially, QSAR gave them the advantage of producing reliable results without properly understanding the drug activity mechanism. However, as more and more data and QSAR results became available, progress was made toward understanding of the process itself. Fundamentally, drug-receptor interactions have been known to be governed by an initial transport phenomenon, followed by an electronic phenomenon at the receptor site. Therefore most of the SAR models developed in these studies used log p and MR to depict the transport aspect and σ and Es to cover the electronic aspect. With the development of the topological descriptors, geometrical aspects also are incorporated in the equations to account for matching and orientation of the drug structure at the receptor site.

Early QSAR applications in the pharmaceutical field were mainly in research on narcotic effects, anesthesiology, tranquilizers, sedatives, and painkillers. Currently, QSAR methods are standard tools used by the industry in almost every drug project from initial proposals to series design to final design and testing stages. In a review of QSAR in medicinal chemistry, Martin listed more than 30 classes of compounds for which QSAR models correctly predicted biological activities ranging from antibacterial effects to acute toxicity (33). (Commercial interests may have prevented many more QSAR models from being published.) A typical example of successful commercial application of QSAR to drug design led to the identification of a drug with a 1000-fold increase in potency (Figure 2).

Current research interest in the pharmacological area includes QSAR analysis of the effects of site-specific log p and MR values on biological potency. This research is expected to aid in receptor mapping and in differentiating between stereoisomers. Computerized 3-D molecular modeling techniques

also are being extensively used in QSAR for drug design and understanding mechanisms. Other areas of current research include sophisticated statistical testing methods, refining substituent constants with larger data banks, and QSAR analysis of pharmacokinetics.

QSAR studies in ecotoxicology. Following the successful use of QSARs in the drug industry, by the late 1970s QSARs were being applied to ecosystems to analyze toxicity to the biosphere and inhibition of bacterial activity, biosorption, bioconcentration, and biodegradation. Although pharmacologists were concerned with the therapeutic efficacy of the chemicals, toxicologists were interested in their toxic and inhibitive actions, and ecologists and environmental engineers were concerned about their persistence and biodegradability. Bioassays were developed to quantify biological activity in terms of LC50. Low LC50 values imply greater toxicity. The Bioconcentration Factor, BCF, is defined as a partitioning coefficient that relates the concentration of the chemical in the biophase to that in the water phase. A higher BCF implies higher residual concentrations in the biophase. BOD tests are used to grade biodegradability.

Most initial QSAR modeling of biofate problems used log p as the only descriptor. The biophase was considered akin to octanol; hence, log p was expected to correlate well with LC50 values and BCF in aquatic organisms. Many researchers modeled toxicity to fish, rat, and microorganisms satisfactorily by using log p. Despite poor precision in the LC50 values, the correlations often were found to be highly reliable. Connectivity indices also were used to correlate with LC50 but were more common in correlations with BCF, which also is believed to be dependent on geometrical properties such as molecular size, flexibility, and surface area.

Toxicity studies in general consider surrogate life forms to model the toxicity of the chemicals to the organisms in the ecosphere. Algae are used in aquatic systems; fish are used to represent the effect on the food chain; activated sludge is used to represent the effect on treatment systems; and rats are used to represent the effect on terrestrial warm-blooded animals. QSAR methods have been used to predict the toxicity of chemicals by correlating LC50 with molecular descriptors such as log p, χ, MR, and redox potential E°.

Lipnick and co-workers, for instance, studied 96-h guppy LC50 values for 110 phenols and reported consistent QSAR models using log p and dissociation constants as the descriptors (35). Kamlet and colleagues used their linear solvation energy relationship (LSER) model on Microtox test results for 38 chemicals and reported strong correlations (r = 0.987; std error = 0.28) (36). The Microtox test, developed by Beckman Instruments, is a rapid bioassay based on the measurement of the inhibition of bioluminescence of Photobacterium phosphoreum. In another data set of 32 chemicals, toxicity to Golden Orfe fish was again modeled by LSER (r = 0.983; std error = 0.14) (37).

The Linear Solvation Energy Relationship (LSER) equations

The relationship between solubility-related properties (SP) and solute-solvent interactions can be modeled using linear combinations of the free-energy contributions by three types of terms:

SP = SP0 + cavity term + dipolar term + hydrogen-bonding term

The cavity term quantifies the free-energy input necessary to overcome the solvent-solvent cohesive interactions, thus forming a cavity in the solvent to accommodate the solute molecule. Its magnitude equals (δH²)1 V2, where δH is the Hildebrand solubility parameter and V is the molar volume or, more generally, the ...

... dipole and dipole-induced interactions. Its magnitude equals π*1 π*2, where π* values are proportional to molecular dipole moments.

The hydrogen-bonding term quantifies the energy released by complexation between hydrogen-bond-donating (HBD) solvents and hydrogen-bond-accepting (HBA) solutes, measured by α1 and β2, or between HBA solvents and HBD solutes ... Thus

$$SP = SP_0 + A(\delta_H^2)_1 V_2 + B\,\pi^*_1 \pi^*_2 + C\,\alpha_1 \beta_2 + D\,\beta_1 \alpha_2$$

The above-listed variables, known as solvatochromic parameters, have been established either by experimentation or by correlation with known properties. Tabulated values are available in the literature. Estimation rules also have been proposed for ... compounds. ... problems, the general equation shown above can be simplified. For ... models have been reported, and they cover many physicochemical properties and activities over a broad category of compounds (33). One current drawback to this approach is the ... set of solvatochromic ...

Source: References 36 and 37.

Veith and co-workers studied the narcotic action of 65 industrial chemicals, including alcohols, ketones, and ethers, on fathead minnow (38). Because narcosis is a reversible state in which the activity of protoplasmic structures is arrested, Veith et al. proposed a nonlinear model for such toxic action:

$$\log LC_{50} = -0.94 \log p + 0.94 \log (0.000068\,p + 1) - 1.25 \qquad (7)$$

These researchers also related this model directly to the solubility log p model and found that the toxicity curve intersected the solubility curve at log p = 5.5. This implies that narcotic chemicals with log p > 5.5 may not cause toxic effects to aquatic organisms, even at saturation concentrations, because of their greatly limited solubility in the aqueous phase.

Speece and co-researchers presented an overview of toxicity-molecular structure relationships and reported many correlations using the descriptors enumerated above (39). This application has become more promising because of the increasing number of new chemicals proposed for manufacture and the delays in their approval by EPA. Under Section 5(e) of the Toxic Substances Control Act (TSCA), EPA now accepts QSAR results as an alternative to test data. QSAR also can be used under Section 4 of TSCA to call for further testing of chemicals that “may present unreasonable risk” (9).

Another important use of QSAR in toxicological studies is in the prediction of chronic toxicity to aquatic organisms. These data can be used in turn to formulate water quality standards. In the case of fish toxicity, chronic effects appear after an exposure equivalent to at least one life cycle of the test species. Because of the large number of species and the increasing number of chemicals, predictive models based on short-term studies would be both cost effective and timely. In a recent review, McCarty and co-workers concluded that a linear log p model correlated well with chronic and acute toxicity by narcosis (40). They also proposed a relationship between toxicity and BCF:

$$\log LC_{50} = \mathrm{Const} - A \log (\mathrm{BCF}) \qquad (8)$$

This model is significant in that both acute and chronic toxicity could be related to the chemical accumulated in the body.

A good QSAR toxicity model should be able to predict the LC50 values as well as the mode of toxic action. Predicting the latter, however, remains a difficult task, and the errors associated with selecting the wrong structure-toxicity mechanism may be greater than the standard error of the estimate associated with any individual QSAR (38).

Apart from aiding in the development of predictive models for LC50,

QSAR methods can provide insights into the mechanisms behind toxicity and help in postulating and validating theories. Chou and co-workers, for instance, studied structure-toxicity effects of 53 petrochemicals and found that the presence of double bonds, rings, and aldehyde groups in the structure magnified their toxicity to methanogenic bacteria (41). A study reported by Lipnick shows how the QSAR results can be used to postulate mechanisms causing toxicity (42). In this study of 55 alcohols containing no additional heteroatom functional groups, five compounds (primary and secondary allylic and propargylic alcohols and vicinal diol) were found to be significantly more toxic than would be predicted by normal toxicity models showing narcotic toxicity. To explain this anomaly, Lipnick proposed a proelectrophile mechanism. A similar anomaly was noted in another study by Lipnick and Dunn (43). Hydroquinone, acrolein, acrylonitrile, and salicylaldehyde appeared to be more toxic than their esters when they were evaluated using the narcosis model. This behavior also was explained by the ability of these chemicals to act as electrophiles. Introduction of an electrophilicity parameter into the general narcosis model successfully correlated the toxicity of such chemicals.

Another interesting application of QSAR in correlating and rationalizing ecotoxicological data was reported by Hall and Kier (25) when they interpreted QSAR results of a study done by Koch (44). In Koch's study, the following models were reported for biosorption (BS), BCF, and LC50:

$$\log \mathrm{BS} = 0.445 + 0.673\,{}^{1}\chi^{v} \quad (n = 18;\ r = 0.974) \tag{9}$$

$$\log \mathrm{BCF} = 0.147 + 0.789\,{}^{1}\chi^{v} \quad (n = 21;\ r = 0.957) \tag{10}$$

$$\log \mathrm{LC}_{50} = 5.582 - 1.192\,{}^{1}\chi^{v} \quad (n = 31;\ r = 0.903) \tag{11}$$

Containing the same descriptor, these three equations suggest that the three events (biosorption, bioaccumulation, and bioactivity) are all related to each other; the fact that r decreases in a particular order is in accordance with the physical sequence of the three events, and biosorption could be the rate-limiting step. The sign of the ${}^{1}\chi^{v}$ term in all three equations is consistent and correctly depicts the physical significance. It also is interesting to note that ${}^{1}\chi^{v}$ is closely related to MR (n = 101; r = 0.996).
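As a minimal illustration of how Equations 9-11 are applied, the Python sketch below evaluates the three regressions at a given value of the first-order valence connectivity index. The coefficients are those reported in the equations; the function name `predict_all` and the two index values are hypothetical, chosen only for illustration, and are not taken from Koch's data sets.

```python
# Intercept and slope of each one-descriptor model, copied from Equations 9-11.
KOCH_MODELS = {
    "log_BS": (0.445, 0.673),     # biosorption
    "log_BCF": (0.147, 0.789),    # bioconcentration factor
    "log_LC50": (5.582, -1.192),  # acute toxicity
}


def predict_all(chi_v):
    """Evaluate each linear model at the given valence connectivity index."""
    return {name: a + b * chi_v for name, (a, b) in KOCH_MODELS.items()}


# Hypothetical index values, used only to show the opposite trends.
for chi_v in (2.0, 4.0):
    print(chi_v, predict_all(chi_v))
```

Because the slope is positive in Equations 9 and 10 and negative in Equation 11, a larger index value gives higher predicted sorption and bioconcentration and a lower predicted LC50, which is the consistency of sign noted in the text.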

This is as expected because MR, in turn, is expected to correlate well in situations in which chemical-ligand interactions dominate.

A promising application of QSAR in ecotoxicity is in the analysis of toxic interactions. In reality, chemical interactions are always present as opposed to laboratory studies of single toxicants in pure media. By use of a model

$$\log \beta = b_0 + b_1X_1 + b_2X_2 + b_{12}X_1X_2 + \cdots \tag{12}$$

or

$$\log \beta = b_0 + b_1X_1 + (b_2 + b_{12}X_1)X_2 + \cdots \tag{13}$$

the additive, antagonistic, or synergistic actions of the two toxicants ($X_1$, $X_2$ being their concentrations) on the fractional reduction of activity ($\beta = I/I_0$, the ratio of the final activity to the original activity) can be statistically estimated depending on the relative values of the coefficients in the model.

For instance, if the term $b_{12}$ is found to be significant, then it can be seen that the effect of $X_2$ also will depend on $X_1$, thereby showing interaction: positive $b_{12}$ implying synergistic action and negative $b_{12}$ implying antagonistic action. On the other hand, if $b_{12}$ is not significant, the effect is purely additive.

Bois and co-workers reported a study of an interaction problem in which Zn and pentachlorophenol (PCP) were used to inhibit the bioluminescence of Microtox individually and in combination at various concentrations (45). They derived the model:

$$\log \beta = 0.8 + 1.19X_1 + 0.62X_1^2 + 0.79X_2 + 0.645X_2^2 \tag{14}$$

where $X_i = \log[(\text{concn. of } i) + 0.2]$, and $i$ is Zn$^{2+}$ or PCP. They concluded that there was no interaction (neither antagonistic nor synergistic), but only a nonlinear additive action.

QSAR in biodegradability models. Biodegradability is an important factor in the fate of chemicals in the natural ecosystem and in the assessment of their impact on treatment facilities. A classic example of a successful application of QSAR in the postulation and validation of a mechanism of a process recently was reported by Dearden and Nicholson in their study of biodegradability (46). After analyzing six classes consisting of a total of 79 aromatic and aliphatic compounds, they proposed that "degradation of a series of congeners should proceed by a common mechanism, which must involve the bond(s) common to all compounds in the series."

By identifying various common bonds, x-y, and using the modulus of the atomic charge differences across the bonds, $|\Delta\delta|_{x\text{-}y}$, as the descriptor, Dearden and Nicholson obtained excellent correlations for each congeneric series. It is remarkable that when all 79 compounds were combined, the resulting generalized equation had $|\Delta\delta|_{x\text{-}y}$ as the only variable with an r of 0.993:

$$\mathrm{BOD} = (1.015 \times 10^{[\ldots]})\,|\Delta\delta|_{x\text{-}y} + 1.193 \tag{15}$$

where x-y now represents the appropriate bond specific to the congener. For example, in the case of eight amino acids, the x-y pair was C-O. The robustness of this QSAR model was demonstrated further by considering other x-y bonds in the molecule and showing poor correlations. In the above example, when x-y was chosen as the C-N pair, r dropped to 0.154. Yet another strength of this model is that none of the data appeared to be an outlier. Because the $|\Delta\delta|_{x\text{-}y}$ term correlated so well with BOD, Dearden and Nicholson concluded that electronic factors dominate the degradation process, at least at the initial microbial attack on the molecule. More importantly, this initial stage is the rate-limiting step in the process.

Physicochemical properties. The use of QSAR techniques in the estimation of physicochemical properties is a relatively recent application. The main reasons for this application are the experimental difficulties encountered in the measurement of physicochemical properties and the discrepancies among interlaboratory results. For certain classes of compounds, even fundamental properties such as solubility and Henry's law constants are difficult to measure, as evidenced by the contradictory values reported in the literature (47). Unlike the experimental results, the reliability of a QSAR estimate will be known beforehand.

Apart from their predictive use, QSAR models can aid in the planning of optimal experimental designs. They also can be used to corroborate literature and experimental values and to identify outliers in a data set so that they may be subjected to further scrutiny. In addition, QSAR models can be used to study the fundamental connection between molecular structure and property, providing valuable insight into the design of chemicals for various applications.

The estimation of aqueous solubility probably was the first such application, and it remains the subject of many QSAR studies. Many correlations for small sets of congeneric chemicals have been reported in the literature. Here we refer to QSAR studies of larger sets containing mixed classes of compounds.
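Returning to the two-toxicant interaction model of Equation 12, the following Python sketch shows one way the coefficient $b_{12}$ might be estimated and interpreted. It is an illustration under stated assumptions, not the procedure used by Bois and co-workers: the data are synthetic and noise-free, the helper names (`solve`, `fit_interaction`, `classify`) are hypothetical, and the fixed tolerance `tol` merely stands in for the statistical significance test that the text calls for.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_interaction(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + b12*x1*x2 (Equation 12)."""
    rows = [[1.0, u, v, u * v] for u, v in zip(x1, x2)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)  # normal equations


def classify(b12, tol=0.05):
    """Sign convention from the text; tol is an assumed stand-in for a t-test."""
    if abs(b12) <= tol:
        return "additive"
    return "synergistic" if b12 > 0 else "antagonistic"


# Synthetic responses generated from known coefficients on a 3 x 3 grid.
levels = (0.0, 0.5, 1.0)
x1 = [u for u in levels for _ in levels]
x2 = [v for _ in levels for v in levels]
y = [-0.1 - 0.8 * u - 0.5 * v - 0.3 * u * v for u, v in zip(x1, x2)]

b0, b1, b2, b12 = fit_interaction(x1, x2, y)
print(round(b12, 3), classify(b12))
```

With real bioassay data the residual scatter would have to be taken into account, for example through the standard error of $b_{12}$, before any conclusion about synergism or antagonism is drawn.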

Hansch and co-workers studied the solubility of 156 aliphatics and aromatics and derived a model using log P (48). In spite of the diversity among the chemicals considered, the quality of the model was found to be very good (n = 156; r = 0.935). Only 20 measured log P values, however, were used in this correlation. All others were estimated using the additive method. Thus the error inherent in the additive method could be expected to propagate into the solubility model.

In our QSAR study of solubility predictions we obtained a strong correlation with molecular connectivity (n = 315; r = 0.973). The data set comprised mixed classes of saturated and unsaturated aliphatic and aromatic compounds that contained alkyl, chloro, and bromo groups with mixed substitutions (49). The utility of this model was demonstrated by using it in a predictive mode on 170 new chemicals, for which the agreement between the experimental and predicted values was found to be satisfactory, with an r of 0.977 and a standard error of 0.32.

Kamlet and co-workers used the LSER concept to model solubility by separating the solution process into the cavity formation and polar-dipolar interaction terms (50). Their model for aliphatic liquids, for example (n = 109; r = 0.995), had a standard deviation of 0.141, which is about the same degree of precision as can be expected in experimental results. Such a high degree of fit is said to have reached the limit of exhaustive fit. The outstanding contribution of this method is that, for the first time, solubility has been resolved into more fundamental properties of the solute. Moreover, the dominant property is identified as the hydrogen-bonding effect between water (as the donor) and the solute (as the acceptor), followed by the cavity formation process.
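Several of the correlations in this section rest on molecular connectivity indices, which are computed from the molecular skeleton alone. The Python sketch below computes the simple first-order index, assuming the standard definition (refs. 23-25): a sum over all bonds of the hydrogen-suppressed graph of the reciprocal square root of the product of the degrees of the two bonded atoms. The function name and the bond-list input format are illustrative choices, and the valence-corrected variant used in Equations 9-11 is not shown.

```python
from math import sqrt


def first_order_chi(bonds):
    """Simple first-order connectivity index of a hydrogen-suppressed graph.

    `bonds` is a list of (atom_i, atom_j) pairs between non-hydrogen atoms.
    """
    degree = {}
    for i, j in bonds:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return sum(1.0 / sqrt(degree[i] * degree[j]) for i, j in bonds)


n_butane = [(0, 1), (1, 2), (2, 3)]   # C-C-C-C chain
isobutane = [(0, 1), (0, 2), (0, 3)]  # central carbon with three methyls

print(round(first_order_chi(n_butane), 3))   # 1.914
print(round(first_order_chi(isobutane), 3))  # 1.732
```

The two isomers have the same number of atoms and bonds but different index values, which illustrates how the descriptor encodes branching.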
The octanol-water partitioning coefficient, log P, is another property that, in turn, has been modeled with other descriptors. Murray and colleagues derived a linear one-variable model for log P using the simple first-order connectivity index, ${}^{1}\chi$ (51). Their data set included aliphatic and aromatic esters, acids, and alcohols but excluded hydrocarbons (n = 138; r = 0.986; SD = 0.152). Kamlet and co-workers used their LSER model to correlate log P and reported a very robust model (n = 103; r = 0.991; SD = 0.16) that once again compared very well with other estimations and experimental results (32).

Hine and Mookerjee used another variation of the atomic group fragment concept to predict the hydrophobic character of 292 organic compounds (water-air partition coefficients). They established group and bond contribution factors using statistical analysis of available data and then predicted the logarithmic activity coefficients in the gas phase relative to that in water within a standard error of 0.12, which is considered to be "a remarkable scientific accomplishment" in view of the experimental uncertainty and the diversity of the chemicals in the data set (52).

Many other physicochemical properties have been modeled using QSAR methods for smaller data sets. Molecular connectivity has been used successfully to predict MR, boiling point, density, heat of formation, and chromatographic retention indices (24, 25). A comprehensive list of reported QSAR models that use connectivity indices in physicochemical property estimations can be found in Kier and Hall (25).

A data set containing all the reported soil sorption coefficients of PAHs and chlorinated hydrocarbons was recently studied by Sabljic (53, 54). He found a linear correlation with the simple first-order connectivity index, ${}^{1}\chi$ (n = 72; r = 0.976). In a subsequent paper, Sabljic pointed out the shortcomings in using log P in QSAR models in general and illustrated the superiority of the connectivity index over log P in soil sorption correlations (54). Kamlet and colleagues used their LSER concept to predict the adsorption coefficient of organic chemicals to activated carbon and reported a strong model (n = 37; r = 0.974; SE = 0.19) (55).

QSAR in ecosystem modeling. The success of QSAR methods and their ability to predict ecological and physicochemical properties has prompted their use in total modeling of ecosystems. In a preliminary stage, equilibrium models can be assembled using QSAR for estimating the partitioning constants; robust correlations for such constants have been reported by many workers in addition to those mentioned herein. QSAR predictions of reaction rate constants will be needed to compose more elaborate QSAR-based ecological models. Such models can predict the temporal and spatial distribution patterns of environmental contaminants and their behavior in the various compartments of the ecosphere.

Such a model recently was reported by Bartell (56). This model is formulated to predict the fate of aromatics; QSAR, in turn, is used to predict the rates of transport, transformation, and accumulation. Molecular properties such as boiling point, ring structure, molecular weight, and light absorption spectrum are used to estimate the model parameters: rates of volatilization, photolysis, and sorption. This model has been validated from laboratory scale to a natural trout system. Validation studies on this model using statistical methods strongly favored the validity of QSAR as a useful tool in the fate modeling of homologs.
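The preliminary equilibrium stage described above can be sketched in a few lines of Python. The sketch assumes a closed system in which each compartment's concentration equals its partition coefficient times the water concentration, so the share of the total mass held by a compartment is proportional to volume times partition coefficient. The compartment names, volumes, and coefficients are hypothetical placeholders; in a QSAR-based model the coefficients would come from correlations such as those cited in this review. This is not a reconstruction of Bartell's model, which also includes rate processes.

```python
def equilibrium_fractions(volumes, partition_coeffs):
    """Fraction of the total chemical mass in each compartment at equilibrium.

    partition_coeffs[name] is the dimensionless ratio C_compartment / C_water,
    so the water compartment itself has a coefficient of 1.
    """
    capacity = {k: volumes[k] * partition_coeffs[k] for k in volumes}
    total = sum(capacity.values())
    return {k: c / total for k, c in capacity.items()}


# Hypothetical compartment volumes (m^3) and partition coefficients.
volumes = {"water": 1.0e6, "sediment": 1.0e3, "biota": 1.0}
coeffs = {"water": 1.0, "sediment": 500.0, "biota": 10 ** 3.5}

fractions = equilibrium_fractions(volumes, coeffs)
for name, share in fractions.items():
    print(f"{name}: {share:.4f}")
```

Extending such a sketch to temporal behavior would require the rate constants for transport and transformation that the text identifies as the next need.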

Limitations of QSAR methods

The possibilities and limitations of QSAR in ecotoxicology are discussed by Koch (57) and Martin (33). After reporting many models for BCF and LC50 for various classes of chemicals using log P, $\chi$, and MR, Koch concluded that even though "topological indices place a direct connection between the total structure of the molecule and activity" as opposed to the indirect connection between log P or MR and activity, "it is difficult to interpret exactly the physicochemical sense of the computed regression equation." This situation is being remedied by directing research toward understanding the significance of descriptors at the molecular level. Recent investigations on connectivity indices, for example, have revealed many important connections between molecular properties and the indices (e.g., ${}^{3}\chi_{p}$ and "flexibility" of molecule) (25).

Correlations using log P are widespread and rather well accepted. However, they are not based upon fundamental mechanisms but on ill-defined mechanisms such as narcosis toxicity. Martin pointed out that disappointments with QSAR may be caused by a poor training set or an invalid or ambiguous regression model, extrapolation outside the range of properties represented by the training set members, and different conditions of testing in the training and testing sets (33). A more serious failure may be due to basic differences in the mechanistic behavior of the data set members.

Currently, the application of QSAR to environmental problems is limited to distinguishing structures that should be thoroughly tested from those that should not. Thus QSAR is not expected to replace experimental verification.

Future of QSAR

As can be seen from the increasing number of reports in the literature, QSAR techniques are emerging as one of the useful tools in tackling environmental problems. They are being accepted to a greater extent by regulatory agencies in decision making and policy implementation, for example, in Sections 4 and 5(e) of TSCA (9). EPA has identified many stages of development in QSAR methods during the last few years. Examples include the determination of thermodynamic properties and the classification of chemicals (58).

The current trends in QSAR development are toward establishing integrated computer programs and large data bases; applying artificial intelligence to QSAR model development using information such as pattern recognition and descriptor selection; developing a new breed of QSAR scientist who specializes in multidisciplinary approaches, including expertise in xenobiochemistry, toxicology, statistics, and computer sciences (e.g., chemometricians); developing and establishing consistent sets of descriptors (such as log P and solvatochromic parameters); and standardizing surrogate testing procedures for generating testing data bases (e.g., Microtox and BCF tests).

An ultimate objective in QSAR applications would be to develop QSAR-based fate models coupled with models based on dose exposure and effect. These models could serve as valuable early-warning tools for managing and screening new chemicals and for setting priorities among existing chemicals.

QSAR applications to ecosystem modeling have great potential in environmental engineering. Applied to smaller domains, they have been shown to be reasonably valid, but validation of total system models will continue to be a challenge unless large and consistent data sets become available to calibrate and fine-tune the models. Analysis of the failures and successes of QSAR models will aid in the development of robust models.

In the absence of experimental data on the physicochemical properties and toxicity of chemicals, QSAR models may provide the most reliable estimates for use in ecosystem modeling. These models are also valuable as algorithms in more general environmental modeling. Currently, QSAR analyses lack the existing toxicity and physicochemical property data bases in several areas. Many sets of data bases have yet to be correlated by QSAR methodology.

Acknowledgment

Special acknowledgment is given to L. B. Kier of Virginia Commonwealth University, to the late M. J. Kamlet, and to P. C. Jurs of Penn State University for their assistance and encouragement in this effort. This article was reviewed for suitability as an ES&T critical review by Philip Watanabe, Dow Chemical, Midland, Mich. 48674; and by Elizabeth K. Weisburger, National Cancer Institute, Bethesda, Md. 20892.

References

(2) [...] Setting Among Existing Chemicals; Gesellschaft für Strahlen: Munich, 1985; pp. 429-36.
(3) National Research Council Report ISBN 0-309-03433-7; National Research Council: Washington, D.C., 1984; p. 76.
(4) Koch, R.; Nagel, M. Toxicol. Environ. [...]
(7) Langley, J. N. J. Physiol. (London) 1878, 1, 339-43.
(8) Ehrlich, P.; Morgenroth, J. Studies in Immunity, 2nd ed.; Wiley: New York, 1910; pp. 76-95.
(9) Josephson, J. Environ. Sci. Technol. 1984, 18, 285A-286A.
(10) Hansch, C. In Biological Activity and Chemical Structure; Elsevier: Amsterdam, 1977; p. 47.
(11) Meyer, H. Arch. Exp. Pathol. Pharmakol. 1899, 42, 109-18.
(12) Overton, E. Vierteljahresschr. 1899, 44, 88-92.
(13) Hammett, L. P. Chem. Rev. 1935, 17, 175-76.
(14) Ormerod, W. E. Biochem. J. 1953, 54, [...].
(15) Hansen, O. R. Acta Chem. Scand. 1962, 16, 1593-1600.
(16) Taft, R. W. J. Am. Chem. Soc. 1953, 75, 4231-38.
(17) Fukuto, T. R. Residue Rev. 1969, 25, [...].
(18) Hansch, C.; Fujita, T. J. Am. Chem. Soc. 1964, 86, 1616-26.
(19) Wiener, H. J. Am. Chem. Soc. 1947, 17, 2636-38.
(20) Altenburg, K. Brennst. Chem. 1966, 47, 331-36.
(21) Gordon, M.; Scantlebury, G. R. Trans. Faraday Soc. 1964, 60, 605-21.
(22) Hosoya, H. Bull. Chem. Soc. Jpn. 1971, 44, [...].
(23) Randic, M. J. Am. Chem. Soc. 1975, 97, 6609-15.
(24) Kier, L. B.; Hall, L. H. Molecular Connectivity in Chemistry and Drug Design; Academic: New York, 1976; pp. 16-195.
(25) Kier, L. B.; Hall, L. H. Molecular Con[...]; Research Studies: London, 19[...]; pp. [...].
(26) Hawkins, D. M.; Bradu, D.; Kass, G. V. Technometrics 1984, 26, 197-208.
(27) Stouch, R. T.; Jurs, P. C. Environ. Health Perspect. 1985, 66, 329-43.
(28) Stuper, A. J.; Brugger, W. E.; Jurs, P. C. In Computer Assisted Studies of Chemical Structure and Biological Function; Wiley-Interscience: New York, 1979; pp. 29-[...].
(29) Dunn, W. [...] 1978, 21, [...].
(30) Dunn, W. [...] 865-68.
(32) Kamlet, M. J.; Taft, R. W. Acta Chem. Scand. 1985, B39, 611-28.
(33) Martin, Y. C. J. Med. Chem. 1981, 24, 229-37.
(34) Cramer, R. Chemtech 1980, 10, 744-47.
(35) Lipnick, R. L. et al. ASTM Spec. Tech. Publ. 1986, [...].
(36) Kamlet, [...]
(38) Veith, G. D.; Call, D. J.; Brooke, L. T. Can. J. Fish. Aquat. Sci. 1983, 40, 743-48.
(39) Speece, R. E.; Nirmalakhandan, N.; Jurs, P. C. In Proceedings-International Conference on Innovative Biological Treatment of Toxic Waste Waters; Scholze, R. J. et al., Eds.; Consortium for Biological Waste Treatment Research and Technology: Arlington, Va., 1986.
(40) McCarty, L. S. et al. Environ. Toxicol. Chem. 1985, 4, 595-606.
(41) Chou, W. L. et al. Prog. Water Technol. [...].
(42) Lipnick, R. L. In QSAR in Toxicology and Xenobiochemistry; Tichy, M., Ed.; Pharmacochemistry Library Series; Elsevier: Amsterdam, 1985; Vol. 8, pp. 38-52.
(43) Lipnick, R. L.; Dunn, W. J., III. In Quantitative Approaches to Drug Design; Dearden, J., Ed.; Pharmacochemistry Library Series; Elsevier: Amsterdam, 1983; Vol. 8.
(44) Koch, R. Toxicol. Environ. Chem. 1983, 6, 87-96.
(45) Bois, F.; Vaillant, N.; Vasseur, P. Bull. Environ. Contam. Toxicol. 1986, 36, 707-14.
(46) Dearden, J. C.; Nicholson, R. M. Pestic. Sci. 1986, 17, 305-10.
(47) Mackay, D.; Shiu, W. Y. J. Phys. Chem. Ref. Data 1981, 10, 1175-99.
(48) Hansch, C.; Quinlan, J. E.; Lawrence, G. L. J. Org. Chem. 1968, 33, 347-50.
(49) Nirmalakhandan, N. Ph.D. Thesis, Environmental Studies Institute, Drexel University, 1988.
(50) Kamlet, M. J. et al. J. Pharm. Sci. 1986, 75, 338-49.
(51) Murray, W. J.; Hall, L. H.; Kier, L. B. J. Pharm. Sci. 1975, 64, 1978-81.
(52) Hine, J. M.; Mookerjee, P. K. J. Org. Chem. 1975, 40, 292-98.
(53) Sabljic, A. J. Agric. Food Chem. 1984, 32, 243-46.
(54) Sabljic, A. Environ. Sci. Technol. 1987, 21, 358-66.
(55) Kamlet, M. J. et al. Carbon 1985, 23, 549-54.
(56) Bartell, S. M. In Proceedings-Workshop on Environmental Modeling for Priority Setting Among Existing Chemicals; Gesellschaft für Strahlen: Munich, 1985; pp. 173-93.
(57) Koch, R. In QSAR in Environmental Toxicology; Kaiser, K. L. E., Ed.; Reidel: Dordrecht, the Netherlands, 1984; pp. 207-22.
(58) "Research Outlook, 1986"; EPA Publication EPA-600/9-86-004; U.S. Environmental Protection Agency: Washington, D.C., 1986.

Nagamany Nirmalakhandan holds a B.S. degree in mechani