1630
Chem. Res. Toxicol. 2004, 17, 1630-1637
A QSAR for Baseline Toxicity: Validation, Domain of Application, and Prediction Tomas O ¨ berg* Department of Biology and Environmental Science, University of Kalmar, SE-391 82 Kalmar, Sweden Received June 30, 2004
The interest in modeling and application of structure-activity relationships has steadily increased in recent decades. It is generally acknowledged that these empirical relationships are valid only within the same domain for which they were developed. However, model validation is sometimes neglected, and the application domain is not always well-defined. The purpose of this paper is to outline how validation and domain definition can facilitate the modeling and prediction of baseline toxicity for a large database. A large number of theoretical descriptors (867) were generated from two-dimensional molecular structures for compounds present in the U.S. EPA’s Fathead Minnow Database (611) and the Syracuse Research Corporation’s PhysProp Database (25 000+). A quantitative structure-activity relationship model was developed for baseline toxicity (narcosis) toward the fathead minnow (Pimephales promelas) using a projection-based regression technique, PLSR (partial least squares regression). The PLSR model was subsequently validated with an external test set. The main factors of variation were related to size/shape and polar interactions. The prediction error was comparable to, or slightly better than, the ECOSAR procedures. A set of 16 805 compounds, drawn from the PhysProp Database, was projected onto the PLSR model. More than 90% (15 597) of the compounds fall within the valid model domain, defined by the residual standard deviation and the leverage. The predicted baseline toxicity indicates an acute hazard for twothirds of these compounds, classes I-III in the OECD Globally Harmonized Classification System (LC50 e 100 mg L-1). Finally, the mode of action assigned in the U.S. EPA Fathead Minnow Database was investigated. Reclassification to narcosis as the mode of action is suggested for 92 compounds, mostly from the groups “unsure” and “mixed”. The present classification into specific modes of action seems to be further strengthened by the findings in this investigation.
Introduction Narcosis or nonspecific toxicity is often referred to as baseline toxicity since it is the minimum toxicity exhibited by organic compounds. Pioneering work at the end of the nineteenth century by Meyer and Overton linked the effect of narcotics to a simple physicochemical parameter, their solubility in oils (1, 2). Eighty years later, Ko¨nemann and Veith et al. applied the concept of narcosis to aquatic toxicology and established the relationship between the octanol-water partition coefficient (Kow)1 and acute toxicity by narcosis (3, 4). However, this straightforward relationship for baseline toxicity is not applicable to all groups of organic compounds. Nonpolar and polar compounds both have narcotic effects but show distinctly different correlations between the acute toxicity and the Kow (5). This was initially interpreted as different modes of action. However, more recent research has questioned the separation into nonpolar and polar narcosis (6, 7). * To whom correspondence should be addressed. Tel: +46-480- 44 73 00. Fax: +46-480-44 73 05. E-mail:
[email protected]. 1 Abbreviations: CAS no., Chemical Abstract Service registry number; DF, degrees of freedom; F, ratio of mean squares; Kow, octanol-water partition coefficient; MOA, mode of action; MS, mean squares; PLSR, partial least squares regression; Q2, square of the multiple correlation coefficient for the external test set; QSAR, quantitative structure-activity relationship; R2, square of the multiple correlation coefficient for the calibration objects; RSD, residual standard deviation; SEC, standard error of calibration; SEP, standard error of prediction; SMILES, simplified molecular input line entry system; SS, sum of squares.
Baseline toxicity can be understood as a disruption of the functions of biological membranes, although the detailed mechanism remains unclear. It is generally assumed that anesthesia is due to changes in the neural ion channels, but this does not identify the primary target sites. Although narcotics do not require specific functional chemical groups, their action can still be quite specific, for example, by binding to pockets on proteins (8). Support is now emerging for the idea that the primary sites of action are neuronal proteins (9, 10). Here, it is sufficient to note that the molecular sites, whether lipids or proteins, are of an amphiphilic character binding both nonpolar and polar compounds (11). This is then a unifying explanation for general anesthesia, and the apparent difference between nonpolar and polar compounds could be due to the choice of partition coefficient or the distribution between target and nontarget compartments (7, 12, 13). Narcosis is expressed by most industrial chemicals, with different sizes, shapes, and functional groups. Environmental samples, for example, contaminated surface waters, often contain mixtures of a large number of these anthropogenic compounds. Additivity is a fundamental part of the baseline toxicity concept and in an environmental risk analysis attention must be given to the possible joint action of all mixture components (3, 1418). Below the thresholds for specific toxicity, the baseline toxicity may still amount to an environmentally relevant effect (19, 20). It is therefore likely that baseline toxicity
10.1021/tx0498253 CCC: $27.50 © 2004 American Chemical Society Published on Web 11/04/2004
A QSAR for Baseline Toxicity
is a prominent effect in the environment and of particular importance for persistent compounds (14, 21). All structure-activity modeling relies on the assumption of a correspondence between the biological activity of a compound and its molecular structure and physical and chemical properties (22). The findings by Meyer and Overton, as discussed above, were among the first reported QSARs (quantitative structure-activity relationships). Ferguson later gave the observed data a thermodynamic interpretation, connecting baseline toxicity with chemical activity, which is then proportional to the lethal body burden (23). McGowan recognized the importance of the molecular volume or parachor and of hydrogen bonding (24), and Mullins expanded the thermodynamical treatment (25). Hansch carried the work forward, both by introducing the partition coefficient between octanol and water as an additive parameter describing hydrophobicity and by extending the QSAR equations into linear free energy relationships (26, 27). The Kow has since then remained the most frequently used parameter in QSAR studies (28). However, it is worth mentioning that the use of partition coefficients with higher alcohols was originally suggested by Overton (2). Recently, several investigators have attempted to correlate acute toxicity to the fathead minnow (Pimephales promelas) with computationally derived molecular descriptors, and the group contribution method represents one such approach (29). In this context, it should be noted that most QSARs are applied in several steps, where Kow is estimated using a group contribution method. A more direct approach is to correlate toxicity directly to molecular descriptors, and such models perform equally well within the domain of application (2934). A benefit of using a more diverse set of descriptors is that both nonpolar and polar compounds can be handled with the same model (6, 12, 34), which also seems to be more in agreement with the current understanding of the mechanisms of general anesthesia. Linear free energy relationships provide a third alternative method of constructing QSAR models that are particularly well-suited to increasing our understanding of the basic chemical properties involved (35). Ordinary least squares regression has often been used to estimate the parameters of QSAR equations for acute toxicity to P. promelas, but examples of other techniques of building these empirical models are neural nets and projection-based methods (6, 29-33, 36-39). Irrespective of the fitting scheme, rigorous validation and a welldefined domain of application are vital components for the future success of a QSAR model. It has been observed that these steps in model development are often lacking. Three recent papers that provide guidelines for the development of QSAR models also pay special attention to validation and misuse by extrapolating beyond the domain of application (40-42). Defining the domain of application necessitates some measure of chemical similarity, and descriptors of the chemical structure provide such measures (43). Eriksson et al. give further details on how the applicability domains are subsequently defined using projection-based methods (42). The main purpose of this investigation is to develop and validate a QSAR model for baseline toxicity that can easily be applied to the screening of large databases. This requires the model to use only structural elements and descriptors that are easily computed. For speed and ease of computation, a generally accepted two-dimensional (2-
Chem. Res. Toxicol., Vol. 17, No. 12, 2004 1631
D) molecular representation such as the Simplified Molecular Input Line Entry System (SMILES) is preferred.
Experimental Procedures The experimental part of this study consists of three steps: collection of experimental and structural data, generation of molecular descriptors, and data analysis. Collection of Data. Acute toxicity test results for the fathead minnow (P. promelas) were obtained from the Fathead Minnow Database compiled by the Mid-Continent Ecology Division of the U.S. EPA’s National Health and Environmental Effects Research Laboratory (Duluth, MN). Acute (96 h) median lethal concentration (LC50) values are reported for 617 compounds together with their chemical structures (SMILES notations) and a classification of the modes of toxic action. The database and the classification scheme for modes of action are described by Russom et al. (44). In this study, 311 compounds with molecular weights between 32 and 285 amu were selected on the basis of classification of their modes of action as narcosis (narcotics modes I-III). Another 213 compounds within the same molecular weight interval, classified as having other modes of action, were used as a comparison set. The geometric mean LC50 value was selected when more than one value was reported. Acute fish toxicity data for 76 nonpolar narcotics used by Moore et al. were selected for further comparison of different modeling schemes (33, 45). This data set originates from the AQUIRE database and was compiled from many different literature sources. The PhysProp Database (Syracuse Research Corporation, Syracuse, NY) contains chemical structures (SMILES notations), names, and physical properties for over 25 000 compounds. Here, 16 805 organic compounds within the same molecular weight interval were selected for further study, excluding charged molecules and organometallics. Generation of Molecular Descriptors. The 2D molecular structures were used directly as input for the generation of 867 empirical descriptors using the software Dragon v.5.0 (Milano Chemometrics and QSAR Research Group, University of MilanoBicocca, Milan, Italy). The molecular descriptors generated included constitutional descriptors, topological descriptors, walk and path counts, connectivity indices, information indices, 2D autocorrelations, edge adjacency indices, BCUT descriptors, topological charge indices, eigenvalue-based indices, functional group counts, and atom-centered fragments. Most of these descriptors are reviewed in the recent textbook by Todeschini and Consonni (46). The correct interpretation of the SMILES code was validated by comparison of the molecular weights reported in the databases with those calculated. Only the compounds passing this validation have been included in the investigation. Software and Data Analysis. Estimations of LC50 values using the ECOSAR analysis procedures were performed using the software ECOWIN v.0.99 g (U.S. EPA, Risk Assessment Division, Washington, DC). The data analysis and multivariate calibrations were carried out using the software Statistica v.6.1 (StatSoft Inc., Tulsa, OK) and Unscrambler v.9.0 (Camo Process AS, Oslo, Norway). ANOVA and partial least squares regression (PLSR) were used as methods for data analysis and modeling. ANOVA is described Ståhle and Wold (47). PLSR is based on a linear transformation of the original descriptors to a limited number of orthogonal factors, attempting to maximize the covariance between the descriptors and the response variable. Multivariate calibration is reviewed by Martens and Næs and Næs et al. (48, 49). Descriptor variables with a minor influence in the PLS regression were assigned zero weight; these variables were identified using a jackknife method for significance testing of the model parameters (50, 51). All descriptor variables were preprocessed by auto-scaling to zero mean and unit variance. Cross-validation was used to establish the rank of the calibration model (number of latent variables), and an external test set was used to estimate the prediction error. The dependent variable, LC50 values for P. promelas, was transformed to its logarithm. The calibration
1632
Chem. Res. Toxicol., Vol. 17, No. 12, 2004
Figure 1. Frequency distribution of experimentally determined LC50 values for P. promelas (log mmol L-1).
O ¨ berg
Figure 2. RSD vs leverage for objects in the test set, together with the 5% confidence bound and leverage limit. Five objects are outside the applicability domain.
model was characterized by the standard deviations of the prediction residuals for the calibration objects and the external test set, respectively: SEC (standard error of calibration) and SEP (standard error of prediction). The explained variances are defined as sums of squares due to regression divided by sums of squares about the mean: R2 (square of the multiple correlation coefficient for the calibration objects) and Q2 (square of the multiple correlation coefficient for the external test set). The PLSR calibration model also defines a valid domain for the descriptor variables. New validation and prediction objects were assessed by the residual standard deviation (RSD) (the Euclidean distance to the PLSR model) and the leverage (the Mahalanobis distance to the calibration objects within the PLSR model space). These two distance measures were then used to decide if an object was within the domain of application or not. Here, the 5% significance level was chosen as the limit for the RSD, and the limit for the leverage was set to three times the average leverage for the calibration objects.
Results The narcotic compounds selected from the Fathead Minnow Database represent a diverse selection of organic compounds with approximately normally distributed molecular weights (n ) 311, mean ) 152, and standard deviation ) 50.5). The LC50 values vary between 0.00027 and 920 mmol L-1 and follow a log-normal distribution (Figure 1). Multivariate QSAR Calibration Model. The narcotic compounds were randomly sorted into two groups: a calibration set of 208 objects and a test set of 103 objects. Two hundred eighteen descriptor variables were selected for inclusion in the final model, based on significance tests using jackknifing in preliminary runs; nine outlying objects were excluded from the calibration set based on the residual sample variance for the descriptor variables and the response variable. The number of latent variables to retain in the PLSR model was estimated at five using full (leave-one-out) crossvalidation. These five latent variables capture 69.9% of the variance in the 218 descriptor variables, thus demonstrating that the information contained in the descriptors is effectively used in the calibration model. The explained calibration variance (Rcal2) for the dependent variable (LC50 values for P. promelas) was 90.9%, and the SEC was 0.343 log mmol L-1. The explained prediction variance for the cross-validation objects (QLOO2) was 87.6%, and the SEP was 0.404 log mmol L-1. The PLSR model also defines a valid domain for the descriptor variables. Five objects in the external test set
Figure 3. Predicted vs measured LC50 values (log mmol L-1) for test set objects.
were substantially different from the calibration objects and fell outside of the 5% confidence bound for the residuals and the leverage limit of 0.0905. These objects were thus excluded from further use (Figure 2), and the model was subsequently validated with the remaining 98 objects from the external test set. The explained variance (Qext2) for the dependent variable was 88.8%, and the SEP was 0.424 log mmol L-1. Figure 3 shows predicted vs measured results for the test set objects. If all objects in the external test set had been retained, the explained variance and SEP would still remain almost the same (88.5% and 0.427 log mmol L-1, respectively). Main Factors of Variation. Most of the variance, both in the molecular descriptors and the LC50 values for P. promelas, is described by the first two latent variables (Figure 4). A large proportion of the variation in acute toxicity can be attributed to membrane-water partitioning, and the first latent variable describes mainly variation in size and shape. Figure 5 shows the correlation between the first latent variable and the McGowan’s molar volume, calculated by summing the characteristic atomic volumes and correcting for the number of bonds (52). The second latent variable can similarly be attributed to polar interactions, with highly significant contributions from both the number of acceptor atoms for hydrogen bonds (N, O, and F) and the mean atomic polarizability.
A QSAR for Baseline Toxicity
Chem. Res. Toxicol., Vol. 17, No. 12, 2004 1633 Table 2. Summary Statistics for Comparison of PLSR and ECOSAR Predictions with 96 h LC50 Values for P. promelas Reported in the Fathead Minnow Databasea Qext2 SEP (log mmol L-1) mean absolute residual (log mg L-1)
PLSR
ECOSAR
0.858 0.446 0.343
0.819 0.505 0.378
a Fifty-three narcotic compounds that are not used in either calibration set.
Table 3. Baseline Toxicity (96 h LC50 Values) for the Fathead Minnow (P. promelas) Predicted for 15 597 Organic Compounds from the PhysProp Databasea prediction results
Figure 4. Explained variance from cross-validation (%), for molecular descriptors and acute toxicity (log LC50), vs the number of latent variables in the PLSR model.
data for model building
LC50 (mg L-1)
no.
%
no.
%
e1 >1 to e10 >10 to e100 >100 to e1000 >1000
899 3171 6324 3864 1339
6 20 41 25 9
16 75 106 80 34
5 24 34 26 11
a Data from the Fathead Minnow Database used for model building are shown for comparison.
Figure 5. Molar volume (cm3 mol-1) vs first latent variable in the PLSR model. Table 1. ANOVA for a Linear Model Describing the Contribution of Three Main Factors to Baseline Toxicity sources of variation model error total
SS 155 102 257
factors molar volume (cm3 mol-1) 114 no. of N, O, and F atoms 3.31 mean atomic polarizability (Å3) 36.2
DF
MS
F
P
3 195 198
51.7 0.524 1.30
98.7