Article pubs.acs.org/crt

Liver Specificity of the Carcinogenicity of NOCs: A Chemical−Molecular Perspective

Jintao Yuan, Yuepu Pu,* and Lihong Yin

Key Laboratory of Environmental Medicine Engineering, Ministry of Education, School of Public Health, Southeast University, Nanjing, 210009, China

Supporting Information

ABSTRACT: This study aimed to determine the most significant molecular features associated with the liver specificity of the carcinogenicity of N-nitroso compounds (NOCs). Accordingly, quantitative structure−activity relationship (QSAR) analysis was performed to extract molecular information from NOCs using a topological substructural molecular descriptor (TOPS-MODE) approach. A linear discriminant analysis (LDA) model of a series of NOCs for rat liver was developed using TOPS-MODE descriptors to predict nonliver- and liver-carcinogenic NOCs. Two descriptors exclusively calculated from the molecular structures of the compounds were selected by a genetic algorithm. The descriptors were then weighted with bond distances as well as the Abraham solute descriptor partition between water and aqueous solvent systems to indicate the importance of their roles in liver specificity. The performances of the LDA model were rigorously validated by leave-one-out cross-validation and external validation, with the prediction accuracy reaching 88.3% and 80.0%, respectively. The contributions of the different molecular fragments to rat-liver specificity were computed. The results served as important information related to liver specificity and were analyzed from the chemical−molecular perspective. The resulting model can provide an efficient method to discriminate between as well as extrapolate nonliver- and liver-carcinogenic NOCs. The contribution of the entire nitrosamine molecule was determined as being responsible for the liver specificity of nitrosamine carcinogenicity. Although the QSAR showed limitations in complex hepatocarcinogenicity, the proposed method may considerably help elucidate the role of nitrosamines in liver specificity from the chemical−molecular perspective. The nature of these enzyme−substrate interactions is characterized. Insight into the chemical−structural and biological factors related to the liver-specific biological activity of NOCs is also provided.



INTRODUCTION

N-Nitroso compounds (NOCs) are toxic and carcinogenic in many animal species with a high degree of organ specificity. For example, most symmetrically substituted dialkylnitrosamines induce liver cancer. However, di-n-butylnitrosamine produces bladder tumors, and diamylnitrosamine selectively induces lung cancer. Most asymmetrical alkylmethylnitrosamines, particularly alkylmethylamyl-, cyclohexyl-, phenyl-, benzyl-, or phenylethylnitrosamines, selectively induce esophageal carcinomas.1 Quantitative structure−activity relationship (QSAR) modeling exploits existing chemical knowledge to investigate the interactions between chemicals and living organisms without further experiments.2,3 QSAR is used in pharmacology and computational toxicology, as well as in the study of the carcinogenic properties of NOCs, because NOCs are potent and widespread environmental carcinogens.4−7 Some relationships between the molecular structures of NOCs and their carcinogenic properties have been established by several research groups, including ours.4−7 Although the molecular structures of NOCs are simpler than those of many other classes of carcinogens, the carcinogenic effects of NOCs show remarkable organ specificity.

© 2012 American Chemical Society

A few previous studies have attempted to determine the basis of this organ specificity. Dong and Jeffrey1 summarized some determinants of organ-specific carcinogenesis, including distribution, metabolism, elimination routes, DNA modification, DNA repair, cell proliferation, DNA damage fixation and promotion, as well as target genes. However, Preussmann and Wiessler8 considered that the organotropic action of NOCs is predominantly governed by the chemical structure of the compound and the animal species used. Soderman9 reviewed the target-organ specificity of 811 selected chemical carcinogens and found that the target organ of some test chemicals in a given species is influenced by the route of administration and dose. These inconsistent results indicate that the basis of the organ specificity of NOCs requires further clarification. However, organ specificity involves all aspects of the interactions of these compounds with biological systems, from their absorption, distribution, and metabolism to their ultimate reaction with some biological components. Determining the factors responsible for the organ specificity of NOCs in all organs or tissues in a single

Received: June 27, 2012
Published: October 8, 2012

dx.doi.org/10.1021/tx3002912 | Chem. Res. Toxicol. 2012, 25, 2432−2442

Chemical Research in Toxicology

Article

Table 1. CAS, SMILES and Corresponding Classification of N-Nitroso Compounds Used in This QSAR Study

| no. | CAS | SMILES | classification (experiment) | classification (prob) |
|---|---|---|---|---|
| 1c | 760-56-5 | `ONN(C(O)N)CCC` | | |
| 2c | 10589-74-9 | `ONN(C(O)N)CCCCC` | | |
| 3c | 60391-92-6 | `ONN(C(O)N)CC(O)O` | | |
| 4c | 54749-90-5 | `ClCCN(NO)C(O)N[C@@H](CO)[C@@H](O)[C@@H](O)[C@@H](O)CO` | | |
| 5a | 56654-52-5 | `ONN(C(O)NCCCC)CCCC` | 1 | 1 |
| 6c | 145438-97-7 | `ONN(c1ncn(c1C(O)N(NO)C)C)C` | | |
| 7b | 96724-44-6 | `ONN(C(O)NCCO)CC` | 1 | 1 |
| 8a | 110559-84-7 | `ONN(C(O)NCC(O)C)CC` | 1 | 1 |
| 9c | 69112-98-7 | `FCCN(NO)C(O)N` | | |
| 10c | 18774-85-1 | `ONN(C(O)N)CCCCCC` | | |
| 11c | 13743-07-2 | `ONN(C(O)N)CCO` | | |
| 12c | 71752-70-0 | `ONN(C(O)N)CCCO` | | |
| 13a | 99-80-9 | `ONN(c1ccc(NO)cc1)C` | 1 | 1 |
| 14b | 16699-10-8 | `ONN(c1ccc(cc1)/CC\c1c2c(ncc1)cccc2)C` | −1 | −1 |
| 15a | 63412-06-6 | `ONN(C(O)c1ccccc1)C` | 1 | 1 |
| 16c | 63642-17-1 | `ONN(C(O)NCCC[C@@H](N)C(O)O)C` | | |
| 17a | 14026-03-0 | `ONN1[C@H](CCCC1)C` | 1 | 1 |
| 18a | 76014-81-8 | `ONN(CCC[C@H](O)c1cnccc1)C` | 1 | 1 |
| 19b | 38777-13-8 | `ONN(C(O)Oc1c(OC(C)C)cccc1)C` | 1 | 1 |
| 20c | 83335-32-4 | `FC(F)(F)CCCN(NO)CCCC(F)(F)F` | | |
| 21b | 89911-79-5 | `ONN(C[C@@H](O)CO)C[C@@H](O)C` | 1 | 1 |
| 22b | 89911-78-4 | `ONN(C[C@@H](O)CO)CCO` | −1 | 1d |
| 23a | 61034-40-0 | `ONN1C[C@H](N([C@@H](C1)C)C(O)c1ccccc1)C` | 1 | 1 |
| 24c | 96806-34-7 | `ClCCNC(O)N(NO)CCO` | | |
| 25c | 96806-35-8 | `ClCCNC(O)N(NO)C[C@@H](O)C` | | |
| 26a | 75896-33-2 | `ONN(C[C@@H](O)C)CCO` | −1 | −1 |
| 27c | 760-60-1 | `ONN(C(O)N)CC(C)C` | | |
| 28b | 55090-44-3 | `ONN(CCCCCCCCCCCC)C` | −1 | −1 |
| 29c | 937-25-7 | `Fc1ccc(cc1)N(NO)C` | | |
| 30a | 13256-11-6 | `ONN(CCc1ccccc1)C` | 1 | 1 |
| 31b | 75881-20-8 | `ONN(CCCCCCCCCCCCCC)C` | 1 | −1d |
| 32a | 75881-22-0 | `ONN(CCCCCCCCCC)C` | −1 | −1 |
| 33c | 684-93-5 | `OC(N(NO)C)N` | | |
| 34b | 15973-99-6 | `ONN1CN(NO)CCC1` | 1 | 1 |
| 35c | 82018-90-4 | `FC(F)(F)CN(NO)CC` | | |
| 36a | 91308-71-3 | `ONN(CC(O)C)CCC` | −1 | 1d |
| 37b | 91308-69-9 | `ONN(CCC)CCO` | −1 | −1 |
| 38c | 51542-33-7 | `s1c(nc2c1cccc2)NC(O)N(NO)C` | | |
| 39a | 53609-64-6 | `ONN(C[C@@H](O)C)C[C@@H](O)C` | 1 | 1 |
| 40a | 1116-54-7 | `OCCN(NO)CCO` | −1 | −1 |
| 41a | 55-18-5 | `ONN(CC)CC` | −1 | −1 |
| 42a | 62-75-9 | `ONN(C)C` | −1 | −1 |
| 43b | 10595-95-6 | `ONN(CC)C` | −1 | −1 |
| 44a | 42579-28-2 | `ONN1C(O)NC(O)C1` | 1 | 1 |
| 45a | 26921-68-6 | `ONN(CCO)C` | −1 | −1 |
| 46a | 70415-59-7 | `ONN(CCCO)C` | −1 | −1 |
| 47a | 75411-83-5 | `ONN(C[C@@H](O)C)C` | 1 | −1d |
| 48a | 16219-98-0 | `ONN(c1ncccc1)C` | 1 | 1 |
| 49a | 614-00-6 | `c1cccc(c1)N(NO)C` | 1 | 1 |
| 50a | 68107-26-6 | `ONN(CCCCCCCCCCC)C` | −1 | −1 |
| 51a | 53759-22-1 | `ONN1[C@@H](c2cnccc2)CCC1` | 1 | 1 |
| 52b | 5632-47-3 | `ONN1CCNCC1` | 1 | −1d |
| 53c | 81795-07-5 | `S1[C@H](S[C@H](N([C@@H]1C)NO)C)C` | | |
| 54c | 26541-51-5 | `S1CCN(CC1)NO` | | |
| 55c | 3851-16-9 | `ONN(C(O)c1c(C(O)N(NO)C)cccc1)C` | | |
| 56c | 101-25-7 | `ONN1CN2CN(C1)CN(NO)C2` | | |
| 57c | 625-89-8 | `FC(F)(F)CN(NO)CC(F)(F)F` | | |
| 58c | 73785-40-7 | `S(Cc1nc[nH]c1C)CCN/C(N/C#N)/N(NO)C` | | |
| 59a | 55557-02-3 | `ONN1CC(CCC1)C(O)OC` | 1 | 1 |


Table 1. continued

| no. | CAS | SMILES | classification (experiment) | classification (prob) |
|---|---|---|---|---|
| 60a | 25081-31-6 | `ONN(CC(O)O)CC(O)O` | 1 | 1 |
| 61a | 7519-36-0 | `ONN1[C@@H](C(O)O)CCC1` | 1 | 1 |
| 62c | 110559-85-8 | `ClCCNC(O)N(NO)CC(O)C` | | |
| 63c | 89837-93-4 | `ONN(C(O)N)CC(O)C` | | |
| 64c | 869-01-2 | `OC(N(NO)CCCC)N` | | |
| 65a | 16338-97-9 | `ONN(CCC)CCC` | 1 | −1d |
| 66c | 759-73-9 | `OC(N(NO)CC)N` | | |
| 67a | 96724-45-7 | `ONN(C(O)NCC)CCO` | 1 | 1 |
| 68a | 36702-44-0 | `ONN1[C@H](CCCC1)C` | 1 | 1 |
| 69b | 64091-91-4 | `ONN(CCCC(O)c1cnccc1)C` | 1 | 1 |
| 70b | 145438-96-6 | `ONN(c1ncn(c1C(O)NC)C)C` | 1 | 1 |
| 71b | 38347-74-9 | `ONN1C(O)OCC1` | −1 | 1d |
| 72a | 55556-92-8 | `ONN1CCCCC1` | −1 | −1 |
| 73a | 29929-77-9 | `ONN1c2c(C(CC1(C)C)C)cccc2` | 1 | 1 |
| 74a | 75881-18-4 | `ONN1C[C@H](N([C@@H](C1)C)C)C` | 1 | 1 |
| 75a | 88208-16-6 | `ONN(CCC)C[C@@H](O)CO` | 1 | 1 |
| 76b | 91308-70-2 | `ONN(CCC)C[C@@H](O)C` | −1 | −1 |
| 77a | 1133-64-8 | `ONN1[C@H](CCCC1)c1cccnc1` | 1 | 1 |
| 78a | 924-16-3 | `ONN(CCCC)CCCC` | −1 | −1 |
| 79a | 86-30-6 | `ONN(c1ccccc1)c1ccccc1` | 1 | 1 |
| 80a | 621-64-7 | `ONN(CCC)CCC` | −1 | −1 |
| 81a | 40580-89-0 | `ONN1CCCCCCCCCCCC1` | −1 | 1d |
| 82a | 17608-59-2 | `ONN([C@@H]([C@H](O)c1ccccc1)C)C` | −1 | 1d |
| 83a | 614-95-9 | `O(CC)C(O)N(NO)CC` | 1 | 1 |
| 84b | 20917-49-1 | `ONN1CCCCCCC1` | −1 | −1 |
| 85a | 61445-55-4 | `ONN(CCCC(O)O)C` | 1 | −1d |
| 86a | 86451-37-8 | `ONN(C[C@@H](O)CO)C` | −1 | −1 |
| 87a | 55984-51-5 | `ONN(CC(O)C)C` | −1 | −1 |
| 88a | 59-89-2 | `O1CCN(NO)CC1` | −1 | −1 |
| 89a | 100-75-4 | `ONN1CCCCC1` | −1 | −1 |
| 90a | 930-55-2 | `ONN1CCCC1` | −1 | −1 |
| 91c | 816-57-9 | `ONN(C(O)N)CCC` | | |
| 92a | 62641-67-2 | `ONN1C(O)NC(O)[C@@H](C1)C` | 1 | 1 |
| 93a | 4515-18-8 | `ONN1[C@@H](C(O)O)CCCC1` | 1 | 1 |
| 94a | 13256-06-9 | `CCCCCN(NO)CCCCC` | −1 | −1 |
| 95a | 17721-95-8 | `C1CCC(N(C1C)NO)C` | 1 | 1 |
| 96b | 6130-93-4 | `N1(C(CCCC1(C)C)(C)C)NO` | 1 | 1 |
| 97a | 46061-25-0 | `N1(CCC(CC1)C(C)(C)C)NO` | 1 | 1 |
| 98b | 6238-69-3 | `ONN1CCC(CC1)C(O)O` | 1 | 1 |
| 99a | 55556-86-0 | `N1(C(CCC1C)C)NO` | 1 | 1 |
| 100b | | `C1(CC(N(C1)NO)C(O)O)O` | 1 | 1 |
| 101a | 34993-08-3 | `ONN1C(C(OCC1)c1ccccc1)C` | 1 | 1 |
| 102a | 63441-59-8 | `ONN1C(C(N(C(C1C)C)NO)C)C` | 1 | 1 |
| 103a | 16339-07-4 | `ONN1CCN(CC1)C` | 1 | 1 |
| 104a | 6335-97-3 | `ONN(CCCCCCCC)CCCCCCCC` | 1 | −1d |
| 105a | 5336-53-8 | `ONN(Cc1ccccc1)Cc1ccccc1` | 1 | 1 |
| 106a | 16339-01-8 | `ONN(c1ccc(cc1)/NN/c1ccccc1)C` | 1 | 1 |
| 107a | 3398-69-4 | `ONN(C(C)(C)C)CC` | 1 | 1 |
| 108c | 16339-18-7 | `N(CC#N)(CC#N)NO` | | |
| 109c | 13980-04-6 | `ONN1CN(NO)CN(C1)NO` | | |
| 110a | | `N1(NO)c2c(CC1)cccc2` | 1 | 1 |
| 111b | 947-92-2 | `ONN(C1CCCCC1)C1CCCCC1` | 1 | 1 |

a Training set. b Test set. c Outlier. d Misclassified.

study is highly challenging. However, the liver is the most important metabolic organ. Edelman et al.10 reported a mathematical model that describes the selectivity between the liver and other target organs for a series of carcinogenic N-nitrosodialkylamines. However, this model was developed from a relatively small data set of N-nitrosodialkylamines. Therefore, using QSAR to extract molecular information from a relatively large data set of NOCs and recover general molecular features associated with the liver specificity of the carcinogenicity of most NOCs is highly significant.


The first step in QSAR modeling involves the representation of molecular structures by numbers termed molecular descriptors. These descriptors are usually computed using quantum-chemical, graph-theoretic, information-theoretic, or geometric approaches.11 The topological substructural molecular design (TOPS-MODE) approach12 has proven useful for the mathematical description of molecular designs in our previous study6 and those of others.13−15 However, the success of QSAR computational models depends on the molecular descriptors selected to characterize the chemical structure and on the use of appropriate statistical methods. The genetic algorithm (GA), which is an intelligent exploitation of a random search within a defined search space to solve an optimization problem, shows very good performance.16−18 Hence, we used linear discriminant analysis (LDA) combined with GA to investigate the liver specificity of NOCs in this study. We performed a QSAR analysis of a data set of NOCs (bioassayed in rats) selected and refined from the Carcinogenic Potency Database (CPDB), Luan et al.,5 and Yuan et al.6 The TOPS-MODE approach was used to generate a discriminant function using LDA coupled with GA to classify the compounds as nonliver- or liver-carcinogenic NOCs. The model was assessed using various statistical parameters and an external validation set. Substructural contributions to liver specificity were finally analyzed. The results demonstrated that this approach was suitable for classifying compounds as nonliver- or liver-carcinogenic NOCs. Information on the molecular substructure was also provided to investigate the mechanisms of NOC-related liver specificity from the chemical−molecular perspective.



MATERIALS AND METHODS

Data Set. The target organ data set of 111 NOCs was carefully selected from the reports of Yuan et al.,6 Luan et al.,5 and the target organ table in the CPDB established in the CRC Handbook of Carcinogenic Potency and Genotoxicity Databases and at http://potency.berkeley.edu/index.html. The data set from rats was considered because the tests in these animals appear to be more reproducible than those in mice.19 The data set of 111 NOCs is shown in Table 1, in which the chemicals listed induced or did not induce tumors in certain target organs. Each chemical was categorized as a hepatocarcinogenic NOC if it can induce liver cancer in rats according to the Summary of Carcinogenic Potency Database by Target Organ (http://potency.berkeley.edu/pathology.table.html). Otherwise, the chemical was categorized as a nonhepatocarcinogenic NOC. To develop the classification function, the values of 1 and −1 were assigned to nonliver- and liver-carcinogenic NOCs, respectively. Thus, 32 NOCs were classified as group −1, and the rest were classified as group 1. Group 1 included the carcinogenic chemicals that can induce tumors at more than 20 target sites (including the bone, esophagus, hematopoietic system, lung, ear/Zymbal’s gland, kidney/ureter, large intestine, mesovarium, mammary gland, nasal cavity, nervous system, oral cavity, pancreas, peritoneal cavity, prostate, skin, small intestine, stomach, testes, thyroid gland, urinary bladder/urethra, uterus, vagina, and vascular system) and noncarcinogenic chemicals.

ROBPCA Screening. Outliers may be present in the entire data set including nitrosomonoalkylureas, nitrosodialkylureas, N-nitrosamines, and cyclic N-nitrosamines, and even those without different substituents. To build a QSAR model that is not considerably influenced by outliers, a preliminary analysis of the quality of the data set was performed using the ROBPCA method.20 The ROBPCA algorithm combines robust covariance estimation with projection pursuit techniques and is suited for multivariate calibration as well as classification.21 The algorithm can be summarized in three major steps, and the detailed procedure is found in the literature.20,21 First, the data were analyzed using a singular value decomposition to reduce the data space to the affine subspace spanned by the observations. Second, a measure of outlyingness was calculated for every data point by projecting the high-dimensional data points on many univariate directions. Finally, the data points were projected onto the k-dimensional subspace (k is selected as the number of principal components to keep) spanned by the k largest eigenvectors, with their center and shape computed using the reweighted minimum covariance determinant estimator. The threshold values of the score (SD) and orthogonal (OD) distances from ROBPCA were used to identify the outliers. For more information on SD, OD, and threshold values, see ref 20. After removing the outliers from the data set, we used the remaining compounds in the subsequent modeling.

Set Generation. Among random selection, D-optimal design, the Kennard and Stone algorithm, and the DUPLEX algorithm, the DUPLEX algorithm has been reported to be the best technique to select representative training and test data sets.22 Therefore, this algorithm was used in this article to divide the remaining compounds mentioned above into training (60) and test (20) sets. The distribution of the training and test set compounds is shown in Table 1. Two model validation methods were used in this study, i.e., the leave-one-out cross-validation and the hold-out method, in which the whole training set was used to build the final classification model, and the independent test set was used to test the predictive ability of the final classification model.

QSAR Modeling. To determine the relationship between the chemical structure and liver specificity of NOCs, graph-based molecular descriptors were computed according to the TOPS-MODE theoretical approach (http://www.modeslab.com). Briefly, this approach codifies the molecular structure using the bond adjacency matrix E and computes spectral moments of the bond matrix. Spectral moments are defined as the trace (i.e., the sum of the main diagonal entries) of the different powers of the bond adjacency matrix E, which is a square symmetric matrix whose nondiagonal entries are one or zero, depending on whether the corresponding bonds share one atom. The diagonal elements of the adjacency matrix E are weighted, with the weights being chemically meaningful numbers such as bond distances, bond dipoles, bond polarizabilities, or even mathematical expressions involving atomic weights.13,14,23,24 In this work, the bond weights included the atomic weight (AW), bond dipole moments (Dip and Dip2), bond distance (Dis), polar surface area (Pols), polarizability (Pol), hydrophobicity (H), molar refractivity (MR), Gasteiger−Marsili atomic charges (Gas), van der Waals atomic radii (vdW), and Abraham molecular descriptors (Ab). The latter included the excess molar refraction (Ab-R₂), the combined dipolarity/polarizability (Ab-π₂ᴴ), the solute gas-hexadecane partition coefficient (Ab-logL¹⁶), and the total solute hydrogen bond basicity. For the partition of a solute between water and nonaqueous solvent systems, Ab-∑β₂ᴴ was used, whereas for the partition between water and aqueous solvent systems, an alternative Ab-∑β₂⁰ was used.25 The atomic properties were converted into bond weight contributions w(i, j) according to the following equation, as described by Estrada et al.26

$$w(i, j) = \frac{w_i}{\delta_i} + \frac{w_j}{\delta_j} \qquad (1)$$
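As an illustration of the descriptor construction described above, the following Python sketch builds a weighted bond adjacency matrix for a hydrogen-suppressed molecular graph and computes its spectral moments as traces of matrix powers. It is not the MODESLAB implementation: the function names, the use of atomic masses as the atomic property w, and the hand-coded graph of N-nitrosodimethylamine are assumptions made for illustration only. Bond weights follow eq 1, with w_i the atomic property and δ_i the vertex degree of atom i.

```python
# Illustrative sketch of TOPS-MODE-style spectral moments (not MODESLAB code).

def bond_weights(atom_props, bonds):
    """Eq 1: w(i, j) = w_i / delta_i + w_j / delta_j for every bond (i, j)."""
    degree = [0] * len(atom_props)
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    return [atom_props[i] / degree[i] + atom_props[j] / degree[j]
            for i, j in bonds]


def bond_matrix(bonds, diagonal):
    """Bond adjacency matrix E: off-diagonal 1 if two bonds share an atom,
    0 otherwise; the diagonal carries the bond weights."""
    m = len(bonds)
    e = [[0.0] * m for _ in range(m)]
    for p in range(m):
        e[p][p] = diagonal[p]
        for q in range(p + 1, m):
            if set(bonds[p]) & set(bonds[q]):
                e[p][q] = e[q][p] = 1.0
    return e


def matmul(a, b):
    """Plain-Python product of two square matrices."""
    n = len(a)
    return [[sum(a[r][t] * b[t][c] for t in range(n)) for c in range(n)]
            for r in range(n)]


def spectral_moments(e, k_max):
    """Return [mu_0, ..., mu_k_max] with mu_k = trace(E^k)."""
    n = len(e)
    power = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    moments = []
    for _ in range(k_max + 1):
        moments.append(sum(power[r][r] for r in range(n)))
        power = matmul(power, e)
    return moments


# Hydrogen-suppressed graph of N-nitrosodimethylamine (atoms: O, N, N, C, C).
NDMA_BONDS = [(0, 1), (1, 2), (2, 3), (2, 4)]
ATOMIC_MASS = [15.999, 14.007, 14.007, 12.011, 12.011]  # assumed property w

unweighted = spectral_moments(bond_matrix(NDMA_BONDS, [0.0] * 4), 3)
weights = bond_weights(ATOMIC_MASS, NDMA_BONDS)
weighted = spectral_moments(bond_matrix(NDMA_BONDS, weights), 3)
print(unweighted)  # [4.0, 0.0, 8.0, 6.0]
```

With a zero diagonal, the zeroth moment is the number of bonds and the higher moments count self-returning walks in the bond graph; nonzero bond weights fold the chosen atomic property into each moment.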

In eq 1, w and δ represent the atomic weight and the vertex degree of atoms i and j, respectively. The TOPS-MODE descriptors were calculated using MODESLAB 1.5 software.27 The chemical structures were encoded into the MODESLAB software as simplified molecular input line entry specification (SMILES) strings.28 The first 15 spectral moments (μ1−μ15) were calculated for each bond weight, together with the number of bonds in the molecule (μ0), with hydrogen atoms excluded. Considering the nonlinearity of the biological process (carcinogenic activity) under study, the cross-terms of μ0 and μ1 with all variables were also evaluated.

The LDA approach was used to establish a discriminant model. The coefficients and statistical parameters of the discriminant function were obtained using the LDA implemented in SPSS. The default parameters of this program were used in the development of the model. The procedure for establishing an LDA model is described elsewhere.29,30

Variable Selection. The use of LDA in the classification of TOPS-MODE descriptor data usually requires appropriate variable selection


procedures.31 GA, which is derived from Darwin’s theory of natural selection and controlled by biological evolution rules, is a highly efficient and extensively applied optimization algorithm.16−18 Thus, this study used GA to perform variable selection for classification using LDA. A GA formulation using binary chromosomes was adopted. Each of the TOPS-MODE descriptor variables was associated with a position in the chromosome. The chromosome values can be either 1 (variable is included in the model) or 0 (variable is excluded from the model). To apply the biological evolution rules, a fitness value as well as crossover and mutation operators were assigned. The fitness was calculated as the LDA accuracy. Crossover and mutation operators were employed with 100% and 4% probabilities, respectively. The GA was conducted for 500 generations. ROBPCA, DUPLEX, and GA variable selection were implemented in MATLAB.

Model Evaluation. Various diagnostic statistical tools were used to evaluate our model equations in terms of their goodness-of-fit and goodness-of-prediction. Measures of goodness-of-fit were estimated using standard statistics such as Wilks’ lambda (λ), Mahalanobis distance (D²), Fisher’s ratio (F), and the corresponding p level (p). Wilks’ λ statistic implies perfect discrimination for λ = 0 and the absence of discrimination when λ = 1. The Mahalanobis distance indicates the separation of the respective groups, i.e., whether the model possesses adequate discriminatory power to differentiate between the two respective groups. The goodness-of-prediction of the discriminant models was assessed by external validation. The validation of the models with compounds not used in the model setup was a crucial step to ensure generalization and was also highly relevant for future QSAR studies.

Bond Contributions. One of the main advantages of the TOPS-MODE approach over other traditional QSAR methods stems from its substructural nature, which enables QSAR models to be transformed into a bond-additive scheme. Accordingly, the end point activity is described as the total of bond contributions related to the different structural fragments for the molecule in question.31 The characterization of such bonds enables the identification of the groups or fragments of a molecule that are responsible for its biological activity. The fragments of a given molecule that positively or negatively contribute to the underlying activity can also be detected, and their effects can be interpreted in terms of physicochemical properties. Bond contributions are based on the local spectral moments, which are defined as the diagonal entries of the different powers of the weighted bond matrix

$$\mu_k^{T}(i) = b_{ii}(T)^{k} \qquad (2)$$

where $\mu_k^{T}(i)$ denotes the kth local moment of bond i, $b_{ii}(T)^{k}$ are the diagonal entries of the kth power of the weighted bond matrix, and T is the type of bond weight (H, Dip, Dip2, Pol, and so on). For a given molecule, the values of the local spectral moments computed by eq 2 can be substituted into eq 3, and the total contribution to the biological activity of its different bonds can thus be calculated as

$$p = a_0 + \sum_k a_k \mu_k^{T} \qquad (3)$$

These contributions represent the additive features of the property modeled, and they can be expressed as fragment contributions, the total contributions of the different bonds that are inside the substructure whose contribution is under examination.

Applicability Domain (AD) of the Models. The AD is an important concept in QSAR that allows the estimation of the uncertainty in the prediction of a particular molecule based on its similarity to the compounds used to build the model. Several methods for assessing the AD of QSAR/QSPR models have been reviewed by Netzeva et al.32 The leverage approach, the most widely used distance-based measure, was used in this article. The leverage value (hi) of a chemical in the original variable space is defined as follows:

$$h_i = x_i^{T}(X^{T}X)^{-1}x_i \quad (i = 1, \ldots, n) \qquad (4)$$

where xi is the descriptor vector of the considered compound, and X is the model matrix derived from the training set descriptor values. The standardized residuals (σ) of every compound were plotted against the leverage values. In this so-called Williams plot, the AD was established inside a squared area within ±3 standard deviations and a “warning leverage” h* (h* = 3(p+1)/n, with p as the number of model parameters and n as the number of training compounds) as stated by Gramatica.33

RESULTS

Data Preparation for QSAR. The collected data set of 111 NOCs included N-nitrosamines, nitrosodialkylureas, nitrosomonoalkylureas, and cyclic N-nitrosamines, among others. The study of the liver-specific carcinogenicity of NOCs is more complex than that of carcinogenicity. Therefore, eliminating the influence of outliers is important. To avoid such influence as much as possible, the data set was analyzed using ROBPCA. The optimal number of principal components (PCs) was then selected where the kink in the curve appears and by the cumulative percentage of explained variance. Thus, 3 was the optimal number of PCs in Figure 1. The three-PC ROBPCA model explained 99.9% of the variance in the mean-centered data. Applying a 97.5% confidence level, we determined the threshold values for the SD and OD of each sample to identify the anomalous NOCs. A sample was considered an outlier if its SD or OD exceeded the corresponding threshold value. Finally, 31 NOCs were considered as outliers (the SD and OD values of the 31 outliers are listed in the Supporting Information). Determining the factor that classified the 31 NOCs as outliers was challenging. However, 13 out of the 31 NOCs were nitrosomonoalkylureas, which are not very stable, having half-lives at pH 7 ranging from a few minutes to a few hours. Eleven out of the 31 NOCs contained heteroatoms such as fluorine, chlorine, and sulfur. Seven out of the 31 NOCs contained anomalous groups such as the cyano group, among others. To build a reliable and general model, these 31 chemicals were removed from the data set, and the remaining 80 NOCs (42 N-nitrosamines, 29 cyclic nitrosamines, 5 N-nitrosamides, and 4 nitrosodialkylureas) were divided into training (60) and test (20) sets by the DUPLEX algorithm and then used in the subsequent modeling.

Figure 1. Scree plot of the 111 NOCs data set with ROBPCA.

Discriminant Model. The first classification model derived from the training set, produced by combining the LDA and GA techniques along with two TOPS-MODE descriptors, is given below, together with the statistical parameters of the LDA.


$$\mathrm{class} = -6.470 + 7.806 \times 10^{-2}\,\mu_4^{\mathrm{Ab}\text{-}\sum\beta_2^{0}} - 1.052 \times 10^{-5}\,\mu_0\mu_7^{\mathrm{Dis}} \qquad (5)$$

N = 60, λ = 0.496, D² = 4.427, F(2,57) = 29.007, p < 10⁻⁸

In eq 5, the two variables were high-order spectral moments. These variables can be mathematically collinear, and overfitting can occur. The degree of collinearity of the selected variables of the model was diagnosed by analyzing the cross-correlation matrix, which showed that the two descriptor variables were highly correlated with each other. The Randić orthogonalization procedure was performed to overcome this problem. This method is described in detail elsewhere.34−36 The new orthogonalized variables have the nomenclature ${}^{m}\Omega\mu_k^{w}$, where Ω indicates orthogonality, m is the order of importance of the descriptor in explaining the property, μk is the kth spectral moment, and w is the bond weight used. After the Randić orthogonalization procedure, the mathematical collinearity of the two new orthogonalized variables was eliminated. Following the strategy outlined above, the discriminant model including two orthogonalized variables is given below, together with the statistical parameters. This model demonstrates a reasonable level of statistical significance.
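The idea behind the orthogonalization step can be sketched in a few lines of Python. In this minimal illustration, the first descriptor is kept as it is and the second is replaced by the residuals of its least-squares regression on the first, which removes their linear correlation. The descriptor values are hypothetical and the helper names are illustrative; the published procedure34−36 should be consulted for the full method.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def orthogonalize(first, second):
    """Residuals of regressing `second` on `first` (with intercept)."""
    mf, ms = mean(first), mean(second)
    slope = (sum((a - mf) * (b - ms) for a, b in zip(first, second))
             / sum((a - mf) ** 2 for a in first))
    intercept = ms - slope * mf
    return [b - (intercept + slope * a) for a, b in zip(first, second)]


# Hypothetical, strongly correlated descriptor values for six compounds.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

x2_orth = orthogonalize(x1, x2)
print(pearson(x1, x2))       # close to 1 before orthogonalization
print(pearson(x1, x2_orth))  # numerically zero afterwards
```

Because only the second variable is replaced, the information it shares with the first is assigned to the first, which is why the order of the variables matters in this kind of scheme.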

Figure 2. Receiver operating characteristic (ROC) curve for eq 6.

All of the validation procedures and statistical parameters confirmed the predictive power of eq 6, which was therefore considered to have good statistical power and be suitable for calculating bond contributions in relation to mechanisms. For this chemical set, the variables in the LDA model (eq 6) encoded specific structural information. The most influential descriptors were those weighted with the bond distance (Dis) as well as the Abraham solute descriptor for partition between water and aqueous solvent systems (Ab-∑β₂⁰). The simpler term $\mu_4^{\mathrm{Ab}\text{-}\sum\beta_2^{0}}$ was included in the QSAR model, i.e., the spectral moment of order 4, indicating that the structural fragments corresponding to self-returning walks of length 4 were important for the liver specificity of this set of NOCs. The descriptor $\mu_4^{\mathrm{Ab}\text{-}\sum\beta_2^{0}}$ was related to the partition coefficient in body fluid and had a positive coefficient. This result was consistent with an earlier observation indicating a rough correspondence between partition coefficients and organ distribution for a relatively smaller set of nitrosamines.37 The descriptor $\mu_0\mu_7^{\mathrm{Dis}}$ provided information on molecular size and showed a negative coefficient. This result was in accordance with a report suggesting that, similar to unsubstituted nitrosamines, the differences between structures in terms of carcinogenic activity can be profound in nitrosamines substituted with other atoms or groups in one or both alkyl chains. Such differences can range from activity abolition to changes in the target organ and increased carcinogenic activity.38 Nitroso compounds are known genotoxic agents. These chemicals must undergo transport and distribution in vivo, pass through liver cell membranes, and generate alkylating agents that alkylate DNA in target organs. The metabolic activation of nitrosamines and cyclic nitrosamines is mediated by the oxidation of the α-carbon to the N-nitroso group. However, N-nitrosoureas act directly because they do not need to be biotransformed to be carcinogenic. Interestingly, the two descriptors in eq 6 can interpret these two important steps. The descriptor ${}^{1}\Omega\mu_4^{\mathrm{Ab}\text{-}\sum\beta_2^{0}}$ is related to transport and distribution. The descriptor ${}^{2}\Omega\mu_0\mu_7^{\mathrm{Dis}}$, which contains information on the size and shape of compounds, influences their transport properties through a biological system as well as their steric hindrance in terms of interactions with enzymes.

AD. The AD of eq 6 was analyzed, and the results are shown in Figure 3. One compound in the training set (compound 48) and one compound in the test set (compound 19) were outside the AD of the model as a result of their leverage

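$$\mathrm{class} = -2.639 + 2.024 \times 10^{-2} \times {}^{1}\Omega\mu_4^{\mathrm{Ab}\text{-}\sum\beta_2^{0}} - 1.052 \times 10^{-5} \times {}^{2}\Omega\mu_0\mu_7^{\mathrm{Dis}} \qquad (6)$$

N = 60, λ = 0.496, D² = 4.427, F(2,57) = 29.007, p < 10⁻⁸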
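As a plausibility check, the statistics reported with eq 6 can be related to one another through the standard two-group discriminant identities. The sketch below assumes that D² is the squared Mahalanobis distance between the two group centroids (pooled covariance), that the training set contains 20 group −1 and 40 group 1 compounds, and that two descriptors enter the model. The function name is illustrative.

```python
def two_group_statistics(d2, n1, n2, p):
    """Relate D^2 to Hotelling's T^2, Wilks' lambda, and the F ratio
    for a two-group discriminant function with p variables."""
    n = n1 + n2
    t2 = n1 * n2 / n * d2                       # Hotelling's T^2
    wilks_lambda = 1.0 / (1.0 + t2 / (n - 2))
    f_ratio = (n - p - 1) / (p * (n - 2)) * t2  # distributed as F(p, n - p - 1)
    return wilks_lambda, f_ratio


wilks_lambda, f_ratio = two_group_statistics(4.427, n1=20, n2=40, p=2)
print(round(wilks_lambda, 3))  # 0.496
print(round(f_ratio, 1))       # 29.0
```

Under these assumptions, both values agree with the reported λ = 0.496 and F(2,57) = 29.007 to within the rounding of D².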

The large F index and small p value indicate the statistical significance of the model. The values of the Wilks statistic (λ) and the Mahalanobis distance (D²) show that the model displays sufficient discriminatory power for differentiating between the groups. After applying eq 6 to the training set (60 NOCs), 17 out of 20 compounds were classified correctly as group −1 (85.0% accuracy), and 36 out of 40 were correctly classified as group 1 (90.0% accuracy). The sensitivity (i.e., the ability to identify group −1 correctly) and specificity (i.e., the ability to identify group 1 correctly) of eq 6 were 85.0% and 90.0%, respectively. In the test set of 20 NOCs, 6 out of 8 were classified correctly as group −1 NOCs (75.0%) and 10 out of 12 as group 1 NOCs (83.3%). The leave-one-out cross-validation of the training set showed that 17 out of 20 group −1 NOCs (85.0%) and 36 out of 40 group 1 NOCs (90.0%) were correctly classified. Overall, the value of correct classification with cross-validation was 88.3%. The sensitivity and specificity of the model for the test set were 75.0% and 83.3%, respectively. The global classification for the whole data set was 86.3%. For easy visualization of the performance of eq 6, the results are expressed as a receiver operating characteristic (ROC) graph (Figure 2). An ROC graph reports the true positive rate (sensitivity) on the y-axis and the false positive rate (1 − specificity) on the x-axis. In the ROC graph, the present QSAR model (eq 6) had good specificity and accuracy. A better threshold (−0.740) for a priori classification probability can be estimated using an ROC curve. The ROC curve depicted in Figure 2 has an area under the curve (0.926) markedly higher than 0.5, the area under the curve expected for a random classifier (diagonal line).
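The percentages quoted above follow directly from the classification counts. The short Python sketch below reproduces them; the function and variable names are illustrative, and sensitivity is taken to refer to group −1 and specificity to group 1, as in the text.

```python
def summarize(correct_liver, total_liver, correct_nonliver, total_nonliver):
    """Sensitivity refers to group -1 (liver) and specificity to group 1
    (nonliver), following the convention used in the text."""
    return {
        "sensitivity": correct_liver / total_liver,
        "specificity": correct_nonliver / total_nonliver,
        "accuracy": (correct_liver + correct_nonliver)
                    / (total_liver + total_nonliver),
    }


training = summarize(17, 20, 36, 40)
test = summarize(6, 8, 10, 12)
overall = summarize(17 + 6, 20 + 8, 36 + 10, 40 + 12)

for name, result in (("training", training), ("test", test),
                     ("overall", overall)):
    print(name, {key: f"{100 * value:.2f}%" for key, value in result.items()})
```

The training-set accuracy is 53/60 (88.3%), the test-set accuracy is 16/20 (80.0%), and the pooled accuracy is 69/80 (86.25%), in line with the reported 86.3%.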


biotransformation of N-nitrosamine, N-nitrosurea, and cyclic N-nitrosamine.38,41−43 These toxicophores are indicated by a dotted line in the figures. To compare the molecular structures and interpret the rationality of eq 6, several molecular representations and substructural (fragment) contributions of hepatocarcinogenic and nonhepatocarcinogenic N-nitrosamines (Figures 4 and 5, respectively) and hepatocarcinogenic and nonhepatocarcinogenic cyclic ones (Figures 6 and 7, respectively) are shown. The contributions of the alkyl chain to the liver specificity of the hepatocarcinogenic dialkylnitrosamines 42, 41, 80, 78, and 94 (Figure 4) increased with increasing chain length, and the contribution of the nitrosamino also increased. Consequently, the total contributions of the five dialkylnitrosamines show an increasing trend, leading to the inference that nitrosodi-noctylamine was nonhepatocarcinogenic, consistent with the study of Lijinsky and Taylor.44 A similar increasing trend was observed for the cyclic nitrosamines 90, 89, and 84 (Figure 6). Thus, the three-carbon nitrosoazetidine (nitrosoazetidine was not included in the training and test sets), the simplest cyclic nitrosamine, can induce liver cancer, whereas the larger cyclic nitrosamine nitrosododecamethyleneimine (nitrosododecamethyleneimine was not included in the training and test sets) was a relatively weaker inducer of liver cancer, consistent with the study by Lijinsky.38 This phenomenon indicated that eq 6 can be used to extrapolate nonliver- and liver-carcinogenic NOCs. Few studies have traced the tissue disposition of nonmetabolized N-nitrosamines. However, Shephard et al.45 reported that primary nitrosamines were sufficiently stable to penetrate the cell membrane, whereas Tjälve41 found that volatile nonpolar N-nitrosamines were evenly distributed throughout the body, freely passed through cellular membranes, and distributed in the intra- and extracellular tissue water. 
Studies on N-nitrosodiethanolamine (i.e., chemical 40), a polar N-nitrosamine, indicated that this compound can freely permeate the cellular membranes of all tissues in rats, except the blood−brain barrier.41 Most organ-specific effects of carcinogens are not caused by the preferential uptake of the unmetabolized carcinogens in specific organs,1 suggesting that the liver specificity examined in the current study is not related to the tissue disposition of nonmetabolized N-nitrosamines. More generally, this implies that the tissue disposition of nonmetabolized N-nitrosamines or of their metabolites, which can undergo further bioactivation, is insignificant in terms of the tissue specificity of the carcinogenicity of these compounds.41

Before the formation of a DNA-alkylating agent, nonmetabolized N-nitrosamines must undergo an α-hydroxylation reaction catalyzed by microsomal enzymes dependent on cytochrome P-450. Several of these enzymes may have overlapping substrate specificities. Cytochrome P-450 enzymes engaged in the metabolism of N-nitrosamines exist in hepatic and most extrahepatic tissues; variations in the forms and organ distribution of these enzymes indicate that the bioactivation of N-nitrosamines may proceed at different catalytic rates in different tissues. A cytochrome P-450 enzyme is also not necessarily specific for nitrosamines, and the relative fit of the molecule into some receptor (e.g., the active site of an enzyme) is an important factor that determines the activation of the nitrosamine to a proximate carcinogenic form or reactive intermediate. For the rat liver tissue considered in the current study, the forms, levels, and catalytic rates of the cytochrome P-450 enzymes present imply that only N-nitrosamines whose molecular structures fit these enzymes can be α-hydroxylated.

Figure 3. Williams plot based on eq 6, that is, a plot of the standardized residuals versus leverage values, with a warning leverage of 0.15.

values. However, the two compounds were not excluded as outliers but were instead treated as influential chemicals because the standardized residuals of both compounds were smaller than 3 standard deviation units (3σ).32,33
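The quantities behind a Williams plot can be sketched in a few lines. The sketch below is illustrative only and is not the authors' code. It assumes the common convention that the warning leverage is h* = 3(p + 1)/n for p descriptors and n training compounds, with an intercept column in the descriptor matrix. The function names and the toy descriptor values are hypothetical. Under this convention, two descriptors and a hypothetical training set of 60 compounds would give h* = 0.15.

```python
def solve(a, b):
    """Solve a.z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z


def leverages(descriptors):
    """Diagonal of the hat matrix X (X^T X)^-1 X^T, with an intercept column."""
    x = [[1.0] + list(row) for row in descriptors]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    return [sum(xi * zi for xi, zi in zip(row, solve(xtx, row))) for row in x]


def warning_leverage(n_compounds, n_descriptors):
    """Assumed convention: h* = 3 (p + 1) / n."""
    return 3 * (n_descriptors + 1) / n_compounds


def williams_flags(h, std_resid, h_star, resid_limit=3.0):
    """Flag high-leverage compounds and response outliers (|residual| > 3 sigma)."""
    return [
        {"influential": hi > h_star, "response_outlier": abs(ri) > resid_limit}
        for hi, ri in zip(h, std_resid)
    ]


# Hypothetical two-descriptor data; the last compound lies far from the others.
toy_descriptors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
toy_leverages = leverages(toy_descriptors)
```

A compound with leverage above h* but a standardized residual within ±3σ falls in the "influential but not an outlier" region of the plot, which is the case described for the two compounds above.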



DISCUSSION

Equation 6 provides a sophisticated means of physicochemically interpreting the influence of these complex TOPS-MODE descriptors on liver specificity by simple inspection of the LDA coefficients. However, one of the main advantages of the TOPS-MODE approach exploited in this study is its ability to identify the quantitative contribution of any kind of substructure to the studied property. Thus, the LDA model can be transformed into a bond-additive scheme, and a liver-specificity discrimination value can be calculated as the sum of the bond contributions for each molecule. These quantitative bond or substructure contributions can provide valuable insights into the mechanism of organ specificity and the influences of these complex TOPS-MODE descriptors. The discrimination threshold for classifying the compounds in the current study was −0.740; i.e., if the model yielded a total value less than −0.740, the compound was considered to be a liver-carcinogenic NOC. Otherwise, it was classified as a nonliver-carcinogenic NOC.

At physiological pH (7.4), N-nitrosoureas undergo decomposition without enzymes by amide bond breakage to generate unstable carbamic acid and carbonium ions.39,40 The latter are related to intracellular and intercellular alkylation, as well as to carbamoylation reactions of various biological macromolecules, including DNA. In contrast to nitrosoureas, N-nitrosamines require metabolic activation to exert their carcinogenic effects. The key activation step is the hydroxylation of the carbon α to the nitroso group, catalyzed by cytochrome P-450. The α-hydroxyl intermediate is unstable and can spontaneously decompose to the corresponding aldehyde and diazohydroxide fragments, which then yield diazonium ions or the alkylating carbocation. To understand the role of the various molecular structural factors responsible for liver specificity, several homologous chemicals that were well classified by eq 6 are discussed.
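The bond-additive scheme and the −0.740 threshold described in the Discussion can be sketched as follows. This is a minimal illustration, not the MODESLAB implementation: the function names, bond labels, and per-bond contribution values are hypothetical and were chosen only to show the arithmetic. Only the threshold and the direction of the rule (a total below −0.740 means liver-carcinogenic) come from the text.

```python
from typing import Iterable, Mapping

# Threshold reported in the text for eq 6.
LIVER_THRESHOLD = -0.740


def discrimination_value(bond_contributions: Mapping[str, float]) -> float:
    """Sum the per-bond contributions into a molecule-level value."""
    return sum(bond_contributions.values())


def fragment_contribution(
    bond_contributions: Mapping[str, float], bonds: Iterable[str]
) -> float:
    """Contribution of a substructure: the sum over the bonds it contains."""
    return sum(bond_contributions[b] for b in bonds)


def classify(
    bond_contributions: Mapping[str, float], threshold: float = LIVER_THRESHOLD
) -> str:
    """Totals below the threshold are assigned to the liver-carcinogenic class."""
    total = discrimination_value(bond_contributions)
    return "liver-carcinogenic" if total < threshold else "nonliver-carcinogenic"


# Hypothetical bond contributions (illustrative numbers, not from the paper).
example_a = {"N-N": -0.30, "N=O": -0.25, "N-C1": -0.15, "N-C2": -0.15}
example_b = {"N-N": -0.10, "N=O": -0.05, "N-C1": 0.20, "N-C2": 0.10}
```

In this sketch the same labeled bonds receive different values in `example_a` and `example_b`, so the class is decided by the molecule-level total rather than by any single fragment.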
Other chemicals that were not included in this study were similarly studied. The discussed chemicals were divided into toxicophores (a group, fragment, or a region of a molecule associated with liver specificity) according to the general mechanisms for the

dx.doi.org/10.1021/tx3002912 | Chem. Res. Toxicol. 2012, 25, 2432−2442


Figure 4. Molecular representations of hepatocarcinogenic N-nitrosamines and their bond contributions.

Figure 5. Molecular representations of nonhepatocarcinogenic N-nitrosamines and their bond contributions.

Comparing several molecular pairs in Figures 6 and 7 (pair 1, chemicals 90 and 99; pair 2, chemicals 89 and 93; pair 3, chemicals 89 and 95; pair 4, chemicals 89 and 98; pair 5, chemicals 89 and 88; and pair 6, chemicals 90 and 61), we found that when hydrogen atoms on the N-nitrosodialkylamines or cyclic nitrosamine rings were substituted with an electron-donating or electron-withdrawing group or atom, the distribution of the electron cloud of the molecule was altered. Consequently, the contributions of the molecular substructures changed (increased or decreased) compared with those of their unsubstituted parents. This phenomenon strongly implies interactions among groups within the same molecule. Comparing the substructure contributions of some molecular groups in Figures 4 and 5 (the methyl group in the group-1 hepatocarcinogenic chemicals 42, 43, 32, 50, 45, and 46, as well as the nonhepatocarcinogenic chemical 49; the 2-hydroxypropyl group in the group-2 hepatocarcinogenic chemicals 26 and 76, as well as the nonhepatocarcinogenic chemical 21; and the 2,3-dihydroxypropyl group in the group-3 hepatocarcinogenic chemical 86, as well as the nonhepatocarcinogenic chemicals 75 and 21) demonstrated that the same group (fragment or substructure) can make a positive or a negative contribution to liver cancer because it is affected by other groups within the same molecule. Because the same groups made either positive or negative contributions in hepatocarcinogenic and nonhepatocarcinogenic N-nitrosamines, one can infer that specific substructures responsible for liver specificity do not exist and that hepatocarcinogenicity is determined by the global effect (i.e., the total contribution of all fragments in the molecule). However, N-nitrosamines require metabolic activation by cytochrome P-450 enzymes to exert their carcinogenic effects, suggesting that the relative fit of a molecule and an enzyme plays a key role. In other words, tissue-specific enzymes with different substrate specificities determine tumor development in response to N-nitrosamines. This agrees with the fact that the target organs of the carcinogenic N-nitrosodialkylamines can be directly associated with the molecular structure and that selectivity is dependent on a relatively complex interplay of the parent molecule and the metabolites within the exposed organism.10

Cyclic nitrosamines also need to undergo metabolic α-hydroxylation, β-hydroxylation, or γ-hydroxylation (α-hydroxylation is a major activation pathway) with the help of cytochrome P-450 enzymes, leading to electrophilic diazohydroxide intermediates that may act as ultimate carcinogens. Different forms of cytochrome P-450 have different affinities for cyclic nitrosamines of different molecular sizes, so a given structure is activated by the particular enzyme for which it has an affinity. Thus, organ-specific enzymatic activation of a given cyclic nitrosamine determines which tissue develops the tumor. Different forms of cytochrome P-450 are known to catalyze particular hydroxylation reactions, and the distribution of these forms is species and organ dependent. Thus, a given nitrosamine substrate (including nitrosamines and cyclic nitrosamines) has a dominant feature that determines its reaction with some receptor in a particular type of cell in a particular organ.38 In other words, a given nitrosamine substrate chooses the specific enzyme in a particular organ. In this case, only nitrosamines with a molecular contribution of less than −0.740 can fit the corresponding enzyme in rat liver well enough to be metabolized to the ultimate carcinogen. Nitrosoureas and nitrosamides require no metabolic activation before their carcinogenic activities are elicited. For these compounds, factors such as organ-specific removal of miscoding bases or cell-kinetic and pharmacokinetic parameters may decide in which tissue a tumor develops. However, interpreting these biological factors exceeds the ability of QSAR because QSAR only exploits existing chemical knowledge to investigate the interactions between chemicals and living organisms.

In summary, a comprehensive analysis of the contributions of the toxicophores of the molecules can aid in understanding the liver specificity of the carcinogenicity of NOCs. The following two conclusions are drawn: (1) For nitrosamines and cyclic nitrosamines, the liver-specific distribution of an activating enzyme is an important factor affecting liver specificity. The entire molecule, not only a substructure, is involved in the interplay of molecules and corresponding activating enzymes. (2) For nitrosoureas and nitrosamides, some biological factors such as target-tissue metabolism, cellular replication, and rates of repair of specific carcinogen−DNA adducts may be important factors affecting liver specificity.

Figure 6. Molecular representations of hepatocarcinogenic cyclic N-nitrosamines and their bond contributions.

Figure 7. Molecular representations of nonhepatocarcinogenic cyclic N-nitrosamines and their bond contributions.




On the basis of these results, the liver-specific carcinogenicity of NOCs can be attributed to both chemical and biological factors. The TOPS-MODE approach can be used to provide quantitative bond data and investigate some details concerning the liver specificity of NOC carcinogenicity from the chemical−molecular viewpoint.

CONCLUSIONS

The approach used in this study not only discriminates between and extrapolates to nonliver- and liver-carcinogenic NOCs but also enables the visualization of the impact of various structural modifications on the contributions of molecular substructures. The LDA model is also found to be an effective tool for studying the main chemical factors that determine the liver specificity of the carcinogenic effects of NOCs. The LDA model indicates that the bond distance, as well as the Abraham solute descriptor partition between water and aqueous solvent systems, is a potentially important chemical factor that determines the liver specificity of NOCs. Several biological factors such as the capacity for enzymatic repair of the DNA lesions, target-tissue metabolism, and cellular replication also appear to be related to the liver specificity of NOCs. Thus, the organ specificity of NOCs can be attributed to a combination of chemical and biological factors. In this article, the TOPS-MODE approach is used to obtain information related to the chemical structure in order to interpret and understand the liver-specific carcinogenicity of NOCs, as well as to demonstrate its limitations.

ASSOCIATED CONTENT

Supporting Information

SD and OD values of 31 outliers. This material is available free of charge via the Internet at http://pubs.acs.org.



REFERENCES

(1) Dong, Z., and Jeffrey, A. M. (1990) Mechanisms of organ specificity in chemical carcinogenesis. Cancer Invest. 8, 523−533.
(2) Leong, M. K., Lin, S., Chen, H., and Tsai, F. (2010) Predicting mutagenicity of aromatic amines by various machine learning approaches. Toxicol. Sci. 116, 498−513.
(3) Li, Y., Liu, J., Pan, D., and Hopfinger, A. J. (2005) A study of the relationship between cornea permeability and eye irritation using membrane-interaction QSAR analysis. Toxicol. Sci. 88, 434−446.
(4) Dai, Q. Y., Zhong, R. M., and Gao, X. M. (1987) Structure activity relationships of N-nitroso compounds using pattern recognition based on di-region theory. Environ. Chem. 6, 1−11.
(5) Luan, F., Zhang, R., Zhao, C., Yao, X., Liu, M., Hu, Z., and Fan, B. (2005) Classification of the carcinogenicity of N-nitroso compounds based on support vector machines and linear discriminant analysis. Chem. Res. Toxicol. 18, 198−203.
(6) Yuan, J., Pu, Y., and Yin, L. (2011) Predicting carcinogenicity and understanding the carcinogenic mechanism of N-nitroso compounds using a TOPS-MODE approach. Chem. Res. Toxicol. 24, 2269−2279.
(7) Helguera, A. M., Cordeiro, M. N. D. S., Pérez, M. Á. C., Combes, R. D., and González, M. P. (2008) Quantitative structure carcinogenicity relationship for detecting structural alerts in nitroso-compounds species: rat; sex: male; route of administration: water. Toxicol. Appl. Pharmacol. 231, 197−207.
(8) Preussmann, R., and Wiessler, M. (1987) The enigma of the organ-specificity of carcinogenic nitrosamines. Trends Pharmacol. Sci. 8, 185−189.
(9) Soderman, J. V. (1982) CRC Handbook of Identified Carcinogens and Noncarcinogens: Carcinogenicity-Mutagenicity Database, Vol. 2, Target Organ File, CRC Press, Boca Raton, FL.
(10) Edelman, A. S., Kraft, P. L., Rand, W. M., and Wishnok, J. S. (1980) Nitrosamine carcinogenicity: a quantitative relationship between molecular structure and organ selectivity for a series of acyclic N-nitroso compounds. Chem.-Biol. Interact. 31, 81−92.
(11) Basak, S. C., Bertelsen, S., and Grunwald, G. (1994) Application of graph theoretical parameters in quantifying molecular similarity and structure-activity studies. J. Chem. Inf. Comput. Sci. 34, 270−276.
(12) Estrada, E. (2000) On the topological sub-structural molecular design (TOSS-MODE) in QSPR/QSAR and drug design research. SAR QSAR Environ. Res. 11, 55−73.
(13) Estrada, E., and Uriarte, E. (2001) Quantitative structure-toxicity relationships using TOPS-MODE. 1. nitrobenzene toxicity to tetrahymena pyriformis. SAR QSAR Environ. Res. 12, 309−324.
(14) Estrada, E., Molina, E., and Uriarte, E. (2001) Quantitative structure-toxicity relationships using TOPS-MODE. 2. neurotoxicity of a non-congeneric series of solvents. SAR QSAR Environ. Res. 12, 445−459.
(15) Estrada, E., Uriarte, E., Gutierrez, Y., and González, H. (2003) Quantitative structure-toxicity relationships using TOPS-MODE. 3. Structural factors influencing the permeability of commercial solvents through living human skin. SAR QSAR Environ. Res. 14, 145−163.
(16) Wang, G., Li, Y., Liu, X., and Wang, Y. (2009) Understanding the aquatic toxicity of pesticide: structure-activity relationship and molecular descriptors to distinguish the ratings of toxicity. QSAR Comb. Sci. 28, 1418−1431.
(17) Mazzatorta, P., Cronin, M. D., and Benfenati, E. (2006) A QSAR study of avian oral toxicity using support vector machines and genetic algorithms. QSAR Comb. Sci. 25, 616−628.
(18) Tapp, H. S., and Kemsley, E. K. (2008) Optimizing the efficiency of cross-validation in linear discriminant analysis through selective use of the Sherman−Morrison−Woodbury inversion formula. J. Chemom. 22, 419−421.
(19) Gottmann, E., Kramer, S., Pfahringer, B., and Helma, C. (2001) Data quality in predictive toxicology: Reproducibility of rodent carcinogenicity experiments. Environ. Health Perspect. 109, 509−514.
(20) Hubert, M., Rousseeuw, P. J., and Branden, K. V. (2005) ROBPCA: a new approach to robust principal component analysis. Technometrics 47, 64−79.






AUTHOR INFORMATION

Corresponding Author

*School of Public Health, Southeast University, 87 Dingjiaqiao, Nanjing, China, 210009. Tel: +86-25-83794996. Fax: +86-25-83783428. E-mail: [email protected].

Funding

This work was supported by the State Key Program for Basic Research of China (Grant No. 2011CB933404), National Natural Science Foundation of China (NSFC, No. 81273122), and the Innovation Project for Graduate Student of Jiangsu Province (CXZZ12_0123).

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

We are grateful to Professor E. Estrada (University of Santiago de Compostela) and Dr. Reinaldo Molina Ruiz (Universidad Central de las Villas) for their expert assistance with the software MODESLAB 1.5.



ABBREVIATIONS

TOPS-MODE, topological substructural molecular descriptors; NOCs, N-nitroso compounds; QSAR, quantitative structure−activity relationship; LDA, linear discriminant analysis; GA, genetic algorithm; CPDB, Carcinogenic Potency Data Base; MCD, minimum covariance determinant; SD, score distance; OD, orthogonal distance; SMILES, simplified molecular input line entry specification; AD, applicability domain


(21) Hubert, M., Rousseeuw, P., and Verdonck, T. (2009) Robust PCA for skewed data and its outlier map. Comput. Stat. Data Anal. 53, 2264−2274.
(22) Ren, Y., Liu, H., Yao, X., and Liu, M. (2007) Prediction of ozone tropospheric degradation rate constants by projection pursuit regression. Anal. Chim. Acta 589, 150−158.
(23) Estrada, E., and Uriarte, E. (2001) Recent advances on the role of topological indices in drug design discovery research. Curr. Med. Chem. 8, 1573−1588.
(24) Estrada, E. (1995) Edge adjacency relationships and a novel topological index related to molecular volume. J. Chem. Inf. Comput. Sci. 35, 31−33.
(25) Platts, J. A., Butina, D., Abraham, M. H., and Hersey, A. (1999) Estimation of molecular linear free energy relation descriptors using a group contribution approach. J. Chem. Inf. Model. 39, 835−845.
(26) Estrada, E., Patlewicz, G., and Gutierrez, Y. (2004) From knowledge generation to knowledge archive. a general strategy using TOPS-MODE with DEREK to formulate new alerts for skin sensitization. J. Chem. Inf. Comput. Sci. 44, 688−698.
(27) Gutiérrez, Y., and Estrada, E. (2002−2004) MODESLAB 1.5 (Molecular DEScriptors LABoratory) for Windows, 1.5, Universidad de Santiago de Compostela, España.
(28) Weininger, D. (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31−36.
(29) Cabrera, M. A., González, I., Fernández, C., Navarro, C., and Bermejo, M. (2006) A topological substructural approach for the prediction of P-glycoprotein substrates. J. Pharmacol. Sci. 95, 589−606.
(30) Pérez-Garrido, A., Helguera, A. M., López, G. C., Cordeiro, M. N. D. S., and Escudero, A. G. (2010) A topological substructural molecular design approach for predicting mutagenesis end-points of α,β-unsaturated carbonyl compounds. Toxicology 268, 64−77.
(31) González, M. P., Terán, C., Saíz-Urra, L., and Teijeira, M. (2008) Variable selection methods in QSAR: an overview. Curr. Top. Med. Chem. 8, 1606−1627.
(32) Netzeva, T. I., Worth, A. P., Aldenberg, T., Benigni, R., Cronin, M. T. D., Gramatica, P., Jaworska, J. S., Kahn, S., Klopman, P., Marchant, C. A., Myatt, G., Nikolova-Jeliazkova, N., Patlewicz, G. Y., Perkins, R., Roberts, D. W., Schultz, T. W., Stanton, D. T., van de Sandt, J. J. H., Tong, W., Veith, G., and Yang, C. (2005) Current status of methods for defining the applicability domain of (quantitative) structure−activity relationships. ATLA 33, 155−173.
(33) Gramatica, P. (2007) Principles of QSAR models validation: internal and external. QSAR Comb. Sci. 26, 694−701.
(34) Randić, M. (1991) Correlation of enthalpy of octanes with orthogonal connectivity indices. J. Mol. Struct. (Theochem) 233, 45−59.
(35) Randić, M. (1991) Orthogonal molecular descriptors. New J. Chem. 15, 517−525.
(36) Randić, M. (1991) Resolution of ambiguities in structure-property studies by use of orthogonal descriptors. J. Chem. Inf. Comput. Sci. 31, 311−320.
(37) Mirvish, S. S., Issenberg, P., and Sornson, H. C. (1976) Air-water and ether-water distribution of N-nitroso compounds: implications for laboratory safety, analytic methodology, and carcinogenicity for the rat esophagus, nose, and liver. J. Natl. Cancer Inst. 56, 1125−1129.
(38) Lijinsky, W. (1987) Structure-activity relations in carcinogenesis by N-nitroso compounds. Cancer Metast. Rev. 6, 301−356.
(39) Faustino, A., Garcia-Rio, L., Leis, J. R., and Norberto, F. (2004) Decomposition of NA-benzoyl-N-nitrosoureas in aqueous media. Eur. J. Org. Chem. 27, 154−161.
(40) Golding, B. T., Bleasdale, C., McGinnis, J., Muller, S., Rees, H. T., Rees, N. H., Farmer, P. B., and Watson, W. P. (1997) The mechanism of decomposition of N-methyl-N-nitrosourea (MNU) in water and a study of its reactions with 2′-deoxyguanosine, 2′-deoxyguanosine-5′-monophosphate and d(GTGCAC). Tetrahedron 53, 4063−4082.
(41) Tjälve, H. (1991) The tissue distribution and the tissue specificity of bioactivation of some tobacco-specific and some other N-nitrosamines. Toxicology 21, 265−294.
(42) Mirvish, S. S. (1995) Role of N-nitroso compounds (NOC) and N-nitrosation in etiology of gastric, esophageal, nasopharyngeal and bladder cancer and contribution to cancer of known exposures to NOC. Cancer Lett. 93, 17−48.
(43) Helguera, A. M., González, M. P., Cordeiro, M. N. D. S., and Pérez, M. Á. C. (2008) Quantitative structure-carcinogenicity relationship for detecting structural alerts in nitroso compounds: species, rat; sex, female; route of administration, gavage. Chem. Res. Toxicol. 21, 633−642.
(44) Lijinsky, W., and Taylor, H. W. (1978) Carcinogenicity tests in rats of two nitrosamines of high molecular weight, nitrosododecamethyleneimine and nitrosodi-n-octylamine. Ecotoxicol. Environ. Safe 2, 407−411.
(45) Shephard, S. E., Schlatter, C., and Lutz, W. K. (1987) Assessment of the risk of formation of carcinogenic N-nitroso compounds from dietary precursors in the stomach. Food Chem. Toxicol. 25, 91−108.
