
Article pubs.acs.org/crt

Predicting Carcinogenicity and Understanding the Carcinogenic Mechanism of N-Nitroso Compounds Using a TOPS-MODE Approach

Jintao Yuan, Yuepu Pu,* and Lihong Yin

Key Laboratory of Environmental Medicine Engineering, Ministry of Education, School of Public Health, Southeast University, Nanjing 210009, China

ABSTRACT: A linear discriminant analysis (LDA) coupled with an enhanced replacement method (ERM) was used as an alternative method to predict the carcinogenicity of N-nitroso compounds (NOCs) in rats. The LDA model, based on the topological substructural molecular descriptors (TOPS-MODE) approach, was developed to classify a data set of 111 NOCs as carcinogenic or noncarcinogenic and achieved a good classification value of 90.1%. The predictive power of the LDA model was validated through an external validation set (37 compounds), with a prediction accuracy of 94.6%, and a leave-one-out cross-validation procedure (LOOCV), with a good prediction of 86.5%. The model showed that TOPS-MODE descriptors weighted by the bond dipole moment and by the Abraham solute descriptor dipolarity/polarizability affected NOC carcinogenicity. The contributions of certain bonds and fragments to carcinogenicity were used to assess biotransformation and carcinogenic mechanisms. A positive contribution of the carbon−nitrogen single bond (between the N-nitroso group and the α-carbon to the N-nitroso group) indicated that the α-hydroxylation reaction could occur at the α-carbon, whereas a nonpositive contribution indicated that it could not. Similarly, the contributions from a molecular fragment could be used to indicate whether the fragment generated an alkylating agent. These results suggested that this approach could discriminate between carcinogenic and noncarcinogenic NOCs, thereby providing insight into the structural features and chemical factors related to NOC carcinogenicity.

■ INTRODUCTION

N-Nitroso compounds (NOCs) are a class of potent and widely distributed environmental carcinogens. Of the 300 NOCs evaluated, >90% have been demonstrated to be carcinogenic in a wide variety of animal species.1 They are also potentially important in the etiology of human cancer. Human exposure to NOCs occurs readily via food, cigarettes, drugs, car interiors, and cosmetics. To prevent cancers induced by NOCs, it is necessary to screen large numbers of NOCs. However, evaluating the health risk of NOCs to humans through conventional animal testing is costly and slow and subjects many animals to adverse welfare conditions, which is contrary to the “3 Rs policy” (“replace, reduce, and refine the use of animals in science”). To support the 3 Rs policy as well as policies set by the Regulation on Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH), quantitative structure−activity relationship (QSAR) modeling is becoming increasingly important as a potentially useful replacement alternative. This is because QSAR modeling exploits a huge wealth of existing chemical knowledge to increase our understanding of the interactions between chemicals and living organisms without the need for conventional experiments.

In recent years, several models for predicting the carcinogenicity of NOCs have been studied extensively. Wishnok et al. demonstrated, using QSAR, that the carcinogenicity of 51 nitrosamines was inversely proportional to the number of carbon atoms of the alkyl chains.2 Later, they reported a significant correlation between nitrosamine carcinogenicity and water−hexane partition coefficients, as well as the electronic inductive effects of substituents at the α-carbon.3 Subsequently, the same authors predicted the organ specificity of 19 compounds based upon physicochemical properties.4 Chou et al. extended structure−activity relationships (SARs) to 144 NOCs using computer-assisted mathematical and statistical methods.5 Singer et al. linked liposolubility with nitrosamine carcinogenicity through QSAR.6 Dunn et al. presented a simple modeling of class analogy (SIMCA) pattern recognition method to classify 61 NOCs.7,8 Rose et al. reported the SARs of 150 nitrosamines using a pattern recognition method and reported 97% correct classification using 22 descriptors.9 Dai et al. also applied a pattern recognition method, based on di-region theory, to 153 nitrosamines and reported 97% correct classification using 10 descriptors.10 More recently, Luan et al. used linear discriminant analysis (LDA) and a support vector machine (SVM) to predict the carcinogenic properties of 148 nitroso compounds using 7 descriptors.11 Helguera et al. developed several QSAR models for predicting the carcinogenic potency of NOCs via different routes of administration for male and female rats based upon a topological substructural molecular descriptors (TOPS-MODE) approach12−16 and found some structural alerts for carcinogenicity predictions. In those reports, each of the models provided different information and had advantages and disadvantages. Some models were developed from a relatively small dataset of compounds, the physical meaning of the descriptors was difficult to understand for certain models, and regression or pattern recognition instead of classification was used for some

© 2011 American Chemical Society

Received: September 26, 2011 Published: November 15, 2011

dx.doi.org/10.1021/tx2004097 | Chem. Res. Toxicol. 2011, 24, 2269−2279

Chemical Research in Toxicology


entries) of the different powers of the bond adjacency matrix E. To discriminate heteroatoms, the diagonal elements of the adjacency matrix E are weighted. The weights are bond contributions to physicochemical properties such as bond distances, bond dipoles, and bond polarizabilities, or even mathematical expressions involving atomic weights.20,21,26,27 Thus, these atomic contributions are transformed into bond contributions.28 In this work, the bond weights chosen were hydrophobicity (H), atomic weight (AW), molar refractivity (MR), bond dipole moments (Dip and Dip2), bond distance (Dis), polar surface area (Pols), polarizability (Pol), Gasteiger−Marsili atomic charges (Gas), van der Waals atomic radii (vdW), and Abraham molecular descriptors (Ab). The Abraham descriptors were the excess molar refraction (Ab-R2), the combined dipolarity/polarizability (Ab-π2H), the solute gas−hexadecane partition coefficient (Ab-logL16), and the summation solute hydrogen bond basicity (Ab-Σβ2H and Ab-Σβ2O). Notice that Ab-Σβ2H is used for the partition of a solute between water and nonaqueous solvent systems, and Ab-Σβ2O is used for partition between water and aqueous solvent systems.29 According to eq 1 below, the atomic properties are converted into bond weight contributions w(i, j), which have been described by Estrada et al.30
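The definitions above can be illustrated with a short Python sketch that builds the edge adjacency matrix of a hydrogen-suppressed molecular graph, places bond weights on its main diagonal, and takes the traces of successive matrix powers. This is not the MODESLAB implementation. The function names, the atom numbering, and the rounded atomic masses are hypothetical, and the bond weight form w_i/δ_i + w_j/δ_j is an assumption taken from the general TOPS-MODE literature.

```python
import numpy as np

def edge_adjacency(bonds):
    """Edge adjacency matrix: entry (a, b) is 1 when bonds a and b share an atom."""
    m = len(bonds)
    E = np.zeros((m, m))
    for a in range(m):
        for b in range(a + 1, m):
            if set(bonds[a]) & set(bonds[b]):
                E[a, b] = E[b, a] = 1.0
    return E

def bond_weights(bonds, atom_property):
    """Assumed bond weight: w(i, j) = w_i / delta_i + w_j / delta_j."""
    degree = {}
    for i, j in bonds:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return np.array([atom_property[i] / degree[i] + atom_property[j] / degree[j]
                     for i, j in bonds])

def spectral_moments(E, weights=None, k_max=15):
    """Return [mu_0, ..., mu_k_max], where mu_k is the trace of the k-th matrix power."""
    B = E.copy()
    if weights is not None:
        np.fill_diagonal(B, weights)  # bond weights go on the main diagonal
    moments, power = [], np.eye(len(B))
    for _ in range(k_max + 1):
        moments.append(float(np.trace(power)))
        power = power @ B
    return moments

# Hypothetical example: N-nitrosodimethylamine skeleton, atoms O0, N1, N2, C3, C4
bonds = [(0, 1), (1, 2), (2, 3), (2, 4)]
masses = {0: 16.0, 1: 14.0, 2: 14.0, 3: 12.0, 4: 12.0}  # rounded, illustrative only
E = edge_adjacency(bonds)
unweighted = spectral_moments(E, k_max=3)
weighted = spectral_moments(E, bond_weights(bonds, masses), k_max=3)
print(unweighted, weighted)
```

In this sketch μ0 equals the number of bonds, and the weighted μ1 is simply the sum of the bond weights, which is why higher-order moments are needed to capture how bonds are connected.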

models. Therefore, establishing a new, readily interpretable classification model of carcinogenicity for a relatively large dataset of NOCs is important. Lately, a class of topological indices, the so-called “spectral moments” (TOPS-MODE descriptors), has become available. These indices represent information about electronic and topological structures and can solve a wide variety of problems in QSAR research.17−19 Spectral moments encode molecular structure by focusing on submolecular fragments and express physical and biological properties in terms of the substructural features of the molecules. This approach has been successfully applied to studies using QSAR and quantitative structure−property relationships (QSPRs).22−22 However, in multivariate analyses, selecting the combination of variables that produces the best result is also one of the most important problems. The enhanced replacement method (ERM) is a simple selection tool for descriptor variables with a low computational cost, and it outperforms or equals the performance of the genetic algorithm (GA).23 Therefore, in the present study, an LDA combined with the ERM was selected to classify the carcinogenicity of NOCs based on TOPS-MODE descriptors. The aims of the present study were to use the ERM to select the best subset of TOPS-MODE descriptor variables; develop a model that classifies NOCs as carcinogenic or noncarcinogenic; assess the in silico model using various statistical parameters and an external validation set (37 compounds); analyze the substructural contributions to carcinogenicity; and understand the biotransformation and mechanism of carcinogenesis.

■

$$w(i,j) = \frac{w_i}{\delta_i} + \frac{w_j}{\delta_j} \qquad (1)$$

where w and δ, respectively, represent the atomic weight and vertex degree of the atoms i and j. The spectral moment descriptors were computed with MODESLAB 1.5,31 using simplified molecular input line entry specification (SMILES)32 strings of the chemical structures as input. We calculated the first 15 spectral moments (μ1−μ15) for each bond weight and the number of bonds in the molecules (μ0), excluding the hydrogen atoms. Owing to the nonlinearity of the biological process (carcinogenic activity) under study, the cross-terms of μ0 and μ1 with all variables were also evaluated.

As for the modeling method, we opted for an LDA approach. The discriminant function was obtained by using LDA as implemented in SPSS software (Chicago, IL, USA). The default parameters of this program were used in the development of the model. The procedural details for establishing an LDA model have been reported.33,34

Selection of Variables. The use of LDA for classification of TOPS-MODE descriptor data usually requires appropriate procedures for the selection of variables.35 In the present work, the ERM23,36 was adopted for this purpose. The ERM is a modified version of the replacement method (RM) that is less likely to be trapped in local minima and less dependent on the initial solution. The purpose of the RM and ERM is to find an optimal subset of d (d ≪ D) descriptors from a large set of D descriptors with a minimum standard deviation (S):

MATERIALS AND METHODS

Dataset. The dataset of this investigation consisted of 111 NOCs. Of these, 95 were carefully selected from the Carcinogenic Potency Database (CPDB), established in the CRC Handbook of Carcinogenic Potency and Genotoxicity Databases and at http://potency.berkeley.edu/index.html, because data collected from a widely used international resource were deemed reliable. In addition, 16 noncarcinogenic NOCs were taken from the study reported by Luan et al.11 The dataset of 111 NOCs was restricted to rats because tests on rats seem to be more reproducible than those on mice.24 For the NOCs selected from the CPDB, a chemical was categorized as a carcinogenic N-nitroso compound if it could induce cancer in any target organ of rats listed in the Summary of Carcinogenic Potency Database by Target Organ (at http://potency.berkeley.edu/pathology.table.html); otherwise, it was categorized as a noncarcinogenic N-nitroso compound. The complete dataset of 111 NOCs and their corresponding classifications is shown in Table 1, where “+” represents carcinogenic compounds and “−” represents noncarcinogenic compounds. Thus, 83 NOCs were classified as group “+”, and the other 28 as group “−”. The 111 NOCs were divided into training (74) and test (37) sets by applying the classic Kennard−Stone (KS) uniform sampling algorithm25 because the KS algorithm, which relies on a fast calculation of Euclidean distances, selects training-set objects that are uniformly scattered over the experimental space.

QSAR Modeling. The TOPS-MODE approach is based on computation of the spectral moments of the edge adjacency matrix, whose main diagonal entries represent bond weights describing the hydrophobic/polarity, electronic, and steric features of molecules.
In this method, molecular structure is codified into the edge adjacency matrix E, which is a square symmetric matrix of order m (the number of bonds) whose elements are equal to 1 if bonds i and j are adjacent (i.e., they are incident to a common atom) and 0 otherwise. The spectral moments are defined as the trace (i.e., the sum of main diagonal

$$S = \sqrt{\frac{1}{N-d-1}\sum_{i=1}^{N} \mathrm{res}_i^{2}} \qquad (2)$$

In eq 2, N is the number of molecules in the training set, and res_i is the residual for molecule i (the difference between the experimental and predicted properties). Notice that S(dn) is a distribution over a space of D!/[d!(D − d)!] points. A full search (FS) can arrive at the global minimum by calculating S(dn) at every point of this space, but it is computationally prohibitive if D is sufficiently large. However, the ERM can reach the global minimum more efficiently than FS. The ERM consists of two steps. First, an initial set of descriptors dk is chosen at random; one of the descriptors, say Xki, is replaced with each of the remaining D − d descriptors, one by one, and the set with the smallest value of S is kept. This is what we call a “step”. Second, from the resulting set, we choose the descriptor with the greatest standard deviation in its coefficient and replace it with each of the remaining D − d descriptors, one by one. This procedure is repeated until the set remains unmodified. In each cycle, we obtain the candidate dm(i) that comes from the thus-constructed path i. More detailed information on the ERM algorithm can be found in the references.23,36 KS and ERM variable selection was implemented in MATLAB.
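The replacement idea described above can be sketched in Python as follows. This is a simplified variant written for illustration, not the authors' MATLAB code and not the full ERM: it cycles through the positions of the subset in order instead of picking the descriptor with the largest coefficient error, it scores subsets with an ordinary least-squares fit, and it assumes S is the residual standard deviation with N − d − 1 degrees of freedom. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def std_dev(X, y, subset):
    """Residual standard deviation S of a least-squares fit on the chosen columns."""
    A = np.column_stack([np.ones(len(y)), X[:, subset]])  # intercept + descriptors
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    res = y - A @ coef
    return float(np.sqrt(res @ res / (len(y) - len(subset) - 1)))

def replacement_method(X, y, d, seed=0):
    """Swap one descriptor at a time and keep any swap that lowers S."""
    rng = np.random.default_rng(seed)
    D = X.shape[1]
    subset = [int(i) for i in rng.choice(D, size=d, replace=False)]  # random start
    best = std_dev(X, y, subset)
    changed = True
    while changed:                      # stop once a full pass changes nothing
        changed = False
        for pos in range(d):            # descriptor slot to be replaced
            for cand in range(D):       # try every descriptor outside the subset
                if cand in subset:
                    continue
                trial = subset.copy()
                trial[pos] = cand
                s = std_dev(X, y, trial)
                if s < best - 1e-12:
                    subset, best, changed = trial, s, True
    return sorted(subset), best

# Synthetic check: the response depends only on descriptors 2, 5, and 7
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(60, 10))
y_demo = 1.0 * X_demo[:, 2] + 2.0 * X_demo[:, 5] - 3.0 * X_demo[:, 7]
selected, s_min = replacement_method(X_demo, y_demo, d=3, seed=1)
print(selected, s_min)
```

With D = 10 and d = 3 a full search would need only 120 fits, so the saving is invisible here; the point of a replacement search is that its cost grows roughly with d × D per pass, not with D!/[d!(D − d)!].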




Table 1. Names, SMILES, and Corresponding Classification of N-Nitroso Compounds Used in This QSAR Study

no. | name | SMILES | classification (experiment) | classification (prob)
1 | 1-allyl-1-nitrosourea | ONN(C(O)N)CCC | + | +
2 | 1-amyl-1-nitrosourea | ONN(C(O)N)CCCCC | + | +
3 | carboxymethylnitrosourea | ONN(C(O)N)CC(O)O | + | +
4 | chlorozotocin | ClCCN(NO)C(O)N[C@@H](CO)[C@@H](O)[C@@H](O)[C@@H](O)CO | + | +
5 | 1,3-dibutyl-1-nitrosourea | ONN(C(O)NCCCC)CCCC | + | +
6 | dinitrosocaffeidine | ONN(c1ncn(c1C(O)N(NO)C)C)C | + | +
7 | 1-ethylnitroso-3-(2-hydroxyethyl)-urea | ONN(C(O)NCCO)CC | + | +
8 | 1-ethylnitroso-3-(2-oxopropyl)-urea | ONN(C(O)NCC(O)C)CC | + | +
9 | 2-fluoroethyl-nitrosourea | FCCN(NO)C(O)N | + | +
10 | N-hexylnitrosourea | ONN(C(O)N)CCCCCC | + | +
11 | 1-(2-hydroxyethyl)-1-nitrosourea | ONN(C(O)N)CCO | + | +
12 | 1-(3-hydroxypropyl)-1-nitrosourea | ONN(C(O)N)CCCO | + | +
13 | N-methyl-N,4-dinitrosoaniline | ONN(c1ccc(NO)cc1)C | + | +
14 | 4-(4-N-methyl-N-nitrosaminostyryl)quinoline | ONN(c1ccc(cc1)/CC\c1c2c(ncc1)cccc2)C | + | +
15 | N-methyl-N-nitrosobenzamide | ONN(C(O)c1ccccc1)C | + | +
16 | N-(N-methyl-N-nitrosocarbamoyl)-l-ornithine | ONN(C(O)NCCC[C@@H](N)C(O)O)C | + | +
17 | R(−)-2-methyl-N-nitrosopiperidine | ONN1[C@H](CCCC1)C | + | +
18 | 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanol | ONN(CCC[C@H](O)c1cnccc1)C | + | +
19 | nitroso-baygon | ONN(C(O)Oc1c(OC(C)C)cccc1)C | + | −b
20 | N-nitroso-bis-(4,4,4-trifluoro-N-butyl)amine | FC(F)(F)CCCN(NO)CCCC(F)(F)F | + | +
21 | N-nitroso-2,3-dihydroxypropyl-2-hydroxypropylamine | ONN(C[C@@H](O)CO)C[C@@H](O)C | + | +
22 | N-nitroso-2,3-dihydroxypropylethanolamine | ONN(C[C@@H](O)CO)CCO | + | +
23 | 1-nitroso-3,5-dimethyl-4-benzoylpiperazine | ONN1C[C@H](N([C@@H](C1)C)C(O)c1ccccc1)C | + | +
24 | 1-nitroso-1-hydroxyethyl-3-chloroethylurea | ClCCNC(O)N(NO)CCO | + | +
25 | 1-nitroso-1-(2-hydroxypropyl)-3-chloroethylurea | ClCCNC(O)N(NO)C[C@@H](O)C | + | +
26 | N-nitroso-(2-hydroxypropyl)-(2-hydroxyethyl)amine | ONN(C[C@@H](O)C)CCO | + | +
27 | N-nitroso-N-isobutylurea | ONN(C(O)N)CC(C)C | + | +
28a | N-nitroso-N-methyl-N-dodecylamine | ONN(CCCCCCCCCCCC)C | + | +
29 | N-nitroso-N-methyl-4-fluoroaniline | Fc1ccc(cc1)N(NO)C | + | +
30 | nitroso-N-methyl-N-(2-phenyl)ethylamine | ONN(CCc1ccccc1)C | + | +
31 | N-nitroso-N-methyl-N-tetradecylamine | ONN(CCCCCCCCCCCCCC)C | + | +
32a | N-nitroso-N-methyldecylamine | ONN(CCCCCCCCCC)C | + | +
33 | N-nitroso-N-methylurea | OC(N(NO)C)N | + | +
34 | di(N-nitroso)-perhydropyrimidine | ONN1CN(NO)CCC1 | + | +
35 | N-nitroso(2,2,2-trifluoroethyl)ethylamine | FC(F)(F)CN(NO)CC | + | +
36 | N-nitrosoallyl-2-oxopropylamine | ONN(CC(O)C)CCC | + | +
37 | N-nitrosoallylethanolamine | ONN(CCC)CCO | + | +
38 | N-nitrosobenzthiazuron | s1c(nc2c1cccc2)NC(O)N(NO)C | + | +
39 | N-nitrosobis(2-hydroxypropyl)amine | ONN(C[C@@H](O)C)C[C@@H](O)C | + | +
40a | N-nitrosodiethanolamine | OCCN(NO)CCO | + | +
41 | N-nitrosodiethylamine | ONN(CC)CC | + | +
42 | N-nitrosodimethylamine | ONN(C)C | + | +
43 | nitrosoethylmethylamine | ONN(CC)C | + | +
44 | 1-nitrosohydantoin | ONN1C(O)NC(O)C1 | + | +
45 | N-nitrosomethyl-(2-hydroxyethyl)amine | ONN(CCO)C | + | +
46 | N-nitrosomethyl-(3-hydroxypropyl)amine | ONN(CCCO)C | + | +
47 | N-nitrosomethyl-2-hydroxypropylamine | ONN(C[C@@H](O)C)C | + | +
48 | 2-nitrosomethylaminopyridine | ONN(c1ncccc1)C | + | +
49a | nitrosomethylaniline | c1cccc(c1)N(NO)C | + | +
50a | nitrosomethylundecylamine | ONN(CCCCCCCCCCC)C | + | +
51 | N′-nitrosonornicotine | ONN1[C@@H](c2cnccc2)CCC1 | + | +
52 | N-nitrosopiperazine | ONN1CCNCC1 | + | +
53 | N-nitrosothialdine | S1[C@H](S[C@H](N([C@@H]1C)NO)C)C | + | +
54 | N-nitrosothiomorpholine | S1CCN(CC1)NO | + | −b
55 | N,N′-dimethyl-N,N′-dinitrosophthalamide | ONN(C(O)c1c(C(O)N(NO)C)cccc1)C | − | −
56 | N,N-dinitrosopentamethylenetetramine | ONN1CN2CN(C1)CN(NO)C2 | − | +b
57 | N-nitrosobis(2,2,2-trifluoroethyl)amine | FC(F)(F)CN(NO)CC(F)(F)F | − | −
58 | N-nitrosocimetidine | S(Cc1nc[nH]c1C)CCN/C(N/C#N)/N(NO)C | − | −
59 | N-nitrosoguvacoline | ONN1CC(CCC1)C(O)OC | − | −
60 | nitrosoiminodiacetic acid | ONN(CC(O)O)CC(O)O | − | −
61 | nitrosoproline | ONN1[C@@H](C(O)O)CCC1 | − | −
62 | 1-(2-oxopropyl)nitroso-3-(2-chloroethyl)urea | ClCCNC(O)N(NO)CC(O)C | − | −
63 | 2-oxopropylnitrosourea | ONN(C(O)N)CC(O)C | − | +b
64a | N-n-butyl-N-nitrosourea | OC(N(NO)CCCC)N | + | +
65a | diallylnitrosamine | ONN(CCC)CCC | + | +
66 | 1-ethyl-1-nitrosourea | OC(N(NO)CC)N | + | +
67a | 1-(2-hydroxyethyl)-nitroso-3-ethylurea | ONN(C(O)NCC)CCO | + | +
68a | S(+)-2-methyl-N-nitrosopiperidine | ONN1[C@H](CCCC1)C | + | +
69a | 4-(methylnitrosamino)-1-(3-pyridyl)-1-(butanone) | ONN(CCCC(O)c1cnccc1)C | + | +
70a | mononitrosocaffeidine | ONN(c1ncn(c1C(O)NC)C)C | + | +
71a | 3-nitroso-2-oxazolidinone | ONN1C(O)OCC1 | + | +
72a | nitroso-1,2,3,6-tetrahydropyridine | ONN1CCCCC1 | + | +
73a | N-nitroso-2,2,4-trimethyl-1,2-dihydroquinoline polymer | ONN1c2c(C(CC1(C)C)C)cccc2 | + | −b
74a | 1-nitroso-3,4,5-trimethylpiperazine | ONN1C[C@H](N([C@@H](C1)C)C)C | + | +
75a | N-nitrosoallyl-2,3-dihydroxypropylamine | ONN(CCC)C[C@@H](O)CO | + | +
76 | N-nitrosoallyl-2-hydroxypropylamine | ONN(CCC)C[C@@H](O)C | + | +
77a | nitrosoanabasine | ONN1[C@H](CCCC1)c1cccnc1 | + | +
78a | nitrosodibutylamine | ONN(CCCC)CCCC | + | +
79a | N-nitrosodiphenylamine | ONN(c1ccccc1)c1ccccc1 | + | +
80a | N-nitrosodipropylamine | ONN(CCC)CCC | + | +
81 | nitrosododecamethyleneimine | ONN1CCCCCCCCCCCC1 | + | +
82a | N-nitrosoephedrine | ONN([C@@H]([C@H](O)c1ccccc1)C)C | + | +
83a | nitrosoethylurethan | O(CC)C(O)N(NO)CC | + | +
84a | nitrosoheptamethyleneimine | ONN1CCCCCCC1 | + | +
85a | nitrosomethyl-3-carboxypropylamine | ONN(CCCC(O)O)C | + | +
86 | N-nitrosomethyl-2,3-dihydroxypropylamine | ONN(C[C@@H](O)CO)C | + | +
87a | N-nitrosomethyl(2-oxopropyl)amine | ONN(CC(O)C)C | + | +
88a | N-nitrosomorpholine | O1CCN(NO)CC1 | + | +
89a | N-nitrosopiperidine | ONN1CCCCC1 | + | +
90a | N-nitrosopyrrolidine | ONN1CCCC1 | + | +
91a | N-propyl-N-nitrosourea | ONN(C(O)N)CCC | + | +
92a | 1-nitroso-5,6-dihydrothymine | ONN1C(O)NC(O)[C@@H](C1)C | − | −
93a | nitrosopipecolic acid | ONN1[C@@H](C(O)O)CCCC1 | − | −
94 | dipentylnitrosamine | CCCCCN(NO)CCCCC | + | +
95 | 2,6-dimethyl-N-nitrosopiperidine | C1CCC(N(C1C)NO)C | − | +b
96a | 2,2,6,6-tetramethyl-N-nitrosopiperidine | N1(C(CCCC1(C)C)(C)C)NO | − | −
97a | 4-tert-butyl-N-nitrosopiperidine | N1(CCC(CC1)C(C)(C)C)NO | − | −
98 | 4-carboxy-N-nitrosopiperidine | ONN1CCC(CC1)C(O)O | − | +b
99a | 2,5-dimethyl-N-nitrosopyrrolidine | N1(C(CCC1C)C)NO | − | −
100a | 2-carboxy-N-nitrosopyrrolidine | C1CC(N(C1)NO)C(O)O | − | −
101 | 2-carboxy-4-hydroxy-N-nitrosopyrrolidine | C1(CC(N(C1)NO)C(O)O)O | − | −
102 | N-nitrosophenmetrazine | ONN1C(C(OCC1)c1ccccc1)C | − | +b
103 | 2,3,5,6-Tetramethyl-N,N′-dinitrosopiperazine | ONN1C(C(N(C(C1C)C)NO)C)C | − | −
104a | 4-methyl-N-nitrosopiperazine | ONN1CCN(CC1)C | − | +b
105 | N-nitrosodi-n-octylamine | ONN(CCCCCCCC)CCCCCCCC | − | +b
106 | N-nitrosodibenzylamine | ONN(Cc1ccccc1)Cc1ccccc1 | − | −
107 | N-nitroso-4-(methylamino)azobenzene | ONN(c1ccc(cc1)/NN/c1ccccc1)C | − | −
108a | N-nitroso-N-ethyl-tert-butylamine | ONN(C(C)(C)C)CC | − | −
109 | N-nitrosodiacetonitrile | N(CC#N)(CC#N)NO | − | −
110 | hexahydro-1,3,5-trinitroso-1,3,5-triazine | ONN1CN(NO)CN(C1)NO | − | −
111 | N-nitrosoindoline | N1(NO)c2c(CC1)cccc2 | − | +b

a Test set. b Misclassified compound.

Model Evaluation. Various diagnostic statistical tools were used to evaluate the model equations in terms of goodness-of-fit and goodness-of-prediction. Goodness-of-fit was estimated with standard statistics such as Wilks' lambda (λ), the Mahalanobis distance (D²), Fisher's ratio (F), and the corresponding p-level (p). Wilks' λ implies perfect discrimination for λ = 0 and the absence of discrimination for λ = 1. The Mahalanobis distance indicates the separation of the two groups and thus shows whether the model possesses suitable discriminatory power for differentiating between them. Goodness-of-prediction of the discriminant models was assessed by two means. The first validation method was leave-one-out cross-validation (LOOCV). The second was the hold-out method, in which the whole training set is used to build the final classification model and the independent test set is used to test its predictive ability.

Bond Contributions. One of the greatest advantages of the TOPS-MODE approach over other traditional QSAR methods stems from its substructural nature. This enables one to transform the QSAR model into a bond-additive scheme and thus describe the end point activity as a sum of bond contributions related to different structural fragments of the molecule in question.35 Characterization of such bonds enables identification of the groups or fragments of a molecule responsible for its biological activity. Moreover, one can detect the fragments of a given molecule that contribute positively or negatively to the underlying activity and put forward an interpretation of their effects in terms of physicochemical properties. Bond contributions are derived from local spectral moments, which are defined as the diagonal entries of the different powers of the weighted bond matrix E:

$$\mu_k^{T}(i) = b_{ii}(T)^{k} \qquad (3)$$

where μk^T(i) stands for the kth local moment of bond i, b_ii(T)^k are the diagonal entries of the kth power of the weighted bond matrix, and T is the type of the bond weight (H, Dip, Pols, MR, Pol, etc.). For a given molecule, one can substitute the values of the local spectral moments computed by eq 3 into the equation below and thus gather the total contribution of its different bonds to the biological activity.

(4) [equation not preserved in the extracted text]

In this case, these contributions represent the structural information that, together with other theoretical and experimental evidence, supports a better understanding of the mechanisms of carcinogenic action.

■ RESULTS AND DISCUSSION

Discriminant Model. The first classification model was obtained to discriminate carcinogenic NOCs from noncarcinogenic ones in the entire training set. The model with six independent variables and the statistical parameters of the LDA were as follows:

(5) [equation not preserved in the extracted text]

where N = 74, λ = 0.418, D² = 6.874, F(6,67) = 15.559, and p < 10^−10. In eq 5, some of the six variables were spectral moments of high order. Because these variables could be mathematically collinear and could therefore produce overfitted results, the degree of collinearity of the selected variables was diagnosed by analyzing the cross-correlation matrix (Table 2). Several descriptor variables were highly correlated with each other (Table 2). To overcome this problem, the Randić orthogonalization procedure was carried out.37−39 The new orthogonalized variables have the nomenclature mΩμk^w, where the symbol Ω denotes orthogonal, m is the order of importance of the descriptor in explaining the property, μk is the kth spectral moment, and w is the bond weight used. After the Randić orthogonalization procedure, the collinearity among the six new orthogonalized variables was eliminated. Following the strategy outlined above, the discriminant model including the six orthogonalized variables is given below together with its statistical parameters. This model had reasonable statistical significance. The classification results are illustrated in Table 3.

(6) [equation not preserved in the extracted text]

where N = 74, λ = 0.418, D² = 6.874, F(6,67) = 15.559, and p < 10^−10.

Table 2. Intercorrelation among the Variables Selected by ERM a

variable | μ0μ7^Dip2 | μ11^Dip | μ1μ12^Pol | μ1μ7^Ab−logL16 | μ8^Ab−π2H | μ1μ3^Dip
μ0μ7^Dip2 | 1.00 | 0.419 | 0.194 | 0.394 | 0.727 | 0.696
μ11^Dip | | 1.00 | 0.087 | 0.153 | 0.332 | 0.764
μ1μ12^Pol | | | 1.00 | 0.861 | 0.580 | 0.067
μ1μ7^Ab−logL16 | | | | 1.00 | 0.766 | 0.353
μ8^Ab−π2H | | | | | 1.00 | 0.485
μ1μ3^Dip | | | | | | 1.00

a Significant correlations are marked in bold.

Table 3. Results of Equations 6 and 7

group | eq 6: + | eq 6: % | eq 6: − | eq 6: % | eq 7: + | eq 7: % | eq 7: − | eq 7: %
“+” in the training set | 52 | 96.3 | 2 | 3.7 | 52 | 96.3 | 2 | 3.7
“−” in the training set | 6 | 30.0 | 14 | 70.0 | 7 | 35.0 | 13 | 65.0
“+” in the test set | 27 | 93.1 | 2 | 6.9 | 28 | 96.6 | 1 | 3.4
“−” in the test set | 1 | 12.5 | 7 | 87.5 | 1 | 12.5 | 7 | 87.5

summary | eq 6 | eq 7
false positive for the entire dataset (%) | 25.0 | 28.6
false negative for the entire dataset (%) | 4.8 | 3.6
total accuracy (%) | 90.1 | 90.1
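The summary rows of Table 3 follow directly from the per-set counts, which the short Python check below reproduces. It is only an arithmetic sketch: the function name is hypothetical, and the counts are the training-set and test-set entries of Table 3 added together.

```python
def summary_rates(tp, fn, fp, tn):
    """Percent false positives, false negatives, and overall accuracy."""
    positives, negatives = tp + fn, fp + tn
    return {
        "false_positive_pct": round(100 * fp / negatives, 1),
        "false_negative_pct": round(100 * fn / positives, 1),
        "accuracy_pct": round(100 * (tp + tn) / (positives + negatives), 1),
    }

# Pooled counts (training + test) read from Table 3
eq6 = summary_rates(tp=52 + 27, fn=2 + 2, fp=6 + 1, tn=14 + 7)
eq7 = summary_rates(tp=52 + 28, fn=2 + 1, fp=7 + 1, tn=13 + 7)
print(eq6)
print(eq7)
```

Both models misclassify 11 of the 111 compounds, which is why the total accuracy is identical even though eq 7 trades one false negative for one false positive.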

As can be seen in Table 2, the variables μ0μ7^Dip2, μ1μ7^Ab−logL16, and μ1μ12^Pol were highly correlated with the other three descriptors μ11^Dip, μ8^Ab−π2H, and μ1μ3^Dip. The orthogonalized variables 4Ωμ0μ7^Dip2, 2Ωμ1μ7^Ab−logL16, and 1Ωμ1μ12^Pol had low coefficients in eq 6, which indicated that the three variables should be excluded. Using a forward stepwise procedure as a variable selection strategy, the variables 4Ωμ0μ7^Dip2, 2Ωμ1μ7^Ab−logL16, and 1Ωμ1μ12^Pol were also excluded. However,




Table 4. Statistical Parameter Comparison of Different Models for Carcinogenicity Classification of NOCs a

refs | N | descriptor type | statistical technique | V | Ro (N/V) | SE (%) | SP (%) | A (%)
Luan et al.11 | 148 | CODESSA | LDA | 7 | 21.1 | 91.7 | 81.5 | 89.8
Luan et al.11 | 148 | CODESSA | SVM | 7 | 21.1 | 95.8 | 93.1 | 95.2
this work | 111 | TOPS-MODE | LDA | 3 | 37 | 90.9 | 87.0 | 90.1

a N, total number of compounds in the training and test set; V, number of variables in the model; Ro, compound−variables ratio; SE, sensitivity; SP, specificity; A, accuracy.

this streamlining of the variables did not affect the predictions of the model: its predictive power was retained with only three descriptors (eq 7).

The coefficients and descriptor variables included in eq 7 can help us understand the main differences between carcinogenic and noncarcinogenic NOCs. It is well-known that NOCs are carcinogenic through a genotoxic mechanism. In general, for the N-nitrosoureas, the basic biochemical transformations are related to intra- and intercellular alkylation reactions and carbamoylation reactions. For N-nitrosamines and cyclic nitrosamines, hydroxylation of the α-carbon to the N-nitroso group, catalyzed by cytochrome P450, is the key step that yields electrophilic diazohydroxide intermediates, which may ultimately act as carcinogens. Thus, it is reasonable to expect the inclusion of structural descriptor variables based on spectral moments weighted by the bond dipole moment (Dip) and by the Abraham solute descriptor for dipolarity/polarizability (Ab−π2H). The dipole moment is an electronic factor related to polar interactions; the dipolarity/polarizability term (which is related to charge) accounts for solvation/desolvation in polar and nonpolar regions of the molecules.40,41 In eq 7, the descriptor μ11^Dip has a positive contribution to carcinogenicity, whereas the descriptors μ1μ3^Dip and μ8^Ab−π2H have a negative contribution. Here, it is important to highlight the opposite contributions of μ11^Dip and μ1μ3^Dip even though they are weighted by the same property. The balance between reaching the biological target and exerting the carcinogenic action is controlled by the μ11^Dip and μ1μ3^Dip terms.

Investigation of the Relationship between Substructure Contribution and Carcinogenic Mechanism. One of the main advantages that the TOPS-MODE approach brings to the study of QSAR and QSPR is the structural interpretability of the model. This interpretability arises because the TOPS-MODE approach can obtain the quantitative contribution of any structural fragment to the studied property.
This is possible because spectral moments can be expressed as a linear combination of molecular fragments. Thus, the LDA model can be transformed into a bond-additive scheme, and for each molecule, the discrimination value can be calculated as a sum of bond contributions as described above. These quantitative bond or substructure contributions can provide valuable insight into the influence of the complex TOPS-MODE descriptors used in eq 7. In this work, the discrimination threshold for classifying the compounds was a total value of −0.979. That is, if the model yielded a value higher than −0.979, the compound was considered to be a carcinogenic N-nitroso compound; otherwise, it was classified as a noncarcinogenic N-nitroso compound. The role of structural factors driving carcinogenicity is discussed for several series of homologous chemicals that were correctly classified by eq 7. Other NOCs are not discussed in the present study but could be studied in a similar way. The discussed chemicals were divided by toxicophore (a group, fragment, or region of a molecule associated with carcinogenicity) according to the general mechanism for the biotransformation of N-nitrosamines, N-nitrosoureas, and cyclic N-nitrosamines.15,16,42 The metabolic activation of nitrosamines is mediated by oxidation of the α-carbon to the N-nitroso group. However, N-nitrosoureas

(7)

where N = 74, λ = 0.449, D² = 6.064, F(3,70) = 28.679, and p < 10^−11. The statistical parameters for eq 7 were satisfactory. The large F index and small p value were indicative of the statistical significance of the model. In addition, the values of the Wilks' lambda statistic (λ) and the Mahalanobis distance (D²) showed that the model displayed adequate discriminatory power for differentiating between the two groups. Using eq 7, in the training set (74 NOCs), 52 out of 54 carcinogenic NOCs (96.3%) and 13 out of 20 noncarcinogenic NOCs (65.0%) were classified correctly; in the test set, consisting of 37 NOCs chosen by KS, 28 out of 29 carcinogenic NOCs (96.6%) and 7 out of 8 noncarcinogenic NOCs (87.5%) were classified correctly. These classification results are listed in Table 3. LOOCV carried out on the training set showed that 51 out of 54 carcinogenic NOCs (94.4%) and 13 out of 20 noncarcinogenic NOCs (65.0%) were correctly classified. Overall, the rate of correct classification with cross-validation was 86.5%. The total accuracy with eq 7 was 90.1%, equal to that with eq 6 (90.1%). Eq 7 therefore matched the performance of eq 6 while using fewer descriptor variables. The percentages of false positives and false negatives for eq 7 were 28.6% (8/28) and 3.6% (3/83), respectively. From a practical viewpoint, it was considered more important to avoid false negatives because these compounds can induce cancers. Conversely, false positives merely prompt caution and are less likely to harm human health. Overall, both validation procedures evidenced the predictive power of eq 7. To further evaluate the quality of eq 7, a comparison between this method and the approach reported by Luan et al.11 was undertaken (Table 4).
Using the same LDA approach, the ratio (number of compounds/number of variables), specificity (SP), and accuracy (A) were all better than those reported by Luan et al.;11 only sensitivity (SE) was slightly inferior to that reported in ref 11 (Table 4). The SVM was not used in this work because an SVM model does not provide descriptor coefficients from which bond contributions can be calculated to understand the mechanisms of the carcinogenic action of NOCs. Therefore, eq 7 had good statistical quality and was suited for calculating bond contributions for studying mechanisms.
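The goodness-of-fit statistics quoted for eqs 5 and 7 are not independent of one another. For a two-group discriminant analysis, textbook relations link the Mahalanobis distance to Wilks' λ and to F, and the Python sketch below uses them as a consistency check. It assumes that D² is the squared distance between the two group centroids computed with the pooled within-group covariance, and it takes the group sizes (54 carcinogenic and 20 noncarcinogenic training compounds) from the text; the function name is hypothetical.

```python
def two_group_stats(d2, n1, n2, p):
    """Wilks' lambda and F for two groups, from the squared Mahalanobis distance."""
    n = n1 + n2
    t2 = n1 * n2 / n * d2                      # Hotelling's T^2
    wilks = 1.0 / (1.0 + t2 / (n - 2))         # Wilks' lambda
    f_value = (n - p - 1) / (p * (n - 2)) * t2
    return wilks, f_value, (p, n - p - 1)      # last item: F degrees of freedom

# Eq 7: three descriptors; eq 5: six descriptors (values quoted in the text)
wilks7, f7, df7 = two_group_stats(d2=6.064, n1=54, n2=20, p=3)
wilks5, f5, df5 = two_group_stats(d2=6.874, n1=54, n2=20, p=6)
print(round(wilks7, 3), round(f7, 2), df7)
print(round(wilks5, 3), round(f5, 2), df5)
```

Under these assumptions the reported λ and F values are recovered from D² to within rounding, so the three statistics all express the same separation between the two groups.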



act directly because they do not need to be biotransformed to be carcinogenic. Therefore, the contributions of the toxicophores that could react with DNA to cause genetic damage and of the carbon−nitrogen single bonds between the N-nitroso group and the α-carbon to the N-nitroso group are shown in the figures below and then analyzed. In these figures, the red spheres represent positive contributions to carcinogenicity (i.e., α-hydroxylation leading to carcinogenicity); the green spheres represent negative contributions (i.e., α-hydroxylation occurs with difficulty); and the radius of each sphere is proportional to the magnitude of the contribution. The dotted line marks the toxicophores that could react with DNA to cause genetic damage.

Figure 1. Molecular representations of carcinogenic cyclic nitrosamines as well as their selected fragment and bond contributions to the carcinogenicity.

Figure 2. Molecular representations of noncarcinogenic cyclic nitrosamines as well as their selected fragment and bond contributions to the carcinogenicity.

In this dataset, chemicals 90 and 89 were the simplest five-membered and six-membered cyclic nitrosamines, respectively. Some of their derivatives were included in this dataset and are shown in Figures 1 and 2. Comparing carcinogenic chemicals 89, 52, 88, and 72, although the contributions of the rings in these chemicals changed (increased or decreased) when the γ-carbon atom on the piperidine ring was replaced by nitrogen or oxygen, or when a double bond was formed between the second and third carbon atoms, the rings of all four cyclic nitrosamines had positive contributions to carcinogenicity. The contributions of the carbon−nitrogen single bonds between the N-nitroso group and the α-carbon to the N-nitroso group also changed but were all positive. Comparing chemicals 89 and 77, it was found that if the hydrogen at the α-carbon on the piperidine ring was substituted with a nitrogen-containing aromatic ring, the contribution of the piperidine ring was remarkably decreased. Similarly, comparing chemicals 90 and 51, if the hydrogen at the α-carbon on the pyrrolidine ring was substituted with a nitrogen-containing aromatic ring, the contribution of the pyrrolidine ring was clearly decreased.

When comparing chemicals 74, 23, 98, and 59 and the α-substituted chemicals 77, 51, 95, 96, 93, 61, 99, 101, and 108 (chemical 108 in Figure 4), the contributions of the carbon−nitrogen single bonds between the N-nitroso group and the α-carbon to the N-nitroso group all decreased greatly in the α-substituted chemicals, which was consistent with the report by Lijinsky.42 If the hydrogen atoms on the piperidine or pyrrolidine ring were substituted with methyl groups or a carboxyl group, the contribution of the piperidine or pyrrolidine ring was greatly decreased (Figure 2). In addition, these different substituent groups changed the contributions of the carbon−nitrogen single bonds between the N-nitroso group and the α-carbon to the N-nitroso group. These bond contributions of carcinogenic cyclic nitrosamines were positive except those in chemicals 77 and 51 (Figure 1). However, chemicals 77 and 51 each had one bond whose contribution (−0.020 and −0.027, respectively) was very close to a positive contribution. Interestingly, in Figure 2, the contributions of the carbon−nitrogen single bonds of noncarcinogenic cyclic nitrosamines were all negative. Although the contribution of the carbon−nitrogen single bond in chemical 98 was also very close to a positive contribution, chemical 98 had a hydrophilic carboxyl group (a hydrophilic group could decrease the capability of traversing biological membranes to reach intracellular targets such as DNA). All of the changes in contribution suggest that the replacement of hydrogen with groups such as the methyl and carboxyl groups in positions alpha to the N-nitroso group reduces or eliminates carcinogenic activity and support the concept that oxidation at one of these positions is the key step in activation to carcinogenic forms, which is in accordance with the results of Lijinsky.42

When observing chemicals 59, 93, 61, and 101 in Figure 2, it was found that a methyl formate or carboxyl group in place of a hydrogen atom on the ring had a negative effect on the contribution to carcinogenicity. This was probably primarily because they were ionized (esters are likely to be hydrolyzed) and because the solubility and ease of excretion were increased.42

For the carcinogenic dialkylnitrosamines 42, 41, 80, 78, and 94, the short alkyl chains all had positive contributions to carcinogenicity (Figure 3). Similarly, the alkyl chains in the carcinogenic N-nitrosoureas 33, 66, 91, 64, 2, and 10 all had positive contributions. These findings agreed with experimental results suggesting that these groups could generate alkylating agents that alkylate DNA in “target” organs. Comparing chemicals 41 and 35, and 80 and 20, demonstrated that if the hydrogen atoms on a dialkylnitrosamine were substituted with fluorine atoms, the contribution of their alkyl counterparts was greatly decreased. Comparing chemicals 42 and 41 in Figure 3 with chemicals 60, 108, and 109 in Figure 4 showed that if the hydrogen atoms on a dialkylnitrosamine were substituted with a hydroxyl group, a double-bonded oxygen, a methyl group, or a cyano group, the contribution of their alkyl counterparts was greatly decreased. This tendency was consistent with an in vitro mutagenic study by Lee and Guttenplan, in which substitution of hydroxyl, methoxyl, and cyano moieties at the alpha or beta carbon of unsubstituted methyl or ethyl groups of N-nitrosamines reduced mutagenic activity.43

Figure 3. Molecular representations of carcinogenic nitrosamines and N-nitrosoureas as well as their selected fragment and bond contributions to the carcinogenicity.

Interestingly, in Figure 3, the contributions of the carbon−nitrogen single bonds of carcinogenic nitrosamines and N-nitrosoureas were all positive except those in chemicals 94 and 49. A well-established major activation pathway for chemical 49 is P450 (CYP)-mediated α-hydroxylation at its methyl group.44 This finding is consistent with the bond contributions of chemical 49 in the present study (the bond between the methyl and N-nitroso group had a positive contribution to carcinogenicity). For chemical 94, the contribution of the carbon−nitrogen single bond was also very close to a positive contribution. In Figure 4, the contributions of the carbon−nitrogen single bonds of noncarcinogenic nitrosamines and N-nitrosoureas were all negative except the one in chemical 106. The latter should be considered a noncarcinogenic NOC based on the molecule contribution. However, the positive contribution suggests that it can be metabolically activated by α-hydroxylation at one α-carbon to the N-nitroso group and then induce DNA strand breaks.45

In summary, a comprehensive analysis of the contributions of toxicophores of the molecules mentioned above is helpful in understanding the carcinogenic mechanism. It can provide valuable “rules” to infer the carcinogenicity of an untested compound. This in turn affords important pointers in the design of safer chemicals and in the prevention of the release of potentially toxic chemicals onto the market. Three important rules are discussed below.

First, if a hydrophobic or hydrophilic group on a chemical is changed, the distribution of the electron cloud of the molecule is altered. As a result, the contributions of the substructures of the molecule are also changed. This is because the weights in eq 7 (dipole moment and dipolarity/polarizability) are related to electronic factors.

Second, for nitrosamines, the contributions of the carbon−nitrogen single bonds between the N-nitroso group and the α-carbon to the N-nitroso group in carcinogenic nitrosamines are all positive (the positive contribution of the carbon−nitrogen single bond suggests that the chemical tends to be metabolized via the α-hydroxylase pathway), with very few exceptions, and vice versa for noncarcinogenic nitrosamines. This rule indicates that the charge distribution of the molecule most probably influences the key step of α-hydroxylation, which is in accordance with the literature.11,42

Third, for NOCs, the contributions of the alkyl chains, with or without substituent groups, that can generate an alkylating agent to alkylate DNA are all positive in carcinogenic NOCs, with very few exceptions, and vice versa for noncarcinogenic NOCs. This theoretical behavior, as recognized by the TOPS-MODE approach, is consistent with the experimental findings that nitrosamines undergo enzymatic degradation and alkylation, and that N-nitrosoureas undergo spontaneous, nonenzymatic degradation to form an alkylating agent that can alkylate DNA.

The approach used in the present study can not only correctly discriminate between carcinogenic NOCs and noncarcinogenic NOCs but also readily analyze the impact of various structural modifications on the contributions of compound fragments and then infer how the latter influence NOC carcinogenicity. This is without doubt an important advantage of our methodology over other QSAR analyses.
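The second rule can be read as a simple sign test on the contribution of the carbon−nitrogen single bond. The sketch below is only an illustration of that reading, not the authors' procedure: the function name `read_cn_bond_contribution` and the tolerance `tol` are hypothetical, and the value 0.03 was chosen here solely so that the two contributions reported as "very close to a positive contribution" (−0.020 and −0.027 for chemicals 77 and 51) fall into a borderline band.

```python
def read_cn_bond_contribution(value: float, tol: float = 0.03) -> str:
    """Interpret the sign of a C-N single bond contribution.

    `tol` is an assumed cutoff (not from the paper) below which the sign
    is treated as too weak to interpret.
    """
    if abs(value) <= tol:
        return "borderline"
    if value > 0:
        return "positive: alpha-hydroxylation favored"
    return "negative: alpha-hydroxylation disfavored"


# The only numeric bond contributions quoted in the text.
reported = {"chemical 77": -0.020, "chemical 51": -0.027}

for name, value in reported.items():
    print(f"{name}: {value:+.3f} -> {read_cn_bond_contribution(value)}")
```

A borderline band of this kind makes the exceptions noted in the text explicit instead of forcing a yes/no call from a near-zero value; any real cutoff would need to be justified against the full set of bond contributions.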

■

CONCLUSIONS

We set out to explore an LDA model based on a TOPS-MODE approach for the classification of carcinogenic and noncarcinogenic NOCs selected from CPDB. The combination of LDA and an efficient variable selection procedure (ERM) led to a final model with only three selected descriptors and good statistical quality. A reasonable interpretation of these molecular descriptors from a toxicological viewpoint was achieved by means of bond contributions. The model could explain the difference between NOCs with different substituents. The structural features identified were, in most cases, consistent with experimental evidence (i.e., key carcinogenicity steps including α-hydroxylation and DNA alkylation), which let us understand the mechanisms of the carcinogenic action of the tested NOCs. Therefore, this is a particularly interesting tool that can be recommended for future studies in this field.

■

AUTHOR INFORMATION

Corresponding Author
*Tel: +86 25 83794996. Fax: +86-25-83783428. E-mail address: [email protected].
Funding
This work was supported by the State Key Program for Basic Research of China (Grant No. 2011CB933404).

■

ACKNOWLEDGMENTS

We are grateful to the owners of the MODESLAB 1.5 software for providing a free copy of the program.

■

ABBREVIATIONS

TOPS-MODE, topological substructural molecular descriptors; NOCs, N-nitroso compounds; QSAR, quantitative structure activity relationship; LDA, linear discriminant analysis; ERM, enhanced replacement method; LOOCV, leave-one-out cross-validation; REACH, Regulation on Registration, Evaluation, Authorisation and Restriction of Chemicals; SARs, structure−activity relationships; SIMCA, simple modeling of class analogy; SVM, support vector machine; CPDB, Carcinogenic Potency Data Base; KS, Kennard−Stone; RM, replacement method; GA, genetic algorithm; N, total number of compounds in the training and test set; V, number of variables in the model; Ro, compound−variables ratio; SE, sensitivity; SP, specificity; A, accuracy

■

REFERENCES

(1) González-Mancebo, S., Gaspar, J., Calle, E., Pereira, S., Mariano, A., Rueff, J., and Casado, J. (2004) Stereochemical effects in the metabolic activation of nitrosopiperidines: correlations with genotoxicity. Mutat. Res. 558, 45−51.
(2) Wishnok, J. S., and Archer, M. C. (1976) Structure−activity relationships in nitrosamine carcinogenesis. Br. J. Cancer 33, 307−311.
(3) Wishnok, J. S., Archer, M. C., Edelman, A. S., and Rand, W. M. (1978) Nitrosamine carcinogenicity: a quantitative Hansch-Taft structure−activity relationship. Chem.-Biol. Interact. 20, 43−54.
(4) Edelman, A. S., Kraft, P. L., Rand, W. M., and Wishnok, J. S. (1980) Nitrosamine carcinogenicity: A quantitative relationship between molecular structure and organ selectivity for a series of acyclic N-nitroso compounds. Chem.-Biol. Interact. 31, 81−92.
(5) Chou, J. T., and Jurs, P. C. (1979) Computer assisted structure-activity studies of chemical carcinogens. An N-nitroso compound data set. J. Med. Chem. 22, 792−797.
(6) Singer, G. M., Taylor, H. W., and Lijinsky, W. (1977) Liposolubility as an aspect of nitrosamine carcinogenicity: Quantitative correlations and qualitative observations. Chem.-Biol. Interact. 19, 133−142.
(7) Dunn, W. J. III, and Wold, S. J. (1981) An assessment of the carcinogenicity of N-nitroso compounds by the SIMCA method of pattern recognition. J. Chem. Inf. Comput. Sci. 21, 8−13.
(8) Dunn, W. J. III, and Wold, S. (1981) The carcinogenicity of N-nitroso compounds: A SIMCA pattern recognition study. Bioorg. Chem. 10, 29−45.
(9) Rose, S. L., and Jurs, P. C. (1982) Computer-assisted studies of structure-activity relationships of N-nitroso compounds using pattern recognition. J. Med. Chem. 25, 769−776.
(10) Dai, Q. Y., Zhong, R. M., and Gao, X. M. (1987) Structure-activity relationships of N-nitroso compounds using pattern recognition based on di-region theory. Environ. Chem. 6, 1−11.
(11) Luan, F., Zhang, R., Zhao, C., Yao, X., Liu, M., Hu, Z., and Fan, B. (2005) Classification of the carcinogenicity of N-nitroso compounds based on support vector machines and linear discriminant analysis. Chem. Res. Toxicol. 18, 198−203.
(12) Helguera, A. M., González, M. P., Cordeiro, M. N. D. S., and Pérez, M. Á. C. (2007) Quantitative structure carcinogenicity relationship for detecting structural alerts in nitroso-compounds. Toxicol. Appl. Pharmacol. 221, 189−202.
(13) Helguera, A. M., Pérez, M. Á. C., Combes, R. D., and González, M. P. (2006) Quantitative structure activity relationship for the computational prediction of nitrocompounds carcinogenicity. Toxicology 220, 51−62.
(14) Helguera, A. M., Pérez-Machado, G., Cordeiro, M. N. D. S., and Combes, R. D. (2010) Quantitative structure-activity relationship modelling of the carcinogenic risk of nitroso compounds using regression analysis and the TOPS-MODE approach. SAR QSAR Environ. Res. 21, 277−304.
(15) Helguera, A. M., González, M. P., Cordeiro, M. N. D. S., and Pérez, M. Á. C. (2008) Quantitative structure-carcinogenicity relationship for detecting structural alerts in nitroso compounds: species, rat; sex, female; route of administration, gavage. Chem. Res. Toxicol. 21, 633−642.
(16) Helguera, A. M., Cordeiro, M. N. D. S., Pérez, M. Á. C., Combes, R. D., and González, M. P. (2008) Quantitative structure carcinogenicity relationship for detecting structural alerts in nitroso-compounds species: rat; sex: male; route of administration: water. Toxicol. Appl. Pharmacol. 231, 197−207.
(17) Estrada, E., Patlewicz, G., Chamberlain, M., Basketter, D., and Larbey, S. Computer-aided knowledge generation for understanding skin sensitization mechanisms: the TOPS-MODE approach. Chem. Res. Toxicol. 16, 1226−1235.
(18) Vilar, S., Estrada, E., Uriarte, E., Santana, L., and Gutierrez, Y. (2005) In silico studies toward the discovery of new anti-HIV nucleoside compounds through the use of TOPS-MODE and 2D/3D connectivity indices. 2. purine derivatives. J. Chem. Inf. Model 45, 502−514.
(19) González, M. P., Díaz, H. G., Ruiz, R. M., Cabrera, M. A., and Armas, R. R. (2003) TOPS-MODE Based QSARs derived from heterogeneous series of compounds. applications to the design of new herbicides. J. Chem. Inf. Comput. Sci. 43, 1192−1199.
(20) Estrada, E., and Uriarte, E. (2001) Quantitative structure-toxicity relationships using TOPS-MODE. 1. nitrobenzene toxicity to tetrahymena pyriformis. SAR QSAR Environ. Res. 12, 309−324.
(21) Estrada, E., Molina, E., and Uriarte, E. (2001) Quantitative structure-toxicity relationships using TOPS-MODE. 2. neurotoxicity of a non-congeneric series of solvents. SAR QSAR Environ. Res. 12, 445−459.
(22) Estrada, E., Uriarte, E., Gutierrez, Y., and González, H. (2003) Quantitative structure-toxicity relationships using TOPS-MODE. 3. structural factors influencing the permeability of commercial solvents through living human skin. SAR QSAR Environ. Res. 14, 145−163.
(23) Mercader, A. G., Duchowicz, P. R., Fernández, F. M., and Castro, E. A. (2010) Replacement method and enhanced replacement method versus the genetic algorithm approach for the selection of molecular descriptors in QSPR/QSAR theories. J. Chem. Inf. Model 50, 1542−1548.
(24) Gottmann, E., Kramer, S., Pfahringer, B., and Helma, C. (2001) Data quality in predictive toxicology: reproducibility of rodent carcinogenicity experiments. Environ. Health Perspect. 109, 509−514.
(25) Kennard, R. W., and Stone, L. A. (1969) Computer aided design of experiments. Technometrics 11, 137−148.
(26) Estrada, E., and Uriarte, E. (2001) Recent advances on the role of topological indices in drug design discovery research. Curr. Med. Chem. 8, 1573−1588.
(27) Estrada, E. (1995) Edge adjacency relationships and a novel topological index related to molecular volume. J. Chem. Inf. Comput. Sci. 35, 31−33.
(28) Estrada, E., and Gonzalez, H. (2003) What are the limits of applicability for graph theoretic descriptors in QSPR/QSAR? Modeling dipole moments of aromatic compounds with TOPS-MODE descriptors. J. Chem. Inf. Comput. Sci. 43, 75−84.
(29) Platts, J. A., Butina, D., Abraham, M. H., and Hersey, A. (1999) Estimation of molecular linear free energy relation descriptors using a group contribution approach. J. Chem. Inf. Model 39, 835−845.
(30) Estrada, E., Patlewicz, G., and Gutierrez, Y. (2004) From knowledge generation to knowledge archive. a general strategy using TOPS-MODE with DEREK to formulate new alerts for skin sensitization. J. Chem. Inf. Comput. Sci. 44, 688−698.
(31) Gutiérrez, Y., and Estrada, E. (2002) MODESLAB 1.5 (Molecular DEScriptors LABoratory) for Windows, 1.5, Universidad de Santiago de Compostela, Spain.
(32) Weininger, D. (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31−36.
(33) Cabrera, M. A., González, I., Fernández, C., Navarro, C., and Bermejo, M. (2006) A topological substructural approach for the prediction of P-glycoprotein substrates. J. Pharmacol. Sci. 95, 589−606.
(34) Pérez-Garrido, A., Helguera, A. M., López, G. C., Cordeiro, M. N., and Escudero, A. G. (2010) A topological substructural molecular design approach for predicting mutagenesis end-points of α,β-unsaturated carbonyl compounds. Toxicology 268, 64−77.
(35) González, M. P., Terán, C., Saíz-Urra, L., and Teijeira, M. (2008) Variable selection methods in QSAR: an overview. Curr. Top. Med. Chem. 8, 1606−1627.
(36) Mercader, A. G., Duchowicz, P. R., Fernández, F. M., and Castro, E. A. (2008) Modified and enhanced replacement method for the selection of molecular descriptors in QSAR and QSPR theories. Chemometr. Intell. Lab. 92, 138−144.
(37) Randić, M. (1991) Resolution of ambiguities in structure-property studies by use of orthogonal descriptors. J. Chem. Inf. Comput. Sci. 31, 311−320.
(38) Randić, M. (1991) Orthogonal molecular descriptors. New J. Chem. 15, 517−525.
(39) Randić, M. (1991) Correlation of enthalpy of octanes with orthogonal connectivity indices. J. Mol. Struct. (Theochem.) 233, 45−59.
(40) Abraham, M. H. (1993) Scales of solute hydrogen-bonding: their construction and application to physicochemical and biochemical processes. Chem. Soc. Rev. 22, 73−83.
(41) Platts, J. A., Butina, D., Abraham, M. H., and Hersey, A. (1999) Estimation of molecular linear free energy relation descriptors using a group contribution approach. J. Chem. Inf. Comput. Sci. 39, 835−845.
(42) Lijinsky, W. (1987) Structure-activity relations in carcinogenesis by N-nitroso compounds. Cancer Metast. Rev. 6, 301−356.
(43) Lee, S. Y., and Guttenplan, J. B. (1981) A correlation between mutagenic and carcinogenic potencies in a diverse group of N-nitrosamines: determination of mutagenic activities of weakly mutagenic N-nitrosamines. Carcinogenesis 2, 1339−1344.
(44) Šulc, M., Hodek, P., and Stiborová, M. (2010) The binding affinity of carcinogenic N-nitrosodimethylamine and N-nitrosomethylaniline to cytochromes P450 2B4, 2E1 and 3A6 does not dictate the rate of their enzymatic N-demethylation. Gen. Physiol. Biophys. 29, 175−185.
(45) Schmezer, P., Pool, B. L., Lefevre, P. A., Callander, R. D., Ratpan, F., Tinwell, H., and Ashby, J. (1990) Assay-specific genotoxicity of N-nitrosodibenzylamine to the rat liver in vivo. Environ. Mol. Mutagen 15, 190−197.

