
Article pubs.acs.org/IECR

Three-Tier Strategy for Screening High-Energy Molecules Using Structure−Property Relationship Modeling Approaches

Shikha Gupta,† Nikita Basant,‡ and Kunwar P. Singh*,†

† Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow 226 001, India
‡ ETRC, Gomti Nagar, Lucknow 226010, India
*S Supporting Information

ABSTRACT: Experimental determination of the explosive properties of chemicals is tedious, time- and resource-intensive, and risky. In this study, we established a three-tier structure−property relationship (SPR) modeling strategy for screening chemicals for their explosive behavior. Accordingly, qualitative (ternary classification: nonexplosive, ideal, and nonideal explosive), semiquantitative (binary classification: industrial and high explosive), and quantitative SPR models based on the decision tree forest (DTF) and decision tree boost (DTB) algorithms were developed in accordance with the OECD guidelines to discriminate the chemicals into these categories and to predict the detonation velocity (DV) of ideal explosives and the lower flammability limit (LFL) of nonideal explosives. The statistical quality and external predictive power of the SPR models were evaluated through internal and external validation procedures. In the test set, the qualitative and semiquantitative SPR models (DTF, DTB) achieved accuracies of >99%, while the quantitative SPR models (DTF, DTB) for ideal and nonideal explosives yielded correlations (R²) of >0.93 and >0.94 between the measured and predicted DV and LFL values, respectively. Values of various statistical validation coefficients derived for the test data were above their respective threshold limits, which lends confidence to the analysis. The applicability domains of the constructed SPR models were determined using the descriptor range, Euclidean distance, and leverage approaches. The SPR models in this study performed better than those reported in previous studies. The results suggest that the developed SPR models can reliably predict the explosive properties of diverse chemicals and can be useful tools for screening candidate molecules in future development.

1. INTRODUCTION
Energetic materials (EMs) are widely used in civil, defense, and space applications worldwide. EMs generally contain high-energy molecules (HEMs), which upon decomposition undergo oxidation and detonate with a sudden release of energy. Several incidents of premature explosion of EMs during synthesis, evaluation, handling, transportation, storage, and application have been reported, causing loss of life and property in the workplace.1 Efforts are being made worldwide to explore and design new HEMs that are more powerful and safer in the related operations. The concept of green EMs for defense and space applications is gaining importance, and it may become a mandatory requirement in the near future.2,3 The search for new HEMs with desired properties is an ongoing effort of scientists and engineers. Since the exploration of new EMs with improved performance is not only time- and resource-intensive but also carries a high risk of premature explosion, alternative methods for predicting performance prior to synthesis are highly desirable in the early stage of development.4 Performance prediction has the important advantage that it can be applied to candidate target molecules before synthesis is undertaken, and to newly prepared compounds even when the available amount is insufficient for laboratory characterization. The detonation velocity (DV) and the lower flammability limit (LFL) are the respective characteristic properties of the ideal and nonideal explosives.5,6 The DV of

an explosive is the velocity at which the chemical reaction traverses the explosive, assuming there is no energy loss at the boundary of the material. The flammability limits represent the flammability characteristics of fuels and are essential for quantitative risk assessment of the explosion hazard associated with the use of fuel−air mixtures.6 The fuel concentration limit, often referred to as the LFL, is the lowest concentration of a combustible gas in air that can propagate an explosion.7,8 Flammability limits are rarely available for combustible mixtures at nonambient conditions. The experimental determination of DV and LFL is tedious and cumbersome.6,9 Thus, other methods are needed to determine the DV and flammability limits of the ideal and nonideal explosives. In recent years, some prediction models have been developed to estimate the DV and LFL.6,9−17 In the past, many computer codes have been developed and used for calculating the DV of explosives.18 However, most of these approaches are restricted to CHNO explosives, take the heat of formation as an integral input parameter, and need an estimate of the probable products of explosion.18 Moreover, none of these empirical codes provides a qualitative prediction for these compounds, an essential requirement for screening the chemicals. Therefore, simple and inexpensive methods are desirable.

Qualitative and quantitative structure−property relationships (QSPRs) provide an alternative method for screening compounds as explosive or nonexplosive and for predicting the DV and LFL end points using descriptors derived solely from the molecular structure to fit the experimental data. With the QSPR approach, a mathematical relationship that links the structure of a chemical to a property is built using a set of compounds for which experimental values of the property are available. The advantage of this method is that, once a model is established, it can be applied to novel chemicals for which only the chemical structure is required.19 Once a correlation is established and validated, it can be applied to predict the property of new compounds that have not yet been synthesized or discovered. Thus, the QSPR method can expedite the development of new molecules and materials with desired properties. Although several QSPR methods have been successfully applied to predict the behavior (impact sensitivity) of explosives,20−22 only a few attempts have been made to develop QSPRs for predicting the DV of explosives.9,10 Therefore, there is a need to develop QSPR models for screening chemicals for their explosive behavior, which could help in designing new molecules with desired and controlled properties. Such a strategy improves the quality of the results and also provides a better understanding of the predictions.

In recent years, computational approaches capable of establishing relationships between the structural features and the end point property of chemicals have emerged as viable methods for predicting the property and guiding the design of new molecules.23,24 Among the modeling algorithms available today, the ensemble machine learning (EML) approaches have emerged as unbiased methods for QSPR analysis and are considered among the most efficient approaches for enhancing the accuracy of weak learners and capturing complex nonlinearities in the data.23,25−28 The main objective of this study is to identify the structural features of chemicals responsible for their explosive behavior and to develop a reliable three-tier strategy based on qualitative, semiquantitative, and quantitative SPR modeling for screening the HEMs in accordance with the OECD guidelines for QSAR development.29 In the proposed screening strategy, tier-1 (qualitative SPR) discriminates the chemicals as nonexplosive, ideal, or nonideal explosives; tier-2 (semiquantitative SPR) separates the "industrial" from the "high" explosives; and tier-3 (quantitative SPR) estimates the value of the end point property of the explosives (DV, LFL). Accordingly, structural features of the chemicals considered here were extracted, and QSPR models based on the EML approach (decision tree forest, DTF, and decision tree boost, DTB) were constructed and rigorously validated using OECD-recommended statistical parameters to ensure their external predictive power. The proposed three-tier SPR modeling strategy may be useful in research and development for the initial screening of chemicals for possible explosive behavior when developing new HEMs with desired and controlled properties.

Received: September 24, 2015
Revised: November 26, 2015
Accepted: December 4, 2015
Industrial & Engineering Chemistry Research, DOI: 10.1021/acs.iecr.5b03575, Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

2. MATERIALS AND METHODS
This research aims to develop a three-tier QSPR modeling strategy for screening the HEMs using qualitative (ternary classification: nonexplosive, nonideal explosive, ideal explosive), semiquantitative (binary classification: industrial explosive, high explosive), and quantitative (precise end point) methods in accordance with the OECD guidelines. Our intention was to save computational effort and cost for the end user. A flowchart of the three-tier strategy is shown in Figure 1.

Figure 1. A flow diagram showing the three-tier strategy for screening the HEMs.
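The three-tier flow can be sketched as a simple cascade. This is only an illustration under stated assumptions: the four model arguments are hypothetical callables standing in for trained tier-1, tier-2, and tier-3 models, the class labels are illustrative strings, and stopping at tier-1 for nonexplosive molecules is an assumption about the flowchart rather than something the text spells out.

```python
from typing import Callable, Dict, Sequence, Union

Descriptors = Sequence[float]


def screen_molecule(
    x: Descriptors,
    tier1: Callable[[Descriptors], str],        # hypothetical: "nonexplosive" | "nonideal" | "ideal"
    tier2: Callable[[Descriptors], str],        # hypothetical: "industrial" | "high"
    dv_model: Callable[[Descriptors], float],   # hypothetical: DV in km/s
    lfl_model: Callable[[Descriptors], float],  # hypothetical: LFL in vol %
) -> Dict[str, Union[str, float]]:
    """Route one molecule through the qualitative, semiquantitative, and quantitative tiers."""
    result: Dict[str, Union[str, float]] = {"class": tier1(x)}
    if result["class"] == "ideal":
        # Ideal explosives are graded (tier-2) and their DV is predicted (tier-3).
        result["grade"] = tier2(x)
        result["DV_km_s"] = dv_model(x)
    elif result["class"] == "nonideal":
        # Nonideal explosives go straight to the LFL regression (tier-3).
        result["LFL_vol_pct"] = lfl_model(x)
    # Assumption: nonexplosive molecules need no further modeling.
    return result


# Toy usage with stub models (all values are made up for illustration).
demo = screen_molecule(
    [4.0, 2.5],
    tier1=lambda x: "ideal",
    tier2=lambda x: "high",
    dv_model=lambda x: 8.1,
    lfl_model=lambda x: 1.2,
)
```

The point of the cascade is that the more expensive regression models are only evaluated for molecules that the cheaper classifiers have already routed to them.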

2.1. Data Set. In this study, the data on ideal and nonideal explosive chemicals were collected from multiple sources,3,4,9,30−35 whereas for the nonexplosive group, chemicals structurally similar to the ideal explosives were considered.9 Here, we considered 106 ideal, 249 nonideal, and 63 nonexplosive chemicals. These data sets report the DV (m/s) of the ideal explosives (106) and the LFL (vol %) of the nonideal explosives (231). The data set on nonideal explosives (qualitative SPR) also included 18 flammable compounds for which LFL values were not available. The ideal explosive chemicals considered here included aliphatic amines, aromatic amines, dinitro-aromatic amines, dinitro-benzenes, dinitro-phenols, esters, imides, neutral organics, triazines, and peroxy acids. The nonideal explosive chemicals mainly belong to the paraffinic, unsaturated, cyclic, and aromatic hydrocarbons, alcohols, ethers, aldehydes, ketones, acids, anhydrides, esters, amines, amides, cyanides, hydrazines, and nitrated, chlorinated, brominated, and sulfur compounds.30 The list of chemicals used for SPR modeling is provided in Table S1 (Supporting Information). The DV and LFL values of the compounds considered here ranged between 1804 and 10 100 m/s and between 0.3 and 15.9 vol %, respectively. For quantitative SPR analysis, the DV values were converted to km/s, the unit commonly used in explosive prediction modeling.36−38

2.2. Molecular Descriptors. The calculation of descriptors is the fundamental step in QSAR/QSPR modeling. For each molecule, 1D, 2D, and 3D molecular descriptors were computed using the Chemopy program.39 The SMILES40 (simplified molecule input line entry system) of the molecules


Figure 2. Plot showing the distribution of the PCA scores of the descriptors in training and test compounds in (a) qualitative, (b) semiquantitative, (c) quantitative (ideal), and (d) quantitative (nonideal) SPR analysis.

were obtained using ChemSpider.41 The chemical structures available in ChemSpider corresponding to the SMILES of the considered molecules were compared with those in PubChem.42 For compounds whose structures differed between the two sources, the SMILES were taken from PubChem for descriptor calculation. The geometry of the molecules was optimized using ChemMop,43 in which the structure of each compound was refined with the semiempirical molecular orbital program MOPAC using the PM7 method. The optimized geometries of the molecules were transferred into Chemopy. Here, a total of 1135 descriptors (physicochemical, constitutional, geometrical, topological) were calculated for each molecule. Although all the descriptors in the pool were used during development of the QSPR models to identify the most relevant features, only descriptors that convey a physical meaning of the structural attributes of the molecules were retained in the final QSPR models to ensure compliance with the OECD principles.

2.3. Data Processing and Feature Selection. In this study, the data sets for the qualitative, semiquantitative, and quantitative SPR analysis were composed of 418, 105, 106 (DV), and 231 (LFL) compounds, respectively (Table S1). For SPR model development, the data sets were split into training (80%) and test (20%) subsets using the random distribution approach.44 In SPR modeling, the selection of the training and test data is important: for a statistically significant model, the test data should lie within the chemical space occupied by the training data. The distribution of the structural descriptors of the chemicals in the respective test and training data was checked using principal components analysis (PCA). The constructed scores plots (Figure 2) suggested that the test

compounds were located in close proximity to the training set compounds. Since not all molecular properties may be relevant to the modeling, eliminating the less significant descriptors can improve prediction accuracy and facilitate interpretation of the model by focusing on the most relevant variables. From the pool of calculated descriptors, those with low variance (≤1.0) were removed. For development of the qualitative, semiquantitative, and quantitative SPR models, initial features were selected by a model-fitting approach.24 The models were trained using the complete set of features, and the respective scoring functions were computed to rank the contribution of each feature in the current set. The lowest ranked descriptors (7000 m/s are


categorized as the high explosives. Accordingly, the numbers of compounds in the two categories were 35 and 70, respectively. The DV values of the compounds considered for the semiquantitative SPR analysis ranged between 3850 and 10 100 m/s.

2.5. Three-Tier SPR Modeling and Validation. Here, we attempted to develop a three-tier strategy based on qualitative (ternary classification), semiquantitative (binary classification), and quantitative (regression) SPR modeling (DTF and DTB) to discriminate compounds on the basis of their explosive characteristics and to predict the DV (ideal explosives) and LFL (nonideal explosives) values quantitatively, so that an untested molecule can be screened for its explosive behavior prior to synthesis. The qualitative SPR model can thus serve as an initial screening tool for assigning a molecule to one of three categories (nonexplosive, ideal explosive, or nonideal explosive), whereas the semiquantitative model can be used to categorize an ideal explosive chemical as an industrial or a high explosive. Finally, the quantitative SPR models can be employed to predict the DV (ideal) or LFL (nonideal) value of the molecule. Well-established DTF and DTB algorithms were employed for developing the SPR models, satisfying the second requirement of the OECD guidelines.29 These are ensemble machine learning approaches implementing the bagging (DTF) and boosting (DTB) algorithms, respectively.48−50 The number of trees and the depth of each tree are the model parameters in DTF and DTB analysis. The DTF and DTB models were developed using DTREG,51 and all other computations were done in Excel 97. Both methods have been widely used in predictive modeling.23,25−28 Validation is an important requirement before the SPR models are applied to predict new test data (OECD principle 4).

To check whether the descriptors selected in model development have a chance correlation with the end point, a Y-randomization test was performed. In this test, models were derived using randomly rearranged end point properties with the selected descriptors, and the corresponding R² and cR²p were calculated for the scrambled models.52 The threshold value of cR²p is 0.5, and a model exceeding this value can be considered not to be the outcome of mere chance.53 Moreover, external validation is necessary to determine both the generalizability and the true predictive ability of the SPR models for new chemicals. The external predictive ability of the qualitative and semiquantitative SPR models was evaluated on the external prediction set by deriving metrics54 such as the accuracy, sensitivity, specificity, and Matthews correlation coefficient (MCC). The accuracy represents the fraction of correctly predicted compounds across the categories, whereas an MCC value of 1 is regarded as a perfect prediction. The sensitivity indicates the ability of a model to recognize compounds that possess the target property, whereas the specificity indicates its ability to recognize compounds that do not, which avoids false positives and can save experimental costs.55 Further, the discriminating efficiency of the binary classification models was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC). The AUC is closely related to the ranking quality of the classification and can be viewed as a measure based on pairwise comparisons between classifications of the two categories. It is an estimate of the probability that the classifier ranks a randomly chosen positive example higher than a randomly chosen negative example. With a perfect ranking, all positive examples are ranked higher than the negative ones and AUC = 1. Any deviation from this ranking decreases the AUC, and the expected AUC value for a random ranking is 0.5.56 Further, the performance of the quantitative SPR models was assessed by deriving the concordance correlation coefficient (CCC), Q²F1, Q²F2, Q²F3, and r²m for the external test data57−61 and by calculating R² and the RMSE in the training and test data. A quantitative model is considered statistically significant if the conditions R² (training) > 0.5, R² (test) > 0.6, CCC > 0.85, Q²F1, Q²F2, Q²F3 > 0.7, and r²m > 0.65 are satisfied.62,63

2.6. Applicability Domain Analysis. A QSPR model can offer reliable predictions only for compounds similar to those in the training data set. This similarity is assessed through the applicability domain (AD) of the model, which denotes a theoretical region in the chemical space defined by the descriptors used in the model.64 In this study, we used multiple approaches to evaluate the structural AD of the proposed SPR models in accordance with the third OECD principle. The first approach is based on the ranges of the individual descriptors used for model building. According to this method, a compound whose descriptor values lie within the ranges of the training set compounds is considered to be inside the AD of the model.64 The second method is based on the leverage approach,65 in which the distance of a compound from the centroid of the training set is measured by its leverage. The leverage value $h_i$ of the ith compound is calculated from the (i × j) descriptor matrix $X$ as $h_i = x_i^{T}(X^{T}X)^{-1}x_i$, where $x_i$ is the row vector of molecular descriptors of the ith compound. A value of $h_i$ greater than the critical value $h^{*}$ indicates that the structure of the compound differs substantially from those used for calibration; the compound is therefore located outside the optimum prediction space. The $h^{*}$ value can be calculated66 as $h^{*} = 3(p + 1)/n$, where p is the number of variables used in the model and n is the number of training compounds. The third approach is based on the Euclidean distance (ED) method,64 in which the mean distance of one compound to the remaining ones, $\bar{d}_i$, is computed from the descriptor space matrix of m descriptors: $\bar{d}_i = \sum_{j=1}^{n} d_{ij}/(n - 1)$, $i = 1, 2, \ldots, n$, where $d_{ij}$ is the distance score for a pair of compounds, measured by the ED norm67 on the compound descriptors $x_{ik}$ and $x_{jk}$: $d_{ij} = \left(\sum_{k=1}^{m}(x_{ik} - x_{jk})^{2}\right)^{1/2}$. The mean distance scores of the training set compounds are normalized to the range 0−1. The normalized mean distance scores of the test set are then calculated, and test compounds scoring outside the 0−1 range are considered to be outside the AD.
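The three applicability-domain checks of Section 2.6 (descriptor range, leverage, and Euclidean distance) can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the leverage follows the uncentered formula given in the text, the pseudo-inverse is used only as a numerical safeguard, and averaging a query compound's distances over all n training compounds and rescaling with the training minimum and maximum are assumptions that the text does not specify.

```python
import numpy as np


def in_descriptor_range(X_train, X_query):
    """True where every descriptor of a query compound lies within the training min-max range."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    return np.all((X_query >= lo) & (X_query <= hi), axis=1)


def leverages(X_train, X_query):
    """h_i = x_i^T (X^T X)^{-1} x_i, with X the training descriptor matrix."""
    xtx_inv = np.linalg.pinv(X_train.T @ X_train)  # pinv guards against a singular X^T X
    return np.einsum("ij,jk,ik->i", X_query, xtx_inv, X_query)


def critical_leverage(X_train):
    """h* = 3(p + 1)/n for n training compounds and p descriptors."""
    n, p = X_train.shape
    return 3.0 * (p + 1) / n


def ed_scores(X_train, X_query):
    """Mean Euclidean distance of each query compound to the training set, rescaled so
    that the training compounds themselves span the range 0-1."""
    def pairwise(A, B):
        return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2))

    n = X_train.shape[0]
    d_train = pairwise(X_train, X_train).sum(axis=1) / (n - 1)  # mean distance to the other n - 1
    d_query = pairwise(X_query, X_train).sum(axis=1) / n        # assumption: average over all n
    lo, hi = d_train.min(), d_train.max()
    return (d_query - lo) / (hi - lo)


# Toy descriptor matrices (made-up numbers): 4 training compounds, 2 descriptors.
X_train = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
X_query = np.array([[1.0, -0.9], [3.0, 3.0]])

range_ok = in_descriptor_range(X_train, X_query)
h = leverages(X_train, X_query)
h_star = critical_leverage(X_train)
ed = ed_scores(X_train, X_query)
ed_ok = (ed >= 0.0) & (ed <= 1.0)
```

In this toy case the first query compound passes all three checks, while the second lies outside the training range, exceeds the critical leverage, and has a normalized distance score above 1.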

3. RESULTS AND DISCUSSION
In this study, we intended to identify and extract the structural features of the chemicals responsible for their explosive/flammable behavior and to develop a three-tier strategy based on qualitative, semiquantitative, and quantitative SPR models for screening the chemicals using the identified structural features and the DTF and DTB algorithms. The qualitative SPR models (tier-1) were established to discriminate the chemicals into three different categories (nonexplosive, nonideal, and ideal explosives). The semiquantitative models (tier-2) were developed for the ideal explosive chemicals to differentiate between the industrial and high explosive ones, whereas the quantitative (tier-3) SPR models were established to predict the DV of the ideal and the LFL of the nonideal explosives.
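The tier-1 and tier-2 classifiers are judged by sensitivity, specificity, accuracy, MCC, and (for the binary models) AUC. As a minimal sketch of how these quantities are obtained in the binary case, the snippet below uses the standard confusion-matrix definitions and the pairwise-ranking interpretation of AUC described in Section 2.5. The function names and the toy labels and scores are illustrative assumptions, and how the ternary tier-1 metrics were aggregated over classes is not stated in the text, so it is not modeled here.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, accuracy, and MCC from binary labels.
    Assumes both classes occur in y_true (no guard against empty classes)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def pairwise_auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs in which the positive example is scored
    higher; ties count as half a win."""
    wins = sum(
        1.0 if sp > sn else 0.5 if sp == sn else 0.0
        for sp in scores_pos
        for sn in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))


# Toy example (made-up labels and scores): 1 = high explosive, 0 = industrial explosive.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
auc = pairwise_auc([0.9, 0.8, 0.3], [0.1, 0.2, 0.7])
```

A perfect ranking, in which every positive score exceeds every negative score, gives an AUC of exactly 1 under this definition.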


3.1. Qualitative SPR Modeling (Tier-1). For the qualitative (ternary classification) SPR modeling, the optimal DTF and DTB approaches identified four descriptors (ncarb, phi, nhet, UI). In the optimal DTF and DTB models, 210 and 360 trees were generated, with maximum depths of 12 and 9 and average numbers of group splits of 41.5 and 985.6, respectively. In training and five-fold CV, the ternary discrimination accuracies were 99.50% and 93.03% in both the DTF and DTB models, which suggests no obvious overfitting of the data. In Y-randomization, the average misclassification rates of five scrambled DTF and DTB models were 45.50% and 49.28%, respectively, which are much higher than those in the respective training sets (1.20% and 0.90%). The high misclassification rates of the scrambled DTF and DTB models suggest that the original QSPRs are relevant and rule out chance correlation. The contributions of the selected descriptors in the qualitative SPR models are plotted in Figure 3.
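The Y-randomization check for a classifier can be illustrated as follows: the class labels are shuffled, the model is rebuilt on the scrambled labels, and the resulting misclassification rate is compared with that of the model built on the true labels. The sketch below is an assumption-laden stand-in, not the DTREG workflow: a leave-one-out 1-nearest-neighbour classifier replaces the DTF and DTB learners so that the example stays dependency-free, the toy data are made up, and all function names are hypothetical. Only the use of five scrambled models follows the text.

```python
import random


def loo_1nn_predict(X, y):
    """Leave-one-out predictions of a 1-nearest-neighbour classifier (stand-in learner)."""
    preds = []
    for i, xi in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(xi, X[j])),
        )
        preds.append(y[nearest])
    return preds


def misclassification_rate(y_true, y_pred):
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def y_randomization(X, y, cv_predict, n_rounds=5, seed=0):
    """Average misclassification rate over models built on shuffled labels."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_rounds):
        y_scrambled = list(y)
        rng.shuffle(y_scrambled)  # break the descriptor-label relationship
        rates.append(misclassification_rate(y_scrambled, cv_predict(X, y_scrambled)))
    return sum(rates) / len(rates)


# Toy one-descriptor data set with two well-separated classes (made-up numbers).
X = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

real_rate = misclassification_rate(y, loo_1nn_predict(X, y))
scrambled_rate = y_randomization(X, y, loo_1nn_predict)
```

If the descriptors carry real information about the classes, the scrambled-label rate should be clearly higher than the rate obtained with the true labels, which is the pattern reported above for the DTF and DTB models.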

Figure 3. Plot showing the contribution of input descriptors in qualitative SPR models.

From the figure, it is evident that nhet had the highest (100%) contribution in both models. nhet and ncarb are simple atom-type count descriptors that encode information on the atomic composition of the molecule.68 nhet counts heteroatoms, while ncarb counts carbon atoms. A larger value of nhet is associated with chemicals of the explosive group, whereas the opposite holds for ncarb. phi is a kappa-shape molecular flexibility descriptor that increases with homologation and decreases with increased branching or cyclicity.69 A larger phi value indicates greater molecular flexibility and hence explosive character. UI (unsaturation index) is a simple information index for unsaturated bonds (double, triple, and aromatic bonds).70 The optimal qualitative SPR models (DTF, DTB) were applied to the test data, which were kept out of the model building phase, and their performance parameters in the training and test sets are presented in Table 1. As evident, the DTF and DTB models yielded high discriminatory accuracies in training (99.20%, 99.40%) and test (99.22%, 100%), whereas the MCC values in both sets were ≥0.97. Further, both models (DTF and DTB) yielded high sensitivity (>98%) and specificity (>99%) values in the training and test data. For a qualitative SPR model to be used in screening chemicals, sensitivity is the most important parameter.71 An inspection of the discrimination results of the two models revealed that only four compounds (propane, o-cresol, durene, anthracene) in DTF and three compounds (o-cresol, durene, anthracene) in DTB were misclassified. Moreover, all the common misclassified compounds belong to the nonexplosive and nonideal explosive categories. None of the compounds from the ideal explosive category was misclassified by the DTB SPR model. The performance parameters (Table 1) suggest that both SPR models (DTF and DTB) successfully discriminated the compounds into three different categories and gave closely comparable results. It is reassuring that the use of only four very simple descriptors led to significant discriminant models. The nearly perfect (>99.2%) discrimination of the chemicals into three categories by both modeling approaches may be due to the fact that the chemicals in these classes differ significantly in the structural characteristics (descriptors) considered for modeling. The mean values of these descriptors for the chemicals in the three categories (nonexplosives, nonideal explosives, and ideal explosives) were 8.75, 5.69, 6.23 (ncarb); 2.87, 2.92, 3.60 (phi); 2.41, 1.15, 11.66 (nhet); and 2.63, 1.02, 2.79 (UI), respectively. The mean values of all the descriptors except ncarb were highest for the compounds in the ideal explosives category.

Table 1. Performance Parameters for the Qualitative SPR Models

| model | data set | sensitivity (%) | specificity (%) | accuracy (%) | MCC |
|---|---|---|---|---|---|
| DTF-SPR | training | 98.84 | 99.37 | 99.20 | 0.98 |
| DTF-SPR | test | 98.77 | 99.55 | 99.22 | 0.97 |
| DTB-SPR | training | 99.00 | 99.51 | 99.40 | 0.98 |
| DTB-SPR | test | 100.00 | 100.00 | 100.00 | 1.00 |

3.2. Semiquantitative SPR Modeling (Tier-2). In the semiquantitative (binary classification) SPR modeling, the optimal DTF and DTB methods identified three structural features (nnitro, TIAC, Gravto) as the most relevant for developing predictive models. The optimal DTF and DTB models were based on 180 and 400 trees, maximum depths of 13 and 5, and average numbers of group splits of 14.2 and 60.2, respectively. The binary discrimination accuracies in training (97.02%, 100%) and five-fold CV (88.24%, 88.24%) achieved by the DTF and DTB methods suggest no obvious overfitting of the data. The average misclassification rates of five scrambled DTF and DTB models (39.43% and 39.05%) suggest that the original SPRs are relevant and rule out chance correlation. Among the three descriptors, TIAC in DTB and Gravto in DTF had the highest contributions, whereas Gravto in DTB and nnitro in DTF contributed the least (Figure 4). The TIAC is total information content on atomic composition of the molecule

Figure 4. Plot showing the contribution of input descriptors in semiquantitative SPR models.


and is calculated from the complete molecular formula. The Gravto descriptor is related to the effective mass distribution and describes molecular size dependent effects of the molecule.72 The nnitro is the count of nitrogen atoms in the molecule and is directly related to the explosivity of the compound. Both the DTF- and DTB-based semiquantitative SPR models yielded high discriminatory accuracies in training (97.62%, 98.81%) and test (100% in each) data (Table 2), whereas the MCC values yielded by both models in the training and test sets were close to unity (≥0.95). Further, both models yielded high sensitivity (>96%) and specificity (>96%) values in the training and test data. In this study, the AUC values of 0.983, 1.0 (DTF) and 0.982, 1.0 (DTB) in the training and test sets indicate a high discriminating ability of the semiquantitative models for the industrial and high explosive chemicals (Figure S1, Supporting Information). The values of all the diagnostic parameters obtained here are highly encouraging for both the training and test set compounds. The almost perfect discrimination (∼100%) of chemicals into two categories in the test data by the two semiquantitative SPR models (Table 2) may be attributed to the relevance of the structural features used in model development. The mean values of the selected descriptors for the compounds in the two categories (industrial and high explosives) were 2.83, 5.17 (nnitro); 38.22, 46.42 (TIAC); and 46.97, 66.54 (Gravto), respectively. The mean values of all the descriptors were higher for the high explosive compounds. Further, in DTF two compounds (chlorotrinitrobenzene, diazodinitrophenol) and in DTB only one compound (picryl chloride) were misclassified in the training data, whereas in the test data both models gave a perfect discrimination of the compounds into two categories. From the results, it is evident that both the qualitative (ternary) and semiquantitative SPR models (DTF and DTB) identified the most relevant structural features, which enabled the developed SPRs to discriminate chemicals among different categories, thus lending high statistical confidence to their future use for screening chemicals.

Table 2. Performance Parameters for the Semiquantitative SPR Models

| model | data set | sensitivity (%) | specificity (%) | accuracy (%) | MCC |
|---|---|---|---|---|---|
| DTF-SPR | training | 100.00 | 96.61 | 97.62 | 0.95 |
| DTF-SPR | test | 100.00 | 100.00 | 100.00 | 1.00 |
| DTB-SPR | training | 96.43 | 100.00 | 98.81 | 0.97 |
| DTB-SPR | test | 100.00 | 100.00 | 100.00 | 1.00 |

3.3. Quantitative SPR Modeling (Tier-3). The quantitative SPR models (DTF and DTB) were established for tier-3 screening of chemicals, enabling the user to predict the DV of the ideal explosive chemicals and the flammability (LFL) of the nonideal ones.

3.3.1. Quantitative QSPR Models for Ideal Explosives. The optimal SPR models (DTF, DTB) for ideal explosives were established with five descriptors (nnitro, MR, noxy, UI, L1m), which encode the structural features of the ideal explosive chemicals. The total number of trees in series, the maximum depth of any tree, and the average number of group splits were 275, 10, and 38.3 in DTF and 471, 6, and 57.3 in DTB, respectively. The RMSE values in training and five-fold CV were 0.43, 0.84 (DTF) and 0.27, 0.93 (DTB), respectively. In the Y-randomization test, the respective values of mean R² and cR²p for these QSPR models derived through five-fold CV were 0.010, 0.935 and 0.016, 0.964, which suggests that the original models are unlikely to arise from chance correlation. The optimal SPR models (DTF, DTB) captured 89.68%, 96.12% and 87.27%, 95.71% of the variance in the training and test data, respectively. The proportion of the variance explained by the model descriptors is a measure of the closeness of the predicted and actual values of the response. A model whose predicted end point values exactly match the measured ones would explain 100% of the variance in the data. Among the five descriptors, nnitro had the highest contribution in both models (Table 3). Moreover, all five descriptors considered here have positive correlations with the end point property (DV). Further, to examine the influence of different descriptors on DV, 3D plots were constructed (Figure 5). It was done by

Figure 5. 3D plots of selected descriptors showing interaction trends between (a) nnitro and noxy (b) MR and UI, and (c) UI and L1m in the DV prediction of ideal explosives.

Table 3. Contribution of Selected Molecular Descriptor in Quantitative SPR Modeling

           DV-QSPR model                                  LFL-QSPR model
models     nnitro    L1m      UI       MR       noxy      nta       MR       Aweight
DTF-SPR    100.00    80.88    62.12    61.61    54.12     100.00    69.22    98.04
DTB-SPR    100.00    75.47    29.98    70.16    46.93     100.00    64.74    49.95

DOI: 10.1021/acs.iecr.5b03575 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX


Industrial & Engineering Chemistry Research

Table 4. Performance Parameters for the Quantitative SPR Models for Ideal Explosives

models     data set    RMSE    R²       Q²F1     Q²F2     Q²F3     CCC      r²m
DTF-SPR    training    0.41    0.940
           test        0.34    0.930    0.877    0.873    0.929    0.918    0.585
DTB-SPR    training    0.25    0.972
           test        0.20    0.971    0.958    0.957    0.976    0.976    0.821
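The external validation coefficients listed in Table 4 can be illustrated with a short sketch. The formulas below are the commonly used definitions of Q²F1, Q²F2, Q²F3, the concordance correlation coefficient (CCC), and one common form of r²m; they are assumed here for illustration and are not taken from the article's own software. All function names are hypothetical, and the numbers in the test cases are toy values, not the study data.

```python
from math import sqrt


def _mean(values):
    return sum(values) / len(values)


def _press(y_obs, y_pred):
    # Sum of squared prediction errors.
    return sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))


def q2_f1(y_test, y_pred, y_train):
    # Test-set errors scaled by test-set deviations from the TRAINING mean.
    ybar_train = _mean(y_train)
    return 1.0 - _press(y_test, y_pred) / sum((o - ybar_train) ** 2 for o in y_test)


def q2_f2(y_test, y_pred):
    # Test-set errors scaled by test-set deviations from the TEST mean.
    ybar_test = _mean(y_test)
    return 1.0 - _press(y_test, y_pred) / sum((o - ybar_test) ** 2 for o in y_test)


def q2_f3(y_test, y_pred, y_train):
    # Mean squared test error scaled by the training-set variance.
    ybar_train = _mean(y_train)
    mse_test = _press(y_test, y_pred) / len(y_test)
    var_train = sum((o - ybar_train) ** 2 for o in y_train) / len(y_train)
    return 1.0 - mse_test / var_train


def ccc(y_obs, y_pred):
    # Concordance correlation coefficient: agreement with the 1:1 line.
    mo, mp = _mean(y_obs), _mean(y_pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(y_obs, y_pred))
    so = sum((o - mo) ** 2 for o in y_obs)
    sp = sum((p - mp) ** 2 for p in y_pred)
    return 2.0 * cov / (so + sp + len(y_obs) * (mo - mp) ** 2)


def rm2(y_obs, y_pred):
    # Assumed form: r2 * (1 - sqrt(|r2 - r0^2|)), where r0^2 comes from
    # regressing observed on predicted values through the origin.
    mo, mp = _mean(y_obs), _mean(y_pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(y_obs, y_pred))
    so = sum((o - mo) ** 2 for o in y_obs)
    sp = sum((p - mp) ** 2 for p in y_pred)
    r2 = cov ** 2 / (so * sp)
    k = sum(o * p for o, p in zip(y_obs, y_pred)) / sum(p * p for p in y_pred)
    r0_2 = 1.0 - sum((o - k * p) ** 2 for o, p in zip(y_obs, y_pred)) / so
    return r2 * (1.0 - sqrt(abs(r2 - r0_2)))
```

Q²F1 and Q²F2 differ only in which mean is used as the reference, whereas Q²F3 normalizes by the training-set variance and therefore does not depend on the spread of the test set.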

varying some selected inputs at a time and considering all the others constant. Figure 5, panel a presents the relationship of nnitro and noxy with DV. It is evident that both the descriptors have a positive influence on DV, as its value increases with an increase in either descriptor. The nnitro and noxy are the nitrogen and oxygen atom count descriptors. Since the HEMs are basically composed of N and O, the descriptors (nnitro and noxy) have a direct bearing on their detonation property. The nnitro (r = 0.51) and noxy (r = 0.34) have a significant (p < 0.05) positive correlation with the end point property. Figure 5, panel b demonstrates the combined effect of the MR (molar refractivity) and UI (unsaturation index) descriptors on DV. These descriptors also exhibited positive influences on the end point property. The MR reflects the effect of size, polarizability, and steric bulk of the molecules,73 whereas UI is a measure of the degree of unsaturation of the molecule. These descriptors are hence directly related to the explosive nature of the molecule. Both of the descriptors (MR and UI) exhibited a positive correlation (r = 0.19 and 0.27, respectively) with the end point property. Figure 5, panel c shows the combined influence of the UI and L1m descriptors on the target property, and a positive influence of both these descriptors on detonation behavior is evident. L1m, a weighted holistic invariant molecular (WHIM) descriptor, represents the first component size directional WHIM index weighted by atomic masses.74 L1m is related to the composition of the molecule and has a positive correlation (r = 0.10) with the end point property. From the above discussion, it can be observed that all the descriptors involved in the quantitative SPR model have physical meaning and that these descriptors can account for structural features affecting the detonation property of the considered molecules.
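The interaction plots of Figure 5 are described as being obtained by varying selected inputs while holding all the others constant. A minimal sketch of that procedure is given below. The function `response_surface`, the stand-in model `toy_predict`, and the baseline descriptor values are hypothetical and purely illustrative; in particular, `toy_predict` is a made-up linear function, not the fitted DTF or DTB model.

```python
def response_surface(predict, baseline, name_a, values_a, name_b, values_b):
    """Evaluate predict() on a grid over two descriptors.

    All descriptors other than name_a and name_b keep their baseline values.
    Returns a nested list with one row per value of name_a.
    """
    surface = []
    for a in values_a:
        row = []
        for b in values_b:
            point = dict(baseline)  # copy so the baseline is never modified
            point[name_a] = a
            point[name_b] = b
            row.append(predict(point))
        surface.append(row)
    return surface


def toy_predict(descriptors):
    # Hypothetical stand-in for a trained model: DV rises with both counts.
    return 5.0 + 0.5 * descriptors["nnitro"] + 0.25 * descriptors["noxy"]


# Illustrative baseline (assumed values, not taken from the data set).
baseline = {"nnitro": 3, "noxy": 6, "MR": 40.0, "UI": 2.0, "L1m": 10.0}

grid = response_surface(toy_predict, baseline,
                        "nnitro", [0, 2, 4],
                        "noxy", [0, 4, 8])
for row in grid:
    print(row)
```

With a real model, `predict` would wrap the trained ensemble's prediction call, and the resulting grid could be passed to any 3D plotting routine.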
The information obtained on the relationships of the descriptors with the end point property could be useful in designing molecules with desired properties. To evaluate the external predictive power of the constructed quantitative SPR models for predicting the DVs of untested new chemicals, we derived several statistical coefficients (R²ext, CCC, Q²F1, Q²F2, Q²F3, and r²m) on the test data (Table 4), which were kept out during the model building phase. For all these coefficients, a value closer to unity indicates a better quality of the model for use in screening new chemicals. Here, the values of all the internal and external validation metrics for the developed SPR models were within their respective acceptable limits (except r²m in DTF) and suggest high predictive power of the constructed SPR models for new chemicals. Further, satisfactory values for all the metrics reflect small deviations of the predicted DV values from the corresponding experimental values, and the closely similar R² values in the training and test sets indicate that the distributions of the responses in the two sets are similar. An SPR model is considered acceptable when the value of R² in the external set exceeds 0.6.63 Further, the close agreement between the measured and model predicted responses of the chemicals for the constructed DTF and DTB SPR models (Figure 6) suggests that both the models performed reasonably well. The values of the validation

Figure 6. Plot showing the distribution of measured and model predicted DV values of ideal explosive chemicals (a) DTF SPR, and (b) DTB SPR models.

coefficients further suggest that the performances of the SPR models based on the DTF and DTB algorithms are closely comparable. Further, we compared our results with earlier reports.9,10 Wang et al.10 proposed multiple linear regression (MLR) and partial-least-squares regression (PLSR) based QSPR models for predicting the DV of explosive chemicals (n = 54) and reported respective R² values of 0.921 and 0.971 in test data. Although the results are comparable with the present study, the selected data set was limited. Gupta et al.9 recently proposed logistic regression (LR) and PLSR based models for DV of chemicals (n = 92) and reported respective R² values of 0.922 and 0.914 in the test set. Thus, the SPR models proposed in this study performed relatively better than the models based on chemometric approaches. Analysis of the variation between the predicted and experimental values of the target property may further help to understand the limitations of the SPR models. Analysis of the chemicals in the test set that were predicted with larger errors revealed that four chemicals (octogen, methylenedinitramine, 1,8-dinitronaphthalene, 4,4′-dinitro-3,3′-bifurazan) in DTF were predicted with an error of >0.5 unit, whereas none of the compounds was predicted by DTB with such an error. All the four compounds are neutral organics, and three of these (octogen, methylenedinitramine, 4,4′-dinitro-3,3′-bifurazan) belong to the high explosive category. We also compared the DV values of selected chemicals predicted by the DTB SPR model with those from the CHEETAH and BKW-EOS methods75,76 (Table 5). It is evident that the predictions yielded by the proposed QSPR model are relatively better than those of the empirical codes. Thus, the proposed SPR model can be used as a tier-3 tool for quantitative prediction of the DV values of chemicals.

Table 5. Comparison of DV Results from Empirical Codes

S. No.   explosive name                    DVexp (km/s)   DVDTB (km/s)   DeviationDTB (%)   DVCODE (km/s)   DeviationCODE (%)
1a       cyclotrimethylene trinitramine    8.85           8.71           1.55               8.64            2.37
2a       nitroglycerine                    7.60           7.48           1.61               7.70            −1.32
3a       nitroguanidine                    8.20           7.98           2.66               8.07            1.59
4a       picric acid                       7.35           7.13           2.94               7.03            4.35
5b       tetranitromethane                 6.36           6.53           −2.63              6.45            −1.40
6b       triaminotrinitrobenzene           7.35           7.57           −3.04              7.58            −3.13
7b       trinitrotoluene                   6.90           6.89           0.10               7.07            −2.46

a Keshavarz and Pouretedal.76 b Fried and Souers.75

3.3.2. Quantitative SPR Model for Nonideal Explosives. The DTF and DTB based SPR models for predicting the flammability of the nonideal explosive chemicals were developed using three descriptors (nta, MR, Aweight), and their optimal architectures have 210, 410 trees in series, 21, 8 maximum depth of tree, and 80, 179.9 number of average group splits, respectively. The two models (DTF, DTB) in training and five-fold CV have RMSE of 0.51, 0.71 and 0.36, 0.76, respectively, and the five-fold Y-scrambling test yielded the R² and cRp² values of 0.004, 0.947 and 0.004, 0.968, respectively. The results suggest that the original models are not the result of chance correlations. The optimal SPR models (DTF, DTB) in training and test data captured 93.98%, 95.97% and 93.81%, 96.43% variance in respective data. The nta in both the models exhibited the highest contribution (Table 3). Here, the structural features encoded by the nta, MR, and Aweight descriptors were found responsible for the flammability of the nonideal explosive chemicals. Further, the combined effects of different descriptors on the flammability of nonideal explosives used here were examined through constructing 3D plots (Figure 7). Figure 7, panel a shows the relationships of nta and MR with LFL, which clearly depicts a negative dependence of flammability on nta. The nta is the total atom count descriptor, and the MR reflects the effect of size, polarizability, and steric bulk of the molecules.73 Both of these descriptors are negatively correlated with the flammability.
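The Y-scrambling statistics quoted for the nonideal-explosive models can be illustrated with a short sketch. It assumes the widely used form cRp² = R·√(R² − Rr²), where Rr² is the mean R² of models refitted on permuted responses; with the training R² values of Table 6 (0.949 and 0.970) and Rr² = 0.004, this form gives about 0.947 and 0.968, which appears consistent with the values reported in the text. The function names are hypothetical, the choice of five permutation rounds is an assumption, and `pearson_r2` is only a single-descriptor stand-in for refitting a DTF or DTB model.

```python
import random
from math import sqrt


def pearson_r2(x, y):
    # Squared Pearson correlation; a simple stand-in for "refit and score".
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def scrambled_r2(x, y, score, n_rounds=5, seed=0):
    # Mean score obtained after randomly permuting the response vector.
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        y_perm = list(y)
        rng.shuffle(y_perm)
        scores.append(score(x, y_perm))
    return sum(scores) / len(scores)


def c_rp2(r2, r2_random):
    # Assumed definition: cRp2 = R * sqrt(R2 - Rr2); clipped at zero.
    return sqrt(r2) * sqrt(max(r2 - r2_random, 0.0))


print(round(c_rp2(0.949, 0.004), 3))  # DTF, training R2 from Table 6
print(round(c_rp2(0.970, 0.004), 3))  # DTB, training R2 from Table 6
```

A cRp² close to the original R² indicates that permuting the responses destroys the fit, which is the behavior expected of a model that does not rest on chance correlation.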
A higher atom count in a molecule reflects a larger molecular size and hence a lower flammability. A plot showing the combined influence of MR and Aweight on LFL (Figure 7b) suggests that the Aweight descriptor has a positive influence on the flammability, whereas MR has the opposite influence. The Aweight represents the average molecular weight of the molecule and is calculated as the molecular weight divided by the number of atoms. This descriptor is positively correlated with flammability. For both the SPR models, all the internal and external validation metrics (Table 6) were within their respective thresholds, suggesting high predictivity for new chemicals. Further, the close agreement between the measured and model predicted responses of the chemicals for the constructed DTF and DTB SPR models (Figure 8) suggests that both the models performed reasonably well. The values of the validation coefficients further suggest that the performances of the SPR models based on the DTF and DTB algorithms are closely comparable in predicting the flammability of the chemicals. Moreover, two test set compounds (n-butyl chloride, ethyl mercaptan) in DTF and one compound (n-butyl chloride) in DTB were predicted with errors larger than 0.5 unit. Both of these overpredicted compounds have low flammability (LFL < 2.8).

Figure 7. 3D plots of selected descriptors showing interaction trends between (a) nta and MR, and (b) MR and Aweight in the LFL prediction of nonideal explosives.

Table 6. Performance Parameters for the Quantitative SPR Models for Nonideal Explosives

models     data set    RMSE    R²       Q²F1     Q²F2     Q²F3     CCC      r²m
DTF-SPR    training    0.45    0.949
           test        0.19    0.945    0.958    0.938    0.989    0.969    0.893
DTB-SPR    training    0.37    0.970
           test        0.15    0.966    0.976    0.964    0.994    0.982    0.935

Figure 8. Plot showing the distribution of measured and model predicted LFL values of nonideal explosive chemicals (a) DTF SPR and (b) DTB SPR models.

3.4. Applicability Domain Analysis of the SPR Models. The ADs of the developed qualitative and semiquantitative SPR models were determined using the descriptors range approach. For the quantitative SPR models, three different approaches (descriptors range, leverage, and ED) were used to analyze the ADs. From the results (Table S3, Supporting Information), it is evident that all the test compounds were within the AD of the qualitative, semiquantitative, and quantitative SPR models. The ADs of the developed quantitative SPR models were also analyzed by the ED approach. Plots of normalized mean distance and end point property of the training and test set compounds are shown in Figure 9. It may be noted that all the test compounds for the ideal and nonideal explosives are inside the domain/area covered by the training set compounds. Further, the structurally influential (h > h*) and response outlier (standardized residual >3) compounds were identified using the Williams plots (Figure 10). Analysis of the Williams plot shows that in the case of the ideal explosives, four compounds (cyanuric triazide, dipentaerythritol hexanitrate, tetranitrodibenzo-1,3a,4,6a-tetrazapentalene, 2,6-bis(picrylamino)-3,5-dinitropyridine) were identified as structurally influential chemicals, whereas only two compounds (hexanitroethane, propane) in DTF and one compound (trinitronaphthalene) in DTB were detected as response outliers. In the case of the nonideal explosives, the quantitative SPR models identified nine high leverage compounds (n-hexadecane, p-terphenyl, n-butyl stearate, methylene chloride, trichloroethylene, ethyl bromide, propargyl bromide, allyl bromide, anthracene). Moreover, the DTF and DTB models rendered six (methylene chloride, vinyl chloride, dichloroethylene, trichloroethylene, isocrotyl bromide, carbon disulfide) and three (methylene chloride, trichloroethylene, carbon disulfide) compounds as the response outliers. The structures of the response outliers and structurally influential chemicals are given in Tables S4 and S5 (Supporting Information). The anomalous behavior of the compounds outside the ADs of the models may be due to some relevant structural features present in these molecules that could not be captured by the selected descriptors. The developed SPR models can be used to predict the explosive behavior of new compounds if they lie within the AD of the respective models. Moreover, since this study established quantitative relationships between the explored structural features of the molecules and their end point properties, compounds of desired characteristics can be developed by adjusting the values of the molecular descriptors.

Figure 9. Plot showing AD of the SPR models using the ED approach. (a) Ideal explosives and (b) nonideal explosives.

Figure 10. Williams plot for the (a) ideal explosives and (b) nonideal explosives.

3.5. Compliance with OECD Guidelines. In this study, a three-tier strategy (Figure 1) based on predictive qualitative, semiquantitative, and quantitative SPR modeling has been developed for screening the explosive chemicals in accordance with the OECD guidelines for QSAR development.77 First, the QSPR models developed here were based on chemical data sets that have well-defined end points, such as the DV and flammability (LFL) values of diverse chemicals established using standard experimental protocols. Second, we employed well established, unambiguous algorithms (DTF and DTB) for SPR model development and clearly described the procedural steps for the calculation of molecular descriptors, data splitting, and selection of relevant features for model development. According to the third OECD guideline, the ADs of the developed SPR models were determined using three different approaches, namely, the descriptor range, ED, and leverage methods. The proposed models were rigorously validated using the internal and external validation procedures, and the statistical quality of the models was ensured through deriving various stringent parameters (OECD guideline 4). To meet the requirement of the fifth OECD guideline, we provided a suitable mechanistic interpretation for the selected features responsible for the explosive behavior of the chemicals.
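The Williams-plot criteria used in Section 3.4 (leverage h > h* for structurally influential compounds and a standardized residual above 3 for response outliers) can be sketched as follows. The sketch assumes the usual hat-matrix leverage h = xᵀ(XᵀX)⁻¹x computed on the training descriptor matrix with an added intercept column, and the common warning value h* = 3(p + 1)/n; neither detail is stated in this section, so both are assumptions. All function names are hypothetical, and the numbers in the usage example are toy values.

```python
def invert(matrix):
    # Gauss-Jordan inversion with partial pivoting (small matrices only).
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("descriptor matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def leverages(train, query):
    # Leverage of each query compound relative to the training descriptors.
    x_train = [[1.0] + list(row) for row in train]  # intercept column (assumed)
    p = len(x_train[0])
    xtx = [[sum(r[i] * r[j] for r in x_train) for j in range(p)] for i in range(p)]
    inv = invert(xtx)
    result = []
    for row in query:
        x = [1.0] + list(row)
        result.append(sum(x[i] * inv[i][j] * x[j]
                          for i in range(p) for j in range(p)))
    return result


def warning_leverage(n_descriptors, n_train):
    # Assumed threshold h* = 3(p + 1)/n.
    return 3.0 * (n_descriptors + 1) / n_train


def classify(h, std_residual, h_star):
    # Returns (structurally influential?, response outlier?).
    return h > h_star, abs(std_residual) > 3.0


train = [[0.0], [1.0], [2.0], [3.0]]          # toy single-descriptor set
h_train = leverages(train, train)
h_new = leverages(train, [[10.0]])[0]         # compound far outside the range
h_star = warning_leverage(1, len(train))
print(h_train, h_new, h_star)
```

A compound with h > h* lies far from the centroid of the training descriptors, so its prediction is an extrapolation even when its residual is small.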

4. CONCLUSIONS

In this work, we have developed a three-tier SPR modeling strategy for screening chemicals for their explosive characteristics. Accordingly, DTF and DTB based qualitative, semiquantitative, and quantitative SPR models were established in accordance with the OECD guidelines for discriminating the nonexplosive, ideal, and nonideal explosive compounds; for discriminating the industrial and high explosive compounds; and for predicting the characteristic explosive properties (DV for the ideal and flammability for the nonideal explosives) of chemicals. Further, we have explored the significant characteristic structural features of a large number of chemicals that encode the chemical properties and have quantitative relationships with the end point properties. The validation metrics and predictive performance statistics derived for these SPR models lend high confidence to their use in screening new chemicals. We also provided a mechanistic interpretation for the association of the structural features with the end point properties of the chemicals. Comparison with previously published QSAR models for the same end point properties also showed the encouraging statistical quality of the SPR models proposed in this study. The proposed three-tier strategy can be used for the screening of chemicals in future HEM design and development processes and in safety assessment.





ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.iecr.5b03575. Data set used in SPR modeling; selected descriptors in discrimination modeling and quantitative SPR modeling; AD of the molecular descriptors in qualitative, semiquantitative, and quantitative SPR models; high leverage chemicals and response outliers in quantitative SPR models for ideal explosives; high leverage chemicals and response outliers in quantitative SPR models for nonideal explosives; DTF training set ROC plot in the semiquantitative SPR analysis; DTF test set ROC plot in the semiquantitative SPR analysis; DTB training set ROC plot in the semiquantitative SPR analysis; DTB test set ROC plot in the semiquantitative SPR analysis (PDF)

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]; [email protected].

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

The authors thank the Director, CSIR-Indian Institute of Toxicology Research, Lucknow (India) for his keen interest in this work and providing all necessary facilities.

REFERENCES

(1) Agrawal, J. P. High Energy Materials: Propellants, Explosives, and Pyrotechnics; Wiley-VCH Verlag GmbH & Co. KGaA: Weinheim, 2010. (2) Talawar, M. B.; Sivabalan, R.; Mukundan, T.; Muthurajan, H.; Sikder, A. K.; Gandhe, B. R.; Rao, A. S. Environmentally compatible next generation green energetic materials (GEMs). J. Hazard. Mater. 2009, 161, 589. (3) Nair, U. R.; Asthana, S. N.; Rao, A. S.; Gandhe, B. R. Advances in high energy materials. Def. Sci. J. 2010, 60, 137. (4) Agrawal, J. P. Recent trends in high energy materials. Prog. Energy Combust. Sci. 1998, 24, 1. (5) Keshavarz, M. H. Correlations for predicting detonation temperature of pure and mixed CNO and CHNO explosives. Indian J. Eng. Mater. Sci. 2005, 12, 158. (6) Wan, X.; Zhang, Q.; Shen, S. L. Theoretical estimation of the lower flammability limit of fuel-air mixtures at elevated temperatures and pressures. J. Loss Prev. Process Ind. 2015, 36, 13. (7) Coronado, C. J., Jr.; Carvalho, J. A.; Andrade, J. C.; Cortez, E. V.; Carvalho, F. S.; Santos, J. C.; Mendiburu, A. Z. Flammability limits: a review with emphasis on ethanol for aeronautical applications and description of the experimental procedure. J. Hazard. Mater. 2012, 241-242, 32. (8) Crowl, D. A.; Louvar, J. F. Chemical Process Safety: Fundamentals with Applications; 3rd ed.; Prentice Hall: Boston, MA, 2011; p 245. (9) Gupta, S.; Basant, N.; Singh, K. P. Identifying high energy molecules and predicting their detonation potency using chemometric modeling approaches. Combust. Theory Modell. 2015, 19, 451. (10) Wang, D.; He, G.; Chen, H. Prediction for the detonation velocity of the nitrogen-rich energetic compounds based on quantum chemistry. Russ. J. Phys. Chem. A 2014, 88, 2363. (11) Albahri, T. A. Flammability characteristics of pure hydrocarbons. Prepr. Pap.-Am. Chem. Soc., Div. Fuel Chem. 2003, 48, 683. (12) Gharagheizi, F. Quantitative structure-property relationship for prediction of the lower flammability limit of pure compounds. Energy Fuels 2008, 22, 3037. (13) Gharagheizi, F. A new group contribution-based model for estimation of lower flammability limit of pure compounds. J. Hazard. Mater. 2009, 170, 595. (14) Pan, Y.; Jiang, J.; Wang, R.; Cao, H.; Cui, Y. A novel QSPR model for prediction of lower flammability limits of organic compounds based on support vector machine. J. Hazard. Mater. 2009, 168, 962. (15) Lazzus, J. A. Neural network/particle swarm method to predict flammability limits in air of organic compounds. Thermochim. Acta 2011, 512, 150. (16) Bagheri, M.; Rajabi, M.; Mirbagheri, M.; Amin, M. Neural network/particle swarm method to predict flammability limits in air of organic compounds. J. Loss Prev. Process Ind. 2012, 25, 373. (17) Albahri, T. A. Prediction of the lower flammability limit percent in air of pure compounds from their molecular structures. Fire Saf. J. 2013, 59, 188. (18) Shekhar, H. Studies on Empirical Approaches for Estimation of Detonation Velocity of High Explosives. Cent. Eur. J. Energy Mater. 2012, 9, 39. (19) Roy, K.; Kar, S.; Das, R. N.
A Primer on QSAR/QSPR Modeling Fundamental Concepts. Springer Briefs in Molecular Science; Springer: New York, 2015; DOI 10.1007/978-3-319-17281-1. (20) Morrill, J. A.; Byrd, E. F. C. Development of quantitative structure−property relationships for predictive modeling and design of energetic materials. J. Mol. Graphics Modell. 2008, 27, 349. (21) Wang, R.; Jiang, J.; Pan, Y.; Cao, H.; Cui, Y. Prediction of impact sensitivity of nitro energetic compounds by neural network based on electrotopological-state indices. J. Hazard. Mater. 2009, 166, 155. (22) Xu, J.; Zhu, L.; Fang, D.; Wang, L.; Xiao, S.; Liu, L.; Xu, W. QSPR studies of impact sensitivity of nitro energetic compounds using three-dimensional descriptors. J. Mol. Graphics Modell. 2012, 36, 10. (23) Singh, K. P.; Gupta, S.; Kumar, A.; Mohan, D. Multi-species QSAR modeling for predicting aquatic toxicity of diverse organic chemicals for regulatory toxicology. Chem. Res. Toxicol. 2014, 27, 741. (24) Basant, N.; Gupta, S.; Singh, K. P. Predicting toxicities of structurally diverse chemical pesticides in multiple aquatic test species using QSTR modeling approaches. Chemosphere 2015, 139, 246. (25) Yang, P.; Yang, Y. H.; Zhou, B. B.; Zomaya, A. Y. A review of ensemble methods in bioinformatics. Curr. Bioinf. 2010, 5, 296. (26) Singh, K. P.; Gupta, S.; Basant, N. In silico prediction of cellular permeability of diverse chemicals using qualitative and quantitative QSAR modeling approaches. Chemom. Intell. Lab. Syst. 2015, 140, 61. (27) Singh, K. P.; Gupta, S. In silico prediction of toxicity of noncongeneric industrial chemicals using ensemble learning based modeling approaches. Toxicol. Appl. Pharmacol. 2014, 275, 198. (28) Singh, K. P.; Gupta, S. Nano-QSAR modeling for predicting biological activity of diverse nanomaterials. RSC Adv. 2014, 4, 13215. (29) OECD. Guidance on the Principle of Measure of Goodness of Fit, Robustness, and Predictivity, Guideline no. 
ENV/JM/MONO(2007)2; OECD: Paris, France, 2007; Chapter 5, pp 42−65. (30) Kuchta, J. M. Investigation of Fire and Explosion Accidents in the Chemical, Mining, and Fuel-Related Industries: A Manual; UNT Digital Library: Washington DC, 1985. http://digital.library.unt.edu/ark:/67531/metadc12822/ (accessed 10 June 2015).

(31) Zeman, V.; Kočí, J.; Zeman, S. Electric spark sensitivity of polynitro compounds Part II: A correlation with detonation velocity of some polynitro arenes. Energy Mater. 1999, 7, 127. (32) Thangadurai, S.; Kartha, K. P. S.; Sharma, D. R.; Shukla, S. K. Review of some newly synthesized high energetic materials. Sci. Tech. Energy Mater. 2004, 65, 215. (33) Meyer, R.; Köhler, J.; Homburg, A. Explosives, 6th ed.; WileyVCH & Co. KGaA: Weinheim, 2007. (34) Trzciński, W. A.; Cudziło, S.; Chyłek, Z.; Szymańczyk, L. Detonation Properties of 1,1-diamino-2,2-dinitroethene (DADNE). J. Hazard. Mater. 2008, 157, 605. (35) Methyl Ethyl Ketone Peroxide. http://www.worldlibrary.in/ articles/methyl_ethyl_ketone_peroxide (accessed 3 June 2015). (36) Türker, L. Velocity of Detonation-A Mathematical Model. Acta. Chim. Slov. 2010, 57, 288. (37) Türker, L. A Trigonometric Approach to a Limiting Law on Detonation Velocity. MATCH Commun. Math. Comput. Chem. 2012, 67, 127. (38) Zhang, Q.; Chang, Y. Prediction of Detonation Pressure and Velocity of Explosives with Micrometer Aluminum Powders. Cent. Eur. J. Energy Mater. 2012, 9, 77. (39) Cao, D. S.; Xu, Q. S.; Hu, Q. N.; Liang, Y. Z. ChemoPy: freely available python package for computational biology and chemoinformatics. Bioinformatics 2013, 29, 1092. (40) Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Model. 1988, 28, 31. (41) ChemSpider. www.chemspider.com (accessed 4 July 2015). (42) Pubchem. http://pubchem.ncbi.nlm.nih.gov/compound/www. chemspider.com (accessed 22 November 2015). (43) ChemMop. http://www.scbdd.com/mopac-optimization/ optimize/www.chemspider.com (accessed 4 July 2015). (44) Reitermanov, Z. Data splitting. WDS’10 Proceedings of Contributed Papers, Part I, 2010, pp 31−36. (45) Basant, N.; Gupta, S.; Singh, K. P. 
Predicting toxicities of diverse chemical pesticides in multiple avian species using tree-based QSAR approaches for regulatory purposes. J. Chem. Inf. Model. 2015, 55, 1337. (46) Singh, K. P.; Gupta, S.; Rai, P. Predicting acute aquatic toxicity of structurally diverse chemicals in fish using artificial intelligence approaches. Ecotoxicol. Environ. Saf. 2013, 95, 221. (47) Explosives and Detonators. https://miningandblasting.files.wordpress.com/2009/09/explosives-and-detonators.pdf (accessed 23 July 2015). (48) Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123. (49) Friedman, J. H. Stochastic gradient boosting. Comput. Stat. Data Anal. 2002, 38, 367. (50) Singh, K. P.; Gupta, S.; Mohan, D. Evaluating influences of seasonal variations and anthropogenic activities on alluvial groundwater hydrochemistry using ensemble learning approaches. J. Hydrol. 2014, 511, 254. (51) DTREG. www.dtreg.com, 2010. (52) Rücker, C.; Rücker, G.; Meringer, M. Y-Randomization and its variants in QSPR/QSAR. J. Chem. Inf. Model. 2007, 47, 2345. (53) Mitra, I.; Saha, A.; Roy, K. Exploring quantitative structure−activity relationship studies of antioxidant phenolic compounds obtained from traditional Chinese medicinal plants. Mol. Simul. 2010, 36, 1067. (54) Cooper, J. A.; Saracci, R.; Cole, P. Describing the validity of carcinogen screening test. Br. J. Cancer 1979, 39, 87. (55) Cheng, F.; Shen, J.; Yu, Y.; Li, W.; Liu, G.; Lee, P. W.; Tang, Y. In silico prediction of tetrahymena pyriformis toxicity for diverse industrial chemicals with substructure pattern recognition and machine learning methods. Chemosphere 2011, 82, 1636. (56) Fawcett, T. An introduction to ROC analysis. Pattern Recogn. Lett. 2006, 27, 861. (57) Lin, L. I. Assay validation using the concordance correlation coefficient. Biometrics 1992, 48, 599. (58) Shi, L. M.; Fang, H.; Tong, W.; Wu, J.; Perkins, R.; Blair, R. M.; Branham, W. S.; Dial, S. L.; Moland, C. L.; Sheehan, D. M. QSAR models using a large diverse set of estrogens. J. Chem. Inf. Model. 2001, 41, 186. (59) Schuurmann, G.; Ebert, R.; Chen, J.; Wang, B.; Kuhne, R. External validation and prediction employing the predictive squared correlation coefficient test set activity mean vs training set activity mean. J. Chem. Inf. Model. 2008, 48, 2140. (60) Consonni, V.; Ballabio, D.; Todeschini, R. Comments on the definition of the Q2 parameter for QSAR validation. J. Chem. Inf. Model. 2009, 49, 1669. (61) Roy, K.; Chakraborty, P.; Mitra, I.; Ojha, P. K.; Kar, S.; Das, R. N. Some case studies on application of “rm2” metrics for judging quality of quantitative structure−activity relationship predictions: Emphasis on scaling of response data. J. Comput. Chem. 2013, 34, 1071. (62) Chirico, N.; Gramatica, P. Real external predictivity of QSAR models: Part 2. New intercomparable thresholds for different validation criteria and the need for scatter plot inspection. J. Chem. Inf. Model. 2012, 52, 2044. (63) Tropsha, A.; Golbraikh, A.; Cho, W. J. Development of kNN QSAR models for 3-arylisoquinoline antitumor agents. Bull. Korean Chem. Soc. 2011, 32, 2397. (64) Netzeva, T. I.; Worth, A. P.; Aldenberg, A.; Benigni, R.; Cronin, M. T. D.; Gramatica, P.; Jaworska, J. S.; Kahn, S.; Klopman, G.; Marchant, C. A.; Myatt, G.; Nikolova-Jeliazkova, N.; Patlewicz, G. Y.; Perkins, R.; Roberts, D. W.; Schultz, T. W.; Stanton, D. P.; van de Sandt, J. J. M.; Tong, W.; Veith, G.; Yang, C. Current status of methods for defining the applicability domain of (quantitative) structure-activity relationship. Altern. Lab. Anim. 2005, 33, 155. (65) Gramatica, P. Principles of QSAR models validation: internal and external. QSAR Comb. Sci. 2007, 26, 694. (66) Puzyn, T.; Rasulev, B.; Gajewicz, A.; Hu, X.; Dasari, T. P.; Michalkova, A.; Hwang, H.
M.; Toropov, A.; Leszczynska, D.; Leszczynski, J. Using nano-QSAR to predict the cytotoxicity of metal oxide nanoparticles. Nat. Nanotechnol. 2011, 6, 175. (67) Ghorbanzad'e, M.; Fatemi, M. H.; Karimpour, M.; Andersson, P. L. Quantitative and qualitative prediction of corneal permeability for drug-like compounds. Talanta 2011, 85, 2686. (68) Todeschini, R.; Consonni, V. Handbook of Molecular Descriptors; Wiley−VCH: Weinheim, 2000. (69) Contrera, J. F.; MacLaughlin, P.; Hall, L. H.; Kier, L. B. QSAR modeling of carcinogenic risk using discriminant analysis and topological molecular descriptors. Curr. Drug Discovery Technol. 2005, 2, 55. (70) Martínez-Martínez, F. J.; Razo-Hernández, R. S.; PerazaCampos, A. L.; Villanueva-García, M.; Sumaya-Martínez, M. T.; Cano, D. J.; Gómez-Sandoval, Z. Synthesis and in vitro antioxidant activity evaluation of 3-carboxycoumarin derivatives and QSAR study of their DPPH• radical scavenging activity. Molecules 2012, 17, 14882. (71) Fjodorova, N.; Vracko, M.; Novic, M.; Roncaglioni, A.; Benfenati, E. New public QSAR models for carcinogenicity. Chem. Cent. J. 2010, 4, S3. (72) Katritzky, A. R.; Sild, S.; Karelson, M. General Quantitative Structure−Property Relationship Treatment of the Refractive Index of Organic Compounds. J. Chem. Inf. Model. 1998, 38, 840. (73) Nikaljea, A. P. G.; Pathan, M.; Narute, A. S.; Ghodke, M. S.; Rajani, D. Synthesis and QSAR Study of Novel N-(3-chloro-2-oxo-4substituted azetidin-1-yl) isonicotinamide derivatives as Anti mycobacterial Agents. Der. Pharmacia. Sinica 2012, 3, 229. (74) Todeschini, R.; Consonni, V. Descriptors from Molecular Geometry. Handbook of Chemoinformatics: From Data to Knowledge in 4 Vols; Gasteiger, J., Ed.; Wiley-VCH: Verlag GmbH: Weinheim, Germany, 2003. DOI: 10.1002/9783527618279.ch37. (75) Fried, L.; Souers, P. CHEETAH: A Next Generation Thermochemical Code; Lawrence Livermore National Laboratory: CA, USA, UCRL-ID-117240, 1994. (76) Keshavarz, M. H.; Pouretedal, H. R. 
Predicting the detonation velocity of CHNO explosives by a simple method. Propellants, Explos., Pyrotech. 2005, 30, 105.

(77) Fjodorova, N.; Novich, M.; Vrachko, M.; Smirnov, V.; Kharchevnikova, N.; Zholdakova, Z.; Novikov, S.; Skvortsova, N.; Filimonov, D.; Poroikov, V.; Benfenati, E. Directions in QSAR Modeling for regulatory uses in OECD Member Countries, EU and in Russia. J. Environ. Sci. Health, Part C Environ. Carcinog. Ecotoxicol. Rev. 2008, 26, 201.
