Physics-Based Modeling of Chemical Hazards in a Regulatory

Jun 14, 2016 - A semiempirical model based on simple physical assumptions is rigorously compared to a combination of two recent state-of-the-art quant...
2 downloads 15 Views 705KB Size
Correlation pubs.acs.org/IECR

Physics-Based Modeling of Chemical Hazards in a Regulatory Framework: Comparison with Quantitative Structure−Property Relationship (QSPR) Methods for Impact Sensitivities Didier Mathieu* CEA, DAM, Le Ripault, F-37260 Monts, France S Supporting Information *

ABSTRACT: A semiempirical model based on simple physical assumptions is rigorously compared to a combination of two recent state-of-the-art quantitative structure−property relationship (QSPR) methods for impact sensitivities of nitroaliphatic and nitramine compounds. For most datasets considered, it yields slightly better predictions than QSPR schemes, which is noteworthy, considering the fact that it relies only on three adjustable parameters and an equation developed independently of the data. Further advantages of this approach include physical interpretability, enhanced robustness, and wider applicability. Therefore, there is no doubt that such physics-based models provide valuable alternatives to the purely empirical relationships usually employed in regulatory contexts, especially in situations where experimental data are scarce. optimistic estimates of prediction.2 On the other hand, the OECD criteria may be difficult to satisfy. This is especially true for point (3), because of the lack of universally applicable procedures to define the applicability domain (AD) of a model.3 Moreover, users of the model are likely to need predictions for arbitrary compounds, not only for those for which the model is deemed reliable. In fact, better alternatives that would reduce the ambiguity in AD estimations are actively being researched.4 The current focus of regulation agencies on QSPR approaches is understandable, as these techniques are especially well-suited to develop accurate predictive models from large amounts of empirical data. For major aspects of chemical risk, such as toxicity, a huge number of assays have been carried out.5−7 On the other hand, it is difficult to develop theoretically based approaches in view of the inherent complexity of biological processes.8 Nevertheless, whenever an explicit mechanism of action (MOA) has been presented, it is usually worthwhile to take it into consideration, using a physics-based mathematical modeling rather than an equation derived from a statistical evaluation of candidate variables or formalisms.9,10 While the determination of the relevant mechanisms can be extremely difficult for biological activities or toxicological end points, making assumptions regarding the underlying physics, in principle, is easier for the physicochemical properties of chemicals, because they are not dependent on the complex interactions that characterize biological functions. As a matter of

1. INTRODUCTION The early identification of the hazardous properties of chemical substances is a challenge in the frame of industrial safety management. Predictive tools are increasingly used for this purpose. Their development is currently stimulated by the needs arising from newly implemented regulatory frameworks. It is usually based on well-established procedures to derive quantitative structure−property relationships (QSPRs). This approach involves broadly two distinct steps. The first one is the definition of a pool of molecular descriptors and the computation of the corresponding values for every compound in the dataset. The second step is the identification of a relationship linking a property of interest to these descriptors, using statistical or machine-learning techniques. To promote this approach in a regulatory context, the Organisation for Economic Co-operation and Development (OECD) proposed in 2004 that, for every predictive model, the authors should report (1) a defined end point; (2) an unambiguous algorithm; (3) a defined domain of applicability; (4) appropriate measures of goodness-of-fit, robustness, and predictivity; and, (5) whenever possible, a mechanistic interpretation.1 Such a validation of QSPR procedures is clearly desirable in view of their empirical character and the possibility of chance correlations. Accordingly, it is expected that routine applications of procedures designed with the OECD principles in mind will lead to a large number of models compliant with these principles in the forthcoming years. Nevertheless, it must be kept in mind that, although consistency with the OECD principles enforces a minimal quality for QSPR models, it cannot ensure their actual predictive value. It is well-established that standard procedures to evaluate the latter on the basis of a random external test set may produce © XXXX American Chemical Society

Received: April 20, 2016 Revised: May 27, 2016 Accepted: June 14, 2016

A

DOI: 10.1021/acs.iecr.6b01536 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Correlation

Industrial & Engineering Chemistry Research

0.75 (N = 67), depending on the extent of data curation.30 Therefore, the question arises of whether one should favor a specialized QSPR model rather than our more general approach to estimate the sensitivity of compounds belonging to the main families of explosives. To answer this question, the present paper reports a rigorous comparison of the relative performance of the two approaches. More specifically, our model is compared with two recent OECD-compliant predictive schemes that were focused, respectively, on nitroaliphatic compounds32 and nitramine compounds,33 hereafter denoted respectively by C-NO2 and N-NO2. The same training and test sets are used for both approaches, as required for meaningful results. The comparison relies on standard criteria, including determination coefficients R2 (derived from the training set) and Q2 (derived from a leaveone-out cross-validation) to characterize quality of fit and robustness. The predictive value of the models is characterized by the determination coefficient RX2 derived from data predicted for an external test set. However, this criterion does not account for the possibility that log(h50) values for the test set might be systematically underestimated (or overestimated). Therefore, we also consider the root-mean-square error (RMSE) between the calculated and observed data. In particular, the RMSEX value calculated for external test sets is used as the main indicator of the predictive value of the models studied. These criteria are consistent with the most recent recommendations.34

fact, such models based on equations derived from physical considerations have been shown to systematically outperform QSPR methods for many properties, including densities11 and flash point12 of liquids, melting points13 and sublimation enthalpies14 of crystals, Gurney velocities of explosives,15 and heats of decomposition of organic substances.16−18 On the other hand, growing safety concerns and recent regulations, including the European Union (EU) REACH legislation, require the evaluation of more-complex physicochemical properties, such as dust explosivity19 or the impact sensitivity of chemical energetic substances. The latter is most often characterized using the so-called drop weight impact test, which measures the height h50 that a given weight must be dropped onto the sample to trigger an observable decomposition with a 50% probability.20 Because h50 typically ranges over 3 orders of magnitude (from a few centimeters to a few meters), one is mostly interested in log(h50). For several decades, the prediction of this property for new molecules has mainly relied on purely empirical QSPR relationships, as summarized in recent reviews.21−24 Despite much effort, these predictive methods remain limited in scope, except for the QSPR introduced in 2008 by Morrill and Byrd, which may be applied to most energetic compounds of interest, although its predictive value is rather deceptive, in view of the advanced procedure used and given the fact that an extended pool of descriptors was considered.25 Given the relatively small amount of data at hand, recent QSPR approaches focused on the interpretability of the resulting models might seem to be more attractive than such systematic searches for statistically satisfactory relationships, based on many arbitrary descriptors.26 Alternatively, we may follow a physically grounded approach, assuming that the rate-determining step in the decomposition process is associated with the probability that reactive centers will trigger the decomposition of surrounding molecules before the energy released by the reactions dissipate away. For nitro compounds, the additional assumption that sensitivity is determined by the weakest X−NO2 bond led to the introduction of a theoretical sensitivity index that proves to be significantly correlated with observed sensitivities.27,28 In the next step, all explosophore bonds were taken into consideration, rather than focusing on the weakest. This led to the present model, first restricted to nitro compounds29 and more recently generalized to other organic materials.30 Similar to the QSPR of Morrill and Byrd,25 this latter parametrization may be applied to all main classes of nonionic explosive substances. However, a comparison based on a common external test set reveals that it yields significantly improved predictions.29 This semiempirical model (hereafter referred to as the SE model) appears as an interesting alternative to QSPR methods to estimate impact sensitivities for regulatory purposes, at least when a generally applicable method is needed, as is often the case in practice. Because the QSPR approach relies on the fit of the property surface against close data points in the descriptors space, it should perform best for specific classes of similar compounds for which extensive experimental data are available. With regard to impact sensitive energetic compounds, this is the case for nitroaromatics or nitroaliphatic compounds. In fact, recent QSPR approaches that were focused on such restricted chemical families seem to perform remarkably well, as reflected by values of the determination coefficient for an external test set (RX2) close to 0.9 for the best models.31−33 By comparison, external test set predictions using the most general parametrization of our model yields RX2 values ranging from 0.69 (N = 74 entries) to

2. DATASET All h50 values presently used were obtained with the help of the same experimental apparatus, namely, the ERL Type 12 tooling with a 2.5 kg impact hammer.20 For the sake of comparison with QSPR methods, two datasets are first considered, namely, 50 CNO2 compounds studied by Prana et al.32 and 60 N-NO2 compounds studied by Fayet et al.33 The same partitions of these data into training and test sets are considered. In a second step, a more stringent comparison is carried out by considering 19 additional entries from the database reported in our recent paper30 and not considered in QSPR studies.32,33 These new data include 3 C-NO2 and 14 N-NO2 compounds included in a new external test set, as well as two additional nitramines that were deliberately excluded from presently reported statistics, because they carry unusual functional groups (NF2 and −NO) and should therefore be considered as lying outside the scope of the present methods. These two outliers are referenced here as HNFX and MNX and are shown in Figure 1. 3. MODELS 3.1. QSPR Approach. The QSPR methodology yields many different models, depending on many factors, including the composition of the training set, the pool of descriptors considered, and the specific algorithm employed to select a

Figure 1. Structures of 3,3,7,7-tetrakis(difluoramino)octahydro-1,5dinitro-1,5-diazocine (HNFX) and hexahydro-1-nitroso-3,5-dinitro1,3,5-triazine (MNX). B

DOI: 10.1021/acs.iecr.6b01536 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Correlation

Industrial & Engineering Chemistry Research

where kB is the Boltzmann constant, NA the number of atoms on a molecule, and ZN a kinetic prefactor.30 Di and Dj are the dissociation energies of the C−NO2 and N−NO2 bonds, respectively. They approximate the corresponding activation energies for homolytic cleavage of these covalent linkages. Equation 4 yields the rate of a thermally activated process at temperature Te arising from the energy released by decomposed molecules:

subset of relevant descriptors as input variables to the model. For C-NO2 compounds, the three OECD compliant models recently reported yield good fits (0.88 < R2 < 0.90).32 In addition, they prove quite robust (0.85 < Q2 < 0.90). However, they differ significantly regarding their predictive value, with 0.73 < RX2 < 0.88 and 0.19 < RMSEX < 0.29. In fact, two models are very close in performance, one of them being extremely simple, because it is dependent only on constitutive descriptors:32 ⎛n ⎞ log h50 = 1.94 − 2.53⎜ N ⎟ − 0.25n NO2 + 0.07nsingle ⎝ NA ⎠

⎛ Ec ⎞ kBTe = η⎜ ⎟ ⎝ 3NA /2 ⎠

(1)

where Ec is the energy content of a molecule, i.e., the amount of energy that it may release upon decomposition. Its value is estimated using the H2O−CO2 arbitrary,35 neglecting the formation enthalpy ΔfH0 of the unreacted material and resorting to experimental formation enthalpies compiled in the NIST WebBook database36 for the decomposition products.29 Since kpr is defined up to a multiplicative constant, we may arbitrarily set ZC = 1 for prefactors associated with C−NO2 bonds. On the other hand, standard Di data derived from density functional theory are used, as described in ref 29. For completeness, the values involved in the present study are reported in Table S2 in the Supporting Information. In contrast to the couple of QSPR models represented by eqs 1 and 2, which require a total of 9 adjustable parameters to estimate h50 for present compounds, the SE model predicts these data on the basis of a single equation involving only 3 adjustable parameters, namely, kc, η, and ZN (prefactor associated with the homolytic scission of N−NO2 bonds). These parameters have already been optimized, along with others, in a general parametrization of this model against many different families of explosive compounds.30 In what follows, this former parametrization is applied to the present dataset. However, for a rigorous comparison with QSPR schemes, the model is presently reoptimized against the same training sets. For this purpose, we use the Levensberg-Marquardt algorithm implemented in the python-lmfit package.37

where the descriptors are as follows: nN, the number of nitrogen atoms; nNO2, the number of nitro groups; nsingle, the number of single bonds; and NA, the total number of atoms in the molecule. For N-NO2 compounds, a series of QSPR models issued from 17 partitions of the dataset of 60 compounds have been reported.33 Each of them is associated with a simple linear equation, in terms of straightforward two-dimensional (2D) molecular descriptors. In their article, Fayet et al. focused on the four models associated with partitions numbered 2, 6, 9, and 13, since those are the ones that yield the best overall performances, among the 17 equations studied.33 The present work considers these four models, especially the one derived from partition 9, since it yields the best predictions for the corresponding external test set of 20 compounds (highest determination coefficient RX2 = 0.90 and the lowest root-mean-square error RMSEX = 0.14 log units). This especially successful QSPR model is represented by the following equation:33 log h50 = 0.94 − 0.017OB + 86.3

nCO + 0.14nC − O − C − 0.21nCO Mw

(2)

The descriptors involved are as follows: Mw, molecular weight; OB, oxygen balance; nCO, the number of CO bonds; and nC−O−C, the number of C−O−C fragments. A different equation is obtained for each of the other 16 partitions considered in the QSPR approach. 3.2. Semiempirical Model. In contrast to the abovementioned QSPR approaches, our semiempirical model assumes a unique equation for all classes of chemical substances, as long as the sensitivity is dependent on the rate of reaction associated with explosophores moieties. Being obtained by considering limiting cases,30 the analytic form of the dependence of h50 on the corresponding rate constant kpr is not dependent on the training set or the pool of descriptors considered: ⎛ k ⎞4 h50 (cm) = ⎜⎜ c ⎟⎟ ⎝ k pr ⎠

4. RESULTS 4.1. Original Parametrization of the SE Model. As it stands, the general SE model described in ref 30 is first applied to present datasets with no reoptimization of the three parameters involved. The results are summarized and compared to previous ones in Table 1. Because of their specialized character, the QSPR models cannot be applied to the test sets considered in ref 30. However, it is interesting to note in Table 1 that the correlation between

(3)

Table 1. Determination Coefficient (R2) and Root-MeanSquare Error (RMSE) Resulting from the Application of Published Models to the Nitroaliphatic and Nitramine Datasets Introduced Respectively in Refs 32 and 33, and to the Diverse Test Sets Used in Ref 30

The exponent in this power law expression was fixed to 4 empirically.29 A fitting constant kc is clearly required to satisfy dimensional homogeneity. For the presently considered specific case of C−NO2 and N−NO2 compounds, the decomposition process arises from contributions associated with C−NO2 and N−NO2 bonds, respectively. Labeling these two types of contributions with the respective indices i and j, the general expression previously introduced for the rate constant kpr characterizing the propagation of the decomposition becomes k pr =

⎡ ⎛ D ⎞ 1 ⎢ exp⎜ − i ⎟ + ∑ NA ⎢⎣ i ⎝ kBTe ⎠

Dj ⎞⎤ ⎟⎥ ⎝ kBTe ⎠⎥⎦

dataset

nitroaliphatic

nitramine

number of entries 50 60 QSPR Models (from refs 32 and 33) R2 0.86 0.89 RMSE 0.16 0.12 SE Model (from ref 30) R2 0.82 0.88 RMSE 0.19 0.14



∑ ZN exp⎜− j

(5)

(4) C

ref 30 test sets 74−67

0.69−0.75 0.25−0.22

DOI: 10.1021/acs.iecr.6b01536 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Correlation

Industrial & Engineering Chemistry Research predicted and observed sensitivities is much better for the C− NO2 and N−NO2 datasets introduced in previous QSPR studies, compared to that obtained for the heterogeneous datasets considered to explore the limitations of the SE model.30 This difference is understandable, since the most serious errors with the latter stem from aromatic compounds, for which the sensitivity is sometimes underestimated as a probable consequence of overlooked decomposition pathways. For the C−NO2 and N−NO2 datasets previously used to develop QSPR models,32,33 the impact sensitivities thus obtained are only slightly poorer than the QSPR predictions. Indeed, upon going from QSPR equations to the SE model, RMSE values increase from 0.16 to 0.19 for C−NO2 and from 0.12 to 0.14 for N−NO2 compounds. There are several reasons why QSPR models might be expected to perform better: (1) approximately two-thirds of the data were used to fit these models; (2) a distinct model is used for each family of compounds; and (3) a total of nine adjustable parameters are used, versus three for the SE model. The fact that the performances of the QSPR and SE approaches actually prove to be very similar warrants a rigorous comparison between them, based on exactly the same training and test sets, as done in the following examination. 4.2. Parametrization against the Nitroaliphatic Dataset. For the C−NO2 dataset, the SE model involves only two adjustable parameters, namely, kc and η. It is presently reoptimized and validated using the training and test sets employed in the QSPR study.32 The resulting procedure is compared to various QSPR models in Table 2. The quality of fit

Table 3. Comparison of Present Semiempirical (SE) Results with the Most Successful QSPR Modelsa model

N

R2

Q2

RX2

RMSEX

eq 1 ref 32 ref 32 (best)b SE model

4 5 5 2

0.88 0.90 0.93 0.82

0.85 0.87 0.90 0.79

0.81 0.73 0.88 0.84

0.22 0.29 0.19 0.17

R2

Rcv2

RX2

RMSEX

5 3

0.89 0.92

0.83 0.88

0.84 0.92

0.19 0.10

3 3

0.85 0.89

0.81 0.87

0.84 0.88

0.19 0.13

5 3

0.88 0.89

0.83 0.87

0.90 0.91

0.14 0.11

5 3

0.88 0.87

0.84 0.87

0.90 0.92

0.24 0.11

partition 2 ref 33 semiempirical, SE partition 6 ref 33 semiempirical, SE partition 9 ref 33 semiempirical, SE partition 13 ref 33 semiempirical, SE a

The notations used in this table are the same as those used in Table 2.

0.14 to 0.24 for the best QSPR models (Table 3). Interestingly, in addition to providing more-reliable predictions, the SE model also yields better fits (except for partition 13), despite a smaller number of adjustable parameters (3 vs 5). This performance is all the more remarkable that the model equation is postulated on the basis of physical assumptions rather than adjusted, to get optimal statistics for the training set under consideration. 4.4. Parametrization against Both Datasets. Fitting the SE equations simultaneously against both the C−NO2 and N− NO2 training sets yields a simple and fairly general model involving only three adjustable parameters. Detailed results obtained using this model are provided in Table S1 in the Supporting Information. The fit against the 74 entries in the training set (34 C−NO2 and 40 N−NO2 compounds) is satisfactory, with R2 = 0.85 and RMSE = 0.17. As expected from the small number of empirical parameters involved, the model also proves robust (Q2 = 0.84). With regard to its predictive value, Table 4 shows that, for each previously introduced test set, this single model performs as well

Table 2. Comparison of Present Results with QSPR Models for the C−NO2 Dataseta model

N

Table 4. Predictive Value of QSPR Models and SE Model Fitted against Both Training Sets for Various Test Sets of Size N

a

N is the number of adjustable parameters involved, R2 is the determination coefficient derived from the training set, Q2 is the determination coefficient derived from a leave-one-out crossvalidation, RX2 is the determination coefficient derived from data predicted for an external test set, and RMSEX is the root mean square error of data predicted for an external test set. bModel involving quantum chemical descriptors.

N

model

RX2

RMSEX

ref 32

16

QSPR SE

0.81 0.84

0.22 0.17

ref 33

20

QSPR SE

0.90 0.89

0.14 0.13

new test set

17

QSPR SE

0.56 0.71

0.25 0.22

test set

is better for the QSPR methods, presumably because of the fact that they involve more adjustable parameters. The SE model proves satisfactorily robust, since Q2 is only slightly lower than R2. Furthermore, it yields the best predictions, with a value of RMSEX as low as 0.17 for the 16 compounds in the external test set. 4.3. Parametrization against the Nitramine Dataset. The SE model is now reparametrized specifically for N−NO2 compounds, using the partitions reported in ref 33 to define the training and test sets. As stated in section 3.2, this involves the adjustment of a third parameter, ZN, because the molecules in this dataset exhibit C−NO2 as well as N−NO2 bonds. The results are summarized in Table 3 for the four partitions that lead to the most successful QSPR schemes. The best results are systematically obtained using the SE model, as reflected by RMSEX values ranging from 0.10 to 0.13, while corresponding values range from

as the corresponding specialized QSPR method, which is consistent with the very good predictions derived from the independent parametrizations reported in sections 4.2 and 4.3. This good performance of the SE model is illustrated in Figure 2. Present models are also applied to the 19 newly introduced compounds. In contrast to the data from previous QSPR test sets, these newly introduced data were not used in any way in developing and selecting QSPR methods. Therefore, they provide a stringent test set for both approaches. The overall performance of the QSPR and SE models for the 17 compounds D

DOI: 10.1021/acs.iecr.6b01536 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Correlation

Industrial & Engineering Chemistry Research

Figure 2. Theoretical log(h50) values calculated using the present semiempirical (SE) model for the training and test sets previously introduced in QSPR studies.32,33

in the new test set is summarized in Table 4. In addition, the detailed results for the 19 compounds (including HNFX and MNX) is detailed in Table S1 and illustrated in Figure 3. According to RMSEX data for this test set, both approaches yield fair overall results, also not as good as those obtained for the original QSPR datasets. The larger errors obtained for this new dataset can be explained by the fact that, compared to previous compilations, newly introduced sensitivity values include more recent measurements made by a wider variety of research groups. Keeping in mind that measured sensitivities may be significantly dependent on some details of the experimental procedures that have evolved over the years and may differ between different groups,38 the somewhat poorer correlations reported in Figure 3, with respect to those shown in Figure 2, are not surprising. Nevertheless, with RX2 = 0.71 and RMSEX = 0.22, the results obtained using the SE model are consistent with those obtained for the extended dataset previously considered.30 The corresponding value RMSEX = 0.25 derived from QSPR predictions indicate that the latter are slightly less reliable than SE values, despite the fact that the corresponding correlation is much poorer, with RX2 = 0.56, because of the fact that QSPR data for the most insensitive N−NO2 are systematically underestimated. Notwithstanding HNFX and MNX, which are beyond the scope of QSPR approaches, as discussed in the next section, the most significant error observed is the severe overestimation by the QSPR model (as clearly indicated in Figure 3) of log(h50) for the least sensitive C−NO2 compound, namely, 3,3,9,9-tetranitro1,5,7,11-tetraoxaspiro-[5.5]-undecane, shown in Figure 4 as TNTOSU. This dramatic overestimation of log(h50) by the QSPR method (eq 1) may be easily explained in the light of the SE model. Indeed, according to the latter, two factors ignored by eq 1 are likely to contribute to the sensitivity of this molecule, namely, (1) the unstable central moiety, which is expected to readily decompose under strain, and (2) the relatively large number of oxygen atoms, in addition to those in nitro groups, which contributes to make the oxygen balance less negative and therefore enhance the total energy content of the substance. Although the SE model ignores the former factor, it explicitly considers the latter. As a result, it is no surprise to note that log(h50) for this compound is more severely overestimated by the QSPR approach (+0.47), compared to the SE procedure

Figure 3. Calculated versus observed log(h50) values for the newly introduced test set of 3 nitroaliphatic compounds (black) and 14 nitramine (red) compounds. Open symbols rerpesent compounds beyond the AD of the QSPR model, as defined in ref 33. The two stars are for results obtained for HNFX and MNX (not used in presently reported statistics).

Figure 4. Structures of TNTOSU and HK-55.

(+0.17). This example illustrates the fact that eq 1 should not be applied to compounds with many oxygen atoms not involved in nitro groups, because it does not account for their contribution to the oxygen balance. For the SE model, the most significant error (−0.41) concerns HK-55 (Figure 4) for which the observed h50 value of 61 cm is surprisingly large, compared to related cyclic nitramines.39 The most natural explanation for this unexpected insensitivity is the possibility that the labile proton in this molecule could stabilize E

DOI: 10.1021/acs.iecr.6b01536 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Correlation

Industrial & Engineering Chemistry Research

clearly restricted to nitro compounds with NO2 groups bonded to C(sp3) carbons, the QSPR model for nitramine is applied to compounds bearing not only N−NO2 groups, but also, in most cases, C(sp3)−NO2 groups. This suggests that this model is applicable to any compound bearing N−NO2 moieties and consistent with the AD provided by the authors.33 Among all 14 newly considered nitramines, these contraints exclude only three compounds. They are shown in Figure 5, along with

the crystal structure and/or the molecule itself. This assumption is supported by the fact that this log(h50) value for HK-55 is also particularly underestimated using the QSPR model for nitramines, although to a lesser extent (−0.37). Notwithstanding its slightly improved predictive value, a major advantage of the SE model, compared to the QSPR methods, lies in its enhanced generality. In particular, because sensitivity has been experimentally determined for only very few difluoramine and nitrosamine compounds, it is not possible to reliably estimate log(h50) data for such molecules using QSPR methods. In contrast, this can be done using the SE model. The latter may be applied to MNX, leading to an estimated log(h50) value of 1.91, which underestimates the experimental one by −0.20. For HNFX, an additional approximation is required, which consists of attributing an effective bond dissociation energy Di = 0 to the highly unstable NF2 groups whose precise role on the decomposition is not definitely established.30 This yields an estimated value of 0.65 for log(h50), which is −0.25 log unit below the experimental value. Figure 3 clearly shows that the deviations observed for MNX and HNFX using the SE model are consistent with the typical accuracy of the procedure for more mainstream compounds. Nevertheless, in the lack of further examples of compounds with unusual explosophore groups, it might be safer to consider these entries as lying outside the applicability domain of the model, as detailed in the following section. For all 129 compounds considered in the present study, the QSPR approach yields determination coefficient values of R2 = 0.67 and root-mean-square errors of RMSE = 0.23. Corresponding values obtained based on the SE model are R2 = 0.80 and RMSE = 0.18. This clearly illustrates the fact that the latter provides better results for a larger number of compounds. 4.5. Compliance with OECD Guidelines. This section focuses on the three-parameter model described in section 4.4. The above-mentioned results demonstrate that this model is compliant with the five points of the OECD guidelines: (1) it exhibits a well-defined end point, namely, h50 data measured with the ERL Type 12 tooling with a 2.5 kg hammer, as mentioned in section 2; (2) the algorithm to evaluate this end point is extremely simple and unambiguous; (3) its applicability domain (AD) is well-defined, as discussed below; (4) standard measures of goodness-of-fit, robustness, and predictivity are provided; and (5) it provides a mechanistic interpretation, which is an unique feature of this model rooted in the underlying physical assumptions. In the next part of this section, we discuss the AD, which is the only OECD criterion for which the compliance of the SE model is not trivial to demonstrate. As mentioned in the Introduction, because QSPR methods are usually purely empirical and thus cannot be justified on theoretical grounds, a wealth of statistical approaches have been designed to define their AD, based mostly on the distribution of the data points in the descriptor space.10 However, as already noted for flash points,40 this approach can be misleading for a property that is dependent on the presence of specific functional groups. Taking advantage of the newly introduced sensitivity data for C−NO2 and N−NO2 compounds, the relevance of the AD suggested for the QSPR models using such an approach may be tested. Although the QSPR model for C−NO2 compounds is

Figure 5. Nitramine compounds lying outside the AD of the QSPR model and associated h50 data in cm (given in parentheses, in the following format: experimental value/QSPR value/SE value).

corresponding observed and calculated h50 data. Surprisingly enough, while the SE model yields especially overestimated h50 values for these three compounds, QSPR predictions for these outliers are especially good. In fact, ignoring these three compounds yields significantly degraded performance indicators for the QSPR procedure, namely, RX2 = 0.50 and RMSEX = 0.27. Therefore, the definition adopted in ref 33 for the AD, in terms of range values for the descriptors, appears too restrictive. In fact, it is clear that such a formal definition of the AD based exclusively on descriptor values is not fully satisfactory, unless it is associated with additional constraints grounded in chemical considerations. For instance, the AD proposed for the QSPR model devoted to nitramines in terms of range values of the descriptors involved in eq 2 includes HNFX and MNX, because this definition ignores the presence of sensitizing moieties such as − NF2 or −NO groups.33 Nevertheless, this model yields dramatic errors if applied to such compounds, as clearly shown in Figure 3. For HNFX, it overestimates log(h50) by as much as +0.84, which is understandable, given the fact that eq 2 ignores the presence of the highly unstable NF2 groups. As for the nitrosamine MNX, it severely underestimates log(h50) by −0.64. These limitations of the AD originally proposed for the QSPR model are not specific to the simple definition adopted in terms F

DOI: 10.1021/acs.iecr.6b01536 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Correlation

Industrial & Engineering Chemistry Research

T212 refers to the large training set used to parametrize the previous version of the SE model,30 while T34 and T40 refer to the training sets employed to fit the QSPR models for nitroaliphatic32 and nitramine33 compounds, respectively. As is usually the case for models involving several adjustable parameters,42 the present one exhibits some sloppiness: any decrease in kc, or increase in ZN, may be compensated by a decrease of η. The particularly large η values obtained for the general model previously fitted against a large training set including many aromatic compounds might be attributed to our systematic identification of activation energies with X−NO2 bond dissociation energies. This appears to be a reasonable approximation for most aliphatic compounds presently considered, for which X−NO2 bonds are believed to play the role of trigger bonds. In contrast, the decomposition of aromatic compounds probably follows more-complex mechanisms. In other words, these compounds exhibit alternative reaction pathways with lower energetic barriers. Therefore, to compensate for the fact that X−NO2 dissociation energies overestimate the true energetic barriers, larger values of η naturally arise. These results support the view that predicting impact sensitivity is especially difficult for aromatic compounds, because of the various decomposition mechanisms that may be involved, in addition to genuine homolytic X−NO2 bond scission. Finally, the standard deviations reported in Table 5 indicate that the parameters of the SE model are statistically well-defined when the largest training sets are used. In view of predicting sensitivities for C−NO2 and N−NO2 compounds, we recommend the parameters derived from the combined training set T34 + T40, since they are well-defined, compared to those obtained on the basis of the smaller datasets T40 and T34, for which kc is especially ill-defined. On the other hand, in contrast to parameters previously derived from the more comprehensive data set T212,30 they are not spoiled by the presence of compounds in the training set that might exhibit specific decomposition mechanisms not presently considered, such as nitroaromatic compounds.

of range values. Failures to correctly identify compounds for which poor predictions will be obtained also are observed with advanced AD definitions and in the specially favorable cases where extensive data are available, as illustrated by a recent QSPR investigation of melting points.41 As stated in the Introduction, this lack of a fully satisfactory procedure to define the AD of a model is a major weakness of the QSPR approach. In contrast, it is easier to define the AD of a physics-based model, because it is essentially dependent on the underlying assumptions. For the present model to be applicable, the decomposition of the compound studied must be triggered by C−NO2 or N−NO2 bond ruptures. In other words, the molecule must not exhibit explosophore groups beyond C−NO2 and N− NO2 bonds. Among all presently considered molecules, only MNX and HNFX violate these constraints. The fact that reasonable assumptions yield fair estimates of h50 for these two molecules, as discussed in section 4.4, suggests that the SE model could readily accommodate sensitivity data for other explosophore groups, in addition to C−NO2 and N−NO2 compounds. The AD of the present SE model could be further refined according to statistical definitions commonly employed for QSPR methods. This was done for this model in our previous paper, because it was applied to arbitrary compounds.30 Considering only C−NO2 and N−NO2 compounds as done in this work, we may consider a simpler definition of the AD in terms of range values of the two descriptors involved, namely, the quantity Ec/(3NA/2), which ranges between 30 kJ/mol and 58 kJ/mol for the training set, and the number of atoms NA, which ranges from 11 to 71 for the training set. The only test set compound lying outside this AD is 1,3,5-tris(1-oxo-5,5,5-trinitro3-nitrazapentyl)-s-triazacyclohexane, because of its number of atoms (NA = 78). For this compound, the difference of 0.3 between predicted and observed log(h50) values is acceptable, although significantly larger than the RMSE value of 0.22 reported in Table 4. Although removing this compound slightly improves the statistical performance of the model, we do not believe such a restriction on the AD to be useful, for the following reasons: (1) defining the AD in terms of range values is awkward for such small datasets, because the AD thus obtained is critically dependent on the compounds included in the training set; and (2) the ability of the bond dissociation energies to reflect actual activation energies is much more relevant to the significance of present predictions. 4.6. Optimized Parameters of the SE Model. Overall, fairly reliable predictions, similar to those obtained using a combination of two QSPR models involving nine parameters, are presently obtained using a single model based on only three parameters. This good performance suggests that the three parameter values are relatively independent of the training set used to fit the model. Table 5 reports the optimized values obtained for the various training sets considered. In this table,

5. DISCUSSION Generally speaking, a major asset of the QSPR approach lies in the fact it may be used on a routine basis to produce predictive models, independently of the target property. However, present results confirm earlier findings regarding the prediction of physicochemical properties of organic substances in the frequent case where limited experimental data are available. Indeed, in such situations, models based on physical considerations exhibit many advantages over the straightforward QSPR approach for both model designers and end users. First, the design of a semiempirical model from simple physical assumptions is usually rather easy and does not require any specialized software beyond freely available fitting algorithms. This is due to the fact that this approach involves limited trial and error, essentially aimed at testing various assumptions. For the present SE model, trial-and-error testing was only involved regarding the necessity to introduce distinct pre-exponential factors for C−NO2 and N−NO2 bonds. In contrast, even in the frequent case where only linear expressions are considered, the development of a QSPR is dependent on many factors and arbitrary decisions, including the pool of descriptors, the feature selection algorithm, the model selection criteria, the application of dimensionality reduction or regularization techniques,... In addition to the fact that it may be fastidious in the lack of dedicated software, this extensive search for suitable relationships

Table 5. Optimized Values of the Parameters of the SE Model and Associated Standard Deviations training set

source

kc

η

ZN

T212 T34 T40 T34 + T40

ref 30 ref 32 ref 33

0.342 ± 0.03 0.267 ± 0.16 0.289 ± 0.10 0.270 ± 0.08

16.5 ± 1.5 4.53 ± 1.3 7.42 ± 1.5 4.72 ± 0.7

1.42 ± 0.04 N/A 1.47 ± 0.16 1.77 ± 0.12 G

DOI: 10.1021/acs.iecr.6b01536 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Correlation

Industrial & Engineering Chemistry Research

We expect that present results will stimulate further interest into physically grounded approaches to predicting properties that characterize physicochemical hazards for regulatory purposes. We may expect that their association with QSPR techniques could be needed to deliver the full potential of both approaches. For instance, a drastic approximation in the present sempiempirical method consists of identifying activation energies with constant bond dissociation energies precalculated on model compounds. QSPR techniques might be used to go beyond this approximation and estimate activation energies associated with any explosophore group, based on its chemical environment.

has a major drawback, in that it introduces a model selection bias, leading to overoptimistic estimates of the performance of QSPR methods.2 This bias can be minimized using rigorous validation techniques, especially double cross-validation.43 However, such practices are not yet widespread in the QSPR community. An especially interesting aspect of the recent QSPR study of nitramines33 stems from the fact that results associated with many partitions of the data into training and test sets are reported, hence clearly demonstrating the problem associated with model selection. Indeed, although all models yield good R2 and Q2 statistics, corresponding predictive performance vary significantly. Therefore, the predictions obtained for external test sets are critically dependent on the exact criteria considered to select a model on the basis of training set data. In view of practical applications, the present semiempirical approach is more general than the QSPR procedures, as illustrated in the case of compounds with unusual moieties for which no QSPR procedure can be developed, because of a lack of data. This is because within the QSPR approach, extending a model to new compounds requires introducing a new equation and fitting new empirical parameters. In contrast, extending the present SE model essentially requires the determination of additional activation energies. This can be done using quantumchemical techniques, and is not critically dependent on the availability of extensive h50 data. In fact, there is accumulating evidence that models based on intuition and physical insight, even if not fully rigorous, usually provide more-reliable predictions than empirical schemes.11−18 Therefore, they should be considered whenever possible. As emphasized in ref 18, they seem to be a promising alternative to existing QSPR procedures aimed at predicting reactive hazards for regulatory purposes. Indeed, the corresponding properties are not available for many compounds. Furthermore, they are likely to be difficult to capture with standard molecular descriptors calculated on equilibrium geometries, because they are dependent on the activation energies associated with reaction pathways. On the other hand, the QSPR approach is more likely to take advantage of extensive datasets, as available for aqueous solubility or melting point. 41 Finally, many properties, including toxicological end points, are too complex for theory-based approaches. Therefore, QSPR will remain an essential tool, although not the most appropriate in all cases, as demonstrated here.



ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.iecr.6b01536. Theoretical bond dissociation energies used in the present work (Table S1); experimental and calculated impact sensitivities for all compounds studied in this work (Table S2) (XLS)



AUTHOR INFORMATION

Corresponding Author

*Tel.: +33 (0)247344185. E-mail: [email protected]. Notes

The authors declare no competing financial interest.



REFERENCES

(1) Quintero, F. A.; Patel, S. J.; Muñoz, F.; Mannan, M. S. Review of Existing QSAR/QSPR Models Developed for Properties Used in Hazardous Chemicals Classification System. Ind. Eng. Chem. Res. 2012, 51, 16101−16115. (2) Sheridan, R. P. Time-Split Cross-Validation as a Method for Estimating the Goodness of Prospective Prediction. J. Chem. Inf. Model. 2013, 53, 783−790. (3) Sahigara, F.; Mansouri, K.; Ballabio, D.; Mauri, A.; Consonni, V.; Todeschini, R. Comparison of different approaches to define the applicability domain of QSAR models. Molecules 2012, 17, 4791−4810. (4) Norinder, U.; Carlsson, L.; Boyer, S.; Eklund, M. Introducing conformal prediction in predictive modeling. A transparent and flexible alternative to applicability domain determination. J. Chem. Inf. Model. 2014, 54, 1596−1603. (5) Gleeson, M. P.; Modi, S.; Bender, A.; Robinson, R. L. M.; Kirchmair, J.; Promkatkaew, M.; Hannongbua, S.; Glen, R. C. The Challenges Involved in Modeling Toxicity Data In Silico: A Review. Curr. Pharm. Des. 2012, 18, 1266−1291. (6) Huang, S.-H.; Abernethy, D. R.; Wang, Y.; Zhao, P.; Zineh, I. The utility of modeling and simulation in drug development and regulatory review. J. Pharm. Sci. 2013, 102, 2912−2923. (7) Kleandrova, V. V.; Luan, F.; Speck-Planche, A.; Cordeiro, M. N. D. S. In Silico Assessment of the Acute Toxicity of Chemicals: Recent Advances and New Model for Multitasking Prediction of Toxic Effect. Mini-Rev. Med. Chem. 2015, 15, 677−686. (8) Su, L.; Fu, L.; He, J.; Qin, W.; Sheng, L.; Abraham, M. H.; Zhao, Y. H. Comparison of Tetrahymena pyriformis toxicity based on hydrophobicity, polarity, ionization and reactivity of class-based compounds. SAR QSAR Environ. Res. 2012, 23, 537−552. (9) Roberts, D. W.; Aptula, A. O.; Patlewicz, G. Mechanistic Applicability Domains for Non-Animal Based Prediction of Toxicological Endpoints. QSAR Analysis of the Schiff Base Applicability Domain for Skin Sensitization. Chem. Res. Toxicol. 2006, 19, 1228− 1233. (10) Schultz, T. W.; Hewitt, M.; Netzeva, T. I.; Cronin, M. T. D. Assessing Applicability Domains of Toxicological QSARs: Definition,

6. CONCLUSION The present study provides a clear answer to the question raised in the Introduction. It is observed that the good performances recently reported for state-of-the-art quantitative structure− property relationships (QSPR) models stem from the fact they are restricted to relatively simple cases where the impact-induced decomposition is mostly initiated by the rupture of C−NO2 or N−NO2 bonds. Therefore, the present physically grounded model is recommended, even for these simple compounds, using the parametrization newly derived from the merger of the two training sets previously used to fit QSPR models. In addition to providing somewhat more-reliable results, this approach is very easy to apply, requiring only a hand-held calculator. It is also more general and more thoroughly validated than presently available QSPR schemes. Finally, there is much room for improving this method, for instance, by taking advantage of quantum chemical data. H

DOI: 10.1021/acs.iecr.6b01536 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX

Correlation

Industrial & Engineering Chemistry Research Confidence in Predicted Values, and the Role of Mechanisms of Action. QSAR Comb. Sci. 2007, 26, 238−254. (11) Mathieu, D.; Becker, J.-P. Improved evaluation of liquid densities using van der Waals molecular models. J. Phys. Chem. B 2006, 110, 17182−17187. (12) Reinisch, J.; Klamt, A. Predicting Flash Points of Pure Compounds and Mixtures with COSMO-RS. Ind. Eng. Chem. Res. 2015, 54, 12974−12980. (13) Jain, A.; Yalkowsky, S. H. Comparison of two methods for estimation of melting points of organic compounds. Ind. Eng. Chem. Res. 2007, 46, 2589−2592. (14) Mathieu, D. Simple Alternative to Neural Networks for Predicting Sublimation Enthalpies from Fragment Contributions. Ind. Eng. Chem. Res. 2012, 51, 2814−2819. (15) Mathieu, D. Prediction of Gurney Parameters Based on an Analytic Description of the Expanding Products. J. Energ. Mater. 2015, 33, 102−115. (16) Mathieu, D. Energetic Materials: Modelling, Simulation and Characterisation of Pyrotechnics, Propellants and Explosives; Fraunhofer Institut für Chemische Technologie ICT: Karlsruhe, Germany, 2011; p V39. (17) Pasturenzi, C.; Dellavedova, M.; Gigante, L.; Lunghi, A.; Canavese, M.; Cattaneo, C. S.; Copelli, S. Thermochemical stability: A comparison between experimental and predicted data. J. Loss Prev. Process Ind. 2014, 28, 79−91. (18) Mathieu, D. Significance of Theoretical Decomposition Enthalpies for Predicting Thermal Hazards. J. Chem. 2015, 2015, 1−12. (19) Reyes, O. J.; Patel, S. J.; Mannan, M. S. Quantitative Structure Property Relationship Studies for Predicting Dust Explosivity Characteristics (Kst, Pmax) of Organic Chemical Dusts. Ind. Eng. Chem. Res. 2011, 50, 2373−2379. (20) Simpson, L. R.; Foltz, M. F. LLNL Small-Scale Drop-Hammer Impact Sensitivity Test; Report UCRL-ID-119665, Lawrence Livermore National Laboratory, Livermore, CA, 1995. (21) Rice, B. M.; Byrd, E. F. C. Theoretical chemical characterization of energetic materials. J. Mater. Res. 2006, 21, 2444−2452. (22) Zeman, S. High Energy Density Materials; Klapotke, T. M., Ed.; Structure and Bonding, Vol. 125; Springer−Verlag: Heidelberg, Germany, 2007; pp 195−271. (23) Yan, Q.-L.; Zeman, S. Theoretical Evaluation of Sensitivity and Thermal Stability for High Explosives Based on Quantum Chemistry Methods: A Brief Review. Int. J. Quantum Chem. 2013, 113, 1049−1061. (24) Chen, Z.-X.; Xiao, H.-M. Quantum Chemistry Derived Criteria for Impact Sensitivity. Propellants, Explos., Pyrotech. 2014, 39, 487−495. (25) Morrill, J. A.; Byrd, E. F. C. Development of Quantitative Structure−Property Relationships for predictive modeling and design of energetic materials. J. Mol. Graphics Modell. 2008, 27, 349−355. (26) Oksel, C.; Winkler, D. A.; Ma, C. Y.; Wilkins, T.; Wang, X. Z. Accurate and interpretable nanoSAR models from genetic programming-based decision tree construction approaches. Nanotoxicology 2016, 10, 1001−1012. (27) Mathieu, D. Theoretical Shock Sensitivity Index for Explosives. J. Phys. Chem. A 2012, 116, 1794−1800. (28) Mathieu, D. Toward a Physically Based Quantitative Modeling of Impact Sensitivities. J. Phys. Chem. A 2013, 117, 2253−2259. (29) Mathieu, D.; Alaime, T. Predicting Impact Sensitivities of Nitro Compounds on the Basis of a Semi-empirical Rate Constant. J. Phys. Chem. A 2014, 118, 9720−9726. (30) Mathieu, D.; Alaime, T. Impact sensitivities of energetic materials: Exploring the limitations of a model based only on structural formulas. J. Mol. Graphics Modell. 2015, 62, 81−86. (31) Xu, J.; Zhu, L.; Fang, D.; Wang, L.; Xiao, S.; Liu, L.; Xu, W. QSPR Studies of impact sensitivity of nitro energetic compounds using threedimensional descriptors. J. Mol. Graphics Modell. 2012, 36, 10−19. (32) Prana, V.; Fayet, G.; Rotureau, P.; Adamo, C. Development of validated QSPR models for impact sensitivity of nitroaliphatic compounds. J. Hazard. Mater. 2012, 235−236, 169−177.

(33) Fayet, G.; Rotureau, P. Development of simple QSPR models for the impact sensitivity of nitramines. J. Loss Prev. Process Ind. 2014, 30, 1− 8. (34) Alexander, D. L. J.; Tropsha, A.; Winkler, A. Beware of R2: Simple, Unambiguous Assessment of the Prediction Accuracy of QSAR and QSPR Models. J. Chem. Inf. Model. 2015, 55, 1316−1322. (35) Kamlet, M. J.; Jacob, S. J. Chemistry of detonations. I. A simple method for calculating detonation properties of C−H−N−O explosives. J. Chem. Phys. 1968, 48, 23. (36) Linstrom, P. J., Mallard, W. G., Eds. NIST Chemistry WebBook, NIST Standard Reference Database Number 69; National Institute of Standard and Technology: Gaithersburg, MD; available via the Internet at:http://webbook.NIST.gov; accessed 2011. (37) LMFIT, Non-linear Least-Squares Minimization and Curve-Fitting for Python, http://lmfit.github.io/lmfit-py/, accessed Jan. 9, 2015. (38) Sandstrom, M. M.; Brown, G. W.; Preston, D. N.; Pollard, C. J.; Warner, K. F.; Sorensen, D. N.; Remmers, D. L.; Phillips, J. J.; Shelley, T. J.; Reyes, J. A.; Hsu, P. C.; Reynolds, J. G. Variation of Methods in SmallScale Safety and Thermal Testing of Improvised Explosives. Propellants, Explos., Pyrotech. 2015, 40, 109−126. (39) Pagoria, P. F.; Lee, G. S.; Mitchell, A. R.; Schmidt, R. D. A review of energetic materials synthesis. Thermochim. Acta 2002, 384, 187−204. (40) Mathieu, D.; Alaime, T. Insight into the contribution of individual functional groups to the flash point of organic compounds. J. Hazard. Mater. 2014, 267, 169−174. (41) Tetko, I. V.; Sushko, Y.; Novotarskyi, S.; Patiny, L.; Kondratov, I.; Petrenko, A. E.; Charochkina, L.; Asiri, A. M. How Accurately Can We Predict the Melting Points of Drug-like Compounds? J. Chem. Inf. Model. 2014, 54, 3320−3329. (42) Transtrum, M. K.; Machta, B. B.; Brown, K. S.; Daniels, B. C.; Myers, C. R.; Sethna, J. P. Perspective: Sloppiness and emergent theories in physics, biology, and beyond. J. Chem. Phys. 2015, 143, 010901. (43) Baumann, D.; Baumann, K. Reliable estimation of prediction errors for QSAR models under model uncertainty using double crossvalidation. J. Cheminf. 2014, 6, 47.

I

DOI: 10.1021/acs.iecr.6b01536 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX