Article pubs.acs.org/IECR

Review of Existing QSAR/QSPR Models Developed for Properties Used in Hazardous Chemicals Classification System

Flor A. Quintero,†,‡ Suhani J. Patel,† Felipe Muñoz,‡ and M. Sam Mannan*,†

† Mary Kay O’Connor Process Safety Center, Artie McFerrin Department of Chemical Engineering, Texas A&M University System, College Station, Texas 77843-3122, United States
‡ Departamento de Ingeniería Química, Universidad de los Andes, Cr.1 Este #19 A-40, Bogotá D.C., Colombia

ABSTRACT: The development of a globally harmonized system (GHS) on an international level requires various countries to classify chemicals according to hazardous properties using similar categories. The classification criteria include physical, toxic (health), and environmental properties. The GHS is also being included in the U.S. regulations through the Notice of Proposed Rulemaking issued in September 2009 by the Occupational Safety & Health Administration (OSHA). It has been suggested in the rulemaking that, in cases where experimental data are not available to predict some types of hazard, quantitative structure−activity relationships/quantitative structure−property relationships (QSAR/QSPR) can be applied as found necessary. Any chemical or physical property of the material can be related to information within an individual molecule and its structure, thereby developing prediction models such as QSAR and QSPR. This review examines the work published for QSARs/QSPRs (in addition to previously published reviews) on the prediction of some of the hazardous properties for selected hazard classes within the GHS regulation. These models are powerful but, at times, are limited in application for some types of compounds and properties. The development of extensive models will greatly enhance the need for hazardous classifications of chemicals.



INTRODUCTION

Different systems to regulate classifications, labeling, and safety data sheets (SDSs) of chemicals have been implemented in many countries. These systems present similarities in content and approach, but they contain enough significant differences to require multiple classifications, and different labels and safety data sheets, for the same product.1 In the case of the United States (U.S.), four federal agencies regulate chemical hazard classification and communication: the Department of Labor’s Occupational Safety & Health Administration (OSHA), the Department of Transportation (DOT), the Consumer Product Safety Commission (CPSC), and the Environmental Protection Agency (EPA).2 Each of these agencies has its own system, with classification criteria and use of symbols, signal words, and hazard statements differing between the agencies.3 The diversity of application of these systems may lead to inconsistent protection for those potentially exposed to the chemicals, as well as create extensive regulatory burdens on companies,1 causing confusion and resulting in increased risk.3

Because of this unnecessary confusion resulting from the multiplicity of existing hazard classifications, on June 3, 1992, at the Rio Earth Summit, the United Nations Conference on the Environment and Development (UNCED) had its Globally Harmonized System of classification and labeling of chemicals (GHS) declaration endorsed by the United Nations General Assembly.3 The goals of a GHS are as follows:
• improve the protection of human health and the environment by providing an internationally comprehensible system for hazard communication;4
• facilitate the international trade of chemicals;3
• reduce the need for duplicate testing and evaluation of chemicals;3,4 and
• assist all countries in the management of chemicals.3

© 2012 American Chemical Society

In the United States, the mission of OSHA is to ensure the safety and health of workers by setting and enforcing standards. Accordingly, OSHA has been an active participant in the development and implementation of the GHS. On September 12, 2006, OSHA published an Advance Notice of Proposed Rulemaking (ANPR),5 which provides a comparison of the current Hazard Communication Standard (HCS) to the GHS standard.4 The proposed rulemaking6 was published on September 30, 2009, to align OSHA’s HCS with the GHS. OSHA published the final rule on March 26, 2012, and the effective date of the final rule is 60 days after the date of publication.

The GHS provides a consistent approach to the hazard classification of chemicals, labels, and SDSs. The hazard classification criteria for substances and mixtures are based on physical hazards, toxic (health) hazards, and environmental hazards. The GHS rule also states that if experimental data are not available to predict this type of hazard, any chemical or physical property of the material can be related to an individual molecule and can be predicted by implementing models such as quantitative structure−activity relationships/quantitative structure−property relationships (QSAR/QSPR). The purpose of QSAR or QSPR studies is to determine a mathematical relationship between the biological activity or a physicochemical property, respectively (e.g., LD50 (QSAR), boiling point (QSPR)), and the descriptive parameters (descriptors) related to the structure of the molecule,7 following the relationship Property = f(molecular descriptors).8 The QSAR/QSPR methodology requires, as input, a dataset of chemicals, along with experimental data for a property and the corresponding molecular structure. Thereafter, molecular descriptors (such as topological, electronic, geometrical, constitutional, solvational, and quantum-chemical descriptors, among others) can be evaluated using the structure of the molecule. The models are developed using several techniques, such as support vector machines,9 neural networks,10 and genetic algorithms, or statistical analyses, such as multiple linear regression11,12 or partial least-squares.8 Taking into account these types of studies, it is possible to replace expensive biological tests or experiments of a given physicochemical property (especially for experiments involving hazardous, toxic materials or unstable compounds) with calculated descriptors, which can, in turn, be used to predict the responses of interest for new compounds.13

Received: April 25, 2012
Revised: November 12, 2012
Accepted: November 12, 2012
Published: November 12, 2012
dx.doi.org/10.1021/ie301079r | Ind. Eng. Chem. Res. 2012, 51, 16101−16115

Table 1. Reviews Published for QSAR Studies To Predict Health Hazards

health hazard | comments | ref
Respiratory/Skin Sensitization | The introduction of the local lymph node assay (LLNA) has helped to move the science forward and provides a means for easier development of QSAR models. This review contains ∼17 models developed with CASE/MultiCASE and DEREK software. | 15
Carcinogenicity/Mutagenicity | Provides a review and state of the art of 78 noncommercial (Q)SAR models for aromatic amines, nitroarenes, polycyclic aromatic hydrocarbons (PAH), and α,β-unsaturated aliphatic aldehydes. This work provides an evaluation consisting of a preliminary survey (Phase I) and then a more-detailed analysis of listed models (Phase II). These models are classified by chemical class, biological end points, and type. | 16
Acute Aquatic Toxicity | This work includes ∼82 references related to (Q)SAR studies in Acute Aquatic Toxicity. (Q)SAR studies are classified by chemical class, mode of action, method of statistical derivation, structural alerts, and expert systems that combine structural rules and multiple (Q)SAR models. | 17
Skin Irritation, Skin Corrosion, and Eye Irritation | State-of-the-art in silico methods are used to assess: (1) skin irritation, (2) skin corrosion, and (3) eye irritation. Approximately 50 references are included in this work. The models were based on homogeneous datasets of acids, bases, electrophiles, or neutral organics. | 18
Reproductive Toxicity | This study provides a summary of the main in silico approaches that have been developed. This work is divided into the following areas: (1) global models, which involve a heterogeneous group of compounds; (2) local models, which use a restricted series of compounds; and (3) commercial models, which are comprised of global models. Approximately 30 models have been included. | 19
Carcinogenicity/Mutagenicity | 13 local QSAR models have been assessed. The QSARs for potency generated 30%−70% correct predictions, whereas the QSARs discriminating between active and inactive chemicals were 70%−100% correct. | 20
Acute Mammalian Toxicity | This paper reviews more than 150 models aiming at predicting rat and mouse LD50 values from molecular descriptors and/or ecotoxicity data. | 21

The aim of this review is to provide an overview of the different models developed in the literature using QSAR and QSPR for the prediction of hazardous properties. The use of such models will facilitate the collection and evaluation of hazardous properties that lack testing data in order to comply with GHS regulations. This review is driven by industrial interest in the early identification and reliable determination of the hazardous properties of chemical substances, which have currently become a challenge in the frame of industrial safety management. Nevertheless, prior to the use of such models in a regulatory context, consideration must be given to the five validation principles proposed by the OECD, which enable one to ensure reliability and transparency:14
(1) a defined end point;
(2) an unambiguous algorithm;
(3) a defined domain of applicability;
(4) appropriate measures of goodness-of-fit, robustness, and predictivity; and
(5) a mechanistic interpretation, if possible.

In particular, the goodness-of-fit is characterized for the molecules in the training set (e.g., by the correlation coefficient (R2) and the root mean square error (RMSE)), the robustness is used to evaluate the internal performance of the model (e.g., by cross-validation), and the predictive power is used to evaluate the external performance through validation with compounds that were not used to develop the model. For each reviewed QSAR/QSPR model, this paper indicates which of the OECD principles 1, 2, 3, 4 (ext/int), and 5 are met. The fourth principle covers goodness-of-fit, robustness, and predictivity; the term “ext” indicates external validation, and the term “int” indicates internal validation.
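As a minimal sketch of how the three kinds of statistics named under the fourth OECD principle relate to one another, the Python example below fits a one-descriptor linear model (the simplest case of Property = f(molecular descriptors)) and reports R2 and RMSE on the training set (goodness-of-fit), a leave-one-out cross-validated Q2 (robustness, internal validation), and RMSE on held-out compounds (predictivity, external validation). The descriptor values, the property values, and every function name are hypothetical and chosen for illustration; they are not taken from any of the reviewed models.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def predict(model, x):
    """Apply the fitted model (a, b) to a list of descriptor values."""
    a, b = model
    return [a + b * xi for xi in x]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def q2_leave_one_out(x, y):
    """Internal validation: refit without compound i, predict it, sum the squared errors (PRESS)."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        model_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(model_i, [x[i]])[0]) ** 2
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - press / ss_tot


# Hypothetical data: one descriptor and a noisy linear property (assumption, not real measurements).
random.seed(0)
descriptor = [random.uniform(0.0, 5.0) for _ in range(40)]
prop = [2.0 + 0.7 * d + random.gauss(0.0, 0.1) for d in descriptor]

# Training set (used to build the model) and external test set (never used for fitting).
x_train, y_train = descriptor[:30], prop[:30]
x_test, y_test = descriptor[30:], prop[30:]

model = fit_line(x_train, y_train)
r2_train = r_squared(y_train, predict(model, x_train))   # goodness-of-fit
rmse_train = rmse(y_train, predict(model, x_train))      # goodness-of-fit
q2 = q2_leave_one_out(x_train, y_train)                  # robustness ("int")
rmse_test = rmse(y_test, predict(model, x_test))         # predictivity ("ext")

print(f"R2(train)={r2_train:.3f} RMSE(train)={rmse_train:.3f} "
      f"Q2(LOO)={q2:.3f} RMSE(test)={rmse_test:.3f}")
```

For a least-squares fit, Q2 from leave-one-out cross-validation cannot exceed the training R2, so a large gap between the two is a common warning sign of overfitting; an "ext" entry in the tables of this review corresponds to the kind of held-out evaluation done here with the test set.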



HAZARDOUS PROPERTIES

The hazard classification criterion in GHS is a semiquantitative or qualitative process. The classification criteria are based on available data, test methods, weight of evidence, a recommended process of classification of mixtures, and bridging principles. In cases where sufficient data are not available and there is no clear experimental evidence to classify a substance, data from QSAR and QSPR models can be used to categorize the substance within the GHS hazard classification. Generally, QSAR models are developed to relate biological activities (health hazards), whereas QSPR models are developed to relate physicochemical properties (physical hazards).

The following are the properties associated with different types of health hazards that have been predicted by QSAR models:
• Acute toxicity: LD50 (oral, dermal), LC50 (inhalation), EC50, NOEC
• Skin irritation and corrosion: mean irritation score, primary skin irritation index, logarithm of the octanol/water partition coefficient, molecular volume, melting point, pKa
• Eye irritation and serious eye damage: mean irritation score
• Skin or respiratory sensitization
  • Respiratory sensitization
  • Skin sensitization: total erythema score (TES), stimulation index (SI), EC3
• Germ cell mutagenicity: mutagenic potency
• Carcinogenicity: carcinogenic potential (TD50), carcinogenic activity
• Reproductive toxicity: placental clearance index, placental transfer index

The following are the properties associated with different types of physical hazards that have been predicted by QSPR models:
• Explosives: heat of detonation, impact sensitivity
• Flammable gases: flammable range
• Flammable aerosols: heat of combustion
• Gas under pressure: critical temperature
• Flammable liquids: flash point and initial boiling point
• Self-reactive chemicals: heat of decomposition, thermal decomposition temperature, onset temperature
• Self-heating chemicals: spontaneous ignition temperature, auto ignition temperature

Table 2. Reviews Published for QSPR Studies To Predict Physical Hazards

physical hazard | property measured | comments | ref
gases under pressure/flammable liquids/self-heating chemicals | critical temperature | This paper summarized 24 works where MLR and NN have been applied to predict critical temperature, flash point, auto ignition temperature, and boiling point for diverse organic compounds. | 22
flammable liquids | boiling point | This work reviews commercial software programs used to predict the boiling point. These programs have been tested for their predictive ability with a test set of 100 organic chemicals. This study reports ∼55 QSPR correlations. | 23

HAZARD CLASSIFICATION CRITERIA WITHIN GHS AND CORRESPONDING QSAR/QSPR STUDIES

The full document that provides criteria for the classification of health and physical hazards can be found in the Notice of Proposed Rulemaking: Appendices A and B of the Hazard Communication Standard developed by OSHA-GHS (29 CFR 1910.1200). This Notice of Proposed Rulemaking, published on September 30, 2009, represents a significant step in the rulemaking process to align the current OSHA HCS with the GHS. Definitions of each class of the classification system for hazard communication (health hazards and physical hazards) can be found under ref 1. In addition, review papers published previously for some of the hazardous properties have been listed in Table 1 for health hazards and in Table 2 for physical hazards. References found beyond the ones listed in these reviews have been described in Sections 1 and 2. QSAR studies to predict acute toxicity, skin corrosion/irritation, eye irritation, respiratory sensitizer, reproductive toxicity, and carcinogenicity, and QSPR studies to predict flammability, heat of combustion, critical temperature, flash point, initial boiling point, heat of decomposition, and auto ignition temperature have been summarized for each class in the GHS.

1. Health Hazard. The criteria of health hazard are addressed in the GHS by the Organization for Economic Cooperation and Development (OECD), which develops the classification and categorization for health effects. The following classification is based on OSHA GHS.

(a). Acute Toxicity. Acute toxicity refers to those adverse effects that occur following oral or dermal administration of a single dose of a substance, or multiple doses given within 24 h, or an inhalation exposure of 4 h. These classification criteria cover a range of toxicity end points based on routes of exposure. Acute toxicity values are expressed as LD50 (oral, dermal), LC50 (inhalation), or acute toxicity estimates (ATE). OSHA does not have the regulatory authority to address environmental concerns; consequently, OSHA does not adopt the aquatic toxicity criteria proposed by GHS (OSHA 2010). However, some QSAR studies have been published on predicting aquatic toxicity, as shown in Table 3.

(i). Skin Irritation and Corrosion. Skin corrosion is the production of irreversible damage to the skin, such as visible necrosis through the epidermis and into the dermis, following the application of a test substance for up to 4 h. Corrosive reactions are typified by ulcers, bleeding, bloody scabs, and, by the end of observation at 14 days, by discoloration due to blanching of the skin, complete areas of alopecia, and scars. Skin irritation is the production of reversible damage to the skin, following the application of a test substance for up to 4 h; category 1 is for corrosive effects and category 2 is for irritation effects. QSAR studies to predict skin corrosion and irritation are provided in Table 4. In some cases, the QSARs have been derived by relating corrosivity data to Log Pow (logarithm of the octanol/water partition coefficient), molecular volume, melting point, and pKa. Log Pow, molecular volume, and melting point are used to measure the skin permeability, whereas pKa is used to measure the cytotoxicity. Chemicals with lower Log Pow values, larger molecular volumes, or higher melting points are associated with lower skin permeability and thus are less likely to be corrosive, unless they are particularly acidic or basic (pKa).39 Moreover, in other QSAR studies, the property measured is the Primary Irritation Index (PII) or skin irritation score. These properties are functions of Log Pow, the molecular volume, the dipole moment, or the molecular weight.

(ii). Serious Eye Damage/Eye Irritation. Eye damage is the production of tissue damage in the eye, or serious physical decay of vision, following application of a test substance to the anterior surface of the eye, which is not fully reversible within 21 days of application. Eye irritation is the production of changes in the eye following the application of a test substance to the anterior surface of the eye, which are fully reversible within 21 days of application; category 1 is for eye damage and category 2 is for eye irritation. Some QSAR studies developed to predict eye irritation are compiled in a review published by Gallegos Saliner et al.,18 as indicated in Table 1.

(iii). Respiratory and Skin Sensitization. Respiratory sensitizer means a chemical that will lead to hypersensitivity of the airways, following inhalation of the chemical. Skin sensitizer means a


Binary mixtures in composition of benzene

431 LC50 values for 66 pesticides

56 phenylsulfonyl carboxylates on Vibrio fischeri

36 substituted aromatic compounds and their mixtures

56 phenylsulfonyl carboxylates for Vibrio fischeri; 65 aromatic compounds for the alga Chlorella vulgaris and 200 phenols for the ciliated protozoan Tetrahymena pyriformis

Acute toxicity EC50

Acute toxicity EC50

Acute toxicity EC50

44 alcohol ethoxylates Acute and EC50 chronic aquatic toxicity NOEC Acute toxicity Tetrahymena pyri- 473 aliphatic chemicals formis toxicity log (IGC−150)

Acute toxicity EC50

Acute toxicity LC50

Acute toxicity pT15 Vibrio fischeri 63 aliphatic compounds toxicity

1, 2, 4 int, ext

The development of a chronic SAR, based on log Kow produced R2 = 0.93. PLS

Coefficients in the 3D PLS model. R2 (X) = 0.657, R2 (Y) = 0.819, R2(Yadj) = 0.816. Toxicity of aliphatic chemicals to the ciliate Tetrahymena pyriformis. Validated by a Y-permutation test and by simulation of external prediction for complementary subsets. 3D models are described from the Comparative Molecular Field Analysis (CoMFA). Cross-validation test: Q2 = 0.790, R2 = 0.920, S = 0.109. Two correlation for single chemical (R2 = 0.8792, R2 = 0.9334). One correlation for mixtures (R2 = 0.9415). Two correlation for single and mixtures (R21 = 0.9319, R21 = 0.9515). Quantitative neighborhoods of atoms description of molecular structures and self-consistent regression was developed.

1, 2, 4

34

33

1, 2, 4 int 1, 2, 4 int

32

31

30

29

28

27

26

25

24

ref

1, 2, 4 int

1, 2, 4

1, 2, 4 ext

1, 2, 4

The descriptors selection was done with a genetic algorithm. The best model in this study had an RMS error of 0.22 log units for the training-set compounds and 0.25 log units for the prediction-set compounds. The Minitab statistical package was used for development of the QSAR.

54 organophosphorus pesticides

Acute mammalian toxicity

R2 = 0.846, s = 0.462. QSARs are based on the logarithm of the octanol−water partition coefficient, log P, and energy of the lowest unoccupied molecular orbital (LUMO). ANNs 400 LC50 data for the training set and 31 LC50 data for the testing set. Toxicity of pesticides to Lepomis macrochirus. This model integrates 2-D molecular descriptors and ecotoxical data; RMSR = 0.345 (training set), 0.359 (test set). Changes of a mixture composition are described by molar ratio R and visualized in the R-plot (quantitative composition−activity relationships (QCAR)). benzene + aniline, R2 = 0.960. benzene + ethanol, R2 = 0.960. ethanol + aniline, R2= 0.722. Development of linear relationships (using SAS) and toxicity test outcomes.

1, 2, 4 ext

447 LC50 values for 70 pesticides

Acute toxicity LC50

LD50

1, 2, 4 ext

R2 = 86.3%. S = 0.30. Type of descriptors: structural, chemical, geometrical, and quantum chemical descriptors. ANNs trained by the back-propagation algorithm. Toxicity of pesticides against Oncorhynchus mykiss. RMSR = 0.355 (training set), 0.350 (test set). CNN

OECD principles 1, 2, 4

model/statistical parameters comments RM

type and number of compounds

69 benzene derivatives

property measured

LC50

health hazard

Acute aquatic toxicity

Table 3. QSAR Studies Used To Predict Acute Toxicity


Skin corrosion/ skin irritation

Primary irritation index (PII)

Log Pow, molecular volume, melting point, pKa Log Pow, molecular volume, melting point, pKa

Skin corrosion

Skin corrosion

property measured

health hazard

model/statistical parameters comments

Neutral and electrophilic organic chemicals

Accuracy ranged from 84% to 87% to discriminate between chemicals with the EC classification R34 (causes burns) and R35 (causes severe burns for 26 out of 27 acids in the dataset). This model shows a poor correlation to discriminate irritant and nonirritant. (Training set, R2 = 0.422; cross-validated, R2 = 0.201), but it shows a good correlation to discriminate corrosion and noncorrosion (92.9% well-classified; 92.9% cross-validated).

Neural Network Analysis

27 corrosive acids

n.a.

1, 2, 4 int

1, 2, 4 ext

1, 2, 4

OECD principles

1, 2, 4 int, ext

OECD principles

Molecules were constructed using Sybyl 6.0 software. The molecular structure was imported into the TSAR spreadsheet (Oxford Molecular Ltd., Sandford-on Thames, U.K.) where QSARs were determined to discriminate between corrosive and noncorrosive chemicals.

model/statistical parameters

56 phenylsulfonyl carboxylates: R2 = 0.908 and Q2 = 0.866. 65 aromatic compounds: R2 = 0.885 and Q2 = 0.849. 200 phenols: R2 = 0.685 and Q2 = 0.651. The proposed approach provides a good correlation and prediction. MRA LD50 i.p(intraperitoneal) R = 0.849, Q2 = 0.613. LD50-oral R = 0.906, Q 2= 0.701. The model is based on topological indices. Cross-validation analysis was performed. This study shows a science-based estimate of the number of EINECS compounds that can be covered by (Q) SAR models for acute toxicity by taking into account the OECD principles. Using the ECOSAR software, 54% of the 100 196 EINECS chemicals were classified into 49 classes that can be potentially covered by (Q)SAR models. The largest proportion of the classified compounds (40% of the EINECS list) falls into the classes of nonpolar and polar narcotics.

20 organic acids, 21 bases, 15 phenols

type and number of compounds

Table 4. QSAR Studies To Predict Skin Corrosion and Irritation

Existing Chemical Substances (EINECS) compounds

Acute toxicity EC50

type and number of compounds

62 organophosphorus pesticides

property measured

Acute toxicity LD50

health hazard

Table 3. continued

39

38

37

ref

36

35

ref


Table 5. QSAR Studies To Predict Respiratory Sensitization and Skin Sensitization

health hazard | property measured | type and number of compounds | model/statistical parameters | OECD principles | ref
Skin sensitization | Total erythema score (TES) | 12 phenyl benzoates | MRA using TES as the dependent variable and C log P, molecular volume (MV), and other parameters as the independent variables. R2 = 0.855 | 1, 2, 4 | 40
Skin sensitization | Stimulation index (SI) | 93 organic compounds | Two models were developed using TOPS-MODE. Model 1: F(12,63) = 3.39; D2 = 2.52; p < 0.0007. Model 2: F(9,26) = 4.63; D2 = 8.76; p < 0.001. These models have a high predictivity and have been validated using cross-validation and external validation sets. | 1, 2, 4 ext | 41
Skin sensitization | EC3 | 71 aldehydes | R2 = 0.873, S = 0.18, F = 44.5 | 1, 2, 4 | 42
Respiratory sensitizer | Respiratory sensitization | 80 diverse organic compounds | Two models were developed using an algorithm called cat-SAR. One model had a sensitivity of 0.94 and specificity of 0.87, yielding an overall correct prediction of 91%. The second model had a sensitivity of 0.89, specificity of 0.95, and overall correct prediction of 92%. | 1, 2, 4 int | 43
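The sensitivity, specificity, and overall correct prediction reported for classification models such as those in Table 5 are all derived from the same confusion matrix. The sketch below shows the standard definitions in Python; the counts are hypothetical (they are not the data behind ref 43), and the function name is chosen for illustration.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp: active compounds predicted active      fn: active compounds predicted inactive
    tn: inactive compounds predicted inactive  fp: inactive compounds predicted active
    """
    sensitivity = tp / (tp + fn)                # fraction of actives correctly identified
    specificity = tn / (tn + fp)                # fraction of inactives correctly identified
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # overall correct prediction
    return sensitivity, specificity, accuracy


# Hypothetical counts for a set of 80 compounds (50 sensitizers, 30 nonsensitizers).
sens, spec, acc = classification_metrics(tp=45, fn=5, tn=27, fp=3)

# Overall accuracy is the prevalence-weighted mean of sensitivity and specificity.
prevalence = (45 + 5) / 80
weighted = prevalence * sens + (1.0 - prevalence) * spec

print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

Because the overall correct prediction depends on the proportion of active compounds in the dataset, two models with the same sensitivity and specificity can report different overall accuracies on differently balanced datasets.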

chemical that will lead to an allergic response following skin contact. Respiratory and skin sensitization are classified in two categories: A and B. QSAR models to predict respiratory and skin sensitization are provided in Table 5. Also, reviews of models to predict respiratory and skin sensitization are listed in Table 1. To predict sensitization, some QSAR models predict the total erythema score (TES), which is taken as the index of the biological property. In other studies, the predicted property is the EC3, which is a quantitative measure of sensitizing potential.

(iv). Germ Cell Mutagenicity. A mutation is defined as a permanent change in the amount or structure of the genetic material in a cell. The term mutation applies both to heritable genetic changes that may be manifested at the phenotypic level and to the underlying DNA modifications when known (including, for example, specific base pair changes and chromosomal translocations).

(v). Carcinogenicity. The term carcinogen means a substance or a mixture of substances which induces cancer or increases its incidence. Substances and mixtures that have induced benign and malignant tumors in well-performed experimental studies on animals are also considered to be presumed or suspected human carcinogens, unless there is strong evidence that the mechanism of tumor formation is not relevant for humans. Substances are classified in one of two categories based on strength of evidence and additional weight of evidence considerations. QSAR studies to predict mutagenicity and carcinogenicity are provided in two reviews (refs 16 and 20). Additional QSAR studies to predict carcinogenicity are shown in Table 6.

(b). Reproductive Toxicity. This class includes adverse effects on sexual function and fertility in adult males and females, as well as adverse effects on development of offspring. The substances shall be classified in one of two categories, as known or suspected human reproductive toxicants. QSAR studies to predict reproductive toxicity are provided in Table 7.

(c). Single or Repeated Specific Target Organ Toxicity Exposure. Specific target organ toxicity−single exposure (STOT-SE) means specific, nonlethal target organ toxicity arising from a single exposure to a chemical. Specific target organ toxicity−repeated exposure (STOT-RE) means specific target organ toxicity arising from repeated exposure to a substance or mixture. These two classes (single or repeated exposures) use similar classification criteria. QSAR studies to predict single or repeated specific target organ toxicity exposure have not been found in the literature.

(d). Aspiration Hazard. Aspiration means the entry of a liquid or solid chemical directly through the oral or nasal cavity, or indirectly from vomiting, into the trachea and lower respiratory system. QSAR studies to predict aspiration hazards have not been found in the literature.

2. Physical Hazard. Physical hazards are based on the United Nations Recommendations for the Transport of Dangerous Goods and include the following classifications.

(a). Explosives. An explosive chemical is a solid or liquid chemical that is, in itself, capable by chemical reaction of producing gas at such a temperature and pressure and at such a speed as to cause damage to the surroundings. Depending on the type of hazard, chemicals shall be assigned to one of the divisions provided in the GHS or be classified as unstable explosives. QSPR studies have been developed to predict flammable range (see Table 9, shown later in this work), which, in some cases, is the same as explosivity range. Additional studies have been developed to predict explosivity properties (see Table 8). Some of the methods reported for calculating the explosive properties use any experimental data of the explosive as well as detonation products.

(b). Flammable Gases. Flammable gas means a gas having a flammable range with air at 20 °C and a standard pressure of 101.3 kPa (14.7 psi). Table 9 provides QSPR studies to predict the flammability limits in gases.

(i). Flammable Aerosols. Aerosol means any nonrefillable receptacle containing a gas (compressed, liquefied, or dissolved under pressure) and fitted with a release device, allowing the contents to be ejected as particles in suspension in a gas, or as a foam, paste, powder, liquid, or gas. QSPR studies to predict the heat of combustion have been developed (see Table 10).

(ii). Oxidizing Gases. Oxidizing gas means any gas which may, generally by providing oxygen, cause or contribute to the combustion of other material more than air does.

(iii). Gases under Pressure. Gases under pressure are gases contained in a receptacle at a pressure of 200 kPa (29 psi) gauge or more, or which are liquefied or liquefied and refrigerated. This type of gas shall be classified in one of the groups provided in the Category 1, such as compressed gas, liquid gas, refrigerated

Carcinogenic potential Carcinogenic potential CPDB mouse Carcinogenic activity

Carcinogenicity activity

Carcinogenicity activity

Carcinogenic potency (TD50)

Carcinogenic potency (TD50)

Carcinogenicity

Carcinogenicity

Carcinogenicity

Carcinogenicity

Carcinogenicity

Carcinogenic potency (TD50)

Carcinogenic potency

Carcinogenicity

Carcinogenicity

Carcinogenicity

Carcinogenicity

property measured

health hazard


805 diverse organic chemicals

49 nitrocompounds

62 nitrocompounds

49 nitrocompounds (aromatic and aliphatic)

189 compounds

148 N-nitroso compounds

46 methylated and 32 nonmethylated polycyclic aromatic hydrocarbons (PAH)

30 chemical classes (fibers, polymers, metals/ metalloids, organic chemicals) 24 organic compounds

type and number of compounds

Table 6. QSAR Studies To Predict Carcinogenicity

1, 2, 4 int, ext

1, 2, 4 int

1, 2, 4 int, ext

R = 0.889, R2 = 0.791. MRA using RM R2 = 0.843. MLR based on GETAWAY descriptors. R2 = 85.91, S = 0.361, f = 41.7. CP ANN Three models developed with accuracies of training set and test set of 91−93% and 68−70%, respectively.

1, 2, 4 int

1, 2, 4 int, ext

1, 2, 4 int

The prediction power was evaluated with cross-validation procedure. SVM, accuracy = 95.2%. LDA, accuracy = 89.8%. Model uses a topological sub structural molecule design approach (TOPS-MODE) in the generation of discriminant functions by a LDA. The percentage of correct classification was 76.32%. MRA

1, 2, 4 int

1, 2, 4 int

1, 2, 4

OECD principles

SVM to discriminate between carcinogenic and noncarcinogenic.

The structural cancer expert system “Oncologic” is designed to evaluate carcinogenic potential (six concern levels: low, marginal, low-moderate, moderate, high-moderate, high). Used CASE and MULTICASE for prediction of activity. Cut-off of 0.50.

model/statistical parameters

52

51

50

49

48

47

46

45

44

ref

Industrial & Engineering Chemistry Research Article


Table 7. QSAR Studies To Predict Reproductive Toxicity

| health hazard | property measured | type and number of compounds | model/statistical parameters | OECD principles | ref |
| --- | --- | --- | --- | --- | --- |
| Reproductive toxicity | Placental Clearance Index (CI)/Placental Transfer Index (TI) | 184 heterogeneous antiviral and tocolytic compounds | Five models performed using MLR. Models 1, 4, and 5 (Placental Clearance Index): R² = 0.635, 0.938, and 0.858, respectively. Models 2 and 3 (Placental Transfer Index): R² = 0.620 and 0.763, respectively. | 1, 2, 3, 4 int, 5 | 53 |

Table 8. Studies To Predict Explosive Properties

| physical hazard | property measured | compounds | model/statistical parameters | OECD principles | ref |
| --- | --- | --- | --- | --- | --- |
| Explosives | Detonation pressure | CHNO and CHNOAl explosives | R² = 0.98. RMS(CHNO) = 6.0. RMS(CHNOAl) = 4.3. | 1, 2, 4 | 54 |
| Explosives | Detonation temperature | Pure and mixed explosives CaHbNcOd | R² = 0.93. RMS = 4.6. | 1, 2, 4 | 55 |
| Explosives | Detonation temperature | Aromatic and nonaromatic explosive compounds | Mean absolute error = 7.0%. | 1, 2, 4 | 56 |
| Explosives | Heats of detonation | Aromatic and nonaromatic explosive compounds | RMS = 0.64 kJ/g. Predicted for 37 explosives. | 1, 2, 4 | 57 |
| Explosives | Impact sensitivity (ln(h50%)); thickness of detonation-capable layer (Δcr) | 30 nitro compounds | ln(h50%): R = 0.984, SD = 0.175, mean error = 0.128. R = 0.997, SD = 7.916, mean error = 4.383. | 1, 2, 4 | 58 |
| Explosives | Impact sensitivity | Database of 234 explosive molecules | ANNs. R² = 0.190, standard error of prediction (SEP) = 0.818. | 1, 2, 4 int, ext | 59 |
| Explosives | Impact sensitivity | 204 molecules from different chemical families | ANNs. R = 0.94. | 1, 2, 4 int | 60 |

Table 9. QSPR Studies To Predict the Flammability in Gases

| physical hazard | property measured | type and number of compounds | model/statistical parameters | OECD principles | ref |
| --- | --- | --- | --- | --- | --- |
| Flammable gases | Lower flammability limit (LFL) | 1056 pure compounds | GA-MLR. Error = 7.68%. R² = 0.9698. | 1, 2, 4 int, ext | 61 |
| Flammable gases | Upper flammability limit percent (UFLP) | 865 pure compounds | GA-MLR. R² = 0.9202. Error = 9.7%. | 1, 2, 4 int, ext | 62 |
| Flammable gases | Lower flammability limit (LFL) | 1038 organic compounds | SVM. RMSE = 0.069. MAE = 0.051 vol %. | 1, 2, 4 int, ext | 63 |
| Flammable gases | Lower flammability limit temperature (LFLT) | 1171 pure compounds | GA-MLR. SCC = 0.9468. RMSE = 15.61 K. | 1, 2, 3, 4 int, ext | 64 |

Table 10. QSPR Studies To Predict the Heat of Combustion

| physical hazard | property measured | type and number of compounds | model/statistical parameters | OECD principles | ref |
| --- | --- | --- | --- | --- | --- |
| Flammable aerosols | Heat of combustion | 52 carboxylic organic acids (fatty acids/aromatic acids) | Forward stepwise regression technique: R² = 0.9957. Replacement method procedure: R² = 0.9936. | 1, 2, 4 int | 65 |
| Flammable aerosols | Heat of combustion | 1714 pure chemicals | GA-MLR. R² = 0.9954. | 1, 2, 4 int, ext | 11 |
| Flammable aerosols | Heat of combustion | 1496 organic compounds (C, H, O, N, S, F, Cl, and Br) | ANN. R² = 0.991 for the training set of 1196 compounds. R² = 0.992 for the test set of 300 compounds. | 1, 2, 4 int, ext | 10 |


Table 11. QSPR Studies To Predict the Critical Temperature

| physical hazard | property measured | type and number of compounds | model/statistical parameters | OECD principles | ref |
| --- | --- | --- | --- | --- | --- |
| Gases under pressure | Critical temperature | 165 diverse organic compounds | ANN. MAE (whole set) = 4.52 K. MAE (training set) = 4.39 K. MAE (test set) = 5.59 K. | 1, 2, 4 ext | 66 |
| Gases under pressure | Critical temperature | 856 organic compounds | MLR (RBFNNs). RMS (entire set) = 13.97 K. RMS (training set) = 12.32 K. RMS (prediction set) = 14.23 K. | 1, 2, 4 ext, int | 67 |

Table 12. QSPR Studies To Predict the Flash Point and Boiling Point

| physical hazard | property measured | type and number of compounds | model/statistical parameters | OECD compliance | ref |
| --- | --- | --- | --- | --- | --- |
| Flammable liquids | Flash point/boiling point | 400 compounds | RBF neural network models. Single output prediction gives for the training set: Tb, r = 0.9891, AE = 8.5 °C, SE = 11.4 °C; Tf, r = 0.9774, AE = 8.0 °C, SE = 10.8 °C. | 1, 2, 4 ext, 5 | 68 |
| Flammable liquids | Flash point | 271 organic compounds | Multiparameter regression methodology of the CODESSA computer program. R² = 0.9020, R²cv = 0.8985, s = 16.1. | 1, 2, 4 int | 69 |
| Flammable liquids | Boiling point | 30 polycyclic aromatic hydrocarbons | PLS, PCR. R² = 0.995 (PCR and PLS) for boiling point. | 1, 2, 4 int | 13 |
| Flammable liquids | Boiling point | 54 alkanols | GA with PLS method. The results for the validation set are, at 95% confidence limits, R = 0.9952. | 1, 2, 4 ext, 5 | 70 |
| Flammable liquids | Boiling point | 106 alcohol compounds | RBF-NNA. Predictive correlation coefficients: R = 0.998 (training), 0.998 (validation), and 0.991 (testing). | 1, 2, 4 ext | 71 |
| Flammable liquids | Boiling point | 833 organic compounds, 20 structural classes | MLR (R = 0.957; S = 12.98 °C). | 1, 2, 4 ext | 72 |
| Flammable liquids | Boiling point | 28 alkyl-alcohols | Two models were developed. (1) Includes two total linear indices (R² = 0.984). (2) Includes four variables [3 global and 1 local (heteroatom) linear indices]; R² = 0.9934, s = 2.48. | 1, 2, 4 ext | 73 |
| Flammable liquids | Boiling point | 200 organic molecules | The set was partitioned into two sets: one of 150 molecules for calibration (R = 0.9419) and the remaining ones for an external validation (R = 0.9422). | 1, 2, 4 ext | 74 |
| Flammable liquids | Flash point | Training set: 600 organic compounds. Test set: 158 organic compounds. | ANN, MLR. Average error (MLR) = 13.9 K. R² (ANN) = 0.978; average error (ANN) = 12.6 K. | 1, 2, 4 int, ext, 5 | 75 |
| Flammable liquids | Flash point | 44 alkanes | BP-ANN. Absolute mean error = 6.9 K. Absolute mean relative error = 2.26%. | 1, 2, 4 ext | 76 |
| Flammable liquids | Flash point | 1030 organic compounds | GA-MLR. R² = 0.9669, Q²LOO = 0.9663, and s = 12.691. | 1, 2, 4 int, ext | 77 |
| Flammable liquids | Boiling point | 28 alkyl-alcohols | 2 models: MLR using nonstochastic and stochastic bilinear indices. R² (model 1) = 0.994. R² (model 2) = 0.990. | 1, 2, 4 int | 78 |
| Flammable liquids | Boiling point | 155 chemical compounds (hydrocarbons, aromatics, oxygenated, halogenated). 135 chemicals. | CODESSA software was used to calculate molecular descriptors. MLR. Heuristic method was chosen to find the multilinear relation (R² = 0.9864). RMSE = 9.1 K. | 1, 2, 4 ext, 5 | 79 |
| Flammable liquids | Flash point | 1282 organic compounds | SVM. Average absolute error (MAE) = 6.894 K. RMSE = 11.36. | 1, 2, 4 int, ext | 80 |
| Flammable liquids | Flash point | Organic solvents, alcohols, amines, ethers | Four MLR models. BP-ANN: training set, R = 0.940, R² = 0.883; test set, R = 0.878, R² = 0.772. | 1, 2, 4 ext | 81 |
| Flammable liquids | Boiling point | 200 acyclic carbonyl substances | This work used Simplified Molecular Input Line Entry System (SMILES) with International Chemical Identifier (InChI). Training set: R² = 0.9825. | 1, 2, 4 | 82 |
| Flammable liquids | Flash point | 95 esters | GFA: R² = 0.9642, RMSE = 17.50. ANFIS: R² = 0.9732, RMSE = 15.13. | 1, 2, 4 ext | 83 |

Table 13. QSPR Studies To Predict the Auto-Ignition Temperature

| physical hazard | property measured | type and number of compounds | model/statistical parameters | OECD principles | ref |
| --- | --- | --- | --- | --- | --- |
| Self-heating | Auto-ignition temperature | 200 hydrocarbons | GA-MLR. R² = 0.920. RMSE = 25.876. | 1, 2, 4 | 84 |
| Self-heating | Auto-ignition temperature | 118 hydrocarbons | BP-ANN. Training set: AAE = 12.8, RMS = 17.49, R = 0.987. | 1, 2, 4 ext | 85 |
| Self-heating | Auto-ignition temperature | Two sets: 50 alkane compounds and 142 organic compounds | SVM. First training set: R = 0.984, Q²LOO = 0.882, RMS = 16.42. Second training set: R = 0.963, Q²LOO = 0.853, RMS = 29.82. | 1, 2, 4 int, ext | 86 |
| Self-heating | Auto-ignition temperature | 446 organic compounds | GA to select optimal subset of descriptors. SVM to model the property and the descriptors. AAE = 28.88 °C, RMSE = 36.86, R² = 0.935. | 1, 2, 4 ext | 87 |
| Self-heating | Auto-ignition temperature | 400 compounds | Model based on SGC. R² = 0.8474. Average error of 32 K and an average error percentage of 4.9%. | 1, 2, 4 ext | 88 |

liquefied gas, and dissolved gas. In order to predict the critical temperature, QSPR studies have been developed (see Table 11).

(iv). Flammable Liquids. A flammable liquid is a liquid with a flash point of no more than 93 °C. Four categories classify liquids on the basis of their flash point and boiling point. QSPR studies to predict the flash point and boiling point are summarized in Table 12.

(v). Flammable Solids. A flammable solid is a solid that is readily combustible, or that may cause or contribute to fire through friction. The classification criteria define two categories based on the burning rate test. QSPR studies have not been developed to predict the burning rate; therefore, to classify flammable solids within the GHS, it is necessary to perform the burning rate test.

(vi). Self-Heating Chemicals. A self-heating chemical is a solid or liquid chemical, other than a pyrophoric liquid or solid, which, by reaction with air and without energy supply, is liable to self-heat; this chemical differs from a pyrophoric liquid or solid in that it will ignite only when present in large amounts (kilograms) and after long periods of time (hours or days). The classification criteria are divided into two categories, based on the results obtained in tests. Chemicals with a temperature of spontaneous combustion higher than 50 °C (122 °F) for a volume of 27 m³ shall not be classified as self-heating chemicals. Chemicals with a spontaneous ignition temperature higher than 50 °C (122 °F) for a volume of 450 L shall not be classified in Category 1 of this class. The classification criteria for self-heating chemicals are based on tests that are performed in accordance with the United Nations Recommendations on the Transport of Dangerous Goods, Manual of Tests and Criteria. In addition, as previously mentioned, data for spontaneous ignition temperature (auto-ignition temperature) can


Table 14. QSPR Studies To Predict the Heat of Decomposition and the Thermal Decomposition Temperature (Td)

| physical hazard | property measured | type and number of compounds | model/statistical parameters | OECD principles compliance | ref |
| --- | --- | --- | --- | --- | --- |
| Self-reactive | Onset temperature | 19 nitro compounds | LSRA. AAAE = 7%. | 1, 2, 4 | 89 |
| Self-reactive | Thermal decomposition temperature (Td) | 90 s-order nonlinear optical (NLO) chromophores (amino, azo, cyano, ether, ethynyl) | MRA. R² = 0.9642. The mean relative error was 4.46%. | 1, 2, 4 int, 5 | 90 |
| Self-reactive | Thermal decomposition temperature (Td) | 80 monomers of different polymers | MLR. Two models were suggested. 4 descriptors: R² = 0.894, Q²CV = 0.900. 13 descriptors: R² = 0.956, Q²CV = 0.956. | 1, 2, 4 int, ext, 5 | 91 |
| Self-reactive | Heat of decomposition | 22 nitro aromatic compounds | Multilinear regression. R² = 0.91, R²cv = 0.84. | 1, 2, 4 int | 92 |
| Self-reactive | Heat of decomposition | 22 nitro aromatic compounds | Multilinear correlation. R² = 0.86. | 1, 2, 4 | 93 |
| Self-reactive | Heat of decomposition | 22 nitrobenzene derivatives | MRA. R² = 0.98. | 1, 2, 4 | 94 |
| Self-reactive | Onset temperature (To); heat of decomposition (ΔH°) | 16 organic peroxides | Two regression methods applied. MLR: To, R = 0.957, R² = 0.916; ΔH°, R = 0.960, R² = 0.921. PLS: To, R = 0.978, R² = 0.957; ΔH°, R = 0.956, R² = 0.913. | 1, 2, 4 int | 95 |

be useful in the classification as well. QSPR studies to predict the auto-ignition temperature can be seen in Table 13.

(vii). Self-Reactive Chemicals. Self-reactive chemicals are thermally unstable liquid or solid chemicals that are liable to undergo a strongly exothermic decomposition even without the participation of oxygen (air). This definition excludes chemicals classified under this section as explosives, organic peroxides, oxidizing liquids, or oxidizing solids. QSPR studies to predict the heat of decomposition and the thermal decomposition temperature (Td) are provided in Table 14.

In order to classify pyrophoric liquids/solids, chemicals that emit flammable gases when in contact with water, oxidizing liquids and solids, organic peroxides, and chemicals corrosive to metals, it is necessary to follow the tests recommended in the United Nations Recommendations on the Transport of Dangerous Goods, Manual of Tests and Criteria. These classifications generally do not depend on any single measured property; therefore, it is difficult to apply QSAR/QSPR to such hazards. The classification criteria are not reproduced in this paper; however, they can be found in the proposed rule document (ref 6).

(viii). Pyrophoric Liquids and Solids. Pyrophoric liquids or solids are liquid or solid chemicals that, even in small quantities, are liable to ignite within 5 min after coming into contact with air.

(ix). Chemicals That, When in Contact with Water, Emit Flammable Gases. This classification includes solid or liquid chemicals that, by interaction with water, are liable to become spontaneously flammable or to give off flammable gases in dangerous quantities.

(x). Oxidizing Liquids and Solids. Oxidizing liquids and solids include liquids or solids that, although not necessarily combustible by themselves, may (generally by yielding oxygen) cause, or contribute to, the combustion of other material.

(xi). Organic Peroxides. Organic peroxides are reactive substances or mixtures that are thermally unstable and may undergo an exothermic, self-accelerating decomposition. In addition, they may have one or more of the following properties:
• be liable to explosive decomposition,
• burn rapidly,
• be sensitive to impact or friction, or
• react dangerously with other substances.

(xii). Corrosive to Metals. A chemical that is corrosive to metals is a chemical that, by chemical action, will materially damage, or even destroy, metals. A chemical is classified in Category 1 when the corrosion rate on either steel or aluminum surfaces exceeds 6.25 mm per year at a test temperature of 55 °C (131 °F), when tested on both materials.
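Several of the hazard classes described in this section are defined by explicit numerical thresholds, which means a predicted property value (for example, a flash point from a QSPR model) can be screened directly against the criterion. The following is a minimal sketch of such a screen, using only the threshold values quoted in this section; the function and constant names are hypothetical and are not part of the GHS or of any cited model, and the sketch does not assign GHS categories.

```python
# Threshold values as quoted in the text of this section.
# All names below are illustrative assumptions, not an official API.
GAS_UNDER_PRESSURE_MIN_KPA = 200.0        # gauge pressure in the receptacle, kPa
FLAMMABLE_LIQUID_MAX_FLASH_POINT_C = 93.0  # flash point, deg C
PYROPHORIC_MAX_IGNITION_TIME_MIN = 5.0     # minutes after contact with air
CORROSION_RATE_LIMIT_MM_PER_YEAR = 6.25    # steel or aluminum, tested at 55 deg C


def is_gas_under_pressure(gauge_pressure_kpa, liquefied=False):
    """Gas held at 200 kPa gauge or more, or liquefied (with or without refrigeration)."""
    return liquefied or gauge_pressure_kpa >= GAS_UNDER_PRESSURE_MIN_KPA


def is_flammable_liquid(flash_point_c):
    """Liquid with a flash point of no more than 93 deg C."""
    return flash_point_c <= FLAMMABLE_LIQUID_MAX_FLASH_POINT_C


def is_pyrophoric(ignition_time_min):
    """Ignites within 5 min of contact with air; None means no ignition was observed."""
    return (ignition_time_min is not None
            and ignition_time_min <= PYROPHORIC_MAX_IGNITION_TIME_MIN)


def is_corrosive_to_metals(rate_steel_mm_per_year, rate_aluminum_mm_per_year):
    """Category 1 when the corrosion rate on either metal exceeds 6.25 mm per year."""
    worst_rate = max(rate_steel_mm_per_year, rate_aluminum_mm_per_year)
    return worst_rate > CORROSION_RATE_LIMIT_MM_PER_YEAR
```

A predicted flash point could be passed to `is_flammable_liquid` as a first screen; because model error (for example, the RMSE values reported in Table 12) can be comparable to the distance from the threshold, values close to the limit would still call for experimental confirmation.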



CONCLUSIONS

Implementation of a globally harmonized system (GHS) will prove challenging for industry, because companies will have to deal with modified requirements for substance classification and labeling (8,94) and, at the same time, address the need to extensively characterize the properties of new substances. Given the intrinsic drawbacks of the experimental approach (cost, time, availability of laboratories), QSAR/QSPR approaches appear quite attractive as alternative tools to predict the properties of interest. Their principles of application are indeed virtually unlimited. This review has shown that a significant number of correlations currently exist that successfully model the dependence of numerous physicochemical and biological properties on structure. However, one should recall that, currently, many of these models are restricted to certain


types of compounds and properties (18,42), and further studies will have to develop additional models that cover extended chemical classes and predict other properties, as soon as further reliable data become available. Indeed, some of the future studies could be related to the following properties (pairs of hazard class and relevant property):
• Specific target organ toxicity (single/repeated exposure): dose/concentration that produces the effect
• Aspiration hazard: kinematic viscosity
• Flammable solids: burning rate/burning time
• Self-reactive chemicals: self-accelerating decomposition temperature (SADT), detonation velocity, deflagration velocity
• Self-heating: temperature of spontaneous combustion
• Chemicals that, when in contact with water, emit flammable gases: rate of evolution of flammable gas/reactivity with water
• Oxidizing solids: burning time
• Organic peroxides: SADT
• Corrosive to metals: corrosion rate in aluminum or steel

Finally, other aspects that need to be considered in order to facilitate the development and implementation of new models, and the acceptance of QSAR/QSPR as a nontesting alternative source of data, involve the transparency and reliability of the model within a regulatory context. Compliance with the Organization for Economic Cooperation and Development (OECD) principles facilitates the consideration of QSAR/QSPR models for regulatory purposes. In particular, the evaluation of robustness and predictive power through a series of internal and external validation tests demonstrates the reliability of the model. These are fundamental criteria that must be carefully taken into account in order to enhance the development of reliable tools and facilitate their due implementation by regulatory bodies and industrial stakeholders. In the present review, the models analyzed do not fully comply with the OECD principles; therefore, the properties cannot yet be predicted with sufficient accuracy to replace experimental data in a regulatory context. However, the models that have been analyzed are useful to provide supporting information for experimental property determination.

Deciding whether a chemical is hazardous, and classifying a chemical within hazard categories, each depends directly on the strength of the decision-making process. It is difficult to predict exact events, but a good decision can be made based on available information, and uncertainty can be reduced through the analysis of additional information. Uncertainty characterization for decision-making depends on input values (available data), output false negatives and false positives, and the uncertainty of predicted values (uncertainty in the models). Characterizing such uncertainty in the QSAR/QSPR models will allow us to make better decisions in terms of the hazard classification of chemicals. Nevertheless, it is appropriate to always keep in mind that each decision itself contributes to the overall risk of the chemical process.
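The internal and external validation statistics that recur in Tables 6−14 (R², Q²LOO, RMSE) can be illustrated with a small calculation. The sketch below fits a one-descriptor linear model, computes R² on the training set, the leave-one-out cross-validated Q² (internal validation), and the RMSE on a held-out set (external validation). The descriptor and property values are invented for illustration and are not taken from any of the reviewed studies; the function names are likewise hypothetical.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    slope, intercept = model
    return [slope * xi + intercept for xi in x]


def r_squared(y_true, y_pred):
    """1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    return sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def q2_loo(x, y):
    """Leave-one-out Q2: each compound is predicted by a model fitted without it."""
    loo_pred = []
    for i in range(len(x)):
        model = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        loo_pred.append(predict(model, [x[i]])[0])
    return r_squared(y, loo_pred)


# Hypothetical training set: one descriptor value and one property value per compound.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y_train = [121.5, 138.0, 160.5, 181.0, 198.5, 222.0, 239.5, 259.0, 281.5, 298.5]
# Hypothetical external set, never used for fitting.
x_ext = [11.0, 12.0, 13.0]
y_ext = [321.0, 339.0, 361.5]

model = fit_line(x_train, y_train)
r2_train = r_squared(y_train, predict(model, x_train))  # goodness of fit
q2 = q2_loo(x_train, y_train)                           # internal validation
rmse_ext = rmse(y_ext, predict(model, x_ext))           # external validation
```

For a least-squares model, Q²LOO can never exceed the training R², so a large gap between the two is a common warning sign of overfitting; the external RMSE is the figure most directly comparable to the distance between a predicted value and a classification threshold.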





AUTHOR INFORMATION

Corresponding Author

*Tel.: (979) 862-3985. Fax: (979) 845-6446. E-mail: mannan@tamu.edu.

Notes

The authors declare no competing financial interest.

ABBREVIATIONS

AAAE = average absolute aggregate error
AAE = average absolute error
ANFIS = adaptive neuro-fuzzy inference system
ANN = artificial neural networks
ATE = acute toxicity estimate
BP-ANN = back-propagation artificial neural network
CP ANN = counter-propagation artificial neural network
GA = genetic algorithm
GA-MLR = genetic algorithm based on multivariate linear regression
GETAWAY = geometry, topology, and atom weights assembly
GFA = genetic function approximation
LC50 = lethal concentration
LDA = linear discriminant analysis
LSRA = least squares regression analysis
MAE = mean absolute error
MLR = multiple linear regression
MRA = multiple regression analysis
OECD = Organization for Economic Cooperation and Development
OSHA = Occupational Safety and Health Administration
PCR = principal component regression
PLS = partial least squares
QSAR = quantitative structure−activity relationships
QSPR = quantitative structure−property relationships
RBF = radial basis function
RBFNNs = radial basis function neural networks
RM = replacement method
RMS = root mean square
RMSE = root-mean-square error
RMSR = root-mean-square residual
SCC = squared correlation coefficient
SGC = structural group contribution
SI = stimulation index
SVM = support vector machine
TES = total erythema score
TOPS-MODE = topological substructural molecular descriptors

REFERENCES

(1) Occupational Safety and Health Administration (OSHA). Hazard Communication, 29 CFR Parts 1910, 1915, 1917, 1918, and 1926, No. OSHA-H022K-2006-0062, www.osha.gov/dsg/hazcom/index.html.
(2) Winder, C.; Azzi, R.; Wagner, D. The development of the globally harmonized system (GHS) of classification and labelling of hazardous chemicals. J. Hazard. Mater. 2005, 125 (1−3), 29−44.
(3) Seguin, L. Optimizing your company’s GHS deployment. J. Chem. Health Saf. 2009, 16, 5−9.
(4) Rainer, D. OSHA Standards and the Globally Harmonized System (GHS) of Classification and Labeling of Chemicals. J. Chem. Health Saf. 2010, 17, 35−36.
(5) Occupational Safety and Health Administration (OSHA). Hazard communication: Advanced Notice of Proposed Rulemaking; OSHA Federal Register Notice 71:53617−53627; Department of Labor, Federal Register, 2006.
(6) Occupational Safety and Health Administration (OSHA). Hazard communication: Proposed rule; OSHA Federal Register Notice 74:50279−50549; Department of Labor, Federal Register, 2009.
(7) Katritzky, A. R.; Lobanov, V. S. QSPR: The Correlation and Quantitative Prediction of Chemical and Physical Properties from Structure. Chem. Soc. Rev. 1995, 24, 279−287.
(8) Fayet, G.; Joubert, L.; Rotureau, P.; Adamo, C. On the use of descriptors arising from the conceptual density functional theory for the prediction of chemicals explosibility. Chem. Phys. Lett. 2009, 467, 407−411.


(9) Pan, Y.; Jiang, J.; et al. Advantages of support vector machine in QSPR studies for predicting auto-ignition temperatures of organic compounds. Chemom. Intell. Lab. Syst. 2008, 92, 169−178.
(10) Cao, H. Y.; Jiang, J. C.; Pan, Y.; Wang, R.; Cui, Y. Prediction of the net heat of combustion of organic compounds based on atom-type electrotopological state indices. J. Loss Prevent. Process Ind. 2009, 22, 222−227.
(11) Gharagheizi, F. A simple equation for prediction of net heat of combustion of pure chemicals. Chemom. Intell. Lab. Syst. 2008, 91, 177−180.
(12) Gharagheizi, F. Prediction of upper flammability limit percent of pure compounds from their molecular structures. J. Hazard. Mater. 2009, 167, 507−510.
(13) Alves de Lima Ribeiro, F.; Ferreira, M. M. C. QSPR models of boiling point, octanol-water partition coefficient and retention time index of polycyclic aromatic hydrocarbons. J. Mol. Struct. 2003, 663, 109−126.
(14) OECD. Guidance document on the validation of (quantitative) structure-activity relationship [(Q)SAR] models, ENV/JM/MONO(2007)2, OECD Environment Health and Safety Publications, Series on Testing and Assessment, No. 69; Organization for Economic Cooperation and Development (OECD): Paris, 2007.
(15) Rodford, R.; Patlewicz, G.; Walker, J. D.; Payne, M. P. Quantitative structure−activity relationships for predicting skin and respiratory sensitization. Environ. Toxicol. Chem. 2003, 22, 1855−1861.
(16) Benigni, R.; Bossa, C.; Netzeva, T.; Worth, A. Collection and Evaluation of (Q)SAR Models for Mutagenicity and Carcinogenicity; Joint Research Centre: Ispra, Italy, 2007.
(17) Netzeva, T. I.; Pavan, M.; Worth, P. A. Review of (Quantitative) Structure-Activity Relationships for Acute Aquatic Toxicity. QSAR Comb. Sci. 2008, 27, 77−90.
(18) Gallegos Saliner, A.; Patlewicz, G.; Worth, A. P. A Review of (Q)SAR Models for Skin and Eye Irritation and Corrosion. QSAR Comb. Sci. 2008, 27, 49−59.
(19) Cronin, M. T.; Worth, A. (Q)SARs for Predicting Effects Relating to Reproductive Toxicity. QSAR Comb. Sci. 2008, 27, 91−100.
(20) Benigni, R.; Bossa, C. Predictivity of QSAR. J. Chem. Inf. Model. 2008, 48, 971−980.
(21) Devillers, J.; Devillers, H. Prediction of acute mammalian toxicity from QSARs and interspecies correlations. SAR QSAR Environ. Res. 2009, 20, 467−500.
(22) Katritzky, A. R.; Fara, D. C.; Petrukhin, R. O.; Tatham, D. B.; Maran, U.; Lomaka, A.; Karelson, M. The present utility and future potential for medicinal chemistry of QSAR/QSPR with whole molecule descriptors. Curr. Top. Med. Chem. 2002, 2 (12), 1333−56.
(23) Dearden, J. C. Quantitative structure property relationships for prediction of boiling point, vapor pressure, and melting point. Environ. Toxicol. Chem. 2003, 22 (8), 1696−709.
(24) Gute, B. D.; Basak, S. C. Predicting acute toxicity (LC50) of benzene derivatives using theoretical molecular descriptors: a hierarchical QSAR approach. SAR QSAR Environ. Res. 1997, 7, 117−131.
(25) Devillers, J.; Flatin, J. A general QSAR model for predicting the acute toxicity of pesticides to Oncorhynchus mykiss. SAR QSAR Environ. Res. 1999, 25−43.
(26) Eldred, D. V.; Jurs, P. C. Prediction of acute mammalian toxicity of organophosphorus pesticide compounds from molecular structure. SAR QSAR Environ. Res. 1999, 75−99.
(27) Cronin, M. T.; Bowers, G. S.; Sinks, G. D.; Schultz, T. W. Structure−toxicity relationships for aliphatic compounds encompassing a variety of mechanisms of toxic action to Vibrio fischeri. SAR QSAR Environ. Res. 2000, 11, 301−312.
(28) Devillers, J.; Flatin, J. A general QSAR model for predicting the acute toxicity of pesticides to Lepomis macrochirus. SAR QSAR Environ. Res. 2001, 397−417.
(29) Tichý, M.; Borek-Dohalský, V.; Matousová, D.; Rucki, M.; Feltl, L.; Roth, Z. Prediction of acute toxicity of chemicals in mixtures: worms Tubifex tubifex and gas/liquid distribution. SAR QSAR Environ. Res. 2002, 13, 261−269.
(30) Morrall, D. D.; Belanger, S. E.; Dunphy, J. C. Acute and chronic aquatic toxicity structure−activity relationships for alcohol ethoxylates. Ecotoxicol. Environ. Saf. 2003, 56, 381−389.
(31) Netzeva, T. I.; Schultz, T. W.; Aptula, A. O.; Cronin, M. T. Partial least squares modelling of the acute toxicity of aliphatic compounds to Tetrahymena pyriformis. SAR QSAR Environ. Res. 2003, 14, 265−283.
(32) Liu, X.; Yang, Z.; Wang, L. CoMFA of the acute toxicity of phenylsulfonyl carboxylates to Vibrio fischeri. SAR QSAR Environ. Res. 2003, 14, 183−190.
(33) Wei, D. B.; Zhai, L. H.; Hu, H. Y. QSAR-based toxicity classification and prediction for single and mixed aromatic compounds. SAR QSAR Environ. Res. 2004, 15 (3), 207−216.
(34) Lagunin, A. A.; Zakharov, A. V.; Filimonov, D. A.; Poroikov, V. V. A new approach to QSAR modelling of acute toxicity. SAR QSAR Environ. Res. 2007, 18, 285−298.
(35) Garcia-Domonech, R.; Alarcon-Elbal, P.; Bolas, G.; Bueno-Marí, R.; Chordá-Olmos, F. A.; Delacour, S. A.; Mouriño, M. C.; Vidal, A.; Gálvez, J. Prediction of acute toxicity of organophosphorus pesticides using topological indices. SAR QSAR Environ. Res. 2009, 18, 745−755.
(36) Zvinavashe, E.; Murk, A. J.; Rietjens, I. M. On the number of EINEC compounds that can be covered by (Q)SAR models for acute toxicity. Toxicol. Lett. 2009, 67−72.
(37) Barratt, M. D. Quantitative structure activity relationships for skin corrosivity of organic acids, bases and phenols. Toxicol. Lett. 1996, 75, 169−176.
(38) Barratt, M. D.; Dixit, M. B.; Jones, P. A. The Use of In Vitro Cytotoxicity Measurements in QSAR Methods for the Prediction of the Skin Corrosivity Potential of Acids. Toxicol. In Vitro 1996, 10, 283−290.
(39) Barratt, M. D. Quantitative Structure−Activity Relationships for Skin Irritation and Corrosivity of Neutral and Electrophilic Organic Chemicals. Toxicol. In Vitro 1996, 10, 247−256.
(40) Barratt, M. D.; Basketter, D. A.; Roberts, D. W. Skin sensitization structure−activity relationships for phenyl benzoates. Toxicol. In Vitro 1994, 8, 823−826.
(41) Estrada, E.; Patlewicz, G.; Chamberlain, M.; Basketter, D.; Larbey, S. Computer-Aided Knowledge Generation for Understanding Skin Sensitization Mechanisms: The TOPS-MODE Approach. Chem. Res. Toxicol. 2003, 16, 1226−1235.
(42) Patlewicz, G.; Roberts, D. W.; Walker, J. D. QSARs for the skin sensitization potential of aldehydes and related compounds. QSAR Comb. Sci. 2003, 22, 196−203.
(43) Cunningham, A. R.; Cunningham, S. L. Development of an information-intensive structure−activity relationship model and its application to human respiratory chemical sensitizers. SAR QSAR Environ. Res. 2005, 16, 273−285.
(44) Woo, Y. T.; Lai, D. Y.; Argus, M. F.; Arcos, J. C. Development of structure−activity relationship rules for predicting carcinogenic potential of chemicals. Toxicol. Lett. 1995, 79 (1−3), 219−228.
(45) Zhang, Y. P.; Sussman, N.; Macina, O. T.; Rosenkranz, H. S.; Klopman, G. Prediction of the Carcinogenicity of a Second Group of Organic Chemicals Undergoing Carcinogenicity Testing. Environ. Health Perspect. 1996, 104, 1045−1050.
(46) Ivanciuc, O. Support Vector Machine Classification of the Carcinogenic Activity of Polycyclic Aromatic Hydrocarbons. Int. Electron. J. Mol. Des. 2002, 1, 203−218.
(47) Luan, F.; Zhang, R.; Zhao, C.; Yao, X.; Liu, M.; Hu, Z.; Fan, B. Classification of the Carcinogenicity of N-Nitroso Compounds Based on Support Vector Machines and Linear Discriminant Analysis. Chem. Res. Toxicol. 2005, 18, 198−203.
(48) Helguera, A. M.; Cabrera Pérez, M. A.; González, M. P.; Ruiz, R. M.; González Díaz, H. A. Topological substructural approach applied to the computational prediction of rodent carcinogenicity. Bioorg. Med. Chem. 2005, 13, 2477−2488.
(49) Morales, A. H.; Pérez, M. A. C.; Pérez, M. A.; Combes, R. D.; González, M. P. Quantitative structure activity relationship for the computational prediction of nitrocompounds carcinogenicity. Toxicology 2006, 220, 51−62.


(50) Morales, A. H.; Duchowicz, P. R.; et al. Application of the replacement method as a novel variable selection strategy in QSAR. 1. Carcinogenic potential. Chemom. Intell. Lab. Syst. 2006, 81, 180−187.
(51) Helguera, A. M.; Cordeiro, M. N.; Pérez, M. A.; Combes, R. D.; González, M. P. QSAR modeling of the rodent carcinogenicity of nitrocompounds. Bioorg. Med. Chem. 2008, 16, 3395−3407.
(52) Fjodorova, V.; Vracko, M.; Jezierska, A.; Novic, M. Counter propagation artificial neural network categorical models for prediction of carcinogenicity for non-congeneric chemicals. SAR QSAR Environ. Res. 2009, 21, 57−75.
(53) Hewitt, M.; Madden, J. C.; Rowe, P. H.; Cronin, M. T. Structure-based modeling in reproductive toxicology: (Q)SARs for the placental barrier. SAR QSAR Environ. Res. 2007, 18, 57−76.
(54) Keshavarz, M. H. Prediction of detonation performance of CHNO and CHNOAl explosives through molecular structure. J. Hazard. Mater. 2009, 166, 1296−1301.
(55) Keshavarz, M. H. Detonation temperature of high explosives from structural parameters. J. Hazard. Mater. 2006, A137, 1303−1308.
(56) Keshavarz, M. H.; Nazari, H. R. A simple method to assess detonation temperature without using any experimental data and computer code. J. Hazard. Mater. 2006, B133, 129−134.
(57) Keshavarz, M. Simple procedure for determining heats of detonation. Thermochim. Acta 2005, 428, 95−99.
(58) Afanas’ev, G. T.; Pivina, T. S.; Sukhachev, D. V. Comparative Characteristics of Some Experimental and Computational Methods for Estimating Impact Sensitivity Parameters of Explosives. Propellants, Explos., Pyrotech. 1993, 18, 309−316.
(59) Cho, S. G.; No, K. T.; Goh, E. M.; Kim, J. K.; Shin, J. H.; Joo, J. D.; Seong, S. Optimization of Neural Networks Architecture for Impact Sensitivity of Energetic Molecules. Bull. Korean Chem. Soc. 2005, 26, 399−408.
(60) Nefati, H.; Cense, J.-M.; Legendre, J.-J. Prediction of the Impact Sensitivity by Neural Networks. J. Chem. Inf. Comput. Sci. 1996, 36, 804−810.
(61) Gharagheizi, F. Quantitative Structure Property Relationship for Prediction of the Lower Flammability Limit of Pure Compounds. Energy Fuels 2008, 22, 3037−3039.
(62) Gharagheizi, F. Prediction of upper flammability limit percent of pure compounds from their molecular structures. J. Hazard. Mater. 2009, 167, 507−510.
(63) Pan, Y.; Jiang, J.; Wang, R.; Cao, H.; Cui, Y. A novel QSPR model for prediction of lower flammability limits of organic compounds based on support vector machine. J. Hazard. Mater. 2009, 168, 962−969.
(64) Gharagheizi, F. A QSPR model for estimation of lower flammability limit temperature of pure compounds based on molecular structure. J. Hazard. Mater. 2009, 169, 217−220.
(65) Duchowicz, P. R.; Garro, J. C. M.; Andrada, M. F.; Castro, E.; Fernández, F. M. QSPR Modeling of Heats of Combustion for Carboxylic Acids. QSAR Comb. Sci. 2007, 26, 647−652.
(66) Hall, L. H.; Story, C. T. Boiling Point and Critical Temperature of a Heterogeneous Data Set: QSAR with Atom Type Electrotopological State Indices Using Artificial Neural Networks. J. Chem. Inf. Comput. Sci. 1996, 36, 1004−1014.
(67) Yao, X.; Wang, Y.; Zhang, X.; Zhang, R.; Liu, M.; Hu, Z. Radial basis function neural network-based QSPR for the prediction of critical temperature. Chemom. Intell. Lab. Syst. 2002, 62, 217−225.
(68) Tetteh, J.; Suzuki, T.; Metcalfe, E.; Howells, S. Quantitative Structure-Property Relationships for the Estimation of Boiling Point and Flash Point Using a Radial Basis Function Neural Network. J. Chem. Inf. Comput. Sci. 1999, 39, 491−507.
(69) Katritzky, A. R.; Petrukhin, R.; Jain, R.; Karelson, M. QSPR Analysis of Flash Points. J. Chem. Inf. Comput. Sci. 2001, 41, 1521−1530.
(70) Kompany-Zareh, M. A. QSPR study of boiling point of saturated alcohols using genetic algorithm. Acta Chim. Slov. 2003, 50, 259−273.
(71) Li, Q.; Chen, X.; Hu, Z. Quantitative structure-property relationship studies for estimating boiling points of alcohols using calculated molecular descriptors with radial basis function neural networks. Chemom. Intell. Lab. Syst. 2004, 72, 93−100.

(72) Ivanova, A. A. Highly diverse, massive organic data as explored by a composite QSPR strategy: An advanced study of boiling point. SAR QSAR Environ. Res. 2004, 16, 231−46. (73) Marrero-Ponce, Y.; Castillo-Garit, J. A.; Olazabal, E.; Serrano, H. S.; Morales, A.; Castañedo, N.; Ibarra-Velarde, F.; Huesca-Guillen, A.; Sánchez, A. M.; Torrens, F.; Castro, E. A. Atom, atom-type and total molecular linear indices as a promising approach for bioorganic and medicinal chemistry: Theoretical and experimental assessment of a novel method for virtual screening and rational design of new lead anthelmintic. Bioorg. Med. Chem. Lett. 2005, 13, 1005−1020. (74) Duchowicz, P. R.; Castro, E. A.; Fernandez, F. M.; Gonzalez, M. P. A new search algorithm for QSPR/QSAR theories: Normal boiling points of some organic molecules. Chem. Phys. Lett. 2005, 412, 376− 380. (75) Katritzky, A. R.; Stoyanova-Slavova, I. B.; Dobchev, D. A.; Karelson, M. QSPR modeling of flash points: An update. J. Mol. Graphics Modell. 2007, 26, 529−536. (76) Pan, Y.; Jiang, J.; Wang, Z. Prediction of the flash points of alkanes by group bond contribution method using artificial neural network. Chin. J. Chem. Eng. 2007, 35, 38−41. (77) Gharagheizi, F.; Alamdari, R. F. Prediction of Flash Point Temperature of Pure Components Using a Quantitative Structure− Property Relationship Model. QSAR Comb. Sci. 2008, 27, 679−683. (78) Castillo-Garit, J. A.; Martinez-Santiago, O.; Marrero-Ponce, Y.; Casanola-Martin, G. M.; Torrens, F. Atom-based non-stochastic and stochastic bilinear indices: Application to QSPR/QSAR studies of organic compounds. Chem. Phys. Lett. 2008, 464, 107−112. (79) Sola, D.; Ferri, A.; Banchero, M.; Manna, L.; Sicardi, S. QSPR prediction of N-boiling point and critical properties of organic compounds and comparison with a group-contribution method. Fluid Phase Equilibr. 2008, 263, 33−42. (80) Pan, Y.; Jiang, J.; Wang, R.; Cao, H.; Zhao, J. 
Quantitative Structure-Property Relationship Studies for Predicting Flash Points of Organic Compounds using Support Vector Machines. QSAR Comb. Sci. 2008, 27, 1013−1019. (81) Patel, S. J.; Ng, D.; Mannan, S. QSPR Flash Point Prediction of Solvents Using Topological Indices for Application in Computer Aided Molecular Design. Ind. Eng. Chem. Res. 2009, 48, 7378−7387. (82) Toropov, A.; Toropova, A.; Benfenati, E.; Leszczynska, D.; Leszczynski, J. Use of the international chemical identifier for constructing QSPR-model of normal boiling points of acyclic carbonyl substances. J. Math. Chem. 2009, 47, 355−369. (83) Khajeh, A.; Modarress, H. QSPR prediction of flash point of esters by means of GFA and ANFIS. J. Hazard. Mater. 2010, 179, 715−720. (84) Kim, S. Y.; Lee, K. S.; Kim, J. H.; Kim, J. S.; No, K. T. Prediction of auto-ignition temperatures (AITs) for hydrocarbons and compounds containing heteroatoms by the quantitative structure−property relationships. J. Chem. Soc. 2002, 2, 2087−2092. (85) Pan, Y.; Jiang, J.; Wang, R.; Cao, H.; Zhao, J. Prediction of autoignition temperatures of hydrocarbons by neural network based on atom-type electrotopological-state indices. J. Hazard. Mater. 2008, 157, 510−517. (86) Pan, Y.; Jiang, J.; Wang, R.; Cao, H. Advantages of support vector machine in QSPR studies for predicting auto-ignition temperatures of organic compounds. Chemom. Intell. Lab. Syst. 2008, 92, 169−178. (87) Pan, Y.; Jiang, J.; Wang, R.; Cao, H.; Cui, Y. Predicting the autoignition temperatures of organic compounds from molecular structure using support vector machine. J. Hazard. Mater. 2009, 164, 1242−1249. (88) Chen, C. C.; Liaw, H. J. Prediction of autoignition temperatures of organic compounds by the structural group contribution approach. J. Hazard. Mater. 2009, 162, 746−762. (89) Saraf, S. R.; Rogers, W. J.; Mannan, S. Prediction of reactive hazards based on molecular structure. J. Hazard. Mater. 2003, 98, 15− 29. (90) Xu, J.; Guo, B.; Chen, B.; Zhang, Q. 
A QSPR treatment for the thermal stabilities of second-order NLO chromophore molecules. J. Mol. Model. 2005, 12 (1), 65−75. 16114

(91) Ajloo, D.; Sharifian, A.; Behniafar, H. Prediction of Thermal Decomposition Temperature of Polymers Using QSPR Methods. Bull. Korean Chem. Soc. 2008, 29, 2009−2016.
(92) Fayet, G.; Rotureau, P.; Joubert, L.; Adamo, C. On the prediction of thermal stability of nitroaromatic compounds using quantum chemical calculations. J. Hazard. Mater. 2009, 171, 845−850.
(93) Fayet, G.; Joubert, L.; Rotureau, P.; Adamo, C. On the use of descriptors arising from the conceptual density functional theory for the prediction of chemicals explosibility. Chem. Phys. Lett. 2009, 467, 407−411.
(94) Fayet, G.; Rotureau, P.; Joubert, L.; Adamo, C. QSPR modeling of the thermal stability of nitroaromatic compounds: DFT vs. AM1 calculated descriptors. J. Mol. Model. 2010, 16, 805−812.
(95) Lu, Y.; Ng, D.; Mannan, S. Prediction of the Reactivity Hazards for Organic Peroxides Using the QSPR Approach. Ind. Eng. Chem. Res. 2011, 50, 1515−1522.
