ARTICLE pubs.acs.org/est
Prediction of Aqueous Solubility, Vapor Pressure and Critical Micelle Concentration for Aquatic Partitioning of Perfluorinated Chemicals Barun Bhhatarai and Paola Gramatica* QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Structural and Functional Biology (DBSF), University of Insubria, via J.H. Dunant 3, Varese, 21100, Italy
Supporting Information

ABSTRACT: The majority of perfluorinated chemicals (PFCs) pose an increasing risk to biota and the environment because of their physicochemical stability, wide transport in the environment, and resistance to biodegradation. It is necessary to identify and prioritize these harmful PFCs and to characterize the physicochemical properties that govern their solubility, distribution, and fate in an aquatic ecosystem. Therefore, available experimental data (10-35 compounds) for three important properties of per- and polyfluorinated compounds, namely aqueous solubility (AqS), vapor pressure (VP), and critical micelle concentration (CMC), were collected for quantitative structure-property relationship (QSPR) modeling. Simple and robust models based on theoretical molecular descriptors were developed and externally validated for predictivity. Model predictions for selected PFCs were compared with available experimental data and other published in silico predictions. The structural applicability domains (AD) of the models were verified on a larger data set of 221 compounds. The predicted properties of the chemicals that fall within the AD are reliable and help to reduce the wide data gap that exists. Moreover, the predictions of AqS, VP, and CMC for the most common PFCs were evaluated to understand their aquatic partitioning and to derive a relation with the available experimental bioconcentration factor (BCF) data.
INTRODUCTION

Perfluorinated chemicals (PFCs) are a family of straight or branched long carbon chain analogs predominantly substituted by fluorine. Except for a few PFCs (such as perfluoroalkanes), most have a hydrophilic head, mainly consisting of sulfonates or carboxylates, and a hydrophobic tail consisting of a fluorinated carbon chain. This dual character makes PFCs suitable as surfactants. These compounds are very useful because they are stable during chemical and physical reactions, and they are therefore used extensively in chemical, pharmaceutical, and other consumer products. Details about the uses and properties of PFCs are reported elsewhere 1,2.

Due to their high production volume, PFCs are widely distributed into the aquatic environment, from fields and foods to wildlife and humans 3-5. These compounds are considered "emerging pollutants" because they are widely transported in the environment. Also, there is little evidence based on experimental studies that PFCs are biodegradable. Some fluorotelomer acrylates and alcohols are proposed to degrade partly, albeit with limited transformation of the molecule, but the degradation pathways are still widely debated 4. Possible adverse effects on biota and the environment have been observed 1,4-6. Thus, there is a need to identify and prioritize the most harmful PFCs and to characterize their physicochemical properties related to aquatic partitioning.

PFCs are monitored by regulatory agencies 7-9 and are studied as one of the four classes of compounds under the European Union FP-7 funded project CADASTER 10. The project seeks to provide practical guidance for integrating in silico model predictions into risk assessment by showing how to use nontesting information for regulatory decisions. The significance of the current research is also linked to the recent European regulation REACH 8, which aims to provide more information on chemicals while reducing the use of experimental tests on animals and applying more computational tools in the early stages of compound identification and prioritization. In the current scenario of reducing the costs of synthesis and animal testing, in silico models such as quantitative structure-property relationships (QSPR) are advantageous because they can be used prior to chemical synthesis and biological testing. They can aid in predicting and understanding physicochemical properties from the chemical structure alone.

Accordingly, the main aim of this research is to develop predictive QSPR models, based on the few available experimental data, that will help to reduce the data gaps. This approach will also facilitate the estimation of the fundamental physicochemical properties of hazardous chemicals, including those found in the ECHA preregistered list 11. In addition, the models presented could be used for planning safer chemicals according to the "benign by design" concept, which entails producing better alternatives that will assist the regulatory agencies in the authorization of chemicals and aid environmental hazard management and risk assessment.

Special Issue: Perfluoroalkyl Acid
Received: April 15, 2010
Revised: September 14, 2010
Accepted: September 27, 2010
Published: October 19, 2010
© 2010 American Chemical Society
dx.doi.org/10.1021/es101181g | Environ. Sci. Technol. 2011, 45, 8120–8128
Environmental Science & Technology

In this article, QSPR studies are presented on three important physicochemical properties, namely aqueous solubility (AqS), vapor pressure (VP), and critical micelle concentration (CMC), based on available experimental data for per- and polyfluorinated compounds. These physicochemical properties govern the solubility, distribution, and fate of these chemicals in the aqueous environment. Finally, the models' predictions for selected PFCs were compared with earlier published predictions, and all three properties were studied in relation to the bioconcentration factor (BCF).

Several QSPR studies, mainly on AqS and VP, have been published, but few focus specifically on per- or even polyfluorinated chemicals 12-14. For CMC, none of the published QSPR models is specific to PFCs. Other articles exist that deal with relevant physicochemical properties such as logP and logD using different tools 15. Furthermore, a QSPR study on the vapor pressure of PFCs using a PLS approach was recently presented by a CADASTER project partner 16.

The experimental variability, uncertainty, and inconsistency of some of the reported PFC properties might have hindered efficient analysis by modeling (detailed in ref 17 and in the articles cited therein). For the current analysis, the modeled data points were carefully selected by considering data obtained under similar experimental conditions and with less uncertainty and variability (see Supporting Information (SI) SI1). Another important parameter is pH, which affects the ionization and measured solubility of PFCs and thus the data generated 17. The study of the influence of pH on PFCs will be more relevant when combined with experimental studies and subsequent comparative analysis, which are future goals of the project. This article proposes a modeling study on nonionic PFCs, which are less influenced by pH changes.
The current research also provides the predicted data in advance of their measurement for this chemical class of emerging pollutants based on simple statistical methods carefully verified for their predictivity on external chemicals. Moreover, these end points, which are not modeled yet, are indicative properties of studied chemicals for aquatic partitioning and are the basis for calculation of several other properties important for hazard studies.
EXPERIMENTAL SECTION

Data Set. The experimental data were collected from the SRC PhysProp database 18, the literature (detailed later), and the data compiled in the EU-FP6 PERFORCE report 19. The compilation of data from the PERFORCE report needed more scrutiny in order to select "a clearly defined end point" according to the OECD Principles for acceptability of QSAR models 20, because several values were reported for a single compound, obtained with different estimation methods and experimental procedures. Data from different laboratory sources were used in the current study only if the experimental settings and conditions were identical. The data values were converted into log scale for modeling. The details of data preparation and the experimental data used are given in SI SI1 and Table S1, respectively.

For AqS (in mg/L), data reported in the PERFORCE report (20 compounds) in the temperature range 293-298 K were used. The shake flask method was used for most of the data unless otherwise indicated in the reference cited. For a strongly hydrophobic compound (CAS 1691-99-2), the data obtained from "generator column" experiments 21 were used. Both methods are incorporated as standard OECD tests on solubility. For 10:2 FTOH
(CAS 865-86-1), solubility was determined with the log-linear cosolvency approach 22. In the case of VP (in mmHg), 24 compounds from the SRC data reported at 25 °C and 11 additional compounds from PERFORCE reported at 20 or 25 °C for the liquid or subcooled liquid were combined (35 compounds in total). The data reported at 20 °C 20 were extrapolated to 25 °C using the Wagner and Antoine equations as reported in the original articles 23-25. For CMC (in M, i.e., mol/L), 10 compounds were used for QSPR modeling, compiled mainly from PERFORCE, which collected data from the literature 26-28. The experimental data generated using the surface tension method at 298 K were considered, except for PFOSH, for which the conductivity method was used 29.

Descriptor Calculation and Filtering. The collected compounds were drawn using the HYPERCHEM software 30 and minimized to their lowest energy conformation using the semiempirical AM1 method. Theoretical molecular descriptors of different types were then calculated using the DRAGON software 31. Many different types of 0D-3D molecular descriptors 32 were used as input variables in the modeling so that as many as possible of the relevant structural features related to the response could be captured. Constant descriptors and descriptors pairwise correlated by more than 90% were excluded to minimize redundant information. The reduced sets of input descriptors (as reported in SI Table S1 for each end point) were subjected to variable selection using the Genetic Algorithm (GA) 33.

Variable Selection and QSPR Modeling. Variable selection by the genetic algorithm (GA) and multiple linear regression (MLR) analysis by ordinary least squares (OLS) were performed using the MobyDigs software 34.
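The descriptor pre-filtering described above (dropping constant descriptors and descriptors pairwise correlated above 90%) can be sketched as follows. This is a minimal, dependency-free illustration and not the DRAGON procedure itself: the function names, the dictionary input format, and the greedy rule that keeps the first descriptor of each correlated pair are assumptions made for the example, since the text does not state which member of a pair was retained.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sd_a = sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = sqrt(sum((y - mean_b) ** 2 for y in b))
    if sd_a == 0.0 or sd_b == 0.0:
        return 0.0
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (sd_a * sd_b)

def filter_descriptors(table, max_abs_r=0.90):
    """table maps descriptor name -> list of values (one per compound).

    Constant descriptors are dropped. A descriptor is then kept only if its
    absolute correlation with every descriptor already kept is <= max_abs_r
    (assumption: the first member of a correlated pair is the one retained).
    """
    kept = []
    for name, values in table.items():
        if len(set(values)) <= 1:  # constant descriptor carries no information
            continue
        if all(abs(pearson(values, table[k])) <= max_abs_r for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy descriptors for four compounds.
toy = {
    "nF": [3, 5, 7, 9],
    "nF_doubled": [6, 10, 14, 18],   # perfectly correlated with nF
    "constant": [1, 1, 1, 1],
    "alternating": [1, -1, 1, -1],
}
print(filter_descriptors(toy))
```

The surviving descriptor names would then be the input pool for variable selection.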
GA was applied to the set of ∼400 molecular descriptors for each chemical and the related property in order to extract the best small set of molecular descriptors that were, in combination, the most relevant variables for modeling each studied property 35. The models were initially developed by the all-subset procedure, which explored all of the low-dimension combinations until two-variable models were achieved for a single population. Then, starting from the combinations obtained, 100 different models were developed by exploring new combinations. GA selects new variables by a mechanism of reproduction/mutation analogous to that of biological population evolution. The final models, developed with up to three variables, were ordered according to their decreasing internal predictive performance, verified by the cross-validated correlation coefficient R2CV or Q2LOO (leave-one-out) 36. Several validation methods were further applied to the final models, such as the bootstrap technique (repeated 5000 times for each validation model), Y scrambling (300 iterations to check the fitting of the randomly reordered Y-data), and external validation. These checks support the validity of the proposed models and argue against chance correlation 36. The proposed models were also chosen after a strong verification of both the internal predictivity (by Q2BOOT) and especially the external predictivity (Q2Ext) 36 on chemicals not used in model development. Two different Q2Ext were used: Q2F1 37, which compares the prediction errors on the external set with the deviations of the external experimental values from the training-set average; and Q2F3 38, which compares the prediction errors on the external set, averaged over the number of external compounds, with the deviations of the training-set values from the training-set average, averaged over the number of training compounds.
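The two external validation parameters can be made concrete with a short sketch. It assumes the commonly used definitions Q2F1 = 1 - PRESS_ext / Σ(y_ext - ȳ_train)² and Q2F3 = 1 - (PRESS_ext / n_ext) / (TSS_train / n_train), which match the verbal description above; the function names and the toy numbers are illustrative only. The functions return fractions, whereas this article reports the values as percentages.

```python
from math import sqrt

def press(y_obs, y_pred):
    """Sum of squared prediction errors."""
    return sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))

def q2_f1(y_train, y_ext, y_ext_pred):
    """External Q2 referenced to the spread of the external data around the training mean."""
    mean_train = sum(y_train) / len(y_train)
    denom = sum((y - mean_train) ** 2 for y in y_ext)
    return 1.0 - press(y_ext, y_ext_pred) / denom

def q2_f3(y_train, y_ext, y_ext_pred):
    """External Q2 referenced to the training-set variance, each term averaged over its own set size."""
    mean_train = sum(y_train) / len(y_train)
    tss_train = sum((y - mean_train) ** 2 for y in y_train)
    return 1.0 - (press(y_ext, y_ext_pred) / len(y_ext)) / (tss_train / len(y_train))

def rmse(y_obs, y_pred):
    """Root mean squared error, usable for training (RMSE_TR) or external (RMSE_EXT) sets."""
    return sqrt(press(y_obs, y_pred) / len(y_obs))

# Hypothetical log-scale responses, not data from this study.
y_train = [1.0, 2.0, 3.0, 4.0]
y_ext = [2.0, 3.0]
y_ext_pred = [2.5, 2.5]
print(q2_f1(y_train, y_ext, y_ext_pred), q2_f3(y_train, y_ext, y_ext_pred), rmse(y_ext, y_ext_pred))
```

Because Q2F3 does not depend on how the external values are distributed around the training mean, it can differ markedly from Q2F1 for small prediction sets such as those used here.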
The standard error of estimate (s) and the coefficient of determination (R2) are also reported for each model. The RMSE (root mean squared error) for the training set (RMSETR) and the external prediction set (RMSEEXT), which summarizes the overall error of the model, was calculated as an additional measure of accuracy for the proposed QSPRs.

Data Splitting and Validation. To allow external validation of the models 35,36, the collected compounds for all end points were divided into a training set for model development and a prediction set for model validation, based on two different splitting criteria: (i) random selection through property sampling and (ii) a "self organizing map" (SOM) obtained using k-ANN. These two different splittings were used to check that the variable selection is not biased by structure or response value and that the final model, based on a common set of descriptors, is robust and predictive. If the same combination of descriptors performs similarly well in each splitting, this demonstrates the validity of that particular combination of structural information for predicting the studied response, regardless of the composition of the training sets. Thus, reliable predictions can be obtained from the full models based on all the available information. Twenty-five to 30% of the total data set was used as the prediction set, depending on the availability of data and the distribution of chemical classes; this set was used after model development for external validation. The limited data availability did not allow splitting in the case of CMC; thus, only the internal predictivities by Q2LOO and Q2BOOT were verified.

Applicability Domain. The analysis of the applicability domain (AD) of the models allows the verification of prediction reliability and the identification of problematic compounds.
The Williams plot (not presented here) 36 was used to verify the presence of response outliers (i.e., compounds with cross-validated standardized residuals greater than 2.5 standard deviation units, ±2.5σ) and of chemicals whose structure is very influential in determining the model parameters (i.e., compounds with a high leverage value h, h > h*, the critical value being h* = 3p'/n, where p' is the number of model variables plus one and n is the number of objects used to calculate the model). The leverage approach was also applied to define the structural chemical domain of each model for chemicals without experimental data, by plotting Y-predicted versus hat value 5. The predictions for compounds with a high leverage value are extrapolated and should be considered less reliable, whereas those interpolated within the training domain should be predicted with an accuracy similar to that for the training chemicals. To study the applicability domain of our models on a wide set of PFCs, all the chemicals belonging to the above-mentioned four physicochemical property data sets were combined. In addition, other PFCs retrieved from different data sets and those in the ECHA preregistered list, for which no experimental data are available, were also included. Thus, a total of 221 compounds were used for the structural applicability domain studies.
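The leverage calculation behind this structural domain check can be sketched in a few lines. The sketch assumes the standard hat-value definition h = xᵀ(XᵀX)⁻¹x, with X the training descriptor matrix augmented by an intercept column, and the critical value h* = 3p'/n given above; the helper names and the one-descriptor toy data are hypothetical, and a small Gaussian-elimination solver is included only to keep the example dependency-free.

```python
def solve(a, b):
    """Solve the square linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def leverages(train, query):
    """Hat value of each query compound relative to the training descriptor space."""
    X = [[1.0] + list(row) for row in train]  # prepend intercept column
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    hats = []
    for row in query:
        x = [1.0] + list(row)
        z = solve(xtx, x)                      # z = (X^T X)^-1 x
        hats.append(sum(xi * zi for xi, zi in zip(x, z)))
    return hats

def critical_leverage(n_train, n_descriptors):
    """h* = 3p'/n, with p' the number of model variables plus one."""
    return 3.0 * (n_descriptors + 1) / n_train

# Hypothetical training set with a single descriptor, plus one far-away query compound.
train = [[0.0], [1.0], [2.0], [3.0]]
h_star = critical_leverage(len(train), 1)
for h in leverages(train, train + [[10.0]]):
    print(round(h, 3), "outside AD" if h > h_star else "inside AD")
```

A compound whose hat value exceeds the cutoff lies outside the descriptor space spanned by the training set, so its prediction is an extrapolation.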
RESULTS AND DISCUSSIONS

This section gives a brief introduction to each physicochemical property, namely aqueous solubility, vapor pressure, and critical micelle concentration, and the benefits of predicting their values for PFCs. Reliable predictions of these properties are useful for understanding the aquatic distribution and environmental behavior of the studied chemicals.
Aqueous Solubility (AqS). Aqueous solubility (AqS) plays an important role because the water ecosystem is heavily involved in chemical transport. The AqS of PFCs is a useful property on its own, but it can also be used to calculate other physicochemical properties, such as water/air partition coefficients (Lw) and Henry's law constant H, which can be estimated together with VP data 39. A small experimental data set for 20 PFCs (SI Table S1) was collected and QSPR modeling was performed. The compounds were divided into training and prediction sets in a 15:5 ratio for both the SOM and the random-by-activity splitting methods. Robust and externally validated QSPR models, which identify a suitable set of descriptors to represent the solubility of PFCs based on molecular properties, were obtained. The models were developed using the training set and checked on the prediction set (Q2Ext = 79.21-92.76). The best model, found to be robust and predictive under both splitting criteria, was selected. The statistical details are given in SI Table S2 AqS, and the experimental vs predicted plot (for the SOM splitting) in SI Figure S1 AqS. Equation 1 details the full model developed on all 20 chemicals with experimental data.
$$\log \mathrm{AqS} = -0.418(\pm 1.940) - 0.003(\pm 0.001)\,T(\mathrm{F..F}) + 5.185(\pm 3.849)\,\mathrm{SIC1} \tag{1}$$

$$n = 20;\quad R^2 = 76.31;\quad Q^2 = 69.13;\quad s = 0.91;\quad F = 27.38$$
The two descriptors that appear in the model are both bidimensional. T(F..F), a topological descriptor, represents the sum of the distances between pairs of fluorine atoms (it increases with the number of fluorine atoms in a molecule and with the distance between them). SIC1 (structural information content, neighborhood symmetry of order 1) is a descriptor based on neighbor degrees and edge multiplicity that gives information mainly on the structural symmetry of the molecule. Even though the regression coefficient of T(F..F) is very low compared with that of SIC1, the standardized coefficient of T(F..F) is higher (0.741) than that of SIC1 (-0.342), indicating that T(F..F) is a more important descriptor than SIC1. Thus, the distance between fluorine atoms in the structure is a dominant factor. For the SOM split, only 10:2 FTOH (CAS 865-86-1) was identified as a structural outlier, and no response outlier was seen. For the response split and the full model, none of the compounds was identified as a response or structural outlier. For the structural AD, 201 extra compounds were used along with the 20 compounds with experimental data; the average hat value obtained was 0.450, and only 27 compounds were found outside the structural applicability domain (87.8% coverage). Of these, CAS 29809-35-6, CAS 24151-81-3, CAS 16517-11-6, and CAS 355-49-7 have very high hat values, corresponding to the long chain and bulky structure of these PFCs. These compounds differ from the short chain and less bulky training set chemicals and were excluded from the AD plot (SI Figure S2 AqS) for better visualization. Other compounds with high hat values, as shown in the figure, are an anhydride (CAS 33496-48-9), 18C long acrylates (CAS 59778-97-1, CAS 65150-93-8), and 16C-18C long iodides (CAS 65150-55-6, CAS 355-50-0). The AqS predictions are compared only for some of the well-known PFCs (Table 1A).
The experimental data used for modeling, the predictions obtained, and the EPI Suite predictions refer to 25 °C, whereas the experimental data reported by the Danish EPA 40 refer to 24 °C. Despite the difference in temperature, a decreasing trend in AqS with increasing chain length can be observed for acids and alcohols. Further, the AqS of PFOSH in pure water is reported 41 to be 519 mg/L (logAqS 2.71) at 20 ± 0.5 °C,
Table 1. Predictions of Aqueous Solubility (AqS) and Sub-Cooled Vapor Pressure (pL) for Some Environmentally Relevant PFCsa A logAqS (mg/L)
CAS
Name
375-22-4 2706-90-3 307-24-4 375-85-9 335-67-1 375-95-1 335-76-2 2058-94-8 307-55-1 72629-94-8 2043-47-2 647-42-7 678-39-7 865-86-1 355-46-4 1763-23-1 754-91-6 24448-09-7 1691-99-2 423-82-5
PFBA PFPA PFHxA PFHpA PFOA PFNA PFDA PFUnDA PFDoA PFTriDA 4:2 FTOH 6:2 FTOH 8:2 FTOH 10:2 FTOH PFHxSH PFOSH PFOSA MeFOSE EtFOSE EtFOSEA
at 25 °C Y-Exp
-2.02 -2.29
2.99 1.27 -0.83 -1.96
-0.09 -0.05
Y-Pred 2.65 2.08 1.47 0.82 0.24 -0.74 -1.56 -2.83 -4.12 -5.60 1.94 0.87 -0.62 -3.16 0.88 -0.68 -0.62 -0.14 -0.06 0.14
at 24 °C 40
5.07 3.63 2.41 1.96
2.99 1.08 -0.85 -2.22 2.75b
-0.82 -0.05
EPI Suite Pred.
logKow 40
2.88 1.79 0.67 -0.45 -1.59 -2.73 -3.87 -5.02 -6.17 -7.33 1.68 -0.56 -2.84 -5.13 -0.22 -2.51 -3.62 -3.74 -4.25 -5.76
4.05 4.45 4.90 5.15
B
C
logpL (mmHg)
logpL (Pa)
Y-Expc 0.81
25
-0.80 -1.50 -2.01 -2.76 -3.13 -4.21
25 19 19 19 23 19
0.21 -0.46 -1.30 -2.87
24 23 23 23
-2.42 -4.82d
18 18
Y-Pred
EPI Suite Pred.
0.83 0.41 -0.04 -0.53 -1.04 -1.58 -2.12 -2.71 -3.19 -3.93 0.32 -0.78 -1.91 -3.00 -1.63 -2.62 -3.18 -3.89 -4.17 -5.66
1.17 0.71 0.30 0.05 -0.84 -1.08 -1.57 -2.04 -1.55 -2.40 1.00 -0.02 -1.01 -2.10 -2.19 -2.19 -0.50 -3.44 -4.61 -3.17
Y-Exp
Y-Pred
2.93
2.12 2.53 2.08 1.59 1.08 0.54 0.005 -0.59 -1.07 -1.81 2.44 1.34 0.21 -0.88 0.49 -0.50 -1.06 -1.77 -2.05 -3.54
1.32 0.62 0.11 -0.63 -1.00 -2.09 2.33 1.66 0.82 -0.74
-0.29 -2.69
Exp 12
Pred 12e
2.06 1.66 0.62 0.10 -0.64 -0.98
2.33 1.26 0.60 -0.69
2.11 1.30
3.00 2.85 2.40 2.16 0.54 -0.99
-2.07f
ref 40
-2.70 -0.30 -2.70
a (A) Experimental and predicted data for AqS, EPI Suite predictions, and experimental AqS and logKOW data from ref 40; (B) experimental and predicted VP data (unit: mmHg) and EPI Suite predictions; (C) experimental and predicted VP data (unit: Pa), converted for comparison with Arp et al. 12, and experimental VP data from ref 40. b AqS at 20 ± 5 °C 41. c Values are extrapolated for PFBA, PFHpA, and PFDA to calculate VP at 25 °C. d Out of the structural AD of the VP model. e Predicted by COSMOS 12. f At 23 °C 21.
but the model reported here predicted the logAqS (mg/L) to be 0.68 at 25 °C, which is 2 log units lower for a temperature difference of 5 °C. For EPI Suite 42 the difference is even higher, about 4 log units. Moreover, based on the available experimental (and predicted) AqS and logKOW data, the opposite trends of these two properties for PFC alcohols are evident.

Vapor Pressure (VP). Vapor pressure (VP) plays an important role in the transport, distribution, and fate of environmental pollutants in the atmosphere and, hence, in the environmental acceptability of chemical products and processes. VP data are also used in the estimation of liquid viscosity, enthalpy of vaporization, and the air-water partition coefficient 43. A three-descriptor MLR model (eq 2) for 35 compounds (SI Table S1) was obtained from theoretical molecular descriptors selected by GA. A common, robust, and predictive model was found in both a priori splittings (SOM split and response split: Q2Ext = 80.36-87.78). The statistical parameters, namely the goodness of fit (R2), the robustness verified by internal validation (Q2LOO, Q2BOOT, R2Y-scrambling), and the external validation parameters (Q2Ext: Q2F1 and Q2F3), are given in SI Table S2 VP, and the experimental vs predicted plot (for the SOM splitting) in SI Figure S1 VP.

$$\log \mathrm{VP} = 7.97(\pm 1.26) - 0.16(\pm 0.02)\,\mathrm{F03[C\text{-}F]} - 3.16(\pm 0.92)\,\mathrm{AAC} - 0.64(\pm 0.40)\,\mathrm{nDB} \tag{2}$$

$$n = 35;\quad R^2 = 90.93;\quad Q^2 = 88.21;\quad s = 0.883;\quad F = 103.6$$

The descriptors in eq 2 are arranged in order of decreasing standardized coefficient, given below in parentheses. The model consists of the 2D frequency fingerprint descriptor F03[C-F] (-0.73), the 2D information index AAC (-0.43), and the 0D constitutional descriptor nDB (-0.20). These descriptors, which are
all inversely related to VP, represent, respectively: the frequency of C-F atom pairs at a topological distance of three bonds (these values are higher for branched and cyclic compounds), the average atomic composition index (which increases with heavier atoms or larger molecules), and the number of double bonds. EtFOSEA (CAS 423-82-5) was found to be a structural outlier and Sulfuramid (CAS 4151-50-2) a response outlier of the full model, probably because of the NHSO2 link, which is uncommon in the other compounds of the training set. The experimental and predicted VP data from the proposed model for some environmentally relevant PFCs are given in Table 1B and compared with the EPI Suite predictions. The VP predictions were also converted to Pascal (Pa) and compared with the experimental data reported by the Danish EPA 40 and with the predictions reported by Arp et al. 12 for four long chain PFCs (Table 1C). The differences in the experimental data reported by the Danish EPA 40 could be due to the different estimation methods and experimental conditions used in data collection. For instance, MeFOSE (CAS 24448-09-7) was listed with 0.002 Pa, which is the subcooled liquid VP (estimated from the solid VP) reported by Shoieb et al. 21 at 296 K. Our model predicted it to be 0.017 Pa (logVP = -3.89 in mmHg). The predictions of Arp et al. are comparable with ours (except for PFOSH), even though some of the experimental values they used in modeling for fluorotelomer alcohols (FTOH) are different. For example, the experimental value used by Arp et al. for EtFOSE (CAS 1691-99-2) was estimated from the solid VP by Shoieb et al. 21. Some more data are found in the literature for other PFCs. The VP of PFOSH, its diethanolamine salt (calculated), and PFOSK+ (using the spinning rotor method) was reported to be 0.85 Pa, 3.1 × 10^-11 Pa 41, and 3.31 × 10^-4 Pa 44, respectively. For the structural applicability domain (AD) study, the model based on the selected descriptors (nDB, AAC, and F03[C-F])
Table 2. Predictions of Critical Micelle Concentration (CMC) for Most Common and Environmentally Relevant PFCs and Comparison of Them with Descriptor χ3, logKOW (Experimental (Inoue et al., Environ Health Perspect, 2004, 112, 1204) and EPI Suite Predicted) and Carbon (C) Length for Comparative Study with Bio-Concentration (BCF) Data
a
name
CAS
C-length
PFBA PFPA PFHxA PFHxA isomer PFHpA PFOA PFOA isomer PFNA PFNA isomer PFDA PFUnA PFDoA PFTriDA PFTeDA PFHxSH PFOSH
375-22-4 2706-90-3 307-24-4 15899-28-2 375-85-9 335-67-1 15166-06-0 375-95-1 15899-31-7 335-76-2 2058-94-8 307-55-1 72629-94-8 376-06-7 355-46-4 1763-23-1
4 5 6 6 7 8 8 9 9 10 11 12 13 14 6 8
log(BCF)a
0.90
3.04 3.69 4.25 4.47 2.00 3.73
logCMC Exp
logCMC pred
-0.12 -0.70 -1.05 -1.10 -1.54 -2.04 -1.92
-0.20 -0.64 -1.15 -0.86 -1.63 -2.11 -1.86 -2.58 -2.41 -3.07 -3.55 -4.03 -4.52b -5.00b -1.74 -2.62
-2.07 -3.07
-2.97
logKow
2.8
4.0
5.1
EPI pred logKow 2.43 3.40 4.37 3.66 5.33 6.30 5.59 7.26 6.56 8.23 9.20 10.16 11.13 12.10 4.34 6.28
ClogP Arp et al. 12
3.62
4.09 4.33
2.28
χ3 4.93 6.49 8.05 7.30 9.62 11.18 10.43 12.74 11.99 14.30 15.87 17.43 18.99 20.55 10.00 13.13
a BCF values reported by Martins et al. 47. b Compounds outside the structural AD; predictions may not be reliable.
was applied to a total of 221 chemicals compiled; the domain plot is given in SI Figure S2 VP. Only 14 compounds were outside the AD, with the cutoff being the average hat value (vertical line) of 0.343. Thus, the coverage of our model on the analyzed PFCs is 93.7%. The compounds with very high leverage values found outside the AD were perfluoro-perhydro-fluoranthene (CAS 662-28-2) and a tricyclic perfluorinated compound (CAS 306-91-2), followed by a C-17 perfluoro iodide (CAS 29809-35-6), in order of decreasing distance from the average hat value. Apart from those compounds, the other compounds outside the AD were mostly saturated bicyclic or tricyclic chemicals, acids (CAS 16517-11-6 and CAS 18122-53-7), or C-13 and C-15 long chain esters and iodides. The inapplicability of our models to these compounds is explained by the chemicals used in the training set, which are mainly linear and short chain PFCs.

Critical Micelle Concentration (CMC). PFCs have surfactant properties, and for this reason they are widely used industrially as emulsifiers, detergents, coatings, and dispersing agents 5,17. CMC is an important parameter for PFCs because below the CMC micelles are not present, which hinders the interaction with hydrophilic and hydrophobic moieties as well as aquatic transport 45. The CMC behavior of PFCs could be different because the C-F bond is highly polar in nature. Fluorinated surfactants are known to form smaller micelles than their corresponding hydrocarbon analogs but are much more surface active and more stable against acidic, alkaline, oxidative, and reductive reagents as well as at elevated temperatures 26. Fluorinated surfactants are not expected to reach higher concentrations or show unusual micellar behavior under environmentally relevant conditions 27,28. Also, for the branched isomers of PFCs, the CMC values do not differ from the corresponding value for the linear isomer, as branching is reported to have a minor influence on this end point 27.
CMC is useful for optimizing detergency and minimizing waste. It has been reported that skin irritation potential depends on CMC 46, as the association of a surfactant with skin cells, proteins, or biomembranes results in a change of surface charge, an alteration of barrier function, or ultimately denaturation. CMC has also been studied as a measure of the reactivity of surfactants with the membranes of the eye 46. A total of 10 experimental CMC values for PFCs were collected and, because of its limited size, this small data set was not split into training and prediction sets. The existing data, even if limited, could be a source of useful information because they contain diverse structural information. The compounds used are branched and linear derivatives of sulfonates and carboxylates of various lengths and their structural isomers. Two robust models were developed and are reported below; the statistical performances highlighting their stability and internal predictivity are given in SI Table S2 CMC, and the plot in SI Figure S1 CMC.

$$\log \mathrm{CMC} = 1.351(\pm 0.183) - 0.30(\pm 0.018)\,\chi3 \tag{3a}$$

$$n = 10;\quad R^2 = 97.35;\quad Q^2 = 95.93;\quad s = 0.164;\quad F = 293.5$$

The model (eq 3a) is based on a bidimensional descriptor, χ3, which belongs to the connectivity indices and represents molecular complexity and branching 32. Because CMC is reported to be independent of branching 27, the good inverse correlation with χ3 indicates that the descriptor differentiates the structural complexity of PFCs. It has higher values for longer PFCs, and it can differentiate between functional groups such as sulfonates and carboxylates (see Table 2 and the related discussion below). For the structural AD, a data set of 211 extra compounds was used; the average hat value obtained was 0.60, and 51 compounds were found outside the structural applicability domain (76.9% coverage; SI Figure S2 CMC). The compounds with high hat values are 18C long acrylates (CAS 59778-97-1, CAS 65150-93-8), an acid (CAS 16517-11-6), and iodides (CAS 29809-35-6, CAS 65150-94-9). Extending the insight gained from 10 compounds to a much bigger set might be an overreach, but the structural diversity of the small data set motivated such an AD study. Besides, the lack of public experimental data and the need to estimate the property call for this analysis.
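As a worked illustration of eq 3a, the sketch below applies the printed coefficients to χ3 values listed in Table 2. The function name and the choice of compounds are illustrative; because the coefficients are printed with rounding, the results can differ slightly from the tabulated predictions, which were presumably computed with unrounded coefficients.

```python
def log_cmc_eq3a(chi3, intercept=1.351, slope=-0.30):
    """logCMC (CMC in mol/L) from the connectivity index chi3, using eq 3a as printed."""
    return intercept + slope * chi3

# chi3 values taken from Table 2.
chi3_table2 = {"PFBA": 4.93, "PFOA": 11.18, "PFOSH": 13.13}

for name, chi3 in chi3_table2.items():
    log_cmc = log_cmc_eq3a(chi3)
    cmc_molar = 10 ** log_cmc  # back-transform from the log scale to mol/L
    print(f"{name}: logCMC = {log_cmc:.2f}, CMC = {cmc_molar:.4f} M")
```

The negative slope reproduces the trend discussed above: a longer fluorinated chain gives a larger χ3 and therefore a lower predicted CMC.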
More experimental data for modeling would clearly help to encode the key structural features that govern the CMC behavior typical of PFCs, but for most of them the model based on a simple descriptor such as χ3 can be used to estimate CMC. An alternative model with slightly lower statistical performance was also selected (eq 3b), as it shows the direct relationship between the frequency of C-F bonds and the logCMC of PFCs. It could be informative for a tentative estimation solely from the bond frequency, but the descriptor is not able to differentiate
Figure 1. (A) PCA plot of experimental and predicted data of water solubility, vapor pressure, and CMC for the 174 PFCs within the structural applicability domain of all three models. The dotted line represents the increasing trend of BCFs based on available experimental values. (B) Plot showing the distribution of long chain alkylated carboxylates and sulfonates of PFCs with experimental logBCF, predicted logCMC, EPI Suite estimated logKOW, and carbon chain length (represented by numbers) in relation to the descriptor χ3.
between the carboxylates and sulfonates of PFCs containing the same carbon chain.

$$\log \mathrm{CMC} = 1.181(\pm 0.274) - 0.123(\pm 0.114)\,\mathrm{F02[C\text{-}F]} \qquad (3b)$$

$$n = 10;\quad R^2 = 93.61;\quad Q^2 = 90.62;\quad s = 0.25;\quad F = 117.14$$

Comparative RMSE Prediction. The predictions obtained from the models developed were compared with those of the EPI Suite tool 42 for VP and AqS. For AqS, the RMSE of EPI Suite, calculated using the WSKOW tool with CAS input, was significantly higher (RMSE = 1.98) than that of the model presented here (RMSECV = 0.96). Thus, our specific model for PFCs could be better used for these compounds instead of EPI Suite. For VP, the RMSE for prediction from EPI Suite is higher (RMSE = 1.13) but similar to that of the proposed model (RMSECV = 0.95). The RMSE was calculated using MPBPVP
tool on the log value of the model prediction (max-min range of data almost 10 log units). For CMC, comparison was not possible because EPI Suite has no tool for CMC prediction. The proposed local models for PFCs could be of vital importance, and a prediction can be considered more reliable if the compound falls into the structural applicability domain of the model and is not extrapolated. Additionally, these models can be verified for applicability domain on other PFCs. Principal Component Analysis (PCA). Principal component analysis (PCA) was performed to observe the relationship among the studied physicochemical properties by combining the experimental and the predicted values for the 174 PFCs that were within the AD of all three models proposed above. PC1 and PC2 were calculated; they carry 81.3% and 16.2% of the information, respectively. In the PCA plot (Figure 1A) the loading lines for each end point are oriented toward the direction of higher values of the responses, and the most common PFCs are
tagged to understand the distribution and the effect of each studied end point. The plot shows all three end points oriented toward the same direction along PC1 (right-hand side). Thus, the more soluble compounds are located at the right-hand side of the plot, where CMC and VP are also higher. Also, the AqS and CMC loading lines point toward the same PC2 direction; thus the compounds in the upper right of the plot (higher values of PC2) have higher solubility and CMC but lower volatility. Compounds that are less soluble and more volatile are located in the lower right part of the plot. This is confirmed by some experimental data and is in agreement with the EPI estimated values available for perfluoroethane (CAS 76-16-4), hexadecafluoroheptane (CAS 335-57-9), and tetradecafluorohexane (CAS 355-42-0) (see SI 3). This multivariate analysis and the distribution derived for PFCs are informative for understanding aquatic partitioning behavior. Relation with Bioconcentration Factor (BCF). The bioconcentration factor is often used as one of the measures of the bioaccumulation potential of chemicals in aquatic organisms. It acts as an indicator of aquatic accumulation and partitioning to body tissues. BCF is considered less important for very hydrophobic chemicals; the bioaccumulation factor (BAF) is instead considered the more important parameter, as dietary uptake predominates for these compounds. However, the available experimental data for BAF are very limited; thus we have focused on the experimental BCF values. The relationships between BCFs and the model predictions of the three physicochemical properties studied above, for 4-14 carbon long PFCs, are presented below. The experimental BCF values used here (Table 2) were reported 47 as the ratio of the measured concentrations in fish (rainbow trout) to that in water, expressed in units of liters per kilogram (L/kg) on a wet weight (WW) basis.
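To make the principal component analysis described earlier in this section more concrete, the following is a minimal sketch of PCA on a three-column matrix of log-transformed properties (AqS, VP, CMC). It is not the authors' workflow: the data are synthetic placeholders generated from a random seed, and autoscaling of the columns before decomposition is an assumption, since the preprocessing is not stated here. The sketch does not attempt to reproduce the reported 81.3% and 16.2%.

```python
import numpy as np

def pca(data):
    """PCA of autoscaled columns via SVD.
    Returns scores, loadings (one column per PC) and explained variance ratios."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)  # autoscaling (assumed)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)   # fraction of total variance per PC
    scores = u * s                    # coordinates of compounds in PC space
    loadings = vt.T                   # direction of each property in PC space
    return scores, loadings, explained

# Synthetic, hypothetical property values for 174 compounds (NOT the paper's data)
rng = np.random.default_rng(0)
n_compounds = 174
latent = rng.normal(size=n_compounds)
log_aqs = latent + 0.3 * rng.normal(size=n_compounds)
log_vp = 0.8 * latent + 0.6 * rng.normal(size=n_compounds)
log_cmc = latent + 0.3 * rng.normal(size=n_compounds)
data = np.column_stack([log_aqs, log_vp, log_cmc])

scores, loadings, explained = pca(data)
print("explained variance ratio:", np.round(explained, 3))
print("PC1 loadings (AqS, VP, CMC):", np.round(loadings[:, 0], 3))
```

Plotting the first two columns of `scores` together with the rows of `loadings` as lines from the origin would give a biplot of the same kind as Figure 1A. The sign of each component is arbitrary in SVD, so the orientation of the loading lines may be mirrored.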
Based on the available experimental values (Table 2), BCF is seen to increase with increasing carbon chain length for carboxylates and sulfonates. Studying BCF together with other properties might help to observe the BCF trend for other long chain PFCs and to understand their aqueous fate and transport. In the PCA plot (Figure 1A) the AqS and CMC lines are seen opposite to the increasing trend of BCFs for carboxylates (dotted line); 9 out of the 10 data points studied for CMC were for carboxylates of PFCs and their isomers. The plot shows that the compounds with higher BCF values, viz. PFTeDA and PFDoA, have lower solubility, which is in accordance with the fact that a compound should be hydrophobic to partition into body tissues. Hydrophobicity (measured by KOW) is inversely related to AqS, but it is directly linked to bioconcentration potential. The increasing trend of hydrophobicity and BCFs for carboxylates can be seen in Table 2. Although the logKOW values predicted by EPI Suite span double the range of those reported by Arp et al. 12, they follow the same increasing trend. However, long chain PFCs behave as surfactants and tend to aggregate at the liquid-liquid interface; thus they do not partition in the octanol-water system. Tolls and Sijm 48 used CMC instead of KOW for correlating BCFs, but due to the low coefficient of determination of the linear CMC-BCF relationship, the use of interfacial tension or reversed-phase chromatographic retention time was suggested. The study of Martin et al. 47 did not support this recommendation, but they reported that the bioaccumulation potential increases with decreasing CMC for carboxylates of PFCs, and that the trend was different for sulfonates. Sulfonates (PFOSH, PFHxSH) and PFOA behave differently in the current study and were found slightly away from the loading lines of AqS and
CMC (Figure 1A), probably because of their different functional groups or atypical properties. A data gap exists in the BCFs of carboxylates such as PFBA to PFHpA (4-7 carbons), PFNA, and PFTriDA (9 and 13 carbons) (Table 2), among others. The BCF trend for these PFCs can be observed in Figure 1A; it follows a typical pattern except for an unusual position of PFTriDA. The 13C and 14C (PFTeDA) PFCs were outside the structural AD of the CMC model; thus the reliability of their predictions could be doubtful. Unfortunately, the limited experimental BCF data hinder the establishment of a more accurate relationship between the studied parameters. However, additional experimental BCF data and physicochemical properties could confirm the preliminary results presented here. Interestingly, the QSPR study of CMC shows that the logCMC value decreases linearly with increasing χ3, and this parameter is also able to differentiate the logBCF trends of sulfonates and carboxylates (Figure 1B). Also, χ3 is able to differentiate the properties of the isomers of PFCs, as presented in Table 2. A plot of logCMC, logBCF, and logKOW (EPI predicted) values against χ3 is given (Figure 1B) for selected compounds that have experimental BCF data, to reveal the different patterns pertaining to the structure of PFCs. The plot shows that logBCF increases with logKOW and the carbon chain length, as expected, but decreases with increasing CMC. The difference in logBCF between the 8C long carboxylate (PFOA) and sulfonate (PFOSH) is seen prominently along the X-axis even though the EPI Suite predicted values of logKOW are similar (Table 2). The current study shows that the models for water solubility and vapor pressure of PFCs predicted these end points reasonably well, suggesting that these models can help in assessing and understanding the environmental behavior of PFCs in general, particularly for those verified to be within the AD of the models.
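As a small illustration of the inverse relation between logCMC and χ3 noted above, eqs 3a and 3b reported in this section can be evaluated directly. The coefficients are taken from the text; the descriptor values passed in below are hypothetical inputs chosen only to show the trend, and the units of CMC follow the original data set, which is not restated here.

```python
def log_cmc_eq3a(chi3: float) -> float:
    """logCMC from the connectivity index chi3 (eq 3a coefficients)."""
    return 1.351 - 0.30 * chi3

def log_cmc_eq3b(f02_cf: float) -> float:
    """logCMC from the F02[C-F] bond-frequency descriptor (eq 3b coefficients)."""
    return 1.181 - 0.123 * f02_cf

# Hypothetical descriptor values, for illustration only
for chi3 in (2.0, 4.0, 6.0):
    print(f"chi3 = {chi3:.1f} -> predicted logCMC (eq 3a) = {log_cmc_eq3a(chi3):.3f}")

for f02_cf in (5.0, 10.0, 15.0):
    print(f"F02[C-F] = {f02_cf:.0f} -> predicted logCMC (eq 3b) = {log_cmc_eq3b(f02_cf):.3f}")
```

Both functions decrease monotonically with their descriptor, which is the behavior the text relates to the increasing logBCF of longer-chain carboxylates. Any such prediction is only meaningful for compounds inside the structural AD of the corresponding model.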
The model for predicting CMC seems promising even though it is based on a very limited training set of existing experimental data. The predictions are based on simple linear relationships, which are robust and internally and externally validated, developed on 2D descriptors that can be calculated simply from SMILES without the use of complex algorithms and software. Thus, any concerns about failing to achieve accurate PFC geometries, or about the influence of overestimated bond lengths resulting from the optimization of chemical structures, are avoided. In addition, the inclusion of linear and branched chain isomers, with and without experimental data, either in modeling or in prediction, helps to understand the isomer-specific properties of PFCs. The descriptors selected for VP and CMC are especially able to differentiate linear from branched isomers of PFCs. The individual models presented here have widely verified ADs and can further be used to calculate the properties of other PFCs that are available on the market, and of new ones even before synthesis. The true range of applicability of the CMC model could be better verified once more test data become available. This study also relates the well-predicted data of PFCs to BCFs, revealing information for understanding aquatic partitioning. The comparative analysis of the data with the chemometric approach confirms that BCFs increase in the direction opposite to that of AqS and CMC for PFCs.
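The internal validation statistics quoted for the linear models (R² and Q²) can be illustrated with a short sketch. It assumes that Q² is a leave-one-out cross-validated coefficient, computed as 1 - PRESS/TSS with the total sum of squares taken around the full-data mean; this section does not state the definition, so that is an assumption. The data points are hypothetical and only loosely follow the form of eq 3a.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares fit y = b0 + b1 * x; returns (b0, b1)."""
    slope, intercept = np.polyfit(x, y, 1)
    return intercept, slope

def r2_and_q2_loo(x, y):
    """R^2 of the full fit and leave-one-out Q^2 = 1 - PRESS / TSS (assumed definition)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b0, b1 = fit_line(x, y)
    tss = np.sum((y - y.mean()) ** 2)
    rss = np.sum((y - (b0 + b1 * x)) ** 2)
    press = 0.0
    for i in range(len(x)):
        keep = np.arange(len(x)) != i          # drop compound i
        b0_i, b1_i = fit_line(x[keep], y[keep])
        press += (y[i] - (b0_i + b1_i * x[i])) ** 2
    return 1.0 - rss / tss, 1.0 - press / tss

# Hypothetical descriptor/response pairs (NOT the paper's data)
chi3 = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0])
noise = np.array([0.10, -0.15, 0.05, 0.20, -0.10, -0.05, 0.15, -0.20, 0.05, -0.05])
log_cmc = 1.351 - 0.30 * chi3 + noise

r2, q2 = r2_and_q2_loo(chi3, log_cmc)
print(f"R2 = {100 * r2:.2f}%  Q2(LOO) = {100 * q2:.2f}%")
```

A small gap between R² and Q² indicates that no single compound dominates the fit, which is the sense in which a model built on few data points can still be called internally stable.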
■ ASSOCIATED CONTENT
Supporting Information. The acronyms (SI1) and VP data preparation (SI2), data sets with experimental and predicted data (Table S1), QSAR models statistical parameters (Table S2), experimental vs. prediction plots (Figure S1) and applicability
domain studies (Figure S2). This material is available free of charge via the Internet at http://pubs.acs.org.
■ AUTHOR INFORMATION
Corresponding Author
*E-mail: [email protected].
■ ACKNOWLEDGMENT
Financial support by the European Union through the project CADASTER FP7-ENV-2007-1-212668 is gratefully acknowledged. Thanks to Tomas Oberg (Linnaeus University, Kalmar, Sweden) for help in extrapolation of VP data.

■ REFERENCES
(1) Lau, C.; Anitole, K.; Hodes, C.; Lai, D.; Pfahles-Hutchens, A.; Seed, J. Perfluoroalkyl acids: A review of monitoring and toxicological findings. Toxicol. Sci. 2007, 99, 366–394.
(2) Prevedouros, K.; Cousins, I. T.; Buck, R. C.; Korzeniowski, S. H. Sources, fate and transport of perfluorocarboxylates. Environ. Sci. Technol. 2006, 40, 32–44.
(3) Haukas, M.; Berger, U.; Hop, H.; Gulliksen, B.; Gabrielsen, G. W. Bioaccumulation of per- and polyfluorinated alkyl substances (PFAS) in selected species from the Barents Sea food web. Environ. Pollut. 2007, 148, 360–371.
(4) Lee, H.; D'eon, J.; Mabury, S. A. Biodegradation of polyfluoroalkyl phosphates as a source of perfluorinated acids to the environment. Environ. Sci. Technol. 2010, 44, 3305–3310.
(5) Bhhatarai, B.; Gramatica, P. Per- and poly-fluoro toxicity (LC50 inhalation) study in Rat and Mouse using QSAR modeling. Chem. Res. Toxicol. 2010, 23, 528–539.
(6) Kleszczynski, K.; Gardzielewski, P.; Mulkiewicz, E.; Stepnowski, P.; Składanowski, A. C. Analysis of structure-cytotoxicity in vitro relationship (SAR) for perfluorinated carboxylic acids. Toxicol. In Vitro 2007, 21, 1206–1211.
(7) U.S. EPA. http://www.epa.gov/oppt/pfoa/pubs/related.htm (accessed October 7, 2010).
(8) http://ec.europa.eu/environment/chemicals/reach/reach_intro.htm (accessed October 7, 2010).
(9) Canadian DSL. http://ec.gc.ca/substances/ese/eng/dsl/dslprog.cfm (accessed October 7, 2010).
(10) EU FP7 project CADASTER, www.cadaster.eu (accessed October 7, 2010).
(11) European Chemical Agency. http://apps.echa.europa.eu/preregistered/pre-registered-sub.aspx (accessed October 7, 2010).
(12) Arp, H. P. H.; Niederer, C.; Goss, K.-U. Predicting the partitioning behavior of various highly fluorinated compounds. Environ. Sci. Technol. 2006, 40, 7298–7304.
(13) Goss, K.-U.; Bronner, G.; Harner, T.; Hertel, M.; Schmidt, T. C. The partition behavior of fluorotelomer alcohols and olefins. Environ. Sci. Technol. 2006, 40, 3572–3577.
(14) Gabriel, J. L.; Miller, T. F., Jr.; Wolfson, M. R.; Shaffer, T. H. QSAR of perfluorinated hetero-hydrocarbons as potential respiratory media. Application to oxygen solubility, partition coefficient, viscosity, vapor pressure, and density. ASAIO J. 1996, 42, 968–973.
(15) Rayne, S.; Forest, K. A comparative assessment of octanol-water partitioning and distribution constant estimation methods for perfluoroalkyl carboxylates and sulfonates. Available from Nature Proceedings http://dx.doi.org/10.1038/npre.2009.3282.2 (2009) (accessed October 7, 2010).
(16) Oberg, T.; Liu, T. Updating existing QSAR models: selection and weighting of new data. J. Chemom. 2010, 2 (1), P19.
(17) Rayne, S.; Forest, K. Perfluoroalkyl sulfonic and carboxylic acids: A critical review of physicochemical properties, levels and patterns in waters and wastewaters, and treatment methods. J. Environ. Sci. Health, Part A: Environ. Sci. Eng. 2009, 44, 1145–1199.
(18) SRC PhysProp database. http://www.syrres.com (accessed October 7, 2010).
(19) Krop, H.; de Voogt, P. EU-FP6 PERFORCE (PERFluorinated ORganic Chemicals in the European environment) 2, IBED-ESPM, 2008.
(20) OECD Guidelines, http://www.oecd.org/dataoecd/33/37/37849783.pdf (accessed October 7, 2010).
(21) Shoeib, M.; Harner, T.; Ikonomou, M.; Kannan, K. Indoor and outdoor air concentrations and phase partitioning of perfluoroalkyl sulfonamides and polybrominated diphenyl ethers. Environ. Sci. Technol. 2004, 38, 1313–1320.
(22) Liu, J.; Lee, L. S. Effect of fluorotelomer alcohol chain length on aqueous solubility and sorption by soils. Environ. Sci. Technol. 2007, 41, 5357–5362.
(23) Kaiser, M. A.; Larsen, B. S.; Dawson, B. J.; Kurtz, K.; Lieckfield, R., Jr.; Miller, J. R.; Flaherty, J. Method for the determination of perfluorooctanoic acid in air samples using liquid chromatography with mass spectrometry. J. Occup. Environ. Hyg. 2005, 2, 307–313.
(24) Krusic, P. J.; Marchione, A. A.; Davidson, F.; Kaiser, M. A.; Kao, C.-P. C.; Richardson, R. E.; Botelho, M.; Waterland, R. L.; Buck, R. C. Vapor pressure and intramolecular hydrogen bonding in fluorotelomer alcohols. J. Phys. Chem. A 2005, 109, 6232–6241.
(25) Steele, W. V.; Chirico, R. D.; Knipmeyer, S. E.; Nguyen, A. Vapor pressure, heat capacity, and density along the saturation line: Measurements for benzenamine, butylbenzene, sec-butylbenzene, tert-butylbenzene, 2,2-dimethylbutanoic acid, tridecafluoroheptanoic acid, 2-butyl-2-ethyl-1,3-propanediol, 2,2,4-trimethyl-1,3-pentanediol, and 1-chloro-2-propanol. J. Chem. Eng. Data 2002, 47 (648-666), 715–724.
(26) Shinoda, K.; Hato, M.; Hayashi, T. The physico-chemical properties of aqueous solutions of fluorinated surfactants. J. Phys. Chem. 1972, 76, 909–914.
(27) Kunieda, H.; Shinoda, K. Krafft points, critical micelle concentrations, surface tension, and solubilizing power of aqueous solutions of fluorinated surfactants. J. Phys. Chem. 1976, 80, 2468–2470.
(28) Guo, W. T.; Fung, B. M. Micelles and aggregates of fluorinated surfactants. J. Phys. Chem. 1991, 95, 1829–1836.
(29) Hoffman, H.; Ulbricht, W.; Tagesson, B. Investigations on micellar systems of perfluordetergents. Evidence for emulsions-dropletlike giant micelles. Z. Phys. Chem. 1978, 113, 17–36 (Wiesbaden, 1978).
(30) HyperChem 7.03; Hypercube, Inc.: Gainesville, FL, 2002; www.hyper.com.
(31) Todeschini, R.; Consonni, V.; Mauri, A.; Pavan, M. DRAGON v.5; Talete srl: Milan, Italy, 2007; www.talete.it.
(32) Todeschini, R.; Consonni, V. Handbook of Molecular Descriptors; Wiley-VCH: Weinheim, Germany, 2009.
(33) Leardi, R.; Boggia, R.; Terrile, M. Genetic algorithms as a strategy for feature selection. J. Chemom. 1992, 6, 267–281.
(34) Todeschini, R.; Consonni, V.; Pavan, M. MOBY DIGS - Software for multilinear regression analysis and variable subset selection by genetic algorithm. Ver. 1.2 for Windows, Talete srl: Milan, Italy, 2002.
(35) Gramatica, P.; Giani, E.; Papa, E. Statistical external validation and consensus modeling: A QSPR case study for Koc prediction. J. Mol. Graphics Model. 2007, 25, 755–766.
(36) Gramatica, P. Principles of QSAR models validation: internal and external. QSAR Comb. Sci. 2007, 26, 694–701.
(37) Shi, L. M.; Fang, H.; Tong, W.; Wu, J.; Perkins, R.; Blair, R. M.; Branham, W. S.; Dial, S. L.; Moland, C. L.; Sheehan, D. M. QSAR models using a large diverse set of estrogens. J. Chem. Inf. Comput. Sci. 2001, 41, 186–195.
(38) Consonni, V.; Ballabio, D.; Todeschini, R. Comments on the definition of the Q2 parameter for QSAR validation. J. Chem. Inf. Model. 2009, 49, 1669–1678.
(39) Staudinger, J.; Roberts, P. V. A critical review of Henry's law constants for environmental applications. Crit. Rev. Environ. Sci. Technol. 1996, 26, 205–297.
(40) Jensen, A. A.; Poulsen, P. B.; Bossi, R. Survey and Environmental/Health Assessment of Fluorinated Substances in Impregnated Consumer Products and Impregnating Agents, No. 99; Danish Environmental Protection Agency: Copenhagen, 2008.
(41) Environmental and Health Assessment of Perfluorooctane Sulfonic Acid and its Salts, US EPA Docket AR-226-602; 3M Company: St Paul MN, 2003.
(42) EPI suite. http://www.epa.gov/oppt/exposure/pubs/episuite.htm (accessed October 7, 2010).
(43) Katritzky, A. R.; Slavov, S. H.; Dobchev, D. A.; Karelson, M. Rapid QSPR model development technique for prediction of vapor pressure of organic compounds. Comput. Chem. Eng. 2007, 31, 1123–1130.
(44) Environmental risk evaluation report: Perfluorooctane-sulphonate (PFOS), UK Environment Agency, 2004. pfos.uk.risk.eval.report (accessed October 7, 2010).
(45) Robinson, B. H.; Bucak, S.; Fontana, A. On the concept of driving force applied to micelle and vesicle self assembly. Langmuir 2000, 16, 8231–8237.
(46) Patlewicz, G. Y.; Rodford, R. A.; Ellis, G.; Barratt, M. D. A QSAR model for the eye irritation of cationic surfactants. Toxicol. In Vitro 2000, 14, 79–84.
(47) Martin, J. W.; Mabury, S. A.; Solomon, K. R.; Muir, D. C. G. Bioconcentration and tissue distribution of perfluorinated acids in rainbow trout (Oncorhynchus mykiss). Environ. Toxicol. Chem. 2003, 22, 196–204.
(48) Tolls, J.; Sijm, D. T. H. M. A preliminary evaluation of the relationship between bioconcentration and hydrophobicity for surfactants. Environ. Toxicol. Chem. 1995, 14, 1675–1685.