Article pubs.acs.org/est

Predicting Storage−Lipid Water Partitioning of Organic Solutes from Molecular Structure

Anett Geisler,†,§ Luise Oemisch,† Satoshi Endo,†,∥ and Kai-Uwe Goss*,†,‡

† UFZ - Helmholtz Centre for Environmental Research, Permoserstraße 15, D-04318 Leipzig, Germany
‡ University of Halle-Wittenberg, Institute of Chemistry, Kurt Mothes Straße 2, D-06120 Halle, Germany



Supporting Information

ABSTRACT: Partitioning to storage fat is the major process for bioaccumulation of many neutral organic chemicals. In this work, we evaluated the performance of four predictive models (ABSOLV, COSMOtherm, KOWWIN, and SPARC) in calculating storage lipid−water partition coefficients. In the first step of the validation, we used more than 300 literature values for chemicals with relatively simple molecular structures. For these compounds, the overall performance was similar for all models, with a root-mean-square error (rmse) between 0.45 and 0.61 log units. Clear differences became visible in the second validation step, where a subset containing only H-bond-donor compounds was used. Here, COSMOtherm and SPARC performed clearly better, with rmse values of 0.35 and 0.42 log units, respectively, compared to ABSOLV and KOWWIN, with rmse values of 0.91 and 0.85 log units, respectively. The last step in our validation was a comparison with experimental values for 22 complex, multifunctional chemicals (including pesticides, hormones, and mycotoxins) that we measured specifically for this validation. For these chemicals, predictions by all models were less accurate than those for simpler chemicals. COSMOtherm performed best (rmse 0.71 log units), while the other methods showed considerably poorer results (rmse 1.29 (ABSOLV), 1.25 (SPARC), and 1.62 (KOWWIN) log units).



© 2015 American Chemical Society

INTRODUCTION

Lipids are among the most relevant phases for bioaccumulation of neutral organic chemicals. There are two main types of lipids, storage lipids and membrane lipids, which differ in their sorptive properties.1,2 Storage lipids refer to triglycerides, whose fatty acids vary in their chain length (typically between C6 and C22) and degree of unsaturation. They form an unstructured phase similar to an organic solvent. In contrast, membrane lipids form structured bilayers that surround, among others, biological cells in organisms. The two types of lipids often contribute strongly to the overall sorptive capacity of the organism, and the actual contributions depend on the type of chemical of concern and the content of the two lipids in the organism.3 Partitioning of organic chemicals to membrane lipids and its prediction have been discussed in detail elsewhere (refs 1 and 4 and the literature cited therein). Partitioning into storage lipids is thus the focus of the present work.

In previous work we have shown that the origin of storage lipids (i.e., various animals and plants) and the fatty acid composition have no significant influence on the partitioning of a wide variety of nonpolar and polar organic chemicals to storage lipids.2,5 Thus, all storage lipids can be treated as a single phase when it comes to the equilibrium partitioning of neutral organic compounds. Moreover, using a large and consistent data set, we calibrated a polyparameter linear free energy relationship (ppLFER) model for the prediction of storage lipid−water partition coefficients (Kstorage lipid/water) at 37 °C.2 This ppLFER model reads

$$\log K_{\text{storage lipid/water}} = 0.58\,L_i - 1.62\,S_i - 1.93\,A_i - 4.15\,B_i + 1.99\,V_i + 0.55 \qquad (1)$$
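As a minimal sketch, eq 1 can be evaluated in a few lines of Python from the five solute descriptors L, S, A, B, and V. The function and argument names are illustrative and not part of the original work; the coefficients are those printed in eq 1, and the descriptor values in the usage line are hypothetical placeholders, not measured or tabulated values.

```python
# Sketch of eq 1: storage lipid-water ppLFER at 37 °C.
# Coefficients are copied from eq 1; all names are illustrative assumptions.
EQ1_COEFFS = {"L": 0.58, "S": -1.62, "A": -1.93, "B": -4.15, "V": 1.99}
EQ1_CONSTANT = 0.55


def log_k_storage_lipid_water(L, S, A, B, V):
    """Return log K(storage lipid/water) from the five solute descriptors."""
    descriptors = {"L": L, "S": S, "A": A, "B": B, "V": V}
    return EQ1_CONSTANT + sum(EQ1_COEFFS[k] * descriptors[k] for k in EQ1_COEFFS)


# Hypothetical descriptor values, chosen only to exercise the function.
example = log_k_storage_lipid_water(L=3.0, S=0.5, A=0.0, B=0.1, V=0.9)
print(round(example, 3))
```

Keeping the coefficients in a single mapping makes it straightforward to swap in another calibrated ppLFER equation with the same descriptor set.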

In eq 1, Li is the log of the hexadecane/air partition coefficient, Si is the solute dipolarity/polarizability, Ai is the solute H-bond donor property, Bi is the solute H-bond acceptor property, and Vi is McGowan’s volume. These five descriptors quantify the solute propensities for the molecular interactions that govern the partition process: van der Waals interactions and cavity formation (Vi, Li), polar interactions (Si), and H-bond interactions (Ai, Bi). The respective interaction properties of the storage lipid−water partition system are encoded in the calibrated coefficients that scale the influence of the solute descriptors on log Kstorage lipid/water. This model allows the accurate prediction of Kstorage lipid/water (root-mean-square error (rmse) = 0.20), provided that high-quality solute descriptors are available.2 Solute descriptors have been reported in the literature for at least 2000 compounds.6 For chemicals whose descriptors are unavailable, however, the descriptors have to be determined by time-consuming experiments.7−10 For a fast screening of the bioaccumulation potential for tens of thousands of organic compounds, we thus need other predictive methods that only require molecular structure as input information. In this work, we evaluate four such models, KOWWIN, ABSOLV, COSMOtherm, and SPARC (see Materials and Methods section for model descriptions). The three latter methods allow a direct prediction of the partition coefficient from water to triglycerides. In contrast, the first approach, KOWWIN, only predicts the partition coefficient from water to octanol (Kow). Nevertheless, we included KOWWIN in the evaluation because octanol is often used as a surrogate for lipids and KOWWIN is among the most widely used tools for the prediction of bioaccumulation.11

For the model evaluation, 305 experimental storage lipid−water partition coefficients at 37 °C from the literature were used. This data set covers a broad range of molecular interaction properties (i.e., van der Waals, H-bond interactions). The data are, however, mostly for monofunctional compounds. To test the robustness of the models, we measured and evaluated additional compounds with higher complexity in the molecular structure, such as hormones, mycotoxins, and pesticides. Due to their multifunctionality, these compounds pose a special challenge for predictive models. Finally, a published set of partition data at 7 °C was used to evaluate how well the models would predict the temperature dependence of Kstorage lipid/water.

Received: December 31, 2014
Revised: March 20, 2015
Accepted: April 2, 2015
Published: April 2, 2015
Environmental Science & Technology, DOI: 10.1021/es506336m, Environ. Sci. Technol. 2015, 49, 5538−5545

MATERIALS AND METHODS

Models. KOWWIN. The standard method for predicting the storage lipid−water partition coefficient is to predict Kow with an existing quantitative structure−activity relationship (QSAR) and assume that there is little difference between octanol and storage lipid (i.e., Kstorage lipid/water = Kow).12−15 KOWWIN is publicly available as part of EPI Suite provided by the U.S. Environmental Protection Agency under http://www.epa.gov/oppt/exposure/pubs/episuite.htm. The version we used is 1.68. KOWWIN uses the molecular structure of the partitioning chemical in the form of a SMILES string as input information. The model is based on an atom and fragment contribution method and has been calibrated with Kow data for 2447 training compounds. There is little information on the applicability domain of KOWWIN. It has been shown that models based on similar concepts can fail dramatically when applied to new structures that had not been part of the calibration data set.16,17 In addition, octanol may not always be a good surrogate for storage lipids, as suggested by differences in fitting coefficients of the pp-LFER equations that describe the octanol−water and storage lipid−water partition systems.2 The pp-LFERs indicate that H-bond donor solutes will sorb more strongly to octanol than to storage lipids. Our intention in using KOWWIN in this study is to evaluate how useful KOWWIN may be despite these shortcomings. Note that KOWWIN can only provide Kow values for 25 °C and not for 37 °C, as would be required for a comparison with experimental data at 37 °C. A temperature difference of 12 °C causes a shift of up to 0.2 log units in the log Kow value if the corresponding enthalpy is within ±30 kJ/mol. Typically, this minor discrepancy is neglected when KOWWIN or other QSARs are used for estimating lipid−water partition coefficients, presumably because no convenient QSAR models for Kow at 37 °C or for the temperature dependence of Kow are available. Therefore, following the current practice, we also used Kow values for 25 °C predicted by KOWWIN without any temperature correction.

ABSOLV. ABSOLV, a module in ADME Boxes v 5.0 (ACD/Laboratories, Toronto, Canada; http://www.acdlabs.com/company/media/pr/2009_02_pa.php), is a commercial QSAR model that predicts the pp-LFER solute descriptors for chemicals. Chemicals are entered in SMILES notation. ABSOLV is based on a fragment-contribution method that is calibrated with a large number of experimentally determined solute descriptors.18 ABSOLV seems to include some additional (undisclosed) adjustment for improved predictions.19 The predicted solute descriptors can be combined with existing pp-LFER equations for predicting equilibrium partition coefficients. In this study, we used the ABSOLV-predicted descriptors together with the calibrated pp-LFER equation for storage lipid−water partitioning at 37 °C (eq 1). For a validation of the ability to predict temperature dependence, we used ABSOLV-predicted descriptors in combination with the pp-LFER equation for the enthalpy of the storage lipid−water partitioning process, ΔHi (kJ/mol) = 10.51 Li − 49.29 Si − 16.36 Ai + 70.39 Bi − 66.19 Vi + 38.95 (ref 2).

COSMOtherm. COSMOtherm is commercial software provided by COSMOlogic GmbH & Co. KG, Leverkusen, Germany, and predicts various chemical properties based on the COSMO-RS theory.20 COSMOtherm requires three-dimensional cosmo files generated by a quantum-chemical dielectric continuum solvation calculation. A cosmo file encompasses information on the interaction properties of the molecule. Using cosmo files, COSMOtherm performs a statistical thermodynamics treatment of surface interactions and can predict equilibrium partition coefficients. The prediction can be performed for any solute in any partition system at the desired temperature, provided that the solute and solvents have defined molecular structures. The version of COSMOtherm we used in this study is C30_1401 with parametrization BP_TZVPDFINE_C30_1401. In a recent article from our group,21 we found this parametrization to perform best in a large validation effort. COSMOconf (v 3.0) was used to obtain cosmo files for storage lipids, water, and all solute molecules considered in this work. Triolein (i.e., trioleoylglycerol) and nonanoic acid triglyceride (NTG; i.e., trinonanoylglycerol) were used as model structures for storage lipids in the COSMOtherm calculations (for structures see Supporting Information Table SI-1A). The choice of storage lipid types for the calculations is arbitrary but has little influence on the prediction (see Results and Discussion).

SPARC. This software was available online free of charge (http://archemcalc.com/sparc) at the time when the calculations for this study were done, but has become commercial software since mid 2013. SPARC estimates the molecular interactions (e.g., van der Waals, H-bonds) that are responsible for a partition process based on a molecular fragment approach. Fragment values are from an extensive calibration with existing partition data.22 It calculates equilibrium partition constants for organic compounds at any temperature and in any partition system whose molecular structure is known and can be provided in SMILES notation. As for COSMOtherm, triolein and NTG were used as model structures for storage lipids. We accessed the SPARC software between 2010 and 2013 and mainly used version 4.5, but some calculations were performed with version 4.6 because of the version change in the course of


Table 1. Statistic Values for All Models and Validation Steps. Root-Mean-Square Errors (rmse) Are Given for the Logarithmic Partition Coefficients

validation step                   statistic             ABSOLV   COSMOtherm   KOWWIN (Kow at 25 °C)   SPARC (v 4.5)
Step 1 (Simple chemicals)         number of chemicals   304a     304a         305                     302a
Step 1 (Simple chemicals)         rmse                  0.61     0.45         0.60                    0.54
Step 2 (H-bond donor chemicals)   number of chemicals   51       51           51                      51
Step 2 (H-bond donor chemicals)   rmse                  0.91     0.35         0.84                    0.42
Step 3 (Complex chemicals)        number of chemicals   24       24           24                      24
Step 3 (Complex chemicals)        rmse                  1.29     0.71         1.62                    1.25

a ABSOLV, COSMOtherm, and SPARC v 4.5 do not calculate the value for SF6. SPARC v 4.5 does not calculate CS2 and CH3CH2CClF2 either.
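The KOWWIN column in Table 1 uses Kow at 25 °C, whereas the experimental data refer to 37 °C. The Methods text estimates the resulting shift as up to 0.2 log units for enthalpies within ±30 kJ/mol. The sketch below reproduces that arithmetic under the assumption of the integrated van't Hoff equation with a temperature-independent enthalpy; this assumption and the function name are introduced here for illustration and are not taken from the original work.

```python
import math

R_GAS = 8.314462618  # J/(mol K), molar gas constant


def log_k_shift(delta_h_kj_mol, t_from_c, t_to_c):
    """Change in log10 K when moving from t_from_c to t_to_c (both in °C).

    Assumes the integrated van't Hoff equation with constant enthalpy.
    """
    t_from = t_from_c + 273.15
    t_to = t_to_c + 273.15
    delta_h = delta_h_kj_mol * 1000.0  # kJ/mol -> J/mol
    return -delta_h / (R_GAS * math.log(10)) * (1.0 / t_to - 1.0 / t_from)


# Enthalpy bounds quoted in the Methods text (±30 kJ/mol), 25 °C -> 37 °C.
for dh in (-30.0, 30.0):
    print(dh, round(log_k_shift(dh, 25.0, 37.0), 2))
```

The same function could be applied to enthalpies estimated with the enthalpy pp-LFER given in the Methods, for example to relate values at 37 °C to the 7 °C data set.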

Figure 1. Experimental storage lipid−water partition coefficients for all literature compounds versus predictions by four different models (Step 1 evaluation).
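The rmse values reported for each model compare predicted and experimental log K values. A minimal sketch of that statistic in Python follows, assuming the usual definition (square root of the mean squared difference, in log units); the exact formula used in the study is not stated here, and the numbers are made-up placeholders rather than data from this work.

```python
import math


def rmse(experimental, predicted):
    """Root-mean-square error between paired log K values (log units)."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - e) ** 2 for e, p in zip(experimental, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Placeholder values for illustration only.
log_k_exp = [1.0, 2.0, 3.0, 4.0]
log_k_pred = [1.5, 1.5, 3.5, 3.5]
print(rmse(log_k_exp, log_k_pred))
```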

this study. The version used for each calculation is clearly indicated in the following discussions.

Experimental Data for Model Evaluation. In a first validation step (Step 1), we considered all log Kstorage lipid/water data that were used to calibrate the pp-LFER model in the previous study (ref 2) plus some additional data, mostly for hydrophobic compounds such as polychlorinated biphenyls (PCBs) and polycyclic aromatic hydrocarbons.23 All data are for 37 °C and, for ionizable chemicals, refer to the neutral species. The log Kstorage lipid/water data set covers a range from −2.66 to 9.88 and represents a large number of chemical groups (e.g., alkanes, esters, ketones, ethers, aldehydes, alcohols, acids, halogenated compounds, aromatic compounds with different substitutions), although the molecular structure of the chemicals is relatively simple, typically with no or only one polar functional group. The data are consistent, as indicated by the good fit of the ppLFER (ref 2), and are generally precise, as shown by the small standard deviation of measurements even from different laboratories.2 These data are shown in Table SI-1 in the Supporting Information.

A subselection of 51 H-bond donor substances taken from this first validation data set was used for Step 2 of our evaluation, because a comparison of the pp-LFER equations for octanol and storage lipids (Supporting Information SI-2A of the former paper, ref 2) reveals substantial differences in the “aA” term. This implies that H-bond donor compounds should partition differently to octanol and storage lipid phases and could exhibit systematic errors when Kow is assumed to equal Kstorage lipid/water. For Step 3 of our evaluation, we used experimental data for 31 complex compounds with more than one polar functional group per molecule. Twenty-two of these values are new data measured in this work (for the method, see below), and the other 9 values have already been reported elsewhere.5,23 The experimental Kstorage lipid/water data for these 31 compounds are provided in Table 2 below and the


Figure 2. Experimental storage lipid−water partition coefficients for H-bond donor substances versus model-predicted values (Step 2 evaluation).



respective model predictions are provided in Supporting Information SI-2. The model evaluation with the complex chemical structures possessing more than one functional group is instructive because many of the thousands of chemicals that have to be assessed by regulatory authorities concerning their environmental behavior also possess such complex structures. For prediction methods, such complex structures typically pose a challenge because neighboring functional groups often do not contribute additively to the free energy of partitioning due to intramolecular interactions.

Experimental Method. The experimental storage lipid−water partition coefficients for Step 3 were obtained using the silicone membrane equilibrator method (SMEM).23 The method details are described in the cited reference. In short, a PDMS tube was placed within a vial that contained soybean oil spiked with the substance under investigation. The tube was filled with water, so that equilibrium partitioning between the oil phase and the water phase was established via the PDMS. If the target substance can significantly deprotonate around neutral pH, the water was acidified before it was filled into the PDMS tube. After equilibration, the water phase was sampled and analyzed for the concentration of the substance. Initially, we intended to use solid phase microextraction (SPME) fibers as passive samplers for our measurements. However, it turned out that SMEM is a more reliable method for measuring Kstorage lipid/water for polar compounds. This issue has been discussed previously,23 and further discussions are presented in the Supporting Information SI-3 with additional microstructural investigations using scanning electron microscopy (SEM).

RESULTS AND DISCUSSION

Natural storage lipids contain a mixture of triglycerides having fatty acids with carbon numbers ranging from 6 to 22. For the calculations with SPARC v 4.5 and COSMOtherm, we tested triolein (18 carbon atoms in each of the three fatty acids) and NTG (9 carbon atoms each) as model lipid structures. Predictions of Kstorage lipid/water for nearly 300 compounds were made using the two model lipid structures and are compared in Figure SI-1 in the Supporting Information. With both SPARC and COSMOtherm, triolein and NTG gave similar values of Kstorage lipid/water. The mean absolute difference in log Kstorage lipid/water between triolein and NTG was 0.17 and 0.25 for SPARC v 4.5 and COSMOtherm, respectively. The maximum differences were 0.35 and 0.59, respectively. Therefore, for all calculations and results discussed in the following, we focus on NTG. NTG has the advantage that it is relatively small and thus demands much less computation time in the quantum-chemical dielectric continuum solvation calculation, which has to be performed at least once before all COSMOtherm calculations.

Step 1, Simply Structured Substances. Model evaluation with the 305 literature values reveals good overall agreement between experimental data and predictions from all models (rmse from 0.45 to 0.61 log units; see Table 1 and Figure 1a−d). The data were predicted to within 1 log unit for most of the compounds. Two distinct outliers were found with the SPARC v 4.5 method: triethylphosphate, the only compound with a phosphate group in the Step 1 data set, and PCB 209, the largest of the PCBs. Predictions for phosphates appear to be generally inaccurate with SPARC v 4.5 and v 4.6.21,24 We also tested v 4.6 for predicting Kstorage lipid/water of


Table 2. 31 Kstorage lipid/water Values for Complex Compounds with Standard Deviation (SD) (Step 3)

triethylphosphate, but it was still a strong outlier. Prediction errors for PCBs by SPARC v 4.5 appear to increase with increasing size of the PCBs, and similar trends also exist for ABSOLV and KOWWIN, though to a lesser extent. While we are aware that accurate measurements of partition coefficients for large PCBs are generally challenging, the data used were measured with a passive sampling method, which is the state-of-the-art approach for hydrophobic chemicals, and the difference of 3 log units between experimental and predicted values is too large to ascribe only to experimental uncertainty. The relatively large errors for PCBs may rather be related to the bulkiness of PCB molecules, which may not be modeled accurately by incremental methods.25 The evaluation further indicates systematic errors for long alkanes in the KOWWIN prediction. Such systematic deviations as found for SPARC v 4.5, ABSOLV, and KOWWIN were not observed for COSMOtherm.

Step 2, H-bond Donor Substances. As outlined above, one expects a systematically poorer performance of the Kow-based approach in predicting Kstorage lipid/water for H-bond donor chemicals, and thus an evaluation of the four prediction models with respect to H-bond donor chemicals was of particular interest. Therefore, in the second step, we limited the evaluation to a subset of 51 H-bond donor compounds (all compounds with an experimental H-bond donor descriptor A > 0.3; see ref 2 for the values). The results for KOWWIN do indeed show a strong systematic overestimation of Kstorage lipid/water (Figure 2c) and an rmse that is higher than for the complete data set (Table 1). Figure SI-4 in the Supporting Information compares the A descriptor and the difference between KOWWIN-predicted Kow and experimental Kstorage lipid/water, showing that Kow > Kstorage lipid/water for H-bond donor compounds.
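The Step 2 selection and the check for systematic overestimation can be sketched as follows. This is an illustration only: the record layout, the function names, and all numbers are hypothetical, and only the threshold A > 0.3 is taken from the text.

```python
A_THRESHOLD = 0.3  # Step 2 criterion from the text: H-bond donor descriptor A > 0.3


def h_bond_donor_subset(records):
    """Keep records whose experimental A descriptor exceeds the threshold."""
    return [r for r in records if r["A"] > A_THRESHOLD]


def mean_signed_error(records):
    """Mean of (predicted - experimental) log K; positive means overestimation."""
    return sum(r["pred"] - r["exp"] for r in records) / len(records)


# Hypothetical records: A descriptor, experimental and predicted log K.
records = [
    {"name": "solute 1", "A": 0.00, "exp": 3.0, "pred": 3.1},
    {"name": "solute 2", "A": 0.37, "exp": 1.0, "pred": 1.8},
    {"name": "solute 3", "A": 0.60, "exp": 2.0, "pred": 3.0},
]
donors = h_bond_donor_subset(records)
print([r["name"] for r in donors], mean_signed_error(donors))
```

Unlike the rmse, the mean signed error keeps the direction of the deviation, so a clearly positive value for the subset corresponds to the kind of systematic overestimation described for the Kow-based approach.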
We note that this systematic error is attributable not to the uncertainty of KOWWIN in predicting Kow but to the use of octanol as a surrogate for storage lipids. Thus, the same systematic error should occur no matter which Kow estimation model is used, and even when experimental Kow values are used. Interestingly, the results for ABSOLV (Figure 2a) also show a poorer prediction for H-bond-donor compounds (rmse = 0.91) as compared to the complete data set (rmse = 0.61). This systematic error cannot be due to an insufficient calibration of the pp-LFER equation used in combination with the ABSOLV descriptors, because all 51 experimental values had been part of the calibration. While most of the values are overestimated by the ABSOLV approach, the values for all halogenated phenols are underestimated. It is currently unknown why ABSOLV exhibits relatively large errors for H-bond donor compounds. The other two models, COSMOtherm and SPARC v 4.5, show a substantially better prediction for H-bond-donor compounds (rmse of 0.35 and 0.42, respectively) than KOWWIN and ABSOLV, suggesting the robustness of COSMOtherm and SPARC v 4.5 for such polar compounds.

Step 3, Complex Substances. In this section, we look at chemicals that contain more than one functional group and have a more complex molecular structure than those considered above. The selected compounds cover hormones and hormone-active compounds (e.g., estrone, bisphenol A, phthalate esters), fungicides, herbicides, and mycotoxins. Functional groups that are represented include alcohol, amide, carbonyl, nitrile, ester, epoxide, and phenyl groups. A list of all 31 experimental data is provided in Table 2. For 7 out of the 22 newly measured chemicals (all seven chemicals are mycotoxins), we can only report an upper limit for Kstorage lipid/water because the value appears to be too small to

compound         Log Kstorage lipid/water ± SD

Data Measured in This Study
Bromoxynil       2.19 ± 0.01
Ioxynil          2.84 ± 0.05
Testosterone     1.96 ± 0.01
Estrone          2.65 ± 0.04
Penconazole      2.30 ± 0.04
Malathion        2.88 ± 0.02
Flusilazole      2.88 ± 0.04
Methidathion     2.43 ± 0.02
Methiocarb       2.54 ± 0.03
Lenacil          1.26 ± 0.01
Progesterone     3.18 ± 0.07

Data from the Literature
Monuron          0.78 ± 0.01a
Carbamazepine    0.50 ± 0.03a
Estradiol        2.42 ± 0.02a
Bisphenol A      1.75 ± 0.03a
Carbazole        3.49 ± 0.02a

compound (second column block of the table)

Data Measured in This Study: Deoxycorticosterone, Metconazole, 3-Acetyl-DON, Aflatoxin B1, Aflatoxin B2, Aflatoxin G1, Aflatoxin M1, Alternariol, Altenuene, Fusarenone-X, Verrucarin
Data from the Literature: Atrazine, Diethyl phthalate, Dipropyl phthalate, Dipentyl phthalate

Log Kstorage lipid/water ± SD 2.10 ± 3.43 ±