
Article

Complexation of Mn2+, Fe2+, Y3+, La3+, Pb2+, and UO22+ with Organic Ligands: QSPR Ensemble Modeling of Stability Constants

Vitaly Solov'ev, Gilles Marcou, Aslan Yu. Tsivadze, and Alexandre Varnek

Ind. Eng. Chem. Res., Just Accepted Manuscript • DOI: 10.1021/ie301271s • Publication Date (Web): 06 Sep 2012


Industrial & Engineering Chemistry Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Vitaly Solov’ev * 1, Gilles Marcou 2, Aslan Tsivadze 1, Alexandre Varnek 2

1 Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Leninskiy prospect, 31a, 119991, Moscow, Russian Federation

2 Laboratoire d’Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4, rue B. Pascal, Strasbourg, 67000, France

ABSTRACT: Quantitative structure-property relationship (QSPR) modeling of the stability constant logK of the 1:1 (M:L) complexes of 6 transition metal cations (M) with 261 (Mn2+), 87 (Fe2+), 105 (Y3+), 186 (La3+), 226 (Pb2+) and 66 (UO22+) organic ligands (L) in aqueous solution at 298 K and an ionic strength of 0.1 M was performed using ensemble multiple linear regression analysis with substructural molecular fragments as descriptors. The models were validated by an external 5-fold cross-validation procedure and on new ligands recently reported in the literature. Predicted logK values were calculated by consensus models as arithmetic means over 315 (Mn2+), 119 (Fe2+), 260 (Y3+), 290 (La3+), 304 (Pb2+) and 249 (UO22+) individual models. The absolute prediction error of logK is below 1.0 for 75% (UO22+), 70% (Mn2+, Fe2+, La3+), 65% (Pb2+) and 60% (Y3+) of the ligands, which is comparable with the systematic errors in the experimental data. The developed QSPR models were used to screen for ligands selective toward the studied cations. The models are incorporated in the COMET predictor available at http://infochim.u-strasbg.fr/cgi-bin/predictor.cgi.

Keywords: multiple linear regression analysis; substructural molecular fragments; QSPR modeling of stability constants; design of selective metal binders; complexes of Mn2+, Fe2+, Y3+, La3+, Pb2+, and UO22+ with organic ligands in water.


1. INTRODUCTION

Binding of metal cations by organic ligands in solution plays an essential role in metalloelement-dependent biological processes 1-4 and in various industrial workflows 4-7. Considerable effort in research laboratories goes into designing ligands that selectively bind a given metal cation 8-11. Careful selection of compounds that are potentially selective metal binders is highly important because of the cost, effort, and time associated with synthesis and measurement 12-15. A large amount of experimental data on the stability constants of metal-ligand complexes has been collected 16-22. This opens an opportunity to develop quantitative structure-property relationships (QSPR) linking the stability constant with the structure of the ligands, which, in turn, can be used for computer-aided design of new metal binders 23,24. To date, QSPR modeling of the stability constants of metal ion complexation has been performed for alkali 25-33, alkaline-earth 33-38, rare-earth 39-42 and transition metal 24,34,36,37,40,43,44 ions.

In many cases the practical application of reported QSPR models is complicated by the lack of complete information about the descriptor calculations and the details of the machine-learning implementation. To overcome this problem, we have developed the COMET (COmplexation of METals) software 39, which implements previously elaborated QSPR models of the stability constants of the 1:1 (M:L) complexes of diverse organic ligands with alkaline-earth (Sr2+ 35, Ca2+, Ba2+, Mg2+), lanthanide 39 (Ce3+, Pr3+, Nd3+, Sm3+, Eu3+, Gd3+, Tb3+, Dy3+, Ho3+, Er3+, Tm3+, Yb3+, Lu3+) and transition metal (Ag+ 40, Zn2+, Cd2+ and Hg2+ 24) cations in water at 298 K and an ionic strength of 0.1 M. The reliability of the predictions has been improved by using an ensemble modeling approach and by taking into account the applicability domain of the models 24,39,40,45,46.

We report here QSPR modeling of the stability constant logK of the 1:1 (M:L) complexes of the transition metal cations Mn2+, Fe2+, Y3+, La3+, Pb2+, and UO22+ with diverse sets of organic molecules in aqueous solution at 298 K and an ionic strength of 0.1 M. In total, 642 different organic ligands were involved in the modeling. The models were validated by an external 5-fold cross-validation procedure. Hundreds of individual models for every metal form a consensus model, which is applied to predict logK and to screen for selective ligands, taking into account the applicability domain of the models. The models are incorporated in the COMET predictor available at http://infochim.u-strasbg.fr/cgi-bin/predictor.cgi.


2. METHODS

2.1. Data Sets. The experimental stability constant (logK) values for the 1:1 (M:L) complexes of Mn2+, Fe2+, Y3+, La3+, Pb2+, and UO22+ with diverse organic ligands in water were critically selected from the IUPAC Stability Constants Database (SC DB) 16 at the standard temperature of 298 K and an ionic strength I = 0.1 M. Some logK values were corrected to the specified temperature and ionic strength using the procedures included in SC DB. Thus, Debye-Hückel theory and the Davies equation for the mean ionic activity coefficient were applied to adjust the stability constant from I΄ = 0 – 0.3 M to an ionic strength I = 0.1 M 16. If an experimental enthalpy (∆H) value of an equilibrium was available, the van't Hoff equation in its integrated form, which assumes that ∆H is independent of temperature, was used to calculate logK at 298 K from its value at some other temperature T΄ 16. As a rule, the T΄ values varied from 288 to 303 K. Text files resulting from searches in SC DB were converted into Structure-Data Files (SDF). An SDF includes the 2D structures of the ligands, the names of the metal ions, and the corresponding logK values for the complexation reactions. If several values of logK were available for a particular ligand, the most recent data or the data consistent across different experimental methods were chosen. The data manager EdiSDF 35,47,48 was used to prepare data sets containing 261 (Mn2+), 87 (Fe2+), 105 (Y3+), 186 (La3+), 226 (Pb2+) and 66 (UO22+) organic ligands. The logK values vary in the ranges of 0.2 – 19.9 (Mn2+), -0.4 – 20.2 (Fe2+), 1.7 – 17.0 (Y3+), 1.1 – 16.8 (La3+), 1.0 – 22.7 (Pb2+) and 0.7 – 15.5 (UO22+), although most values lie between 2 and 6 (Mn2+), 5 and 7 (Fe2+), 1 and 6 (Y3+), 0 and 5 (La3+), 1 and 10 (the widest interval, Pb2+), and 1 and 8 (UO22+) (Figure 1). In total, the data sets for the six metal ions contain 642 organic ligands.
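The two corrections described above can be illustrated with a short sketch. It is not the SC DB implementation, whose exact procedure is not given here; it assumes the textbook integrated van't Hoff equation and the Davies equation with A = 0.509 (water, 298 K) and a linear coefficient of 0.3. The function names and the example values are hypothetical.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
A_DAVIES = 0.509  # Debye-Hueckel constant for water at 298 K (assumed value)


def vant_hoff_logk(logk_t1, delta_h, t1, t2=298.0):
    """Shift logK from temperature t1 to t2 (K), assuming delta_h (J/mol) is constant."""
    return logk_t1 + delta_h / (R * math.log(10)) * (1.0 / t1 - 1.0 / t2)


def davies_term(ionic_strength):
    """Ionic-strength function f(I) of the Davies equation: -log10(gamma) = A * z**2 * f(I)."""
    root = math.sqrt(ionic_strength)
    return root / (1.0 + root) - 0.3 * ionic_strength


def ionic_strength_logk(logk_i1, i1, z_metal, z_ligand, i2=0.1):
    """Shift a concentration constant for M + L = ML from ionic strength i1 to i2 (mol/L)."""
    z_complex = z_metal + z_ligand
    delta_z2 = z_complex**2 - z_metal**2 - z_ligand**2
    return logk_i1 + A_DAVIES * delta_z2 * (davies_term(i2) - davies_term(i1))


# Hypothetical exothermic complexation measured at 288 K and I = 0
logk_298 = vant_hoff_logk(5.0, delta_h=-20_000.0, t1=288.0)
logk_corrected = ionic_strength_logk(logk_298, i1=0.0, z_metal=2, z_ligand=-2)
print(round(logk_298, 2), round(logk_corrected, 2))
```

For an exothermic reaction the constant decreases on warming to 298 K, and for oppositely charged reactants it decreases when the ionic strength is raised from 0 to 0.1 M; both trends follow from the signs in the two functions.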
The names of the ligands and the experimental stability constant values are given as supporting information in Tables SM1 – SM6. As a rule, an organic ligand has several functional groups. The studied molecules include derivatives of carboxylic and polycarboxylic acids; various aminocarboxylates; derivatives of phosphoric and phosphinic acids; cyclic and acyclic polydentate ligands with terminal carboxy and phosphoryl groups separated by various cyclic or acyclic spacers; various sulfonic acids; tertiary amines with phosphono and carboxy groups; mono- and dipodands of tertiary amines; phenol, adenosine and guanosine derivatives; crown ethers, thia- and aza-crowns with neutral and acidic lariat groups; cryptands; pyridines; phenanthroline derivatives, etc. (see supporting information: Tables SM1 – SM6).

2.2. Descriptors. Substructural molecular fragments (SMF) 24,31,46,47, defined as subgraphs of the molecular graphs of the ligands, were used as descriptors in the QSPR models. The descriptor value is the number of occurrences of a fragment. The descriptors were derived solely from 2D chemical structures. Molecules were represented with implicit hydrogen atoms. Two classes of SMF descriptors were generated: shortest topological paths with explicit representation of atoms and bonds, and terminal groups, i.e., shortest paths defined only by their length and by explicit identification of the terminal atoms and bonds. The Floyd algorithm 49 was used to find the shortest paths in the molecular graphs. Single, double, triple and aromatic bonds are recognized. Single, double and triple bonds were treated as different in acyclic and cyclic non-aromatic motifs. The minimal (nmin ≥ 2) and maximal (nmax ≤ 15) numbers of constituent atoms are defined for both classes of sequences; the values of nmin and nmax were varied from 2 to 15. The notations IAB(nmin - nmax) and IAB(nmin - nmax)t denote the SMF types of the two classes, which include all intermediate shortest paths with n atoms, nmin ≤ n ≤ nmax. By varying nmin and nmax, 210 SMF types of the two classes were generated. The SMF descriptors of each particular type were used as the initial descriptor pool for building several QSPR models with different variable selection techniques. Concatenated fragments that always occur in the same combination in each compound of the training set were treated as one extended fragment. Rare fragments (i.e., those found in fewer than m = 2 molecules) were excluded.

2.3. Machine learning method. QSPR modeling was performed using Multiple Linear Regression Analysis (MLR) of the ISIDA/QSPR program 50 with original combined forward and backward stepwise variable selection techniques 24,31,48,50,51. At present, the program can generate more than 25,000 MLR models; each of them corresponds to a particular type of SMF descriptors and a particular variable selection technique. The squared leave-one-out (LOO) cross-validation correlation coefficient Q2 characterizes the robustness of a model and serves as the criterion for model selection: acceptable models are characterized by Q2 > Q2lim, where Q2lim is a user-defined threshold. In this work, Q2lim = 0.5 was applied for all studied metal ions.

The logK values were predicted by consensus models (CMs). A consensus model combines the predictions of a multitude of individual models that originate from different types of SMF descriptors and variable selection algorithms 24,29,35,45,51. Thus, for each compound from the test set, the target property is computed as the arithmetic mean of the values obtained by the individual models, excluding those that lead to outlying values according to Thompson's rule and a method of ranked series 52. If a test compound is identified as being outside the applicability domain (AD) of an individual model, the prediction of that model for that compound is not included in the CM. Two AD approaches, bounding box and fragment control, were applied jointly. The bounding box method considers as the AD the multidimensional descriptor space confined by the minimal and maximal counts of the SMF descriptors involved in an individual model. Fragment control rejects a prediction for a test compound containing SMF fragments that do not occur in the initial SMF pool generated for the training set.

To validate the CMs, external 5-fold cross-validation (5-CV) was applied 40,45. In this procedure, the entire data set is divided into 5 pairs of training and test sets with non-overlapping test sets. Each training set covers 4/5 of the data set, while the related test set covers the remaining 1/5. Predictions are thus obtained for all molecules of the initial data set, since each of them belongs to one of the test sets. The descriptor selection and model acceptance procedures were performed only on the training folds. No information about the test set compounds was used at the training stage.

The predictive performance of the CMs was estimated using the coefficient of determination (R02), the root-mean-squared error (RMSE) and the mean absolute error (MAE) for the combination of all five test sets: $R_0^2 = 1 - \sum (\log K_{exp} - \log K_{pred})^2 / \sum (\log K_{exp} - \langle \log K \rangle_{exp})^2$, $\mathrm{RMSE} = (\sum (\log K_{exp} - \log K_{pred})^2 / n)^{1/2}$ and $\mathrm{MAE} = \sum |\log K_{exp} - \log K_{pred}| / n$, where logKexp and logKpred are, respectively, the experimental and predicted values of the stability constant, ⟨logK⟩exp is the mean experimental value, and n is the number of ligands.
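To make the shortest-path descriptors of Section 2.2 concrete, the following sketch counts atom-and-bond sequences along shortest paths in a small molecular graph using the Floyd algorithm. It is an illustration under simplifying assumptions, not the ISIDA/QSPR implementation: only one shortest path is kept per atom pair, a sequence and its reverse are merged by keeping the lexicographically smaller string, the distinction between ring and chain bonds and the terminal-group class are ignored, and the function name `smf_sequences` and the graph encoding are hypothetical.

```python
from collections import Counter
from itertools import combinations


def smf_sequences(atoms, bonds, n_min=2, n_max=15):
    """Count shortest-path fragments containing n_min..n_max atoms.

    atoms: list of element symbols; bonds: {(i, j): bond symbol}, e.g. {(0, 1): "="}.
    """
    n = len(atoms)
    inf = float("inf")
    dist = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    nxt = [[None] * n for _ in range(n)]
    bond = {}
    for (i, j), symbol in bonds.items():
        dist[i][j] = dist[j][i] = 1
        nxt[i][j], nxt[j][i] = j, i
        bond[i, j] = bond[j, i] = symbol

    # Floyd algorithm: all-pairs shortest paths with next-hop bookkeeping
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]

    counts = Counter()
    for i, j in combinations(range(n), 2):
        if dist[i][j] == inf or not n_min <= dist[i][j] + 1 <= n_max:
            continue
        # Walk the stored path and collect alternating atom and bond tokens
        tokens, current = [atoms[i]], i
        while current != j:
            step = nxt[current][j]
            tokens += [bond[current, step], atoms[step]]
            current = step
        forward, backward = "".join(tokens), "".join(reversed(tokens))
        counts[min(forward, backward)] += 1
    return counts


# O=P-C-P=O skeleton of a diphosphonate (hydrogens and other substituents omitted)
atoms = ["O", "P", "C", "P", "O"]
bonds = {(0, 1): "=", (1, 2): "-", (2, 3): "-", (3, 4): "="}
fragments = smf_sequences(atoms, bonds, n_min=2, n_max=5)
print(fragments["O=P-C-P=O"], fragments["O=P"], fragments["C-P"])
```

Each (n_min, n_max) choice yields one descriptor type. With 2 ≤ nmin ≤ nmax ≤ 15 there are 14·15/2 = 105 such choices per class, which is consistent with the 210 types quoted above for the two classes.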

3. RESULTS AND DISCUSSION

For every metal ion, 2100 individual structure-property models were built, and only the most robust models (Q2 > 0.5) were used to prepare the CMs. The number of individual models in a CM varies from one fold of the 5-CV to another: 277 - 323 (Mn2+), 120 (Fe2+), 152 - 261 (Y3+), 276 - 318 (La3+), 290 - 313 (Pb2+) and 220 - 238 (UO22+). The CMs demonstrate a reasonable predictive ability in 5-CV (Figure 2): RMSE is 1.1 (Mn2+), 1.2 (Fe2+), 1.3 (Y3+), 0.96 (La3+), 1.1 (Pb2+) and 1.1 (UO22+), MAE is 0.83 (Mn2+), 0.86 (Fe2+), 1.0 (Y3+), 0.75 (La3+), 0.86 (Pb2+) and 0.80 (UO22+), whereas the determination coefficient R02 varies from 0.823 (Y3+) to 0.926 (Pb2+). The regression error curves (Figure 3) show that the absolute prediction error is below 1.0 for 75% (UO22+), 70% (Mn2+, Fe2+, La3+), 65% (Pb2+) and 60% (Y3+) of the ligands.
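The statistics quoted above follow directly from the pooled 5-CV predictions. The sketch below computes R02, RMSE, MAE and the fraction of ligands with an absolute error below a threshold (one point of a regression error curve). The function names and the toy numbers are illustrative and are not taken from the paper's data.

```python
import math


def pooled_statistics(logk_exp, logk_pred):
    """Return (R0^2, RMSE, MAE) for experimental and predicted logK values."""
    n = len(logk_exp)
    residuals = [e - p for e, p in zip(logk_exp, logk_pred)]
    mean_exp = sum(logk_exp) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((e - mean_exp) ** 2 for e in logk_exp)
    r0_squared = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r0_squared, rmse, mae


def fraction_within(logk_exp, logk_pred, threshold=1.0):
    """Fraction of ligands whose absolute prediction error is below the threshold."""
    hits = sum(abs(e - p) < threshold for e, p in zip(logk_exp, logk_pred))
    return hits / len(logk_exp)


# Toy data (hypothetical values, for illustration only)
exp_values = [2.0, 4.0, 6.0, 8.0]
pred_values = [2.5, 3.0, 6.5, 10.0]
print(pooled_statistics(exp_values, pred_values))
print(fraction_within(exp_values, pred_values))
```

Evaluating `fraction_within` over a grid of thresholds and plotting the result against the threshold gives the full error curve rather than a single point.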


It should be noted that the discrepancies between experimental logK values reported for the same equilibrium by different authors may be rather large, from 0.7 to 1.8 logK units 53-64 (Table 1). These values are close to the prediction errors of the models (Figures 2 and 3), which shows that the predictive performance of the QSPR models is appropriate.

The elaborated QSPR models compare favorably with previously reported ones with respect to the diversity and size of the ligand sets, the ensemble modeling methodology, and the stringent cross-validation procedure. Earlier QSPR modeling studies of logK for the Mn2+, Fe2+ 36,37,65, La3+ 41 and Pb2+ 34 complexes were performed on relatively small data sets. Those models were either not validated 41, or their validation was performed on a single test set 34,36. AD approaches were never applied to these models 34,36,37,41,65.

Particular types of SMF descriptors make it possible to build individual models with a high squared LOO cross-validation correlation coefficient (Q2 = 0.85 – 0.99) and a low standard deviation (s = 0.29 - 0.96) for all training sets of the 5-CV. In most cases, these are terminal groups containing from 2 up to 10 atoms (Table 2). It is interesting to note that the same molecular fragments often participate in many different individual models. Moreover, their contributions to logK for one particular metal vary only slightly from one model to another. For example, the shortest topological paths O=P-C-P=O (Mn2+), Car-Nar-Car-Car-Nar-Car (Fe2+), N-C-C=O (Y3+), N-Car-Car-O (La3+), Car-Car-Car-C-C=O (Pb2+) and N-C-C=O (UO22+) contribute 3.7, 3.1, 2.0, 3.4, 5.4 and 4.6 logK units to the stability constant according to 54, 37, 36, 20, 21 and 35 models, respectively. These fragments characterize derivatives of diphosphonic acids (Mn2+), 1,10-phenanthrolines (Fe2+), ligands with aminoacetic acid groups (Y3+, UO22+), 1-nitroso-2-naphthols (La3+) and derivatives of hydroxamic acid (Pb2+) (Table 3). Thus, the ISIDA/QSPR program and SMF descriptors make it possible to detect ligand moieties with high positive or negative contributions to logK, which are useful for the design of new metal binders.

The SMF types providing the best models in 5-CV were used to build and validate the models on the entire modeling data set. The final CMs include 315 (Mn2+), 119 (Fe2+), 260 (Y3+), 290 (La3+), 304 (Pb2+) and 249 (UO22+) individual models. The predictive ability of the CMs was validated on an additional test set of 16 complexes of Mn2+ and Pb2+ with 15 new organic ligands 8-11,66-68, which were not included in the initial data sets. The ligands are mainly derivatives of 9-, 12- and 15-membered macrocycles with or without lateral functional groups (Table 4). The statistical parameters (R2 = 0.913 and RMSE = 1.13, Table 4) demonstrate a reasonable agreement between the experimental and predicted logK values, similar to that obtained in 5-CV on the modeling sets.

The developed CMs were applied to screen for selective metal binders among 2962 organic ligands 16 that can form 1:1 complexes with metal ions in water. The logK values were predicted using the developed CMs and the FMF (Forecast by Molecular Fragments) program 24. FMF predicts the target property as the arithmetic mean of the values obtained by a collection of individual ISIDA/QSPR models, excluding those that lead to outlying values according to a method of ranked series 52. FMF displays the positive and negative SMF contributions of the individual structure-property models, which supports the design of new compounds. The combination of bounding box and fragment control applicability domains is applied as in the ISIDA/QSPR program. The maximal predicted logKpred values, which indicate the selectivity of a ligand toward the studied metals, are given in the grey cells of Table 5. Several ligands were selected for which the logKpred values are characterized by low standard deviations and large numbers of applied individual models.

The obtained models for logK are embedded into the COMET (COmplexation of METals) predictor 39, which is freely available at http://infochim.u-strasbg.fr/cgi-bin/predictor.cgi. A ligand can be submitted as an SDF (MOL) file or prepared online. COMET applies an ensemble of MLR models to each metal. A combination of bounding box and fragment control applicability domains is applied.
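A minimal sketch of how a consensus prediction with the two applicability-domain checks could be assembled is given below. It is not the FMF or COMET code: the `FragmentModel` class, its fields and the toy contributions are hypothetical, every model is assumed to read from one shared dictionary of fragment counts, and the exclusion of outlying predictions (ranked series) is omitted.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass
class FragmentModel:
    intercept: float
    contributions: Dict[str, float]           # fragment -> contribution to logK
    count_ranges: Dict[str, Tuple[int, int]]  # fragment -> (min, max) count in training set
    training_pool: FrozenSet[str]             # all fragments generated for the training set

    def in_domain(self, counts: Dict[str, int]) -> bool:
        # Fragment control: reject ligands with fragments unseen in the training pool
        if any(c > 0 and f not in self.training_pool for f, c in counts.items()):
            return False
        # Bounding box: every model descriptor must lie within its training range
        return all(lo <= counts.get(f, 0) <= hi for f, (lo, hi) in self.count_ranges.items())

    def predict(self, counts: Dict[str, int]) -> float:
        return self.intercept + sum(a * counts.get(f, 0) for f, a in self.contributions.items())


def consensus_logk(
    models: List[FragmentModel], counts: Dict[str, int]
) -> Optional[Tuple[float, float, int]]:
    """Mean, standard deviation and number of in-domain individual predictions."""
    values = [m.predict(counts) for m in models if m.in_domain(counts)]
    if not values:
        return None  # ligand is outside the domain of every model
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread, len(values)


# Toy ensemble with made-up coefficients
pool = frozenset({"N-C-C=O", "C-O", "C-N"})
models = [
    FragmentModel(1.0, {"N-C-C=O": 2.0}, {"N-C-C=O": (0, 4)}, pool),
    FragmentModel(0.5, {"N-C-C=O": 2.5}, {"N-C-C=O": (0, 4)}, pool),
    FragmentModel(0.0, {"N-C-C=O": 3.0}, {"N-C-C=O": (0, 1)}, pool),  # narrower bounding box
]
ligand = {"N-C-C=O": 2, "C-N": 3}
print(consensus_logk(models, ligand))
```

Returning the spread and the number of contributing models alongside the mean mirrors the selection criteria mentioned in the text, where ligands with low standard deviations and many applied individual models are preferred.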

4. CONCLUSIONS

QSPR modeling of the stability constant logK of the 1:1 (M:L) complexes of 6 transition metal cations with 261 (Mn2+), 87 (Fe2+), 105 (Y3+), 186 (La3+), 226 (Pb2+) and 66 (UO22+) organic molecules in aqueous solution at 298 K and an ionic strength of 0.1 M was performed using ensemble multiple linear regression analysis with substructural molecular fragments as descriptors. For every ligand, the predicted logK value was calculated as the arithmetic mean of the values calculated by an ensemble of individual models. The models were validated by an external 5-CV procedure and on new ligands recently reported in the literature. The diversity and size of the ligand sets significantly exceed those reported in previous studies, in which QSPR ensemble modeling and stringent 5-CV were not used. The root mean squared errors for 5-CV are 1.1 (Mn2+), 1.2 (Fe2+), 1.3 (Y3+), 0.96 (La3+), 1.1 (Pb2+) and 1.1 (UO22+), which are comparable with the systematic errors in the experimental data. The developed consensus models were applied to screen for ligands selective toward each of the metal cations Mn2+, Fe2+, Y3+, La3+, Pb2+, and UO22+ among the 2962 organic ligands from the IUPAC Stability Constants Database. The models are embedded into the COMET predictor available via the Internet.

ACKNOWLEDGMENT

We thank the Russian Foundation for Basic Research (project no. 09-03-93106) for support. VS thanks Profs. G. Pettit and L. Pettit of Academic Software for providing the SCDB-to-SDF program.

SUPPORTING INFORMATION

Tables SM1 – SM6 contain the names of the organic ligands (L) and the experimental stability constant values (logK) for the equilibria M + L = (M)L (M = Mn2+, Fe2+, Y3+, La3+, Pb2+ and UO22+) in water at 298 K and an ionic strength of 0.1 M.

REFERENCES

(1) Comprehensive Coordination Chemistry II. Bio-coordination Chemistry; Que, L. J.; Tolman, W. B., Eds.; Elsevier: San Diego, 2003; Vol. 8.
(2) Bhattacharya, P. K. Metal Ions in Biochemistry; Alpha Science International: Harrow, U. K., 2005.
(3) Metal Ions in Biological Systems. Vol 37. Manganese and Its Role in Biological Processes; Sigel, A.; Sigel, H., Eds.; CRC Press: New York, 2000.
(4) Tretyakov, Y. D.; Martynenko, L. I.; Grigoryev, A. N.; Tsivadze, A. Y. Inorganic Chemistry. Chemistry of Elements. Book 1 (Rus.); Himia: Moscow, 2001.
(5) Comprehensive Coordination Chemistry II. Applications of Coordination Chemistry; Ward, M. D., Ed.; Elsevier: San Diego, 2003; Vol. 9.
(6) Duca, G. Homogeneous Catalysis with Metal Complexes. Fundamentals and Applications. Springer Series in Chemical Physics; Springer: Berlin, Heidelberg, 2012; Vol. 102.


(7) Kumar, S.; Dhar, D. N.; Saxena, P. N. Applications of metal complexes of Schiff bases - A review. J. Sci. Ind. Res. 2009, 68 (March), 181-187.
(8) Drahoš, B.; Kotek, J.; Císařová, I.; Hermann, P.; Helm, L.; Lukeš, I.; Tóth, E. Mn2+ Complexes with 12-Membered Pyridine Based Macrocycles Bearing Carboxylate or Phosphonate Pendant Arm: Crystallographic, Thermodynamic, Kinetic, Redox, and 1H/17O Relaxation Studies. Inorg. Chem. 2011, 50 (24), 12785−12801.
(9) Svobodová, I.; Lubal, P.; Plutnar, J.; Havličková, J.; Kotek, J.; Hermann, P.; Lukeš, I. Thermodynamic, kinetic and solid-state study of divalent metal complexes of 1,4,8,11-tetraazacyclotetradecane (cyclam) bearing two trans (1,8-)methylphosphonic acid pendant arms. Dalton Trans. 2006, (43), 5184-5197.

(10) Aragoni, M. C.; Arca, M.; Bencini, A.; Blake, A. J.; Caltagirone, C.; Decortes, A.; Demartin, F.; Devillanova, F. A.; Faggi, E.; Dolci, L. S.; Garau, A.; Isaia, F.; Lippolis, V.; Prodi, L.; Wilson, C.; Valtancoli, B.; Zaccheroni, N. Coordination chemistry of N-aminopropyl pendant arm derivatives of mixed N/S-, and N/S/O-donor macrocycles, and construction of selective fluorimetric chemosensors for heavy metal ions. Dalton Trans. 2005, (18), 2994-3004.
(11) Bazzicalupi, C.; Bencini, A.; Bianchi, A.; Borsari, L.; Danesi, A.; Giorgi, C.; Lodeiro, C.; Mariani, P.; Pina, F.; Santarelli, S.; Tamayo, A.; Valtancoli, B. Basicity and coordination properties of a new phenanthroline-based bis-macrocyclic receptor. Dalton Trans. 2006, (33), 4000-4010.
(12) Hancock, R. D. Approaches to Predicting Stability Constants. A Critical Review. Analyst 1997, 122 (4), 51R–58R.
(13) Hancock, R. D.; Martell, A. E. Ligand design for selective complexation of metal ions in aqueous solution. Chem. Rev. 1989, 89 (8), 1875-914.
(14) Martell, A. E.; Hancock, R. D.; Motekaitis, R. J. Factors Affecting Stabilities of Chelate, Macrocyclic and Macrobicyclic Complexes in Solution. Coord. Chem. Rev. 1994, 133 (JUL), 39-65.
(15) Dimmock, P. W.; Warwick, P.; Robbins, R. A. Approaches to predicting stability constants. Analyst 1995, 120 (8), 2159-2170.
(16) IUPAC Stability Constants Database; version 5.33; http://www.acadsoft.co.uk/ 2012.
(17) NIST Critically Selected Stability Constants of Metal Complexes Database: version 8.0; http://www.nist.gov/srd/nist46.cfm; US Dept. Commerce: Gaithersburg, 2004.
(18) May, P. M.; Rowland, D.; Murray, K. Joint Expert Speciation System. A powerful research tool for modeling chemical speciation in complex environments: http://jess.murdoch.edu.au/jess_home.htm; Murdoch University: Perth, Western Australia, 1985-2012.
(19) Martell, A. E.; Smith, R. M. Critical Stability Constants; Plenum Press: New York, 1974, 1975, 1977, 1976, 1982, 1989; Vol. 1-6.
(20) Christensen, J. J.; Izatt, R. M. Handbook of Metal Ligand Heats and Related Thermodynamic Quantities; Marcel Dekker Inc.: New York, 1983.
(21) Izatt, R. M.; Pawlak, K.; Bradshaw, J. S.; Bruening, R. L. Thermodynamic and Kinetic Data for Macrocycle Interaction with Cations and Anions. Chem. Rev. 1991, 91 (8), 1721-2085.
(22) Solov'ev, V. P.; Vnuk, E. A.; Strakhova, N. N.; Raevsky, O. A. Thermodynamics of Complexation of the Macrocyclic Polyethers with Salts of Alkali and Alkaline-Earth Metals (Rus.); VINITI: Moscow, 1991.
(23) Varnek, A.; Solov’ev, V. Quantitative Structure-Property Relationships in solvent extraction and complexation of metals. In Ion Exchange and Solvent Extraction, A Series of Advances; Sengupta, A. K., Moyer, B. A., Eds.; CRC Press, Taylor and Francis Group: Boca Raton, 2009; Vol. 19, pp 319-358.
(24) Solov’ev, V.; Sukhno, I.; Buzko, V.; Polushin, A.; Marcou, G.; Tsivadze, A.; Varnek, A. Stability Constants of Complexes of Zn2+, Cd2+, and Hg2+ with Organic Ligands: QSPR Consensus Modeling and Design of New Metal Binders. J. Incl. Phenom. Macrocycl. Chem. 2012, 72 (3-4), 309-321.
(25) Daraei, H.; Irandoust, M.; Ghasemi, J. B.; Kurdian, A. R. QSPR probing of Na+ complexation with 15-crown-5 ethers derivatives using artificial neural network and multiple linear regression. J. Incl. Phenom. Macrocycl. Chem. 2012, 72 (3-4), 423-435.
(26) Li, Y.; Su, L.; Zhang, X.; Huang, X.; Zhai, H. Prediction of association constants of cesium chelates based on Uniform Design Optimized Support Vector Machine. Chemom. Intell. Lab. Syst. 2011, 105 (1), 106-113.
(27) Ghasemi, J. B.; Ahmadi, S.; Ayati, M. QSPR Modeling of Stability Constants of the Li-Hemispherands Complexes Using MLR: A Theoretical Host-Guest Study. Macroheterocycles 2010, 3 (4), 234-242.
(28) Ghasemi, J.; Saaidpour, S. QSPR modeling of stability constants of diverse 15-crown-5 ethers complexes using best multiple linear regression. J. Incl. Phenom. Macrocycl. Chem. 2008, 60 (3-4), 339-351.
(29) Solov'ev, V. P.; Varnek, A. A. Structure-Property Modeling of Metal Binders Using Molecular Fragments. Rus. Chem. Bull. 2004, 53 (7), 1434-1445.
(30) Varnek, A. A.; Wipff, G.; Solov’ev, V. P.; Solotnov, A. F. Assessment of the Macrocyclic Effect for the Complexation of Crown-Ethers with Alkali Cations Using the Substructural Molecular Fragments Method. J. Chem. Inf. Comput. Sci. 2002, 42 (4), 812-829.
(31) Solov'ev, V. P.; Varnek, A. A.; Wipff, G. Modeling of Ion Complexation and Extraction Using Substructural Molecular Fragments. J. Chem. Inf. Comput. Sci. 2000, 40 (3), 847-858.
(32) Gakh, A. A.; Sumpter, B. G.; Noid, D. W.; Sachleben, R. A.; Moyer, B. A. Prediction of Complexation Properties of Crown Ethers Using Computational Neural Networks. J. Incl. Phenom. Mol. Recognit. Chem. 1997, 27 (3), 201-213.
(33) Shi, Z. G.; McCullough, E. A. A Computer-Simulation - Statistical Procedure for Predicting Complexation Equilibrium-Constants. J. Incl. Phenom. Mol. Recognit. Chem. 1994, 18 (1), 9-26.
(34) Cabaniss, S. E. Quantitative Structure-Property Relationships for Predicting Metal Binding by Organic Ligands. Environ. Sci. Technol. 2008, 42 (14), 5210–5216.
(35) Solov'ev, V. P.; Kireeva, N. V.; Tsivadze, A. Y.; Varnek, A. A. Structure-Property Modelling of Complex Formation of Strontium with Organic Ligands in Water. J. Struct. Chem. 2006, 47 (2), 298-311.
(36) Toropov, A. A.; Toropova, A. P.; Nesterova, A. I.; Nabiev, O. M. QSPR Modeling of Complex Stability by Correlation Weighing of the Topological and Chemical Invariants of Molecular Graphs. Russ. J. Coord. Chem. 2004, 30 (9), 611–617.
(37) Toropov, A. A.; Toropova, A. P. QSPR Modeling of Complex Stability by Optimization of Correlation Weights of the Hydrogen Bond Index and the Local Graph Invariants. Russ. J. Coord. Chem. 2002, 28 (12), 877-880.
(38) Raevskii, O. A.; Sapegin, A. M.; Chistyakov, V. V.; Solov'ev, V. P.; Zefirov, N. S. Development of a model for the relation between structure and complex forming ability. Koord. Khim. (Rus.) 1990, 16 (9), 1175-84.
(39) Varnek, A.; Fourches, D.; Kireeva, N.; Klimchuk, O.; Marcou, G.; Tsivadze, A.; Solov’ev, V. Computer-Aided Design of New Metal Binders. Radiochim. Acta 2008, 96 (8), 505-511.
(40) Tetko, I. V.; Solov'ev, V. P.; Antonov, A. V.; Yao, X. J.; Fan, B. T.; Hoonakker, F.; Fourches, D.; Lachiche, N.; Varnek, A. Benchmarking of Linear and Non-Linear Approaches for Quantitative Structure-Property Relationship Studies of Metal Complexation with Organic Ligands. J. Chem. Inf. Model. 2006, 46 (2), 808-819.


(41) Svetlitski, R.; Lomaka, A.; Karelson, M. QSPR Modelling of Lanthanide-Organic Complex Stability Constants. Separat. Sci. Technol. 2006, 41 (1), 197-216.
(42) Qi, Y.-H.; Zhang, Q.-Y.; Xu, L. Correlation Analysis of the Structures and Stability Constants of Gadolinium(III) Complexes. J. Chem. Inf. Comput. Sci. 2002, 42 (6), 1471-1475.
(43) Raos, N.; Miličević, A. Estimation of Stability Constants of Coordination Compounds using Models Based on Topological Indices. Arch. Ind. Hyg. Toxicol. 2009, 60 (1), 123-128.
(44) Grgas, B.; Nikolić, S.; Paulić, N.; Raos, N. Estimation of Stability Constants of Copper(II) Chelates with N-alkylated Amino Acids using Topological Indices. Croat. Chem. Acta 1999, 72 (4), 885-895.
(45) Varnek, A.; Kireeva, N.; Tetko, I. V.; Baskin, I. I.; Solov'ev, V. P. Exhaustive QSPR Studies of a Large Diverse Set of Ionic Liquids: How Accurately Can We Predict Melting Points? J. Chem. Inf. Model. 2007, 47 (3), 1111-1122.
(46) Solov'ev, V.; Oprisiu, I.; Marcou, G.; Varnek, A. Quantitative Structure-Property Relationship (QSPR) Modeling of Normal Boiling Point Temperature and Composition of Binary Azeotropes. Ind. Eng. Chem. Res. 2011, 50 (24), 14162-14167.
(47) Varnek, A.; Fourches, D.; Hoonakker, F.; Solov'ev, V. P. Substructural Fragments: An Universal Language to Encode Reactions, Molecular and Supramolecular Structures. J. Comput. Aid. Mol. Des. 2005, 19 (9-10), 693-703.
(48) Varnek, A.; Fourches, D.; Horvath, D.; Klimchuk, O.; Gaudin, C.; Vayer, P.; Solov'ev, V.; Hoonakker, F.; Tetko, I. V.; Marcou, G. ISIDA - Platform for Virtual Screening Based on Fragment and Pharmacophoric Descriptors. Curr. Comput.-Aided Drug Des. 2008, 4 (3), 191-198.
(49) Swamy, M. N. S.; Thulasiraman, K. Graphs, Networks, and Algorithms; John Wiley & Sons: New York, 1981.
(50) Solov'ev, V. P.; Varnek, A. A. ISIDA (In Silico Design and Data Analysis) QSPR program; version 5.76; http://infochim.u-strasbg.fr/spip.php?rubrique53; Strasbourg; Moscow, 2012.
(51) Varnek, A.; Solov'ev, V. P. "In Silico" Design of Potential Anti-HIV Actives Using Fragment Descriptors. Comb. Chem. High Throughput Screening 2005, 8 (5), 403-416.
(52) Muller, P. H.; Neumann, P.; Storm, R. Tafeln der mathematischen Statistik; VEB Fachbuchverlag: Leipzig, 1979.
(53) Amico, P.; Daniele, P. G.; Ostacoli, G.; Arena, G.; Rizzarelli, E.; Sammartano, S. Mixed metal complexes in solution. Part 4. Formation and stability of heterobinuclear complexes of cadmium(II)-citrate with some bivalent metal ions in aqueous solution. Transition Met. Chem. 1985, 10 (1), 11-14.
(54) Grzybowski, A. K.; Tate, S. S.; Datta, S. P. Magnesium and manganese complexes of citric and isocitric acids. J. Chem. Soc. A 1970, 241-245.
(55) Clark, N. H.; Martell, A. E. Ferrous chelates of EDTA, hydroxyethylethylenediaminetriacetic acid (HEDTA) and N,N'-bis(2-hydroxy-5-sulfobenzyl)ethylenediamine-N,N'-diacetic acid (SHBED). Inorg. Chem. 1988, 27 (7), 1297-1298.
(56) Brunetti, A. P.; Nancollas, G. H.; Smith, P. N. Thermodynamics of ion association. XIX. Complexes of divalent metal ions with monoprotonated ethylenediaminetetraacetate. J. Am. Chem. Soc. 1969, 91 (17), 4680-4683.
(57) Gualtieri, R. J.; McBryde, W. A. E.; Powell, H. K. J. Ethylenediamine-N,N'-diacetic acid complexes with divalent manganese, zinc, cadmium, and lead: a thermodynamic study. Can. J. Chem. 1979, 57 (1), 113-118.
(58) Schrøder, K. H.; Johnsen, B. G. Copper(II), lead and zinc complexes of ethylenediamine-N,N'-diacetic acid. Talanta 1974, 21 (6), 671-673.


(59) Dyatlova, N. M.; Temkina, V. Y.; Belugin, Y. F.; Lavrova, O. Y.; Bertina, L. E.; Iozefovich, F. D.; Kalmykova, N. N.; Zhirov, E. P. Complexing of 2-hydroxyethyliminodiacetic acid with rare earth elements. Zh. Neorg. Khim. 1965, 10, 1131-1137.
(60) Thompson, L. C.; Loraas, J. A. Complexes of the Rare Earths. VI. N-Hydroxyethyliminodiacetic Acid. Inorg. Chem. 1963, 2 (3), 594-597.
(61) Kolhe, V.; Dwivedi, K. Thermodynamic equilibrium constants of ternary chelates of trivalent La-, Ce-, Pr-, Nd- and Y-egta-amino acid systems. J. Indian Chem. Soc. 1996, 73 (12), 678-681.
(62) Sekhon, B. S.; Chopra, S. L. A thermodynamic study of the complexation reaction for some amino acids with cerium(III) and yttrium(III). Thermochim. Acta 1973, 7 (2), 151-157.
(63) Nourmand, M.; Meissami, N. Complex Formation Between Uranium(VI) Ion and some alpha-Aminoacids. Polyhedron 1982, 1 (6), 537-539.
(64) Jianmin, Z.; Aiyou, Z.; Baojiao, C.; Wenming, W. pH Study of the stability of complex compounds formed by glycylglycine, leucine and 2,3,5,6-tetrahydro-6-phenylimidazo(2,1-b)thiazole with uranyl ion(II). Chem. J. Chinese U. 1982, 3 (3), 281-286.
(65) Toropov, A. A.; Toropova, A. P. QSPR modeling of stability of complexes of adenosine phosphate derivatives with metals absent from the complexes of the teaching access. Russ. J. Coord. Chem. 2001, 27 (8), 574-578.
(66) Drahoš, B.; Kotek, J.; Hermann, P.; Lukeš, I.; Tóth, E. Mn2+ Complexes with Pyridine-Containing 15-Membered Macrocycles: Thermodynamic, Kinetic, Crystallographic, and 1H/17O Relaxation Studies. Inorg. Chem. 2010, 49 (7), 3224-3238.
(67) Caltagirone, C.; Bencini, A.; Demartin, F.; Devillanova, F. A.; Garau, A.; Isaia, F.; Lippolis, V.; Mariani, P.; Papke, U.; Tei, L.; Verani, G. Redox chemosensors: coordination chemistry towards CuII, ZnII, CdII, HgII, and PbII of 1-aza-4,10-dithia-7-oxacyclododecane ([12]aneNS2O) and its N-ferrocenylmethyl derivative. Dalton Trans. 2003, (5), 901-909.
(68) Sigel, H.; DaCosta, C. P.; Song, B.; Carloni, P.; Gregan, F. Stability and Structure of Metal Ion Complexes Formed in Solution with Acetyl Phosphate and Acetonylphosphonate: Quantification of Isomeric Equilibria. J. Am. Chem. Soc. 1999, 121 (26), 6248-6257.


TABLE CAPTIONS

Table 1. The logK values for the 1:1 (M:L) complexes of several studied ligands with Mn2+, Fe2+, Y3+, La3+, Pb2+, and UO22+ in water at 298 K and an ionic strength of 0.1 M. The paired entries illustrate the discrepancies among experimental logK values reported by different sources.

Table 2. Statistical parameters of the best MLR models and the optimal descriptor types for the five training sets of the 5-CV procedure.

Table 3. Selected SMF and their mean contributions to logK according to the sets of individual QSPR models.

Table 4. Experimental and predicted stability constants (logK) of the 1:1 (M:L) complexation for the additional test set.

Table 5. Ligands predicted to be selective for individual metal ions among Mn2+, Fe2+, Y3+, La3+, Pb2+, and UO22+: the maximal logKpred value, which indicates the ligand selectivity, is given in a grey cell.


FIGURE CAPTIONS

Figure 1. Distribution of experimental values of the stability constant (logK) for the 1:1 (M:L) complexes of organic ligands with Mn2+, Fe2+, Y3+, La3+, Pb2+, and UO22+ in water at 298 K and an ionic strength of 0.1 M.

Figure 2. Predicted versus experimental values of the stability constant (logK) for the 1:1 (M:L) complexation of organic ligands with Mn2+, Fe2+, Y3+, La3+, Pb2+, and UO22+ in water at 298 K and an ionic strength of 0.1 M. The predicted data are the combined predictions for the five external test sets of the 5-CV procedure.

Figure 3. Percentage of compounds versus absolute prediction error |logKexp - logKpred| for the 1:1 (M:L) complexation of Mn2+, Fe2+, and Y3+ (left graph) and La3+, Pb2+, and UO22+ (right graph) with organic ligands in water at 298 K and an ionic strength of 0.1 M. The lines of series (a) correspond to the CMs; the lines of series (b) correspond to "no model", in which the arithmetic mean of the experimental constants of all ligands for a given cation is used as the predicted value for every ligand.


TABLES

Table 1. The logK values for the 1:1 (M:L) complexes of several studied ligands with Mn2+, Fe2+, Y3+, La3+, Pb2+, and UO22+ in water at 298 K and an ionic strength of 0.1 M. The paired entries illustrate the discrepancies among experimental logK values reported by different sources.

[Ligand column: structure drawings of ligands 1-5, not reproduced.]

no.   metal ion   logK     reference
1     Mn2+        3.81     53
                  2.16     54
2     Fe2+        12.58    55
                  11.63    56
3     Pb2+        10.66    57
                  11.71    58
4     La3+        7.33     59
                  8.00     60
5     Y3+         6.09     61
                  4.26     62
      UO22+       7.13     63
                  5.60     64


Table 2. Statistical parameters of the best MLR models and the optimal descriptor types for the five training sets of the 5-CV procedure a.

no.   SMF type b      n          m        s            Q2
Mn2+
1     IAB(2-11)t      208-209    48-76    0.57-0.66    0.925-0.957
2     IAB(2-10)t      208-209    59-67    0.53-0.62    0.930-0.939
Fe2+
3     IAB(3-10)t      69-70      20-32    0.32-0.63    0.944-0.986
4     IAB(2-8)t       69-70      19-32    0.29-0.69    0.921-0.988
Y3+
5     IAB(2-10)t      84         25-34    0.41-0.83    0.853-0.950
6     IAB(2-11)t      84         23-30    0.52-0.83    0.853-0.943
La3+
7     IAB(2-9)t       148-149    36-52    0.44-0.57    0.937-0.970
8     IAB(2-6)t       148-149    30-42    0.53-0.67    0.932-0.957
Pb2+
9     IAB(2-10)       180-181    49-65    0.49-0.59    0.956-0.974
10    IAB(2-14)t      180-181    36-69    0.45-0.96    0.924-0.973
UO22+
11    IAB(2-10)t      56         17-20    0.39-0.60    0.930-0.977

a Statistical parameters of the MLR models: the number of data points in the training set (n), the number of SMF variables (m), the standard deviation (s), and the squared LOO cross-validation correlation coefficient (Q2).
b SMF type: see the notation in the Methods section (descriptors).
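The parameters s and Q2 in footnote a can be illustrated with a short calculation. The following is a minimal sketch, not the ISIDA implementation: it assumes Q2 = 1 - PRESS/SST and s = sqrt(SSE/(n - m - 1)), which are common conventions but are not spelled out in this section, and it uses synthetic fragment counts in place of real SMF descriptors. The function names `loo_q2` and `fit_std` are hypothetical.

```python
import numpy as np


def loo_q2(X, y):
    """Leave-one-out Q2 of a least-squares MLR model with intercept (assumed: 1 - PRESS/SST)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])   # design matrix with intercept column
    H = A @ np.linalg.pinv(A)                   # hat matrix (projection onto the columns of A)
    resid = y - H @ y                           # ordinary fit residuals
    loo_resid = resid / (1.0 - np.diag(H))      # closed-form leave-one-out residuals
    press = np.sum(loo_resid ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / sst


def fit_std(X, y):
    """Standard deviation of the fit, assumed here to be sqrt(SSE / (n - m - 1))."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = np.sum((y - A @ coef) ** 2)
    return np.sqrt(sse / (len(y) - X.shape[1] - 1))


# Synthetic example (assumption): 30 "ligands", 3 fragment counts, known contributions.
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(30, 3)).astype(float)
y = 3.0 + X @ np.array([1.5, -0.7, 2.0]) + rng.normal(0.0, 0.3, size=30)

q2 = loo_q2(X, y)
s = fit_std(X, y)
print(f"Q2 = {q2:.3f}, s = {s:.2f}")
```

The hat-matrix shortcut avoids refitting the model n times; an explicit loop that drops one compound at a time gives the same PRESS for a full-rank least-squares fit.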


Table 3. Selected SMF and their mean contributions to logK according to the sets of individual QSPR models a.

[Ligand substructure column c: structure drawings, not reproduced.]

no.   SMF b                       contribution a   Nmodel   Nmol
Mn2+ + L, 261 ligands
1     O=P-C-P=O                   3.74 (0.54)      54       5
2     N.C.C.N.C.C.N-C-C=O         4.56 (0.07)      9        7
Fe2+ + L, 87 ligands
4     Car-Nar-Car-Car-Nar-Car     3.13 (0.28)      37       6
5     N.[4]-O                     2.91 (0.48)      34       9
6     N-[4]=O                     2.70 (0.47)      58       47
Y3+ + L, 105 ligands
7     N-C-C=O                     2.04 (0.16)      36       31
8     C-N-C-C-N-C                 1.48 (0.10)      25       5
La3+ + L, 186 ligands
9     O.C:C.C=O                   3.5 (1.2)        84       5
10    N-Car-Car-O                 3.39 (0.77)      20       12
Pb2+ + L, 226 ligands
11    Car-Car-Car-C-C=O           5.4 (2.3)        21       3
12    O=[5]=S                     3.93 (0.38)      22       3
UO22+ + L, 66 ligands
13    N-C-C=O                     4.63 (0.13)      35       14
14    Car-Car-O                   3.05 (0.50)      51       18

a The fragment contribution is given as the arithmetic mean, with its standard deviation in parentheses, over the Nmodel models and the Nmol ligands.
b Substructural molecular fragments (SMF): N.[4]-O, N-[4]=O, and O=[5]=S are terminal groups, that is, shortest-path sequences defined by their length (the number of atoms, in square brackets) and by explicit indication of the beginning atom and bond and the ending bond and atom. The remaining SMF are shortest topological paths with explicit representation of atoms and bonds. Bonds: '-' and '=' are single and double bonds in a chain; '.' and ':' are single and double bonds in a non-aromatic cycle; Car-Nar is an aromatic bond.
c Significant substructure of a ligand that includes the corresponding SMF, whose bonds are drawn in bold.
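Footnote b describes most SMF as shortest topological paths written with explicit atoms and bonds, such as N-C-C=O. The sketch below shows one way such labels could be enumerated for a small molecular graph. It is an illustration under stated assumptions, not the ISIDA algorithm: the input is a hand-written heavy-atom graph of glycine, only chain bonds ('-' and '=') are handled, the terminal-group notation (for example N.[4]-O) and ring or aromatic bond symbols are not implemented, and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations

# Toy input (assumption): heavy-atom graph of glycine, H2N-CH2-C(=O)-OH.
atoms = ["N", "C", "C", "O", "O"]
bonds = [(0, 1, "-"), (1, 2, "-"), (2, 3, "="), (2, 4, "-")]

# Adjacency map: adj[i][j] is the bond symbol between atoms i and j.
adj = {i: {} for i in range(len(atoms))}
for i, j, symbol in bonds:
    adj[i][j] = symbol
    adj[j][i] = symbol


def shortest_paths(src, dst):
    """Return every shortest path from src to dst as a list of atom indices."""
    dist = {src: 0}
    preds = {src: []}
    queue = deque([src])
    while queue:  # breadth-first search recording all shortest-path predecessors
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                preds[v].append(u)

    def unwind(v):
        if v == src:
            return [[src]]
        return [path + [v] for u in preds[v] for path in unwind(u)]

    return unwind(dst) if dst in dist else []


def path_label(path):
    """Write a path as alternating atoms and bonds, e.g. N-C-C=O."""
    tokens = [atoms[path[0]]]
    for u, v in zip(path, path[1:]):
        tokens += [adj[u][v], atoms[v]]
    forward = "".join(tokens)
    backward = "".join(reversed(tokens))
    return min(forward, backward)  # one canonical label for both reading directions


def fragment_counts(nmin, nmax):
    """Count shortest-path fragments that contain between nmin and nmax atoms."""
    counts = {}
    for a, b in combinations(range(len(atoms)), 2):
        for path in shortest_paths(a, b):
            if nmin <= len(path) <= nmax:
                label = path_label(path)
                counts[label] = counts.get(label, 0) + 1
    return counts


counts = fragment_counts(2, 4)
for label in sorted(counts):
    print(label, counts[label])
```

The resulting counts play the role of the descriptor vector of one ligand; in an MLR model each fragment count is multiplied by a fitted contribution of the kind listed in Table 3.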


Table 4. Experimental and predicted stability constants (logK) of the 1:1 (M:L) complexation for the additional test set a.

[Ligand column: structure drawings of ligands 1-16, not reproduced.]

no.   cation   logK exp.   logK pred. b    Nm c
1     Mn2+     11.54 d     12.01 (0.32)    142
2     Mn2+     14.06 d     11.76 (0.27)    36
3     Mn2+     7.43 d      9.72 (0.30)     137
4     Mn2+     10.61 d     9.1 (2.0)       71
5     Mn2+     7.18 e      6.02 (0.34)     44
6     Mn2+     10.89 e     10.97 (0.17)    144
7     Mn2+     1.95 f      3.34 (0.69)     87
8     Mn2+     2.36 f      3.44 (0.38)     98
9     Mn2+     12.35 g     11.6 (2.8)      40
10    Pb2+     4.27 h      3.91 (0.98)     206
11    Pb2+     6.93 h      6.01 (0.28)     158
12    Pb2+     9.2 h       8.44 (0.79)     42
13    Pb2+     6.0 h       5.40 (0.59)     48
14    Pb2+     8.65 i      7.9 (1.2)       136
15    Pb2+     4.20 j      3.97 (0.91)     231
16    Pb2+     14.96 g     16.03 (0.94)    37

R02 = 0.913; RMSE = 1.13

a Experimental data are given at 298 K and an ionic strength of 0.1 M.
b Predicted stability constants logKpred are computed using the MLR CMs; standard deviations are given in parentheses.
c Number of individual models of the CM applied to the given ligand, taking the AD into account.
d Reference 8. e Reference 66. f Reference 68. g Reference 9. h Reference 10. i Reference 11. j Reference 67.
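The summary statistics reported under Table 4 can be recomputed from the tabulated experimental and predicted values. The sketch below assumes that R02 is the determination coefficient 1 - SSE/SST and that RMSE divides the squared error sum by the number of ligands; both definitions are assumptions here, since this section does not state them. Because the tabulated predictions are rounded, the result need not reproduce the printed summary values to the last digit.

```python
import math

# (cation, logK exp., logK pred.) in the order of the Table 4 rows.
rows = [
    ("Mn2+", 11.54, 12.01), ("Mn2+", 14.06, 11.76), ("Mn2+", 7.43, 9.72),
    ("Mn2+", 10.61, 9.1),   ("Mn2+", 7.18, 6.02),   ("Mn2+", 10.89, 10.97),
    ("Mn2+", 1.95, 3.34),   ("Mn2+", 2.36, 3.44),   ("Mn2+", 12.35, 11.6),
    ("Pb2+", 4.27, 3.91),   ("Pb2+", 6.93, 6.01),   ("Pb2+", 9.2, 8.44),
    ("Pb2+", 6.0, 5.40),    ("Pb2+", 8.65, 7.9),    ("Pb2+", 4.20, 3.97),
    ("Pb2+", 14.96, 16.03),
]

exp = [r[1] for r in rows]
pred = [r[2] for r in rows]
n = len(rows)

# Sum of squared prediction errors and total sum of squares about the experimental mean.
sse = sum((e - p) ** 2 for e, p in zip(exp, pred))
mean_exp = sum(exp) / n
sst = sum((e - mean_exp) ** 2 for e in exp)

rmse = math.sqrt(sse / n)                               # root-mean-square error
mae = sum(abs(e - p) for e, p in zip(exp, pred)) / n    # mean absolute error
r0_squared = 1.0 - sse / sst                            # assumed definition of R02

print(f"n = {n}, R02 = {r0_squared:.3f}, RMSE = {rmse:.2f}, MAE = {mae:.2f}")
```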


Table 5. Ligands predicted to be selective for individual metal ions among Mn2+, Fe2+, Y3+, La3+, Pb2+, and UO22+: the maximal logKpred value, which indicates the ligand selectivity, is given in a grey cell a.

[Ligand column: structure drawings of ligands 1-6, not reproduced.]

logKpred
no.   Mn2+              Fe2+              Y3+               La3+              Pb2+               UO22+
1     4.8 (0.4; 85) *   0.5 (2.1; 21)     2.5 (0.6; 92)     1.8 (0.2; 151)    2.3 (0.6; 87)      2.6 (1.0; 139)
2     2.8 (0.5; 153)    6.1 (0.3; 108) *  2.0 (1.0; 156)    1.6 (0.6; 159)    2.6 (0.4; 66)      2.5 (0.4; 72)
3     2.5 (0.8; 175)    1.1 (0.6; 27)     6.1 (0.8; 149) *  1.6 (0.7; 151)    3.8 (0.2; 86)      2.6 (1.6; 84)
4     2.5 (0.8; 107)    0.7 (0.2; 24)     5.7 (0.7; 52)     8.7 (1.0; 74) *   4.0 (1.6; 103)     5.6 (7.1; 135)
5     6.7 (0.2; 137)    6.8 (0.3; 78)     7.6 (0.8; 193)    6.9 (0.5; 151)    10.5 (0.7; 149) *  8.0 (0.8; 38)
6     3.8 (0.5; 155)    3.3 (1.0; 86)     4.3 (0.1; 85)     3.4 (0.3; 149)    2.0 (0.6; 138)     8.9 (1.1; 163) *

a Predicted stability constants logKpred are computed using the consensus models; the standard deviation and the number of applied individual models are given in parentheses.
* Maximal logKpred value for the ligand (grey cell).
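Each entry of Table 5 has the form "value (standard deviation; number of models)". A consensus prediction of this kind could be assembled as in the following minimal sketch. It assumes that the consensus value is the arithmetic mean over the individual models whose applicability domain (AD) covers the ligand; the actual AD definition and any model-selection rules belong to the Methods section and are not reproduced here. The input numbers and the function name `consensus` are hypothetical.

```python
import statistics


def consensus(predictions, in_domain):
    """Mean, sample standard deviation, and count over the applicable individual models."""
    used = [p for p, ok in zip(predictions, in_domain) if ok]
    if not used:
        return None  # no individual model is applicable to this ligand
    mean = statistics.fmean(used)
    sd = statistics.stdev(used) if len(used) > 1 else 0.0
    return mean, sd, len(used)


# Hypothetical logK predictions of five individual models for one ligand,
# with flags saying whether the ligand falls inside each model's AD.
preds = [10.2, 10.9, 9.8, 14.0, 10.5]
flags = [True, True, True, False, True]

mean, sd, n_models = consensus(preds, flags)
print(f"{mean:.1f} ({sd:.1f}; {n_models})")  # same layout as a Table 5 cell
```

Applying the same function to the predictions for each metal ion and taking the largest consensus value per ligand gives the selectivity assignment described in the caption.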


FIGURES


[Figure 1: six histograms, one per metal ion, of the number of ligands (vertical axis) versus logKexp (horizontal axis). Panels: Mn2+ (261 ligands), Fe2+ (87 ligands), Y3+ (105 ligands), La3+ (186 ligands), Pb2+ (226 ligands), UO22+ (66 ligands).]

Figure 1. Distribution of experimental values of the stability constant (logK) for the 1:1 (M:L) complexes of organic ligands with Mn2+, Fe2+, Y3+, La3+, Pb2+, and UO22+ in water at 298 K and an ionic strength of 0.1 M.


[Figure 2: six scatter plots of logKpred (vertical axis) versus logKexp (horizontal axis), one per metal ion, annotated with the linear fits and statistics listed below.]

Mn2+:   logKpred = 0.30 + 0.92 logKexp; n = 261, R2 = 0.894, s = 1.09; R02 = 0.892, RMSE = 1.12, MAE = 0.83
Fe2+:   logKpred = -0.25 + 1.03 logKexp; n = 81, R2 = 0.928, s = 1.20; R02 = 0.917, RMSE = 1.19, MAE = 0.86
Y3+:    logKpred = 1.05 + 0.80 logKexp; n = 104, R2 = 0.824, s = 1.20; R02 = 0.823, RMSE = 1.34, MAE = 1.00
La3+:   logKpred = 0.25 + 0.92 logKexp; n = 184, R2 = 0.907, s = 0.92; R02 = 0.904, RMSE = 0.96, MAE = 0.75
Pb2+:   logKpred = 0.17 + 0.97 logKexp; n = 223, R2 = 0.928, s = 1.08; R02 = 0.926, RMSE = 1.09, MAE = 0.86
UO22+:  logKpred = 0.51 + 0.91 logKexp; n = 65, R2 = 0.917, s = 1.02; R02 = 0.916, RMSE = 1.06, MAE = 0.80

Figure 2. Predicted versus experimental values of the stability constant (logK) for the 1:1 (M:L) complexation of organic ligands with Mn2+, Fe2+, Y3+, La3+, Pb2+, and UO22+ in water at 298 K and an ionic strength of 0.1 M. The predicted data are the combined predictions for the five external test sets of the 5-CV procedure.
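The caption of Figure 2 states that the plotted predictions combine the five external test sets of the 5-CV procedure, so every compound is predicted exactly once by a model that did not see it. A minimal sketch of that pooling is given below. It assumes a plain least-squares MLR model and a random fold assignment, uses synthetic descriptors, and is not the consensus-model workflow of the paper; the function name `pooled_cv_predictions` is hypothetical.

```python
import numpy as np


def pooled_cv_predictions(X, y, n_folds=5, seed=0):
    """Predict each compound with an MLR model trained on the other folds; pool the results."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(y))
    pooled = np.empty(len(y))
    for test_idx in np.array_split(idx, n_folds):
        train_idx = np.setdiff1d(idx, test_idx)
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        pooled[test_idx] = A_test @ coef        # external predictions for this fold
    return pooled


# Synthetic data (assumption): 50 "ligands" described by 3 fragment counts.
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(50, 3)).astype(float)
y = 3.0 + X @ np.array([1.5, -0.7, 2.0]) + rng.normal(0.0, 0.3, size=50)

pooled = pooled_cv_predictions(X, y)
rmse = float(np.sqrt(np.mean((y - pooled) ** 2)))
mae = float(np.mean(np.abs(y - pooled)))
slope, intercept = np.polyfit(y, pooled, 1)     # linear fit of predicted versus experimental
print(f"pred = {intercept:.2f} + {slope:.2f} exp; RMSE = {rmse:.2f}, MAE = {mae:.2f}")
```

Statistics computed on the pooled vector reflect external predictions only, which is why they are typically less optimistic than the training-set values of Table 2.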


[Figure 3: two plots of the percentage of compounds (vertical axis, 0-100) versus prediction error (horizontal axis, 0.0-2.0). Left: Mn2+, Fe2+, Y3+; right: La3+, Pb2+, UO22+. Each plot shows line series (a) and (b).]

Figure 3. Percentage of compounds versus absolute prediction error |logKexp - logKpred| for the 1:1 (M:L) complexation of Mn2+, Fe2+, and Y3+ (left graph) and La3+, Pb2+, and UO22+ (right graph) with organic ligands in water at 298 K and an ionic strength of 0.1 M. The lines of series (a) correspond to the CMs; the lines of series (b) correspond to "no model", in which the arithmetic mean of the experimental constants of all ligands for a given cation is used as the predicted value for every ligand.
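The curves of Figure 3 can be read as cumulative error distributions: for each error threshold, the percentage of compounds predicted within that threshold, for the model and for the "no model" baseline. The sketch below shows the calculation. For illustration only, it reuses the nine Mn2+ entries of Table 4 as input; these are not the cross-validation data behind Figure 3, and the function name `percent_within` is hypothetical.

```python
import numpy as np


def percent_within(abs_errors, thresholds):
    """Percentage of compounds whose absolute error does not exceed each threshold."""
    abs_errors = np.asarray(abs_errors, dtype=float)
    return [100.0 * float(np.mean(abs_errors <= t)) for t in thresholds]


# Illustrative input: experimental and predicted logK of the Mn2+ ligands in Table 4.
exp = np.array([11.54, 14.06, 7.43, 10.61, 7.18, 10.89, 1.95, 2.36, 12.35])
pred = np.array([12.01, 11.76, 9.72, 9.1, 6.02, 10.97, 3.34, 3.44, 11.6])

thresholds = [0.5, 1.0, 1.5, 2.0]

# Series (a): errors of the model predictions.
model_curve = percent_within(np.abs(exp - pred), thresholds)

# Series (b): "no model", every ligand is assigned the mean experimental value.
baseline_curve = percent_within(np.abs(exp - exp.mean()), thresholds)

for t, a, b in zip(thresholds, model_curve, baseline_curve):
    print(f"error <= {t:.1f}: model {a:.0f}%, no model {b:.0f}%")
```

A model is useful to the extent that its curve lies above the baseline curve over the error range of interest.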
