Organometallics 2010, 29, 6245–6258 DOI: 10.1021/om100648v


Expansion of the Ligand Knowledge Base for Monodentate P-Donor Ligands (LKB-P)†

Jesús Jover, Natalie Fey,* Jeremy N. Harvey, Guy C. Lloyd-Jones, A. Guy Orpen, and Gareth J. J. Owen-Smith
School of Chemistry, University of Bristol, Cantock’s Close, Bristol BS8 1TS, U.K.

Paul Murray, David R. J. Hose, and Robert Osborne
AstraZeneca, Pharmaceutical Development, Avlon Works, Severn Road, Hallen, Bristol BS10 7ZE, U.K.

Mark Purdie
AstraZeneca, Pharmaceutical Development, Charnwood, Bakewell Road, Loughborough, Leicestershire LE11 5RH, U.K.

Received July 5, 2010

We have expanded the ligand knowledge base for monodentate P-donor ligands (LKB-P, Chem. Eur. J. 2006, 12, 291-302) by 287 ligands and added descriptors derived from computational results on a gold complex [AuClL]. This expansion to 348 ligands captures known ligand space for this class of monodentate two-electron donor ligands well, and we have used principal component analysis (PCA) of the descriptors to derive an improved map of ligand space. Potential applications of this map, including the visualization of ligand similarities/differences and trends in experimental data, as well as the design of ligand test sets for high-throughput screening and the identification of ligands for reaction optimization, are discussed. Descriptors of ligand properties can also be used in regression models for the interpretation and prediction of available response data, and here we explore such models for both experimental and calculated data, highlighting the advantages of large training sets that sample ligand space well.

Introduction

The ligands coordinated to a transition metal center can be used to modulate the structural and electronic features of organometallic and coordination complexes, and so provide a convenient way of fine-tuning properties ranging from spectroscopic features (e.g., luminescence, infrared stretching frequencies) to stability and reactivity. The latter is of particular relevance to the development of homogeneous organometallic catalysts, where a successful proof-of-transformation is often followed by extensive optimization of both reaction conditions and catalyst properties to achieve a suitable compromise between catalyst activity and cost.6

Several recent studies have reported the use of robotic high-throughput screening (HTS) in this area,7 where an initial, broad ligand screen to identify active complexes is often followed by further ligand screening or manual experimentation to improve and refine reactivity, selectivity, and stability. Such ligand-driven optimizations can be guided by an appreciation of ligand properties, either simply in qualitative terms (small/large, electron donating/withdrawing) or on a quantitative scale if suitable data for a range of ligands are available.8-10 By relating experimentally observed

† Development of a Ligand Knowledge Base, Part 6. See refs 1-5 for Parts 1-5.
*Corresponding author. E-mail: [email protected]. Tel: (+44) 0117 331 8260. Fax: (+44) 0117 925 1295.
(1) Fey, N.; Harris, S. E.; Harvey, J. N.; Orpen, A. G. J. Chem. Inf. Model. 2006, 46, 912–929.
(2) Fey, N.; Tsipis, A.; Harris, S. E.; Harvey, J. N.; Orpen, A. G.; Mansson, R. A. Chem. Eur. J. 2006, 12, 291–302.
(3) Mansson, R. A.; Welsh, A. H.; Fey, N.; Orpen, A. G. J. Chem. Inf. Model. 2006, 46, 2591–2600.
(4) Fey, N.; Harvey, J. N.; Lloyd-Jones, G. C.; Murray, P.; Orpen, A. G.; Osborne, R.; Purdie, M. Organometallics 2008, 27, 1372–1383.
(5) Fey, N.; Haddow, M. F.; Harvey, J. N.; McMullin, C. L.; Orpen, A. G. Dalton Trans. 2009, 8183–8196.
(6) Federsel, H.-J. Acc. Chem. Res. 2009, 42, 671–680.

(7) Meeuwissen, J.; Kuil, M.; van der Burg, A. M.; Sandee, A. J.; Reek, J. N. H. Chem. Eur. J. 2009, 15, 10272–10279. Martha, C. T.; Heemskerk, A.; Hoogendoorn, J. C.; Elders, N.; Niessen, W. M. A.; Orru, R. V. A.; Irth, H. Chem. Eur. J. 2009, 15, 7368–7375. Goudriaan, P. E.; van Leeuwen, P.; Birkholz, M. N.; Reek, J. N. H. Eur. J. Inorg. Chem. 2008, 2939–2958. Stambuli, J. P.; Hartwig, J. F. Curr. Opin. Chem. Biol. 2003, 7, 420–426. de Vries, J. G.; Lefort, L. Chem. Eur. J. 2006, 12, 4722–4734. Jaekel, C.; Paciello, R. Chem. Rev. 2006, 106, 2912–2942. Boussie, T. R.; Diamond, G. M.; Goh, C.; Hall, K. A.; LaPointe, A. M.; Leclerc, M. K.; Murphy, V.; Shoemaker, J. A. W.; Turner, H.; Rosen, R. K.; Stevens, J. C.; Alfano, F.; Busico, V.; Cipullo, R.; Talarico, G. Angew. Chem., Int. Ed. 2006, 45, 3278–3283.
(8) Tolman, C. A. Chem. Rev. 1977, 77, 313–348. Diez-Gonzalez, S.; Nolan, S. P. Coord. Chem. Rev. 2007, 251, 874–883. Wolf, S.; Plenio, H. J. Organomet. Chem. 2009, 694, 1487–1492.
(9) Cooney, K. D.; Cundari, T. R.; Hoffman, N. W.; Pittard, K. A.; Temple, M. D.; Zhao, Y. J. Am. Chem. Soc. 2003, 125, 4318.

© 2010 American Chemical Society

Published on Web 11/12/2010

pubs.acs.org/Organometallics


Organometallics, Vol. 29, No. 23, 2010

activity/reactivity/selectivity to measures of ligand properties, further optimization can target ligands expected to have similar or improved properties.11-13 Such a quantitative understanding of ligand effects on the properties of transition metal complexes can be based on experimentally determined parameters, e.g., from spectroscopy or structural analysis8,14 or from calculations of structures, properties, or indeed reactivity (see refs 15 and 16 for recent reviews); computational approaches are particularly attractive where novel and toxic systems might be of interest.17 Computational ligand characterizations also promise potential efficiency gains if systems could be reliably evaluated in silico,18 or at least synthetic efforts could be focused on particularly promising target complexes.9,12,13 The resultant parameters capturing ligand properties, also called ligand descriptors, can aid the interpretation of experimental results. Thus the analysis of correlations or the fitting of simple regression models can highlight the relationship between a given observable and a small number of descriptors.19 Ligand descriptors can also be used to select ligands for experimentation, either by choosing a selection of ligands to achieve a chemically varied test set9,20 or by highlighting subsets of ligands whose complexes have desirable properties, such as high catalytic activity, in common and thus focusing further study on “hot spots” within the ligand space.9,13,20-22 Where suitable experimental training data are available, i.e., those sampling a varied ligand set under the same reaction conditions and capturing a good range of response data, such descriptors can be used to derive regression models suitable for making predictions about other ligands;13 these models often require more extensive descriptor databases, and detailed chemical interpretation becomes more difficult. 
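The idea of fitting a simple regression model that relates an observable to a small number of ligand descriptors can be sketched as follows. This is a minimal illustration using ordinary least squares, not the modeling procedure used in this work; the descriptor values and response data are made up, and the helper names `fit_linear_model` and `predict` are hypothetical.

```python
import numpy as np

def fit_linear_model(X, y):
    """Ordinary least squares fit of y ~ b0 + X @ b.

    Returns the coefficient vector (intercept first) and R^2 on the training data.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])       # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)    # least-squares solution
    residuals = y - A @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot

def predict(coef, X):
    """Apply a fitted model to descriptor rows of ligands outside the training set."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]

# Hypothetical training set: 6 ligands, 2 descriptors each (illustrative numbers only).
X_train = np.array([[0.1, 1.0], [0.4, 2.0], [0.5, 0.5],
                    [0.9, 1.5], [1.2, 3.0], [1.5, 2.5]])
# Made-up response, exactly linear in the descriptors by construction.
y_train = 2.0 + 3.0 * X_train[:, 0] - 1.0 * X_train[:, 1]

coef, r2 = fit_linear_model(X_train, y_train)
y_new = predict(coef, [[1.0, 1.0]])
```

With few descriptors the fitted coefficients remain easy to read chemically; with many correlated descriptors a plain least-squares fit becomes unstable, which is one reason larger descriptor sets call for other regression approaches.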
In general, the choice of descriptors is driven by the intended application, where fewer descriptors are useful for the interpretation of ligand effects and the

(10) Cavallo, L.; Correa, A.; Costabile, C.; Jacobsen, H. J. Organomet. Chem. 2005, 690, 5407–5413. Burello, E.; Rothenberg, G. Int. J. Mol. Sci. 2006, 7, 375–404. Gusev, D. G. Organometallics 2009, 28, 763–770. Gusev, D. G. Organometallics 2009, 28, 6458–6461. Tonner, R.; Frenking, G. Organometallics 2009, 28, 3901–3905.
(11) an der Heiden, M. R.; Plenio, H.; Immel, S.; Burello, E.; Rothenberg, G.; Hoefsloot, H. C. J. Chem. Eur. J. 2008, 14, 2857–2866.
(12) Sparta, M.; Børve, K. J.; Jensen, V. R. J. Am. Chem. Soc. 2007, 129, 8487–8499.
(13) Occhipinti, G.; Bjørsvik, H. R.; Jensen, V. R. J. Am. Chem. Soc. 2006, 128, 6952–6964.
(14) Freixa, Z.; van Leeuwen, P. W. N. M. Dalton Trans. 2003, 1890–1901. Birkholz, M. N.; Freixa, Z.; van Leeuwen, P. Chem. Soc. Rev. 2009, 38, 1099–1118.
(15) Fey, N.; Orpen, A. G.; Harvey, J. N. Coord. Chem. Rev. 2009, 253, 704–722.
(16) Fey, N. Dalton Trans. 2010, 39, 296–310.
(17) Perrin, L.; Clot, E.; Eisenstein, O.; Loch, J.; Crabtree, R. H. Inorg. Chem. 2001, 40, 5806.
(18) Houk, K. N.; Cheong, P. H.-Y. Nature 2008, 455, 309–313.
(19) Fernandez, A. L.; Wilson, M. R.; Prock, A.; Giering, W. P. Organometallics 2001, 20, 3429–3435. Wilson, M. R.; Prock, A.; Giering, W. P.; Fernandez, A. L.; Haar, C. M.; Nolan, S. P.; Foxman, B. M. Organometallics 2002, 21, 2758–2763. Poë, A. J. Dalton Trans. 2009, 1999–2003. Babij, C.; Poë, A. J. J. Phys. Org. Chem. 2004, 17, 162. Bunten, K. A.; Poë, A. J. New J. Chem. 2006, 30, 1638–1649.
(20) Fagan, P. J.; Hauptman, E.; Shapiro, R.; Casalnuovo, A. J. Am. Chem. Soc. 2000, 122, 5043–5051.
(21) Burello, E.; Rothenberg, G. Adv. Synth. Catal. 2003, 345, 1334–1340. Hageman, J. A.; Westerhuis, J. A.; Fruehauf, H.-W.; Rothenberg, G. Adv. Synth. Catal. 2006, 348, 361–369. Drummond, M. L.; Sumpter, B. G. Inorg. Chem. 2007, 46, 8613–8624.
(22) Mathew, J.; Tinto, T.; Suresh, C. H. Inorg. Chem. 2007, 46, 10800–10809.

Scheme 1. Complexes Used in LKB-P

generation of simple maps of ligand space, and more extensive descriptor databases are perhaps more likely to provide a good and transferable fit to a broad range of response variables. A balance often needs to be struck between retaining the ability to interpret analyses in a chemical context, usually easier with fewer descriptors, and capturing different chemical environments to achieve transferability, requiring more extensive databases. We have recently reported the development of databases of calculated ligand descriptors that capture ligand properties in a variety of coordination chemistry environments, useful for transferable maps and models, but retain a strong link with “classical” structural chemistry, thus facilitating the interpretation of ligand effects and models in this context. Such ligand knowledge bases (LKBs) have been developed for monodentate P-2,3 and C-donor5 ligands, as well as bidentate ligands with P,P- and P,N-donor atoms.4 We have also demonstrated the application of such knowledge bases for the generation of maps of ligand space, as well as in statistical models useful for both the interpretation and prediction of ligand effects on a broad range of observed property data. In addition, we have started to explore the combined analysis of different ligand classes in an LKB context, highlighting the challenge of treating different ligand types in a single knowledge base, where differences in bonding can dominate the statistical analysis and thus give rise to nonrobust behavior.5 Here we report the expansion of our prototype LKB for monodentate phosphorus(III) donor ligands (LKB-P)2 to include a further 287 ligands, giving a total of 348 ligands and considerably improving the coverage and sampling of ligand space. Many of these ligands have not previously been described by steric and electronic parameters, and where subsets of these ligands are considered, parameters have often been restricted to a single coordination environment. 
In addition, we have included a new complex for descriptor calculations, [AuClL], which we have described previously for C-donor ligands,5 with a view to facilitating a future combination of these two knowledge bases to glean a more complete view of ligand space relevant to homogeneous catalysis. This expanded LKB-P database has been summarized using principal component analysis to derive a much more comprehensive and robust map of ligand space than previously reported,2 showing multiple ligands for the first time and allowing us to attempt interpretation of individual principal components in terms of familiar steric and electronic effects. We also demonstrate here how the projection of


Table 1. Descriptors Used in LKB-P

descriptor | derivation (unit)

Free Ligand
EHOMO | energy of highest occupied molecular orbital (hartree)
ELUMO | energy of lowest unoccupied molecular orbital (hartree)
He8_steric | interaction energy between singlet L in ground-state conformation and ring of 8 helium atoms; Ester = Etot(system) - [Etot(He8) + Etot(L)] (kcal mol-1)

Protonated Ligand ([HL]+)
PA | proton affinity (kcal mol-1)

Borane Adduct (H3B·L)
Q(B fragm.) | NBO charge on BH3 fragment
BE(B) | bond energy for dissociation of P-ligand from BH3 fragment (kcal mol-1)a
P-B | P-B distance (Å)
ΔP-A(B) | change in average P-A bond length compared to free ligand (Å)
ΔA-P-A(B) | change in average A-P-A angle compared to free ligand (deg)

Gold Complexes ([AuClL])
Q(Au fragm) | NBO charge on AuCl fragment
BE(Au) | bond energy for dissociation of L from [AuCl] fragment (kcal mol-1)a
Au-Cl | r(Au-Cl) (Å)
P-Au | r(Au-P) (Å)
ΔP-A(Au) | change in average P-A bond length in complex compared to free ligand (Å)
ΔA-P-A(Au) | change in average A-P-A angle compared to free ligand (deg)

Palladium Complexes ([PdCl3L]-)
Q(Pd fragm) | NBO charge on [PdCl3]- fragment
BE(Pd) | bond energy for dissociation of L from [PdCl3]- fragment (kcal mol-1)a
Pd-Cl trans | r(Pd-Cl), trans to ligand (Å)
P-Pd | r(Pd-P) (Å)
ΔP-A(Pd) | change in average P-A bond length compared to free ligand (Å)
ΔA-P-A(Pd) | change in average A-P-A angle compared to free ligand (deg)

Platinum Complexes ([Pt(PH3)3L])
Q(Pt fragm) | NBO charge on [(PH3)3Pt] fragment
BE(Pt) | bond energy for dissociation of P-ligand from [Pt(PH3)3] fragment (kcal mol-1)a
P-Pt | P-Pt distance (Å)
ΔP-A(Pt) | change in average P-A bond length compared to free ligand (Å)
ΔA-P-A(Pt) | change in average A-P-A angle compared to free ligand (deg)
∠(H3P)Pt(PH3) | average (H3P)Pt(PH3) angle (deg)

Cumulative
S4′ calc | (Σ∠ZPA - Σ∠APA), where Z = BH3, [PdCl3]-, [Pt(PH3)3], [AuCl] (deg)

a BE = [Etot(fragment) + Etot(L)] - Etot(complex)

experimental screening data onto such a map of ligand space can be used to identify trends in ligand properties and reactivity, as well as to target specific areas of ligand space for further experimentation. In addition, the statistically significant size of this database has allowed us to explore a range of statistical regression approaches to derive models for the interpretation and prediction of ligand effects, and we discuss potential applications of such models for the discovery and design of efficient ligands in organometallic catalysis. This brings us closer to being able to rationally design and evaluate novel catalysts in silico, which should ultimately allow for targeted synthesis and efficient reaction optimization.
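Targeting a specific area of ligand space for further experimentation can be illustrated by a nearest-neighbor lookup around a ligand that performed well in a screen. The sketch below assumes each ligand is represented by its scores on the first few principal components; the ligand labels and coordinates are hypothetical, and Euclidean distance is an assumed similarity measure rather than one prescribed in this work.

```python
import math

def nearest_ligands(scores, query, k=3):
    """Return the k ligands closest to `query` in principal component space.

    scores: dict mapping a ligand label to a tuple of PC scores.
    query:  label of the reference ligand (e.g., a hit from a screen).
    """
    q = scores[query]
    distances = [
        (math.dist(q, s), ligand)          # Euclidean distance in PC space
        for ligand, s in scores.items()
        if ligand != query
    ]
    return [ligand for _, ligand in sorted(distances)[:k]]

# Hypothetical PC1/PC2 scores for four ligands (illustrative numbers only).
example_scores = {
    "A": (0.0, 0.0),
    "B": (1.0, 0.0),
    "C": (0.0, 2.0),
    "D": (5.0, 5.0),
}
neighbors = nearest_ligands(example_scores, "A", k=2)
```

Ligands returned by such a lookup would be candidates for follow-up screening, on the assumption that proximity in the map reflects similar properties.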

Design

Descriptors. The computational approach used to optimize all ligands and their complexes is summarized in the Computational Details below, but it is perhaps worth commenting on our choice of density functional (BP86) in this context. As outlined in our initial work,2 the analysis of large data sets makes systematic errors as experienced by most functionals for the reproduction of experimental binding and activation energies less of a concern, as long as the correct trends for different ligands are captured. In this sense, any functional should be suitable for the calculation of knowledge base descriptors, provided no significant errors/failures occur for individual structures. Recently, mechanistic studies in organometallic chemistry have achieved improved agreement with experimental data when (solvated) free energies calculated with dispersion-corrected density functionals were considered.23 This suggests that the BP86 functional used here, which does not account fully for dispersion, is unlikely to compute the various descriptors with quantitative accuracy. However, it should still capture trends and thereby provide reliable data for LKB purposes.

(23) Minenkov, Y.; Occhipinti, G.; Jensen, V. R. J. Phys. Chem. A 2009, 113, 11833–11844. Sieffert, N.; Bühl, M. Inorg. Chem. 2009, 48, 4622–4624. McMullin, C. L.; Jover, J.; Harvey, J. N.; Fey, N. Dalton Trans. 2010, Advance Article, DOI: 10.1039/C0DT00778A.

Most of the descriptors used in this expanded knowledge base have been described and discussed previously.2 However, we have reviewed the LP s-character descriptor and found it to be reasonably highly correlated with other descriptors of σ-donor character for the expanded ligand set considered here (highest R = -0.82 with P-B). Data extraction is more complicated than for other descriptors, usually requiring manual inspection of the NBO section of calculations, and we have therefore decided to leave this parameter out of this and future versions of LKB-P.

Scheme 2

In the development of a ligand knowledge base for C-donor ligands L (LKB-C) we explored the use of a linear gold(I) complex, [AuClL], to capture metal-ligand bonding free of steric hindrance.5 With a view to combining LKBs in the future, we have thus extended the range of complexes calculated for monodentate P-donor ligands2 to include parameters derived from this fragment; the S4′ descriptor24 has also been modified to include this data. Scheme 1 shows the fragments used for descriptor calculations.

(24) Dunne, B. J.; Morris, R. B.; Orpen, A. G. J. Chem. Soc., Dalton Trans. 1991, 653–661.

This expanded version of LKB-P uses 28 descriptors derived from seven calculations per ligand and includes:
• frontier molecular orbital energies,
• ligand proton affinities,
• adduct binding energies for representative complexes with different coordination environments,43
• structural parameters describing geometry changes of ligands upon complexation, the metal-ligand bond lengths, as well as the geometry of the metal fragments,
• metal fragment charges,
• measures of steric bulk, the He8_steric descriptor and the S4′ descriptor derived from all complexes considered here.
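The energy- and angle-based descriptors listed above follow directly from the definitions given in Table 1, and a small sketch may make the bookkeeping concrete. The functions below assume total energies are supplied in hartree and converted to kcal mol-1; the function names and the numerical inputs are hypothetical, and how the S4′ values from the four fragments are combined into a single descriptor is not specified here, so only the per-complex quantity is shown.

```python
HARTREE_TO_KCAL_MOL = 627.5095  # 1 hartree expressed in kcal/mol

def bond_energy(e_fragment, e_ligand, e_complex):
    """BE = [Etot(fragment) + Etot(L)] - Etot(complex); hartree in, kcal/mol out."""
    return ((e_fragment + e_ligand) - e_complex) * HARTREE_TO_KCAL_MOL

def he8_steric(e_system, e_he8, e_ligand):
    """Ester = Etot(system) - [Etot(He8) + Etot(L)]; hartree in, kcal/mol out."""
    return (e_system - (e_he8 + e_ligand)) * HARTREE_TO_KCAL_MOL

def s4_prime(zpa_angles, apa_angles):
    """Per-complex S4' = sum of Z-P-A angles minus sum of A-P-A angles (deg).

    Assumes a PA3-type ligand bound to fragment Z, i.e., three angles of each kind.
    """
    if len(zpa_angles) != 3 or len(apa_angles) != 3:
        raise ValueError("expected three Z-P-A and three A-P-A angles")
    return sum(zpa_angles) - sum(apa_angles)

# Made-up inputs, for illustration only.
be = bond_energy(e_fragment=-1.0, e_ligand=-2.0, e_complex=-3.05)
ester = he8_steric(e_system=-10.02, e_he8=-4.0, e_ligand=-6.05)
s4 = s4_prime([115.0, 115.0, 115.0], [103.0, 103.0, 103.0])
```

With these sign conventions a bound complex gives a positive BE, and a repulsive interaction with the He8 ring gives a positive He8_steric value.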

Scheme 3. P-Donor in Alkyl Ring

The full range of descriptors is summarized in Table 1; the chemical context of both new gold descriptors and other descriptors described previously will be discussed below for this expanded ligand set.

Ligands. Our main aim in this work was to substantially extend and improve the sampling of phosphorus(III) donor ligand space.2 To this end, we have included further symmetrically substituted ligands of the general form PA3, including alkyl- and arylphosphines (ligands 1-22, Table S1, and 63-86, Table S4, respectively), phosphine halides (189-192, Table S8), alkyl- and arylphosphites (218-228, 229-239, Tables S10a, b), and aminophosphines (291-299, Table S13). In addition, we have investigated the effect of systematic variation of substituents on ligand properties by including a number of ligands of the general types PAB2 and PABC, where A, B, C = R, Ar, Hal, OR, OAr, NR2, NAr2 (ligands 23-55, 87-96, 97-123, 193-217, 240-262, 300-326, 339-348). Such mixed substitution patterns can also arise if the P-donor atom is part of a ring system (124-158, Table S6, 263-290, Table S12, 327-338, Table S14, Schemes 3-5). Synthetically useful ligand subsets, such as Buchwald’s family of biaryl ligands (159-188, Table S7, Scheme 6), and a range of more unusual ligands (e.g., 56-62, 128-134, 146-158, Schemes 2 and 3) have also been included. Tables 2 and 3 summarize alkyl and aryl substituents considered, and Schemes 2-6 capture additional ligand types included in this work. Tables S1-S15 in the Supporting Information give detailed ligand lists and show the ligand numbering used throughout this work. Ligands described previously2 have been included in this set for completeness, and the numbering used in this earlier paper is also indicated in the ligand tables in the Supporting Information.


Scheme 4. P-Donor in Ring (OR)

Scheme 5. P-Donor in Ring (NR2)

While sampling of ligand space is by no means even and indeed remains sparse in some areas (see map below, Figure 3), the addition of 287 ligands has substantially improved our coverage of common P-donor ligand motifs, giving rise to improved statistical robustness of maps and models as discussed below.

Results and Analysis

Descriptors. Exploring the relationship between descriptors can be useful to establish their chemical meaning, and here bivariate correlations and scatter plots have been used to probe the new descriptors derived from the gold complex, [AuClL]. As shown in Table 4, the Au-P bond dissociation energy, BE(Au), is quite highly correlated with proton affinity (PA), energy of the HOMO (EHOMO), and the B-P bond dissociation energy (BE(B)), suggesting that this parameter captures σ-bonding properties. High correlation can also be observed with the trans Pd-Cl bond length (Pd-Cl trans), and a high inverse correlation occurs with the NBO charge of the Pt(PH3)3 fragment (Q(Pt fragm)), in line with capturing ligand net donation, likely dominated by σ-effects in these descriptors. The Au-P bond length correlates highly with all P-“M” bond lengths (“M” = B, Pd, Pt), as well as with the proton affinity. Correlations of AuClL parameters with ELUMO, a measure related to the potentially π-accepting character of the ligand, remain only moderate (highest R = 0.58 with BE(Au)). However, reasonably high correlations of AuClL parameters can be observed with the steric descriptors S4′ and He8_steric (see Supporting Information, Table S16), suggesting that even this complex is perhaps not completely free of a steric component, although there is likely also a relationship between ligand size and electron donor properties, which may have been captured here. As might be expected, the Au-Cl distance correlates highly with descriptors linked to net (σ) donation, i.e., EHOMO, PA, P-Pd, Pd-Cl trans, and P-Pt. The structural change on coordination of the ligand to the gold center (ΔA-P-A(Au)) is most highly correlated with similar data for the borane fragment (ΔA-P-A(B)), but correlations with the other two metal fragments are also reasonably high. Similar relationships can be observed for fragment charges and bond length changes, although in the latter case the simple halide-substituted ligands PHalmXn, such as 190-194, 200-205, 212, 213, show a pronounced structural response on coordination to the AuCl fragment, reducing the correlation coefficient with similar data derived from other fragments (R = 0.16 for correlation with ΔP-A(Pt), Figure 1). Since

Scheme 6. Biaryl-Substituted Ligands

most correlation coefficients are smaller than 0.7, with exceptions discussed above, this analysis also suggests that the new gold descriptors contribute additional information to the database rather than simply capturing the same effects as other descriptors, making this fragment a useful source of new descriptors. Although the other descriptors shown in Table 1 have been discussed in a previous publication,2 the He8_steric descriptor merits further consideration. Designed to mimic the steric interactions in an octahedral complex between a given ligand and other, cis-coordinated groups, this descriptor is sensitive to conformational change. For most ligands, optimization from the free ligand conformation, freezing only the P and He positions, and thus allowing relaxation in response to the bulk of the He8 ring did not result in significant conformational change. As discussed previously,2,15 this approach can overestimate the size of ligands able to respond to their coordination environment by adopting a different conformation from that favored by the uncoordinated ligand, which is used to generate the calculation input. In addition, for a subset of bulky ligands, including some with a biaryl substituent, optimizations from the free ligand geometry failed and a different starting geometry, closer to that adopted on transition metal coordination, had to be used to obtain the He8_steric descriptor. Comparison for related systems where He8_steric can be calculated for several conformers suggests that such conformational problems might increase the adduct energies by around 2-3 kcal mol-1, because conformers minimizing interactions with the He8 ring often increase interactions between substituents on the


P-donor, so the net energy change remains relatively small. This would also complicate changing the He8_steric parameter to use a conformer observed on transition metal complexation of a given ligand, as conformational preferences can be sensitive to both electronic and steric properties of different coordination environments, but large-scale conformational sampling is difficult to achieve reliably. Fast molecular mechanics approaches do not necessarily capture electronic effects well, and experimental (usually crystallographic) data are simply not available for all ligands considered. In addition, even quite subtle changes in the coordination environment can affect conformational preferences,25 and the large set of ligands considered here prevents a detailed evaluation of all possible conformers at the DFT level. (Preoptimization with semiempirical approaches might be considered a suitable compromise, but this is considerably more expensive than molecular mechanics and, depending on the parametrization chosen, the conformational energy surface can still be significantly different from that calculated with DFT.) We are currently exploring the introduction of a “dynamic” steric measure, designed to capture the range of conformers accessible to LKBs and will report on our results in due course. Conformer preferences and He8_steric energies might be affected by the incomplete capture of dispersion effects for the functional used. However, conformer energy differences are unlikely to be large in such cases, and dispersive contributions will correlate with increases in ligand size, so again we expect the correct trends to have been captured by the present version of LKB-P, even where individual energies would be expected to change when a dispersion-corrected functional is used. It is also instructive to explore the relationship between He8_steric and the S4′ parameter extracted from the complexes considered.
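The bivariate comparisons used in this section, whether between a gold descriptor and an existing one or between the two steric measures, rest on the linear (Pearson) correlation coefficient R. A minimal sketch of that calculation is given here; the function name `pearson_r` and the example values are hypothetical and are not taken from the knowledge base.

```python
import math

def pearson_r(x, y):
    """Linear (Pearson) correlation coefficient between two descriptor columns."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("a constant column has no defined correlation")
    return sxy / math.sqrt(sxx * syy)

# Made-up values for two descriptors across four ligands (illustration only).
descriptor_a = [1.0, 2.0, 3.0, 4.0]
descriptor_b = [1.0, 3.0, 2.0, 4.0]
r = pearson_r(descriptor_a, descriptor_b)
```

A coefficient near +1 or -1 indicates that two descriptors carry largely the same information, whereas values well below 0.7 in magnitude suggest that each contributes something distinct; scatter plots remain useful because R alone can hide subsets of ligands that deviate from the overall trend.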
S4′ is widely used as a proxy for steric effects,9,22 although electronic effects also contribute, and this descriptor is potentially conformationally averaged, because it is calculated from the angular changes in different complexes (Σ∠ZPA - Σ∠APA, see Table 1). However, we have not considered hindered octahedral complexes for descriptor calculations here, so conformational variability of S4′ is more limited than it would be with a more extensive set of fragments. Figure 2 shows a plot of the two steric parameters, and it is indeed interesting to note that there is considerable scatter, perhaps more so for larger ligands, i.e., those with He8_steric >20 kcal mol-1. Nevertheless, both descriptors capture the same trends, and we consider this level of conformational variability acceptable for the present version of LKB-P; indeed it is in line with our observations for other ligand types.4

(25) Barder, T. E.; Biscoe, M. R.; Buchwald, S. L. Organometallics 2007, 26, 2183–2192. Barder, T. E.; Buchwald, S. L. J. Am. Chem. Soc. 2007, 129, 12003–12010.

Maps of Ligand Space. Although the simple, two-dimensional relationships between descriptors as described above can be useful to determine the main properties contributing to observed data, most descriptors capture net effects and the interpretation of correlations and scatter plots often remains vague because resolution into, for example, σ- and π-electronic effects is difficult.15 Ligand knowledge bases can also be used to identify and visualize ligand similarities in multivariate space, based on the assumption that points close in this ligand space arise from different ligands that have similar properties. Visualization of such multivariate data


Table 2. Alkyl Substituents, Where A, B, C Denote Other Groups

class | R = | no.
PR3 | H, Me, CF3, CCl3, Et, C2F5, CH2CF3, C(O)Me, nPr, iPr, (CH2)2CN, (CH2)3OH, nBu, iBu, sBu, tBu, neopentyl (Np), cyclopentyl (Cyp), cyclohexyl (Cy), benzyl (Bn), I,a IIa | 1-22
PR2R′ | H, Me, CF3, COOH, Et, C(O)Me, nPr, iPr, sBu, tBu, Np, Cy, 1-adamantyl (1-Ad) | 23-55
PRAr2, PR2Ar | Me, CF3, Et, C(O)Me, iPr, sBu, tBu, Cy, Bn | 97-123
P-donor in ring, addt R | Me, iPr, sBu, tBu, Cy | 124, 130, 131, 134, 135, 137, 142, 144, 146-149, 152-157, 272, 278
biarylPR2 | Me, iPr, sBu, tBu, Cyp, Cy | 159-164, 166-169, 171-175, 177-185, 187, 188
PHalmRn | H, Me, Et, tBu, Cy | 193, 194, 196, 198, 200, 201, 204, 205, 206, 208, 210, 212, 213, 216-217
P(OR)3 | Me, CF3, Et, CH2CF3, iPr, nBu, iBu, tBu, tC4F9, Np, Cy | 218-228
P(OR)mAn | H, Me, iPr | 240-244, 250-255
P(OA)mRn | Me, CF3, tBu, Cy | 240-244, 247, 250-255, 258, 259, 262
P(NR2)3 | H, Me, iPr, pyrroline (pyr, IIIa), pyrrolidine (pyrd, IVa), morpholine (morph, Va), piperidyl (pip, VIa) | 291-297
P(NR2)mAn | H, Me, iPr, pyr, pip | 300-309, 313-323
P(NA2)mRn | Me, C(O)Me, tBu, Cy | 300, 302-304, 310, 314, 316-318, 324, 325
PRBC | Me, tBu, Cy | 339-345

a See Chart 1 for relevant structures.

Chart 1. Line Structures for Tables 2 and 3

is difficult for 28 descriptors (Table 1), and so we have used these descriptors in principal component analysis (PCA) instead. PCA is a statistical approach designed to minimize the dimensionality of data sets while maximizing the information content of fewer, orthogonal, i.e., uncorrelated, variables (principal components, PCs),26 and here we find that the first two principal components capture 65.1% of the variation in the data, with PC3 and PC4 increasing this to 74.8% and 80.5%, respectively. Figure 3 shows a score plot for PCs 1 and 2, while Figure 4 and Table 5 summarize the corresponding data. Score plots identifying all ligands by their number can be found in the Supporting Information, Figures S1-S5. A grid plot of PCs 1-4 is also included there, Figure S6.

(26) Livingstone, D. Data Analysis for Chemists; Oxford University Press: Oxford, 1995.

Principal component analysis is known to be nonrobust to changes in the ligand set, especially for outliers and novel ligand classes,3 but we have observed a gradual “convergence” of the resulting map with increasing numbers of ligands such that the descriptors loading highly onto the first few PCs no longer change substantially in response to the addition of ligands and the overall shape of ligand space appears constant; the magnitude and sign of descriptor coefficients change more noticeably. On the basis of this observation, the current version of LKB-P appears to capture the known P-donor ligand space reasonably well, and the predictions below should be reliable for ligands similar to the main classes considered. However, as mentioned in the discussion of the ligand set, sampling is by no means complete or even. To some extent this is a reflection of the synthetic utility of different ligand classes, where LKB-P is biased toward stable, synthetically useful, and versatile ligands, with more limited exploration of other ligands.

Descriptor loadings on PC1 and PC2 are high (Figure 4 and Table 5), suggesting that all descriptors considered here contribute to the map. The first two principal components capture a reasonably high proportion of the information content and show chemically intuitive clustering according to the ligand substitution pattern (see color coding in Figure 3). Alkyl-substituted ligands of the general type PA3 and PA2B are predominantly found in a band around PC1 = 0 to +7 and PC2 = -7 to +5, while aryl-substituted ligand scores adopt a more spherical shape around PC1 = -3 to +3, PC2 = -6 to +2. Phosphites and phosphinites form another band for PC1 = -10 to +1 and PC2 = 0 to +4, and halide-substituted ligands stretch across PC1 = -13 to -3 and PC2 = -10 to +4. Aminophosphines appear to be scattered through ligand space, but consideration of PC3 vs PC2 (Figure 5) highlights that they actually occur outside the main ligand grouping, at PC1 = -5 to +5, PC2 = -5 to +5, and PC3 = -6 to 0. In this case, inspection of PC3, which increases the information content by a further 9.8 percentage points, is useful to distinguish this class of ligands from other sets.
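For readers less familiar with PCA, the way scores, loadings, and the percentage of variation captured are obtained from a descriptor table can be sketched as follows. This is a generic illustration using a singular value decomposition and assumes the descriptors are autoscaled (mean-centered and divided by their standard deviation) before analysis; the scaling used for LKB-P is not restated here, and the small data matrix is made up.

```python
import numpy as np

def pca_scores(data, n_components=2):
    """PCA of a (ligands x descriptors) table via SVD of the autoscaled data.

    Returns (scores, loadings, explained), where `explained` holds the fraction
    of total variance captured by every principal component.
    """
    X = np.asarray(data, dtype=float)
    # Autoscale each descriptor; a constant column would divide by zero here.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = U[:, :n_components] * s[:n_components]   # ligand coordinates on the map
    loadings = Vt[:n_components].T                    # descriptor contributions per PC
    return scores, loadings, explained

# Hypothetical table: 5 ligands x 3 descriptors (illustrative numbers only).
data = [[1.0, 2.0, 0.5],
        [2.0, 4.1, 0.4],
        [3.0, 6.2, 0.9],
        [4.0, 7.9, 0.1],
        [5.0, 10.0, 0.6]]
scores, loadings, explained = pca_scores(data)
cumulative = np.cumsum(explained)   # analogous to the cumulative percentages quoted above
```

Plotting the first two columns of `scores` against each other gives a score plot of the kind shown in Figure 3, and the entries of `loadings` indicate which descriptors dominate each component.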
Consideration of the clustering of ligand classes (Figures 3 and 5) and descriptor loadings (Figure 4, Tables 5 and S16) on the first few PCs allows us to attempt interpretation of the map in terms of familiar steric and electronic effects: PC1 is dominated by steric (He8_steric, S40) and σ-electronic descriptors (EHOMO, PA), as well as M-L bonding terms such as M-P distances and bond dissociation energies, again terms with a large σ-electronic component. The qualitative spread of ligands highlighted in Figure 3 supports this interpretation, with electropositive alkylphosphines at large, positive values of PC1 (PMe3, 2, PtBu3, 16), phosphites (P(OMe)3, 218, P(OPh)3, 229) at intermediate values of PC1, and electron-withdrawing, halide-substituted ligands (PHal3, 189-192) at large, negative

Organometallics, Vol. 29, No. 23, 2010

Table 3. Aryl Substituents, Where A, B, C Denote Other Groups

class | Ar = | no.
PAr3 | Ph, R-substituted C6Hn (various ring positions, R = F, Cl, Me, SO3H, OMe, CF3, NMe2, tBu), C6F5, ferrocene (FeCp2), naphthyl | 63-86
PArmRn | Ph, C6F5, R-substituted C6Hn (R = Me, iPr, Ph) | 97-123
P-donor in ring, addt Ar | Ph | 126, 128, 129, 132, 133, 136, 138-141, 143, 145, 158, 265, 269, 275, 280, 287
biarylPAr2 | Ph, o-tolyl | 165, 170, 176, 186
PHalmArn | Ph | 199, 211
P(OAr)3 | Ph, C6F5, R-substituted C6Hn (R = Cl, Me, OMe, CN, iPr, tBu, Ph) | 229-239
P(OAr)mAn | Ph, 2,4-tBu2-C6H3 | 262
P(OA)mArn | Ph, C6F5 | 245, 246, 248, 249, 256, 257, 260, 261
P(NAr2)3 | Ph, borazine (VIIa) | 298, 299
P(NAr2)mAn | Ph, 2-pyridine, VIIIa | 310, 311, 312, 324, 325, 326
P(NA2)mArn | Ph | 308, 311, 312, 313, 320, 322, 326
PArBC | Ph, C6F5 | 342-348
P(OAr)BC | Ph, R-substituted C6Hn | 339, 344-348

a See Chart 1 for relevant structures.

Table 4. High Linear Bivariate Correlation Coefficients (R) between Au Fragment and Other Descriptors

Au descriptor | highest correlations with (R =)
BE(Au) | EHOMO (0.74), PA (0.83), BE(B) (0.84), Pd-Cl trans (0.89), Q(Pt fragm.) (-0.87)
Au-P | PA (0.83), P-B (0.89), P-Pd (0.88), P-Pt (0.91), S40 (-0.77), He8_steric (0.74)
Au-Cl | EHOMO (0.94), PA (0.84), P-Pd (0.80), Pd-Cl trans (0.86), P-Pt (0.87)
ΔP-A (Au) | ΔP-A (B) (0.76), ΔP-A (Pd) (0.60)
ΔA-P-A (Au) | ΔA-P-A (B) (0.84), ΔA-P-A (Pd) (0.74), ΔA-P-A (Pt) (0.66)
Q(Au fragm) | BE(B) (-0.71), Q(B fragm) (0.71), Pd-Cl trans (-0.69), Q(Pd fragm) (0.69), Q(Pt fragm) (0.74)

Figure 1. Scatter plot of ΔP-A (Pd) vs ΔP-A (Au), highlighting halide-substituted outlier ligands.

Figure 2. Scatter plot of steric parameters.

values. PC2 also captures steric effects, as well as ELUMO and the bond length changes in the ligand on coordination. Reasonably high loadings for most of the BH3 descriptors cloud the picture somewhat, but both steric and π-electronic effects appear to contribute to PC2. Larger ligands (PtBu3, PBr3, PI3) tend to appear at negative values for PC2, whereas smaller ligands (PMe3, PF3) are located toward more positive values. PC3 captures mostly the angular response of ligands to coordination, and this further distinguishes different ligand classes by their flexibility, as discussed for aminophosphines. The map can also be used to identify changes in ligand properties due to changing substitution patterns. Figure 3 shows PMe3 (2) and P(CF3)3 (3), as well as the intermediate, mixed ligands PMe2(CF3) (25) and PMe(CF3)2 (32). In line with our tentative interpretations of descriptor loadings, the gradual perfluorination of this system moves ligands from the top right quadrant toward the lower left, corresponding to increases in both size and electron-withdrawing character. Similar changes can be observed for PPh3 (63) and P(C6F5)3 (70). In addition, unfamiliar ligands, such as 60-62 (Scheme 2), Landis’ phospholanes (e.g., 131 and 132),27 Pringle’s phosphacyclohexanones (e.g., 142),28 and cage phosphines (e.g., 158)29 (all Scheme 3) can also be located on the map, and their likely synthetic utility could be predicted by comparison with other ligands nearby in ligand space.
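The comparison of an unfamiliar ligand with its neighbors in ligand space can be sketched as a nearest-neighbor search on PC scores. In the example below the ligand labels and coordinates are invented for illustration, the helper `nearest_ligands` is hypothetical, and the use of plain Euclidean distance on the scores is an assumption of the sketch.

```python
import numpy as np

def nearest_ligands(scores, labels, query, k=3):
    """Return the k ligands closest to `query` in PC-score space (Euclidean)."""
    scores = np.asarray(scores, dtype=float)
    dist = np.linalg.norm(scores - np.asarray(query, dtype=float), axis=1)
    order = np.argsort(dist)[:k]
    return [(labels[i], float(dist[i])) for i in order]

# Hypothetical PC1/PC2 scores; labels and coordinates are made up for illustration.
labels = ["L_a", "L_b", "L_c", "L_d", "L_e"]
scores = [[5.0, 3.0], [4.5, -6.0], [-1.0, -2.0], [-8.0, 2.0], [4.0, 2.5]]

neighbours = nearest_ligands(scores, labels, query=[4.8, 2.0], k=2)
print(neighbours)
```

Including further PCs in `scores` may change which ligands count as neighbors; PC3, for instance, is what separates the aminophosphines from the main grouping in the discussion above.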

(27) Landis, C. R.; Nelson, R. C.; Jin, W. C.; Bowman, A. C. Organometallics 2006, 25, 1377–1391.
(28) Doherty, R.; Haddow, M. F.; Harrison, Z. A.; Orpen, A. G.; Pringle, P. G.; Turner, A.; Wingad, R. L. Dalton Trans. 2006, 4310–4320.
(29) Baber, R. A.; Clarke, M. L.; Heslop, K. M.; Marr, A. C.; Orpen, A. G.; Pringle, P. G.; Ward, A.; Zambrano-Williams, D. E. Dalton Trans. 2005, 1079–1085.


Figure 3. Principal component score plot (PC1 and PC2) for ligands in LKB-P, capturing 65% of variation in the data. See text for detailed discussion of principal component loadings.

Table 5. Descriptor Loadings on PC1-4a

Figure 4. Principal component loadings (PCs 1 and 2).

Further to using this map derived from LKB-P to observe ligand similarities and trends across ligand space, it can also be enhanced with projections of suitable experimental data. High-throughput screening (HTS) can be used to generate such data sets, and Figure 6 shows a projection of experimental yields obtained from fluorescence resonance energy transfer (FRET) assays for the palladium-catalyzed aminations of aryl bromides (Scheme 7), reported by Stauffer and Hartwig in 2003.30 The authors report yields for 119 ligands, including 70 monodentate P-donor ligands, of which 30 are also in the present version of LKB-P (see Table S18 for relevant data). (Since additional monodentate ligands in the experimental data set were similar to those covered by LKB-P already and thus not accessing novel areas of ligand space, we did not pursue calculations to increase the number of ligands in common.) While the overlap with LKB-P is rather limited, the data achieve a reasonable spread with respect to ligand structures and experimental yields, and there is a clear trend of higher yield observations toward the lower right corner of the present ligand map. This agrees well with the

(30) Stauffer, S. R.; Hartwig, J. F. J. Am. Chem. Soc. 2003, 125, 6977–6985.

descriptor | PC1 | PC2 | PC3 | PC4
% contribution | 40.7 | 24.4 | 9.8 | 5.6

Descriptors (table rows, in order): EHOMO, ELUMO, He8_steric, PA, Q(B fragm), BE(B), P-B, ΔP-A(B), ΔA-P-A(B), Q(Au fragm), BE(Au), Au-Cl, P-Au, ΔP-A(Au), ΔA-P-A(Au), Q(Pd fragm), BE(Pd), Pd-Cl trans, P-Pd, ΔP-A(Pd), ΔA-P-A(Pd), Q(Pt fragm), BE(Pt), P-Pt, ΔP-A(Pt), ΔA-P-A(Pt), ∠(H3P)Pt(PH3), S40

Loading values in the order listed; cells below the |0.15| threshold are blank, so positions within each column are not preserved here:
0.240 0.151 0.191 0.272 -0.139 0.167 0.217
-0.200 0.259 0.281 0.263 -0.162 0.269 0.238 -0.251 -0.201 0.260 -0.193 -0.221
0.190 -0.220 -0.235 0.250 -0.196 -0.235 0.195 -0.179
-0.195 0.186 -0.272 0.267 -0.182 -0.326 0.173 -0.157 0.196
-0.262 0.262 0.175 0.150 0.216 0.434
0.183 0.403
0.405
0.372 -0.226
0.455 -0.252 -0.172 0.292 0.155 -0.213 0.227
-0.306
-0.217 0.433 0.292

a Values smaller than |0.15| are not shown; see Table S17 for full data.

design paradigm for ligands in palladium-catalyzed C-N coupling reactions,31 where electropositive ligands support the often rate-limiting oxidative addition of aryl halide to the palladium source32 (note that, depending on the aryl halide used, ligand dissociation or an associative displacement pathway has been suggested as the rate-limiting step as well33), and sterically hindered ligands prevent an unproductive β-hydride elimination pathway in favor of the reductive elimination step, which gives the desired arylamine product (Scheme 8).34 Further screening could target neighboring ligands in this region, and this is indeed where many of the biaryl ligands described by Buchwald and identified as coligands for palladium-catalyzed cross-coupling reactions can be found35 (Figure S5, e.g., ligands 166-168, 172-174, 177, 179-181, 184, 187).

Ligand maps can also be used to identify representative ligands from different subgroups, useful for reaction screening where a broad and varied ligand set is perhaps more likely to show reactivity for a given substrate and hence to access regions of “reaction space” not previously considered. In addition, screening results can be useful for further analysis to determine whether steric and/or electronic ligand effects contribute to observed reactivity4 and indeed to achieve reliable prediction for other ligands. Such data analysis will benefit from improved sampling of ligand types, generating a larger data set more suitable for multivariate analysis and, if related ligands have been considered, often reducing the likelihood of extrapolation when predictions are made. Even a low or zero response can be a useful result for such models, allowing further experiments to focus on more promising areas of ligand space as well as validating predictions. Such test sets can be designed simply by visual inspection, choosing an appropriate number of ligands from different regions of the map, but design of experiments (DoE) approaches36 can also be used in this context to achieve good coverage of ligand space. Initial broad ligand screens are often restricted to commercially available ligands, and Figure 7 illustrates the regions of ligand space accessed by this criterion, corresponding to 117 out of the 348 ligands included in LKB-P.44 Price, actual availability to purchase a ligand/complex on a limited project time scale, and safety/stability under reaction conditions may further limit the spread of ligands available for consideration, while an in-house library could of course expand the ligand set and perhaps shift its focus to a particular subset, which might hamper comparison with other screening results.

(31) Corbet, J.-P.; Mignani, G. Chem. Rev. 2006, 106, 2651–2710.
(32) Shekhar, S.; Hartwig, J. F. Organometallics 2007, 26, 340–351.
(33) Li, Z.; Fu, Y.; Guo, Q.-X.; Liu, L. Organometallics 2008, 27, 4043–4049. Senn, H. M.; Ziegler, T. Organometallics 2004, 23, 2980–2988. Barrios-Landeros, F.; Carrow, B. P.; Hartwig, J. F. J. Am. Chem. Soc. 2009, 131, 8141–8154.
(34) Hartwig, J. F. Pure Appl. Chem. 1999, 71, 1417–1423. Hartwig, J. F. Nature 2008, 455, 314–322. Hartwig, J. F. Acc. Chem. Res. 2008, 41, 1534–1544.
(35) Surry, D. S.; Buchwald, S. L. Angew. Chem., Int. Ed. 2008, 47, 6338–6361.

Figure 5. Principal component score plot (PC2 and PC3) for ligands in LKB-P.

Figure 6. Projection of FRET yields for palladium-catalyzed amination of aryl bromide30 onto LKB-P map. Spot size and coloring relate to yield, with dark red, large spots corresponding to the highest yields (55, 69%; 162, 80%) and small yellow spots corresponding to lowest yields (77, 8%; 310, 9%).

Models of Response Data. Simple bivariate linear relationships between LKB descriptors have been discussed already, but where suitable experimental data are available, relationships between such response data and LKB-P descriptors can also be explored, with a view to interpreting ligand effects and making predictions for other ligands.
As observed ligand effects usually arise from a combination of steric and electronic contributions, bivariate linear correlation coefficients tend to be low and thus not meaningful, and multivariate models need to be considered instead. While PCA benefits from and relies on reasonably high correlations between different descriptors, these complicate the derivation and analysis of multivariate linear regression models, as they can give rise to a range of models that all reproduce the initial training set well, but may differ considerably in their predictive performance.3 In addition, a range of linear and nonlinear regression approaches are available for analysis and predictions in chemistry,26 and the predictive performance of the resulting models can again vary substantially. However, the biggest challenge is often to find an experimental data set of sufficient size and variability to use as a training set, and indeed the 30 ligands screened by FRET/HTS as described above are perhaps insufficient in terms of both sampling and variability to attempt any meaningful predictions.

(36) Stazi, F.; Palmisano, G.; Turconi, M.; Santagostino, M. Tetrahedron Lett. 2005, 46, 1815–1818. McNamara, C. A.; King, F.; Bradley, M. Tetrahedron Lett. 2004, 45, 8239–8243.

Scheme 7. Palladium-Catalyzed Amination

Scheme 8. General Catalytic Cycle for Palladium-Catalyzed Amination

Figure 7. Projection of commercially available ligands (according to listings in catalogues of chemical suppliers and on databases), indicated as blue boxes, on LKB-P map.

Nevertheless, we have experimented with different approaches to variable selection and regression for this data set.45 Figure 8 illustrates the predicted yields for all ligands for a partial least-squares regression (PLSR) using a single latent variable derived from all LKB-P descriptors (regression coefficient, R2 = 0.171) and a multiple linear regression (MLR) model using 10 descriptors (EHOMO, ELUMO, He8_steric, BE(B), P-B, ΔA-P-A(B), P-Au, ΔP-A(Au), P-Pt, ΔA-P-A(Pt), R2 = 0.908).46 Model details are summarized in the Supporting Information (Table S19). As indicated by the regression coefficients, the MLR model gives a much better fit to the experimental data, but a relatively large number of descriptors is needed to achieve such agreement, making the model rather overfitted.37 This is likely to have a detrimental impact on predictions, and indeed high yield predictions occur around the fringes of ligand space, but no clear pattern emerges. On the basis of current understanding of the catalytic cycle for palladium-catalyzed amination mentioned above, ligand effects would be expected to be consistent within subsets of similar ligands, which is not clearly the case here, likely because the model is extrapolating to ligands not included in the training set. Nevertheless, the descriptors used in this model point to the importance of steric and σ-electronic effects, supporting the existing design paradigm of bulky, electropositive ligands for cross-coupling catalysts discussed above. Proper statistical model validation is also difficult for such a small data set, as division into test and training sets would further reduce the number of ligands for fitting; however, resampling approaches could be used. (Here, model evaluation is further complicated by transformation of the data.)

(37) Hawkins, D. M. J. Chem. Inf. Comput. Sci. 2004, 44, 1–12.

Figure 8. LKB-P projection of predicted yields for multiple linear regression (left, 10 descriptors) and partial least-squares regression (right) models fitted to yields of 30 ligands analyzed by FRET/HTS.30 For the MLR model, predicted yields >100% have been deleted. (See Supporting Information for further details of data processing and models.)

As would be expected, models with fewer descriptors give worse regression coefficients, but the pattern of predictions appears quite similar to that shown in Figure 8 for MLR (see Supporting Information for a 6 descriptor model, Table S19). The PLSR model is based on a single latent variable as chosen by the automatic model fitting routine (see Computational Details). In this case, trends across ligand space are pronounced, but do not seem to account well for the actual training data, as shown by a low regression coefficient (R2 = 0.171); increasing the model size to consider two latent variables does improve agreement between experiment and model, giving trends quite similar to the MLR model (see Supporting Information, Table S19), but inclusion of further latent variables does not improve the model. Indeed, the cross-validated regression coefficient, Q2, drops off rapidly, making even the two-variable model unfavorable (Q2 = 0.0803, suggesting that the model has almost no predictive power). In fact, all of these models are of poor quality, especially for making predictions, and detailed interpretation beyond the identification of general trends would be difficult and indeed almost inappropriate, as the models are unlikely to be robust to changes in the ligand set. While this may be attributed in part to the limited sampling of ligand space, it is also worth considering the experimental data in this context. Using yields after a set time interval (16 h in this case) as the response data does not provide information on the rates of reaction or the robustness of the catalyst, so medium to high yields can arise for different reasons (including a fast initial reaction and catalyst deactivation, as well as a slower reaction of a robust catalyst) and the relationship with ligand properties may not be straightforward, further complicating analysis.
These models thus mainly serve to illustrate the importance of using large and varied ligand sets for the prediction of ligand effects, facilitating model validation as well as avoiding overfitting and extrapolation. Another possibility would be to fit local models, considering only the ligands of highest yield clustered together in ligand space. Such a model would not seek to attempt prediction for the full ligand set, focusing instead on an area of activity. This might be useful for reaction optimization, but perhaps less so for an initial screen of ligands.
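The contrast between a good fit and poor predictive power for a small training set can be illustrated with synthetic data. The sketch below fits an ordinary least-squares MLR model to 30 samples and 10 descriptors and compares R2 on the training data with a leave-one-out cross-validated Q2, here taken as 1 - PRESS/SS_tot. The data, the helper names, and this particular definition of Q2 are assumptions of the sketch and need not match the software or settings used for the models discussed above.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ≈ [1, X] @ beta."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def r2_and_loo_q2(X, y):
    """R2 on the training data and leave-one-out cross-validated Q2."""
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - np.sum((y - predict_ols(fit_ols(X, y), X)) ** 2) / ss_tot
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i               # leave sample i out
        beta = fit_ols(X[keep], y[keep])
        press += (y[i] - predict_ols(beta, X[i:i + 1])[0]) ** 2
    return r2, 1.0 - press / ss_tot

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 10))              # 30 "ligands", 10 "descriptors" (synthetic)
y = 0.5 * X[:, 0] + rng.normal(size=30)    # response depends weakly on one descriptor

r2, q2 = r2_and_loo_q2(X, y)
print(f"R2 = {r2:.3f}, LOO Q2 = {q2:.3f}")
```

For a least-squares fit the leave-one-out residuals are never smaller in magnitude than the fitted residuals, so Q2 computed in this way cannot exceed R2; a large gap between the two is one symptom of the overfitting discussed above.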

While the paucity of suitable experimental data sets hampers the use of LKB-P descriptors in predictive models, we can use calculated response data to further illustrate such applications. Calculated observations, e.g., barriers to reaction or bond dissociation energies, can access a broad and varied ligand set, including systems that are novel and/or synthetically inaccessible, so a global model for prediction is feasible; in addition, ligand effects will be considered in isolation. However, this may not correspond well with related experimental observations due to other influences, e.g., the effects of solvent, counterions, catalyst source etc., and predictions need to be validated carefully. We originally considered a chromium complex, [Cr(CO)5L] (Scheme 1), as a source of descriptors for LKB-P, but found this to be sensitive to steric interactions with larger ligands, masking electronic contributions to the bonding in some cases.2 Such mixing of properties is less of a problem when attempting to fit regression models to this data, and we were able to calculate the bond dissociation energy for the trans carbonyl group for all ligands in LKB-P. In this case we have not attempted validation against suitable experimental data and will simply use this to illustrate potential applications of regression models. The size of this data set allows us to evaluate model quality by splitting the data into separate training and prediction sets. Table 6 summarizes two representative MLR and PLSR models, with further details included in the Supporting Information (Table S20), and Figure 9 shows the corresponding diagnostic plots. In this case both approaches to regression give reasonably good fits to the training data, as well as good predictions, with the PLSR model giving slightly better root-mean-squared errors (rmse’s) for both training and test sets.
Inspection of outliers can sometimes hint at problems with a subset of ligands, but in this case no clear pattern emerges and predictions are generally within [...]

[...] 100% have been deleted to improve display.

(46) Like principal component analysis (PCA), partial least-squares regression (PLSR) is a projection method. However, in this case the projected (latent) variables are formulated to capture the variation of both the descriptors and the response variable. These latent variables are then used to fit least-squares regression models to the response data.
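The projection idea behind PLSR, together with evaluation on separate training and test sets, can be sketched for a single response variable as follows. This is a generic PLS1 implementation (NIPALS-type deflation) applied to synthetic data; the function names, the data, and the train/test split are hypothetical and are not taken from the models reported here.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS regression for one response: latent variables are built from
    directions w that capture covariance between descriptors and response."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)        # weight vector defining the latent variable
        t = Xk @ w                       # scores of the latent variable
        tt = t @ t
        p = Xk.T @ t / tt                # descriptor loadings
        c = yk @ t / tt                  # regression of response on the scores
        Xk = Xk - np.outer(t, p)         # deflate descriptors
        yk = yk - c * t                  # deflate response
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)   # coefficients in descriptor space
    return x_mean, y_mean, beta

def pls1_predict(model, X):
    x_mean, y_mean, beta = model
    return y_mean + (X - x_mean) @ beta

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 8))                          # synthetic descriptors
y = X @ rng.normal(size=8) + 0.3 * rng.normal(size=120)
train, test = np.arange(80), np.arange(80, 120)        # simple hold-out split

for k in (1, 2, 8):
    model = pls1_fit(X[train], y[train], k)
    print(k, "latent variables:",
          "rmse(train) =", round(rmse(y[train], pls1_predict(model, X[train])), 3),
          "rmse(test) =", round(rmse(y[test], pls1_predict(model, X[test])), 3))
```

When as many latent variables are used as there are descriptors, the PLS1 predictions coincide with those of ordinary least squares, so the number of latent variables acts as the control on model complexity.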