… and organic reaction performance - C&EN Global Enterprise (ACS

Science Concentrates COMPUTATIONAL CHEMISTRY

Machine learning predicts inorganic properties …

Artificial neural network overcomes difficulties with inorganic data

Machine learning, in which computers train on large data sets to make predictions, can be a fast way to find promising molecules for various applications, but it’s only as good as the data it trains on. A new strategy could make the method more useful for identifying leads among inorganic complexes, for which reliable data can be harder to come by (J. Phys. Chem. Lett. 2018, DOI: 10.1021/acs.jpclett.8b00170).

Heather J. Kulik and colleagues at the Massachusetts Institute of Technology wanted to use machine learning to find new inorganic compounds with a small energy gap between their high- and low-electron-spin states. Because light or heat can boost these molecules, called spin-crossover complexes (SCCs), into a high-spin state, they could be useful as switches and sensors.

Finding new SCCs computationally presents a particular challenge for machine-learning models. Spin states and other properties of inorganic molecules are complicated, and less data is available to teach the models.

To overcome this limitation, the researchers combined a standard search algorithm with a type of machine learning called an artificial neural network to explore octahedral SCCs. The network was trained to recognize complexes with a spin-state energy gap of 5 kcal/mol or less and provided a check that limited the algorithm’s exploration to complexes more familiar to the neural network.

[Structure: Fe(II)(CNC₆H₅)₂(NH₂CH₃)₄, one of the spin-crossover complexes identified by Kulik’s machine-learning strategy.]

Their method turned up 372 leads in minutes; finding them with a rigorous computational method, density-functional theory (DFT), would have taken about four days. Kulik concluded that about 70% were viable targets.

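
The article describes the search only in outline. As a minimal sketch of the general idea, a screen can keep a candidate complex only if it lies close to the model's training data and its predicted spin-state gap is within 5 kcal/mol. This assumes each complex is encoded as a numeric feature vector and uses Euclidean distance to the nearest training example as a stand-in for the study's trust check. The feature vectors, the distance cutoff, and `toy_gap_model` (standing in for the trained neural network) are all hypothetical, and exhaustive screening of a short list replaces the study's search algorithm.

```python
import math

GAP_CUTOFF_KCAL = 5.0  # spin-state energy gap threshold quoted in the article


def nearest_training_distance(x, training):
    """Euclidean distance from candidate x to the closest training example."""
    return min(math.dist(x, t) for t in training)


def screen(candidates, predict_gap, training, max_distance):
    """Keep candidates the model can be trusted on and predicts to have a small gap."""
    leads = []
    for x in candidates:
        # Trust check (assumed distance-based): skip complexes far from anything
        # the model was trained on, where its predictions are least reliable.
        if nearest_training_distance(x, training) > max_distance:
            continue
        # Spin-crossover candidates have a small high-spin/low-spin energy gap.
        if abs(predict_gap(x)) <= GAP_CUTOFF_KCAL:
            leads.append(x)
    return leads


def toy_gap_model(x):
    """Hypothetical stand-in for the trained neural network (returns kcal/mol)."""
    return 10.0 * x[0] - 4.0 * x[1]


training = [(0.0, 0.0), (1.0, 1.0), (0.5, 2.0)]  # hypothetical feature vectors
candidates = [(0.1, 0.2), (0.9, 1.1), (5.0, 5.0), (1.2, 1.0)]
print(screen(candidates, toy_gap_model, training, max_distance=0.5))
# prints [(0.1, 0.2), (0.9, 1.1)]
```

Here (5.0, 5.0) is rejected by the trust check before its gap is even considered, while (1.2, 1.0) passes the check but has a predicted gap of about 8 kcal/mol. Shrinking `max_distance` trades coverage of chemical space for reliability.
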
“By being a little bit conservative about walking away from spaces where the model was completely untrustworthy, we were able to be right—where right was reproducing the DFT result—a good amount of the time,” Kulik says. “This is a paradigm that can be extrapolated to very, very large explorations of chemical space very, very rapidly.” She’s hoping to evaluate millions of molecules in the next iteration. Kendall N. Houk, a computational chemist at the University of California, Los Angeles, praises the research for its speed and accuracy. Houk says the paper “fits into contemporary excitement about the use of machine learning.”—SAM LEMONICK


C&EN | CEN.ACS.ORG | FEBRUARY 19, 2018

INFORMATICS

… and organic reaction performance

Chemists feed data from thousands of reactions into an algorithm to predict the best reagents to use

When chemists develop new types of reactions, they generate a lot of data on what works and how well, along with what doesn’t work at all. Much of the data are never used, says Abigail G. Doyle, a chemistry professor at Princeton University. “We publish only a small fraction and usually only the best results,” she says.

Doyle thinks that by using machine learning, in which computer algorithms find patterns in data, it might be possible to use all the data chemists generate to predict the best conditions for a reaction even when the substrate has never been used in that transformation before.

[Reaction scheme (example shown): Buchwald-Hartwig amination of a CF₃-substituted aryl chloride with a primary amine, using a Pd catalyst, an isoxazole additive, and a base.]

Doyle and Princeton’s Derek T. Ahneman and Jesús G. Estrada, along with Merck & Co.’s Spencer D. Dreher and Shishi Lin, take a step in this direction by using machine learning to predict the yield of a Buchwald-Hartwig amination (example shown). Their algorithm allowed for variation in the aryl halide substrate, palladium catalyst ligand, base, and an isoxazole additive (Science 2018, DOI: 10.1126/science.aar5169). The chemists added isoxazole to the mix because this motif is popular in druglike molecules but sometimes poisons these reactions. The team hoped to get a better idea of what conditions and specific isoxazole structures were problematic.

Using Merck’s ultra-high-throughput reaction technology, the chemists performed 4,608 reactions and used the data from a portion of those to build an algorithm that would predict the outcome of the remaining reactions. After trying several algorithms, the chemists found that the so-called random forest model performed best. This algorithm accurately predicted which isoxazole additives would poison the reaction, even those that weren’t included in the data used to build the model. The results could help chemists pick which ligand and base combination to use to maximize yields for the C–N coupling when a given isoxazole motif is part of their substrate.

“We’re most excited by the idea that you can apply this method to any sort of new problem that you identify in reactivity,” Dreher says, although both he and Doyle say that kind of predictive power is still a long way off. The team’s use of machine learning “is marvelous and long overdue for the field of homogeneous catalysis and chemical synthesis in general,” says Richmond Sarpong, an expert in organic synthesis at the University of California, Berkeley.—BETHANY HALFORD
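
The article names the model but not its settings. The Python sketch below is a minimal, from-scratch random-forest regressor: regression trees grown on bootstrap resamples, each split searched over a random subset of features, with predictions averaged across trees. It is paired with a split that holds out every reaction using a given additive, echoing the test on additives absent from the training data. Everything specific here is an assumption for illustration: the two hypothetical descriptors per reaction (a ligand index and one additive descriptor), the hyperparameters, and the synthetic yields, which are not the study's data.

```python
import random


def grouped_split(rows, yields, additives, held_out):
    """Send every reaction that uses a held-out additive to the test set, so the
    model is scored only on additives it never saw during training."""
    train, test = [], []
    for x, y, a in zip(rows, yields, additives):
        (test if a in held_out else train).append((x, y))
    return train, test


def _sse(ys):
    """Sum of squared errors around the mean, the usual regression-tree criterion."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def _best_split(data, features, min_leaf):
    """Return (sse, feature, threshold) for the best valid split, or None."""
    best = None
    for f in features:
        for t in sorted({x[f] for x, _ in data}):
            left = [y for x, y in data if x[f] <= t]
            right = [y for x, y in data if x[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(data, depth, max_depth, min_leaf, n_feats, rng):
    ys = [y for _, y in data]
    if depth >= max_depth or len(data) < 2 * min_leaf:
        return sum(ys) / len(ys)  # leaf: mean yield of the reactions that reach it
    # Random-forest twist: search a random subset of n_feats features first and
    # fall back to the remaining features only if the subset has no valid split.
    n = len(data[0][0])
    order = rng.sample(range(n), n)
    split = (_best_split(data, order[:n_feats], min_leaf)
             or _best_split(data, order[n_feats:], min_leaf))
    if split is None:
        return sum(ys) / len(ys)
    _, f, t = split
    left = [(x, y) for x, y in data if x[f] <= t]
    right = [(x, y) for x, y in data if x[f] > t]
    return (f, t,
            build_tree(left, depth + 1, max_depth, min_leaf, n_feats, rng),
            build_tree(right, depth + 1, max_depth, min_leaf, n_feats, rng))


def predict_tree(node, x):
    # Internal nodes are (feature, threshold, left, right); leaves are floats.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(train, n_trees=25, max_depth=4, min_leaf=2, n_feats=1,
               bootstrap=True, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bagging: each tree is grown on a bootstrap resample of the training set.
        sample = [rng.choice(train) for _ in train] if bootstrap else list(train)
        forest.append(build_tree(sample, 0, max_depth, min_leaf, n_feats, rng))
    return forest


def predict_forest(forest, x):
    # The forest's prediction is the average of the individual trees' predictions.
    return sum(predict_tree(tree, x) for tree in forest) / len(forest)


# Synthetic toy data, NOT from the study: additives whose single descriptor is
# above 0.5 "poison" the reaction, and the ligand index has no effect on yield.
additive_descriptor = {"A": 0.1, "B": 0.2, "C": 0.8, "D": 0.9, "E": 0.15, "F": 0.85}
rows, yields, names = [], [], []
for name, d in additive_descriptor.items():
    for ligand in (0.0, 1.0):
        rows.append((ligand, d))
        yields.append(10.0 if d > 0.5 else 80.0)
        names.append(name)

# Hold out additives E and F entirely, then fit on the rest.
train, test = grouped_split(rows, yields, names, held_out={"E", "F"})
forest = fit_forest(train)
for x, y in test:
    print(f"features={x} true={y:.0f}% predicted={predict_forest(forest, x):.1f}%")
```

Holding out whole additives, rather than random individual reactions, is what makes such a check meaningful: a random split would let reactions with the same additive land on both sides, so a good score would not show that the model generalizes to new isoxazoles. For real work, an established implementation such as scikit-learn's `RandomForestRegressor` would be the more likely choice than this teaching version.
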
