Autonomous Molecular Design: Then and Now - ACS Applied

Mar 25, 2019 - The success of deep machine learning in processing of large amounts of data, for example, in image or voice recognition and generation,...
Forum Article www.acsami.org

Cite This: ACS Appl. Mater. Interfaces XXXX, XXX, XXX−XXX

Autonomous Molecular Design: Then and Now

Tanja Dimitrov,† Christoph Kreisbeck,†,‡ Jill S. Becker,† Alán Aspuru-Guzik,¶,† and Semion K. Saikin*,†,‡

†Kebotix, Inc., 501 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
‡Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, United States
¶Department of Chemistry and Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3H6, Canada



ABSTRACT: The success of deep machine learning in processing large amounts of data, for example, in image or voice recognition and generation, raises the possibility that these tools can also be applied to solving complex problems in materials science. In this forum article, we focus on molecular design, which aims to answer the question of how we can predict and synthesize molecules with tailored physical, chemical, or biological properties. A potential answer to this question could be found by using intelligent systems that integrate physical models and computational machine learning techniques with automated synthesis and characterization tools. Such systems learn from every single experiment, in analogy to a human scientific expert. While the general idea of an autonomous system for molecular synthesis and characterization has been around for a while, its implementations in materials science are sparse. Here we provide an overview of the developments in chemistry automation and the applications of machine learning techniques in the chemical and pharmaceutical industries, with a focus on the novel capabilities that deep learning brings.

KEYWORDS: machine learning, inverse design, deep learning, artificial intelligence, autonomous synthesis, neural networks

1. INTRODUCTION

Discovery of new organic materials with tailored properties is a complex process that combines systematic and tedious tasks with a number of “lucky” coincidences. There are many open questions as to which materials with specific properties we should make. For example, which molecules could form an ideal organic superconducting material, or which molecules could make the most energy-efficient light sensors and emitters in wearable electronics? How would we make these materials nontoxic? These are just a few of the questions that molecular design aims to answer. For a given set of macroscopic properties, we aim to find the corresponding microscopic molecular structures and molecular packings. There are two complementary approaches to this problem: inverse design and direct design. In inverse design, microscopic structures are derived from the macroscopic properties. However, the inverse structure−property relations are very complex and, in most practical applications, cannot be derived using analytical or computational models. In contrast, the direct approach tests microscopic structures for the desired macroscopic properties. For example, in a naive “trial and error” method, we randomly select molecules, synthesize them, and test them for the property. This approach is also very inefficient because the number of potentially synthesizable molecules is huge.1−3 The conventional approach to these problems relies on the human ability to correlate and generalize experience. Making educated guesses, scientists generate hypotheses, synthesize and test the molecules, and then adjust these hypotheses according to the experimental feedback.

© XXXX American Chemical Society

Nowadays, many routine synthesis and characterization operations, as well as computational modeling, can be automated. However, higher-level analysis of the results and decision making still rely on humans, who remain heavily involved in the molecular discovery loop. Such an automated open-loop system has several advantages because, on average, machines can operate at higher speed while maintaining higher precision. Moreover, this approach frees researchers from monotonous, tedious procedures, leaving more time for creative work. Recent advances in deep machine learning (ML)4,5 have brought us a set of analytical tools that now enter many aspects of our lives. For example, it is now natural to use ML-based speech and face recognition, text translation, maps, and navigation on our cell phones. These tools are also exploited for the analysis of scientific data and in the scientific decision-making process. The problems to which ML has been successfully applied include searches for molecules with specific properties,6 discoveries of chemical reaction pathways,7 modeling of excitation dynamics,8 analysis of wave functions of complex systems,9 and identification of phase transitions.10 One of the reasons for the success of ML in materials science is the intrinsic hierarchy of physical phenomena.11 However, a systematic understanding of which ML methods are optimal for molecular characterization, and what their limitations are, has yet to be developed.

Special Issue: Materials Discovery and Design
Received: January 21, 2019
Accepted: March 15, 2019


DOI: 10.1021/acsami.9b01226 ACS Appl. Mater. Interfaces XXXX, XXX, XXX−XXX


ACS Applied Materials & Interfaces

Figure 1. Schematic illustration of an autonomous molecular discovery system with multiple feedback loops.

2. AUTOMATION

The first successful attempts at complete automation of a system that synthesizes molecules date back to the 1960s−1970s.13−15 In these pioneering studies, automation of the reaction was used for the optimization of reaction conditions. In one of the earliest studies involving computer-controlled synthesis,14 the authors developed a system in which the dispensing of chemicals into the reactor was controlled through a set of pumps and syringes. The products of the reaction were also characterized automatically. The operation of the system was demonstrated using the hydrolysis of p-nitrophenyl phosphate to p-nitrophenol by the enzyme alkaline phosphatase. Because the product of this reaction is yellow, its amount was easily monitored by spectrophotometry during the experiment. The authors demonstrated three main operational phases: (1) routine operation of the system, where conditions for single chemical reactions were preprogrammed on a computer and the synthesis was then executed autonomously, (2) design of experimental procedures, where a set of experiments was executed automatically while scanning selected condition parameters, and finally (3) a simple decision-making procedure. This procedure included feedback that adapted the grid step of the experiment, which allowed the authors to optimize the experimental conditions. Importantly, the authors highlighted that the computer, besides controlling routine procedures, can be used for tasks such as data interpretation and the design of experiments with a feedback loop. In another early work,15 researchers from Smith Kline and French Laboratories developed a closed-loop automated chemical synthesis system to optimize chemical reaction conditions using a simplex algorithm.
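The simplex idea used in such a closed loop can be illustrated with a minimal sketch of the Nelder−Mead downhill simplex in pure Python. Each call of the objective stands for one automated experiment. The function `measure_yield` and its optimum at 80 °C and 30 min are hypothetical, and the exact simplex variant used in ref 15 may differ from this one.

```python
from typing import Callable, List, Sequence, Tuple

Point = List[float]

def nelder_mead(f: Callable[[Sequence[float]], float], x0: Sequence[float],
                step: float = 1.0, iters: int = 200) -> Tuple[Point, float]:
    """Minimize f with the Nelder-Mead simplex; each f call stands for one experiment."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex displaced along each parameter axis.
    simplex = [list(x0)]
    for i in range(n):
        v = list(x0)
        v[i] += step
        simplex.append(v)
    values = [f(v) for v in simplex]
    for _ in range(iters):
        # Order vertices from best (lowest value) to worst.
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        # Centroid of all vertices except the worst one.
        c = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        xr = [c[j] + (c[j] - worst[j]) for j in range(n)]  # reflection
        fr = f(xr)
        if fr < values[0]:
            xe = [c[j] + 2 * (c[j] - worst[j]) for j in range(n)]  # expansion
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        else:
            xc = [c[j] + 0.5 * (worst[j] - c[j]) for j in range(n)]  # contraction
            fc = f(xc)
            if fc < values[-1]:
                simplex[-1], values[-1] = xc, fc
            else:
                # Shrink every vertex toward the current best one.
                best = simplex[0]
                simplex = [best] + [[best[j] + 0.5 * (v[j] - best[j]) for j in range(n)]
                                    for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    i_best = min(range(n + 1), key=lambda k: values[k])
    return simplex[i_best], values[i_best]


def measure_yield(x: Sequence[float]) -> float:
    """Hypothetical smooth yield surface (%) peaking at 80 degC and 30 min."""
    temperature, time = x
    return 90.0 - 0.01 * (temperature - 80.0) ** 2 - 0.02 * (time - 30.0) ** 2


# Maximize the yield by minimizing its negative, starting from 50 degC and 10 min.
best_x, best_val = nelder_mead(lambda x: -measure_yield(x), [50.0, 10.0], step=10.0)
print([round(v, 1) for v in best_x], round(-best_val, 2))
```

Only one new experiment is requested per step in the common case, which is what made simplex searches attractive for serial, single-reactor setups.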
The Smith Kline and French system consisted of a single chemical reactor whose conditions, including heating/cooling, stirring, and the addition of chemical reagents, were controlled by a protocol running on a computer. In addition, a liquid chromatographic column was used for the characterization of the products. Interestingly, this work showed a prototype of a distributed lab, in which a computer communicated with the synthetic system through a modem. We would also like to highlight the system developed by Legard and Foucard.16 Its main features were modularity and the ability to use standard chemistry lab equipment. It was introduced as a versatile chemistry automation kit, Logilap, that would allow for

Combining automatic characterization, synthesis, and computational modeling with deep ML-based analysis and decision-making modules naturally closes the molecular discovery loop. The main advantage of such a closed-loop autonomous system, as we see it now, lies in a thorough, unbiased analysis of the data and the generation of hypotheses with a larger fraction of hits, rather than just a higher throughput. Figure 1 shows a schematic diagram of such a system, where human intuition and existing knowledge provide input to the control module. The system generates hypotheses using a variety of theoretical models and ML tools, and it automatically plans, executes, and analyzes the experiments, providing feedback at each stage.12 The general idea of having a feedback loop in automated synthesis and characterization of molecules has been around at least since the late 1970s. However, the elements composing the system and our understanding of how it should operate have evolved substantially. Feedback loops can be implemented at multiple levels, including the optimization of the synthesis procedure, finding appropriate reaction pathways, and suggesting new molecular structures for testing. Modern computational capabilities allow us to process data at a much higher rate; therefore, more complex data analysis models can be used. Moreover, the automated system can learn directly from online databases and even generate and update existing libraries. These are just some highlights of modern capabilities that still have to be implemented and tested in the autonomous discovery workflow. In this forum article, we provide a concise overview of the closed-loop discovery approach, bringing together modern advances with research from the late 20th century, when many components of automated chemistry and machine learning were developed. The rest of the forum article is structured as follows.
In section 2, we discuss the key steps in chemistry automation, focusing on developments from the 1970s to the 2000s. Section 3 briefly outlines the early developments in ML and high-throughput screening before the era of deep learning. Section 4 focuses on applications of deep machine learning methods to the prediction of molecular properties. Section 5 provides examples and discusses general trends in using autonomous systems for the characterization and synthesis of molecules. Finally, section 6 provides a discussion and some practical ideas about current issues in the design of autonomous platforms.


Figure 2. Examples of automated chemical synthesis systems from the 1970s (a) and the 1990s (b) illustrate the improved level of integration. Left panel is reproduced with permission from ref 16. Copyright 1978 American Chemical Society. Right panel is reproduced from ref 34 by permission of John Wiley & Sons, Inc.

of the 1990s, significant progress had been achieved in the synthesis of polymers, where combinatorial libraries of polymers were automatically synthesized and characterized using fluorescence or Fourier transform infrared (FT-IR) spectroscopy.26,27 The development of HTS also resulted in numerous patents on high-throughput experimentation (HTE) in the research and development of new materials.28 At the same time, commercial synthesizers appeared on the market.29,30 Moreover, significant progress had been achieved in materials characterization specific to combinatorial chemistry. For example, for many applications, fast characterization of mixtures before purification has been implemented as a more time- and cost-efficient approach. The software used for controlling automated experiments also changed to account for parallel processing. While the original protocols used in the 1980s focused on serial execution of reaction steps and optimization algorithms, the planners designed in the 1990s accounted for parallel processes and implemented factorial design, design of experiments, modified simplex, and tree search algorithms.31−33 By the beginning of the 21st century, the main components of chemical reaction automation were (1) the delivery of the different chemical components to the reactor, (2) control of the reactor conditions, (3) product purification, and (4) product characterization. All of these components had been conceptually solved and implemented in commercially available synthesizers. These synthesizers were combined with characterization tools in systems ready for the generation of combinatorial libraries. Figure 2 illustrates how the integration of automated chemical systems evolved from the 1970s16 until the end of the 1990s.34 While the original motivation for automated chemistry was to replace repetitive work,15,16 HTE brought the capability to explore much larger regions of chemical space.
Multiple chemical reactions have been demonstrated using these systems.35 However, the synthetic capabilities of each particular system were limited to specific types of reactions, and a truly universal chemical synthesizer had yet to be developed. It had been postulated that the design of the synthesizers should be adjusted to the particular task: process screening, process optimization, or library generation. Over the last 20 years, automation in chemistry has continued to evolve at a somewhat slower pace, with the major improvements coming in the integration of the components. Many automated characterization devices became conventional tools in chemical and biological laboratories. However, the problem of synthesis versatility has not been solved. In his review of automated systems for chemical synthesis developed at the beginning of

automatic control of the reaction parameters. Later, in the 1980s, the Logilap kit was used by several research teams.17 Within the following two decades, the automation of chemical laboratories became more systematic, with a focus on the versatility of the systems. It was recognized early on that automation of chemistry requires a diverse set of methods. Therefore, automated systems have to be easily adaptable to chemical reactions that use solid and liquid reagents, multiple solvents, and various chemical conditions.18−20 Moreover, the computer-controlled architecture requires that the software be easily modifiable by experimentalists, who are typically not specialists in software design.18 Architectures with multiple reactors and robotic arms21 were introduced to parallelize the process. In one such study,21 the system had a robot with three remote hands, a reagent station (stockroom) that could hold up to 15 bottles of 30−100 mL, a reaction stage with up to nine magnetically stirred reactors, and a storage area that could accommodate 100 test tubes. The authors demonstrated the operation of the platform with the preparation of a trifunctional vinyl sulfone in a one-pot sequence from a ketosulfone and methyl coumalate. This multistep reaction is sensitive to the catalyst and the solvent used; therefore, these parameters were the targets of the automated optimization. Particular attention should be paid to the studies from Takeda Chemical Industries.22−24 Their automated platform was used for the synthesis of substituted N-(carboxyalkyl)amino acids, in which some intermediates are unstable. To optimize the reaction conditions, the research team developed a kinetic model for the reaction.
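The published kinetic model is not described here, so the following is only a minimal sketch of how experimental data can feed a kinetic model, assuming simple first-order kinetics, X(t) = 1 − exp(−kt). The rate constant is fitted by least squares on ln(1 − X) = −kt, and the fitted model then proposes the reaction time needed for a target conversion. The helper names and the data are hypothetical.

```python
import math
from typing import List, Tuple

def fit_first_order_k(data: List[Tuple[float, float]]) -> float:
    """Least-squares k for ln(1 - X) = -k t (a line through the origin).

    `data` holds (time_min, conversion) pairs with 0 <= conversion < 1."""
    num = sum(-t * math.log(1.0 - x) for t, x in data)
    den = sum(t * t for t, _ in data)
    return num / den

def time_for_conversion(k: float, target: float) -> float:
    """Reaction time that the fitted first-order model predicts for a target conversion."""
    return -math.log(1.0 - target) / k

# Hypothetical measurements, generated here from k = 0.05 1/min for illustration.
measured = [(t, 1.0 - math.exp(-0.05 * t)) for t in (5.0, 10.0, 20.0)]
k = fit_first_order_k(measured)
print(round(k, 4), round(time_for_conversion(k, 0.95), 1))
```

In a closed loop, each newly measured (time, conversion) pair would be appended to `measured` and k refitted before the next run is planned.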
The parameters of the kinetic model were obtained from experiments, creating a feedback loop.23 The platform was used to generate 90 compounds, working 24 h a day with an average reported productivity of three compounds per day. This workstation demonstrated that even if the chemical yields are low under the optimum conditions, it is still possible to obtain a sufficient amount of the desired product by repeating the reaction. Interest in combinatorial methods for automated chemistry has grown tremendously since the end of the 1980s. In part, this was triggered by the success of high-throughput screening (HTS) in the pharmaceutical industry.25 Compared with process chemistry, where the main focus is on the control and optimization of the process conditions, combinatorial chemistry is focused on the versatility of products. This approach allows the generation of large libraries of related chemicals using similar reaction pathways. By the end


the 1990s, Lindsey outlined17 the following architectures: (1) flow reactors, (2) single-batch reactors, (3) single-robotic synthesizers, (4) dual-robotic synthesizers, and (5) workstations. Presently, these architectures have merged into two main classes: (1) systems for flow chemistry and (2) systems for batch chemistry. Both classes have their own advantages. For example, flow chemistry setups allow continuous screening of experimental parameters, such as the concentrations of active chemical components and catalysts, temperature, and time.36,37 Moreover, scaling up a process is much easier in a flow setup, which can be transferred from the research to the development stage. In contrast, the batch setup brings “digitization” to the synthesis process, which is natural for the generation of libraries of chemical components and compounds. Additionally, the batch setup allows operation with very small amounts of chemicals, which is important when the cost of chemicals is a critical factor.38
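Parameter screening of this kind is often organized as a factorial design, one of the planning strategies listed above for the 1990s. Below is a minimal sketch of a full factorial design built with Python's `itertools.product`; the factor names and levels are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence

def full_factorial(factors: Dict[str, Sequence[object]]) -> List[Dict[str, object]]:
    """Return one experiment (a dict of settings) per combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*(factors[n] for n in names))]

# Hypothetical factor levels for a reaction screen.
design = full_factorial({
    "temperature_C": [40, 60, 80],
    "residence_time_min": [5, 15],
    "catalyst_mol_pct": [1.0, 2.5],
})
print(len(design))  # 3 * 2 * 2 = 12 runs
print(design[0])    # the last factor varies fastest
```

The number of runs is the product of the numbers of levels, so it grows quickly with every added factor; this is one reason adaptive strategies such as simplex searches remain attractive.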

illustration of the ML methods before the era of deep learning, we review the work of a research group at DuPont published in ref 49. This study illustrates the state of the art of ML for HTS by the mid-2000s. The research team compared four different ML models (two decision-tree models, an in-house InfoEvolve model,50 and a neural network model) with eight different descriptor sets for the classification of large agrochemical data sets of ∼10⁶ molecules. The main outcome of this multiyear project was that the best prediction performance could be obtained using a combined model trained on different types of descriptors. The authors argued that the descriptor sets contain complementary information and that combining them averages the performance of the individual models. Sporadic applications of ANNs in chemistry date back at least to the beginning of the 1990s.51,52 In these studies, ANNs were applied to molecular characterization, including the prediction of chemical shifts in NMR spectra of organic compounds53 and the classification of IR spectra54 and FT-Raman spectra.55 The training sets used in these studies were relatively small, ∼100−1000 data points. The performance of ANNs was compared with that of other ML methods;49,56 however, the results were not particularly promising. This has also been attributed to the higher computational cost of training NNs and the complexity of their design.
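The benefit of combining descriptor sets can be illustrated with a minimal sketch: one simple classifier is trained per descriptor set, and their scores are averaged. The nearest-centroid scorer and the toy two-feature "fingerprint" and "physchem" descriptors are hypothetical stand-ins; ref 49 used decision trees, InfoEvolve, and neural networks.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], List[float]]  # (inactive centroid, active centroid)

def centroid(rows: List[Vector]) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]

def train_nearest_centroid(X: List[Vector], y: List[int]) -> Model:
    """Fit one descriptor-specific model: the mean vector of each class."""
    inactive = centroid([x for x, label in zip(X, y) if label == 0])
    active = centroid([x for x, label in zip(X, y) if label == 1])
    return inactive, active

def active_score(model: Model, x: Vector) -> float:
    """Score in [0, 1]; higher when x lies closer to the active centroid."""
    d0, d1 = math.dist(x, model[0]), math.dist(x, model[1])
    return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5

def ensemble_score(models: Dict[str, Model], descriptors: Dict[str, Vector]) -> float:
    """Average the scores of the models trained on different descriptor sets."""
    return sum(active_score(models[name], descriptors[name]) for name in models) / len(models)

# Hypothetical toy data: two descriptor sets for the same four labeled molecules.
labels = [1, 1, 0, 0]
training = {
    "fingerprint": [[1, 0], [1, 1], [0, 0], [0, 1]],
    "physchem": [[2.0, 3.0], [2.5, 3.5], [0.5, 1.0], [0.0, 0.5]],
}
models = {name: train_nearest_centroid(X, labels) for name, X in training.items()}
active_query = {"fingerprint": [1, 0], "physchem": [2.2, 3.1]}
inactive_query = {"fingerprint": [0, 1], "physchem": [0.1, 0.6]}
print(round(ensemble_score(models, active_query), 2),
      round(ensemble_score(models, inactive_query), 2))
```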

3. VIRTUAL SCREENING AND MACHINE LEARNING

Although introduced much earlier, high-throughput virtual screening and machine learning methods were extensively developed for medicinal chemistry applications in the 1990s−2000s.39,40 The limited synthetic capabilities were quickly recognized as the main bottleneck of the HTS approach, especially for applications in the pharmaceutical domain. The sizes of the largest molecular libraries, on the order of 10⁴−10⁵ molecules, were negligible compared with the number of potentially synthesizable molecules.41 Therefore, virtual screening methods naturally emerged as a tool to counter the exponential growth of HTS cost. These methods allowed for a less expensive presynthesis analysis of molecules in order to limit the search space. Closed-loop platforms involving virtual screening, scoring of the molecules, synthesis of the promising leads, and their characterization have been proposed.39 It should be noted that, despite many similarities, virtual screening for drug design and virtual screening for materials design are different.42 While the virtual screening of potential drug candidates is frequently based on phenomenological equations or empirical data and cannot be done using microscopic models, the virtual screening of materials can be done using phenomenological or microscopic computational models or ab initio methods, such as molecular dynamics or density functional theory (DFT). However, the virtual screening of materials has the following drawbacks: (1) large libraries are still expensive, (2) the precision of computed results is often insufficient, and (3) some properties are difficult to compute. In parallel, ML methods evolved as a tool for the optimization, classification, and prediction of molecular properties in the pharmaceutical industry.43−48 The methods included random forests, decision trees, support vector machines, and artificial neural networks (ANNs).
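The presynthesis filtering idea can be sketched as a two-stage funnel: a cheap score ranks the whole library, and only the top fraction receives the expensive evaluation, which here stands in for a DFT calculation or a synthesis request. The molecule names and both scoring functions are hypothetical placeholders.

```python
from typing import Callable, List, Sequence, Tuple

def screening_funnel(library: Sequence[str],
                     cheap_score: Callable[[str], float],
                     expensive_score: Callable[[str], float],
                     keep_fraction: float = 0.1,
                     n_leads: int = 3) -> Tuple[List[str], int]:
    """Two-stage virtual screen.

    Stage 1 ranks the whole library with a cheap score; stage 2 spends the
    expensive evaluation only on the top `keep_fraction`. Returns the best
    `n_leads` candidates and the number of expensive evaluations used."""
    ranked = sorted(library, key=cheap_score, reverse=True)
    shortlist = ranked[:max(1, int(len(ranked) * keep_fraction))]
    leads = sorted(shortlist, key=expensive_score, reverse=True)[:n_leads]
    return leads, len(shortlist)

def cheap_proxy(name: str) -> float:
    """Hypothetical fast descriptor-based score, correlated with the target."""
    return float(int(name[3:]))

def expensive_truth(name: str) -> float:
    """Hypothetical costly evaluation whose optimum is molecule 950."""
    return -abs(int(name[3:]) - 950)

library = [f"mol{i}" for i in range(1000)]
leads, calls = screening_funnel(library, cheap_proxy, expensive_truth)
print(leads, calls)
```

The funnel only works as well as the cheap proxy: any good candidate that the first stage ranks low never reaches the expensive evaluation.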
At the beginning of the 21st century, these ML methods became conventional tools for classifying results from HTS. On average, the ML methods showed a several-fold improvement in hit rates compared with random or expert-based HTS. It was found that many existing molecular libraries used in HTS are not diverse enough and contain multiple clusters of molecules. Learning properly from these unbalanced data sets and evaluating the ML methods required additional balancing and reweighting of the components. As an

4. DEEP LEARNING IN CHEMISTRY

In recent years, we have witnessed major breakthroughs in various applications of deep learning. One of the most famous examples is AlphaGo,57 which outperformed the best human players in the complex and abstract board game Go. The main driver of recent developments in deep learning is the vast increase in computational resources following Moore’s law. Today, even standard desktop computers can reach tera floating-point operations per second (TFLOPS), and hardware tailored for deep learning, such as Google’s tensor processing units (TPUs) or the Tesla V100 graphics processing unit (GPU) (up to 125 TFLOPS tensor performance), is commercially available. The fast-paced advances in deep learning have been quickly adopted by the field of chemistry for small-molecule design.58−60 Successes include (1) the prediction of binding activities of small molecules,61−63 (2) sophisticated AI software for reaction prediction64,65 and reaction route planning,7 and (3) the inverse design of small molecules.60,66 New opportunities have emerged in recent years with the advent of deep generative models67−69 originally developed for text and images. Deep generative models offer the prospect of a paradigm shift from traditional forward design and virtual screening of combinatorial libraries toward a more diverse, yet focused, exploration of chemical space for applications ranging from de novo drug discovery to small-molecule-based materials for organic photovoltaics or energy storage, among others. Such an inverse-design pipeline needs to (1) learn the rules of chemistry to generate valid chemical structures, (2) efficiently evaluate the molecular properties of newly generated structures, and (3) quickly identify the relevant chemical space, resulting in either focused libraries or a small set of lead candidates for experimental testing.
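The three requirements above can be illustrated with a deliberately simple sketch of one design round: take generated candidates, keep the syntactically plausible ones, score them, and return the top-k. The function `passes_syntax_check` is a crude, hypothetical stand-in (balanced branches and paired single-digit ring closures only; it ignores bracket atoms and %nn closures) and is no substitute for chemical validation with a cheminformatics toolkit. The candidate list and the oxygen-count score are also hypothetical.

```python
from typing import Callable, List

def passes_syntax_check(smiles: str) -> bool:
    """Crude SMILES sanity check: balanced branches and paired ring-closure digits.

    No valence, aromaticity, or bracket-atom handling; illustrative only."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    if depth != 0:
        return False
    # Each single-digit ring closure must be opened and closed (even count).
    return all(smiles.count(d) % 2 == 0 for d in "0123456789")

def design_round(generate: Callable[[], List[str]],
                 score: Callable[[str], float], k: int) -> List[str]:
    """(1) generate and filter candidates, (2) evaluate them, (3) keep the top k."""
    valid = [s for s in generate() if passes_syntax_check(s)]
    return sorted(valid, key=score, reverse=True)[:k]

def toy_generator() -> List[str]:
    """Hypothetical generator output, including two malformed strings."""
    return ["c1ccccc1O", "C1CC", "CC(=O)O", "CC(C", "OCCO"]

leads = design_round(toy_generator, score=lambda s: s.count("O"), k=2)
print(leads)
```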
Since molecules can be represented as Simplified Molecular Input Line Entry System (SMILES) character strings, recurrent neural networks for sequence and text generation70−73 serve as natural frameworks for algorithms where the


computer “dreams” up new molecular libraries. Segler et al.74 trained a long short-term memory (LSTM) recurrent neural network (RNN) on a large library of 1.4 million molecules extracted from the ChEMBL database. About 900 000 new molecules were generated by sampling 50 000 000 SMILES symbols. The properties of the generated molecules, such as the number of hydrogen-bond donors/acceptors, the octanol−water partition coefficient (log P), and the topological polar surface area, resemble those of the training set. In the first step, the LSTM-RNN model learns to reproduce valid SMILES strings; in the second step, transfer learning is applied to shift the distribution of the molecular properties toward the generation of compounds active against certain targets. Because the model is already pretrained, fewer examples of known active compounds (∼1000) are sufficient to retrain it to generate more focused libraries of novel potential drug candidates. A similar approach was used by Merk et al.75 for the de novo design of agonists of therapeutically relevant retinoid X and/or peroxisome proliferator-activated receptors. Five out of 49 high-scoring lead candidates extracted by the deep-learning approach were experimentally tested, and 4 of them did indeed show considerable potency. While focused library generation has been shown to yield a diverse set of molecules, resulting in experimentally verified novel bioactive compounds,75,76 the covered chemical space is inherently restricted by the training set used. One way to push the distribution of generated molecules outside the chemical space of the training set is to couple sequence-based generative models with policy-based reinforcement learning.77,78 In ref 78, a prior RNN was trained on 1.5 million compounds from the ChEMBL database, while another RNN was used as the agent network.
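To make the sample-then-fine-tune idea concrete without a deep learning framework, the sketch below swaps the LSTM for a character bigram model, a far weaker model chosen only for brevity. "Pretraining" counts character transitions on a general SMILES set, and "transfer learning" is imitated by adding up-weighted counts from a small focused set. The SMILES lists, the weight of 5.0, and the `^`/`$` delimiters are hypothetical choices.

```python
import random
from collections import defaultdict
from typing import DefaultDict, List, Optional

START, END = "^", "$"  # sequence delimiters (assumed not to occur in the SMILES)
Counts = DefaultDict[str, DefaultDict[str, float]]

def count_bigrams(smiles_list: List[str], weight: float = 1.0,
                  counts: Optional[Counts] = None) -> Counts:
    """Accumulate weighted next-character counts; reusing `counts` mimics fine-tuning."""
    if counts is None:
        counts = defaultdict(lambda: defaultdict(float))
    for s in smiles_list:
        seq = START + s + END
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += weight
    return counts

def sample(counts: Counts, rng: random.Random, max_len: int = 40) -> str:
    """Draw characters one at a time from the bigram distribution until END."""
    out: List[str] = []
    prev = START
    while len(out) < max_len:
        successors = counts[prev]
        nxt = rng.choices(list(successors), weights=list(successors.values()))[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)

# Hypothetical toy data standing in for ChEMBL-scale pretraining and a small active set.
general = ["CCO", "CCN", "c1ccccc1", "CC(=O)O", "CCCl"]
focused = ["c1ccncc1", "c1ccccc1N"]

model = count_bigrams(general)                            # "pretraining"
model = count_bigrams(focused, weight=5.0, counts=model)  # "fine-tuning" analogue
rng = random.Random(0)
print([sample(model, rng) for _ in range(5)])
```

Unlike an LSTM, a bigram model only sees the previous character, so it cannot track ring closures or branches; many samples will be invalid SMILES, which is exactly the limitation that motivates the recurrent and grammar-aware models discussed in this section.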
In this approach, generation is framed as a Markov decision process: the agent chooses the next character in the sequence when generating the SMILES string. During training, the agent policy is updated to maximize its expected return, given by an application-specific scoring function evaluated on the generated character sequence. The model has been successfully demonstrated for various tasks, ranging from avoiding certain elements in the generated molecules to producing new compounds that are predicted to be biologically active. Reinforcement learning for inverse molecular design has also been applied in combination with Generative Adversarial Networks (GANs) for molecular structure generation.79,80 Both stochastic and deterministic policy gradients have been used. Compared with RNNs for character sequence generation, GANs offer more flexibility in the representation of molecular structures. MolGAN,80 for example, represents molecules as graphs. Another approach to the inverse design of small molecules, pioneered by Gómez-Bombarelli et al.,66 uses variational autoencoders (VAEs). An encoder NN compresses molecular space into a continuous vector space representation. Points in this so-called latent space can then be decoded back into molecules. By training the VAE jointly with a model for property prediction, the latent space provides a smooth and continuous representation of both structures and properties. This facilitates Bayesian optimization in the latent space for the de novo design of molecules with the desired properties. In a simple example, Gómez-Bombarelli et al. demonstrated their concept for the design of druglike compounds that are easy to synthesize. Here, the authors trained the VAE on 250 000 molecules extracted from the ZINC database and estimated that their VAE architecture could

potentially generate approximately 7.5 million distinct structures. Since generative models such as autoencoders do not depend on hand-coded rules, they need to learn chemistry and the syntax of molecular representations, for example, SMILES strings. This can be a challenge, and VAEs often produce a large fraction of invalid molecules. This issue has been addressed by various improvements to autoencoder architectures for small molecules. For example, Kusner et al.81 introduced a grammar variational autoencoder (GVAE) to generate molecules with valid SMILES syntax. However, the SMILES language is not entirely context-free, which becomes a problem when ring bonds are involved. This has led to the development of a syntax-directed VAE,82 which applies on-the-fly semantic validation by implementing stochastically lazy attributes. Besides SMILES strings, graphs have also been used to generate diverse libraries of molecules.83,84 To date, the best-performing VAE for generating chemically valid molecules is the Junction Tree Variational Autoencoder,85 which represents molecules as graphs that are decomposed into subgraphs and smaller building blocks. Each molecule is then described as a tree-structured scaffold over chemical substructures. Further, a variety of additional methods for inverse design have recently emerged. For example, ChemTS86 uses a Monte Carlo tree search approach in which an RNN is trained as the rollout policy. Molecules are represented as SMILES. The approach was successfully demonstrated for the design of molecules that maximize the octanol−water partition coefficient (log P) while simultaneously optimizing synthetic accessibility, with an additional penalty score to avoid the generation of molecules with large rings. Another recent approach to inverse design is Bayesian molecular design.87 Here, ML models are trained to predict molecular properties from structure (forward structure−property relationships).
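In Bayesian molecular design, the forward models supply a likelihood and a chemical language model supplies a prior over structures, and Bayes' law combines the two into a posterior over structures given the target properties. Writing S for a structure (e.g., a SMILES string), Y for its properties, and U for the target property region, a hedged summary in our own notation (not necessarily the exact formulation of ref 87) is:

```latex
p(S \mid Y \in U) \;=\; \frac{p(Y \in U \mid S)\, p(S)}{p(Y \in U)} \;\propto\; p(Y \in U \mid S)\, p(S)
```

Because the normalizing constant p(Y ∈ U) does not depend on S, candidate structures can be sampled from the posterior, for example with Monte Carlo methods, without computing it.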
Then Bayes’ law is used to derive a posterior distribution for backward prediction. Molecular generation is based on the SMILES representation, and a chemical language model is trained separately. A summary of the deep learning methods in chemistry discussed here can be found in Table 1.

Table 1. Deep Learning in Chemistry

acronym | description
RNN LSTM | Long short-term memory recurrent neural network for generating focused molecular libraries.74
GAN | Generative Adversarial Networks for molecular structure generation.79,80
MolGAN | Generative Adversarial Network that represents molecules as graphs.80
VAE | Variational autoencoders for the inverse design of small molecules.66
GVAE | Grammar variational autoencoder to generate molecules with valid SMILES syntax.81
SD-VAE | Syntax-directed variational autoencoder, which applies on-the-fly semantic validation of the generated SMILES.82
JTVAE | Junction Tree Variational Autoencoder that represents molecules as graphs.85
ChemTS | Combines recurrent neural networks and Monte Carlo tree search for de novo molecular design.86
Bayesian molecular design | Inverse molecular design using Bayesian statistics combined with a chemical language model.87
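Of the methods in Table 1, ChemTS is the easiest to sketch without a deep learning framework. The minimal sketch below keeps the four phases of Monte Carlo tree search (UCT selection, expansion, rollout, backpropagation) but replaces the trained RNN rollout policy with uniformly random rollouts over a hypothetical three-token alphabet, and it scores sequences with a hypothetical reward (the fraction of "N" tokens) instead of log P and synthetic accessibility. It illustrates the search loop, not the published implementation.

```python
import math
import random
from typing import Callable, Dict, Optional, Tuple

ALPHABET = "CNO"  # hypothetical toy token set (ChemTS works with SMILES tokens)
MAX_LEN = 5       # hypothetical fixed sequence length

class Node:
    """A tree node holding a token prefix and its visit statistics."""
    def __init__(self, prefix: str, parent: Optional["Node"] = None):
        self.prefix = prefix
        self.parent = parent
        self.children: Dict[str, "Node"] = {}
        self.visits = 0
        self.total = 0.0

    def best_uct_child(self, c: float = 1.4) -> "Node":
        """UCT: mean reward plus an exploration bonus for rarely visited children."""
        return max(self.children.values(),
                   key=lambda ch: ch.total / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))

def mcts(reward: Callable[[str], float], iterations: int,
         rng: random.Random) -> Tuple[str, float]:
    root = Node("")
    best: Tuple[str, float] = ("", -math.inf)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded, non-terminal nodes.
        while len(node.prefix) < MAX_LEN and len(node.children) == len(ALPHABET):
            node = node.best_uct_child()
        # 2. Expansion: add one untried token below a non-terminal node.
        if len(node.prefix) < MAX_LEN:
            token = rng.choice([t for t in ALPHABET if t not in node.children])
            node.children[token] = Node(node.prefix + token, node)
            node = node.children[token]
        # 3. Rollout: finish the sequence at random (ChemTS uses a trained RNN here).
        tail = "".join(rng.choice(ALPHABET) for _ in range(MAX_LEN - len(node.prefix)))
        seq = node.prefix + tail
        r = reward(seq)
        if r > best[1]:
            best = (seq, r)
        # 4. Backpropagation: update statistics from the new node up to the root.
        while node is not None:
            node.visits += 1
            node.total += r
            node = node.parent
    return best

def toy_reward(seq: str) -> float:
    """Hypothetical objective: fraction of 'N' tokens (stands in for log P etc.)."""
    return seq.count("N") / MAX_LEN

best = mcts(toy_reward, iterations=3000, rng=random.Random(0))
print(best)
```

The exploration constant `c` trades off revisiting high-scoring prefixes against trying rarely visited ones; 1.4 is a common default, not a value taken from ref 86.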


5. AUTONOMOUS DISCOVERY SYSTEMS

The progress in automation and deep learning algorithms described in the previous sections has enabled the design of autonomous discovery systems. In particular, autonomous molecular discovery systems must go beyond automating the synthesis and/or the characterization of potential compounds. Such a system needs to (1) generate hypotheses, (2) test them, and (3) adapt these hypotheses by performing automated experiments. Adam and Eve, designed to generate and test hypotheses in a closed-loop cycle using laboratory automation in the field of biomedical research,89,90 are considered among the first autonomous systems. Adam produced new scientific knowledge by analyzing genes and enzyme functions of yeast.89,91−93 To propose hypotheses, Adam used logic programming to represent background knowledge, where the metabolism of yeast, including most of the genes, proteins, enzymic functions, and metabolites, was modeled as a directed, labeled hypergraph.89 In a closed loop, Adam then applied abductive reasoning to form hypotheses, used active learning to select experiments, generated experimental data by measuring the optical density of the yeast cultures, and then tested the hypotheses using decision tree and random forest algorithms.89 The combination of this software with Adam’s automated hardware, which includes robotic arms operating laboratory equipment such as a liquid handler and a plate reader, allowed Adam to perform microbial experiments autonomously.89 While Adam was designed to investigate genes and enzyme functions, Eve specializes in early-stage screening and design of drugs that target neglected Third World diseases.88−94 Eve, shown in Figure 3, comprises three types

Several academic research teams are driving progress in the design of autonomous chemical synthesis systems for specific applications.97−110 There are also a number of automated platforms in pharmaceutical companies. However, because these platforms are often proprietary rather than open source, their current state of the art is less transparent, and it is difficult to evaluate how autonomous they are. In the following, we highlight selected examples from five research groups (1−5), some of which are shown in Figure 4. A summary of the discussed systems can be found in Table 2. (1) Remarkable autonomous systems have been developed in the laboratory of Lee Cronin. These include an organic synthesis system,111 a dropfactory system,112 and the Chemputer,113 the last of which is shown in Figure 4a. The autonomous organic synthesis system,111 with its liquid-handling robot and inline spectroscopy tools (an NMR spectrometer, a mass spectrometer, and an infrared spectrometer), performs chemical reactions, analyzes the products, and then uses real-time data processing and feedback to predict the reactivity of chemical compounds. The system combines machine learning, robotics, and real-time feedback with information provided by human experts, who initially performed the reactions of 72 mixtures manually and labeled their reactivity to train a support vector machine (SVM) classifier. In this way, the system autonomously explored the chemical reactivity landscape of about 1000 reaction combinations,111 and the researchers discovered four new chemical reactions. Discovery of molecules with targeted properties requires algorithms that can search outside the local chemical space.
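The workflow described above for the organic synthesis robot, in which a support vector machine trained on a small human-labeled set is used to judge untested combinations, can be sketched as follows. This is a minimal illustration under stated assumptions, not the published implementation: each reagent mixture is encoded by two hypothetical numeric descriptors, reactivity labels are +1 (reactive) or −1 (unreactive), and the classifier is a plain linear SVM trained with the Pegasos subgradient method in pure Python. Untested candidates are then ranked by their signed decision value.

```python
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained with the Pegasos method.

    X: list of feature vectors; y: labels in {+1, -1}.
    A constant 1.0 feature is appended so the bias is learned as a weight.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)                      # Pegasos step size
            margin = label * _dot(w, x)                # uses w before this update
            w = [(1.0 - eta * lam) * wj for wj in w]   # shrink: L2 regularization
            if margin < 1.0:                           # hinge loss is active
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def decision_value(w, x):
    """Signed score; positive means 'predicted reactive'."""
    return _dot(w, list(x) + [1.0])


def rank_candidates(w, candidates):
    """Order untested candidates from most to least confidently reactive."""
    return sorted(candidates, key=lambda c: decision_value(w, c), reverse=True)


# Hypothetical seed set: two descriptors per reagent mixture, labeled by a chemist.
X_seed = [[2.0, 0.5], [1.5, 1.0], [2.5, -0.5], [-2.0, 0.3], [-1.5, -1.0], [-2.5, 0.5]]
y_seed = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X_seed, y_seed)
print(rank_candidates(w, [[0.5, 0.0], [-3.0, 0.0], [3.0, 0.0]]))
```

A real system would use richer descriptors and possibly a kernel SVM; the linear model is chosen here only to keep the example dependency-free.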
Using a curiosity-driven algorithm to investigate the behavior of oil-in-water proto-cell droplets in a closed-loop cycle, the autonomous dropfactory system112 identified and classified modes of proto-cell droplet motion, among other behaviors. In the same research group, the Chemputer,113 a modular system driven by a chemical programming language, synthesized three pharmaceutical compounds: Nytol, rufinamide, and sildenafil. (2) Researchers from the group of Steven Ley developed a flow system controlled by the LeyLab software that enables monitoring and control of chemical reactions, automation of synthetic procedures, and autonomous self-optimization of reaction parameters. The system optimized a heterogeneous catalytic reaction over a three-dimensional parameter space and an Appel reaction over a five-dimensional one.107 Further, the group demonstrated a cloud-based solution that enables scientific collaboration across the globe. For the synthesis of the active pharmaceutical ingredients tramadol, lidocaine, and bupropion, researchers in the USA remotely initiated, monitored, and controlled an experimental setup stationed in a laboratory in the U.K., including the chemicals and a self-optimizing continuous flow system with inline IR monitoring (Figure 4b). (3) The autonomous self-optimizing reactor designed in the group of Klavs F. Jensen uses a design-of-experiments (DOE)-based adaptive response-surface algorithm to optimize simultaneously discrete variables, such as the choice of catalyst or solvent, and continuous variables, such as temperature, reaction time, and concentration. The reactor was used for precatalyst selection in Suzuki−Miyaura cross-couplings and for the optimization of an alkylation reaction. Among ten different solvents and three continuous variables,98,99 the

Figure 3. Eve autonomous system. The figure is adapted from ref 88 and used under CC BY 4.0.

of liquid handlers, two microplate readers, and an automated cellular imager, all operated by an active learning algorithm.88,95 The system supports cellular growth assays, cell-based chemical compound screening assays, and cellular morphology assays.90 Eve can perform scientific investigations that go beyond standard library screening and is capable of hit confirmation and lead generation.96 While Eve demonstrates the possibilities of active learning in compound screening, it was, like Adam, not designed to synthesize chemicals.88
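Eve's use of active learning to decide which compounds to assay next can be illustrated with the simplest pool-based strategy, uncertainty sampling. The sketch below is a toy under explicit assumptions, not Eve's algorithm: compounds are described by a single hypothetical numeric descriptor, the assay outcome is a binary hit label, the model is a small logistic regression fitted by gradient descent, and the next compound to test is the one whose predicted hit probability is closest to 0.5, i.e., where the model is least certain.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Predicted probability that compound x is a hit."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.5, steps=2000):
    """Logistic regression by batch gradient descent; labels y in {0, 1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, label in zip(X, y):
            err = predict_proba(w, b, x) - label   # d(log-loss)/d(logit)
            grad_w = [g + err * xj for g, xj in zip(grad_w, x)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def query_most_uncertain(w, b, pool):
    """Uncertainty sampling: the untested compound with p(hit) closest to 0.5."""
    return min(pool, key=lambda x: abs(predict_proba(w, b, x) - 0.5))


# Hypothetical one-descriptor compounds already assayed (1 = hit).
X_tested = [[-2.0], [-1.0], [1.0], [2.0]]
y_tested = [0, 0, 1, 1]
w, b = fit_logistic(X_tested, y_tested)
next_compound = query_most_uncertain(w, b, [[-3.0], [0.1], [3.0]])
print(next_compound)
```

Here the compound at 0.1, which lies near the learned decision boundary, is selected, while the two compounds far from the boundary are not, because assaying them would teach the model little.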


Figure 4. Examples of modern chemical synthesis systems with autonomous control. (a) Chemputer designed in the group of Lee Cronin, from ref 113. Reprinted with permission from AAAS. (b) Remotely controlled FlowIR developed in the group of Steven Ley. Figure adapted from ref 115 and used under CC BY 4.0. (c) The plug-and-play system designed in the group of Klavs F. Jensen. Figure adapted from ref 100. Reprinted with permission from AAAS.

C−C and C−N cross-couplings, olefinations, reductive aminations, nucleophilic aromatic substitutions (SNAr), photoredox catalysis, and multistep sequences. (4) Researchers in the group of François-Xavier Felpin110 developed an autonomous self-optimizing flow reactor with a flexible monitoring system, controlled by a custom optimization algorithm derived from the Nelder−Mead and golden-section search methods. The system performed a multistep total synthesis of the natural product carpanone.110 (5) Autonomous closed-loop platforms have also been designed for the synthesis of single-walled carbon nanotubes by the team of Benji Maruyama at the Air Force Research Laboratory (AFRL). Their Autonomous Research System (ARES) autonomously learned to achieve targeted growth rates.117 These are just a few selected examples of specialized systems with autonomous control. Additional discussions of autonomous systems and their specific applications in energy and drug discovery can be found in refs 12, 110, and 118. Note that the boundary between the automated systems with optimization loops described in the previous section and the autonomous systems discussed here is very thin. Most importantly, an optimization or search algorithm lies at the heart of every autonomous system. This algorithm learns an objective function, i.e., a function that describes the experimental outcome, such as the yield of a chemical reaction, in terms of a set of external parameters.119 By feeding experimental data into supervised or unsupervised machine learning algorithms, the dependence of the desired properties on the experimental conditions can be learned in real time, allowing the autonomous system to explore the parameter space of the experiment it is designed for.
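The closed loop described above, in which an optimizer proposes conditions and each automated experiment returns a value of the objective function, can be illustrated with the textbook Nelder−Mead simplex method. The Felpin reactor's algorithm is described only as derived from Nelder−Mead and golden-section searches, so this pure-Python sketch is not that algorithm; it shows just the standard Nelder−Mead steps (reflection, expansion, contraction, shrink). The yield model `run_experiment`, with its optimum at 60 °C and 5 min, is hypothetical and stands in for a real reactor run.

```python
def nelder_mead(f, x0, step=1.0, tol=1e-9, max_iter=1000):
    """Minimize f with the Nelder-Mead simplex method (standard coefficients:
    reflection 1, expansion 2, contraction 0.5, shrink 0.5)."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]

    for _ in range(max_iter):
        # Order vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if values[-1] - values[0] < tol:
            break

        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def along(coef):
            # Point centroid + coef * (centroid - worst)
            return [c + coef * (c - w) for c, w in zip(centroid, worst)]

        xr = along(1.0)
        fr = f(xr)
        if fr < values[0]:                       # try expanding further
            xe = along(2.0)
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[-2]:                    # accept the reflection
            simplex[-1], values[-1] = xr, fr
        else:
            # Outside contraction if the reflection beat the worst point, else inside.
            xc = along(0.5) if fr < values[-1] else along(-0.5)
            fc = f(xc)
            if fc < min(fr, values[-1]):
                simplex[-1], values[-1] = xc, fc
            else:                                # shrink toward the best vertex
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (p - b) for b, p in zip(best, v)]
                                    for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]

    k = min(range(n + 1), key=lambda k: values[k])
    return simplex[k], values[k]


def run_experiment(conditions):
    """Hypothetical stand-in for one automated run: returns yield in percent."""
    temperature_c, residence_min = conditions
    return 90.0 - (temperature_c - 60.0) ** 2 / 10.0 - (residence_min - 5.0) ** 2


# Nelder-Mead minimizes, so the objective passed in is the negative yield.
best, neg_yield = nelder_mead(lambda c: -run_experiment(c), [40.0, 2.0])
print(best, -neg_yield)
```

Every call to `run_experiment` corresponds to one experiment, so in practice the number of function evaluations, not CPU time, is the cost that matters.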
ML algorithms can also be used to understand the general energy landscape of the studied systems.120−122 While supervised algorithms use labeled data to train the model, unsupervised

Table 2. Autonomous Discovery Systems (system: description)
Adam89,91−93: Autonomous system for biomedical research that analyzes gene and enzyme functions of yeast.
Eve88: Autonomous system for early-stage screening and design of drugs targeting Third World diseases.
Organic synthesis system111: Autonomous organic synthesis system that predicts the reactivity of chemical compounds.
Dropfactory system112: Autonomous system that explores the behavior of oil-in-water proto-cell droplets.
Chemputer113: Modular system that synthesized Nytol, rufinamide, and sildenafil.
LeyLab flow system107: Autonomous flow system that optimized catalytic reactions.
FlowIR114,115: Cloud-based flow system that synthesized tramadol, lidocaine, and bupropion.
Reactor98,99: Self-optimizing reactor that optimizes discrete and continuous parameters.
Plug-and-play system100: Plug-and-play system with interchangeable modules for a variety of chemical reactions.
Flow reactor110: Autonomous self-optimizing flow reactor that performed a multistep synthesis of carpanone.
ChemOS116: Software package that handles the workflow of autonomous platforms; used to explore color and cocktail spaces.
ARES117: Autonomous Research System for the synthesis of single-walled carbon nanotubes.

reactor identified the solvents and reaction conditions that maximized the yield of the monoalkylated product. The same research group also addressed a major drawback of continuous-flow chemical synthesis systems, namely that such systems are often built for very specific chemical reactions and/or targets.100 The researchers solved this problem by designing a plug-and-play system with interchangeable modules, shown in Figure 4c. This reconfigurable system can optimize a wide variety of chemical reactions, including high-yielding implementations of


single experiments. Future autonomous systems should also address the balance between integration and modularity. Highly integrated characterization tools, such as NMR spectrometers and HPLC/MS, are commonly used in automated experimentation. However, the system as a whole should be modular, so that each unit can be added or removed without reprogramming the entire software. Theoretical and computational tests of deep ML methods, especially generative models, show multiple advantages over previously used ML tools. However, deep learning techniques still have to be implemented and tested in complete discovery pipelines. One issue here may be the general hype around deep learning. Deep learning is most likely not a magic tool that will eventually solve all problems of materials discovery. It is, however, applicable to some intermediate steps, for example, providing ways to sample the chemical space properly and completely enough. The overall goal is to increase the hit rate rather than just the size of the libraries. A related open question is how to evaluate and compare the performance of deep learning methods. Beyond the questions raised above, it remains unclear how fast deep learning models for autonomous systems and platforms can learn from experiments. The conventional paradigm of deep learning is that models are trained on tremendous amounts of data. This is not an issue for social-network data or image processing. In automated chemical synthesis, however, each data point can be expensive, and its generation is limited by the throughput of the synthesis and characterization systems. How much experimental data is enough to learn from measurements with active feedback? Moreover, these data can be obtained with different levels of precision.
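One simple way to frame the choice among measurement precisions is to compare the expected reduction in uncertainty per unit cost. The sketch below assumes a Gaussian belief (prior variance s0^2) about the property being measured and Gaussian measurement noise (variance s^2), for which one measurement leaves a posterior variance of 1/(1/s0^2 + 1/s^2). The option names, noise levels, and costs are hypothetical, and this value-per-cost rule is only one heuristic among many possible multi-fidelity strategies.

```python
from dataclasses import dataclass


@dataclass
class MeasurementOption:
    name: str
    noise_var: float   # variance of the measurement error (assumed Gaussian)
    cost: float        # e.g., instrument time in arbitrary units


def variance_reduction(prior_var, noise_var):
    """Drop in variance of a Gaussian belief after one Gaussian measurement."""
    posterior_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    return prior_var - posterior_var


def choose_precision(prior_var, options):
    """Pick the option with the largest variance reduction per unit cost."""
    return max(options, key=lambda o: variance_reduction(prior_var, o.noise_var) / o.cost)


options = [
    MeasurementOption("quick screen", noise_var=1.0, cost=1.0),
    MeasurementOption("precise assay", noise_var=0.01, cost=10.0),
]
print(choose_precision(1.0, options).name)    # broad prior
print(choose_precision(0.04, options).name)   # already fairly certain
```

With these numbers the quick screen is preferred while the prior is broad (variance 1.0), and the precise assay is preferred once the prior variance has dropped to 0.04, because the cheap measurement can then barely improve on what is already known.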
Active learning algorithms should automatically decide which precision to use, balancing the cost of an experiment against the value it brings to training the model. One of the main advantages of deep learning is the ability to train a model on raw, unprocessed data. However, this cannot be applied directly to learning molecular properties. In that case, molecular information must first be digitized and written in the form of molecular descriptors. Therefore, a deep learning model can learn only as much information as is encoded in the descriptors. We are in an exciting era, in which human−robot cooperation is redefining the scientific discovery process and accelerating the discovery and synthesis of molecules with targeted properties. Generalizing autonomous platforms for molecular discovery will require us to rethink existing platform designs, addressing several challenges on the hardware and software sides as well as at the interface between them.
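The statement that a model can learn only what its descriptors encode is easy to demonstrate with a deliberately crude descriptor. The hypothetical helper below counts a few heavy-atom types in simple SMILES strings (bracket atoms, aromatic lowercase atoms, and charges are ignored). Ethanol (CCO) and dimethyl ether (COC) are different molecules with different properties, yet they map to the same descriptor vector, so no model trained on this representation could distinguish them.

```python
import re
from collections import Counter

# Two-letter organic-subset atoms are tried first so "Cl" is not read as "C" + "l".
# Bracket atoms, aromatic lowercase atoms, and charges are ignored in this toy.
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]")


def atom_count_descriptor(smiles, elements=("C", "N", "O", "S", "Cl")):
    """Return heavy-atom counts as a fixed-length tuple (hypothetical toy descriptor)."""
    counts = Counter(ATOM_PATTERN.findall(smiles))
    return tuple(counts[e] for e in elements)


ethanol = atom_count_descriptor("CCO")
dimethyl_ether = atom_count_descriptor("COC")
print(ethanol, dimethyl_ether, ethanol == dimethyl_ether)
```

Richer descriptors (connectivity-based fingerprints, graphs, 3D structure) reduce but do not remove this loss; whatever the encoding leaves out stays invisible to the model.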

models have proven able to uncover nontrivial patterns in complex data. Promising algorithms for operating autonomous platforms include Bayesian deep learning, Bayesian conditional generative adversarial networks,123 and deep Bayesian optimization,119 among others. For instance, ChemOS,116 a software package that handles the workflow of autonomous platforms, combines Bayesian optimization119 with laboratory instruments such as high-performance liquid chromatography (HPLC) to explore color and cocktail chemical spaces.116 Each synthesis and characterization step performed by such an autonomous discovery system adds new information, which the feedback loop then uses to decide on the next synthesis step, guiding the experimental search for new molecules. The vision is to move from autonomous systems toward the more general concept of autonomous molecular discovery platforms that target a broad range of chemical reactions. Therefore, to synthesize specific molecules with desired properties, an autonomous platform should be as flexible as possible in design on both ends, i.e., in hardware and in software. This demands modular, interchangeable monitoring and synthesis systems. In addition, the algorithm that controls the system should be robust enough to optimize an objective function within a chemical search space spanned by discrete and continuous variables and flexible enough to explore the molecular design space. All of the autonomous systems outlined above demonstrate how combining robotics and machine learning algorithms can lead to new scientific understanding and discovery. Each of these systems tackles a different aspect of the problem, bringing us closer to a generalized autonomous molecular discovery platform and, ultimately, to platforms that can autonomously dream up new materials at the push of a button.
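The feedback loop described here, in which each new measurement updates a model that then chooses the next experiment, can be sketched as a generic Bayesian optimization loop in pure Python. This is not ChemOS's own optimizer; it assumes a zero-mean Gaussian-process surrogate with a squared-exponential kernel and fixed, hand-picked hyperparameters, the expected-improvement acquisition function, a one-dimensional grid of candidate conditions, and a hypothetical `run_experiment` function standing in for a synthesis-and-characterization step.

```python
import math


def rbf(a, b, length=0.2):
    """Squared-exponential kernel with unit signal variance (hand-picked length)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, rhs):
    """Forward substitution: solve L x = rhs."""
    x = []
    for i, r in enumerate(rhs):
        x.append((r - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def gp_fit(xs, ys, noise=1e-6):
    """Factorize K + noise*I once per iteration of the loop."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    return L, solve_lower(L, ys)                       # (L, L^-1 y)


def gp_predict(xs, L, Linv_y, x_new):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    v = solve_lower(L, [rbf(a, x_new) for a in xs])    # L^-1 k*
    mean = sum(p * q for p, q in zip(v, Linv_y))       # k*^T K^-1 y
    var = 1.0 - sum(p * p for p in v)                  # k(x,x) - k*^T K^-1 k*
    return mean, math.sqrt(max(var, 0.0))


def expected_improvement(mean, std, best, xi=0.01):
    """EI for maximization: E[max(f - best - xi, 0)] under N(mean, std^2)."""
    if std == 0.0:
        return 0.0
    gap = mean - best - xi
    z = gap / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return gap * cdf + std * pdf


def bayesian_optimize(run_experiment, candidates, initial, n_iter=15):
    xs = list(initial)
    ys = [run_experiment(x) for x in xs]
    for _ in range(n_iter):
        L, Linv_y = gp_fit(xs, ys)
        best = max(ys)
        scores = [expected_improvement(*gp_predict(xs, L, Linv_y, c), best)
                  for c in candidates]
        x_next = candidates[scores.index(max(scores))]
        xs.append(x_next)
        ys.append(run_experiment(x_next))   # one more automated experiment
    i = ys.index(max(ys))
    return xs[i], ys[i]


def run_experiment(x):
    """Hypothetical objective (e.g., a normalized figure of merit) vs. one condition."""
    return -(x - 0.3) ** 2


grid = [i / 100 for i in range(101)]
best_x, best_y = bayesian_optimize(run_experiment, grid, initial=[0.0, 1.0])
print(best_x, best_y)
```

In a real platform the kernel hyperparameters would normally be fitted to the data, and the search space would typically be multidimensional and may mix discrete and continuous variables.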

6. LOOKING AHEAD
Chemistry automation has not progressed continuously; its history has instead involved periods of stagnation and the exploration of various designs. Automation in chemistry still lacks the desired versatility. Currently, most automated systems are limited to specific sets of chemical reactions, and adapting them from one reaction to another may require considerable effort. Synthetic platforms should be reaction-agnostic and easily reconfigurable. Systems focused on product versatility are useful for generating large molecular libraries. However, to push the envelope of smart synthesis systems, process versatility should also be explored in depth. Flexible software access to devices is at the heart of any autonomous system. Many characterization and synthesis devices are available on the market as modular tools. Unfortunately, not all of them can yet be easily controlled by third-party software. Ideally, chemical synthesis and characterization systems should adopt a general robotic standard for their interfaces with external devices. The field of robotics is entering a phase in which robotic systems can cooperate safely with humans or even learn from them. Such human−robot cooperation has yet to be adopted in chemistry automation and brings many opportunities and challenges. On the one hand, graphical user interfaces (GUIs) are practical for human researchers operating instruments or testing hypotheses but become impractical for automated control systems. On the other hand, application programming interfaces (APIs) are practical for software control of devices but not flexible enough to use for



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

ORCID

Tanja Dimitrov: 0000-0002-5675-7825
Alán Aspuru-Guzik: 0000-0002-8277-4434
Semion K. Saikin: 0000-0003-1924-3961

Notes

The authors declare no competing financial interest.


ACKNOWLEDGMENTS
A.A.-G. thanks Dr. Anders G. Frøseth for his generous support.

REFERENCES

(1) Bohacek, R. S.; McMartin, C.; Guida, W. C. The Art and Practice of Structure-based Drug Design: A Molecular Modeling Perspective. Med. Res. Rev. 1996, 16, 3−50.
(2) Ertl, P. Cheminformatics Analysis of Organic Substituents: Identification of the Most Common Substituents, Calculation of Substituent Properties, and Automatic Identification of Drug-like Bioisosteric Groups. J. Chem. Inf. Comput. Sci. 2003, 43, 374−380.
(3) Polishchuk, P.; Madzhidov, T.; Varnek, A. Estimation of the Size of Drug-like Chemical Space Based on GDB-17 Data. J. Comput.-Aided Mol. Des. 2013, 27, 675−679.
(4) LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436.
(5) Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R. P.; de Freitas, N. Taking the Human Out of the Loop: A Review of Bayesian Optimization. Proc. IEEE 2016, 104, 148.
(6) Gómez-Bombarelli, R.; Aguilera-Iparraguirre, J.; Hirzel, T.; Duvenaud, D.; Maclaurin, D.; Blood-Forsythe, M. A.; Chae, H.; Einzinger, M.; Ha, D. G.; Wu, T.; Markopoulos, G.; Jeon, S.; Kang, H.; Miyazaki, H.; Numata, M.; Kim, S.; Huang, W.; Hong, S.; Baldo, M.; Adams, R.; Aspuru-Guzik, A. Design of Efficient Molecular Organic Light-emitting Diodes by a High-throughput Virtual Screening and Experimental Approach. Nat. Mater. 2016, 15, 1120.
(7) Segler, M.; Preuss, M.; Waller, M. Planning Chemical Syntheses with Deep Neural Networks and Symbolic AI. Nature 2018, 555, 604.
(8) Häse, F.; Kreisbeck, C.; Aspuru-Guzik, A. Machine Learning for Quantum Dynamics: Deep Learning of Excitation Energy Transfer Properties. Chem. Sci. 2017, 8, 8419−8426.
(9) Carleo, G.; Troyer, M. Solving the Quantum Many-Body Problem with Artificial Neural Networks. Science 2017, 355, 602.
(10) van Nieuwenburg, E. P. L.; Liu, Y.-H.; Huber, S. D. Learning Phase Transitions by Confusion. Nat. Phys. 2017, 13, 435.
(11) Lin, H. W.; Tegmark, M.; Rolnick, D. Why Does Deep and Cheap Learning Work so Well? J. Stat. Phys. 2017, 168, 1223.
(12) Tabor, D. P.; Roch, L. M.; Saikin, S. K.; Kreisbeck, C.; Sheberla, D.; Montoya, J. H.; Dwaraknath, S.; Aykol, M.; Ortiz, C.; Tribukait, H.; Amador-Bedolla, C.; Brabec, C. J.; Maruyama, B.; Persson, K. A.; Aspuru-Guzik, A. Accelerating the Discovery of Materials for Clean Energy in the Era of Smart Automation. Nat. Rev. Mater. 2018, 3, 5−20.
(13) Merrifield, R. B.; Stewart, J. M.; Jernberg, N. Instrument for Automated Synthesis of Peptides. Anal. Chem. 1966, 38, 1905−1914.
(14) Deming, S. N.; Pardue, H. L. Automated Instrumental System for Fundamental Characterization of Chemical Reactions. Anal. Chem. 1971, 43, 192−200.
(15) Winicov, H.; Schainbaum, J.; Buckley, J.; Longino, G.; Hill, J.; Berkoff, C. Chemical Process Optimization by Computer−a Self-directed Chemical Synthesis System. Anal. Chim. Acta 1978, 103, 469−476.
(16) Legrand, M.; Foucard, A. Automation on the Laboratory Bench. J. Chem. Educ. 1978, 55, 767.
(17) Lindsey, J. S. A Retrospective on the Automation of Laboratory Synthetic Chemistry. Chemom. Intell. Lab. Syst. 1992, 17, 15−45.
(18) Legrand, M.; Bolla, P. A Fully Automatic Apparatus for Chemical Reactions on the Laboratory Scale. J. Autom. Chem. 1985, 7, 31−37.
(19) Porte, C.; Roussin, D.; Bondiou, J.-C.; Hodac, F.; Delacroix, A. The 'Automated Versatile Modular Reactor': Construction and Use. J. Autom. Chem. 1987, 9, 166−173.
(20) Guette, J.-P.; Crenne, N.; Bulliot, H.; Desmurs, J.-R.; Igersheim, F. Automation in the Organic Chemistry Laboratory: Why? How? Pure Appl. Chem. 1988, 60, 1669−1678.
(21) Frisbee, A. R.; Nantz, M. H.; Kramer, G. W.; Fuchs, P. L. Laboratory Automation. 1: Syntheses Via Vinyl Sulfones. 14. Robotic Orchestration of Organic Reactions: Yield Optimization Via an Automated System with Operator-specified Reaction Sequences. J. Am. Chem. Soc. 1984, 106, 7143−7145.
(22) Hayashi, N.; Sugawara, T. Computer-assisted Automated Synthesis I: Computer-controlled Reaction of Substituted N-(carboxyalkyl)amino Acids. Tetrahedron Comput. Methodol. 1988, 1, 237−246.
(23) Hayashi, N.; Sugawara, T.; Shintani, M.; Kato, S. Computer-assisted Automatic Synthesis II. Development of a Fully Automated Apparatus for Preparing Substituted N-(carboxyalkyl)amino Acids. J. Autom. Chem. 1989, 11, 212−220.
(24) Hayashi, N.; Sugawara, T.; Kato, S. Computer-assisted Automated Synthesis. III. Synthesis of Substituted N-(carboxyalkyl) Amino-acid Tert-butyl Ester Derivatives. J. Autom. Chem. 1991, 13, 187−197.
(25) Pereira, D. A.; Williams, J. A. Origin and Evolution of High Throughput Screening. Br. J. Pharmacol. 2007, 152, 53−61.
(26) Hoogenboom, R.; Meier, M. A. R.; Schubert, U. S. Combinatorial Methods, Automated Synthesis and High-Throughput Screening in Polymer Research: Past and Present. Macromol. Rapid Commun. 2003, 24, 15−32.
(27) Meier, M. A. R.; Hoogenboom, R.; Schubert, U. S. Combinatorial Methods, Automated Synthesis and High-Throughput Screening in Polymer Research: The Evolution Continues. Macromol. Rapid Commun. 2004, 25, 21−33.
(28) Dar, Y. L. High-Throughput Experimentation: A Powerful Enabling Technology for the Chemicals and Materials Industry. Macromol. Rapid Commun. 2004, 25, 34−47.
(29) Armitage, M. A.; Smith, G. E.; Veal, K. T. A Versatile and Cost-Effective Approach to Automated Laboratory Organic Synthesis. Org. Process Res. Dev. 1999, 3, 189−195.
(30) Hird, N. Automated Synthesis: New Tools for the Organic Chemist. Drug Discovery Today 1999, 4, 265−274.
(31) Andrew Corkan, L.; Lindsey, J. S. Experiment Manager Software for an Automated Chemistry Workstation, Including a Scheduler for Parallel Experimentation. Chemom. Intell. Lab. Syst. 1992, 17, 47−74.
(32) Plouvier, J. C.; Andrew Corkan, L.; Lindsey, J. S. Experiment Planner for Strategic Experimentation with an Automated Chemistry Workstation. Chemom. Intell. Lab. Syst. 1992, 17, 75−94.
(33) Dixon, J. M.; Lindsey, J. S. Performance of Search Algorithms in the Examination of Chemical Reaction Spaces with an Automated Chemistry Workstation. JALA 2004, 9, 364−374.
(34) Okamoto, H.; Deuchi, K. Design of a Robotic Workstation for Automated Organic Synthesis. Lab. Rob. Autom. 2000, 12, 2−11.
(35) Harre, M.; Tilstam, U.; Weinmann, H. Breaking the New Bottleneck: Automated Synthesis in Chemical Process Research and Development. Org. Process Res. Dev. 1999, 3, 304−318.
(36) Malet-Sanz, L.; Susanne, F. Continuous Flow Synthesis. A Pharma Perspective. J. Med. Chem. 2012, 55, 4062−4098.
(37) Wegner, J.; Ceylan, S.; Kirschning, A. Flow Chemistry−A Key Enabling Technology for (Multistep) Organic Synthesis. Adv. Synth. Catal. 2012, 354, 17−57.
(38) Buitrago Santanilla, A.; Regalado, E. L.; Pereira, T.; Shevlin, M.; Bateman, K.; Campeau, L.-C.; Schneeweis, J.; Berritt, S.; Shi, Z.-C.; Nantermet, P.; Liu, Y.; Helmy, R.; Welch, C. J.; Vachal, P.; Davies, I. W.; Cernak, T.; Dreher, S. D. Nanomole-scale High-throughput Chemistry for the Synthesis of Complex Molecules. Science 2015, 347, 49−53.
(39) Walters, W.; Stahl, M. T.; Murcko, M. A. Virtual Screening−An Overview. Drug Discovery Today 1998, 3, 160−178.
(40) Shoichet, B. K. Virtual Screening of Chemical Libraries. Nature 2004, 432, 862−865.
(41) Dobson, C. M. Chemical Space and Biology. Nature 2004, 432, 824−828.
(42) Pyzer-Knapp, E. O.; Suh, C.; Gomez-Bombarelli, R.; Aguilera-Iparraguirre, J.; Aspuru-Guzik, A. What Is High-throughput Virtual Screening? A Perspective from Organic Materials Discovery. Annu. Rev. Mater. Res. 2015, 45, 195−216.


(43) Aoyama, T.; Suzuki, Y.; Ichikawa, H. Neural Networks Applied to Structure-Activity Relationships. J. Med. Chem. 1990, 33, 905−908.
(44) King, R. D.; Muggleton, S.; Lewis, R. A.; Sternberg, M. J. Drug Design by Machine Learning: The Use of Inductive Logic Programming to Model the Structure-activity Relationships of Trimethoprim Analogues Binding to Dihydrofolate Reductase. Proc. Natl. Acad. Sci. U. S. A. 1992, 89, 11322−11326.
(45) Tetko, I. V.; Tanchuk, V. Y.; Chentsova, N. P.; Antonenko, S. V.; Poda, G. I.; Kukhar, V. P.; Luik, A. I. HIV-1 Reverse Transcriptase Inhibitor Design Using Artificial Neural Networks. J. Med. Chem. 1994, 37, 2520−2526.
(46) Schneider, G.; Schrodl, W.; Wallukat, G.; Muller, J.; Nissen, E.; Ronspeck, W.; Wrede, P.; Kunze, R. Peptide Design by Artificial Neural Networks and Computer-based Evolutionary Search. Proc. Natl. Acad. Sci. U. S. A. 1998, 95, 12179−12184.
(47) Schneider, G.; Wrede, P. Artificial Neural Networks for Computer-based Molecular Design. Prog. Biophys. Mol. Biol. 1998, 70, 175−222.
(48) Burbidge, R.; Trotter, M.; Buxton, B.; Holden, S. Drug Design by Machine Learning: Support Vector Machines for Pharmaceutical Data Analysis. Comput. Chem. 2001, 26, 5−14.
(49) Simmons, K.; Kinney, J.; Owens, A.; Kleier, D. A.; Bloch, K.; Argentar, D.; Walsh, A.; Vaidyanathan, G. Practical Outcomes of Applying Ensemble Machine Learning Classifiers to High-throughput Screening (HTS) Data Analysis and Screening. J. Chem. Inf. Model. 2008, 48, 2196−2206.
(50) Vaidyanathan, G. Infoevolve: Moving from Data to Knowledge Using Information Theory and Genetic Algorithms. Ann. N. Y. Acad. Sci. 2004, 1020, 227−238.
(51) Simon, V.; Gasteiger, J.; Zupan, J. A Combined Application of Two Different Neural Network Types for the Prediction of Chemical Reactivity. J. Am. Chem. Soc. 1993, 115, 9148−9159.
(52) Gasteiger, J.; Zupan, J. Neural Networks in Chemistry. Angew. Chem., Int. Ed. Engl. 1993, 32, 503−527.
(53) Aires-de-Sousa, J.; Hemmer, M. C.; Gasteiger, J. Prediction of 1H NMR Chemical Shifts Using Neural Networks. Anal. Chem. 2002, 74, 80−90.
(54) Munk, M. E.; Madison, M. S.; Robb, E. W. The Neural Network As a Tool for Multispectral Interpretation. J. Chem. Inf. Comput. Sci. 1996, 36, 231−238.
(55) Lewis, I. R.; Daniel, N. W.; Chaffin, N. C.; Griffiths, P. R. Raman Spectrometry and Neural Networks for the Classification of Wood Types-1. Spectrochim. Acta, Part A 1994, 50, 1943−1958.
(56) Byvatov, E.; Fechner, U.; Sadowski, J.; Schneider, G. Comparison of Support Vector Machine and Artificial Neural Network Systems for Drug/Nondrug Classification. J. Chem. Inf. Comput. Sci. 2003, 43, 1882−1889.
(57) Silver, D.; Huang, A.; Maddison, C.; Guez, A.; Sifre, L.; van den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; Dieleman, S.; Grewe, D.; Nham, J.; Kalchbrenner, N.; Sutskever, I.; Lillicrap, T.; Leach, M.; Kavukcuoglu, K.; Graepel, T.; Hassabis, D. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature 2016, 529, 484−489.
(58) Chen, H.; Engkvist, O.; Wang, Y.; Olivecrona, M.; Blaschke, T. The Rise of Deep Learning in Drug Discovery. Drug Discovery Today 2018, 23, 1241−1250.
(59) Gawehn, E.; Hiss, J. A.; Schneider, G. Deep Learning in Drug Discovery. Mol. Inf. 2016, 35, 3−14.
(60) Sanchez-Lengeling, B.; Aspuru-Guzik, A. Inverse Molecular Design Using Machine Learning: Generative Models for Matter Engineering. Science 2018, 361, 360−365.
(61) Ragoza, M.; Hochuli, J.; Idrobo, E.; Sunseri, J.; Koes, D. R. Protein-Ligand Scoring with Convolutional Neural Networks. J. Chem. Inf. Model. 2017, 57, 942−957.
(62) Jiménez, J.; Skalic, M.; Martínez-Rosell, G.; De Fabritiis, G. KDEEP: Protein-Ligand Absolute Binding Affinity Prediction via 3D-Convolutional Neural Networks. J. Chem. Inf. Model. 2018, 58, 287−296.
(63) Feinberg, E. N.; Sur, D.; Wu, Z.; Husic, B. E.; Mai, H.; Li, Y.; Sun, S.; Yang, J.; Ramsundar, B.; Pande, V. S. PotentialNet for Molecular Property Prediction. ACS Cent. Sci. 2018, 4, 1520−1530.
(64) Wei, J. N.; Duvenaud, D.; Aspuru-Guzik, A. Neural Networks for the Prediction of Organic Chemistry Reactions. ACS Cent. Sci. 2016, 2, 725−732.
(65) Coley, C. W.; Barzilay, R.; Jaakkola, T. S.; Green, W. H.; Jensen, K. F. Prediction of Organic Reaction Outcomes Using Machine Learning. ACS Cent. Sci. 2017, 3, 434−443.
(66) Gómez-Bombarelli, R.; Wei, J. N.; Duvenaud, D.; Hernández-Lobato, J. M.; Sánchez-Lengeling, B.; Sheberla, D.; Aguilera-Iparraguirre, J.; Hirzel, T. D.; Adams, R. P.; Aspuru-Guzik, A. Automatic Chemical Design Using a Data-driven Continuous Representation of Molecules. ACS Cent. Sci. 2018, 4, 268−276.
(67) Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Advances in Neural Information Processing Systems 27; Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N. D., Weinberger, K. Q., Eds.; Neural Information Processing Systems Foundation, Inc., 2014; pp 2672−2680.
(68) Yu, L.; Zhang, W.; Wang, J.; Yu, Y. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. arXiv 2017, 1609.05473.
(69) Bowman, S. R.; Vilnis, L.; Vinyals, O.; Dai, A.; Jozefowicz, R.; Bengio, S. Generating Sentences from a Continuous Space. Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, Berlin, Germany, August 11−12, 2016; pp 10−21.
(70) Sutskever, I.; Martens, J.; Hinton, G. Generating Text with Recurrent Neural Networks. In Proceedings of the 28th International Conference on Machine Learning, Bellevue, WA, June 28−July 2, 2011.
(71) Sutskever, I.; Vinyals, O.; Le, Q. V. Advances in Neural Information Processing Systems 27; Neural Information Processing Systems Foundation, Inc., 2014; pp 3104−3112.
(72) Cho, K.; van Merrienboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder−Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, October 25−29, 2014; pp 1724−1734.
(73) Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv 2014, 1409.0473.
(74) Segler, M. H. S.; Kogej, T.; Tyrchan, C.; Waller, M. P. Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks. ACS Cent. Sci. 2018, 4, 120−131.
(75) Merk, D.; Friedrich, L.; Grisoni, F.; Schneider, G. De Novo Design of Bioactive Small Molecules by Artificial Intelligence. Mol. Inf. 2018, 37, 1700153.
(76) Yuan, W.; Jiang, D.; Nambiar, D. K.; Liew, L. P.; Hay, M. P.; Bloomstein, J.; Lu, P.; Turner, B.; Le, Q.-T.; Tibshirani, R.; Khatri, P.; Moloney, M. G.; Koong, A. C. Chemical Space Mimicry for Drug Discovery. J. Chem. Inf. Model. 2017, 57, 875−882.
(77) Popova, M.; Isayev, O.; Tropsha, A. Deep Reinforcement Learning for De Novo Drug Design. Sci. Adv. 2018, 4, No. eaap7885.
(78) Olivecrona, M.; Blaschke, T.; Engkvist, O.; Chen, H. Molecular De-novo Design through Deep Reinforcement Learning. J. Cheminf. 2017, 9, 48.
(79) Guimaraes, G. L.; Sanchez-Lengeling, B.; Outeiral, C.; Farias, P. L. C.; Aspuru-Guzik, A. Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models. arXiv 2017, 1705.10843.
(80) De Cao, N.; Kipf, T. MolGAN: An Implicit Generative Model for Small Molecular Graphs. arXiv 2018, 1805.11973.
(81) Kusner, M. J.; Paige, B.; Hernández-Lobato, J. M. Grammar Variational Autoencoder. arXiv 2017, 1703.01925.
(82) Dai, H.; Tian, Y.; Dai, B.; Skiena, S.; Song, L. Syntax-Directed Variational Autoencoder for Structured Data. arXiv 2018, 1802.08786.


(83) Liu, Q.; Allamanis, M.; Brockschmidt, M.; Gaunt, A. L. Constrained Graph Variational Autoencoders for Molecule Design. arXiv 2018, 1805.09076.
(84) Kipf, T. N.; Welling, M. Variational Graph Auto-Encoders. arXiv 2016, 1611.07308.
(85) Jin, W.; Barzilay, R.; Jaakkola, T. Junction Tree Variational Autoencoder for Molecular Graph Generation. arXiv 2018, 1802.04364.
(86) Yang, X.; Zhang, J.; Yoshizoe, K.; Terayama, K.; Tsuda, K. ChemTS: An Efficient Python Library for de novo Molecular Generation. Sci. Technol. Adv. Mater. 2017, 18, 972−976.
(87) Ikebata, H.; Hongo, K.; Isomura, T.; Maezono, R.; Yoshida, R. Bayesian Molecular Design with a Chemical Language Model. J. Comput.-Aided Mol. Des. 2017, 31, 379−391.
(88) Williams, K.; Bilsland, E.; Sparkes, A.; Aubrey, W.; Young, M.; Soldatova, L. N.; Grave, K. D.; Ramon, J.; de Clare, M.; Sirawaraporn, W.; Oliver, S. G.; King, R. D. Cheaper Faster Drug Development Validated by the Repositioning of Drugs against Neglected Tropical Diseases. J. R. Soc., Interface 2015, 12, 20141289.
(89) King, R. D.; Rowland, J.; Aubrey, W.; Liakata, M.; Markham, M.; Soldatova, L. N.; Whelan, K. E.; Clare, A.; Young, M.; Sparkes, A.; Oliver, S. G.; Pir, P. The Robot Scientist Adam. Computer 2009, 42, 46.
(90) Sparkes, A.; Aubrey, W.; Byrne, E.; Clare, A.; Khan, M. N.; Liakata, M.; Markham, M.; Rowland, J.; Soldatova, L. N.; Whelan, K. E.; Young, M.; King, R. D. Towards Robot Scientists for Autonomous Scientific Discovery. Automated Experimentation 2010, 2, 1.
(91) King, R. D.; Rowland, J.; Oliver, S. G.; Young, M.; Aubrey, W.; Byrne, E.; Liakata, M.; Markham, M.; Pir, P.; Soldatova, L. N.; Sparkes, A.; Whelan, K. E.; Clare, A. The Automation of Science. Science 2009, 324, 85−89.
(92) King, R. D.; Liakata, M.; Lu, C.; Oliver, S. G.; Soldatova, L. N. On the Formalization and Reuse of Scientific Research. J. R. Soc., Interface 2011, 8, 1440−1448.
(93) King, R. D. Rise of the Robo Scientists. Sci. Am. 2011, 304, 72−77.
(94) Milo, A. The Art of Organic Synthesis in the Age of Automation. Isr. J. Chem. 2018, 58, 131−135.
(95) Cohn, D. A.; Ghahramani, Z.; Jordan, M. I. Active Learning with Statistical Models. J. Artif. Intell. Res. 1996, 4, 129−145.
(96) King, R. D.; Costa, V. S.; Mellingwood, C.; Soldatova, L. N. Automating Sciences: Philosophical and Social Dimensions. IEEE Technol. Soc. Mag. 2018, 37, 40−46.
(97) Krishnadasan, S.; Brown, R. J. C.; deMello, A. J.; deMello, J. C. Intelligent Routes to the Controlled Synthesis of Nanoparticles. Lab Chip 2007, 7, 1434−1441.
(98) Reizman, B. J.; Jensen, K. F. Feedback in Flow for Accelerated Reaction Development. Acc. Chem. Res. 2016, 49, 1786−1796.
(99) Reizman, B. J.; Jensen, K. F. Simultaneous Solvent Screening and Reaction Optimization in Microliter Slugs. Chem. Commun. 2015, 51, 13290−13293.
(100) Bédard, A.-C.; Adamo, A.; Aroh, K. C.; Russell, M. G.; Bedermann, A. A.; Torosian, J.; Yue, B.; Jensen, K. F.; Jamison, T. F. Reconfigurable System for Automated Optimization of Diverse Chemical Reactions. Science 2018, 361, 1220−1225.
(101) Hsieh, H.-W.; Coley, C. W.; Baumgartner, L. M.; Jensen, K. F.; Robinson, R. I. Photoredox Iridium−Nickel Dual-Catalyzed Decarboxylative Arylation Cross-Coupling: From Batch to Continuous Flow via Self-Optimizing Segmented Flow Reactor. Org. Process Res. Dev. 2018, 22, 542−550.
(102) Bourne, R. A.; Skilton, R. A.; Parrott, A. J.; Irvine, D. J.; Poliakoff, M. Adaptive Process Optimization for Continuous Methylation of Alcohols in Supercritical Carbon Dioxide. Org. Process Res. Dev. 2011, 15, 932−938.
(103) Houben, C.; Peremezhney, N.; Zubov, A.; Kosek, J.; Lapkin, A. A. Closed-Loop Multitarget Optimization for Discovery of New Emulsion Polymerization Recipes. Org. Process Res. Dev. 2015, 19, 1049−1053.
(104) Echtermeyer, A.; Amar, Y.; Zakrzewski, J.; Lapkin, A. Self-optimization and Model-based Design of Experiments for Developing a C−H Activation Flow Process. Beilstein J. Org. Chem. 2017, 13, 150.
(105) Jeraal, M. I.; Holmes, N.; Akien, G. R.; Bourne, R. A. Enhanced Process Development Using Automated Continuous Reactors by Self-optimization Algorithms and Statistical Empirical Modelling. Tetrahedron 2018, 74, 3158−3164.
(106) Sans, V.; Porwol, L.; Dragone, V.; Cronin, L. A Self Optimizing Synthetic Organic Reactor System Using Real-time Inline NMR Spectroscopy. Chem. Sci. 2015, 6, 1258−1264.
(107) Fitzpatrick, D. E.; Battilocchio, C.; Ley, S. V. A Novel Internet-based Reaction Monitoring, Control and Autonomous Self-optimization Platform for Chemical Synthesis. Org. Process Res. Dev. 2016, 20, 386−394.
(108) Poscharny, K.; Fabry, D.; Heddrich, S.; Sugiono, E.; Liauw, M.; Rueping, M. Machine Assisted Reaction Optimization: A Self-optimizing Reactor System for Continuous-flow Photochemical Reactions. Tetrahedron 2018, 74, 3171−3175.
(109) Holmes, N.; Akien, G. R.; Savage, R. J. D.; Stanetty, C.; Baxendale, I. R.; Blacker, A. J.; Taylor, B. A.; Woodward, R. L.; Meadows, R. E.; Bourne, R. A. Online Quantitative Mass Spectrometry for the Rapid Adaptive Optimisation of Automated Flow Reactors. React. Chem. Eng. 2016, 1, 96−100.
(110) Cortés-Borda, D.; Wimmer, E.; Gouilleux, B.; Barré, E.; Oger, N.; Goulamaly, L.; Peault, L.; Charrier, B.; Truchet, C.; Giraudeau, P.; Rodriguez-Zubiri, M.; Le Grognec, E.; Felpin, F.-X. An Autonomous Self-Optimizing Flow Reactor for the Synthesis of Natural Product Carpanone. J. Org. Chem. 2018, 83, 14286−14299.
(111) Granda, J. M.; Donina, L.; Dragone, V.; Long, D.-L.; Cronin, L. Controlling an Organic Synthesis Robot with Machine Learning to Search for New Reactivity. Nature 2018, 559, 377.
(112) Grizou, J.; Points, L.; Sharma, A.; Cronin, L. A Closed Loop Discovery Robot Driven by a Curiosity Algorithm Discovers ProtoCells That Show Complex and Emergent Behaviours. ChemRxiv 2018, DOI: 10.26434/chemrxiv.6958334.
(113) Steiner, S.; Wolf, J.; Glatzel, S.; Andreou, A.; Granda, J. M.; Keenan, G.; Hinkley, T.; Aragon-Camarasa, G.; Kitson, P. J.; Angelone, D.; Cronin, L. Organic Synthesis in a Modular Robotic System Driven by a Chemical Programming Language. Science 2019, 363, eaav2211.
(114) Fitzpatrick, D.; Ley, S. V. Engineering Chemistry for the Future of Chemical Synthesis. Tetrahedron 2018, 74, 3087−3100.
(115) Fitzpatrick, D. E.; Maujean, T.; Evans, A. C.; Ley, S. V. Across-the-World Automated Optimization and Continuous-Flow Synthesis of Pharmaceutical Agents Operating Through a Cloud-Based Server. Angew. Chem., Int. Ed. 2018, 57, 15128−15132.
(116) Roch, L. M.; Häse, F.; Kreisbeck, C.; Tamayo-Mendoza, T.; Yunker, L. P.; Hein, J. E.; Aspuru-Guzik, A. ChemOS: Orchestrating Autonomous Experimentation. Science Robotics 2018, 3, eaat5559.
(117) Nikolaev, P.; Hooper, D.; Webber, F.; Rao, R.; Decker, K.; Krein, M.; Poleski, J.; Barto, R.; Maruyama, B. Autonomy in Materials Research: A Case Study in Carbon Nanotube Growth. npj Comput. Mater. 2016, 2, 16031.
(118) Schneider, G. Automating Drug Discovery. Nat. Rev. Drug Discovery 2017, 17, 97−113.
(119) Häse, F.; Roch, L. M.; Kreisbeck, C.; Aspuru-Guzik, A. Phoenics: A Bayesian Optimizer for Chemistry. ACS Cent. Sci. 2018, 4, 1134−1145.
(120) Ballard, A. J.; Das, R.; Martiniani, S.; Mehta, D.; Sagun, L.; Stevenson, J. D.; Wales, D. J. Energy Landscapes for Machine Learning. Phys. Chem. Chem. Phys. 2017, 19, 12585−12603.
(121) Artrith, N.; Urban, A.; Ceder, G. Constructing First-Principles Phase Diagrams of Amorphous LixSi Using Machine-Learning-Assisted Sampling with an Evolutionary Algorithm. J. Chem. Phys. 2018, 148, 241711.
(122) Choudhary, K.; DeCost, B.; Tavazza, F. Machine Learning with Force-Field-Inspired Descriptors for Materials: Fast Screening and Mapping Energy Landscape. Phys. Rev. Materials 2018, 2, 083801.
(123) Abbasnejad, M. E.; Shi, Q.; Abbasnejad, I.; Hengel, A. v. d.; Dick, A. Bayesian Conditional Generative Adverserial Networks. arXiv 2017, 1706.05477.