Research Article: ACS Catal. 2019, 9, 8383−8387


Using Artificial Intelligence To Forecast Water Oxidation Catalysts

Regina Palkovits and Stefan Palkovits*

Institute for Technical and Macromolecular Chemistry, RWTH Aachen University, Worringerweg 2, 52074 Aachen, Germany



ABSTRACT: Artificial intelligence and various types of machine learning are of increasing interest not only in the natural sciences but also in a wide range of applied and engineering sciences. In this study, we rethink the view on combinatorial heterogeneous catalysis and combine machine learning methods with combinatorial approaches in electrocatalysis. Several machine learning methods were used to forecast water oxidation catalysts on the basis of data sets published in the literature and data from our own work. The machine learning models exhibit a decent prediction precision on the available data sets and confirm that even simple models are suitable for good forecasts.

KEYWORDS: machine learning, oxygen evolution, artificial intelligence, neural network, combinatorial chemistry

In most cases, the prediction of catalysts that enable a specific reaction remains an unsolved challenge. Very often, more traditional approaches are used in chemistry, where a small amount of a catalyst is first synthesized and then tested and modified according to the observations made during testing and catalyst characterization. Especially in the 1990s, there were efforts to speed up catalyst synthesis and testing in heterogeneous catalysis by applying suitable preparation methods and using high-throughput reactors.1,2 This ultimately led to the idea of predicting solid catalysts on the basis of appropriate descriptors.3−5 In the following years, these kinds of predictions were not the focus of the majority of the catalysis community, though some efforts have continued.6,7 Combinatorial approaches also attracted attention in electrochemistry8,9 but are not yet a standard tool in electrocatalyst development. Moreover, methods such as the high-throughput testing of large catalyst libraries are now available; however, the evaluation of performance criteria mostly remains in the hands of skilled experts. In parallel, the development of new search algorithms, used for example for image search by classification with the help of machine learning (ML) or other artificial intelligence (AI) methods, evolved with the accelerated growth of the Internet and the availability of huge data sets. Companies such as Google and Facebook drive these developments actively, as they are at the heart of their business models. But can artificial intelligence also be used in chemistry? There are well-performing examples where machine learning is used in computational chemistry,10,11 e.g., to speed up some parts of time-consuming calculations.11 On the other hand, especially in the field of organic synthesis, critical voices exist which prefer chemical expert knowledge over machine learning approaches.12 In this work, we present how forecasts can be made for water oxidation catalysts based on machine learning approaches like artificial neural networks (Figure 1). Further ML methods are included in the study to evaluate the performance of different algorithms against each other in order to find the one best suited to the task of forecasting water oxidation catalysts.

Artificial neural networks are algorithms inspired by nature. They consist of a number of input nodes/neurons and several hidden layers with additional nodes.

Figure 1. Artificial neural network with four input nodes (features), two hidden layers with six nodes each, and one output node (target).

Received: May 14, 2019; Revised: July 19, 2019


Figure 2. Overview of the whole data set from refs 13 and 14. The overpotential (OP) is plotted versus the atomic composition of (Ni-Fe-Co-Ce)Ox.

The last layer gives the final output of the network. The nodes are connected via weights, and each node after the input layer has an activation function which activates the respective node based on the weighted inputs. A simple neural network can be described completely via matrix algebra. In this work, published data sets13,14 with about 6000 samples of (Ni-Fe-Co-Ce)Ox catalysts with varying compositions were used to train different machine learning algorithms (e.g., artificial neural networks). Depending on the respective method, a decent prediction precision can be achieved not only for the training and test fractions of the data set but also for completely different materials from our own work on the electrochemical oxygen evolution reaction.

The data sets from refs 13 and 14 consist of 665 and 5456 data points, respectively. The materials of the original publications are reported with respect to their elemental composition in percent, representing different (Ni-Fe-Co-Ce)Ox materials. Additionally, the overpotential (OP) is reported for the catalyst library, with 1 and 10 mA/cm2 being typical performance criteria in the literature. The catalysts were synthesized in a combinatorial approach on glass substrates; for detailed information, we refer the reader to the original publications. The overall data set therefore consists of more than 6000 individual catalysts with four features (input variables), the elemental composition, and one target (output variable), the OP at 10 mA/cm2. The whole data visualization and evaluation was carried out with the Python programming language15 and additional data science and ML libraries, which are mentioned where appropriate.

An important aspect of basic data pretreatment in machine learning is a general visualization and inspection of the data. Accordingly, pair plots of the whole data set were created with the help of the Pandas package, each with the OP on the ordinate and the respective atomic composition on the abscissa (Figure 2). The graphs show that, apart from some exceptions, the smallest OP values represented in this data set are around 350 mV. The majority of the OP values range from 400 to 450 mV and rise up to about 500 mV. Along this line, it is challenging to predict values that are much better or worse with ML. The representation in Figure 2 does not suggest that the catalysts can be grouped according to their features.
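As an illustration of this inspection step, such plots can be generated with Pandas as in the following sketch; the file name and column labels are placeholders and not taken from the original data sets, so this is an assumed reconstruction rather than the authors' actual script.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical CSV holding the four composition features and the target OP;
# the file and column names are placeholders.
df = pd.read_csv("ni_fe_co_ce_oer.csv")  # columns: Ni, Fe, Co, Ce, OP

# One scatter plot per element: OP on the ordinate, atomic composition on
# the abscissa (cf. Figure 2).
fig, axes = plt.subplots(1, 4, figsize=(16, 4), sharey=True)
for ax, element in zip(axes, ["Ni", "Fe", "Co", "Ce"]):
    df.plot.scatter(x=element, y="OP", s=2, ax=ax)
    ax.set_xlabel(element + " fraction")
axes[0].set_ylabel("OP at 10 mA/cm2 (mV)")
plt.tight_layout()
plt.show()
```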

To investigate whether different catalyst classes are hidden in the data set, unsupervised learning methods were used for grouping and visualization. In unsupervised learning, the algorithms try to learn relations only on the basis of the features (here the elemental composition), as targets (here the OP) are not always available. To learn only from the features, the k-means clustering algorithm16 was used together with a subsequent visualization of the clusters found using the t-distributed stochastic neighbor embedding algorithm (t-SNE).17 Both algorithms were used as implemented in the Scikit-Learn18 package, and Figure 3 summarizes the results.

Figure 3. Unsupervised visualization of the data set's features with k-means clustering (right) and t-distributed stochastic neighbor embedding (left).

With the k-means clustering, different cluster numbers between 1 and 50 were tried. For each of the 50 steps, a score was calculated with Scikit-Learn which represents how well the algorithm can make predictions on the data set. For an easier evaluation of the cluster number, the gradient of the score was calculated and plotted versus the cluster number (Figure 3, right side); as the gradient of the score is plotted, the lower the value, the better the prediction quality. It becomes evident that above 10 clusters only minor improvements can be realized. Therefore, 10 clusters were used as input for the t-SNE algorithm. t-SNE is used to reduce the dimensionality of the data set from the 4 feature dimensions down to 2 (Figure 3, left side), allowing a 2-dimensional representation of the whole data set; one can imagine the axes as linear combinations of the features. Coloring the result of t-SNE with the clusters found reveals that indeed 10 clusters can be identified which do not overlap too much. The data set of about 6000 catalysts can therefore be divided into just 10 classes. Unfortunately, it is not easy to translate conclusions from the t-SNE plot back into the original 4-dimensional space.
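A minimal sketch of this unsupervised analysis with Scikit-Learn is given below; the data loading, random seeds, and plotting details are assumptions, while the cluster range of 1 to 50, the choice of 10 clusters, and the 2-dimensional t-SNE embedding follow the description above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

# Features only (elemental composition); file and column names are placeholders.
X = pd.read_csv("ni_fe_co_ce_oer.csv")[["Ni", "Fe", "Co", "Ce"]].values

# k-means for 1..50 clusters; Scikit-Learn's score() returns the negative
# inertia, and the gradient of that score is plotted versus the cluster
# number (cf. Figure 3, right).
ks = list(range(1, 51))
scores = [KMeans(n_clusters=k, random_state=42).fit(X).score(X) for k in ks]
plt.plot(ks, np.gradient(scores))
plt.xlabel("number of clusters")
plt.ylabel("gradient of the k-means score")
plt.show()

# Ten clusters, visualized by a 2D t-SNE embedding of the four features
# and colored by the cluster labels (cf. Figure 3, left).
labels = KMeans(n_clusters=10, random_state=42).fit_predict(X)
embedded = TSNE(n_components=2, random_state=42).fit_transform(X)
plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, s=2, cmap="tab10")
plt.show()
```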


As the unsupervised algorithms reveal some structure in the data set, different machine learning algorithms were then used to analyze the structure of the data and to provide forecasts. The data set was further pretreated for the supervised learning algorithms. As the data set is somewhat asymmetrical with respect to low OPs, it was first sorted with respect to the OP. Then, the materials with an OP lower than 380 mV were separated from the data set, giving 146 such samples. Next, the train-test-split method from Scikit-Learn was used to split both the small-OP samples and the remaining data set, with 70% for training and the rest for testing. A random seed of 42 was chosen for reproducibility. Finally, the corresponding training and testing fractions were recombined to ensure enough samples with low OP in both the training and the testing data sets. After additional shuffling with a fixed random seed, these final data sets were then used for the supervised machine learning algorithms.

Three different kinds of machine learning models were used for training and prediction on the electrochemical data set. First, artificial neural networks (ANN) were tested, followed by support vector regression (SVR), and finally k-nearest neighbor regression (KNN) was used to model the data. For the ANN, Tensorflow19 was used for programming the neural network. Tensorflow is the artificial intelligence framework by Google and is also used, for example, for Google's image search and the AlphaGo project.20 For modeling the data, a simple multilayer perceptron (what is colloquially meant by an ANN) was used, as more complicated network types such as convolutional neural networks (CNN) should not provide any benefit for this relatively small data set of tabular data. To identify the best-performing network, different numbers of hidden neurons were placed between the four input neurons and the single output neuron. Neuron counts between 2 and 30 in simple dense layers, with an increment of 2, were tested; small networks with 2 hidden layers of 6 or 8 neurons performed best. Smaller neuron counts showed a significantly higher error, and larger ones bore the risk of overfitting (learning by heart). Including additional dropout layers to prevent overfitting did not show any advantage. Therefore, the network illustrated in Figure 1, with a 4−6−6−1 structure of neurons, was finally chosen. ReLU served as the activation function. For the final training of the network, the RMSprop optimizer with a learning rate of 0.01 and the mean absolute error as the metric were employed. The network was trained over 500 epochs with the possibility of an early stop, to prevent overfitting, if the metric no longer changed. For validation purposes, a split of 20% of the training set was used. Figure 4 shows the outcome of the training: training and validation converge quickly, and by far not all 500 epochs are needed. The whole calculation was carried out on a MacBook Pro with an Intel Core i7 CPU at 3.3 GHz and 16 GB RAM. Additional accelerator units such as GPUs were not used; the Tensorflow library used all 4 cores of the machine by default.
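A sketch of this pretreatment and of the final network, using the Keras API shipped with Tensorflow, could look as follows. The file and column names are placeholders, the mean squared error loss is an assumption (only the MAE metric is stated above), and the early-stopping patience is chosen arbitrarily.

```python
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle

# Hypothetical file/column names; features are the four compositions,
# the target is the OP at 10 mA/cm2.
df = pd.read_csv("ni_fe_co_ce_oer.csv").sort_values("OP")
low = df[df["OP"] < 380]    # the 146 low-OP samples
rest = df[df["OP"] >= 380]

# Split both subsets 70/30 with a fixed seed so that low-OP samples end up
# in both the training and the testing set, then recombine and shuffle.
low_train, low_test = train_test_split(low, train_size=0.7, random_state=42)
rest_train, rest_test = train_test_split(rest, train_size=0.7, random_state=42)
train = shuffle(pd.concat([low_train, rest_train]), random_state=42)
test = shuffle(pd.concat([low_test, rest_test]), random_state=42)

features = ["Ni", "Fe", "Co", "Ce"]
X_train, y_train = train[features].values, train["OP"].values
X_test, y_test = test[features].values, test["OP"].values

# 4-6-6-1 multilayer perceptron with ReLU activations, trained with the
# RMSprop optimizer (learning rate 0.01) and MAE as the metric.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(6, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(6, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.01),
              loss="mse",   # assumed loss; the text only names the MAE metric
              metrics=["mae"])

# Up to 500 epochs with early stopping once the validation metric no longer
# improves; 20% of the training set is held out for validation.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_mae", patience=20)
history = model.fit(X_train, y_train, epochs=500, validation_split=0.2,
                    callbacks=[early_stop], verbose=0)
```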

The training of the network typically takes about 30 s of wall time for about 100 epochs.

For SVR, the implementation in Scikit-Learn21 was used to show that simpler ML models can also perform well in the task of predicting electrocatalysts. The algorithm needs some hyperparameters to work as expected, which were determined with a grid-search approach. The best parameters found and used for the final model were an RBF kernel, a C value of 10, a γ value of 10, and an ε value of 0.001. Using a 5-fold cross-validation on the training set led to a score value for SVR of about 0.75. The hyperparameter search is the most demanding step of all calculations in this report: as grid search is basically a brute-force approach, testing all combinations used here takes between 30 and 40 min on all 4 cores of the described computer. For KNN, the implementation from Scikit-Learn was also used. Here the most important hyperparameter is the number of neighbors, which was estimated with an "elbow curve" as for the k-means clustering above. For this supervised algorithm, 11 neighbors seem to be a good choice; more neighbors do not result in better performance, but values lower than 5 in particular perform much worse. The results are well in line with those of the visualization step. The necessary CPU time is more or less negligible in comparison to the other methods.
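The two Scikit-Learn models can be set up as in the sketch below, which reuses the training arrays from the previous sketch; the grid of candidate values is an assumption, while the reported optimum and the 5-fold cross-validation follow the text.

```python
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Grid search over the SVR hyperparameters with 5-fold cross-validation;
# the candidate grid is assumed, only the reported optimum (RBF kernel,
# C = 10, gamma = 10, epsilon = 0.001) is taken from the text.
param_grid = {
    "kernel": ["rbf"],
    "C": [0.1, 1, 10, 100],
    "gamma": [0.01, 0.1, 1, 10],
    "epsilon": [0.001, 0.01, 0.1],
}
grid = GridSearchCV(SVR(), param_grid, cv=5)
grid.fit(X_train, y_train)
svr = grid.best_estimator_
print(grid.best_params_, grid.best_score_)   # a score of about 0.75 is reported

# k-nearest neighbor regression with 11 neighbors, chosen via an elbow curve.
knn = KNeighborsRegressor(n_neighbors=11).fit(X_train, y_train)
```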

Table 1. Catalyst Compositions in Atomic Ratios from Ref 22

sample no.   Ni     Fe     Co     Ce     OP at 10 mA/cm2 (mV)   origin
1            0.50   0.00   0.50   0.00   370                    commercial
2            0.33   0.00   0.67   0.00   360                    hydrothermal synthesis
3            0.58   0.00   0.42   0.00   410                    hydrothermal synthesis
4            0.32   0.00   0.68   0.00   360                    hydrothermal synthesis
5            0.45   0.00   0.55   0.00   420                    hydrothermal synthesis
6            0.36   0.00   0.64   0.00   350                    KIT-6 hard template

The trained models were finally evaluated with respect to their prediction precision, using not only the test part of the data set but also catalysts from our previous work.22 These materials were mostly prepared via hydrothermal synthesis (Table 1, samples 2−5). Sample 1 is a commercial benchmark NiCoOx, and sample 6 was prepared via hard-templating with KIT-6, a mesoporous silica, as a template. In this data set, only the Ni and Co amounts were varied; the exact synthesis can be found in the original paper. Already at this point it becomes clear that predicting the performance of the catalysts in Table 1 will be a challenge, as their low OPs are not well represented within the original data set.

Figure 5 illustrates a comparison of the prediction quality. The true values are always plotted on the x axis and the predicted values on the y axis. In the top row, predictions on the test data set are presented for all three methods. All of them possess a decent prediction quality. Especially the SVR algorithm is able to predict even the underrepresented low-OP samples with good precision. The ANN and the KNN exhibit slight deviations with respect to low OPs; here, the prediction is not as precise as with the SVR algorithm.
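For illustration, the compositions from Table 1 can be passed to the trained models from the earlier sketches to obtain the kind of comparison shown in the bottom row of Figure 5; the variable names refer to those sketches and are assumptions, not the authors' original code.

```python
import pandas as pd

# Atomic ratios of the six catalysts from Table 1 (columns: Ni, Fe, Co, Ce).
own = pd.DataFrame(
    [[0.50, 0.00, 0.50, 0.00],
     [0.33, 0.00, 0.67, 0.00],
     [0.58, 0.00, 0.42, 0.00],
     [0.32, 0.00, 0.68, 0.00],
     [0.45, 0.00, 0.55, 0.00],
     [0.36, 0.00, 0.64, 0.00]],
    columns=["Ni", "Fe", "Co", "Ce"])

# Predicted OP at 10 mA/cm2 (mV) for each model (cf. Figure 5, bottom row).
print(model.predict(own.values).ravel())   # ANN
print(svr.predict(own.values))             # SVR
print(knn.predict(own.values))             # KNN
```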

Figure 4. Loss and mean absolute error metrics of the ANN used, plotted over the epochs for the training set (black) and the validation set (magenta).


Figure 5. Predictions for the three machine learning models used on the test set (top) and on our own catalysts (bottom).

The predictions for our own catalysts in the bottom row follow a similar behavior. The prediction is most precise for samples 3 and 5, while the other samples are not predicted with the same accuracy. Hard-templated materials such as sample 6 are not represented in the training data set at all; the material structure can indeed significantly alter the performance but is not covered by the features of the data set. Nevertheless, the remaining samples also exhibit deviations, most probably due to the scarcity of low-OP samples in the original data set. For sample 1, the deviations can be attributed to the composition of the sample: the supplier's specification was used, which might deviate from the real composition.

To further evaluate the prediction quality of the methods used, metrics such as the mean squared error (Figure 6, left) and the R2 score (Figure 6, right) were calculated. The mean squared error is in all cases slightly lower for the training data set than for the test set, and the R2 score shows the same trend, inverted, for the train and test sets. Trends like the ones shown in Figure 6 are very often a hint that the methods do not exhibit overfitting (learning by heart). If too low a fraction of samples is used for training (less than 30%), both trends revert and the mean squared error is higher for the training data. This stresses that the right amount of training data is important to prevent overfitting. These examples illustrate that for a machine learning study of decent quality not only the inspection of the data set is important but also the preprocessing and the choice of the models' hyperparameters. It is notable that the simple SVR model outperforms especially the state-of-the-art neural network, leading to the conclusion that more complicated models might not always be the right choice.
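These metrics are available directly in Scikit-Learn; a short sketch (reusing the arrays and the SVR model from the earlier sketches, so the names are again assumptions) could look like this:

```python
from sklearn.metrics import mean_squared_error, r2_score

# MSE and R2 on the training and test sets for one of the models (here the
# SVR); repeated for the ANN and KNN to build the comparison in Figure 6.
for name, (X_, y_) in {"train": (X_train, y_train),
                       "test": (X_test, y_test)}.items():
    pred = svr.predict(X_)
    print(name, mean_squared_error(y_, pred), r2_score(y_, pred))
```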

Figure 6. Metrics overview for all methods. The left-hand side shows the mean squared error and the right-hand side the R2 score.

Only with respect to the tuning of the hyperparameters is SVR at a disadvantage, but this can be regarded as a negligible issue. As always, also in machine learning and data science, the choice of the model is mainly governed by the data themselves, and limitations in the prediction quality are due to limitations in the data. A wider data range with more features could, for example, increase the quality here.



In this study, we used published data on water oxidation catalysts as the basis to train different machine learning models such as artificial neural networks, support vector regression, and k-nearest neighbor regression and predicted the OPs both for a test set, a subset of the original data set, and for our own data. We could show that even simple models achieve a decent prediction quality and are able to outperform more complicated models. Deviations in the predictions for our own data set can be attributed mainly to the less frequent occurrence of low-OP samples in the original data set. The calculated metrics confirm SVR as the best-performing algorithm. Overfitting can be excluded for all models but depends on the amount of data used for training. This leads to the conclusion that, as always, the choice of the model is governed by the data and not vice versa.



ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acscatal.9b01985.



Source code of this study (PDF)

AUTHOR INFORMATION

Corresponding Author

*E-mail for S.P.: [email protected].

ORCID

Regina Palkovits: 0000-0002-4970-2957
Stefan Palkovits: 0000-0003-4809-2939

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

Stefan Palkovits acknowledges Dr. Jens Artz for useful hints concerning this manuscript. We acknowledge the Cluster of Excellence "The Fuel Science Center" (EXC 2186) funded by the Excellence Initiative of the German federal and state governments.



REFERENCES

(1) Senkan, S. M. High-Throughput Screening of Solid-State Catalyst Libraries. Nature 1998, 394, 350−353.
(2) Olejnik, S.; Baltes, C.; Muhler, M.; Schüth, F. Parallelized N2O Frontal Chromatography for the Fast Determination of Copper Surface Areas. J. Comb. Chem. 2008, 10, 387−390.
(3) Klanner, C.; Farrusseng, D.; Baumes, L.; Lengliz, M.; Mirodatos, C.; Schüth, F. The Development of Descriptors for Solids: Teaching "Catalytic Intuition" to a Computer. Angew. Chem., Int. Ed. 2004, 43, 5347−5349.
(4) Norskov, J. K.; Bligaard, T. The Catalyst Genome. Angew. Chem., Int. Ed. 2013, 52, 776−777.
(5) Hattori, T.; Kito, S. Neural Network as a Tool for Catalyst Development. Catal. Today 1995, 23, 347−355.
(6) Burello, E.; Rothenberg, G. In Silico Design in Homogeneous Catalysis Using Descriptor Modelling. Int. J. Mol. Sci. 2006, 7, 375−404.
(7) Takahashi, K.; Takahashi, L.; Miyazato, I.; Fujima, J.; Tanaka, Y.; Uno, T.; Satoh, H.; Ohno, K.; Nishida, M.; Hirai, K.; Ohyama, J.; Nguyen, T. N.; Nishimura, S.; Taniike, T. The Rise of Catalyst Informatics: Towards Catalyst Genomics. ChemCatChem 2019, 11, 1146−1152.
(8) Schwanke, C.; Stein, H. S.; Xi, L.; Sliozberg, K.; Schuhmann, W.; Ludwig, A.; Lange, K. M. Correlating Oxygen Evolution Catalysts Activity and Electronic Structure by a High-Throughput Investigation of Ni1-y-zFeyCrzOx. Sci. Rep. 2017, 7, 44192.
(9) Neyerlin, K. C.; Bugosh, G.; Forgie, R.; Liu, Z.; Strasser, P. Combinatorial Study of High-Surface-Area Binary and Ternary Electrocatalysts for the Oxygen Evolution Reaction. J. Electrochem. Soc. 2009, 156, B363−B369.
(10) Schlexer Lamoureux, P.; Winter, K. T.; Garrido Torres, J. A.; Streibel, V.; Zhao, M.; Bajdich, M.; Abild-Pedersen, F.; Bligaard, T. Machine Learning for Computational Heterogeneous Catalysis. ChemCatChem 2019, 11, 1−22.
(11) Brockherde, F.; Vogt, L.; Li, L.; Tuckerman, M. E.; Burke, K.; Müller, K. R. Bypassing the Kohn-Sham Equations with Machine Learning. Nat. Commun. 2017, 8, 872.
(12) Maryasin, B.; Marquetand, P.; Maulide, N. Machine Learning for Organic Synthesis: Are Robots Replacing Chemists? Angew. Chem., Int. Ed. 2018, 57, 6978−6980.
(13) Haber, J. A.; Xiang, C.; Guevarra, D.; Jung, S.; Jin, J.; Gregoire, J. M. High-Throughput Mapping of the Electrochemical Properties of (Ni-Fe-Co-Ce)Ox Oxygen-Evolution Catalysts. ChemElectroChem 2014, 1, 524−528.
(14) Haber, J. A.; Cai, Y.; Jung, S.; Xiang, C.; Mitrovic, S.; Jin, J.; Bell, A. T.; Gregoire, J. M. Discovering Ce-rich Oxygen Evolution Catalysts, from High Throughput Screening to Water Electrolysis. Energy Environ. Sci. 2014, 7, 682−688.
(15) van Rossum, G. Python Tutorial; Centrum voor Wiskunde en Informatica (CWI): Amsterdam, 1995.
(16) Arthur, D.; Vassilvitskii, S. K-means++: The Advantages of Careful Seeding. Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2007.
(17) van der Maaten, L.; Hinton, G. E. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579−2605.
(18) Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825−2830.
(19) Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G. S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems, 2015.
(20) Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L. R.; Lai, M.; Bolton, A.; et al. Mastering the Game of Go without Human Knowledge. Nature 2017, 550, 354−359.
(21) Fan, R.-E.; Chang, K.-W.; Hsieh, C.-J.; Wang, X.-R.; Lin, C.-J. LIBLINEAR: A Library for Large Linear Classification. J. Mach. Learn. Res. 2008, 9, 1871−1874.
(22) Broicher, C.; Zeng, F.; Artz, J.; Hartmann, H.; Besmehn, A.; Palkovits, S.; Palkovits, R. Facile Synthesis of Mesoporous Nickel Cobalt Oxide for OER − Insight into Intrinsic Electrocatalytic Activity. ChemCatChem 2019, 11, 412−416.
