Article pubs.acs.org/JPCC
Cite This: J. Phys. Chem. C 2018, 122, 8315−8326
Toward Effective Utilization of Methane: Machine Learning Prediction of Adsorption Energies on Metal Alloys
Takashi Toyao,*,†,‡ Keisuke Suzuki,§ Shoma Kikuchi,§ Satoru Takakusagi,† Ken-ichi Shimizu,†,‡ and Ichigaku Takigawa*,§,∥
† Institute for Catalysis, Hokkaido University, N-21, W-10, Sapporo 001-0021, Japan
‡ Elements Strategy Initiative for Catalysts and Batteries, Kyoto University, Katsura, Kyoto 615-8520, Japan
§ Graduate School of Information Science and Technology, Hokkaido University, N-14, W-9, Sapporo 060-0814, Japan
∥ PRESTO, Japan Science and Technology Agency (JST), 4-1-8, Honcho, Kawaguchi, Saitama 332-0012, Japan
S Supporting Information *
ABSTRACT: The process employed to discover new materials for specific applications typically utilizes screening of large compound libraries. In this approach, the performance of a compound is correlated to the properties of elements referred to as descriptors. In the effort described below, we developed a simple and efficient machine learning (ML) model for predicting adsorption energies of CH4-related species, namely, CH3, CH2, CH, C, and H, on Cu-based alloys. The developed ML model predicted the DFT-calculated adsorption energies with 12 descriptors, which are readily available values for the selected elements. The predictive accuracy of four regression methods (ordinary linear regression by least-squares (OLR), random forest regression (RFR), gradient boosting regression (GBR), and extra trees regression (ETR)) with different numbers of descriptors and different test-set/training-set ratios was quantitatively evaluated using statistical cross validations. Among the four regression methods, ETR gave the best performance in predicting the adsorption energies, with average root mean squared errors (RMSEs) below 0.3 eV. Strikingly, despite its simplicity and low computational cost, this model can predict the adsorption energies on a range of Cu-based alloy models (46 in total) as calculated by DFT. In addition, we show the ML prediction for the differences in the adsorption energies of CH3 and CH2 on the same surface. This would be of great importance, especially when designing selective catalytic reaction processes that suppress undesired overreactions. The accuracy and simplicity of the developed system suggest that adsorption energies can be readily predicted without time-consuming DFT calculations, and eventually, this would allow us to predict the catalytic performance of solid catalysts.
■
INTRODUCTION
Owing to the large resource of methane (CH4) in the form of natural gas (shale gas) and to its renewability in the form of biogas, there exists a strong economic incentive to develop processes for the conversion of CH4 into value-added products.1−4 Industrial CH4 conversion involves the use of catalytic steam or autothermal reforming to generate synthesis gas comprising a mixture of CO and H2. This mixture finds its application in the large-scale production of methanol and higher hydrocarbons.1 The methanol is in turn transformed into ethylene and propylene, which are the two most important building blocks of the chemical industry. However, each gas-based step has its own drawbacks that negatively influence the overall process economy.5 Therefore, a new and more efficient process, one involving the direct (one-step) conversion of methane to methanol, formaldehyde, or C2 products such as C2H4 and C2H6 through oxidative coupling of methane (OCM), is needed. Over the past 30 years researchers have focused on the development of new methodology to meet this objective.6−18
Much effort has been concentrated on the development of new catalysts for more efficient utilization of CH4 owing to its economic importance and significant interest from both industrial and academic communities.19−23 Nevertheless, the development of such catalysts remains challenging, in part because heterogeneous catalysis remains an empirical science. Together, the complexity of the surface reactions and the large number of independent parameters involved render the prediction of catalyst performance a formidable task. Although there have been important contributions employing surface properties such as d-band center,24,25 work function,26,27 coordination numbers,28 strain,29 and surface energies30 as descriptors of catalytic properties, the application of descriptors to solid catalysts has not been extensively explored, indicative of the difficulty associated with determining the relationship between bulk/surface structure and performance.31 Yet, the
© 2018 American Chemical Society
Received: December 24, 2017 Revised: March 12, 2018 Published: March 26, 2018
DOI: 10.1021/acs.jpcc.7b12670 J. Phys. Chem. C 2018, 122, 8315−8326
Article
The Journal of Physical Chemistry C
other related reactions, without the need to perform laborious transition-state searches to determine activation barriers for each reaction. Furthermore, a Sabatier-type trade-off exists between the adsorption of reactants and desorption of products such that catalysts having optimal adsorption energies will confer maximal catalytic activity.68 For the materials explored in this study, we have examined metal alloys that have attracted much attention for CH4 utilization processes as well as applications to other technologically relevant fields.69,70 Among the various types of alloys available, Cu-based alloys were chosen because of their current use as industrial catalysts, their economy, and their importance to various CH4 utilization processes. In the study described below, we developed a protocol of ML prediction for adsorption energies of the CH4 related species, namely, CH3, CH2, CH, C, and H on the Cu-based alloys. Using our ML model, along with 12 descriptors that are readily available from the periodic table, handbooks, and public databases,71−74 we predicted the DFT-calculated adsorption energies. In addition to predicting the adsorption energies of a single adsorbate, we showed that the difference in the adsorption energies of CH3 and CH2 on the Cu-alloy surface could also be predicted. Our findings are ultimately applicable to the design of selective catalytic reaction processes that suppress undesired over-reactions and they provide insight into the chemical and physical principles governing the catalytic processes.
pace at which new materials for heterogeneous catalysts are discovered could, in principle, be accelerated by using computation-based screening methods.32 Recent developments in methodology and computer technology, as well as the establishment of a descriptor-based approach for the analysis of reaction mechanisms and trends across the periodic table, have produced fast screening platforms for the discovery of novel catalytic materials.33−35 However, these computation-based approaches, which are typically based on first-principles calculations, usually incur a high computational cost.36 In recent years, machine learning (ML) methods have gained increasing popularity among the molecular and materials science communities for use in high-throughput screening and in the prediction of various kinds of physical properties whose underlying principles are too complex to model mathematically.37−52 ML methods could serve as a fast and high-precision alternative to first-principles modeling. According to the ML framework, predictive computations are modeled as a function from some inputs to the output of desired values. Supervised ML methods statistically infer the function, given instances of input−output pairs called the training set: they inductively learn, from the data, the underlying principle for input−output dependencies. Since the input data usually come from experiments and/or first-principles calculations, applications of ML methods are still limited. However, once the input data are collected and the ML framework is properly devised, ML methods would enable significantly faster prediction of various kinds of physical properties. For application of ML in the automation of materials synthesis, the process of intuition that is involved (i.e., taking it from an idea to the desired material) must be developed.
Indeed, owing to the challenge imposed by the difficulty of identifying the relationship between bulk/surface structure and catalytic performance, the application of ML methods to the discovery of new chemical catalysts is a work in progress.53−63 Whereas important contributions aimed at tackling this challenge have been made in recent years, truly novel catalysts have yet to be discovered and many factors remain to be explored. To this end we have recently reported ML-based predictions for d-band centers.64 Despite the absence of a sufficient theoretical model for heterogeneous catalysis, d-band centers are widely used as a general and versatile descriptor for various catalytic reactions.24,25 However, the accuracy of d-band center models predicting each catalytic reaction is limited. In order to develop a more effective catalyst design guide and eventually novel catalysts for specific and challenging reactions such as CH4 utilization reactions, the relationships (descriptors) governing reactions must be defined. To meet this goal, we have focused our efforts on the development of a machine learning approach for predicting adsorption energies. Adsorption is a fundamental step for surface-catalyzed reactions that follow the Brønsted−Evans−Polanyi (BEP) relation because their transition states often have adsorption properties that are comparable to those of the dissociated or nondissociated reactants.65−67
$\delta E_{\mathrm{act}} = \alpha\,\delta E_{\mathrm{r}}$
■
METHODS AND DATA
Adsorption Energy Calculations. All calculations were performed with the Vienna ab initio simulation package (VASP)75,76 using projector-augmented wave potentials77 and the Perdew−Burke−Ernzerhof (PBE) functional.78 The lattice constant of face-centered-cubic (fcc) bulk Cu was calculated with an energy cutoff of 400 eV and a 15 × 15 × 15 k-point mesh and then used in all subsequent calculations. For the Cu(111) surface, a p(3 × 3) supercell containing four layers of Cu atoms (36 atoms in total) and 15 Å of vacuum was used with an energy cutoff of 400 eV and a 5 × 5 × 1 k-point mesh. The (111) surface was chosen because it is considered to be the most closely packed and stable surface of an fcc metal. The first two layers of Cu atoms from the surface as well as the adsorbed molecules/atoms were allowed to relax during geometry optimization; the geometry of the two bottom layers was fixed. The energies of isolated molecules and atoms were obtained using the same parameters as those in the free-surface slab calculations. The geometry convergence criterion was set at Fmax < 0.05 eV/Å, where Fmax is the maximum force acting on the mobile atoms. The adsorption energy Eads was computed as the difference between the total energy of an adsorbed molecule/atom on the Cu surface and the sum of the energies of the relaxed Cu(111) slab and a geometrically optimized molecule/atom in the gas phase. Negative values of Eads indicate that adsorption is favorable, with the most negative value corresponding to the most stable adsorption. To model the alloys, we employed Cu-based alloys in which one Cu atom located at the center of the surface layer of the slab model is replaced by another element. Element selection was based on test results obtained for elements having atomic numbers 3 (Li) through 83 (Bi) (noble gases not included).
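The adsorption-energy definition above amounts to Eads = E(adsorbate on slab) − [E(clean slab) + E(gas-phase species)]. A minimal Python sketch of that bookkeeping follows; the function name and the three total energies are hypothetical placeholders chosen for illustration, not DFT results from this work.

```python
def adsorption_energy(e_slab_with_adsorbate, e_clean_slab, e_gas_species):
    """Return E_ads in eV; negative values indicate favorable adsorption."""
    return e_slab_with_adsorbate - (e_clean_slab + e_gas_species)


# Hypothetical total energies in eV (illustrative only).
e_ads = adsorption_energy(
    e_slab_with_adsorbate=-150.0,
    e_clean_slab=-130.5,
    e_gas_species=-18.0,
)
print(f"E_ads = {e_ads:.2f} eV")
```

In practice the three inputs would be the converged total energies of the relaxed adsorbate/slab system, the relaxed clean slab, and the isolated species computed with the same settings.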
For machine learning predictions, the models used were those that correctly converged in the DFT calculations and that corresponded to optimized
(1)
The BEP equation (eq 1) directly relates the change in activation energy of the reaction, δEact, to the corresponding change of the reaction energy, δEr, for different surfaces via a constant factor α, which depends on the particular reaction type. A successful BEP relationship enables prediction of the rate constants for reactions of the same family, as well as
Figure 1. Periodic table of elements in which the elements used for the machine learning predictions are framed. An employed structure model for Cu-based alloys is also shown. Color code is as follows: brown, Cu; blue, doped metal atoms.
structures free of excessive distortion. More specifically, systems were excluded as excessively distorted if the doped element was located above the C atom of the adsorbate (or the H atom when H was the adsorbate) in the z direction of the slab model and/or was located more than 3.2 Å away from those atoms. Elements for which the 12 descriptors are not all available, such as As and Hg, were also excluded. The 46 elements used for the machine learning predictions are shown in Figure 1. Note that we did not consider whether the alloy structures can exist in reality, since our main objective here is to explore many elements and understand the element properties through ML predictions. However, a recent experimental study on Pt/Cu single-atom alloys suggests that this type of catalyst could be synthesized experimentally and applied to catalytic reactions.79 Identification of the adsorption sites of the adsorbates was guided by a previous study that employed a similar model.80 Namely, CH3, CH2, CH, C, and H species were placed at hexagonal-close-packed (hcp)-t, face-centered-cubic (fcc)-t, fcc, fcc, and fcc sites, respectively. The initial geometries for each model are shown in Figure 2. For the choice of descriptors for metals, we pretested several candidates and chose 12 physical properties of the elements (Table S1), which are readily available from the periodic table, handbooks, and public databases.71,72 It should be noted here that the surface energy used as a descriptor is that of the most stable surface of each metal. In order to bypass time-consuming DFT calculations while retaining prediction accuracy, readily accessible characteristic values were used as descriptors. Each metal was represented as a 12-dimensional vector of the descriptor values. The interdependencies of the descriptors are reflected by the correlation map shown in Figure 3. Several of the descriptors are correlated with one another.
This fact motivates us to investigate variable selection to identify a smaller nonredundant subset of the 12 descriptors.
Monte Carlo Cross Validation for Assessing Predictability. In pursuit of the data-driven prediction of the adsorption energies, the 46 targets were separated into two disjoint sets: a “test set” of size n and a “training set” of size 46 − n. The first objective was to evaluate how accurately the adsorption energies of the test set can be predicted by using those of the
Figure 2. Adsorption models for (A) CH3, (B) CH2, (C) CH, (D) C, and (E) H on Cu-based alloys. Color code is as follows: gray, C; white, H; brown, Cu; blue, doped metal atoms.
training set. First, an ML model was built using the training set. Then, using that model, the adsorption energies of the test set were predicted, and the root-mean-square error (RMSE) between the predicted values and true values (ground truth) was calculated to evaluate predictability. A single-shot trial of this procedure provides one estimate of the RMSE. Because this estimate varies within a certain range as the split into training and test sets is altered, the estimation variance was reduced by repeating the single-shot trial over 100 random test/training splits, i.e., 100 random leave-n-out trials. The mean of the 100 RMSE estimates was used as the prediction accuracy of the ML model. Because the test set in each trial was not used to build the corresponding ML model in that trial, it simulates yet-unseen targets to be predicted. An added benefit is that we can also control the size
information on the tree ensemble methods (RFR, GBR, ETR) is given in the Supporting Information. To evaluate the predictive capability of the ML models, we used Monte Carlo cross validation with 100 random leave-25%-out trials on the adsorption energies of CH3. All ML methods take the 12 descriptors listed in Table S1 as input for the prediction. The RMSE between the predicted values and the ground truths is calculated for each trial, and the values are averaged to obtain the mean RMSE and its standard deviation. For most ML methods, users need to appropriately set the values of hyperparameters, which have a critical impact on the prediction performance. We tested a reasonable range of candidate values exhaustively (grid search; Table 1), chose the best hyperparameters by 3-fold cross validation on the training set, and used them to calculate the predicted values for the test set. For ML implementations, we used the widely used package scikit-learn (http://scikit-learn.org).84
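The evaluation protocol described above (random leave-25%-out splits, an inner 3-fold grid search on the training set only, and RMSE on the held-out set) could be sketched with scikit-learn roughly as follows. This is an illustrative sketch rather than the authors' script: the function name `monte_carlo_cv`, the synthetic data, and the reduced trial and tree counts in the demo call are assumptions made to keep the example small.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV, ShuffleSplit


def monte_carlo_cv(X, y, n_trials=100, test_size=0.25, n_estimators=200, seed=0):
    """Mean and standard deviation of test RMSE over random leave-n-out trials."""
    splitter = ShuffleSplit(n_splits=n_trials, test_size=test_size, random_state=seed)
    rmses = []
    for train_idx, test_idx in splitter.split(X):
        # The inner 3-fold grid search sees the training set only.
        search = GridSearchCV(
            ExtraTreesRegressor(n_estimators=n_estimators, random_state=seed),
            param_grid={"max_depth": [1, 2, 3, 4, 8, 10]},
            cv=3,
            scoring="neg_mean_squared_error",
        )
        search.fit(X[train_idx], y[train_idx])
        residual = y[test_idx] - search.predict(X[test_idx])
        rmses.append(float(np.sqrt(np.mean(residual ** 2))))
    return float(np.mean(rmses)), float(np.std(rmses))


# Synthetic stand-in for 46 systems x 12 descriptors (not the data of this study).
rng = np.random.default_rng(0)
X = rng.normal(size=(46, 12))
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=46)

# Small settings keep the demo fast; the text uses 100 trials and 200 trees.
mean_rmse, sd_rmse = monte_carlo_cv(X, y, n_trials=3, n_estimators=30)
print(f"test RMSE = {mean_rmse:.2f} ± {sd_rmse:.2f}")
```

Because the held-out systems never enter the grid search, the reported RMSE is not biased by hyperparameter selection.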
■
RESULTS AND DISCUSSION
Adsorption Energies of CH3, CH2, CH, C, and H on the Cu-Based Alloys Obtained by DFT Calculations. The calculated adsorption energies of the adsorbates on the pure Cu surface were compared to previously reported adsorption energies for verification. The adsorption energies of CH3, CH2, CH, C, and H on the Cu monometallic surface were calculated to be −1.47, −3.90, −5.14, −5.00, and −2.52 eV, respectively, values that are in close agreement with those determined by Gajewski et al.80 Having verified our method, we then calculated the adsorption energies of the adsorbates on Cu-based alloys (see Figure 4 and Table S2 in Supporting
Figure 3. Correlation map of the 12 descriptors used in this study.
n of the test set and analyze how large the training set must be for accurate prediction. This approach is based on the methodologies associated with Monte Carlo cross validation,81 leave-n-out,82,83 random permutation cross validation (shuffle and split),84 and random subsampling cross validation,85 as have been applied in related work.82 Alternative computational tools such as k-fold cross validation and bootstrapping were less applicable to our setting.
Machine Learning (ML) Methods. Pre-evaluations for selecting machine learning methods were performed with a set of nine widely used ML methods85,86 from three major categories: linear methods for linear regression, and kernel methods and tree ensemble methods for nonlinear regression. Linear methods need to assume linearity between the prediction target and descriptors but would give the most stable baseline for prediction performance. We tested ordinary linear regression by least-squares (OLR), partial least-squares (PLS) regression with automatic dimensionality reduction, and least absolute shrinkage and selection operator (LASSO) regression with automatic descriptor selection. For kernel methods, we tested kernel ridge regression (KRR),87,88 support vector regression (SVR),89 and Gaussian process regression (GPR).90 For tree ensemble methods, we tested random forest regression (RFR),91 gradient boosting regression (GBR),87 and extra trees regression (ETR).88 Note that the more detailed
Figure 4. DFT-calculated adsorption energies of CH3, CH2, CH, C, and H on the Cu-based alloys.
Table 1. List of the Nine ML Methods and Hyperparameters (a)

| category | method | hyperparameters [tested range] |
| linear | OLR | (no tuning parameters) |
| linear | PLS | n_components ∈ [1, 2, ..., no. of vars] |
| linear | LASSO | n_alphas = 10 by LassoCV (range is automatically determined) |
| kernel (nonlinear) | KRR | kernel = 'rbf', alpha, gamma ∈ [1.0, 10^−1, 10^−2, ..., 10^−9, 10^−10] |
| kernel (nonlinear) | SVR | kernel = 'rbf', C ∈ [1.0, 10, 10^2, ..., 10^7, 10^8], gamma ∈ [1.0, 10^−1, 10^−2, ..., 10^−9, 10^−10] |
| kernel (nonlinear) | GPR | kernel = Const(1.0, (1e−5, 1e5)) * RBF(1.0, (1e−5, 1e5)) + WhiteKernel(1.0, (1e−5, 1e5)), alpha = 0.0 |
| tree ensemble (nonlinear) | RFR | n_estimators = 200, max_depth ∈ [1, 2, 3, 4, 8, 10] |
| tree ensemble (nonlinear) | GBR | n_estimators = 200, max_depth ∈ [1, 2, 3, 4, 8, 10], learning_rate ∈ [1.0, 10^−1, 10^−2, ..., 10^−9, 10^−10] |
| tree ensemble (nonlinear) | ETR | n_estimators = 200, max_depth ∈ [1, 2, 3, 4, 8, 10] |

(a) The default values of scikit-learn were used for the hyperparameters not indicated here.
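Several rows of Table 1 translate directly into scikit-learn parameter grids. The sketch below shows one possible encoding for the kernel and tree ensemble methods; the dictionary name `search_spaces` and the helper `grid_size` are illustrative assumptions, and only estimator classes and parameter names that exist in scikit-learn are used.

```python
from sklearn.ensemble import (
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.kernel_ridge import KernelRidge
from sklearn.svm import SVR

# 1.0, 1e-1, ..., 1e-10, as listed for alpha, gamma, and learning_rate.
powers = [10.0 ** -k for k in range(0, 11)]
depths = [1, 2, 3, 4, 8, 10]

# (estimator, param_grid) pairs; names follow the abbreviations in Table 1.
search_spaces = {
    "KRR": (KernelRidge(kernel="rbf"), {"alpha": powers, "gamma": powers}),
    "SVR": (SVR(kernel="rbf"), {"C": [10.0 ** k for k in range(0, 9)], "gamma": powers}),
    "RFR": (RandomForestRegressor(n_estimators=200), {"max_depth": depths}),
    "GBR": (
        GradientBoostingRegressor(n_estimators=200),
        {"max_depth": depths, "learning_rate": powers},
    ),
    "ETR": (ExtraTreesRegressor(n_estimators=200), {"max_depth": depths}),
}


def grid_size(param_grid):
    """Number of hyperparameter combinations an exhaustive search would try."""
    size = 1
    for values in param_grid.values():
        size *= len(values)
    return size


for name, (_, param_grid) in search_spaces.items():
    print(name, grid_size(param_grid))
```

Each pair can be passed to `GridSearchCV(estimator, param_grid, cv=3)` to reproduce the 3-fold selection step described in the text.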
Information). No consideration was given to whether the simulated alloys can actually form. However, Cu-based alloy systems whose structures collapsed after optimization and those in which the adsorbates could not be placed at the appropriate adsorption positions were excluded. Our findings show that the adsorbates tend to be strongly adsorbed by Cu-based alloys containing group 5−7 metals but not by the Cu-based alloys containing group 13−15 metals. These results suggest that a correlation exists between the adsorption energy and the periodic table group to which the corresponding metal of the Cu-based alloy belongs.
Pre-Evaluations for Selecting Machine Learning Methods Using Adsorption Energies of CH3. Figure 5
demonstrates that the nine ML methods tested in this study could predict the adsorption energies of CH3 well, within an RMSE of 0.24−0.27 eV on average, using only the 12 readily available descriptors in Table S1. This suggests that the adsorption energies of CH3 on the 46 Cu-based alloys share trends such that each can be captured from the other systems in a data-driven manner. The performance of each method may seem comparable, but the differences were statistically significant for ETR vs GPR (p = 0.04), ETR vs SVR (p = 0.02), ETR vs KRR (p = 0.01), and ETR vs OLR (p = 0.002) according to the p-values from the Welch two-sample t test on 100 trials. This suggests that tree ensemble methods were significantly better than kernel methods, and we therefore use these methods (RFR, GBR, ETR) for the subsequent analysis in addition to the linear baseline (OLR).
Figure 5. Average RMSEs for predicting the adsorption energies of CH3 over 100 random leave-25%-out trials with various ML methods.
Prediction of Adsorption Energies of CH3 with the Four ML Methods. For prediction of the adsorption energies of CH3 on Cu-based alloys, we use the four ML methods OLR, RFR, GBR, and ETR according to our pre-evaluations. In our pre-evaluations, the tree ensemble methods were less sensitive to the hyperparameters. Thus, for all subsequent analysis, we use 200 trees in the final ensemble models for RFR, GBR, and ETR throughout the paper. Figure 6 illustrates the predictive performance of the four ML methods in a single-shot random trial with a 75% training set (blue) and a 25% test set (red). This case uses all 12 descriptors listed in Table S1. The X-axis represents the DFT-calculated adsorption energies (ground truth), and the Y-axis gives the value predicted by the ML methods. The deviation from the X = Y line indicates the error in prediction. For a more quantitative evaluation, we performed 100 random trials of this single-shot leave-25%-out. The mean RMSE values for the 25% test set over 100 trials were 0.27 ± 0.07 eV for OLR, 0.24 ± 0.06 eV for RFR, 0.24 ± 0.05 eV for GBR, and 0.24 ± 0.06 eV for ETR (Table 2). The number after “±” indicates the standard deviation of the 100 RMSE values. The three nonlinear methods (RFR, GBR, and ETR) showed better prediction performance than the OLR method (p < 0.01 by Welch two-sample t test). The three methods gave almost the same predictive accuracy.
Figure 6. DFT-calculated adsorption energies of CH3 on Cu-based alloys and the values predicted by using the OLR, RFR, GBR, and ETR methods. Color code is as follows: blue, training set = 75%; red, test set = 25%.
Table 2. Mean RMSEs of Each Method for Prediction of DFT-Calculated Adsorption Energies of CH3

| method | training error mean, eV | training error sd | training error (min, max) | test error mean, eV | test error sd | test error (min, max) |
| OLR | 0.15 | 0.01 | (0.10, 0.18) | 0.27 | 0.07 | (0.15, 0.47) |
| RFR | 0.09 | 0.01 | (0.07, 0.11) | 0.24 | 0.06 | (0.14, 0.40) |
| GBR | 0.00 | 0.00 | (0.00, 0.00) | 0.24 | 0.05 | (0.13, 0.38) |
| ETR | 0.00 | 0.00 | (0.00, 0.00) | 0.24 | 0.06 | (0.13, 0.38) |

As previously mentioned, the numbers of regression trees used in RFR, GBR, and ETR are all fixed at 200, and we set the other hyperparameters to the defaults of scikit-learn. ETR and RFR are known to be less hyperparameter sensitive than GBR, and thus we can avoid demanding hyperparameter tuning. Moreover, the training time for ETR is shorter than that for RFR and GBR because ETR is based on random-splitting trees. This fact suggests that the adsorption energies can readily be predicted by the ETR method in the absence of DFT calculations or even demanding hyperparameter tuning for ML schemes. We therefore concluded that the ETR model is the best choice for prediction of the
adsorption energies explored in this study. In order to compare the time required for the DFT calculations and the ML analysis, their computational times are given. The DFT calculation of the adsorption energy of CH3 on the Cu monometallic surface took about 10 h on our 32-core workstation. Note that for a system containing another metal such as Pb, the calculation took even longer (about 34 h). In contrast, each of the corresponding ML predictions always took less than 1 s on our single-core laptop PC. The computational time for ML prediction does not depend on the individual system but on which ML model is used. For many ML methods including tree ensembles, the predicted values can be calculated instantly because the functional form for prediction is explicitly obtained by fitting an ML model function to the given training data. The prediction can then be made simply by substituting the corresponding descriptor values into the function already obtained. This fact indicates the potential of ML as a fast alternative to DFT calculations. For better understanding of the ML analysis, the results of the prediction are briefly discussed. As seen in our pre-evaluations (Figure 5), ETR and RFR tend to have slightly better performance than GBR, which likely reflects the fact that our data set consists of only 46 systems, some of which are not similar at all to the others; thus the difficulty of prediction is highly dependent on how the training and test splits are made. If similar systems are assigned to both the training and test sets, then they are predictable. But if these systems are all assigned only to the test set, they are difficult to predict by ML methods. In this situation, it would be more important to reduce the prediction variance than the prediction bias. Both ETR and RFR fit this purpose, but since ETR has a stronger variance-reducing effect, ETR would achieve the best performance.
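The method comparisons reported in this section rely on the Welch two-sample t test applied to the per-trial RMSE values of two methods. A minimal sketch with SciPy is given below; the two RMSE samples are synthetic draws made up for illustration (they only mimic the scale of the reported means and standard deviations), so the resulting p-value says nothing about the actual methods.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for 100 per-trial test RMSEs (eV) of two methods.
rmse_method_a = rng.normal(loc=0.24, scale=0.06, size=100)
rmse_method_b = rng.normal(loc=0.27, scale=0.07, size=100)

# equal_var=False selects Welch's t test (unequal variances allowed).
t_stat, p_value = stats.ttest_ind(rmse_method_a, rmse_method_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
```

Welch's variant is a natural choice here because the spread of RMSE values differs between methods, so a pooled-variance test would rest on an assumption the data need not satisfy.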
Our current ML model gives some errors even with ETR. This is most likely due to the small data set size used in this study. For more accurate predictions, in general, it would be desirable to collect the larger data sets. In order to provide a quantitative validation of prediction with the ETR model, we repeated the tests 300 and 500 times (instead of 100 times) with random splits and calculated the mean RMSEs for the predictions. The resultant values were 0.24 ± 0.05 eV for 300 times and 0.24 ± 0.05 eV for 500 times, which suggests that the 100 time average was a stable estimate for the prediction error. To also confirm that the number of trees (fixed at 200) does not much affect the prediction performance, we also tested ETR with 400 and 800 trees and obtained 0.24 ± 0.06 eV for 400 trees and 0.24 ± 0.06 eV for 800 trees. Note that the mean RMSE for 200 trees was 0.24 ± 0.06 eV. These results further support the stable accuracy of prediction performance for the ETR method for this system. Evaluation of Descriptor Importance for ETR-Based Prediction Using Adsorption Energies of CH3. Next, we investigated the relevance or redundancy of each of the 12 descriptors used in the ETR model. ETR is based on multiple regression trees, and hence it provides the “feature importance” score92 for each descriptor. This score can be used to assess the relative importance of that descriptor with respect to the predictability of the adsorption energies. Figure 7 shows the feature-importance scores of all 12 descriptors for predicting the adsorption energies of CH3. The most important descriptor for this system is the periodic table group to which the doped metal belongs. To evaluate the effect of the number of descriptors on the predictive performance of the ETR method, the predictions made with all 12 descriptors vs with the top 10,
Figure 7. Feature-importance scores of the descriptors for the ETR prediction of the adsorption energies of CH3 on Cu-based alloys.
6, 5, or 3 descriptors are compared in Figure 8. For quantitative evaluation, we also repeated the tests in Figure 8 100 times with random training/test splits and calculated the mean RMSEs for the predictions. The resultant values (the RMSEs for the test set) were 0.24 ± 0.06 eV with 12 descriptors, 0.24 ± 0.06 eV with the top 10 descriptors, 0.24 ± 0.06 eV with the top 6 descriptors, 0.24 ± 0.06 eV with the top 5 descriptors, and 0.25 ± 0.06 eV with the top 3 descriptors. Thus, the prediction performance remained nearly constant even when only 3 descriptors were used. The top 3 descriptors of the doped metal are its group in the periodic table, surface energy, and melting point.
Model Estimation Using a Different Ratio of Test/Training Splits Using Adsorption Energies of CH3. We also investigated how large the training set must be in order for the ML method to provide sufficient prediction performance. Figure 9 shows the predictive performance using ETR with the top five descriptors and different ratios of the test/training sets: 10%/90%, 25%/75%, and 50%/50%. In other words, we reserved 10%, 25%, or 50% of Table S2 as a test set and then made predictions from the remaining values (i.e., the training set). For the results depicted in Figure 9, there are 46 values in total, and 10%/90% corresponds to sets of size 5/41, 25%/75% to 12/34, and 50%/50% to 23/23. For quantitative evaluation, we also calculated the mean RMSEs for the 100 random splits for each setting: the resultant values (test errors) were 0.21 ± 0.08 eV for the 10%/90% test, 0.24 ± 0.06 eV for the 25%/75% test, and 0.25 ± 0.04 eV for the 50%/50% test. These results quantitatively showed a general trend of ML: the more training data are supplied, the better the prediction. In addition, it was also observed that the adsorption energies of CH3 on the Cu-based alloys can be predicted with a moderate level of accuracy (RMSE = 0.25 ± 0.04 eV) even when only 50% of the data are used for training.
This result provides a useful guideline for the trade-off between the predictive performance and data availability.
Prediction of Adsorption Energies of CH2, CH, C, and H on the Cu-Based Alloys. Because the ETR method proved applicable to CH3, the adsorption energies of CH2, CH, C, and H were also predicted using the ETR method (Figure 10). For these predictions, 100 random trials of the single-shot leave-25%-out were carried out for each adsorbate for quantitative evaluation. The average RMSE values were within 0.2−0.3 eV for all the adsorbates tested (see Tables S3−S6 of the Supporting Information), suggesting that ML via the ETR method is applicable not only to CH3 but also to other adsorbates, without the need for hyperparameter tuning. Figure 11 shows the feature-importance scores of all 12 descriptors for predicting the adsorption energies of CH2, CH, C, and H.
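The feature-importance ranking and the top-k descriptor selection discussed above can be sketched with scikit-learn's `ExtraTreesRegressor`, whose fitted `feature_importances_` attribute supplies the scores. The descriptor names and the synthetic data below are placeholders assumed for illustration; they are not the descriptors or energies of this study.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)

# Placeholder names and synthetic data: 46 systems x 12 descriptors.
descriptor_names = [f"descriptor_{i}" for i in range(12)]
X = rng.normal(size=(46, 12))
y = 2.0 * X[:, 0] + X[:, 3] + 0.1 * rng.normal(size=46)

# Fit on all descriptors and rank them by importance (highest first).
full_model = ExtraTreesRegressor(n_estimators=200, random_state=0).fit(X, y)
ranking = np.argsort(full_model.feature_importances_)[::-1]

# Keep only the top-k descriptors and refit a reduced model.
k = 3
top_k = ranking[:k]
X_reduced = X[:, top_k]
reduced_model = ExtraTreesRegressor(n_estimators=200, random_state=0).fit(X_reduced, y)

for idx in top_k:
    print(descriptor_names[idx], round(float(full_model.feature_importances_[idx]), 3))
```

In a full evaluation, the ranking would be computed inside each training split so that the held-out systems do not influence which descriptors are retained.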
Figure 8. DFT-calculated adsorption energies of CH3 on Cu-based alloys correlated with the values predicted by ETR with the 12 (all), top 6, top 5, and top 3 descriptors in Figure 6. Color code is as follows: blue, training set = 75%; red, test set = 25%.
Figure 9. DFT-calculated adsorption energies of CH3 on Cu-based alloys correlated with the values predicted by ETR with the top five descriptors using different test/training ratios. Color code is as follows: blue, training set; red, test set.
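The split-ratio evaluation summarized in Figure 9 can be sketched along the following lines, assuming scikit-learn's `ShuffleSplit` for the repeated random splits. This is not the authors' script: the data are synthetic stand-ins for the 46 systems, and `n_estimators=100`, the seeds, and the helper names `split_sizes` and `repeated_split_rmse` are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import ShuffleSplit


def split_sizes(n_samples, test_size):
    """Return (n_test, n_train) produced by ShuffleSplit for a fractional test_size."""
    splitter = ShuffleSplit(n_splits=1, test_size=test_size, random_state=0)
    train_idx, test_idx = next(splitter.split(np.zeros((n_samples, 1))))
    return len(test_idx), len(train_idx)


def repeated_split_rmse(X, y, test_size, n_repeats=100, seed=0):
    """Return (mean, std) of the test RMSE over repeated random test/training splits."""
    splitter = ShuffleSplit(n_splits=n_repeats, test_size=test_size, random_state=seed)
    rmses = []
    for train_idx, test_idx in splitter.split(X):
        model = ExtraTreesRegressor(n_estimators=100, random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        residual = model.predict(X[test_idx]) - y[test_idx]
        rmses.append(np.sqrt(np.mean(residual ** 2)))
    return float(np.mean(rmses)), float(np.std(rmses))


# Synthetic placeholder data (assumption): 46 samples, 5 descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(46, 5))
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=46)

for test_size in (0.10, 0.25, 0.50):
    n_test, n_train = split_sizes(46, test_size)
    # n_repeats is reduced from 100 to 20 here only to keep the demo fast.
    mean, std = repeated_split_rmse(X, y, test_size, n_repeats=20)
    print(f"test/train = {n_test}/{n_train}: RMSE = {mean:.3f} ± {std:.3f}")
```

With 46 samples, the fractional test sizes 0.10, 0.25, and 0.50 give test/training sets of 5/41, 12/34, and 23/23, because the test-set size is rounded up.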
and thus its variance is larger, while we performed leave-25%-out 100 times in our analysis.

Prediction of Values of ECH3 − ECH2 for CH4 Utilization. The initial step of the OCM reaction is the formation of CH3 species, which is known to exist.10,12 Once the formation of the CH3 species is initiated on a catalytic surface, gas-phase reactions ensue. The radical species thus formed are expected to recombine to form ethane. In addition, the CH3 species is a key intermediate in the partial oxidation of methane to methanol. Consequently, it is important for both OCM and methanol synthesis (via partial oxidation of methane) to stabilize the CH3 species and to avoid further dehydrogenation to CH2, CH, and C and hence undesired coke or COx formation. For catalyst design, therefore, it is crucial to create surfaces on which the CH3 species adsorbs more strongly than the CH2 species does. For this reason, we compared the adsorption energies of CH3 and CH2 on the Cu-based alloy surfaces. The differences in the adsorption energies of CH3 and CH2, obtained by subtracting the adsorption energies of CH2 from those of CH3 (ECH3 − ECH2), are reported in Figure 12. If our hypothesis is correct, elements that show small ECH3 − ECH2 values (e.g., Te, Sn, and Mg) are the best-suited doping metals. Conversely, elements that show large ECH3 − ECH2
Interestingly, the group in the periodic table, surface energy, and melting point are the top three descriptors for all of these adsorbates, as they are for CH3. Moreover, the scores show that prediction efficacy depends largely on these top three descriptors. To examine how predictability depends on the doped element, leave-one-out (LOO) analyses were performed, as shown in Figure S5 in the Supporting Information. For each of the 46 systems, an ML prediction trained on the 45 other cases was made, and the DFT-calculated and predicted values were plotted on the x-axis and y-axis, respectively. This LOO analysis provides some insight into how well each system can be predicted in a data-driven manner from the systems doped with other metals. It was observed that the systems containing Nd as a doped metal are particularly difficult to predict. Moreover, the systems containing a doped metal such as V, Nb, Ta, and Sb were found to be relatively difficult to predict compared to the other metals tested. In the LOO analysis, we deal with more training data (45 for training, 1 for test) than in leave-25%-out (34 for training, 12 for test), and therefore the prediction errors were lower: the test RMSE was 0.18 ± 0.15 eV for CH3, 0.18 ± 0.12 eV for CH2, 0.22 ± 0.17 eV for CH, 0.25 ± 0.19 eV for C, and 0.17 ± 0.17 eV for H. But LOO is based on 46 trials,
Figure 12. Difference of DFT-calculated adsorption energies of CH3 and CH2 on the Cu-based alloys.
predictions are 0.37 ± 0.09, 0.28 ± 0.05, 0.30 ± 0.06, and 0.26 ± 0.05 eV, respectively (Table S6). As is the case with predictions of the adsorption energies of CH3, the RFR, GBR, and ETR methods showed better performance than the OLR method. In addition, we found once again that ETR can be used for predictions without hyperparameter tuning or significant training time. Feature-importance scores of all 12 descriptors for predicting ECH3 − ECH2 values were investigated (Figure S5 in the Supporting Information). The important descriptors for predicting ECH3 − ECH2 values are the same as those important for predicting the individual adsorption energies. The most important descriptor for predicting ECH3 − ECH2 values is the group in the periodic table to which the doped metal belongs, followed by surface energy, melting point, and atomic radius.

The results of this study suggest that time-efficient ML methods can be used to predict the relative catalytic activities of materials as well as their specificity in product formation. The ultimate goal of this effort is to utilize this approach to gain fundamental knowledge about the factors that determine catalytic activity so that ideal catalysts can be designed in an atom-by-atom manner. Because the catalytic properties of materials should in principle be determined by their electronic structures, the strategy used in this approach is to design target electronic structures by changing the composition and physical nature of selected materials.29,93,94 The concept of controlling the properties of matter at the molecular scale by engineering electronic structure should not only be relevant to catalytic materials but also be more generally applicable to other challenges in chemistry, physics, and materials science related to the design of materials for batteries, gas storage, sensing, and molecules for homogeneous catalysis.95
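The comparison of the four regression methods on the ECH3 − ECH2 values described above can be sketched as follows, assuming the scikit-learn estimators `LinearRegression`, `RandomForestRegressor`, `GradientBoostingRegressor`, and `ExtraTreesRegressor` with near-default settings. The descriptors and energies below are randomly generated placeholders (named `e_ch3` and `e_ch2` only for readability), so the printed numbers say nothing about the actual alloys; the helper names and seeds are likewise assumptions.

```python
import numpy as np
from sklearn.ensemble import (
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit


def make_models(seed=0):
    """Fresh, untuned instances of the four regression methods."""
    return {
        "OLR": LinearRegression(),
        "RFR": RandomForestRegressor(n_estimators=100, random_state=seed),
        "GBR": GradientBoostingRegressor(random_state=seed),
        "ETR": ExtraTreesRegressor(n_estimators=100, random_state=seed),
    }


def compare_methods(X, y, n_repeats=100, test_size=0.25, seed=0):
    """Mean and std of the test RMSE per method over repeated leave-25%-out splits."""
    splitter = ShuffleSplit(n_splits=n_repeats, test_size=test_size, random_state=seed)
    scores = {name: [] for name in make_models(seed)}
    for train_idx, test_idx in splitter.split(X):
        # Every method sees exactly the same split, so the errors are comparable.
        for name, model in make_models(seed).items():
            model.fit(X[train_idx], y[train_idx])
            residual = model.predict(X[test_idx]) - y[test_idx]
            scores[name].append(np.sqrt(np.mean(residual ** 2)))
    return {name: (float(np.mean(v)), float(np.std(v))) for name, v in scores.items()}


# Synthetic placeholder data (assumption): 46 samples, 12 descriptors.
rng = np.random.default_rng(1)
X = rng.normal(size=(46, 12))
e_ch3 = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.05 * rng.normal(size=46)
e_ch2 = 0.8 * X[:, 0] * X[:, 2] + 0.05 * rng.normal(size=46)
target = e_ch3 - e_ch2  # difference used as the regression target

# n_repeats is reduced from 100 to 10 here only to keep the demo fast.
summary = compare_methods(X, target, n_repeats=10)
for name, (mean, std) in summary.items():
    print(f"{name}: test RMSE = {mean:.3f} ± {std:.3f}")
```

Evaluating all four methods on identical splits is a design choice of this sketch; it keeps split-to-split variation from being mistaken for a difference between methods.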
Figure 10. DFT-calculated adsorption energies of CH2, CH, C, and H on Cu-based alloys correlated with the values predicted by ETR with the 12 descriptors. Color code is as follows: blue, training set = 75%; red, test set = 25%.
■
Figure 11. Feature-importance scores of the descriptors for the ETR prediction of the adsorption energies of CH2, CH, C, and H on Cu-based alloys.
CONCLUSIONS
We have shown that machine learning methods can predict the DFT-calculated adsorption energies of CH3, CH2, CH, C, and H on Cu-based alloys. By using 12 descriptors that are readily available from databases in combination with four different regression methods (OLR, RFR, GBR, and ETR), we showed that the best ML prediction method, ETR, gives sufficient accuracy, with average root mean squared errors (RMSEs) below 0.3 eV. It was also demonstrated that the top three descriptors for the doped metals (the group in the periodic table, surface energy, and melting point) can be used to predict the adsorption energies with comparable accuracy. In addition, the difference between the adsorption
values (e.g., Cr, V, and Mo) could generate Cu surfaces that induce the undesired reactions of methane. In order to explore this hypothesis, the ECH3 − ECH2 values were predicted by using each of the four ML methods. Figure 13 shows the predictive performance in a single-shot leave-25%-out computation, and 100 random trials of this computation provide the mean RMSE values for each method. The mean RMSE values for the test errors corresponding to the OLR, RFR, GBR, and ETR-based
Figure 13. Difference of DFT-calculated adsorption energies of CH3 and CH2 and the values predicted by the OLR, RFR, GBR, and ETR methods. Color code is as follows: blue, training set = 75%; red, test set = 25%.
energies of CH3 and CH2 (ECH3 − ECH2) was predicted in order to optimize the efficient utilization of methane. The ETR method showed the best performance for this purpose, indicating that it is very useful for predicting adsorption energies and related values without demanding hyperparameter tuning. The machine learning model explored here can be applied to the screening of large libraries of alloys and potentially other solid materials (e.g., oxides) widely used for CH4 utilization, with negligible CPU time compared to first-principles methods.
■
Notes
The authors declare no competing financial interest.
■
ACKNOWLEDGMENTS
This work was supported by the KAKENHI Grants JP26620110, JP16H06595, JP17H01341, JP17H01783, and JP17K19953 and Grant-in-Aid for Scientific Research on Innovative Areas “Nano Informatics” (Grant 25106010) from the Japan Society for the Promotion of Science (JSPS), by the Japanese Ministry of Education, Culture, Sports, Science, and Technology (MEXT) within the projects “Integrated Research Consortium on Chemical Sciences (IRCCS)” and “Elements Strategy Initiative to Form Core Research Center”, and by the JST-CREST Projects JPMJCR15P4 and JPMJCR17J3, and JST-PRESTO Project JPMJPR15N9.
ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jpcc.7b12670. Input features (descriptors) used for prediction of adsorption energies, summary of DFT-calculated adsorption energies, supplementary results of ML predictions, and explanation of tree ensemble methods (RFR, GBR, ETR) (PDF)
■
■
REFERENCES
(1) Horn, R.; Schlögl, R. Methane Activation by Heterogeneous Catalysis. Catal. Lett. 2015, 145, 23−39.
(2) Olivos-Suarez, A. I.; Szécsényi, Á.; Hensen, E. J. M.; Ruiz-Martinez, J.; Pidko, E. A.; Gascon, J. Strategies for the Direct Catalytic Valorization of Methane Using Heterogeneous Catalysis: Challenges and Opportunities. ACS Catal. 2016, 6, 2965−2981.
(3) Zhou, S.; Li, J.; Schlangen, M.; Schwarz, H. Bond Activation by Metal-Carbene Complexes in the Gas Phase. Acc. Chem. Res. 2016, 49, 494−502.
(4) Kumar, G.; Lau, S. L. J.; Krcha, M. D.; Janik, M. J. Correlation of Methane Activation and Oxide Catalyst Reducibility and Its Implications for Oxidative Coupling. ACS Catal. 2016, 6, 1812−1821.
AUTHOR INFORMATION
Corresponding Authors
*T.T.: e-mail, [email protected].
*I.T.: e-mail, [email protected].
ORCID
Takashi Toyao: 0000-0002-6062-5622
Ken-ichi Shimizu: 0000-0003-0501-0294
(5) Shi, L. E. I.; Yang, G.; Tao, K. A. I.; Yoneyama, Y.; Tan, Y.; Tsubaki, N. Reforming with Methane and New Route of Low-Temperature Methanol Synthesis. Acc. Chem. Res. 2013, 46, 1838−1847.
(6) Liu, S.; Wang, L.; Ohnishi, R.; Ichikawa, M. Bifunctional Catalysis of Mo/HZSM-5 in the Dehydroaromatization of Methane to Benzene and Naphthalene XAFS/TG/DTA/MASS/FTIR Characterization and Supporting Effects. J. Catal. 1999, 181, 175−188.
(7) Oshima, K.; Shinagawa, T.; Sekine, Y. Methane Conversion Assisted by Plasma or Electric Field. J. Jpn. Pet. Inst. 2013, 56, 11−21.
(8) Mahyuddin, M. H.; Staykov, A.; Shiota, Y.; Yoshizawa, K. Direct Conversion of Methane to Methanol by Metal-Exchanged ZSM-5 Zeolite (Metal = Fe, Co, Ni, Cu). ACS Catal. 2016, 6, 8321−8331.
(9) Campbell, K. D.; Lunsford, J. H. Contribution of Gas-Phase Radical Coupling in the Catalytic Oxidation of Methane. J. Phys. Chem. 1988, 92, 5792−5796.
(10) Zavyalova, U.; Holena, M.; Schlögl, R.; Baerns, M. Statistical Analysis of Past Catalytic Data on Oxidative Methane Coupling for New Insights into the Composition of High-Performance Catalysts. ChemCatChem 2011, 3, 1935−1947.
(11) Kondratenko, E. V.; Schlüter, M.; Baerns, M.; Linke, D.; Holena, M. Developing Catalytic Materials for the Oxidative Coupling of Methane through Statistical Analysis of Literature Data. Catal. Sci. Technol. 2015, 5, 1668−1677.
(12) Campbell, K. D.; Morales, E.; Lunsford, J. H. Gas-Phase Coupling of Methyl Radicals during the Catalytic Partial Oxidation of Methane. J. Am. Chem. Soc. 1987, 109, 7900−7901.
(13) Hutchings, G. J.; Woodhouse, J. R.; Scurrell, M. S. Partial Oxidation of Methane over Oxide Catalysts. Comments on the Reaction Mechanism. J. Chem. Soc., Faraday Trans. 1 1989, 85, 2507−2523.
(14) Sorokin, A. B.; Kudrik, E. V.; Alvarez, L. X.; Afanasiev, P.; Millet, J. M. M.; Bouchu, D. Oxidation of Methane and Ethylene in Water at Ambient Conditions. Catal. Today 2010, 157, 149−154.
(15) Periana, R. A. Platinum Catalysts for the High-Yield Oxidation of Methane to a Methanol Derivative. Science 1998, 280, 560−564.
(16) Wood, B. R.; Reimer, J. A.; Bell, A. T.; Janicke, M. T.; Ott, K. C. Methanol Formation on Fe/Al-MFI via the Oxidation of Methane by Nitrous Oxide. J. Catal. 2004, 225, 300−306.
(17) Lee, J. S.; Oyama, S. T. Oxidative Coupling of Methane to Higher Hydrocarbons. Catal. Rev.: Sci. Eng. 1988, 30, 249−280.
(18) Amenomiya, Y.; Birss, V. I.; Goledzinowski, M.; Galuszka, J.; Sanger, A. R. Conversion of Methane by Oxidative Coupling. Catal. Rev.: Sci. Eng. 1990, 32, 163−227.
(19) Rahman, A. K. M. L.; Kumashiro, M.; Ishihara, T. Direct Synthesis of Formic Acid by Partial Oxidation of Methane on H-ZSM-5 Solid Acid Catalyst. Catal. Commun. 2011, 12, 1198−1200.
(20) Kwapien, K.; Paier, J.; Sauer, J.; Geske, M.; Zavyalova, U.; Horn, R.; Schwach, P.; Trunschke, A.; Schlögl, R. Sites for Methane Activation on Lithium-Doped Magnesium Oxide Surfaces. Angew. Chem., Int. Ed. 2014, 53, 8774−8778.
(21) Lunsford, J. H. Catalytic Conversion of Methane to More Useful Chemicals and Fuels: A Challenge for the 21st Century. Catal. Today 2000, 63, 165−174.
(22) Li, G.; Vassilev, P.; Sanchez-Sanchez, M.; Lercher, J. A.; Hensen, E. J. M.; Pidko, E. A. Stability and Reactivity of Copper Oxo-Clusters in ZSM-5 Zeolite for Selective Methane Oxidation to Methanol. J. Catal. 2016, 338, 305−312.
(23) Narsimhan, K.; Iyoki, K.; Dinh, K.; Román-Leshkov, Y. Catalytic Oxidation of Methane into Methanol over Copper-Exchanged Zeolites with Oxygen at Low Temperature. ACS Cent. Sci. 2016, 2, 424−429.
(24) Ruban, A.; Hammer, B.; Stoltze, P.; Skriver, H. L.; Nørskov, J. K. Surface Electronic Structure and Reactivity of Transition and Noble Metals. J. Mol. Catal. A: Chem. 1997, 115, 421−429.
(25) Nørskov, J. K.; Bligaard, T.; Rossmeisl, J.; Christensen, C. H. Density Functional Theory in Surface Chemistry and Catalysis. Nat. Chem. 2009, 1, 37−46.
(26) Vayenas, C. G.; Bebelis, S.; Ladas, S. Dependence of Catalytic Rates on Catalyst Work Function. Nature 1990, 343, 625−627.
(27) Shen, X.; Pan, Y.; Liu, B.; Yang, J.; Zeng, J.; Peng, Z. More Accurate Depiction of Adsorption Energy on Transition Metals Using Work Function as One Additional Descriptor. Phys. Chem. Chem. Phys. 2017, 19, 12628−12632.
(28) Calle-Vallejo, F.; Tymoczko, J.; Colic, V.; Vu, Q. H.; Pohl, M. D.; Morgenstern, K.; Loffreda, D.; Sautet, P.; Schuhmann, W.; Bandarenka, A. S. Finding Optimal Surface Sites on Heterogeneous Catalysts by Counting Nearest Neighbors. Science 2015, 350, 185−189.
(29) Kitchin, J. R.; Nørskov, J. K.; Barteau, M. A.; Chen, J. G. Role of Strain and Ligand Effects in the Modification of the Electronic and Chemical Properties of Bimetallic Surfaces. Phys. Rev. Lett. 2004, 93, 156801.
(30) Zhuang, H.; Tkalych, A. J.; Carter, E. A. Surface Energy as a Descriptor of Catalytic Activity. J. Phys. Chem. C 2016, 120, 23698−23706.
(31) Klanner, C.; Farrusseng, D.; Baumes, L.; Lengliz, M.; Mirodatos, C.; Schüth, F. The Development of Descriptors for Solids: Teaching “Catalytic Intuition” to a Computer. Angew. Chem., Int. Ed. 2004, 43, 5347−5349.
(32) Behler, J. Neural Network Potential-Energy Surfaces in Chemistry: A Tool for Large-Scale Simulations. Phys. Chem. Chem. Phys. 2011, 13, 17930.
(33) Nørskov, J. K.; Bligaard, T.; Rossmeisl, J.; Christensen, C. H. Towards the Computational Design of Solid Catalysts. Nat. Chem. 2009, 1, 37−46.
(34) Hansgen, D. A.; Vlachos, D. G.; Chen, J. G. Using First Principles to Predict Bimetallic Catalysts for the Ammonia Decomposition Reaction. Nat. Chem. 2010, 2, 484−489.
(35) Xin, H.; Holewinski, A.; Linic, S. Predictive Structure−Reactivity Models for Rapid Screening of Pt-Based Multimetallic Electrocatalysts for the Oxygen Reduction Reaction. ACS Catal. 2012, 2, 12−16.
(36) Saravanan, K.; Kitchin, J. R.; Von Lilienfeld, O. A.; Keith, J. A. Alchemical Predictions for Computational Catalysis: Potential and Limitations. J. Phys. Chem. Lett. 2017, 8, 5002−5007.
(37) Omata, K. Screening of New Additives of Active-Carbon-Supported Heteropoly Acid Catalyst for Friedel−Crafts Reaction by Gaussian Process Regression. Ind. Eng. Chem. Res. 2011, 50, 10948−10954.
(38) Saad, Y.; Gao, D.; Ngo, T.; Bobbitt, S.; Chelikowsky, J. R.; Andreoni, W. Data Mining for Materials: Computational Experiments with AB Compounds. Phys. Rev. B: Condens. Matter Mater. Phys. 2012, 85, 104104.
(39) Hansen, K.; Montavon, G.; Biegler, F.; Fazli, S.; Rupp, M.; Scheffler, M.; Von Lilienfeld, O. A.; Tkatchenko, A.; Muller, K.-R. Assessment and Validation of Machine Learning Methods for Predicting Molecular Atomization Energies. J. Chem. Theory Comput. 2013, 9, 3404−3419.
(40) Sinnott, S. B. Material Design and Discovery with Computational Materials Science. J. Vac. Sci. Technol., A 2013, 31, 050812.
(41) Pilania, G.; Mannodi-Kanakkithodi, A.; Uberuaga, B. P.; Ramprasad, R.; Gubernatis, J. E.; Lookman, T. Machine Learning Bandgaps of Double Perovskites. Sci. Rep. 2016, 6, 19375.
(42) Ward, L.; Agrawal, A.; Choudhary, A.; Wolverton, C. A General-Purpose Machine Learning Framework for Predicting Properties of Inorganic Materials. npj Comput. Mater. 2016, 2, 16028.
(43) Takahashi, K.; Tanaka, Y. Materials Informatics: A Journey towards Material Design and Synthesis. Dalt. Trans. 2016, 45, 10497−10499.
(44) Thornton, A. W.; Simon, C. M.; Kim, J.; Kwon, O.; Deeg, K. S.; Konstas, K.; Pas, S. J.; Hill, M. R.; Winkler, D. A.; Haranczyk, M.; et al. Materials Genome in Action: Identifying the Performance Limits of Physical Hydrogen Storage. Chem. Mater. 2017, 29, 2844−2854.
(45) Ulissi, Z. W.; Singh, A. R.; Tsai, C.; Nørskov, J. K. Automated Discovery and Construction of Surface Phase Diagrams Using Machine Learning. J. Phys. Chem. Lett. 2016, 7, 3931−3935.
(46) Emery, A. A.; Saal, J. E.; Kirklin, S.; Hegde, V. I.; Wolverton, C. High-Throughput Computational Screening of Perovskites for
Thermochemical Water Splitting Applications. Chem. Mater. 2016, 28, 5621−5634.
(47) Seko, A.; Maekawa, T.; Tsuda, K.; Tanaka, I. Machine Learning with Systematic Density-Functional Theory Calculations: Application to Melting Temperatures of Single- and Binary-Component Solids. Phys. Rev. B: Condens. Matter Mater. Phys. 2014, 89, 54303.
(48) Khorshidi, A.; Peterson, A. A. Amp: A Modular Approach to Machine Learning in Atomistic Simulations. Comput. Phys. Commun. 2016, 207, 310−324.
(49) Peterson, A. A.; Christensen, R.; Khorshidi, A. Addressing Uncertainty in Atomistic Machine Learning. Phys. Chem. Chem. Phys. 2017, 19, 10978−10985.
(50) Altae-Tran, H.; Ramsundar, B.; Pappu, A. S.; Pande, V. Low Data Drug Discovery with One-Shot Learning. ACS Cent. Sci. 2017, 3, 283−293.
(51) Gómez-Bombarelli, R.; Aguilera-Iparraguirre, J.; Hirzel, T. D.; Duvenaud, D.; Maclaurin, D.; Blood-Forsythe, M. A.; Chae, H. S.; Einzinger, M.; Ha, D. G.; Wu, T.; et al. Design of Efficient Molecular Organic Light-Emitting Diodes by a High-Throughput Virtual Screening and Experimental Approach. Nat. Mater. 2016, 15, 1120−1127.
(52) Kolb, B.; Lentz, L. C.; Kolpak, A. M. Discovering Charge Density Functionals and Structure-Property Relationships with PROPhet: A General Framework for Coupling Machine Learning and First-Principles Methods. Sci. Rep. 2017, 7, 1192.
(53) Hattori, T.; Kito, S. Artificial Intelligence Approach to Catalyst Design. Catal. Today 1991, 10, 213−222.
(54) Hattori, T.; Kito, S. Neural Network as a Tool for Catalyst Development. Catal. Today 1995, 23, 347−355.
(55) Serra, J. M.; Corma, A.; Chica, A.; Argente, E.; Botti, V. Can Artificial Neural Networks Help the Experimentation in Catalysis? Catal. Today 2003, 81, 393−403.
(56) Jinnouchi, R.; Asahi, R. Predicting Catalytic Activity of Nanoparticles by a DFT-Aided Machine-Learning Algorithm. J. Phys. Chem. Lett. 2017, 8, 4279−4283.
(57) Ras, E.; Rothenberg, G. Heterogeneous Catalyst Discovery Using 21st Century Tools: A Tutorial. RSC Adv. 2014, 4, 5963−5974.
(58) Comas-Vives, A.; Larmier, K.; Copéret, C. Understanding Surface Site Structures and Properties by First Principles Calculations: An Experimental Point of View! Chem. Commun. 2017, 53, 4296−4303.
(59) Bligaard, T.; Bullock, R. M.; Campbell, C. T.; Chen, J. G.; Gates, B. C.; Gorte, R. J.; Jones, C. W.; Jones, W. D.; Kitchin, J. R.; Scott, S. L. Toward Benchmarking in Catalysis Science: Best Practices, Challenges, and Opportunities. ACS Catal. 2016, 6, 2590−2602.
(60) Göltl, F.; Müller, P.; Uchupalanun, P.; Sautet, P.; Hermans, I. Developing a Descriptor-Based Approach for CO and NO Adsorption Strength to Transition Metal Sites in Zeolites. Chem. Mater. 2017, 29, 6434−6444.
(61) Li, Z.; Ma, X.; Xin, H. Feature Engineering of Machine-Learning Chemisorption Models for Catalyst Design. Catal. Today 2017, 280, 232−238.
(62) Li, Z.; Wang, S.; Chin, W. S.; Achenie, L. E.; Xin, H. High-Throughput Screening of Bimetallic Catalysts Enabled by Machine Learning. J. Mater. Chem. A 2017, 5, 24131−24138.
(63) Ras, E.-J.; Louwerse, M. J.; Mittelmeijer-Hazeleger, M. C.; Rothenberg, G. Predicting Adsorption on Metals: Simple yet Effective Descriptors for Surface Catalysis. Phys. Chem. Chem. Phys. 2013, 15, 4436−4443.
(64) Takigawa, I.; Shimizu, K.; Tsuda, K.; Takakusagi, S. Machine-Learning Prediction of the D-Band Center for Metals and Bimetals. RSC Adv. 2016, 6, 52587−52595.
(65) van Santen, R. A.; Neurock, M.; Shetty, S. G. Reactivity Theory of Transition-Metal Surfaces: A Brønsted-Evans-Polanyi Linear Activation Energy: Free-Energy Analysis. Chem. Rev. 2010, 110, 2005−2048.
(66) Bligaard, T.; Nørskov, J. K.; Dahl, S.; Matthiesen, J.; Christensen, C. H.; Sehested, J. The Brønsted-Evans-Polanyi Relation and the Volcano Curve in Heterogeneous Catalysis. J. Catal. 2004, 224, 206−217.
(67) Wang, S.; Temel, B.; Shen, J.; Jones, G.; Grabow, L. C.; Studt, F.; Bligaard, T.; Abild-Pedersen, F.; Christensen, C. H.; Nørskov, J. K. Universal Brønsted-Evans-Polanyi Relations for C-C, C-O, C-N, N-O, N-N, and O-O Dissociation Reactions. Catal. Lett. 2011, 141, 370−373.
(68) Medford, A. J.; Vojvodic, A.; Hummelshøj, J. S.; Voss, J.; Abild-Pedersen, F.; Studt, F.; Bligaard, T.; Nilsson, A.; Nørskov, J. K. From the Sabatier Principle to a Predictive Theory of Transition-Metal Heterogeneous Catalysis. J. Catal. 2015, 328, 36−42.
(69) Besenbacher, F.; Chorkendorff, I.; Clausen, B. S.; Hammer, B.; Molenbroek, A. M.; Nørskov, J. K.; Stensgaard, I. Design of a Surface Alloy Catalyst for Steam Reforming. Science 1998, 279, 1913−1915.
(70) Kokalj, A.; Bonini, N.; de Gironcoli, S.; Sbraccia, C.; Fratesi, G.; Baroni, S. Methane Dehydrogenation on Rh@Cu(111): A First-Principles Study of a Model Catalyst. J. Am. Chem. Soc. 2006, 128, 12448−12454.
(71) Lide, D. R., Ed. CRC Handbook of Chemistry and Physics, 84th ed., 2003−2004; CRC Press: Boca Raton, FL, 2003; 2616 pp.
(72) Tran, R.; Xu, Z.; Radhakrishnan, B.; Winston, D.; Sun, W.; Persson, K. A.; Ong, S. P. Surface Energies of Elemental Crystals. Sci. Data 2016, 3, 160080.
(73) Allred, A. L.; Rochow, E. G. A Scale of Electronegativity Based on Electrostatic Force. J. Inorg. Nucl. Chem. 1958, 5, 264−268.
(74) Allred, A. L.; Rochow, E. G. Electronegativities of Carbon, Silicon, Germanium, Tin and Lead. J. Inorg. Nucl. Chem. 1958, 5, 269−288.
(75) Kresse, G.; Furthmüller, J. Efficient Iterative Schemes for Ab Initio Total-Energy Calculations Using a Plane-Wave Basis Set. Phys. Rev. B: Condens. Matter Mater. Phys. 1996, 54, 11169−11186.
(76) Kresse, G.; Furthmüller, J. Efficiency of Ab-Initio Total Energy Calculations for Metals and Semiconductors Using a Plane-Wave Basis Set. Comput. Mater. Sci. 1996, 6, 15−50.
(77) Blöchl, P. E. Projector Augmented-Wave Method. Phys. Rev. B: Condens. Matter Mater. Phys. 1994, 50, 17953−17979.
(78) Perdew, J. P.; Burke, K.; Ernzerhof, M. Generalized Gradient Approximation Made Simple. Phys. Rev. Lett. 1996, 77, 3865−3868.
(79) Marcinkowski, M. D.; Darby, M. T.; Liu, J.; Wimble, J. M.; Lucci, F. R.; Lee, S.; Michaelides, A.; Flytzani-Stephanopoulos, M.; Stamatakis, M.; Sykes, E. C. H. Pt/Cu Single-Atom Alloys as Coke-Resistant Catalysts for Efficient C−H Activation. Nat. Chem. 2018, 10, 325−332.
(80) Gajewski, G.; Pao, C.-W. Ab Initio Calculations of the Reaction Pathways for Methane Decomposition over the Cu(111) Surface. J. Chem. Phys. 2011, 135, 064707.
(81) Picard, R. R.; Cook, R. D. Cross-Validation of Regression Models. J. Am. Stat. Assoc. 1984, 79, 575−583.
(82) Ghiringhelli, L. M.; Vybiral, J.; Levchenko, S. V.; Draxl, C.; Scheffler, M. Big Data of Materials Science: Critical Role of the Descriptor. Phys. Rev. Lett. 2015, 114, 105503.
(83) Shao, J. U. N. Linear Model Selection by Cross-Validation. J. Am. Stat. Assoc. 1993, 88, 486−494.
(84) Pedregosa, F.; Varoquaux, G. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825−2830.
(85) Japkowicz, N.; Shah, M. Evaluating Learning Algorithms; Cambridge University Press, 2011; p 423.
(86) Murphy, K. P. Machine Learning: A Probabilistic Perspective; MIT Press, 2012.
(87) Friedman, J. H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189−1232.
(88) Geurts, P.; Ernst, D.; Wehenkel, L. Extremely Randomized Trees. Mach. Learn. 2006, 63, 3−42.
(89) Drucker, H.; Burges, C. J. C.; Kaufman, L.; Smola, A.; Vapnik, V. Support Vector Regression Machines. Adv. Neural Inf. Process. Syst. 1997, 9, 155−161.
(90) Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; MIT Press, 2006.
(91) Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5−32.
(92) Breiman, L.; Friedman, J. H.; Olshen, R. A.; Stone, C. J. Classification and Regression Trees; Wadsworth Statistics/Probability; Chapman and Hall, 1984.
(93) Inderwildi, O.; Jenkins, S. In-Silico Investigations in Heterogeneous Catalysis - Combustion and Synthesis of Small Alkanes. Chem. Soc. Rev. 2008, 37, 2274−2309.
(94) Abild-Pedersen, F. Computational Catalyst Screening: Scaling, Bond-Order and Catalysis. Catal. Today 2016, 272, 6−13.
(95) Legrain, F.; Carrete, J.; van Roekeghem, A.; Curtarolo, S.; Mingo, N. How the Chemical Composition Alone Can Predict Vibrational Free Energies and Entropies of Solids. Chem. Mater. 2017, 29, 6220−6227.