Metallic Metal–Organic Frameworks Predicted by the Combination of Machine Learning Methods and Ab Initio Calculations


Energy Conversion and Storage; Plasmonics and Optoelectronics

Metallic Metal-Organic Frameworks Predicted by the Combination of Machine Learning Methods and Ab Initio Calculations Yuping He, Ekin Dogus Cubuk, Mark D. Allendorf, and Evan J. Reed J. Phys. Chem. Lett., Just Accepted Manuscript • DOI: 10.1021/acs.jpclett.8b01707 • Publication Date (Web): 27 Jul 2018 Downloaded from http://pubs.acs.org on July 29, 2018



The Journal of Physical Chemistry Letters

Metallic Metal-Organic Frameworks Predicted by the Combination of Machine Learning Methods and Ab initio Calculations

Yuping He*
Sandia National Laboratories, Livermore, California, 94551, USA
*E-mail: [email protected]

Ekin D. Cubuk
Google Brain, Mountain View, California, 94043, USA

Mark D. Allendorf
Sandia National Laboratories, Livermore, California, 94551, USA

Evan J. Reed
Department of Materials Science and Engineering, Stanford University, Stanford, California, 94305, USA

Abstract Emerging applications of MOFs in electronic devices will benefit from the design and synthesis of intrinsically, highly electronically conductive MOFs; however, very few are known to exist. Searching for electronically conductive MOFs among the tens of thousands of reported MOF structures is a challenging task. Using a new strategy (i.e., transfer learning) that combines machine learning techniques, statistical multi-voting, and ab initio calculations, we screened 2932 MOFs and identified six MOF crystal structures that are metallic at the level of semi-local DFT band theory: Mn2[Re6X8(CN)6]4 (X = S, Se, Te), Mn[Re3Te4(CN)3], Hg[SCN]4Co[NCS]4, and CdC4. Five of these structures have been synthesized and reported in the literature, but their electrical characterization has not been reported. Our work demonstrates the potential power of machine learning in materials science to aid in down-selecting from large numbers of potential candidates, and provides information and guidance to accelerate the discovery of novel advanced materials.

ACS Paragon Plus Environment


[TOC graphic. Labels: Material A, Material B, Material C; Learning; Function; Predicting Properties; Conductive MOFs]

Metal-organic frameworks (MOFs) are intrinsically nanoporous materials with controllable pore sizes and have long been recognized as highly promising candidates for energy storage and gas separation [1-3]. Interest is now growing beyond these conventional applications toward electronic and optical devices [4-6], such as sensors, thermoelectrics, and photovoltaics, on account of the rich chemical compositions and structural topologies of MOFs, their tunable thermal, electronic, and optical properties, and their potentially low-cost fabrication. These new applications raise many scientific questions about fundamental charge transport in MOFs. Although the charge transport mechanisms in MOFs are only beginning to be understood, in most cases the observed electrical conductivity is thought to arise from charge hopping, as in organic semiconductors, rather than from band transport, because of the relatively narrow electronic bandwidths and the associated large effective masses of the charge carriers. The majority of MOFs are electrical insulators with large bandgaps (i.e., > 2 eV). Many experimental and theoretical efforts have been made to find electrically conductive MOFs [5, 7-10], which are essential for applications in electronic and optical devices. Nevertheless, an intrinsically metallic MOF has not yet been reported.

The structural topologies of MOFs are largely determined by the coordination number of the metal center and by the structure and symmetry of the organic ligands. By substituting various metals and organic ligands, over 20,000 different MOFs have been synthesized in the past decade [11]. Finding a few conductive or metallic MOFs among so many structures is infeasible by human intuition alone, and the conventional guess-and-check approach is inefficient. For example, it is computationally infeasible to carry out high-throughput ab initio electronic structure calculations on the more than 20,000 MOF structures because of their large unit cells (100 to 1000 atoms). On the experimental side, measuring the electronic transport properties of MOFs remains challenging because of their low-density porous structures and the need for specific device architectures.

Figure 1. Schematic diagram of training the ML models (a) and screening for metallic MOFs (b) using four supervised classifiers: logistic regression (LR), support vector classification (SVC), neural network (NN), and random forest (RF), followed by semi-local DFT calculations to identify the most promising candidates.
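The screening stage of Figure 1b can be sketched as a simple vote over the four classifiers. The sketch below is only an illustration under stated assumptions: it uses the label convention of the letter (+1 for nonmetal, -1 for metal) and the acceptance rule stated later in the text (a structure is kept when more than two models predict a metal). The function name `multi_vote` and the vote values are hypothetical and are not data from the letter.

```python
from typing import Dict, List

METAL = -1     # label convention of the letter: -1 means no bandgap (metal)
NONMETAL = 1   # +1 means the material has a bandgap (nonmetal)


def multi_vote(predictions: Dict[str, List[int]], min_votes: int = 3) -> List[int]:
    """Combine per-model labels into one label per structure.

    A structure is kept as a metallic candidate when at least `min_votes`
    models predict METAL (3 of 4 corresponds to "more than two models").
    """
    n_structures = len(next(iter(predictions.values())))
    combined = []
    for i in range(n_structures):
        metal_votes = sum(1 for labels in predictions.values() if labels[i] == METAL)
        combined.append(METAL if metal_votes >= min_votes else NONMETAL)
    return combined


# Hypothetical votes for three structures (illustrative values only).
votes = {
    "LR":  [-1,  1, -1],
    "SVC": [-1,  1,  1],
    "NN":  [-1, -1,  1],
    "RF":  [ 1, -1, -1],
}
candidates = multi_vote(votes)
```

Only the first structure collects three metal votes, so it is the only one that would be passed on to the DFT validation step; a tie of two votes is treated as nonmetal.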

In this letter, we present a novel approach that combines machine learning methods and ab initio calculations to accelerate the search for metallic MOFs. We employed transfer learning, a technique that applies the knowledge learned in one task (or domain) to solve a problem in another task (or domain) [12]. Figure 1 illustrates the approach: several ML models are trained in parallel on data for inorganic materials, and the resulting predictive models are then used to search for metallic MOFs in a MOF database using a statistical multi-voting method [13]. The predicted metallic MOFs are further validated by ab initio calculations. We identify six MOF crystal structures that are metallic at the level of semi-local DFT band theory. Five of these have been synthesized and reported in the literature, but no electrical characterization has been reported.

Machine learning (ML) can learn effectively from existing information, i.e., data and situations, and make accurate predictions and decisions. From a scientific point of view, ML is not necessarily different from conventional predictive theory that is derived and tested by scientific methods, because an ML model with accurate predictive capability is obtained from data that contain scientific knowledge and are generated by both experimental and theoretical methods. An advantage of ML over traditional prediction methods is that it can solve problems in a high-dimensional space efficiently and accurately. The power of ML has already been demonstrated in many fields, including cognitive game theory [14], pattern recognition [15], and bioinformatics [16]. Recently, ML has been applied in materials research, including the prediction of phase diagrams [17-18], crystal structures [19-20], chemisorption [21], and other materials properties [22-25]. In multi-scale modeling and simulation, ML has been reported to assist in the development of interatomic potentials for classical molecular dynamics simulations [26-29] and of models for amorphous materials [30-31].

We employed supervised classification learning to identify metallic MOFs among the 2,937 MOF structures in the CoRE-MOF database of experimentally characterized structures [32]. We chose the bandgap of the material as the output value ($y$) to supervise the machine learning. The bandgap is the energy difference, in electron volts (eV), between the highest occupied valence band and the lowest unoccupied conduction band. The fundamental difference between a metal and a semiconductor is that in a metal the valence bands overlap in energy with the conduction bands, so there is no bandgap; the bandgap is therefore an ideal criterion for classifying materials into metals and nonmetals (i.e., semiconductors). We define $y = 1$ if the material has a bandgap and is thus a nonmetal, and $y = -1$ if the material has no bandgap and is thus a metal. Our supervised learning is therefore a binary classification. For the input features, we generated 45 material descriptors. We selected nine elemental properties (i.e., atomic number, group number, period number, electronegativity, electron affinity, melting temperature, boiling temperature, density, and ionization energy) from a database [33], and then computed five statistical quantities (i.e.
standard mean, geometric mean, standard deviation, maximum value, and minimum value) over the atoms in the chemical formula of each material, using generic statistical reduction methods [22]. We then conducted feature selection based on an F-test using SVC as the test model (see Feature Selection in the supplemental material), and found that all 45 features are necessary to obtain the highest prediction accuracy. Therefore, we used all 45 features to train all models.

Currently, there is no readily available database of bandgaps for MOFs. We therefore employed a transfer learning approach: the classification model is trained on data for inorganic compounds (~52,300) from the Open Quantum Materials Database (OQMD) [34], in which the bandgaps were computed by DFT-based ab initio calculations with the generalized gradient approximation (GGA) of the Perdew-Burke-Ernzerhof (PBE) [35] exchange-correlation functional (see Figure 1a). The materials in the OQMD are a mix of experimentally characterized and theoretical structures. The trained model is then used to identify metallic MOFs under the assumption that the learned relationships can be transferred to MOFs. The prediction accuracy under this assumption is quantified using multi-model voting and ab initio calculations (see Figure 1b).

Several supervised regression algorithms have been used to predict bandgap values for specific inorganic materials with varying accuracy [36-38]. There is neither a unique predictive model of a property for a given materials space nor a standard rule to determine a priori which ML method is best. In this work, we are interested in using ML to classify materials as nonmetals or metals. Because the ML models are trained in the inorganic space, we trained four of them on the same inorganic input data to increase the probability of finding metallic MOFs: one linear classifier, logistic regression (LR), and three nonlinear classifiers, support vector classification (SVC) with a radial basis function (rbf) kernel, a neural network (NN), and a random forest (RF). Each classifier is based on a different algorithm with different hyper-parameters (see the details of the classifiers in the supplemental material), which must be tuned to obtain an optimal model with the smallest mean square error (MSE). Cross-validation is often used to estimate generalization performance. We first randomly split the 52,300 inorganic data points into a training set (70%) and a test set (30%). The training set is used to optimize and train the models through cross-validation, whereas the test set is used to evaluate the accuracy of the trained models.
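The descriptor construction described above (elemental properties reduced to five statistics over the atoms of a formula) can be illustrated with a short sketch. Everything named here is an assumption for illustration: the tiny `ELEMENT_PROPS` table holds only two of the nine properties with rounded values, the function name `descriptors` is hypothetical, and the population standard deviation is used because the letter does not say which variant was applied. With all nine properties the same loop would yield 9 x 5 = 45 features.

```python
import statistics
from typing import Dict, List

# Illustrative elemental data only: two of the nine properties, rounded values.
# A real run would load all nine properties from an elemental database.
ELEMENT_PROPS: Dict[str, Dict[str, float]] = {
    "Cd": {"atomic_number": 48.0, "electronegativity": 1.69},
    "C":  {"atomic_number": 6.0,  "electronegativity": 2.55},
}


def descriptors(formula: Dict[str, int], props: List[str]) -> Dict[str, float]:
    """Return five statistics per property, taken over the atoms in the formula."""
    feats: Dict[str, float] = {}
    for p in props:
        # One value per atom, so every statistic is weighted by stoichiometry.
        values = [ELEMENT_PROPS[el][p] for el, n in formula.items() for _ in range(n)]
        feats[f"{p}_mean"] = statistics.fmean(values)
        # The geometric mean is only defined for positive values.
        feats[f"{p}_gmean"] = statistics.geometric_mean(values)
        # Population standard deviation (an assumption, see the lead-in).
        feats[f"{p}_std"] = statistics.pstdev(values)
        feats[f"{p}_max"] = max(values)
        feats[f"{p}_min"] = min(values)
    return feats


# CdC4 written as element counts: one Cd atom and four C atoms.
features = descriptors({"Cd": 1, "C": 4}, ["atomic_number", "electronegativity"])
```

Because the features depend only on the chemical formula, the same function can be applied unchanged to inorganic compounds and to MOFs, which is what allows a model trained on one set to be evaluated on the other.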
We employed fivefold cross-validation to optimize each model: the data are randomly partitioned into five subsets, the model is trained on four of them, and it is then evaluated on the remaining subset. This process is repeated with each of the five subsets serving as the held-out subset, and the predictive ability of the model is assessed as its average performance across all repetitions. To optimize the models, we trained each ML model over a grid of its key hyper-parameters. The details of the classification methods and the optimized hyper-parameters used for the final training are presented in the supplemental material.

Defining metrics to evaluate machine learning performance is very important for obtaining accurate ML models. In this work, the accuracy of each predictive model is evaluated on the test set, which is not used to train the models. We summarize the predicted results in a confusion matrix [39], which is then used to compute the precision, recall, and accuracy, three common metrics for measuring the performance of machine learning classifiers (see Table I). The details of the confusion matrix for each model are presented in the supplemental material.

Table I. Calculated accuracy, precision, and recall for both training and test sets using the classifiers of logistic regression (LR), support vector classification (SVC) with a radial basis function (rbf) kernel, neural network (NN), and random forest (RF), compared with a dummy classifier (DC).

                      DC %   LR %   SVC (rbf) %   NN %   RF %
Training  Accuracy     50     87        97         95     98
          Precision    50     84        96         94     98
          Recall       50     92        98         97     99
Testing   Accuracy     50     88        95         94     97
          Precision    50     85        94         92     96
          Recall       50     92        96         96     97

The precision indicates how many predicted metals are actual metals and is equal to the fraction TP/(TP+FP), where TP is the number of true positives and FP is the number of false positives. Recall indicates how many actual metals are predicted to be metals and is equal to the fraction TP/(TP+FN), where FN is the number of false negatives. Both precision and recall measure performance on metals, whereas the accuracy characterizes the predictions for both metals and nonmetals and is equal to the fraction (TP+TN)/(TP+TN+FP+FN), where TN is the number of true negatives. The dummy classifier is similar to random guessing and provides a floor value (50%) for comparison, since ~50% of the OQMD training data are metals. The results in Table I show that all four predictive models are significantly better than the dummy classifier. The linear classifier, logistic regression (LR), performs worse than the three nonlinear classifiers, i.e., support vector classification (SVC), neural network (NN), and random forest (RF), because of the complex, nonlinear decision boundary in the material feature space. An example comparing the decision boundaries of these ML classifiers is shown in Figure 2. Although all of these ML models perform extremely well when applied to the inorganic data, we expect that the quality of prediction for MOFs will not be as good due to the wide range of


MOF structures. To check the overlap of feature spaces between the two data sets, we applied a multi-dimensional scaling (MDS) technique, i.e., t-distributed Stochastic Neighbor Embedding (t-SNE) [40], to reduce the 45-dimensional feature space to a 2-dimensional space for both data sets so that their distributions can be visualized. As shown in Figure 3, the spatial distributions of the features exhibit some overlap, suggesting that the bandgap model may have some predictive power for MOFs in some regions of feature space. To further quantify the overlap between the MOF and inorganic feature spaces in the reduced 2D space, we estimated an overlap ratio, defined as the number of MOFs whose features overlap with those of inorganic materials divided by the total number of MOFs. We find that increasing the amount of training data (i.e., inorganic materials) improves the overlap and, in turn, enhances the transfer learning (see Figure S4 in the supplemental material). These results suggest that transfer learning is expected to be applicable to some, but not all, regions of MOF space.

Figure 2. Examples of decision boundaries obtained from the classification methods of logistic regression (a), SVC with the rbf kernel (b), and neural network (c) in a specific two-dimensional feature space: the X-axis is the mean atomic number, and the Y-axis is the mean group number of each compound. Red: semiconductor; blue: metal.

Most MOFs are insulators with large bandgaps, and conductive MOFs are expected to be


scarce. Intrinsically metallic MOFs have not been reported. To increase the probability of finding rare metallic MOFs among a large number of structures, we employed a multi-voting statistical approach to determine the final set of predicted metallic MOFs. If a MOF structure is predicted to be a metal by more than two models, we accept it as a metallic MOF candidate for further testing with DFT; otherwise, we categorize it as a nonmetal. Table II shows all nine metallic MOFs predicted by the multi-voting.

Figure 3. Reduced 2-dimensional feature space of inorganic materials (green) and MOFs (purple) obtained by the t-distributed Stochastic Neighbor Embedding algorithm. The spatial distribution of features exhibits large regions of overlap, suggesting potential for the bandgap model trained on inorganic data to have predictive power for MOFs in some regions of feature space.

To test the predictive models, we applied each model (i.e., LR, SVC, NN, and RF) to two 3D MOF structures, Cu3(btc)2 with btc = benzene-1,3,5-tricarboxylic acid and Cu3(btdt)2 with btdt = benzene-1,3,5-trisdithiolate, which have semi-local DFT bandgaps of 1.8 eV and 0.4 eV, respectively [41]. We also tested the models on a 2D MOF, Ni3(HITP)2 with HITP = 2,3,6,7,10,11-hexaiminotriphenylene, which has a hybrid DFT bandgap of 0.25 eV [7, 42]. All of the predictive ML models correctly indicate that these three MOFs are nonmetals.

To further validate the ML predictions, we carried out a series of ab initio calculations on the nine predicted metallic MOFs (see Table II) using the Vienna Ab initio Simulation Package (VASP) [43], which is based on DFT. To be consistent with the training data, we chose the PBE functional to approximate the exchange and correlation of electrons for all MOFs. The crystal structures of the MOFs were first relaxed with respect to both volume and atomic positions under the


optimization condition that all atomic forces are less than 0.01 eV/Å, and then self-consistent calculations of the ground-state density and electronic band structure were carried out. Details of the DFT calculations can be found in the supplemental material. We find that the DFT

Table II. Prediction of metallicity by each classifier: logistic regression (LR), support vector classification (SVC) with a radial basis function (rbf) kernel, neural network (NN), and random forest (RF); the metallic MOFs confirmed by the calculated bandgap at the DFT+PBE level (bandgap in eV); the estimated Mott constant; and the reference (Ref) for each DFT-confirmed metallic MOF.

SVC

Empirical Formula

Chemical Formula

LR

Mn8 Re24 S32 C24 N24

Mn2 [Re6S8 (CN)6]4

metallic

metallic

(rbf)

NN

RF

Bandgap /eV 0



Ref

0.1025

34

Mn8 Re24 Se32 C24 N24

Mn2 [Re6Se8 (CN)6]4

metallic

metallic

0

0.1042

35

Mn8 Re24 Te32 C24 N24

Mn2 [Re6Te8 (CN)6]4

metallic

metallic

0

0.1034

35

Co4 Hg4 C16 S16 N16

Hg[SCN]4Co[NCS]4

metallic

0

0.0890

38

Cd2C8

CdC4

metallic

metallic

0

0.0838

Mn4 Re12 Te16 C12 N12

Mn[Re3Te4 (CN)3]

metallic

metallic

0

0.1152

metallic

metallic

Na13 Fe4 Sb2 W18 C8 O86

Na13Fe4Sb2W18(C4O43)2

metallic

metallic

0.52

K4 Nd4 Re16 Te16 C48 N48

KNd[Re4Te4 (CN)12]

metallic

metallic

1.49

Cd8C32

CdC4

metallic

metallic

35

2.19

calculations confirm six metallic MOFs based on the computed bandgaps (see Table II). Interestingly, four of the six metallic MOFs contain CN linkers and the transition metals Mn and Re. Among the 2932 MOFs, 1768 structures contain CN linkers and 196 contain Mn, but only 4 contain Re, and all four are predicted to be metallic. It is well known that DFT+PBE underestimates the bandgaps of nonmetals. To check the metallic character of the predicted MOFs beyond DFT+PBE, we calculated the electronic band structure of Cd2C8 using DFT+HSE and find that Cd2C8 remains metallic (see Figure S6 in the supplemental material), suggesting that self-interaction error is unlikely to be causing fictitious metallization. The random forest has the best performance based on the multi-voting results, whereas the neural network performs better according to the DFT results (see Table SIII in the supplemental material). Since we did not carry out DFT calculations for all systems predicted by each model (e.g., the 82 MOFs in Table SIV in the supplemental material), we could not


determine the prediction quality of each model. Nevertheless, based on the DFT results, we estimate the prediction accuracy of the multi-voting (i.e., the number of DFT-confirmed metallic MOFs divided by the number of metallic MOFs predicted by multi-voting) to be 67%. The success of the ML models is qualitatively consistent with the overlapping distributions of the two data sets in the reduced 2-dimensional feature space (see Figure 3 and Figure S3). A random baseline for this performance is difficult to ascertain without knowing the true number of metallic MOFs in the screening data set. Nevertheless, we discover six metallic MOFs and demonstrate that data-driven machine learning methods can not only speed up the search for metallic MOFs but also increase the probability of discovering these materials.

We find that five of the six confirmed metallic MOFs have been experimentally synthesized. Both Mn2[Re6X8(CN)6]4 (X = S, Se, and Te) [44-45] and Mn[Re3Te4(CN)3] [45] are metal cyanides, in which hexadentate cluster anions [Re6S8(CN)6]4- coordinate to Mn(II) centers to form a neutral 3D coordination network topologically related to the structure of Prussian Blue [46]. Prussian Blue (PB) is the oldest metal cyanide compound and has intrigued chemists with its fascinating electronic conducting properties, which vary with the color of the material. For example, vacuum-dried Prussian Blue acts as an insulator, whereas its oxidized yellow and reduced white forms display semiconducting properties [46]. In particular, a recent study showed that partially oxidized PB (i.e., Fe2+ to Fe3+), often called Berlin Green (BG), has ohmic resistance in the temperature range from 100 K to 300 K [47], indicating that it is electronically conductive. Similar to PB, Mn2[Re6X8(CN)6]4 has a simple cubic structure for X = S, Se, and Te, and becomes simple orthorhombic when the structure is lightly disordered by the different X.
Interestingly, the Mn2+ ions can also be oxidized to Mn3+ when the material is immersed in a methanol solution of the H2(salen) ligand (N,N'-ethylenebis(salicylideneamine)) [45]. It is possible to speculate that, similar to BG, this oxidized form of Mn2[Re6X8(CN)6]4 could be electronically conductive as well. The identified structure of Hg[SCN]4Co[NCS]4 was also reported to have been synthesized in 1980 [48], but its conducting properties have not been investigated. The structure of CdC4 is contained in the CoRE-MOF database, but we are unable to find a literature report of its synthesis and characterization.

Figure 4 shows the calculated electronic band structures of the six metallic MOFs confirmed by DFT. We find that they are not only metallic, but several also appear to have large band dispersions and wide bandwidths (i.e., up to a few hundred meV), particularly Cd2C8 and


Mn4Re12Te16C12N12. These are large when compared to most MOF structures where the bands are usually flatter with bandwidths typically close to those of polymers (i.e. a few tens of meV).

Figure 4. The optimized crystal structures and calculated electronic band structures of the six DFT-confirmed metallic MOFs: (a) Mn8Re24C24S32N24, (b) Mn8Re24C24Se32N24, (c) Mn8Re24C24Te32N24, (d) Co4Hg4C16S16N16, (e) Cd2C8, and (f) Mn4Re12Te16C12N12. The exact chemical formulas and the corresponding atomic species are shown in Figure S4 of the supplemental material.

The electronic densities of states around the Fermi level were also calculated and can be found in the supplemental material (see Figure S5). In the past decade, much effort has been made to synthesize conductive MOFs by band engineering, that is, by substituting various combinations of metal centers and organic ligands [5, 49-53]. It has recently been reported that Fe may be advantageous in promoting electrical conductivity in a spectrum of MOFs, leading to smaller DFT-level bandgaps than other metal atoms [54]. While none of the materials identified in this work include Fe atoms, the lack of any bandgap at the DFT level is encouraging for their potential to exhibit high conductivity. The existing large dispersion and


wide bandwidth in some of these MOFs indicate that they may have larger carrier mobilities and carrier densities than other MOFs. This could lead to relatively high electrical conductivity and make them promising candidate materials for applications in electronic and optical devices.

While our approach has successfully identified a number of promising candidates at the DFT+PBE band theory level, it is important to note that physics not captured by DFT+PBE may affect the electrical transport properties of these materials when they are synthesized and characterized. For example, hopping mechanisms with an activation barrier are commonly reported for organic materials and MOFs, suggesting that small polarons may play a role, or potentially other mechanisms beyond band theory, such as a Mott insulator transition [55] due to relatively low carrier densities and screening. In the latter case, one must consider the role of strong electron-electron interactions in the d or f orbitals of the transition metal centers [56], which are not included in the DFT+PBE calculations and can lead to hopping activation energies for quasiparticles. In many doped semiconductors, the critical point of the metal-Mott insulator transition can be roughly estimated from the quantity $C = n^{1/3} a_B$, in which $n$ is the free carrier density of the material and $a_B$ is the effective Bohr radius. If $C$ is smaller than 0.25, the material could be a Mott insulator. By approximating $n$ with the number of metal atoms per unit cell, we estimate the values of $C$ for the six MOFs (see Table II), which are all smaller than 0.25. Although this is a very crude estimate, it suggests potential for the opening of electronic bandgaps. Since these effects are expected to become less pronounced as the Kohn-Sham band dispersion energy increases, the relatively large dispersions of Cd2C8 and Mn4Re12Te16C12N12 may make them the most likely to have gapless electronic transport among the candidates identified.

In principle, these effects beyond DFT+PBE could be incorporated into a screening algorithm if a suitable training data set containing the results of experimental measurements existed. Recently, Zhuo et al. [24] trained a support vector regression model with 3896 experimentally reported bandgaps, fewer than the 40,000 DFT gaps used to train the models in this work. Furthermore, using DFT bandgaps for training enables direct validation of the model by DFT calculations of the predicted metallic MOFs at the same level of theory. This eliminates sources of test error arising from the unconstrained and uncharacterized degrees of freedom in the experimental synthesis and characterization of these materials.
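The Mott estimate discussed above can be written as a two-line calculation. This sketch assumes the usual form of the criterion, the cube root of the carrier density multiplied by the effective Bohr radius, compared against 0.25. The function names and the input numbers are hypothetical; the letter does not state the Bohr radius or cell volumes it used, so the values below are not a reproduction of Table II.

```python
def mott_parameter(carrier_density: float, bohr_radius: float) -> float:
    """Return n**(1/3) * a_B, a dimensionless number.

    carrier_density is in carriers per cubic angstrom and bohr_radius is in
    angstrom, so the length units cancel.
    """
    return carrier_density ** (1.0 / 3.0) * bohr_radius


def may_be_mott_insulator(c: float, threshold: float = 0.25) -> bool:
    """Values below the threshold suggest that a gap could open."""
    return c < threshold


# Hypothetical inputs: 8 carriers in a 1000 cubic-angstrom cell, a_B = 0.5 angstrom.
n = 8 / 1000.0
c = mott_parameter(n, 0.5)
flag = may_be_mott_insulator(c)
```

With these made-up inputs the parameter is about 0.1, below the threshold, which is the same qualitative situation the letter reports for its six candidates.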


In summary, MOF materials are typically poor electrical conductors but promising candidates for gas storage and separation. Recent experimental discoveries of a few MOFs with better conductivity [5] have triggered a large effort to synthesize more conductive MOFs for electrical and optical applications. The flexible chemical compositions of MOFs lead to a large number of structures, which makes searching for conductive MOFs with the conventional guess-and-check method challenging. We have carried out a new strategy, transfer learning, in which ML models trained in the inorganic materials space are used to search for metallic materials in the MOF space. The combination of statistical multi-voting and machine learning techniques predicts nine metallic MOFs, and ab initio calculations confirm six intrinsically metallic MOFs at the level of semi-local DFT, giving an accuracy of 67%. We believe that the accuracy of the ML predictions can be further increased by training the models on data from the MOF or organic polymer materials space itself. Several of these metallic MOFs exhibit substantial band dispersions around the Fermi energy, indicating that they could be promising electrical conductors. This work demonstrates the potential power of machine learning in materials science to aid in down-selecting from large numbers of potential candidates. It is our hope that these results will drive experimental efforts to synthesize and characterize these MOFs.
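The 67% figure quoted in the summary follows from the metric definitions given earlier in the letter. As a minimal sketch, treating the six DFT-confirmed structures as true positives and the three unconfirmed predictions as false positives, the quoted value is the precision TP/(TP+FP) of the multi-vote. The function names are illustrative only.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted metals that are actual metals: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of actual metals that are predicted to be metals: TP / (TP + FN)."""
    return tp / (tp + fn)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions, metal and nonmetal, that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Multi-vote screening: nine MOFs predicted metallic, six confirmed by DFT.
screen_precision = precision(tp=6, fp=3)
percent = round(100 * screen_precision)
```

Recall and overall accuracy cannot be computed for the screening in the same way, because the number of metallic MOFs missed by the vote (the false negatives) is unknown.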

Acknowledgments

We acknowledge helpful discussions with Vitalie Stavila and W. Philip Kegelmeyer. Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525. The views expressed in the article do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

Supporting Information Available: The details of machine learning methods and ab initio calculations. This material is available free of charge via the Internet at […].


References

1. Férey, G. Hybrid Porous Solids: Past, Present, Future. Chem. Soc. Rev. 2008, 37 (1), 191-214.
2. Wu, H.; Gong, Q.; Olson, D. H.; Li, J. Commensurate Adsorption of Hydrocarbons and Alcohols in Microporous Metal Organic Frameworks. Chem. Rev. 2012, 112 (2), 836-868.
3. Sumida, K.; Rogow, D. L.; Mason, J. A.; McDonald, T. M.; Bloch, E. D.; Herm, Z. R.; Bae, T.-H.; Long, J. R. Carbon Dioxide Capture in Metal–Organic Frameworks. Chem. Rev. 2012, 112 (2), 724-781.
4. Stassen, I.; Burtch, N.; Talin, A.; Falcaro, P.; Allendorf, M.; Ameloot, R. An Updated Roadmap for The Integration of Metal-Organic Frameworks with Electronic Devices and Chemical Sensors. Chem. Soc. Rev. 2017, 46 (11), 3185-3241.
5. Sun, L.; Campbell, M. G.; Dincă, M. Electrically Conductive Porous Metal–Organic Frameworks. Angew. Chem. Int. Ed. 2016, 55 (11), 3566-3579.
6. Stavila, V.; Talin, A. A.; Allendorf, M. D. MOF-Based Electronic and Opto-Electronic Devices. Chem. Soc. Rev. 2014, 43 (16), 5994-6010.
7. He, Y.; Spataru, C. D.; Léonard, F.; Jones, R. E.; Foster, M. E.; Allendorf, M. D.; Talin, A. A. Two-Dimensional Metal-Organic Frameworks with High Thermoelectric Efficiency Through Metal Ion Selection. Phys. Chem. Chem. Phys. 2017, 19 (29), 19461-19467.
8. Talin, A. A.; Centrone, A.; Ford, A. C.; Foster, M. E.; Stavila, V.; Haney, P.; Kinney, R. A.; Szalai, V.; El Gabaly, F.; Yoon, H. P.; Léonard, F.; Allendorf, M. D. Tunable Electrical Conductivity in Metal-Organic Framework Thin-Film Devices. Science 2013.
9. Dou, J.-H.; Sun, L.; Ge, Y.; Li, W.; Hendon, C. H.; Li, J.; Gul, S.; Yano, J.; Stach, E. A.; Dincă, M. Signature of Metallic Behavior in the Metal–Organic Frameworks M3(hexaiminobenzene)2 (M = Ni, Cu). J. Am. Chem. Soc. 2017, 139 (39), 13608-13611.
10. Xie, L. S.; Sun, L.; Wan, R.; Park, S. S.; DeGayner, J. A.; Hendon, C. H.; Dincă, M. Tunable Mixed-Valence Doping toward Record Electrical Conductivity in a Three-Dimensional Metal–Organic Framework. J. Am. Chem. Soc. 2018, 140 (24), 7411-7414.
11. Furukawa, H.; Cordova, K. E.; O’Keeffe, M.; Yaghi, O. M. The Chemistry and Applications of Metal-Organic Frameworks. Science 2013, 341 (6149).
12. Pan, S. J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22 (10), 1345-1359.


13. Delbecq, A. L.; Van de Ven, A. H. A Group Process Model for Problem Identification and Program Planning. J. Appl. Behav. Sci. 1971, 7 (4), 466-492.
14. Silver, D.; Huang, A.; Maddison, C. J.; Guez, A.; Sifre, L.; van den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; Dieleman, S.; Grewe, D.; Nham, J.; Kalchbrenner, N.; Sutskever, I.; Lillicrap, T.; Leach, M.; Kavukcuoglu, K.; Graepel, T.; Hassabis, D. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature 2016, 529, 484.
15. Bulgarevich, D. S.; Tsukamoto, S.; Kasuya, T.; Demura, M.; Watanabe, M. Pattern Recognition with Machine Learning on Optical Microscopy Images of Typical Metallurgical Microstructures. Sci. Rep. 2018, 8 (1), 2078.
16. Chen, C.-C.; Juan, H.-H.; Tsai, M.-Y.; Lu, H. H.-S. Unsupervised Learning and Pattern Recognition of Biological Data Structures with Density Functional Theory and Machine Learning. Sci. Rep. 2018, 8 (1), 557.
17. Srinivasan, S.; Rajan, K. “Property Phase Diagrams” for Compound Semiconductors through Data Mining. Materials 2013, 6 (1), 279-290.
18. Zhang, Y.; Kim, E.-A. Quantum Loop Topography for Machine Learning. Phys. Rev. Lett. 2017, 118 (21), 216401.
19. Kong, C. S.; Luo, W.; Arapan, S.; Villars, P.; Iwata, S.; Ahuja, R.; Rajan, K. Information-Theoretic Approach for the Discovery of Design Rules for Crystal Chemistry. J. Chem. Inf. Model. 2012, 52 (7), 1812-1820.
20. Oliynyk, A. O.; Mar, A. Discovery of Intermetallic Compounds from Traditional to Machine-Learning Approaches. Acc. Chem. Res. 2018, 51 (1), 59-68.
21. Ma, X.; Li, Z.; Achenie, L. E. K.; Xin, H. Machine-Learning-Augmented Chemisorption Model for CO2 Electroreduction Catalyst Screening. J. Phys. Chem. Lett. 2015, 6 (18), 3528-3533.
22. Ward, L.; Agrawal, A.; Choudhary, A.; Wolverton, C. A General-Purpose Machine Learning Framework for Predicting Properties of Inorganic Materials. npj Comput. Mater. 2016, 2, 16028.
23. Sendek, A. D.; Yang, Q.; Cubuk, E. D.; Duerloo, K.-A. N.; Cui, Y.; Reed, E. J. Holistic Computational Structure Screening of More Than 12 000 Candidates for Solid Lithium-Ion Conductor Materials. Energy Environ. Sci. 2017, 10 (1), 306-320.


24. Zhuo, Y.; Mansouri Tehrani, A.; Brgoch, J. Predicting the Band Gaps of Inorganic Solids by Machine Learning. J. Phys. Chem. Lett. 2018, 9 (7), 1668-1673.
25. Janet, J. P.; Chan, L.; Kulik, H. J. Accelerating Chemical Discovery with Machine Learning: Simulated Evolution of Spin Crossover Complexes with an Artificial Neural Network. J. Phys. Chem. Lett. 2018, 9 (5), 1064-1071.
26. Behler, J.; Parrinello, M. Generalized Neural-Network Representation of High-Dimensional Potential-Energy Surfaces. Phys. Rev. Lett. 2007, 98 (14), 146401.
27. Bartók, A. P.; Payne, M. C.; Kondor, R.; Csányi, G. Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons. Phys. Rev. Lett. 2010, 104 (13), 136403.
28. Cubuk, E. D.; Malone, B. D.; Onat, B.; Waterland, A.; Kaxiras, E. Representations In Neural Network Based Empirical Potentials. J. Chem. Phys. 2017, 147 (2), 024104.
29. Hansen, K.; Biegler, F.; Ramakrishnan, R.; Pronobis, W.; von Lilienfeld, O. A.; Müller, K.-R.; Tkatchenko, A. Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space. J. Phys. Chem. Lett. 2015, 6 (12), 2326-2331.
30. Cubuk, E. D.; Schoenholz, S. S.; Rieser, J. M.; Malone, B. D.; Rottler, J.; Durian, D. J.; Kaxiras, E.; Liu, A. J. Identifying Structural Flow Defects in Disordered Solids Using Machine-Learning Methods. Phys. Rev. Lett. 2015, 114 (10), 108001.
31. Schoenholz, S. S.; Cubuk, E. D.; Sussman, D. M.; Kaxiras, E.; Liu, A. J. A Structural Approach to Relaxation in Glassy Liquids. Nat. Phys. 2016, 12, 469.
32. Chung, Y. G.; Camp, J.; Haranczyk, M.; Sikora, B. J.; Bury, W.; Krungleviciute, V.; Yildirim, T.; Farha, O. K.; Sholl, D. S.; Snurr, R. Q. Computation-Ready, Experimental Metal–Organic Frameworks: A Tool To Enable High-Throughput Screening of Nanoporous Crystals. Chem. Mater. 2014, 26 (21), 6185-6192.
33. Coursey, J. S.; Schwab, D. J.; Tsai, J. J.; Dragoset, R. A. Atomic Weights and Isotopic Compositions with Relative Atomic Masses. NIST Physical Measurement Laboratory. https://www.nist.gov/pml/atomic-weights-and-isotopic-compositions-relative-atomic-masses 2015.


34. Kirklin, S.; Saal, J. E.; Meredig, B.; Thompson, A.; Doak, J. W.; Aykol, M.; Rühl, S.; Wolverton, C. The Open Quantum Materials Database (OQMD): Assessing the Accuracy of DFT Formation Energies. npj Comput. Mater. 2015, 1, 15010.
35. Perdew, J. P.; Burke, K.; Ernzerhof, M. Generalized Gradient Approximation Made Simple. Phys. Rev. Lett. 1996, 77 (18), 3865-3868.
36. Pilania, G.; Gubernatis, J. E.; Lookman, T. Multi-Fidelity Machine Learning Models for Accurate Bandgap Predictions of Solids. Comput. Mater. Sci. 2017, 129, 156-163.
37. Dey, P.; Bible, J.; Datta, S.; Broderick, S.; Jasinski, J.; Sunkara, M.; Menon, M.; Rajan, K. Informatics-Aided Bandgap Engineering for Solar Materials. Comput. Mater. Sci. 2014, 83, 185-195.
38. Lee, J.; Seko, A.; Shitara, K.; Nakayama, K.; Tanaka, I. Prediction Model of Band Gap for Inorganic Compounds by Combination of Density Functional Theory Calculations and Machine Learning Techniques. Phys. Rev. B 2016, 93 (11), 115104.
39. Stehman, S. V. Selecting and Interpreting Measures of Thematic Classification Accuracy. Remote Sens. Environ. 1997, 62 (1), 77-89.
40. Van Der Maaten, L.; Hinton, G. Visualizing Data Using t-SNE. J. Mach. Learn. Res. 2008, 9.
41. He, Y.; Talin, A. A.; Allendorf, M. D. Thermoelectric Properties of 2D Ni3(hitp)2 and 3D Cu3(btc)2 MOFs: First-Principle Studies. ECS J. Solid State Sci. Technol. 2017, 6 (12).
42. Sheberla, D.; Sun, L.; Blood-Forsythe, M. A.; Er, S.; Wade, C. R.; Brozek, C. K.; Aspuru-Guzik, A.; Dincă, M. High Electrical Conductivity in Ni3(2,3,6,7,10,11-hexaiminotriphenylene)2, a Semiconducting Metal–Organic Graphene Analogue. J. Am. Chem. Soc. 2014, 136 (25), 8859-8862.
43. Kresse, G.; Furthmüller, J. Efficient Iterative Schemes for Ab Initio Total-Energy Calculations Using a Plane-Wave Basis Set. Phys. Rev. B 1996, 54 (16), 11169-11186.
44. Naumov, N. G.; Soldatov, D. V.; Ripmeester, J. A.; Artemkina, S. B.; Fedorov, V. E. Extended Framework Materials Incorporating Cyanide Cluster Complexes: Structure of the First 3D Architecture Accommodating Organic Molecules. Chem. Commun. 2001, (6), 571-572.
45. Kim, S.; Kim, Y.; Kal, Y.; Kim, S.-J. A New Organic–Inorganic Hybrid Framework Containing Octahedral Hexarhenium Cluster and Its Transformation by Ligand Exchange. Inorg. Chim. Acta 2007, 360 (6), 1870-1874.


46. Xidis, A.; Neff, V. D. On the Electronic Conduction in Dry Thin Films of Prussian Blue, Prussian Yellow, and Everitt's Salt. J. Electrochem. Soc. 1991, 138 (12), 3637-3642.
47. Pajerowski, D. M.; Watanabe, T.; Yamamoto, T.; Einaga, Y. Electronic conductivity in Berlin green and Prussian blue. Phys. Rev. B 2011, 83 (15), 153202.
48. Udupa, M. R.; Krebs, B. Crystal and Molecular Structure of Mercury(II) Tetrathiocyanatobis(Dimethylformamide)Cobaltate(II). Inorg. Chim. Acta 1980, 42, 37-41.
49. Takaishi, S.; Hosoda, M.; Kajiwara, T.; Miyasaka, H.; Yamashita, M.; Nakanishi, Y.; Kitagawa, Y.; Yamaguchi, K.; Kobayashi, A.; Kitagawa, H. Electroconductive Porous Coordination Polymer Cu[Cu(pdt)2] Composed of Donor and Acceptor Building Units. Inorg. Chem. 2009, 48 (19), 9048-9050.
50. Kobayashi, Y.; Jacobs, B.; Allendorf, M. D.; Long, J. R. Conductivity, Doping, and Redox Chemistry of a Microporous Dithiolene-Based Metal−Organic Framework. Chem. Mater. 2010, 22 (14), 4120-4122.
51. Hao, Z.; Yang, G.; Song, X.; Zhu, M.; Meng, X.; Zhao, S.; Song, S.; Zhang, H. A Europium(III) Based Metal-Organic Framework: Bifunctional Properties Related to Sensing and Electronic Conductivity. J. Mater. Chem. A 2014, 2 (1), 237-244.
52. Sun, L.; Miyakai, T.; Seki, S.; Dincă, M. Mn2(2,5-disulfhydrylbenzene-1,4-dicarboxylate): A Microporous Metal–Organic Framework with Infinite (−Mn–S−)∞ Chains and High Intrinsic Charge Mobility. J. Am. Chem. Soc. 2013, 135 (22), 8185-8188.
53. Zhang, Q.; Li, B.; Chen, L. First-Principles Study of Microporous Magnets M-MOF-74 (M = Ni, Co, Fe, Mn): the Role of Metal Centers. Inorg. Chem. 2013, 52 (16), 9356-9362.
54. Sun, L.; Hendon, C. H.; Park, S. S.; Tulchinsky, Y.; Wan, R.; Wang, F.; Walsh, A.; Dincă, M. Is Iron Unique in Promoting Electrical Conductivity in MOFs? Chem. Sci. 2017, 8 (6), 4450-4457.
55. Anisimov, V. I.; Zaanen, J.; Andersen, O. K. Band theory and Mott insulators: Hubbard U instead of Stoner I. Phys. Rev. B 1991, 44 (3), 943-954.
56. Hendon, C. H.; Tiana, D.; Walsh, A. Conductive Metal-Organic Frameworks and Networks: Fact or Fantasy? Phys. Chem. Chem. Phys. 2012, 14 (38), 13120-13132.
