Understanding Quantitative Relationship Between Methane Storage


C: Energy Conversion and Storage; Energy and Charge Transport

Understanding Quantitative Relationship Between Methane Storage Capacities and Characteristic Properties of Metal Organic Frameworks Based on Machine Learning Xuanjun Wu, Sichen Xiang, Jiaqi Su, and Weiquan Cai J. Phys. Chem. C, Just Accepted Manuscript • DOI: 10.1021/acs.jpcc.8b11793 • Publication Date (Web): 22 Mar 2019 Downloaded from http://pubs.acs.org on March 22, 2019


The Journal of Physical Chemistry

Understanding quantitative relationship between methane storage capacities and characteristic properties of metal organic frameworks based on machine learning

Xuanjun Wu*,a, Sichen Xiang a, Jiaqi Su a and Weiquan Cai*,b,a

a School of Chemistry, Chemical Engineering & Life Sciences, Wuhan University of Technology, Wuhan 430070, P.R. China; b School of Chemistry and Chemical Engineering, Guangzhou University, Guangzhou 510006, P.R. China.

ABSTRACT: Metal organic frameworks (MOFs) are an emerging category of porous materials and promising candidates for gas storage and separation owing to their high porosity and high surface area. Searching for the optimal materials for methane storage among a large number of candidates remains time-consuming with traditional methods such as molecular simulations and quantum mechanics. Recently, machine-learning (ML) algorithms have increasingly been used to accelerate the discovery of high-performance MOFs. In this work, the Henry coefficient and other characteristic parameters were computed and appended to the previously reported dataset of hypothetical metal organic frameworks (hMOFs) for methane storage. The new dataset with 37 features and 130,397 samples was then randomly split into a training set and a test set in the ratio of 7:3, which were used for ML training and testing with three different algorithms: support vector machine (SVM), random forest regression (RFR) and gradient boosting regression tree (GBRT). The results indicate that the GBRT model demonstrates the best generalization ability on non-trained data, while the RFR model gives the best predictive power on the training set. The analysis of feature importance from the ML algorithms confirms that the high generalization ability of the GBRT model is attributable to the model extracting information from a wider range of features. For absolute gravimetric uptakes, the RFR model gives the highest prediction accuracy on the training set, with a Pearson correlation coefficient (r²) of 0.9984 and a root mean square error (RMSE) of 3.93, while the GBRT model gives the highest prediction accuracy on the test set, with r² of 0.9908 and RMSE of 9.40, which is the highest prediction accuracy among reports to date. In terms of volumetric capacities for methane storage, the optimal hMOFs exhibit ϕ of 0.65~0.88, LCD of ~7.5 Å and VSA of ~2250 m² cm⁻³.

Keywords: metal organic framework; methane storage; machine learning; quantitative structure-activity relationship; Henry coefficient.

Metal-organic frameworks are an emerging type of porous material with intrinsic pores, self-assembled from metal or metal oxide nodes and organic linkers.1 The pore walls can be decorated with organic functional groups to form a unique confined space with high surface area, making MOFs potential candidates for gas storage,2 substance separation,3 molecular catalysis,4 molecular sensing5 and drug delivery.6 Using metal-organic frameworks to store methane (the main component of natural gas) is an old story, but it remains significantly challenging: even for the best available MOFs, the measured methane storage capacities remain far from the target (315 cm³ STP cm⁻³) proposed by the U.S. Department of Energy (DOE), probably less than 60% of the target as reported in previous works.7 In the past two decades, great efforts have been made to develop new MOFs with high-performance methane storage,8 but to date the highest volumetric methane storage capacity and deliverable capacity (i.e., the difference in methane storage capacity between the charge pressure and the delivery pressure) of all reported MOFs at 65 bar and ambient temperature are still about 270 cm³ STP cm⁻³ and 200 cm³ STP cm⁻³, respectively.9 To improve methane storage capacities and meet the DOE target as soon as possible, Chen et al.10 advocated a slight reduction of the methane charge temperature, which was considered one of the most effective and direct strategies despite the likely extra energy consumption for cooling. However, there is still a large gap between the current record storage capacities and the DOE target even at 270 K.


In particular, over the past decade, more than 20,000 different MOFs have been synthesized in laboratories worldwide,10 and more than 800,000 different nanoporous materials, including 137,953 predicted metal-organic frameworks (hMOFs),11 17,970 predicted porous polymer networks (hPPNs)12 and 381,178 predicted zeolitic imidazolate frameworks (hZIFs),13 have been computationally constructed. This broad class of compounds exhibits extraordinary structural diversity that enables function-oriented design and synthesis of nanoporous materials, but it also makes the search for specific functional materials by the conventional guess-and-check method challenging.14 Molecular simulation provides a means of high-throughput screening of materials by brute-force computation of their geometrical properties and gas storage capacities.15 However, these computational tools still only allow low-dimensional data analysis, which reveals the structure-activity relationships of MOF materials only roughly.14 Recently, machine learning (ML) and other artificial intelligence methods have made fundamental advances in accelerating materials discovery, spurred by the Materials Genome Initiative.16 Support vector machine and random forest algorithms were the first to be applied to predict gas storage and separation in MOFs.17 However, the ML models in both works did not achieve high accuracy because of the small training sample space or the simple structural descriptors. Recently, Gómez-Gualdrón's group18 adopted other ML algorithms such as gradient boosted machines (GBM) and neural networks (NN) to explore the hydrogen storage limits and CO₂ capture capabilities of MOFs. They further confirmed that both the chemistry and the topology of MOF pores have a critical impact on gas adsorption, especially on the adsorption of polar molecules (CO₂).

Considering that ML models are very data hungry, some researchers added more MOFs or structures with topological diversity to the training set.14,18,19 Thornton et al.14 combined thermodynamic models with a neural network algorithm to identify the performance limits of physical hydrogen storage from over 850,000 structures. Anderson and co-workers18a selected more topologically diverse MOFs and more ML models to investigate the effect of various functional groups on CO₂ adsorption in MOFs, although the predictions showed large deviations. Anderson et al.18b adopted neural networks to improve the prediction accuracy for hydrogen adsorption with a small amount of data. Aghaji et al.19 predicted the CO₂/CH₄ separation selectivity of about 320,000 nanoporous materials based on the support vector machine (SVM) algorithm; the SVM classifiers could correctly identify up to 90% of high-performance MOFs. Lee et al.19 successfully used persistent homology, a topological data analysis method, to identify different nanoporous materials with similar channel geometry and high-capacity methane storage by analyzing the dataset of methane storage capacities in nanoporous materials. As mentioned above, ML algorithms have demonstrated great power in materials research, including predictions of phase diagrams,20 crystal structures,21 chemical separation,19 gas storage16b and other materials properties.22 Moreover, ML methods exhibit great efficiency and accuracy in treating datasets with high-dimensional features compared with traditional prediction methods.23

In this context, a novel approach combining ML algorithms and grand-canonical Monte Carlo (GCMC) simulations is presented here to elucidate the relationship between the characteristic properties and methane storage capacities of MOFs. Based on the reported dataset of methane storage capacities in 130,397 hypothetical MOFs (hMOFs),11 the Henry coefficient, the type of functional groups, the atomic number density (the number of atoms of each element per unit volume) and the functional group number density (the number of functional groups of each type per unit volume) were added to the original dataset as extended features. The extended dataset is used to identify top-performing candidate materials for methane storage and to elucidate, by means of ML algorithms, the relationships between multidimensional characteristic properties and performance. The Henry coefficient was chosen as a new descriptor for two important reasons: (1) it can be calculated quickly; (2) it is often used to describe the affinity between adsorbate molecules and porous frameworks. Therefore, the added descriptors were expected to enhance the prediction accuracy of the ML models.
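The number-density descriptors defined above (counts of each element or functional group per unit volume) can be illustrated with a minimal sketch. The authors' in-house code is not shown in the text, so the function name `number_densities`, the toy unit cell, the Å³ volume unit and the element list (inferred from the 11 element-related feature labels in Figure 4) are all assumptions made for illustration.

```python
from collections import Counter


def number_densities(labels, cell_volume, keys):
    """Return counts per unit volume for each key, in the order of `keys`.

    labels      -- element symbols (or functional-group labels) in one unit cell
    cell_volume -- unit-cell volume (Å^3 is an assumed unit)
    keys        -- fixed list of labels that defines the feature columns
    """
    if cell_volume <= 0:
        raise ValueError("cell_volume must be positive")
    counts = Counter(labels)
    # Labels absent from the cell get a density of zero.
    return [counts.get(key, 0) / cell_volume for key in keys]


# Hypothetical unit cell: the atom list and the volume are made up.
atoms = ["Zn"] * 4 + ["O"] * 13 + ["C"] * 24 + ["H"] * 12
# Assumed element list, inferred from the feature labels in Figure 4.
elements = ["Zn", "Cu", "V", "Zr", "C", "H", "O", "N", "F", "Cl", "Br"]

features = dict(zip(elements, number_densities(atoms, 1000.0, elements)))
print(features["C"])  # carbon atoms per unit volume of the toy cell
```

Using a fixed key list keeps the feature columns aligned across structures, so that a framework without, say, Cu still contributes a zero in the Cu column.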


Using 102 building blocks including 5 inorganic nodes and 15 functional groups, Wilmer et al.11 computationally assembled 137,953 hMOFs. Gravimetric and volumetric, absolute and excess methane storage capacities of 130,397 hMOFs at 35 bar and 298 K, together with geometrical parameters such as crystal density and surface area, were extracted directly from the database provided by Snurr's group and available at http://hmofs.northwestern.edu.11 It has been proven both experimentally and computationally that inserting functional groups into MOFs can finely adjust the physical and chemical pore environment of the materials, thereby changing their properties, including gas storage.24 Based on the original database, partial functional group frequency distributions and all functional group histograms were counted. The hMOFs with functional groups m_5, m_6 and m_4 (ethyl, propyl and methyl groups, respectively) show a higher occurrence frequency of high methane storage performance (over 180 cm³ STP cm⁻³) than the others, as shown in Figure 1a. For every functional group except m_15, some of the hMOFs bearing that group have absolute volumetric methane uptakes exceeding 180 cm³ STP cm⁻³. Among all hMOFs with a methane capacity of over 205 cm³ STP cm⁻³, more than 75% contain those three functional groups. As shown in Figure 1b, the hMOFs with functional groups m_5, m_6 and m_4 also show the highest ratio of high-capacity structures (over 180 cm³ STP cm⁻³) among the hMOFs with the same functional group. Among these top hMOFs with m_5, m_6 or m_4 groups for methane adsorption, the largest number were constructed from linker_28, linker_29 or linker_27, as shown in Figure S2. Almost the same situation was found for the top parent hMOFs (without functional groups). This indicates that hMOFs based on large conjugated linkers exhibit the optimal methane capacities.

The numbers per unit volume of 16 functional groups (m_0 to m_15, including -H) and 11 chemical elements (including Zn, Cu, V and Zr) in all studied hMOFs were also calculated with in-house code. These descriptors were added as extra features to the original dataset together with the category of functional groups. As another characteristic descriptor, the Henry coefficient of methane adsorption in hMOFs was calculated from Widom insertions with the RASPA software.25 All numerical descriptors were normalized with the standard scaler of the scikit-learn platform.26 The dataset was randomly split into a training set and a test set in the ratio of 7:3. Three machine learning algorithms, support vector machine (SVM),27 random forest regression (RFR)28 and gradient boosting regression tree (GBRT),29 were used to train and test the dataset, and the quality of the training and test results was evaluated by the Pearson correlation coefficient (r²) and the root mean square error (RMSE), defined as follows.26

$$ r^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (1) $$

$$ \mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}} \qquad (2) $$


In Eqs. (1) and (2), n, $y_i$, $\hat{y}_i$ and $\bar{y}$ refer to the number of samples, the target value, the predicted value and the average target value, respectively.

Figure 1. Functional group distribution in top MOFs with high-capacity methane storage (over 180 cm³ STP cm⁻³) from the dataset of 130,397 hMOFs: (a) partial functional group frequency distribution; (b) all functional group histograms.

The chemical and parameter space explored by this dataset of materials is diverse. For example, hypoMOF-5015533 and hypoMOF-5036495 reach the maximum volumetric and gravimetric surface areas of 3607 m² cm⁻³ and 6947 m² g⁻¹, respectively, while some others have almost no surface area according to the original dataset.11 Figure S1 in the Supporting Information shows histograms of some geometrical features of all studied materials. HypoMOF-5033427 has the maximum void fraction of 0.967, while hypoMOF-1003774 has the minimum void fraction of 0.051.

Three ML algorithms, SVM, RFR and GBRT, were evaluated in this work to compare how different models predict the dataset. Model performance for predicting methane storage capacities was evaluated by the Pearson correlation coefficient (r²) and the root mean square error (RMSE). For comparison, the ML models were trained with either only the 8 features of the original dataset or the 37 features of our dataset as predictors. The predicted results for absolute gravimetric methane uptakes using the two datasets and the different ML models are shown in Figure 2 and Figure 3, respectively. In general, all three ML models achieve more accurate predictions of absolute gravimetric methane uptakes with the 37-feature input, in both the training set and the test set. With the 8-feature input, the RFR model gives the best predictive power and the SVM model the worst among the three algorithms. The r² and RMSE of the RFR predictions reach 0.9920 and 8.79, respectively, in the training set, but worsen to 0.9407 and 23.88, respectively, in the test set. Especially in the test set, our RFR model with the 8-feature input tends to deviate more significantly from the ideal prediction when the absolute adsorption capacity exceeds 200 cm³ (STP) g⁻¹. The same trend was seen for the absolute gravimetric methane uptake predictions of the other two algorithms. With the 37-feature input of our dataset, however, all models give better values of r² and RMSE in both the training set and the test set. In particular, the SVM model shows the largest improvement in predictive power, while the RFR model still gives the best predictive power in the training set. At the same time, the GBRT model gives the best predictive power in the test set, highlighting its superior generalization ability.
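The workflow described in this section (standard scaling, a random 7:3 split, three regressors, and r² and RMSE on both splits) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' script: the data are synthetic stand-ins for the hMOF table, the hyperparameters are arbitrary, and the scaler is fitted on the training part only, which is a choice made here and is not specified in the text. Note that `r2_score` computes the coefficient of determination, which has the same form as Eq. (1).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the hMOF table: 600 samples, 37 features (made-up data).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 37))
y = 200.0 + 30.0 * X[:, 0] - 20.0 * X[:, 1] ** 2 + rng.normal(scale=2.0, size=600)

# Random 7:3 split, then standard scaling fitted on the training part only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Hyperparameters are illustrative assumptions, not values from the paper.
models = {
    "SVM": SVR(C=100.0),
    "RFR": RandomForestRegressor(n_estimators=100, random_state=0),
    "GBRT": GradientBoostingRegressor(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    for split, X_s, y_s in (("train", X_train, y_train), ("test", X_test, y_test)):
        pred = model.predict(X_s)
        r2 = r2_score(y_s, pred)                               # form of Eq. (1)
        rmse = float(np.sqrt(mean_squared_error(y_s, pred)))   # Eq. (2)
        scores[(name, split)] = (r2, rmse)
        print(f"{name:5s} {split:5s} r2={r2:.4f} RMSE={rmse:.2f}")
```

Comparing the train and test rows for each model is what separates fitting power from generalization ability: a model can rank first on the training set and still be overtaken on the test set.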

Figure 2. Comparison of various ML predictions using the 8-feature input of the original hMOF dataset with the results obtained by GCMC simulation for absolute gravimetric methane uptakes in 130,397 hMOFs. Upper row: training set; lower row: test set. The data points are colored by Gaussian kernel density estimates and the solid red lines refer to the perfect prediction (same below).

Unlike the SVM model, the RFR and GBRT models provide feature importances for predicting methane uptakes, which indicate the relative importance of each feature during ML training. The relative importance of the 37 features during RFR and GBRT training for the different methane storage capacity targets is plotted as a heatmap in Figure 4. Red indicates higher feature importance and blue indicates lower feature importance. For the RFR model, the density and GSA of hMOFs show the highest and second-highest feature importance for predicting gravimetric methane uptakes, while the void fraction also shows high feature importance for all targets. Notably, KH has relatively high feature importance, indicating that KH plays a large role in decreasing the mean square error of methane uptake predictions. Among all chemical element features, the carbon atomic number density has the highest feature importance, especially for predicting volumetric methane uptakes. In addition, the carbon, hydrogen and oxygen atomic number densities play a relatively large role in improving predictive power. However, the functional group number densities contribute almost nothing to the prediction accuracy of the RFR model, because the chemical effects of the functional groups were not taken into account in reference 11, where the methane adsorption simulations were performed with generic force fields and no atomic point charges.
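As an illustration of how such feature importances can be obtained, the sketch below reads the impurity-based `feature_importances_` attribute of scikit-learn's tree ensembles, which measures each feature's share of the total reduction in squared error. It is assumed, not confirmed by the text, that this is the quantity plotted in Figure 4. The feature labels are borrowed from Figure 4, but the data and coefficients are synthetic.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

# Labels borrowed from Figure 4; the data themselves are synthetic.
names = ["dens", "gsa", "vf", "henrycoeff", "noise"]
rng = np.random.default_rng(1)
X = rng.normal(size=(500, len(names)))
# Made-up target dominated by the first feature, with a pure-noise last column.
y = 50.0 * X[:, 0] + 20.0 * X[:, 1] + 5.0 * X[:, 2] + rng.normal(scale=1.0, size=500)

estimators = {
    "RFR": RandomForestRegressor(n_estimators=100, random_state=0),
    "GBRT": GradientBoostingRegressor(random_state=0),
}

importances = {}
for label, model in estimators.items():
    model.fit(X, y)
    # Impurity-based importances are normalized to sum to 1; convert to percent.
    percent = 100.0 * model.feature_importances_
    importances[label] = dict(zip(names, percent))
    ranking = sorted(importances[label].items(), key=lambda item: item[1], reverse=True)
    print(label, [(name, round(value, 2)) for name, value in ranking])
```

Stacking the per-target importance vectors column by column would give the kind of feature-by-target matrix that a heatmap displays.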


Figure 3. Comparison of various ML predictions using 37-feature input with our dataset of hMOFs to the results obtained by GCMC simulation for absolute gravimetric methane uptakes in 130,397 hMOFs. Upper row for SVM model, middle row for RFR model and lower row for GBRT model.

[Figure 4 graphic: two heatmap panels, RFR and GBRT. Horizontal axis: different adsorption capacities (Nga, Nva, Ngx, Nvx). Vertical axis: the 37 features (m_15, m_14, m_13, m_12, m_11, m_10, m_9, m_8, m_7, m_6, m_5, m_4, m_3, m_2, m_1, m_0, fgcat, brnv, clnv, fnv, hnv, nnv, cnv, onv, zrnv, vnv, cunv, znnv, henrycoeff, interpf, interpc, dens, vsa, gsa, lcd, pld, vf). Color bars: Importance (%).]

Figure 4. Feature importance heatmap for different methane adsorption capacities in 130,397 hMOFs based on the RFR and GBRT models. Nga, Nva, Ngx and Nvx refer to absolute gravimetric, absolute volumetric, excess gravimetric and excess volumetric methane uptakes, respectively.

For the GBRT model, the carbon and oxygen atomic number densities play the most and second most important roles in predicting volumetric and gravimetric methane uptakes. Compared with the RFR model, the feature importances of the GBRT model are relatively balanced, indicating that GBRT extracts information from more features, which enhances its generalization ability. Besides conventional features from the original dataset such as void fraction (ϕ) and surface area (GSA and VSA), the numbers of some linker and corner atoms per unit volume show high feature importance for predicting methane uptakes. The Henry coefficient also has relatively high feature importance, indicating that it plays a large role in decreasing the mean square error of methane uptake predictions in the GBRT model as well. In particular, the number of functional groups per unit volume shows increased importance and contributes to the prediction accuracy of the GBRT model.


Figure 5. Comparison of various ML predictions using 37-feature input with our dataset of hMOFs to the results obtained by GCMC simulation for absolute volumetric methane uptakes in 130,397 hMOFs. Upper row for SVM model, middle row for RFR model and lower row for GBRT model.


Because the volume of a natural gas storage tank is limited, the methane stored per unit volume by an on-board adsorbed natural gas (ANG) system must meet the US DOE target of 315 cm³ STP cm⁻³, which ensures that the ANG system is economically competitive with compressed natural gas (CNG) systems.8a,30 Therefore, the volumetric uptake of adsorbents is also a significant evaluation index for screening promising candidate MOFs.8b,31 The predictions of absolute volumetric methane uptakes by the different ML models using the 37-feature input are shown in Figure 5. The prediction accuracy of all three ML algorithms for volumetric values is slightly lower than that for gravimetric ones. The better RMSE values obtained by all models may be attributed to the smaller mean target value (which decreases from 205 cm³ g⁻¹ to 141 cm³ STP cm⁻³) rather than to improved prediction accuracy. The GBRT model demonstrates the best generalization ability on non-trained data, while the RFR model gives the best predictive power on the training set, in agreement with the results for absolute gravimetric uptakes.
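The point about RMSE and the mean target value can be made concrete by expressing RMSE as a fraction of the mean target. The sketch below uses only numbers quoted in the text (the GBRT test RMSE of 9.40 for absolute gravimetric uptakes and the mean targets of 205 and 141); the helper `relative_rmse` is a hypothetical name introduced here, and the normalization is one common convention rather than a metric used by the authors.

```python
def relative_rmse(rmse, mean_target):
    """RMSE expressed as a fraction of the mean target value."""
    if mean_target <= 0:
        raise ValueError("mean_target must be positive")
    return rmse / mean_target


# GBRT test RMSE and mean gravimetric uptake quoted in the text.
grav = relative_rmse(9.40, 205.0)

# Volumetric RMSE that would correspond to the same relative error,
# given the mean volumetric uptake of 141 quoted in the text.
break_even = grav * 141.0

print(f"relative RMSE (gravimetric): {grav:.3%}")
print(f"equivalent volumetric RMSE:  {break_even:.2f}")
```

Under this convention, a volumetric RMSE is only evidence of better accuracy if it falls below the equivalent value computed above; a smaller absolute RMSE alone can simply reflect the smaller target scale.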


Figure 6. Relationship plots of absolute uptakes (N) versus (a) crystal density; (b) void fraction; (c) volumetric surface area; (d) largest cavity diameter of hMOFs.

Conventional 2D relationship plots, as shown in Figure 6, are still an important complement to the ML approach and help us understand the relationship between methane storage capacities and the multidimensional characteristic properties of metal organic frameworks.11 Figure 6a shows that the best absolute gravimetric capacities for methane storage are obtained for hMOFs with densities in the range of 0.2-0.6 g cm⁻³. Figure 6b illustrates that the absolute volumetric capacity increases with the void fraction (ϕ) of hMOFs up to a point (a maximum of 267 cm³ STP cm⁻³); if ϕ becomes too large, the capacity drops sharply to that of a free-space tank, because the methane molecules in the middle of the pores do not feel the van der Waals interactions with the pore walls. Although the highest volumetric surface areas do not imply the highest volumetric capacities, the top-capacity hMOFs generally have volumetric surface areas in the range of 1500-3000 m² cm⁻³, as shown in Figure 6c. The optimal pore diameter of ~7.5 Å for methane storage, shown in Figure 6d, is a design principle for porous materials that helps in choosing the length of the building blocks.

Table 1. Characteristic properties of hMOFs and their CH₄ storage capacities at 298 K and 35 bar.

hMOFs          similar MOFs a     excess N (cm³ cm⁻³)          VSA         GSA        LCD     KH                  ϕ       ρ
                                  GCMC a    exp. a    GBRT     (m² cm⁻³)   (m² g⁻¹)   (Å)     (mol kg⁻¹ MPa⁻¹)            (g cm⁻³)
hMOF-0         IRMOF-1 (30a)      115       119       118      2262        3676       11.75   2.58                0.796   0.615
hMOF-1000000   MIL-47 (30b)       148       180       147      1640        1535       7.75    1.39                0.601   1.068
hMOF-2000504   HKUST-1 (30c)      144       161       144      2093        2381       12.75   13.1                0.741   0.879
hMOF-4000226   NOTT-107 (30d)     173       172       177      2166        2747       9.75    41.6                0.728   0.788

a See reference 11.

Among the set of hMOFs, there are some structures similar to synthesized MOFs, including IRMOF-1, MIL-47, HKUST-1 (aka Cu3-BTC2) and NOTT-107.30 The characteristic properties of those hMOFs and their CH₄ storage capacities at 298 K and 35 bar are listed in Table 1. The GBRT model could reproduce the excess volumetric methane uptakes calculated by the GCMC method. However, the computed results for hMOF-1000000 and hMOF-2000504 are far from the experimental values of MIL-47 and HKUST-1, respectively. This may be attributed to the strong interaction between methane and the unsaturated metal sites in HKUST-1 and to the flexibility of the MIL-47 framework, both of which were neglected in the GCMC calculations.
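The two comparisons made above (GBRT against GCMC, and GCMC against experiment) can be quantified directly from the Table 1 values. The sketch below is an illustration only: it assumes that the three excess-uptake columns of Table 1 are ordered GCMC, experiment, GBRT, which is a reading of the flattened table that is consistent with the discussion of MIL-47 and HKUST-1.

```python
from statistics import mean

# (hMOF, similar MOF, GCMC, experiment, GBRT): excess uptakes in cm3 cm-3,
# transcribed from Table 1 under the assumed column order.
rows = [
    ("hMOF-0", "IRMOF-1", 115, 119, 118),
    ("hMOF-1000000", "MIL-47", 148, 180, 147),
    ("hMOF-2000504", "HKUST-1", 144, 161, 144),
    ("hMOF-4000226", "NOTT-107", 173, 172, 177),
]

# Absolute deviation of the ML prediction from the simulation it was trained on.
gbrt_vs_gcmc = {mof: abs(gbrt - gcmc) for _, mof, gcmc, _, gbrt in rows}
# Absolute deviation of the simulation from experiment.
gcmc_vs_exp = {mof: abs(gcmc - exp) for _, mof, gcmc, exp, _ in rows}

print("GBRT vs GCMC:", gbrt_vs_gcmc, "mean =", mean(gbrt_vs_gcmc.values()))
print("GCMC vs exp.:", gcmc_vs_exp, "mean =", mean(gcmc_vs_exp.values()))
```

With this reading of the table, the ML model tracks the simulation within a few cm³ cm⁻³ for all four structures, while the largest gaps between simulation and experiment occur for MIL-47 and HKUST-1, the two cases attributed above to framework flexibility and unsaturated metal sites.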

Figure 7. Relationship plots of KH versus VSA (left) and GSA (right) of hMOFs.

According to the prediction results of the three machine learning models described above, adding the Henry coefficient as a new feature improves the prediction accuracy of the models, which suggests that there may be some intrinsic relationship between the Henry coefficient and the absolute volumetric capacity for methane storage. Figures 7 and 8 show the relationship plots between the Henry coefficient and surface area (gravimetric and volumetric), and between absolute volumetric uptakes and the Henry coefficient, for all studied hMOFs. The Henry coefficient of methane in the 130,397 hMOFs varies widely, from almost zero to nearly 1,000 mol kg⁻¹ MPa⁻¹. In general, as seen in Figure 7, the hMOFs with the highest Henry coefficients also show low volumetric surface areas (< 500 m² cm⁻³) and low gravimetric surface areas (