Using Data Mining to Search for Perovskite Materials with Higher Specific Surface Area. Li Shi, Dongping Chang, Xiaobo Ji, and Wencong Lu. J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.8b00436 • Publication Date (Web): 20 Nov 2018
Journal of Chemical Information and Modeling
Using Data Mining to Search for Perovskite Materials with Higher Specific Surface Area
Li Shi(a), Dongping Chang(b), Xiaobo Ji(a), Wencong Lu(a,b)
Email: [email protected]
(a) Department of Chemistry, College of Sciences, Shanghai University, Shanghai 200444, China
(b) Materials Genome Institute, Shanghai University, and Shanghai Materials Genome Institute, Shanghai 200444, China
Abstract
The specific surface area (SSA) of ABO3-type perovskites is one of the important properties associated with photocatalytic ability. In this work, data mining methods were used to explore the relationship between the SSA (ranging from 1 to 60 m2 g-1) of perovskites and their features, including chemical compositions and technical parameters. The genetic algorithm (GA)-support vector regression (SVR) method was used to screen the main features for modeling. The correlation coefficient (R) between predicted and experimental SSA reached 0.986 for the training set and 0.935 for leave-one-out cross-validation (LOOCV). ABO3-type perovskites with higher SSA can be screened out by using OCPMDM (online computation platform for materials data mining), developed in our laboratory. Further, an online web server has been developed to share the model for predicting the SSA of ABO3-type perovskites, which is accessible at: http://118.25.4.79/material_api/csk856q0fulhhhwv
Keywords: Perovskite; Data mining; Specific surface area; Virtual screening; Online service
ACS Paragon Plus Environment
Introduction
In recent years, machine learning and data mining have been successfully applied in materials science research.1-8 For example, Xue et al. demonstrated how to accelerate the search for new materials with targeted properties by adaptive design, in which inference and global optimization were considered simultaneously to find NiTi-based shape memory alloys with the lowest thermal hysteresis.9 Hu et al. reported that support vector regression models could be used to predict the specific surface area of layered double hydroxides.10 Finding a simple and efficient way to design new materials with desired properties nevertheless remains a challenge. Data mining models are expected to help find new materials with targeted properties and thereby accelerate the search for new materials. The perovskite-type oxide with formula ABO3 is shown in Figure 1. ABO3 perovskite compounds are well known as semiconductor photocatalysts and have also shown their value in fuel cells,11, 12 catalysts,13 etc. In an ABO3 perovskite compound, the A-site cation is usually a rare earth element such as La3+, while the B-site cation is a transition element such as Fe3+. Both the A and B sites can be doped with other metal ions to improve performance. It is widely accepted that a higher SSA can result in higher photocatalytic activity.14 Therefore, it is meaningful to enlarge the SSA by adjusting the chemical compositions of ABO3 perovskite compounds.
Figure 1 Structure of ABO3 perovskite
With the rapid development of the Materials Genome Initiative (MGI), more and more data mining models have been reported. However, it is inconvenient for readers to use most of these models, because the details of the black box and the input features are not provided. Therefore, it is necessary to provide a web server so that readers can use the models easily.15, 16 Since models should be useful to experimental scientists, we tried to develop a practically useful machine-learning model for the SSA of ABO3-type perovskites in this work, making the following six steps clear: (i) how to collect a valid benchmark dataset to train and test the model; (ii) how to construct an effective model reflecting the intrinsic correlation between the target and the features; (iii) how to properly perform cross-validation tests to objectively evaluate the anticipated accuracy of the model; (iv) how to effectively help experimental scientists screen out the targeted materials; (v) how to establish a user-friendly web server for the models that is accessible to the public; (vi) how to explain the model in applications. Below, we describe how to deal with these steps in detail.
Materials and Methods
The Flowchart of Materials Data Mining
The main procedure of materials data mining is shown in Figure 2. First, original samples of perovskites with known properties are prepared from published references, while the features can be generated automatically via OCPMDM (online computation platform for materials data mining).17 Next, feature selection, model selection, hyper-parameter optimization and model validation are carried out to construct the model for predicting the SSA of perovskites using different data mining methods. The resulting model can then be used to assist experimental researchers in screening out perovskites with higher SSA via virtual screening of high-throughput candidates. An online web server was then developed to share the model for the prediction of the SSA of ABO3-type perovskites over the web. Finally, the model is explained in terms of materials pattern recognition and sensitivity analysis. This procedure covers the six steps mentioned in the Introduction, which are described step by step below.
Figure 2 The flowchart of materials data mining in this work
Data Preparation
As many ABO3 perovskite samples synthesized via the sol-gel method as possible were collected for the dataset. In total, 50 samples with SSAs ranging from 1 to 60 m2 g-1 were collected from the literature.18-31 The dataset was divided into two subsets, i.e., a training set with 40 perovskites and a testing set with 10 perovskites, which were randomly selected from the dataset. At the same time, samples with an SSA of more than 10 m2 g-1 were defined as optimal, while the others were defined as unsatisfactory.
Features Generation
After collecting the samples, the available candidate features were collected to form a valid benchmark dataset to train and test the model. Table 1 lists a total of 24 candidate features, including 21 atomic parameters quoted from Lange's Handbook of Chemistry (16th edition)32 and 3 technical conditions extracted from the references.18-31 All the data and their features are given in the Supporting Information.
Table 1 The features, including atomic parameters and technical parameters
01. Atomic radius of the A position (Ra)
02. Atomic radius of the B position (Rb)
03. Electronegativity of the A position (Ea)
04. Electronegativity of the B position (Eb)
05. Tolerance Factor (IF)
06. Unit Cell Lattice Edge (αO3)
07. Critical Radius (rc)
08. Ionization potential of the A position (Za)
09. Ionization potential of the B position (Zb)
10. Ratio of the atomic radius of the A position and B position (Ra/Rb)
11. Molecular mass (mass)
12. Electron affinity of A position (A-aff)
13. Electron affinity of B position (B-aff)
14. Melting point of A position (A-Tm)
15. Melting point of B position (B-Tm)
16. Normal boiling point of A position (A-Tb)
17. Normal boiling point of B position (B-Tb)
18. Enthalpy of fusion at the melting point of A position (A-Hfus)
19. Enthalpy of fusion at the melting point of B position (B-Hfus)
20. Density of A position (D-A)
21. Density of B position (D-B)
22. Calcination temperature (CT)
23. Calcination time (AH)
24. Drying temperature (DT)
Computational software
In this work, materials data mining was carried out using ExpMiner (a data mining software package) and OCPMDM (online computation platform for materials data mining), both developed in our laboratory.17 The free version of ExpMiner can be downloaded from the website of the Laboratory of Materials Data Mining at Shanghai University: http://chemdata.shu.edu.cn:8080/MyLab/Lab/download.jsp. OCPMDM is accessible at: http://118.25.4.79/material_api/csk856q0fulhhhwv.
Results and discussion
Features Selection
It is important to eliminate unnecessary features to improve the prediction performance of a model. In the past, feature selection depended on either domain knowledge or machine learning results.33 In this work, the genetic algorithm (GA)-support vector regression (SVR) approach was applied to the training dataset to screen the subset of features for modeling. GA-SVR can be used to find the optimal subset of features.34, 35 To evaluate the feature selection, the root-mean-square error (RMSE) was employed as the measure of goodness of fit. The RMSE36 is defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (p_i - e_i)^2}{n}}$$
where e_i is the experimental value, p_i is the predicted value, and n is the number of samples in the training set. Generally, the smaller the RMSE, the better the set of features. Figure 3 illustrates how GA-SVR can be used to select the materials features. After 8 generations of the GA, the RMSE of the SVR model is smallest with 5 features, comprising three atomic parameters and two technical parameters. The three atomic parameters are B-aff (the electron affinity of the B position), B-Tm (the melting point of the B position) and A-Tb (the normal boiling point of the A position), while the two technical parameters are CT (the calcination temperature) and AH (the calcination time). Table 2 and Table 3 list the SSA and selected features of the training and testing datasets, respectively.
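The wrapper-style selection described above can be sketched in a few lines. This is only an illustration under stated assumptions: the regressor is a 1-nearest-neighbour stand-in rather than SVR, the GA operators (truncation selection, one-point crossover, bit-flip mutation) and all parameter values are hypothetical choices, and the toy data are made up. None of it is taken from ExpMiner or OCPMDM.

```python
import math
import random


def rmse(pred, exp):
    """Root-mean-square error between predicted and experimental values."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, exp)) / len(exp))


def loo_rmse_knn(X, y, mask):
    """LOOCV RMSE of a 1-nearest-neighbour regressor on the masked features.

    The 1-NN regressor is a stand-in for SVR so the sketch stays stdlib-only.
    """
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return float("inf")  # an empty feature subset is never acceptable
    preds = []
    for i, xi in enumerate(X):
        best_d, best_y = float("inf"), None
        for k, xk in enumerate(X):
            if k == i:
                continue  # leave sample i out
            d = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if d < best_d:
                best_d, best_y = d, y[k]
        preds.append(best_y)
    return rmse(preds, y)


def ga_select(X, y, generations=20, pop_size=12, p_mut=0.1, seed=0):
    """Search feature masks with a small GA; lower LOOCV RMSE is fitter."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [tuple(True for _ in range(n))]  # start from the full feature set
    pop += [tuple(rng.random() < 0.5 for _ in range(n)) for _ in range(pop_size - 1)]

    def fitness(mask):
        return loo_rmse_knn(X, y, mask)

    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]  # truncation selection keeps the best mask
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)            # one-point crossover
            child = a[:cut] + b[cut:]
            child = tuple(g != (rng.random() < p_mut) for g in child)  # bit flips
            children.append(child)
        pop = parents + children
    best = min(pop, key=fitness)
    return best, fitness(best)


# Made-up toy data: only the first of three features carries signal.
X = [(float(i), float((i * 7) % 5), float((i * 3) % 4)) for i in range(10)]
y = [2.0 * i for i in range(10)]
best_mask, best_rmse = ga_select(X, y)
```

Because the best half of each generation is carried over unchanged, the returned subset can never score worse than the full feature set that seeds the population.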
Figure 3 The RMSE versus generation of evolution related to subset of features

Table 2 The SSA and selected features of 40 training samples of perovskites
No.  Molecular formula          SSA (m2 g-1)  B-aff (J/mol)  B-Tm (℃)  A-Tb (℃)   CT (℃)  AH (h)
1    ZnTiO3                     1.05          7.6            1670      907        900     2
2    LaFeO3                     1.08          15.7           1538      3464       900     4
3    BiFeO3                     0.7514        15.7           1538      1564       900     4
4    BiTi0.15Fe0.85O3           0.9507        14.485         1557.8    1564       900     4
5    LaCoO3                     17            63.8           1495      3464       750     4
6    LaCo0.94Mg0.06O3           19            57.632         1444.3    3464       750     4
7    LaCo0.90Mg0.10O3           21            53.52          1410.5    3464       750     4
8    LaCo0.80Mg0.20O3           22            43.24          1326      3464       750     4
9    La0.5Bi0.2Ba0.2Mn0.1FeO3   27.75         15.7           1538      2626.7     500     4
10   La0.5Bi0.2Ba0.2Mn0.1FeO3   12.46         15.7           1538      2626.7     700     4
11   La0.5Bi0.2Ba0.2Mn0.1FeO3   5.91          15.7           1538      2626.7     800     4
12   LaFeO3                     11.39         15.7           1538      3464       600     5
13   LaMg0.2Fe0.8O3             15.07         4.76           1360.4    3464       600     5
14   LaMg0.6Fe0.4O3             24.41         -17.12         1005.2    3464       600     5
15   LaMg0.8Fe0.2O3             13.32         -28.06         827.6     3464       600     5
16   LaMgO3                     10.17         -39            650       3464       600     5
17   LaCrO3                     3.95          64.3           1907      3464       600     5
18   LaMg0.2Cr0.8O3             8.42          43.64          1655.6    3464       600     5
19   LaMg0.6Cr0.4O3             18.41         2.32           1152.8    3464       600     5
20   PrFeO3                     10.88         15.7           1538      3520       700     5
21   LaFe0.9Co0.1O3             51.2          20.51          1533.7    3464       750     10
22   LaFe0.1Co0.9O3             42.8          58.99          1499.3    3464       750     10
23   LaFeO3                     8.5           15.7           1538      3464       700     3
24   SrTiO3                     16.4          7.6            1670      1382       650     10
25   La0.002Sr0.998TiO3         19.7          7.6            1670      1386.164   650     10
26   La0.005Sr0.995TiO3         22.3          7.6            1670      1392.41    650     10
27   La0.01Sr0.99TiO3           24.1          7.6            1670      1402.82    650     10
28   La0.02Sr0.98TiO3           23.2          7.6            1670      1423.64    650     10
29   LaFeO3                     9.5           15.7           1538      3464       700     4
30   La0.5Bi0.2Ba0.2Mn0.1FeO3   20.04         15.7           1538      3464       700     2
31   La0.5Bi0.2Ba0.2Mn0.1FeO3   8.5           15.7           1538      3464       800     2
32   La0.5Bi0.2Ba0.2Mn0.1FeO3   5.8           15.7           1538      3464       900     2
33   LaNiO3                     14.1          111.5          1455      3464       600     2
34   LaNiO3                     12.7          111.5          1455      3464       700     2
35   LaNiO3                     6.5           111.5          1455      3464       900     2
36   LaNiO3                     15.1          111.5          1455      3464       600     4
37   LaNiO3                     12.2          111.5          1455      3464       600     6
38   LaFeO3                     21.9          15.7           1538      3464       500     4
39   LaFeO3                     5.24          15.7           1538      3464       800     4
40   LaFeO3                     1.09          15.7           1538      3464       1000    4

Table 3 The SSA and selected features of 10 testing data samples of perovskites
No.  Molecular formula          SSA (m2 g-1)  B-aff (J/mol)  B-Tm (℃)  A-Tb (℃)   CT (℃)  AH (h)
1    La0.5Bi0.2Ba0.2Mn0.1FeO3   20.63         15.7           1538      2626.7     600     4
2    La0.5Bi0.2Ba0.2Mn0.1FeO3   4.19          15.7           1538      2626.7     900     4
3    LaMg0.4Fe0.6O3             17.63         -6.18          1182.8    3464       600     5
4    LaMg0.4Cr0.6O3             29.71         22.98          1404.2    3464       600     5
5    LaMg0.8Cr0.2O3             14.46         -18.34         901.4     3464       600     5
6    La0.5Bi0.2Ba0.2Mn0.1FeO3   25.8          15.7           1538      3464       500     2
7    La0.5Bi0.2Ba0.2Mn0.1FeO3   22.55         15.7           1538      3464       600     2
8    LaNiO3                     11.8          111.5          1455      3464       800     2
9    LaFeO3                     15.37         15.7           1538      3464       600     4
10   LaFeO3                     10.07         15.7           1538      3464       700     4
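The B-aff and B-Tm entries for doped compositions in Tables 2 and 3 are numerically consistent with mole-fraction-weighted averages of the elemental values. The text does not state this rule explicitly, so the sketch below is an inference from the tabulated numbers, not a description of how OCPMDM generates features; the function name `site_average` is hypothetical.

```python
def site_average(fractions, element_values):
    """Mole-fraction-weighted mean of an elemental property over one lattice site."""
    total = sum(fractions.values())
    return sum(x * element_values[el] for el, x in fractions.items()) / total


# Elemental values read from the undoped rows of Table 2
# (LaFeO3, LaMgO3, LaCoO3, LaCrO3), in the units used there.
B_AFF = {"Fe": 15.7, "Mg": -39.0, "Co": 63.8, "Cr": 64.3}
B_TM = {"Fe": 1538.0, "Mg": 650.0, "Co": 1495.0, "Cr": 1907.0}

b_site = {"Mg": 0.2, "Fe": 0.8}  # B site of LaMg0.2Fe0.8O3 (Table 2, No. 13)
print(round(site_average(b_site, B_AFF), 2))  # 4.76
print(round(site_average(b_site, B_TM), 1))   # 1360.4
```

Both printed values match row 13 of Table 2, and the same rule reproduces, for example, the B-aff of 57.632 listed for LaCo0.94Mg0.06O3.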
Model Selection
To select an optimal regression model, leave-one-out cross-validation (LOOCV) was used to evaluate different machine learning algorithms in terms of the correlation coefficient R. In this work, three machine learning algorithms, namely partial least squares (PLS),35 artificial neural networks (ANN)37 and support vector regression (SVR),38 were used to construct models for predicting the SSA of perovskites. Table 4 lists the correlation coefficients and RMSE values of the perovskite SSA in LOOCV using PLS, ANN and SVR.
Table 4 The correlation coefficients and RMSE of perovskite SSA in LOOCV of the PLS, ANN and SVR approaches
Methods                                 PLS     ANN     SVR
Correlation coefficient of LOOCV (R)    0.542   0.762   0.892
RMSE of LOOCV                           8.995   7.374   4.809
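The LOOCV comparison behind Table 4 follows a simple pattern: hold out one sample, fit on the rest, predict the held-out sample, then compute R over all held-out predictions. The sketch below shows that pattern only. The learner `fit_line` is a one-variable least-squares stand-in (not PLS, ANN or SVR), and the function names are assumptions made for illustration.

```python
import math


def pearson_r(pred, exp):
    """Pearson correlation coefficient between predictions and experiment."""
    n = len(exp)
    mp, me = sum(pred) / n, sum(exp) / n
    cov = sum((p - mp) * (e - me) for p, e in zip(pred, exp))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    se = math.sqrt(sum((e - me) ** 2 for e in exp))
    return cov / (sp * se)


def loocv_predict(fit, X, y):
    """Hold each sample out once, fit on the rest, predict the held-out sample."""
    preds = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)  # fit returns a callable x -> prediction
        preds.append(model(X[i]))
    return preds


def fit_line(X, y):
    """Stand-in learner: one-variable ordinary least squares."""
    n = len(X)
    mx, my = sum(X) / n, sum(y) / n
    slope = sum((x - mx) * (v - my) for x, v in zip(X, y)) / sum((x - mx) ** 2 for x in X)
    return lambda x: my + slope * (x - mx)


# Made-up, nearly linear toy data.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r_loocv = pearson_r(loocv_predict(fit_line, X, y), y)
```

Any learner that exposes the same `fit(X_train, y_train) -> predictor` shape could be dropped in to compare algorithms on equal terms.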
Hyper-parameter optimization
Model selection showed that the SVR model was the best one, with the maximum correlation coefficient and the minimum RMSE. To further optimize the regression model for generalization ability, the ε-insensitive loss function, the capacity parameter C and the kernel function were optimized by a grid search evaluated on the LOOCV results of the SVR models. The lowest RMSE was 3.745, obtained when C, ε and the gamma of the radial basis function were 73, 0.03 and 0.9, respectively (see Figure 4).
Figure 4 RMSE of LOOCV versus ε and C
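A grid search of this kind can be sketched generically: evaluate a score for every combination of hyper-parameters and keep the minimum. In the sketch below the grid values are hypothetical (the scanned ranges are not given in the text) and `toy_score` is a made-up surface used only so the example runs. In practice the score would be the LOOCV RMSE of an SVR model fitted with the given C, ε and gamma.

```python
from itertools import product


def grid_search(score, grid):
    """Return (best_params, best_score) minimising score over the Cartesian grid."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(**params)
        if s < best_score:
            best_params, best_score = params, s
    return best_params, best_score


# Hypothetical grid, not the one used by the authors.
grid = {
    "C": [1, 10, 100],
    "epsilon": [0.01, 0.1],
    "gamma": [0.1, 1.0],
}


def toy_score(C, epsilon, gamma):
    # Made-up smooth surface; NOT the RMSE surface of Figure 4.
    return (C - 10) ** 2 / 1e4 + (epsilon - 0.1) ** 2 + (gamma - 1.0) ** 2


best_params, best_score = grid_search(toy_score, grid)
```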
Model validation
To check the robustness of the model after hyper-parameter optimization, both LOOCV and 5-fold cross-validation on the training dataset were carried out to evaluate the performance of the SVR model. Figure 5 (a, b) shows the plots of predicted versus experimental SSA values of ABO3 perovskites based on LOOCV and 5-fold cross-validation of the training dataset, respectively. The two results differ little, with correlation coefficients of 0.935 and 0.933, respectively.
Figure 5 (a, b) Experimental SSA versus predicted SSA based on the LOOCV (a) and 5-fold cross-validation (b) of training dataset
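For readers unfamiliar with k-fold cross-validation, the following sketch shows how 40 training samples can be partitioned into five disjoint folds, each serving once as the validation set. The shuffling seed and the function names are assumptions; how ExpMiner or OCPMDM assigns folds is not described in the text.

```python
import random


def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k near-equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_test_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) once per fold."""
    folds = kfold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


folds = kfold_indices(40, k=5)
splits = list(train_test_splits(40, k=5))
```

With 40 samples and k = 5, each model is trained on 32 samples and validated on 8; LOOCV is the limiting case k = 40.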
To further test the prediction ability of the model, Figure 6 shows the plots of experimental versus predicted SSA values of ABO3 perovskites using the SVR model for the training and testing datasets. Two other testing sets, each including 10 randomly selected samples, gave similar prediction results, which can be found in the Supporting Information. The RMSE and the mean relative error (MRE) for the testing dataset are 1.794 and 25.20%, respectively. The MRE is defined as follows:
$$\mathrm{MRE} = \frac{1}{n}\sum_{i=1}^{n} \frac{|p_i - e_i|}{e_i} \times 100\%$$
where e_i and p_i are the experimental and predicted values of sample i, and n is the number of samples.
Figure 6 Experimental SSA versus predicted SSA of perovskite sample by SVR
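The MRE can be computed directly from its definition. The short sketch below assumes the absolute-value form of the relative error and uses a hypothetical function name. As a worked example it applies the same formula to the two literature SSA values for LaMgO3 (7.13 and 10.17 m2 g-1) that the text uses to put the model's MRE in context.

```python
def mean_relative_error(pred, exp):
    """MRE in percent: mean of |p_i - e_i| / e_i over all samples."""
    return 100.0 * sum(abs(p - e) / e for p, e in zip(pred, exp)) / len(exp)


# Two literature SSA values for the same composition (LaMgO3), in m2 g-1.
a, b = 7.13, 10.17
print(round(mean_relative_error([a], [b]), 2))  # 29.89 (10.17 taken as reference)
print(round(mean_relative_error([b], [a]), 2))  # 42.64 (7.13 taken as reference)
```

The two results differ only in which measurement is treated as the reference value in the denominator.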
The mean relative error (MRE) for the testing dataset seems rather large. This can be explained by the fact that many factors, such as the synthetic method and the calcination temperature, affect the SSA of perovskites.39 For example, the same material, LaMgO3, was reported in two articles,27, 28 with SSAs of 7.13 m2 g-1 and 10.17 m2 g-1, respectively. The relative deviation between these two experimental results is therefore between 29.89% and 42.64%, depending on which value is taken as the reference, which is larger than our model's MRE.
Model application
Virtual Screening
To design new ABO3-type perovskite materials with higher SSA, the model for predicting the SSA of ABO3-type perovskite materials was integrated into OCPMDM, which can be used to screen out ABO3-type perovskite materials with targeted properties among innumerable candidates.17 Here, the perovskite crystals in our dataset, including the candidate new ABO3-type compounds, are all cubic. The candidates to be screened on OCPMDM were designed as follows:
(1) The A-site is La, not doped by other metal ions.
(2) The B-site is Fe doped with Mg or Co, at no more than 100% with a step of 1.0%.
(3) The calcination temperature ranges from 500 ℃ to 1000 ℃ with a step of 100 ℃.
(4) The calcination time ranges from 2 h to 10 h with a step of 1 h.
Table 5 lists the five virtual samples of ABO3-type perovskites with higher SSA screened out among 2000 candidate perovskites using the model. The highest predicted SSA of a virtual sample was 58.09 m2 g-1, exceeding the highest SSA (51.2 m2 g-1) in the training dataset. The model is therefore helpful for experimental researchers exploring new ABO3-type perovskites with higher SSA. All candidate new ABO3-type compounds could be cubic because their tolerance factors (IF) lie between 0.85 and 0.90, as shown in Table 5.
Table 5 The virtual samples of ABO3-type perovskites with higher SSA screened out by using the model
Molecular formula   SSA (m2 g-1)   CT (℃)   AH (h)   IF
LaFe0.8Mg0.2O3      57.70          900      10       0.8668
LaFe0.7Mg0.3O3      58.09          900      10       0.8594
LaFe0.9Co0.1O3      54.81          900      10       0.8821
LaFe0.8Co0.2O3      54.82          900      10       0.8823
LaFe0.7Co0.3O3      52.03          800      10       0.8826
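The candidate design rules (1) to (4) amount to enumerating a grid over dopant, dopant fraction, calcination temperature and calcination time, then ranking the candidates by predicted SSA. The sketch below illustrates that idea under assumptions: the grid bounds follow the four rules literally, `toy_predict` is a made-up scoring function rather than the fitted SVR model, and the resulting candidate count need not match the 2000 candidates reported, since the exact grid used on OCPMDM is not fully specified.

```python
from heapq import nlargest
from itertools import product


def enumerate_candidates(dopants=("Mg", "Co"), step_pct=1, max_pct=100,
                         ct_values=range(500, 1001, 100), ah_values=range(2, 11)):
    """Yield LaFe(1-x)M(x)O3 candidates over dopant fraction, CT (℃) and AH (h)."""
    for dopant, pct, ct, ah in product(dopants, range(step_pct, max_pct + 1, step_pct),
                                       ct_values, ah_values):
        x = pct / 100
        yield {
            "formula": f"LaFe{1 - x:.2f}{dopant}{x:.2f}O3",
            "dopant": dopant,
            "x": x,
            "CT": ct,
            "AH": ah,
        }


def screen(candidates, predict, k=5):
    """Return the k candidates with the highest predicted target value."""
    return nlargest(k, candidates, key=predict)


def toy_predict(c):
    # Hypothetical score, NOT the paper's model: favours long calcination near 900 ℃.
    return c["AH"] - abs(c["CT"] - 900) / 100


candidates = list(enumerate_candidates())
top5 = screen(candidates, toy_predict)
```

Swapping `toy_predict` for a trained regressor that maps a candidate to a predicted SSA would turn the same two functions into a screening loop.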
Online Prediction
To help experimental scientists use the SVR model in designing new ABO3-type perovskites with higher SSA, an online web server was developed to predict the SSA of ABO3-type perovskites based on the SVR model. When applying the model, the user needs to input the two technical parameters, while the three atomic parameters are filled in automatically by OCPMDM. Figure 7 illustrates an example of online prediction of the SSA of an ABO3-type perovskite. After entering the chemical formula of a new ABO3-type perovskite together with the calcination temperature and calcination time, the user clicks the 'predict' button to obtain the predicted SSA, which is helpful for experimenters designing new ABO3-type perovskites with targeted SSA. The online web server sharing the model for the prediction of the SSA of ABO3-type perovskites is accessible at: http://materialdata.shu.edu.cn/material_api/30dxpff8e49pdjza.
Figure 7 An example of online prediction for the SSA of ABO3-type perovskite
Model Explanation
Materials pattern recognition
In this work, the results predicted by the SVR model can be explained using materials pattern recognition methods such as the Fisher method.40 Figure 8 illustrates the materials pattern recognition of different samples using the Fisher method. The samples with an SSA of more than 10 m2 g-1, together with the five virtual samples of ABO3-type perovskites, are distributed on the right side of the classification diagram shown in Figure 8.
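As a rough illustration of the Fisher idea, the sketch below computes the two-class Fisher discriminant direction for samples described by two features and projects the samples onto it. It is a simplification under stated assumptions: the paper's diagram uses two Fisher axes built from the selected features, whereas this example uses a single direction, two features and made-up points. The function names are hypothetical.

```python
def fisher_direction(class0, class1):
    """Two-class Fisher direction w = Sw^-1 (m1 - m0) for 2-feature samples."""
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    m0, m1 = mean(class0), mean(class1)
    # Within-class scatter matrix Sw = [[a, b], [b, d]], pooled over both classes.
    a = b = d = 0.0
    for points, m in ((class0, m0), (class1, m1)):
        for x, y in points:
            dx, dy = x - m[0], y - m[1]
            a += dx * dx
            b += dx * dy
            d += dy * dy
    det = a * d - b * b
    gx, gy = m1[0] - m0[0], m1[1] - m0[1]
    # Closed-form inverse of the symmetric 2x2 matrix applied to (gx, gy).
    return ((d * gx - b * gy) / det, (a * gy - b * gx) / det)


def project(w, point):
    """Scalar coordinate of a sample along the Fisher direction."""
    return w[0] * point[0] + w[1] * point[1]


# Made-up points standing in for "lower SSA" and "higher SSA" samples.
low = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.8)]
high = [(4.0, 5.0), (5.0, 4.5), (4.5, 5.5)]
w = fisher_direction(low, high)
```

Projecting both groups onto `w` places the second class at larger coordinates, which is the sense in which one class falls on the right side of such a diagram.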
[Figure 8: scatter plot of FIS(2) versus FIS(1); legend: Higher SSA, Lower SSA, New SSA]
(○): samples with an SSA of less than 10 m2 g-1, (□): samples with an SSA of more than 10 m2 g-1, (△): 5 new virtual samples screened out via the SVR model
Figure 8 Materials pattern recognition of different samples by using Fisher method
Sensitivity analysis
Sensitivity analysis has been applied in many fields of data mining.41 It can be used to examine the trend of the target variable as a function of one feature while the other features are kept constant. Figure 9 (a-e) illustrates the sensitivity analysis of the selected features (B-aff, B-Tm, A-Tb, CT and AH), respectively.
[Figure 9: five panels (a)-(e) plotting SSA (m2 g-1) against B-aff (J/mol), B-Tm (℃), A-Tb (℃), CT (℃) and AH (h)]
Figure 9 Sensitivity analyses with (a) Electron affinity of B position (B-aff), (b) Melting point of B position (B-Tm), (c) Normal boiling point of A position (A-Tb), (d) Calcination temperature (CT) and (e) Calcination time (AH)
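A one-at-a-time sensitivity curve of the kind shown in Figure 9 can be generated by sweeping a single feature while the others stay at a baseline. In the sketch below the baseline is taken from one LaFeO3 row of Table 2 (No. 29), while `toy_predict` is a hypothetical stand-in for the fitted SVR model, so the resulting numbers carry no physical meaning.

```python
def sensitivity_curve(predict, baseline, feature, values):
    """Predict the target while sweeping one feature and holding the others fixed."""
    curve = []
    for v in values:
        sample = dict(baseline)  # copy so the baseline is never mutated
        sample[feature] = v
        curve.append((v, predict(sample)))
    return curve


# Baseline features from Table 2, No. 29 (LaFeO3).
baseline = {"B-aff": 15.7, "B-Tm": 1538, "A-Tb": 3464, "CT": 700, "AH": 4}


def toy_predict(sample):
    # Hypothetical smooth function, NOT the fitted SVR model.
    return 40.0 - 0.03 * sample["CT"] + 0.5 * sample["AH"]


ct_curve = sensitivity_curve(toy_predict, baseline, "CT", [500, 700, 900])
```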
Conclusion
This work demonstrates how to screen or design new ABO3-type perovskites with higher SSA based on machine learning methods, including an SVR model. The SVR model was successful in predicting the specific surface area of perovskites quickly and easily, and it can be shared via an online web server. It is therefore expected that materials data mining models combined with online web servers will accelerate materials design and optimization.
Acknowledgements
Financial support for this work from the National Key Research and Development Program of China (No. 2016YFB0700504) is gratefully acknowledged.
Supporting Information Available
The Supporting Information is available free of charge on the ACS publications website at DOI:
AUTHOR INFORMATION Corresponding Author *Phone:(086-021-66132406); Email:
[email protected] Reference (1) Balachandran, P. V.; Xue, D. Z.; Theiler, J.; Hogden, J.; Lookman, T., Adaptive Strategies for Materials Design using Uncertainties. Sci. Rep. 2016, 6, 19660. (2) Esteva, A.; Kuprel, B.; Novoa, R. A., Dermatologist-Level Classification of Skin Cancer with Deep Neural Networks. Oncologie 2017, 19, 407-408. (3) Yousefi, S.; Amrollahi, F.; Amgad, M.; Dong, C.; Lewis, J. E.; Song, C.; Gutman, D. A.; Halani, S. H.; Vega, J. E. V.; Brat, D. J.; Cooper, L. A. D., Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models. Sci. Rep. 2017, 7, 11707.
ACS Paragon Plus Environment
Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
(4) Zhai, X. Y.; Chen, M. T.; Lu, W. C., Accelerated search for perovskite materials with higher Curie temperature based on the machine learning methods. Comput. Mater.Sci. 2018, 151, 41-48. (5) Ning, X.; Walters, M.; Karypis, G., Improved Machine Learning Models for Predicting Selective Compounds (vol 52, pg 38, 2012). J. Chem. Inf. Model. 2012, 52, 1411-1411. (6) Scott, D. J.; Manos, S.; Coveney, P. V., Design of electroceramic materials using artificial neural networks and multiobjective evolutionary algorithms. J. Chem. Inf. Model. 2008, 48, 262-273. (7) Lu, W.; Xiao, R.; Yang, J.; Li, H.; Zhang, W., Data mining-aided materials discovery and optimization. Journal of Materiomics 2017, 3, 191-201. (8) Lu, W. C.; Ji, X. B.; Li, M. J.; Liu, L.; Yue, B. H.; Zhang, L. M., Using support vector machine for materials design. Adv. Manuf.2013, 1, 151-159. (9) Xue, D. Z.; Balachandran, P. V.; Hogden, J.; Theiler, J.; Xue, D. Q.; Lookman, T., Accelerated search for materials with targeted properties by adaptive design. Nat. Commun.2016, 7,11241. (10) Hu, B.; Lu, K. L.; Zhang, Q.; Ji, X. B.; Lu, W. C., Data mining assisted materials design of layered double hydroxide with desired specific surface area. Comput. Mater. Sci.2017, 136, 29-35. (11) Ullmann, H.; Trofimenko, N.; Tietz, F.; Stover, D.; Ahmad-Khanlou, A., Correlation between thermal expansion and oxide ion transport in mixed conducting perovskite-type oxides for SOFC cathodes. SSIon 2000, 138, 79-90. (12) Lai, K.-Y.; Manthiram, A., Self-Regenerating Co-Fe Nanoparticles on Perovskite Oxides as a Hydrocarbon Fuel Oxidation Catalyst in Solid Oxide Fuel Cells. Chem. Mater. 2018, 30, 2515-2525. (13) Chen, H.; Yu, H.; Peng, F.; Yang, G.; Wang, H.; Yang, J.; Tang, Y., Autothermal reforming of ethanol for hydrogen production over perovskite LaNiO3. Chem. Eng. J. 2010, 160, 333-339. 
(14) Bajorowicz, B.; Nadolna, J.; Lisowski, W.; Klimczuk, T.; Zaleska-Medynska, A., The effects of bifunctional linker and reflux time on the surface properties and photocatalytic activity of CdTe quantum dots decorated KTaO3 composite photocatalysts. Appl.Catal. B-Environ.2017, 203, 452-464. (15) Scott, D. J.; Manos, S.; Coveney, P. V.; Rossiny, J. C. H.; Fearn, S.; Kilner, J. A.; Pullar, R. C.; Alford, N. M. N.; Axelsson, A. K.; Zhang, Y.; Chen, L.; Yang, S.; Evans, J. R. G.; Sebastian, M. T., Functional ceramic materials database: An online resource for materials research. J. Chem. Inf. Model. 2008, 48, 449-455. (16) Zhang, Q.; Zhai, X. Y.; Xiong, P.; Kou, L.; Ji, X. B.; Lu, W. C., Prediction and synthesis of novel layered double hydroxide with desired basal spacing based on relevance vector machine. MaRBu 2017, 93, 123-129. (17) Zhang, Q.; Chang, D.; Zhai, X.; Lu, W., OCPMDM: Online computation platform for materials data mining. Chemometrics Intellig. Lab. Syst. 2018, 177, 26-34. (18) Li, S. D.; Jing, L. Q.; Fu, W.; Yang, L. B.; Xin, B. F.; Fu, H. G., Photoinduced charge property of nanosized perovskite-type LaFeO3 and its relationships with photocatalytic activity under visible irradiation. MaRBu 2007, 42, 203-212. (19) Li, Y. Y.; Yao, S. S.; Wen, W.; Xue, L. H.; Yan, Y. W., Sol-gel combustion synthesis and visible-light-driven photocatalytic property of perovskite LaNiO3. JAllC 2010, 491, 560-564. (20) Parida, K. M.; Reddy, K. H.; Martha, S.; Das, D. P.; Biswal, N., Fabrication of nanocrystalline LaFeO3: An efficient sol-gel auto-combustion assisted visible light responsive photocatalyst for water decomposition. IJHE 2010, 35, 12161-12168. (21) Sun, H. H.; Yang, H. P.; Cui, S. Z.; Nie, K.; Wu, J. M., Simultaneous Mg-Modification Inside and Outside of LaCoO3 Lattice and Their Photocatalytic Properties.Chin.J.Inorg.Chem.2016, 32, 1704-1712. (22) Tijare, S. N.; Bakardjieva, S.; Subrt, J.; Joshi, M. V.; Rayalu, S. S.; Hishita, S.; Labhsetwar, N.,
ACS Paragon Plus Environment
Page 16 of 28
Page 17 of 28 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
Graphic for abstract
Figure 1. Structure of ABO3 perovskite.
Figure 2. Flowchart of the materials data mining in this work.
Figure 3. RMSE versus generation of evolution for the subset of features.
Figure 4. RMSE of LOOCV versus ε and C.
Figure 5. Experimental SSA versus predicted SSA for the training dataset, based on (a) LOOCV and (b) 5-fold cross-validation.
Figure 6. Experimental SSA versus SSA predicted by SVR for the perovskite samples.
Figure 7. An example of online prediction of the SSA of ABO3-type perovskites.
Figure 8. Materials pattern recognition of the different samples using the Fisher method.
Figure 9. Sensitivity analyses for (a) electron affinity of the B position (B-aff), (b) melting point of the B position (BTm), (c) normal boiling point of the A position (A-Tb), (d) calcination temperature (CT), and (e) calcination time (AH).