Using Data Mining To Search for Perovskite Materials with Higher Specific Surface Area


Using Data Mining to Search for Perovskite Materials with Higher Specific Surface Area
Li Shi, Dongping Chang, Xiaobo Ji, and Wencong Lu
J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.8b00436 • Publication Date (Web): 20 Nov 2018
Downloaded from http://pubs.acs.org on November 21, 2018


Journal of Chemical Information and Modeling

Using Data Mining to Search for Perovskite Materials with Higher Specific Surface Area

Li Shi (a), Dongping Chang (b), Xiaobo Ji (a), Wencong Lu (a,b)
Email: [email protected]
(a) Department of Chemistry, College of Sciences, Shanghai University, Shanghai 200444, China
(b) Materials Genome Institute, Shanghai University, and Shanghai Materials Genome Institute, Shanghai 200444, China

Abstract

The specific surface area (SSA) of ABO3-type perovskites is one of the important properties associated with photocatalytic ability. In this work, data mining methods were used to explore the relationship between the SSA (ranging from 1 to 60 m² g⁻¹) of perovskites and their features, including chemical compositions and technical parameters. The genetic algorithm (GA)-support vector regression (SVR) method was used to screen the main features for modeling. The correlation coefficient (R) between predicted and experimental SSA reached 0.986 for the training set and 0.935 for leave-one-out cross-validation (LOOCV). ABO3-type perovskites with higher SSA can be screened out by using OCPMDM (online computation platform for materials data mining), developed in our laboratory. Further, an online web server has been developed to share the model for predicting the SSA of ABO3-type perovskites, which is accessible at http://118.25.4.79/material_api/csk856q0fulhhhwv

Keywords: Perovskite; Data mining; Specific surface area; Visual screening; Online service


Introduction

In recent years, machine learning and data mining have been successfully used in materials science research.1-8 For example, Xue et al. demonstrated how to accelerate the discovery of new materials with targeted properties by adaptive design, in which inference and global optimization were considered simultaneously to find the NiTi-based shape memory alloys with the lowest thermal hysteresis.9 Hu et al. reported that support vector regression models were used to predict the specific surface area of layered double hydroxides.10 Finding a simple, efficient way to design new materials with desired properties nevertheless remains a challenge. Data mining models are expected to help find new materials with targeted properties and thus accelerate the search for new materials.

The perovskite-type oxide with formula ABO3 is shown in Figure 1. ABO3 perovskite compounds are well known as new semiconductor photocatalysts and have shown unique value in fuel cells,11,12 catalysts,13 etc. In an ABO3 perovskite compound, the A-site cation is usually a rare earth element such as La3+, while the B-site cation is a transition element such as Fe3+. Both the A and B sites can be doped with other metal ions to improve performance. It has been widely accepted that a higher SSA can result in higher photocatalytic activity.14 Therefore, it is meaningful to enlarge the SSA by adjusting the chemical composition of ABO3 perovskite compounds.

Figure 1 Structure of ABO3 perovskite

With the rapid development of the Materials Genome Initiative (MGI), more and more data mining models have been reported. However, it is inconvenient for readers to use most of these models, because the details of the black box and the input features are not available. Therefore, it is necessary to provide a web server so that readers can use the models easily.15,16 Since models should be useful to experimental scientists, we tried to develop a really useful machine-learning model for the SSA of ABO3-type perovskites in this work, making the following six steps very clear: (i) how to collect a valid benchmark dataset to train and test the model; (ii) how to construct an effective model reflecting the intrinsic correlation between the target and the features; (iii) how to properly perform cross-validation tests to objectively evaluate the anticipated accuracy of the model; (iv) how to effectively help experimental scientists screen out the targeted materials; (v) how to establish a user-friendly web server for the models that is accessible to the public; (vi) how to explain the model in applications. Below, we describe how to deal with these steps in detail.

Materials and Methods

The Flowchart of Materials Data Mining

The main procedure of materials data mining is shown in Figure 2. First, original samples of perovskites with known properties are collected from the published references, while the features can be generated automatically via OCPMDM (online computation platform for materials data mining).17 Next, feature selection, model selection, hyper-parameter optimization and model validation are carried out to construct the model for predicting the SSA of perovskites, using different data mining methods. The resulting model can then assist experimental researchers in screening out perovskites with higher SSA via virtual screening of high-throughput perovskite candidates. An online web server was then developed to share the SSA prediction model for ABO3-type perovskites over the web. Finally, the model is explained by means of materials pattern recognition and sensitivity analysis. This procedure covers the six steps mentioned in the Introduction, which are described step by step below.


Figure 2 The flowchart of materials data mining in this work

Data Preparation

As many ABO3 perovskite samples synthesized via the sol-gel method as possible were collected for the dataset. In total, 50 samples with SSAs ranging from 1 to 60 m² g⁻¹ were collected from the references.18-31 The dataset was divided into two subsets, i.e., a training set with 40 perovskites and a testing set with 10 perovskites, which were randomly selected from the dataset. At the same time, samples with an SSA of more than 10 m² g⁻¹ were defined as optimal, while the others were defined as unsatisfactory.

Features Generation

After collecting the samples, the available candidate features should be collected to form a valid benchmark dataset to train and test the model. Table 1 lists a total of 24 candidate features, including 21 atomic parameters quoted from Lange's Handbook of Chemistry (16th edition)32 and 3 technical conditions extracted from the references.18-31 All the data and their features are given in the Supporting Information.

Table 1 The features include atomic parameters and technical parameters


01. Atomic radius of the A position (Ra)
02. Atomic radius of the B position (Rb)
03. Electronegativity of the A position (Ea)
04. Electronegativity of the B position (Eb)
05. Tolerance Factor (IF)
06. Unit Cell Lattice Edge (αO3)
07. Critical Radius (rc)
08. Ionization potential of the A position (Za)
09. Ionization potential of the B position (Zb)
10. Ratio of the atomic radius of the A position and B position (Ra/Rb)
11. Molecular mass (mass)
12. Electron affinity of A position (A-aff)
13. Electron affinity of B position (B-aff)
14. Melting point of A position (A-Tm)
15. Melting point of B position (B-Tm)
16. Normal boiling point of A position (A-Tb)
17. Normal boiling point of B position (B-Tb)
18. Enthalpy of fusion at the melting point of A position (A-Hfus)
19. Enthalpy of fusion at the melting point of B position (B-Hfus)
20. Density of A position (D-A)
21. Density of B position (D-B)
22. Calcination temperature (CT)
23. Calcination time (AH)
24. Drying temperature (DT)
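For doped sites, the site-level values listed in Tables 2 and 3 are consistent with mole-fraction-weighted averages of the elemental values (for example, B-aff and B-Tm of LaCo0.94Mg0.06O3). The following minimal sketch illustrates that reading. The function name `site_feature` is hypothetical, and the elemental values are simply read off the undoped rows of Table 2; this is not a description of how OCPMDM itself is implemented.

```python
# Elemental values read off the undoped rows of Table 2
# (B-aff as listed there, B-Tm in deg C).
B_AFF = {"Fe": 15.7, "Co": 63.8, "Mg": -39.0, "Cr": 64.3, "Ti": 7.6}
B_TM = {"Fe": 1538.0, "Co": 1495.0, "Mg": 650.0, "Cr": 1907.0, "Ti": 1670.0}


def site_feature(occupancy, elemental_values):
    """Mole-fraction-weighted average of an elemental property over one site.

    occupancy maps element symbol -> site fraction; fractions must sum to 1.
    """
    total = sum(occupancy.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"site fractions sum to {total}, expected 1")
    return sum(frac * elemental_values[el] for el, frac in occupancy.items())


# B site of LaCo0.94Mg0.06O3 (training sample 6 in Table 2).
b_site = {"Co": 0.94, "Mg": 0.06}
b_aff = site_feature(b_site, B_AFF)  # Table 2 lists 57.632
b_tm = site_feature(b_site, B_TM)    # Table 2 lists 1444.3
```

The same call with `{"Fe": 0.6, "Mg": 0.4}` gives the B-aff and B-Tm values listed for LaMg0.4Fe0.6O3 in Table 3.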

Computational software

In this work, the materials data mining was carried out by using ExpMiner (a data mining software package) and OCPMDM (online computation platform for materials data mining), both developed in our laboratory.17 The free version of ExpMiner can be downloaded from the website of the Laboratory of Materials Data Mining at Shanghai University: http://chemdata.shu.edu.cn:8080/MyLab/Lab/download.jsp. OCPMDM is accessible at http://118.25.4.79/material_api/csk856q0fulhhhwv.

Results and discussion

Features Selection

It is important to eliminate unnecessary features to improve the prediction performance of a model. In the past, feature selection depended on either domain knowledge or machine learning results.33 In this work, the genetic algorithm (GA)-support vector regression (SVR) approach was applied to the training dataset to screen the subset of features for modeling. GA-SVR can be used to find the optimal subset of features.34,35 To evaluate the feature selection, the root-mean-square error (RMSE) was employed as the measure of goodness of fit. The RMSE36 is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (p_i - e_i)^2}{n}}$$

where e_i is the experimental value, p_i is the predicted value, and n is the number of samples in the training set. Generally, the smaller the RMSE, the better the feature subset. Figure 3 illustrates how GA-SVR can be used to select the materials features. After 8 generations of the GA, the RMSE of the SVR model is smallest with 5 features, comprising three atomic parameters and two technical parameters. The three atomic parameters are B-aff (electron affinity of the B position), B-Tm (melting point of the B position) and A-Tb (normal boiling point of the A position), while the two technical parameters are CT (calcination temperature) and AH (calcination time). Table 2 and Table 3 list the SSA and selected features of the training and testing data sets, respectively.
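As an illustration of GA-based feature selection, the sketch below evolves 0/1 masks over the 24 candidate features and keeps the mask with the lowest score. It is only a sketch under stated assumptions: the operators (binary tournament, uniform crossover, bit-flip mutation, elitism) and all parameter values are illustrative choices, not those of ExpMiner or OCPMDM, and `toy_score` is a made-up stand-in for the RMSE of an SVR model so that the example runs with the standard library alone.

```python
import random


def ga_select(n_features, fitness, pop_size=20, generations=30,
              mutation_rate=0.05, seed=0):
    """Minimise fitness(mask) over 0/1 feature masks with a simple GA."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_features))
           for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for _ in range(generations):
        def pick():
            # Binary tournament: the fitter of two random individuals.
            a, b = rng.choice(pop), rng.choice(pop)
            return a if fitness(a) <= fitness(b) else b

        children = [best]  # elitism: the best mask always survives
        while len(children) < pop_size:
            mum, dad = pick(), pick()
            # Uniform crossover followed by bit-flip mutation.
            child = tuple(rng.choice(pair) for pair in zip(mum, dad))
            child = tuple(1 - g if rng.random() < mutation_rate else g
                          for g in child)
            children.append(child)
        pop = children
        best = min(pop, key=fitness)
    return best, fitness(best)


# Toy stand-in fitness (NOT the SVR RMSE used in this work): the 0-based
# positions of B-aff, B-Tm, A-Tb, CT and AH in Table 1 are treated as
# informative, and every other selected feature is penalised.
INFORMATIVE = {12, 14, 15, 21, 22}


def toy_score(mask):
    chosen = {i for i, g in enumerate(mask) if g}
    return 2.0 * len(INFORMATIVE - chosen) + 1.0 * len(chosen - INFORMATIVE)


mask, score = ga_select(24, toy_score)
```

In a real run, `fitness` would train an SVR model on the columns selected by `mask` and return its RMSE; because of elitism, the best score can only stay the same or improve from one generation to the next.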


Figure 3 The RMSE versus generation of evolution related to subset of features

Table 2 The SSA and selected features of 40 training samples of perovskites

No.  Molecular formula          SSA (m² g⁻¹)  B-aff (J/mol)  B-Tm (℃)  A-Tb (℃)   CT (℃)  AH (h)
1    ZnTiO3                     1.05          7.6            1670      907        900     2
2    LaFeO3                     1.08          15.7           1538      3464       900     4
3    BiFeO3                     0.7514        15.7           1538      1564       900     4
4    BiTi0.15Fe0.85O3           0.9507        14.485         1557.8    1564       900     4
5    LaCoO3                     17            63.8           1495      3464       750     4
6    LaCo0.94Mg0.06O3           19            57.632         1444.3    3464       750     4
7    LaCo0.90Mg0.10O3           21            53.52          1410.5    3464       750     4
8    LaCo0.80Mg0.20O3           22            43.24          1326      3464       750     4
9    La0.5Bi0.2Ba0.2Mn0.1FeO3   27.75         15.7           1538      2626.7     500     4
10   La0.5Bi0.2Ba0.2Mn0.1FeO3   12.46         15.7           1538      2626.7     700     4
11   La0.5Bi0.2Ba0.2Mn0.1FeO3   5.91          15.7           1538      2626.7     800     4
12   LaFeO3                     11.39         15.7           1538      3464       600     5
13   LaMg0.2Fe0.8O3             15.07         4.76           1360.4    3464       600     5
14   LaMg0.6Fe0.4O3             24.41         -17.12         1005.2    3464       600     5
15   LaMg0.8Fe0.2O3             13.32         -28.06         827.6     3464       600     5
16   LaMgO3                     10.17         -39            650       3464       600     5
17   LaCrO3                     3.95          64.3           1907      3464       600     5
18   LaMg0.2Cr0.8O3             8.42          43.64          1655.6    3464       600     5
19   LaMg0.6Cr0.4O3             18.41         2.32           1152.8    3464       600     5
20   PrFeO3                     10.88         15.7           1538      3520       700     5
21   LaFe0.9Co0.1O3             51.2          20.51          1533.7    3464       750     10
22   LaFe0.1Co0.9O3             42.8          58.99          1499.3    3464       750     10
23   LaFeO3                     8.5           15.7           1538      3464       700     3
24   SrTiO3                     16.4          7.6            1670      1382       650     10
25   La0.002Sr0.998TiO3         19.7          7.6            1670      1386.164   650     10
26   La0.005Sr0.995TiO3         22.3          7.6            1670      1392.41    650     10
27   La0.01Sr0.99TiO3           24.1          7.6            1670      1402.82    650     10
28   La0.02Sr0.98TiO3           23.2          7.6            1670      1423.64    650     10
29   LaFeO3                     9.5           15.7           1538      3464       700     4
30   La0.5Bi0.2Ba0.2Mn0.1FeO3   20.04         15.7           1538      3464       700     2
31   La0.5Bi0.2Ba0.2Mn0.1FeO3   8.5           15.7           1538      3464       800     2
32   La0.5Bi0.2Ba0.2Mn0.1FeO3   5.8           15.7           1538      3464       900     2
33   LaNiO3                     14.1          111.5          1455      3464       600     2
34   LaNiO3                     12.7          111.5          1455      3464       700     2
35   LaNiO3                     6.5           111.5          1455      3464       900     2
36   LaNiO3                     15.1          111.5          1455      3464       600     4
37   LaNiO3                     12.2          111.5          1455      3464       600     6
38   LaFeO3                     21.9          15.7           1538      3464       500     4
39   LaFeO3                     5.24          15.7           1538      3464       800     4
40   LaFeO3                     1.09          15.7           1538      3464       1000    4

Table 3 The SSA and selected features of 10 testing data samples of perovskites

No.  Molecular formula          SSA (m² g⁻¹)  B-aff (J/mol)  B-Tm (℃)  A-Tb (℃)   CT (℃)  AH (h)
1    La0.5Bi0.2Ba0.2Mn0.1FeO3   20.63         15.7           1538      2626.7     600     4
2    La0.5Bi0.2Ba0.2Mn0.1FeO3   4.19          15.7           1538      2626.7     900     4
3    LaMg0.4Fe0.6O3             17.63         -6.18          1182.8    3464       600     5
4    LaMg0.4Cr0.6O3             29.71         22.98          1404.2    3464       600     5
5    LaMg0.8Cr0.2O3             14.46         -18.34         901.4     3464       600     5
6    La0.5Bi0.2Ba0.2Mn0.1FeO3   25.8          15.7           1538      3464       500     2
7    La0.5Bi0.2Ba0.2Mn0.1FeO3   22.55         15.7           1538      3464       600     2
8    LaNiO3                     11.8          111.5          1455      3464       800     2
9    LaFeO3                     15.37         15.7           1538      3464       600     4
10   LaFeO3                     10.07         15.7           1538      3464       700     4

Model Selection

To select an optimal regression model, leave-one-out cross-validation (LOOCV) was used to evaluate different machine learning algorithms in terms of the correlation coefficient R. In this work, three machine learning algorithms, partial least squares (PLS),35 artificial neural networks (ANN)37 and support vector regression (SVR),38 were used to construct models for predicting the SSA of perovskites. Table 4 lists the correlation coefficients and RMSE of the perovskite SSA in LOOCV using PLS, ANN and SVR, respectively.

Table 4 The correlation coefficients and RMSE of perovskite SSA in LOOCV of the PLS, ANN and SVR approaches

Methods                                 PLS     ANN     SVR
Correlation coefficient of LOOCV (R)    0.542   0.762   0.892
RMSE of LOOCV                           8.995   7.374   4.809

Hyper-parameter optimization

Model selection showed that the SVR model was the best one, with the highest correlation coefficient and the lowest RMSE. To further optimize the regression model for the best generalization ability, the ε of the ε-insensitive loss function, the capacity parameter C and the kernel function were optimized by grid search, evaluating the LOOCV results of the SVR models. The lowest RMSE, 3.745, was found when C, ε and the gamma of the radial basis function were 73, 0.03 and 0.9, respectively (see Figure 4).

Figure 4 RMSE of LOOCV versus ε and C
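The grid search scored by LOOCV RMSE can be sketched as follows. To stay within the standard library, this example tunes only the RBF width `gamma` of a kernel-weighted average, which is a stand-in and not SVR; with an SVR implementation the same loop would run over (C, ε, gamma) triples. The data are the five LaNiO3 training rows of Table 2, and the input scaling and the grid values are assumptions made for the illustration.

```python
import math

# (CT in deg C, AH in h) -> SSA for the five LaNiO3 training rows of Table 2.
X = [(600, 2), (700, 2), (900, 2), (600, 4), (600, 6)]
y = [14.1, 12.7, 6.5, 15.1, 12.2]


def scale(x):
    # Assumed scaling so that both inputs vary over a similar range.
    return (x[0] / 100.0, x[1] / 2.0)


def predict(train_X, train_y, x, gamma):
    """RBF-weighted average of the training targets (stand-in for SVR)."""
    q = scale(x)
    weights = [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(scale(t), q)))
               for t in train_X]
    total = sum(weights)
    if total == 0.0:  # every neighbour too far away: fall back to the mean
        return sum(train_y) / len(train_y)
    return sum(w * t for w, t in zip(weights, train_y)) / total


def loocv_rmse(gamma):
    """Leave each sample out once, predict it from the rest, pool the errors."""
    sq_err = 0.0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        sq_err += (predict(rest_X, rest_y, X[i], gamma) - y[i]) ** 2
    return math.sqrt(sq_err / len(X))


grid = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0]
scores = {g: loocv_rmse(g) for g in grid}
best_gamma = min(scores, key=scores.get)
```

Each grid point costs one full leave-one-out pass, which is affordable for a dataset of this size.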

Model validation

To check the robustness of the model after hyper-parameter optimization, both LOOCV and 5-fold cross-validation of the training dataset were carried out to evaluate the performance of the SVR model obtained. Figure 5 (a, b) plots the predicted values versus the experimental values of the SSA of ABO3 perovskites based on LOOCV and 5-fold cross-validation of the training dataset, respectively. The results of the two schemes differ little, with correlation coefficients of 0.935 and 0.933, respectively.

Figure 5 (a, b) Experimental SSA versus predicted SSA based on the LOOCV (a) and 5-fold cross-validation (b) of training dataset

To further test the prediction ability of the model obtained, Figure 6 plots the experimental and predicted values of the SSA of ABO3 perovskites from the SVR model for the training and test datasets. Two other random testing sets of 10 samples gave similar prediction results, which can be found in the Supporting Information. The RMSE and the mean relative error (MRE) for the testing dataset are 1.794 and 25.20%, respectively. The MRE is defined as follows:

$$\mathrm{MRE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|p_i - e_i|}{e_i} \times 100\%$$

where e_i and p_i are the experimental and predicted values of sample i, and n is the number of samples.

Figure 6 Experimental SSA versus predicted SSA of perovskite sample by SVR
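The MRE definition can be checked with a few lines of code. The sketch below implements the formula and applies it to the two SSA values reported for LaMgO3 in the discussion of the test-set MRE, taking each value in turn as the reference; the function name `mre` is chosen here for illustration.

```python
def mre(predicted, experimental):
    """Mean relative error in percent: mean of |p_i - e_i| / e_i * 100."""
    if len(predicted) != len(experimental):
        raise ValueError("inputs must have the same length")
    terms = [abs(p - e) / e for p, e in zip(predicted, experimental)]
    return 100.0 * sum(terms) / len(terms)


# The two reported SSA values for LaMgO3 (m2 g-1), each taken in turn as the
# reference value.
low, high = 7.13, 10.17
spread_vs_high = mre([low], [high])  # about 29.89 %
spread_vs_low = mre([high], [low])   # about 42.64 %
```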

The mean relative error (MRE) for the testing dataset seems rather large. This can be explained by the many factors, such as synthetic method and calcination temperature, that affect the SSA of perovskites.39 For example, the same material, LaMgO3, was reported in two articles,27,28 with SSAs of 7.13 m² g⁻¹ and 10.17 m² g⁻¹, respectively. The relative deviation between these two experimental results is therefore between 29.89% and 42.64%, which is larger than our model's MRE.

Model application

Virtual Screening

To design new ABO3-type perovskite materials with higher SSA, the model for predicting the SSA of ABO3-type perovskites was integrated into OCPMDM, which can be used to screen out ABO3-type perovskite materials with targeted properties among innumerable candidates.17 The perovskite crystals in our dataset, including the candidate new ABO3-type compounds, are all cubic. The candidates to be screened on OCPMDM were designed as follows:

(1) The A site is La, not doped with other metal ions.
(2) The B site is Fe doped with Mg or Co, up to 100% in steps of 1.0%.
(3) The calcination temperature ranges from 500 ℃ to 1000 ℃ in steps of 100 ℃.
(4) The calcination time ranges from 2 h to 10 h in steps of 1 h.

Table 5 lists the five virtual samples of ABO3-type perovskites with higher SSA screened out among 2000 candidates by using the model. The highest SSA among the virtual samples was 58.09 m² g⁻¹, exceeding the highest SSA (51.2 m² g⁻¹) in the training dataset. The model is therefore helpful for experimental researchers exploring new ABO3-type perovskites with higher SSA. All the candidate new ABO3-type compounds could be cubic, because their tolerance factors (IF) lie between 0.85 and 0.90, as shown in Table 5.

Table 5 The virtual samples of ABO3-type perovskites with higher SSA screened out by using the model

Molecular formula   SSA (m² g⁻¹)  CT (℃)  AH (h)  IF
LaFe0.8Mg0.2O3      57.70         900     10      0.8668
LaFe0.7Mg0.3O3      58.09         900     10      0.8594
LaFe0.9Co0.1O3      54.81         900     10      0.8821
LaFe0.8Co0.2O3      54.82         900     10      0.8823
LaFe0.7Co0.3O3      52.03         800     10      0.8826
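The screening procedure, enumerating candidates on a grid and ranking them by predicted SSA, can be sketched as below. This is only an illustration: `placeholder_model` is a made-up scoring function and not the fitted SVR model, the function names are hypothetical, and the doping step is set to 10% rather than 1% to keep the example small. The exact enumeration behind the 2000 candidates is not fully specified in the text, so the candidate count here is not meant to reproduce it.

```python
from itertools import product


def candidates(step_percent=10):
    """Yield (formula, dopant, x, CT, AH) for LaFe(1-x)M(x)O3, M in {Mg, Co}."""
    fractions = [p / 100 for p in range(step_percent, 101, step_percent)]
    for dopant, x, ct, ah in product(("Mg", "Co"), fractions,
                                     range(500, 1001, 100), range(2, 11)):
        if x < 1:
            formula = f"LaFe{1 - x:.2f}{dopant}{x:.2f}O3"
        else:
            formula = f"La{dopant}O3"
        yield formula, dopant, x, ct, ah


def top_k(predict, k=5, step_percent=10):
    """Rank candidates by a user-supplied predict(dopant, x, ct, ah) -> SSA."""
    scored = [(predict(d, x, ct, ah), f, ct, ah)
              for f, d, x, ct, ah in candidates(step_percent)]
    return sorted(scored, reverse=True)[:k]


def placeholder_model(dopant, x, ct, ah):
    # Placeholder only: NOT the fitted SVR model, just something to rank by.
    return ah - abs(ct - 900) / 100 - abs(x - 0.3)


n_candidates = sum(1 for _ in candidates())
best = top_k(placeholder_model)
```

In practice `predict` would first convert the composition into the three atomic features and then call the trained model with those values plus CT and AH.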

Online Prediction

To help experimental scientists use the SVR model in designing new ABO3-type perovskites with higher SSA, an online web server was developed to predict the SSA of ABO3-type perovskites based on the SVR model constructed. When applying the model, the user needs to input only the two technical parameters, while the three atomic parameters are filled in automatically via OCPMDM. Figure 7 illustrates an example of online prediction of the SSA of an ABO3-type perovskite. After entering the formula of the new ABO3-type perovskite together with the calcination temperature and calcination time, the user clicks the 'predict' button to obtain the SSA of the new compound, which is very helpful for experimenters designing new ABO3-type perovskites with targeted SSA. The online web server that shares the model for predicting the SSA of ABO3-type perovskites is accessible at http://materialdata.shu.edu.cn/material_api/30dxpff8e49pdjza.

Figure 7 An example of online prediction for the SSA of ABO3-type perovskite

Model Explanation

Materials pattern recognition

In this work, the predictions of the SVR model can be explained by using materials pattern recognition methods such as the Fisher method.40 Figure 8 illustrates the materials pattern recognition of the different samples by the Fisher method. The samples with an SSA of more than 10 m² g⁻¹, together with the five virtual samples of ABO3-type perovskites, are distributed on the right side of the classification diagram shown in Figure 8.

Figure 8 Materials pattern recognition of different samples by using Fisher method, plotted as FIS(2) versus FIS(1). (○): samples with SSA less than 10 m² g⁻¹ (lower SSA); (□): samples with SSA more than 10 m² g⁻¹ (higher SSA); (△): 5 new virtual samples screened out via the SVR model

Sensitivity analysis

Sensitivity analysis has been applied in many fields of data mining.41 It can be used to examine how the target variable depends on one of the features while the other features are kept constant. Figure 9 (a-e) illustrates the sensitivity analysis of the selected features (B-aff, B-Tm, A-Tb, CT and AH), respectively.

Figure 9 Sensitivity analyses of SSA (m² g⁻¹) with respect to (a) electron affinity of B position (B-aff, J/mol), (b) melting point of B position (B-Tm, ℃), (c) normal boiling point of A position (A-Tb, ℃), (d) calcination temperature (CT, ℃) and (e) calcination time (AH, h)
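A one-feature sensitivity sweep of this kind can be sketched as follows: one feature is varied over a range while the others stay at a baseline, and the model is queried at each point. `placeholder_model` is a made-up response surface used only so that the example runs, not the fitted SVR model; the baseline is the LaFeO3 sample calcined at 600 ℃ for 4 h (sample 9 of Table 3).

```python
def sensitivity(predict, baseline, feature, values):
    """Predict the target while sweeping one feature and fixing the others."""
    curve = []
    for v in values:
        sample = dict(baseline)  # copy so the baseline is never modified
        sample[feature] = v
        curve.append((v, predict(sample)))
    return curve


def placeholder_model(s):
    # Placeholder response surface, NOT the fitted SVR model.
    return 0.02 * (1000 - s["CT"]) + 1.5 * s["AH"]


# Baseline: the five selected features of LaFeO3 calcined at 600 deg C for 4 h.
baseline = {"B-aff": 15.7, "B-Tm": 1538, "A-Tb": 3464, "CT": 600, "AH": 4}
ct_curve = sensitivity(placeholder_model, baseline, "CT", range(500, 1001, 100))
```

Plotting the second element of each pair against the first gives a curve of the kind shown in each panel of Figure 9.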

Conclusion

This work demonstrates how to screen or design new ABO3-type perovskites with higher SSA by using machine learning methods, including an SVR model. The SVR model predicted the specific surface area of perovskites quickly and easily, and it can be shared via an online web server. Materials data mining models combined with online web servers are therefore expected to accelerate materials design and optimization.

Acknowledgements

Financial support to this work from the National Key Research and Development Program of China (No. 2016YFB0700504) is gratefully acknowledged.

Supporting Information Available

The Supporting Information is available free of charge on the ACS publications website at DOI:

AUTHOR INFORMATION Corresponding Author *Phone:(086-021-66132406); Email: [email protected] Reference (1) Balachandran, P. V.; Xue, D. Z.; Theiler, J.; Hogden, J.; Lookman, T., Adaptive Strategies for Materials Design using Uncertainties. Sci. Rep. 2016, 6, 19660. (2) Esteva, A.; Kuprel, B.; Novoa, R. A., Dermatologist-Level Classification of Skin Cancer with Deep Neural Networks. Oncologie 2017, 19, 407-408. (3) Yousefi, S.; Amrollahi, F.; Amgad, M.; Dong, C.; Lewis, J. E.; Song, C.; Gutman, D. A.; Halani, S. H.; Vega, J. E. V.; Brat, D. J.; Cooper, L. A. D., Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models. Sci. Rep. 2017, 7, 11707.

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(4) Zhai, X. Y.; Chen, M. T.; Lu, W. C., Accelerated search for perovskite materials with higher Curie temperature based on the machine learning methods. Comput. Mater.Sci. 2018, 151, 41-48. (5) Ning, X.; Walters, M.; Karypis, G., Improved Machine Learning Models for Predicting Selective Compounds (vol 52, pg 38, 2012). J. Chem. Inf. Model. 2012, 52, 1411-1411. (6) Scott, D. J.; Manos, S.; Coveney, P. V., Design of electroceramic materials using artificial neural networks and multiobjective evolutionary algorithms. J. Chem. Inf. Model. 2008, 48, 262-273. (7) Lu, W.; Xiao, R.; Yang, J.; Li, H.; Zhang, W., Data mining-aided materials discovery and optimization. Journal of Materiomics 2017, 3, 191-201. (8) Lu, W. C.; Ji, X. B.; Li, M. J.; Liu, L.; Yue, B. H.; Zhang, L. M., Using support vector machine for materials design. Adv. Manuf.2013, 1, 151-159. (9) Xue, D. Z.; Balachandran, P. V.; Hogden, J.; Theiler, J.; Xue, D. Q.; Lookman, T., Accelerated search for materials with targeted properties by adaptive design. Nat. Commun.2016, 7,11241. (10) Hu, B.; Lu, K. L.; Zhang, Q.; Ji, X. B.; Lu, W. C., Data mining assisted materials design of layered double hydroxide with desired specific surface area. Comput. Mater. Sci.2017, 136, 29-35. (11) Ullmann, H.; Trofimenko, N.; Tietz, F.; Stover, D.; Ahmad-Khanlou, A., Correlation between thermal expansion and oxide ion transport in mixed conducting perovskite-type oxides for SOFC cathodes. SSIon 2000, 138, 79-90. (12) Lai, K.-Y.; Manthiram, A., Self-Regenerating Co-Fe Nanoparticles on Perovskite Oxides as a Hydrocarbon Fuel Oxidation Catalyst in Solid Oxide Fuel Cells. Chem. Mater. 2018, 30, 2515-2525. (13) Chen, H.; Yu, H.; Peng, F.; Yang, G.; Wang, H.; Yang, J.; Tang, Y., Autothermal reforming of ethanol for hydrogen production over perovskite LaNiO3. Chem. Eng. J. 2010, 160, 333-339. 
(14) Bajorowicz, B.; Nadolna, J.; Lisowski, W.; Klimczuk, T.; Zaleska-Medynska, A., The effects of bifunctional linker and reflux time on the surface properties and photocatalytic activity of CdTe quantum dots decorated KTaO3 composite photocatalysts. Appl. Catal. B-Environ. 2017, 203, 452-464.
(15) Scott, D. J.; Manos, S.; Coveney, P. V.; Rossiny, J. C. H.; Fearn, S.; Kilner, J. A.; Pullar, R. C.; Alford, N. M. N.; Axelsson, A. K.; Zhang, Y.; Chen, L.; Yang, S.; Evans, J. R. G.; Sebastian, M. T., Functional ceramic materials database: An online resource for materials research. J. Chem. Inf. Model. 2008, 48, 449-455.
(16) Zhang, Q.; Zhai, X. Y.; Xiong, P.; Kou, L.; Ji, X. B.; Lu, W. C., Prediction and synthesis of novel layered double hydroxide with desired basal spacing based on relevance vector machine. Mater. Res. Bull. 2017, 93, 123-129.
(17) Zhang, Q.; Chang, D.; Zhai, X.; Lu, W., OCPMDM: Online computation platform for materials data mining. Chemometrics Intellig. Lab. Syst. 2018, 177, 26-34.
(18) Li, S. D.; Jing, L. Q.; Fu, W.; Yang, L. B.; Xin, B. F.; Fu, H. G., Photoinduced charge property of nanosized perovskite-type LaFeO3 and its relationships with photocatalytic activity under visible irradiation. Mater. Res. Bull. 2007, 42, 203-212.
(19) Li, Y. Y.; Yao, S. S.; Wen, W.; Xue, L. H.; Yan, Y. W., Sol-gel combustion synthesis and visible-light-driven photocatalytic property of perovskite LaNiO3. J. Alloys Compd. 2010, 491, 560-564.
(20) Parida, K. M.; Reddy, K. H.; Martha, S.; Das, D. P.; Biswal, N., Fabrication of nanocrystalline LaFeO3: An efficient sol-gel auto-combustion assisted visible light responsive photocatalyst for water decomposition. Int. J. Hydrogen Energy 2010, 35, 12161-12168.
(21) Sun, H. H.; Yang, H. P.; Cui, S. Z.; Nie, K.; Wu, J. M., Simultaneous Mg-Modification Inside and Outside of LaCoO3 Lattice and Their Photocatalytic Properties. Chin. J. Inorg. Chem. 2016, 32, 1704-1712.
(22) Tijare, S. N.; Bakardjieva, S.; Subrt, J.; Joshi, M. V.; Rayalu, S. S.; Hishita, S.; Labhsetwar, N., Synthesis and visible light photocatalytic activity of nanocrystalline PrFeO3 perovskite for hydrogen generation in ethanol-water system. J. Chem. Sci. 2014, 126, 517-525.
(23) Tijare, S. N.; Joshi, M. V.; Padole, P. S.; Mangrulkar, P. A.; Rayalu, S. S.; Labhsetwar, N. K., Photocatalytic hydrogen generation through water splitting on nano-crystalline LaFeO3 perovskite. Int. J. Hydrogen Energy 2012, 37, 10451-10456.
(24) Li, H. Q.; Cui, Y. M.; Wu, X. C.; Hong, W. S.; Hua, L., Effect of La Contents on the Structure and Photocatalytic Activity of La-SrTiO3 Catalysts. Chin. J. Inorg. Chem. 2012, 28, 2597-2604.
(25) Hu, R. S.; Li, C.; Wang, X.; Sun, Y.; Jia, H. X.; Su, H. Q.; Zhang, Y. L., Photocatalytic activities of LaFeO3 and La2FeTiO6 in p-chlorophenol degradation under visible light. Catal. Commun. 2012, 29, 35-39.
(26) Tavakkoli, H.; Yazdanbakhsh, M., Fabrication of two perovskite-type oxide nanoparticles as the new adsorbents in efficient removal of a pesticide from aqueous solutions: Kinetic, thermodynamic, and adsorption studies. Microporous Mesoporous Mat. 2013, 176, 86-94.
(27) Josephine, B. A.; Manikandan, A.; Teresita, V. M.; Antony, S. A., Fundamental study of LaMg(x)Cr(1-x)O3-delta perovskites nano-photocatalysts: Sol-gel synthesis, characterization and humidity sensing. Korean J. Chem. Eng. 2016, 33, 1590-1598.
(28) Teresita, V. M.; Manikandan, A.; Josephine, B. A.; Sujatha, S.; Antony, S. A., Electromagnetic Properties and Humidity-Sensing Studies of Magnetically Recoverable LaMg(x)Fe(1-x)O3-delta Perovskites Nano-photocatalysts by Sol-Gel Route. J. Supercond. Nov. Magn. 2016, 29, 1691-1701.
(29) Abdulkadir, I.; Jonnalagadda, S. B.; Martincigh, B. S., Synthesis and effect of annealing temperature on the structural, magnetic and photocatalytic properties of (La0.5Bi0.2Ba0.2Mn0.1)FeO(3-delta). Mater. Chem. Phys. 2016, 178, 196-203.
(30) Orak, C.; Atalay, S.; Ersoz, G., Photocatalytic and photo-Fenton-like degradation of methylparaben on monolith-supported perovskite-type catalysts. Sep. Sci. Technol. 2017, 52, 1310-1320.
(31) Perween, S.; Ranjan, A., Improved visible-light photocatalytic activity in ZnTiO3 nanopowder prepared by sol-electrospinning. Sol. Energy Mater. Sol. Cells 2017, 163, 148-156.
(32) Speight, J., Lange's Handbook of Chemistry, Sixteenth Edition. McGraw-Hill Education: New York, 2005.
(33) Mercader, A. G.; Duchowicz, P. R., Enhanced replacement method integration with genetic algorithms populations in QSAR and QSPR theories. Chemometrics Intellig. Lab. Syst. 2015, 149, 117-122.
(34) Yang, S. S.; Lu, W. C.; Gu, T. H.; Yan, L. M.; Li, G. Z., QSPR Study of n-Octanol/Water Partition Coefficient of Some Aromatic Compounds Using Support Vector Regression. Mol. Inform. 2009, 28, 175-182.
(35) Liu, H. X.; Zhang, R. S.; Yao, X. J.; Liu, M. C.; Hu, Z. D.; Fan, B. T., Prediction of the isoelectric point of an amino acid based on GA-PLS and SVMs. J. Chem. Inf. Comput. Sci. 2004, 44, 161-167.
(36) Yang, X.; Li, M.; Su, Q.; Wu, M.; Gu, T.; Lu, W., QSAR studies on pyrrolidine amides derivatives as DPP-IV inhibitors for type 2 diabetes. Med. Chem. Res. 2013, 22, 5274-5283.
(37) Rossel, R. A. V.; Behrens, T., Using data mining to model and interpret soil diffuse reflectance spectra. Geoderma 2010, 158, 46-54.
(38) Niu, B.; Lu, W.-c.; Yang, S.-s.; Cai, Y.-d.; Li, G.-z., Support vector machine for SAR/QSAR of phenethyl-amines. Acta Pharmacol. Sin. 2007, 28, 1075-1086.
(39) Kuang, Q.; Yang, S. H., Template Synthesis of Single-Crystal-Like Porous SrTiO3 Nanocube Assemblies and Their Enhanced Photocatalytic Hydrogen Evolution. ACS Appl. Mater. Interfaces 2013, 5, 3683-3690.
(40) Yu, J., Nonlinear Bioprocess Monitoring Using Multiway Kernel Localized Fisher Discriminant Analysis. Ind. Eng. Chem. Res. 2011, 50, 3390-3402.
(41) Yun, W. Y.; Lu, Z. Z.; Jiang, X., An efficient sampling approach for variance-based sensitivity analysis based on the law of total variance in the successive intervals without overlapping. Mech. Syst. Signal Process. 2018, 106, 495-510.

Graphic for abstract

Figure 1. Structure of ABO3 perovskite.

Figure 2. Flowchart of the materials data mining in this work.

Figure 3. RMSE versus generation of evolution for the subset of features.

Figure 4. RMSE of LOOCV versus ε and C.

Figure 5. Experimental SSA versus predicted SSA based on (a) LOOCV and (b) 5-fold cross-validation of the training dataset.
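
The two validation schemes compared in Figure 5 can be illustrated with a minimal sketch. It is not the authors' code: the SVR model is replaced here by a simple one-variable least-squares line so that the example needs only the Python standard library, and the data, the function names `fit_line` and `cv_rmse`, and the random seeds are all hypothetical. It shows only how out-of-fold predictions are pooled into a single RMSE, with LOOCV as the special case where the number of folds equals the number of samples.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x (a stand-in for the SVR model)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def cv_rmse(xs, ys, k, seed=0):
    """RMSE over out-of-fold predictions for k-fold CV; k == len(xs) gives LOOCV."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all samples
    sq_err = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Each sample is predicted by a model that never saw it during fitting.
        sq_err += sum((ys[i] - (a + b * xs[i])) ** 2 for i in held_out)
    return math.sqrt(sq_err / len(xs))


# Hypothetical toy data (not from the paper): y is roughly 2x + 1 plus noise.
rng = random.Random(42)
x_demo = [float(i) for i in range(20)]
y_demo = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in x_demo]

rmse_loocv = cv_rmse(x_demo, y_demo, k=len(x_demo))
rmse_5fold = cv_rmse(x_demo, y_demo, k=5)
print(f"LOOCV RMSE:  {rmse_loocv:.3f}")
print(f"5-fold RMSE: {rmse_5fold:.3f}")
```

In a parity plot such as Figure 5, each point would pair an experimental value with its out-of-fold prediction from one of these loops.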

Figure 6. Experimental SSA versus SSA predicted by SVR for the perovskite samples.

Figure 7. An example of online prediction of the SSA of an ABO3-type perovskite.

Figure 8. Materials pattern recognition of different samples using the Fisher method.

Figure 9. Sensitivity analyses for (a) electron affinity of B position (B-aff), (b) melting point of B position (BTm), (c) normal boiling point of A position (A-Tb), (d) calcination temperature (CT), and (e) calcination time (AH).
