Rationalizing Perovskites Data for Machine Learning and Materials Design


Energy Conversion and Storage; Plasmonics and Optoelectronics

Rationalizing Perovskites Data for Machine Learning and Materials Design
Qichen Xu, Zhenzhu Li, Miao Liu, and Wan-Jian Yin
J. Phys. Chem. Lett., Just Accepted Manuscript • DOI: 10.1021/acs.jpclett.8b03232 • Publication Date (Web): 27 Nov 2018
Downloaded from http://pubs.acs.org on November 28, 2018


The Journal of Physical Chemistry Letters

Rationalizing Perovskites Data for Machine Learning and Materials Design

Qichen Xu,1,2 Zhenzhu Li,1,2 Miao Liu,3 Wan-Jian Yin1,2*

1 College of Energy, Soochow Institute for Energy and Materials InnovationS (SIEMIS), Soochow University, Suzhou 215006, China
2 Key Laboratory of Advanced Carbon Materials and Wearable Energy Technologies of Jiangsu Province, Soochow University, Suzhou 215006, China
3 Institute of Physics (IOP), Chinese Academy of Sciences (CAS), China

Email: [email protected]

Abstract
Machine learning has recently been used for novel perovskite design, owing to the availability of a large amount of perovskite formability data. Trustworthy results should be based on valid and reliable data that reveal the nature of the materials as fully as possible. In this study, a procedure has been developed to identify the formability of perovskites for all compounds with the stoichiometry ABX3 and (A′A′′)(B′B′′)X6 that exist in experiments and are stored in the Materials Project database. Our results enrich the perovskite formability data to a large extent and correct possible errors in previous data on ABO3 compounds. Furthermore, a machine learning approach with multiple models has identified the A2B′B′′O6 compounds that have suspicious formability results in the current experimental data; further experimental validation of these compounds is therefore called for. This work paves the way for cleaning perovskite formability data for reliable machine learning work in the future.


Perovskites exhibit novel properties such as superconductivity, ferroelectricity, opto-electricity, magnetoresistance and ionic conductivity, and therefore play significant roles in electronics, energy conversion, and catalysis, applications that largely depend on their great structural and compositional flexibility [1-3]. Searching for perovskite compounds that meet particular application requirements and boost the performance of specific functionalities is of fundamental interest to the materials science community. For example, halide perovskites have recently attracted worldwide interest in the solar cell field [4-7] owing to their superior properties [8], such as an extremely high optical absorption coefficient, a very long carrier diffusion length and low-temperature solution processability. However, their intrinsically poor long-term stability and the toxic Pb they contain have driven researchers to seek better compounds with improved chemical stability and environmentally friendly composition. Such materials discovery is likely to be accelerated by machine learning (ML) methods.

Recently, machine learning has been employed for novel materials design, including perovskites [10], owing to the vast experimental and computational perovskite data accumulated over the last fifty years [11-15]. For example, Pilania et al. used a dataset of 185 experimentally known ABX3 compounds, built a classification model and proposed 40 new perovskites [16]. Balachandran et al. trained models on a dataset of 390 experimentally known ABO3 compounds and predicted the possibility of 235 other ABO3 perovskites [17]. Bartel et al. proposed a new tolerance factor based on 576 experimentally known ABX3 compounds (369 oxides, 207 halides) [18]. Li et al. [19] trained an ML model on the thermodynamics of 1,929 oxide perovskites calculated in the Materials Project [20]. Xie et al. also used DFT-calculated data in the Materials Project and established crystal graph convolutional neural networks for predicting properties from crystal structures [21]. Lu et al. [22] identified six lead-free hybrid organic-inorganic perovskites with proper bandgaps for solar cells and room-temperature thermal stability from 5158 unexplored candidates.

Unlike conventional computational materials design, which derives materials properties from physical laws, e.g., by solving the Kohn-Sham equation, machine learning learns the hidden rules from a large data set and builds a model to make the corresponding predictions. The advantage of machine learning is its efficiency in comparison with conventional methods for materials design, which enables accelerated discovery of materials, including energy-related materials [23-25]. In principle, all the underlying physics is assumed to be contained in the existing training data, and the physical rules are learnt purely from those data. Therefore, reliable and valid data are the prerequisite for reliable machine learning work.

So far, the main data for perovskite formability can be found in a few publications [refs. 11-13] that summarize the existing experimental data, and in prevalent databases such as the Materials Project [20] and OQMD [26]. Owing to unavoidable human and measurement errors, those data require further preprocessing to make them clearer and better reflect the underlying facts for future machine learning work [10, 27]. For perovskite formability, the following issues should be addressed before utilizing those data: (i) For a particular ABX3 compound, previous data label the compound as either YES or NO for perovskite formation, based on a single experiment. In fact, the compound may or may not form a perovskite depending on the experimental environment, which is also reflected by the different ICSD


numbers assigned to one compound in the Inorganic Crystal Structure Database (ICSD). A reliable database should include all of this formability information. (ii) In the ideal perovskite ABX3, B sits at the center of a BX6 octahedron with six-fold coordination and A has twelve-fold coordination. For a given chemical formula M1M2X3 (M1 and M2 are cations), it is crucial to identify which of M1 and M2 is A and which is B. However, this may not be straightforward, particularly for some double perovskites whose chemical formulas are written, for example, as BaHo(CoO3)2 (Co is at the B site) and TmSb(PbO3)2 (Tm/Sb is at the B site) in the Materials Project database. (iii) For some perovskites ABX3, even when A and B are known, it is still not straightforward to provide proper features such as ionic radii, which are the most widely used features in machine learning studies of perovskite formability, because A and B may adopt different combinations of charge states that satisfy charge balance. For example, TbMnO3 can be considered as either Tb3+Mn3+O3 or Tb4+Mn2+O3, and Tb3+ (Mn3+) and Tb4+ (Mn2+) have different radii. (iv) Previous data are mainly for single perovskites ABX3; data for double perovskites are still scarce.

To address the aforementioned issues, we developed a procedure, shown in Figure 1, to identify the formability of perovskites for all the ABX3 and (A′A′′)(B′B′′)X6 compounds (X = O, S, Se, Te, F, Cl, Br, I) in the Materials Project [20]. Since double-A-site perovskites are far fewer than double-B-site compounds in the database, we refer to double-B-site A2B′B′′X6 as double perovskites in this study. Only crystal structures with ICSD numbers [28] are selected, since the database also contains hypothetical structures, which may not reflect experimental reality. The crystal structure data in the Materials Project are organized by MP ID, and a single MP ID may fold in several similar ICSD structures.
It is also possible that different MP IDs indicate the same crystal structure, since the data were updated for different purposes at different times. Therefore, we cleaned up the data set, and the concise data set contains 590 ABX3 compounds and 538 A2B′B′′X6 compounds in total, with the details shown in Table 1. Our data expand previous experimental perovskite data to a large extent [11-13]. A comprehensive analysis of perovskite formability against correlated features, including atomic number, ionic radius, electronegativity, tolerance factor and octahedral factor, is performed to provide a human-readable visualization of the data, which may provide insight for feature selection in machine learning. Moreover, the procedure has identified eleven ABO3 compounds whose formability differs from previous reports. A machine learning approach with multiple models has been used to identify suspicious data for the perovskite formability of A2B′B′′O6 compounds. Excluding those suspicious data, machine learning can achieve a prediction accuracy as high as 96.3%.

To identify whether a crystal structure is a perovskite, it is crucial to define what a perovskite structure should be. The ideal perovskite has the general formula ABX3 and the cubic structure with


a high symmetry of Pm-3m, comprising highly flexible corner-sharing [BX6] octahedra. A typical feature of perovskites is their structural flexibility due to the distortion of the [BX6] octahedra, leading to many tilted perovskite phases. In the 1970s, Glazer developed a method to describe the octahedral tilts in perovskites, i.e., by combining rotations about the three orthogonal symmetry axes of the octahedra [29]. Group theory analysis based on the Glazer notation led to fifteen unique tilt patterns, labeled by fifteen space groups, as revisited recently [30, 31]. The perovskite data prevalently adopted in recent machine learning work originate from a few publications [refs. 11-13]. For example, the criteria in the work of Zhang et al. for classifying a compound as an ABO3-type perovskite are that it adopts one of the fifteen space groups, that the coordination number of the A cation is in the range 8–12 and that the coordination number of the B cation is 6 [13]. However, identifying perovskites via space groups may lead to errors, since the space group is not a unique description of a crystal structure. For example, both γ-phase CsPbI3 (perovskite, photoactive) and δ-phase CsPbI3 (non-perovskite, non-photoactive) belong to the same space group, Pnma [32]. The situation becomes more complicated for double perovskites A2B′B′′X6, since different cation orderings of (B′, B′′) can lead to multiple space groups. Previous theoretical calculations [33] revealed that lattice distortion does not significantly affect the electronic and optical properties of halide perovskites, but that breaking the B-X bonding network with face-shared or edge-shared octahedra can destroy the unique properties of perovskites. Therefore, we identify perovskites based on (i) cation coordination: one type of cation has octahedral coordination; and (ii) topology: all the octahedra are corner-shared.
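Criterion (ii) can be illustrated by counting the anions that two neighboring octahedra have in common: one shared anion corresponds to corner sharing, two to edge sharing and three to face sharing. The following is a minimal Python sketch under that assumption. The function names and the toy anion labels are hypothetical and are not taken from the authors' implementation.

```python
from itertools import combinations


def sharing_type(octa_a, octa_b):
    """Classify how two octahedra connect from the number of anions they share."""
    shared = len(set(octa_a) & set(octa_b))
    return {0: "none", 1: "corner", 2: "edge", 3: "face"}.get(shared, "invalid")


def is_corner_shared_network(octahedra):
    """True if every connected pair of octahedra shares exactly one anion."""
    kinds = {sharing_type(a, b) for a, b in combinations(octahedra, 2)}
    kinds.discard("none")  # unconnected pairs carry no topological information
    return kinds == {"corner"}


# Toy example: each octahedron is the set of its six anion labels (hypothetical).
corner_pair = [
    {"X1", "X2", "X3", "X4", "X5", "X6"},
    {"X6", "X7", "X8", "X9", "X10", "X11"},
]
face_pair = [
    {"X1", "X2", "X3", "X4", "X5", "X6"},
    {"X4", "X5", "X6", "X7", "X8", "X9"},
]

print(is_corner_shared_network(corner_pair))  # True
print(is_corner_shared_network(face_pair))    # False
```

In a real structure the anion sets would come from a neighbor analysis of the B-centered octahedra, including periodic images, which this sketch does not attempt.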
Figure 1 summarizes the workflow of the procedure to identify perovskite compounds, which starts by collecting the crystal structures of all the ABX3 and (A′A′′)(B′B′′)X6 compounds (with ICSD numbers) in the Materials Project database. In the ideal perovskite, A and B have twelve-fold and six-fold coordination, respectively. In tilted perovskites, however, the coordination number of A can be greatly reduced. We therefore assume that the octahedral coordination of B is the critical sign of the perovskite phase. For a given compound with general formula M1M2X3, the coordination of the two types of cations (M1 and M2) is calculated first. If neither M1 nor M2 has octahedral coordination, the compound is not a perovskite. If only one type of cation (say M1) has octahedral coordination, then M1 is B and M2 is A. If both M1 and M2 have octahedral coordination, the one with the larger mean bond length is chosen as A. After A and B are identified, the stacking topology of the B-centered octahedra is analyzed to judge whether the octahedra are corner-shared, edge-shared or face-shared. Only corner-shared cases are classified as perovskites. The derived database of perovskite formability is given in Table S1.

With the data available, what machine learning does is identify the hidden correlation between the features, i.e., ionic radius, electronegativity, atomic number, tolerance factor and octahedral factor, and the property, i.e., the formability of perovskite. Before machine learning, such relationships can be partially observed by human analysis, which may help in choosing proper features. Here, we take ABO3 compounds as examples to show the correlation of perovskite formability with atomic number, ionic radius, electronegativity, tolerance factor and octahedral factor, as shown in Figure 2. Aggregation of perovskites in a small region is observed in all of these two-dimensional plots. From Figure 2(a), we can see that a


large number of perovskites appear in the region [57 < ZA < 71, 21 < ZB < 30], where A is a rare earth element and B is a 3d transition metal element. Compounds located in this region have a ~98.7% probability of being perovskites. The ease of perovskite formation in this region can be ascribed to the small variance in ionic radius of both the rare earth elements (1.75 Å to 1.94 Å) and the 3d transition metal elements (1.22 Å to 1.32 Å). It should be noted that this small (ZA, ZB) region (2.3% area ratio), where A is a rare earth element and B is a 3d transition metal element, includes 19.6% of the existing perovskites. This imbalance could bias machine learning. In the ZA-ZB map, apart from the aggregation region shown in Figure 2(a) and (b), many other perovskites (70.4%) are scattered outside the region, indicating that Z is not a good feature for perovskite formability. Instead, a much improved aggregation effect is observed in the two-dimensional χA-χB, rA-rB and t-μ maps, as shown in Figure 2(b-d). For example, in the χA-χB map, compounds in the rectangular region [0.80 < χA < 1.28, 1.10 < χB < 2.00] have an 86.4% probability of being perovskites, and this region includes 68.5% of the perovskites within a 9.0% area ratio. The aggregation regions in the rA-rB and t-μ maps are shown in Figure 2(c-d) and Table 2, indicating that rA, rB, t and μ can be good features, which is consistent with previous literature [11, 15, 34] and recent machine learning work [16, 17].

Previous work treated the double perovskite A2B′B′′X6 as a single perovskite ABX3 by taking the effective A (B) radius as the average of A′ (B′) and A′′ (B′′), i.e. (A′+A′′)/2 [(B′+B′′)/2] [34, 35]. This approximation ignores the chemical difference between A′ (B′) and A′′ (B′′), which was underexplored in previous work. The ZB′-ZB′′, χB′-χB′′ and rB′-rB′′ maps of perovskite formability are shown in Figure 3, which show the imbalance between B′ and B′′ and the contrast between the perovskite and non-perovskite regions.
For example, the formability region for perovskites is observed around [1.80 Å < rB′ < 2.40 Å, 0.80 Å < rB′′ < 1.40 Å], separated from the non-perovskite region of [1.50 Å < rB′ < 2.00 Å, 0.70 Å < rB′′ < 1.20 Å]. Such a radius imbalance between B′ and B′′ cannot be captured by the average (rB′+rB′′)/2. For example, according to (rB′+rB′′)/2, the points Q and P in Figure 3(c) should have the same probability of being perovskites, which is not the case according to the existing experimental data: the Q point has a much higher probability of perovskite formability.

To show the improvement of our results over previous data, we compare our results for ABO3 compounds with previous literature [17]. Eleven compounds are found to have contradictory results, which are listed in Table 3. Their crystal structures, shown in Figure 4, clearly support the validity of our data. For example, (K/Rb/Cs/Tl)BrO3 do exhibit the typical perovskite structure with corner-sharing [BrO6] octahedra, as shown in Figure 4(a) and (b), with Br displaced from the center of the [BrO6] octahedra. Li(Re/Ta)O3 and CdCO3 have the same structure, with a relatively large distortion of the [Re/Ta/CdO6] octahedra, as shown in Figure 4(c) and 4(d). BaRuO3 and SrIrO3 are not perovskites since they have face-shared octahedra.

Although our results are derived from experimental crystal structures in the Materials Project and ICSD, the validity of the data cannot be fully guaranteed, since there are unavoidable human and measurement errors in ICSD. Here, we take A2B′B′′O6 as an example and use a multiple-model machine learning approach to trace the prediction results for each compound. Sixteen different machine learning models, as implemented in Matlab, are chosen: (1) fine tree, (2) medium tree, (3) coarse tree, (4) linear discriminant, (5) logistic regression, (6) linear SVM,


(7) cubic SVM, (8) quadratic SVM, (9) fine Gaussian SVM, (10) medium Gaussian SVM, (11) weighted KNN, (12) boosted tree, (13) bagged tree, (14) subspace KNN, (15) subspace discriminant and (16) RUS boosted trees. Using our perovskite formability data for A2B′B′′O6, machine learning based on the features shown in Figure 2 achieves an average cross-validation (CV) accuracy of about 90% for all sixteen models [Table S2], indicating that our data are robust with respect to the specific model. The prediction results for each compound by the different models have been recorded. Interestingly, the inconsistent predictions of the different machine learning models are concentrated on a few compounds. Figure 5 lists the compounds with the number of inconsistent predictions among the sixteen models. For example, Ba2NbCrO6 is not a perovskite in our database; however, it is predicted to be a perovskite by thirteen of the sixteen machine learning models. If those thirteen suspicious results are excluded from the training data, the prediction accuracy increases by ~2% on average [Table S2]. Further experimental verification of the compounds in Figure 5 is called for.

In summary, we have developed a procedure to identify the perovskite formability of ABX3 and A2B′B′′X6 compounds using the crystal structures stored in the Materials Project database. Our criterion for a perovskite is the topology of corner-sharing octahedra. Our results extend previous perovskite data to a large extent. The current results have corrected the mistakes for eleven compounds in previous data on ABO3. Machine learning based on the current data achieves ~90% prediction accuracy for perovskite formability.

Supporting Information
The database of perovskite formability [Table S1], the prediction accuracies based on the multiple-model machine learning approach for A2B′B′′O6 compounds [Table S2] and the details of the multiple-model machine learning approach.

AUTHOR INFORMATION
Corresponding author: Wan-Jian Yin, Email: [email protected], Tel: +86-0512-67167457
Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENT
The authors acknowledge funding support from the National Natural Science Foundation of China (Grant Nos. 51602211 and 11674237), the National Key Research and Development Program of China (Grant No. 2016YFB0700700), the Natural Science Foundation of Jiangsu Province of China (Grant No. BK20160299), the National Young Talent 1000 Program, the Jiangsu ‘Double Talent’ Program and the Suzhou Key Laboratory for Advanced Carbon Materials and Wearable


Energy Technologies, China. The work was carried out at National Supercomputer Center in Tianjin and the calculations were performed on TianHe-1(A).

References
(1) Yin, W.-J.; Weng, B.; Ge, J.; Sun, Q.; Li, Z.; Yan, Y. Oxide Perovskites, Double Perovskites and Derivatives for Electrocatalysis, Photocatalysis, and Photovoltaics. Energy & Environmental Science 2018.
(2) Pena, M. A.; Fierro, J. L. G. Chemical Structures and Performance of Perovskite Oxides. Chemical Reviews 2001, 101 (7), 1981–2018.
(3) Green, M. A.; Ho-Baillie, A.; Snaith, H. J. The Emergence of Perovskite Solar Cells. Nature Photonics 2014, 8 (7), 506–514.
(4) Kojima, A.; Teshima, K.; Shirai, Y.; Miyasaka, T. Organometal Halide Perovskites as Visible-Light Sensitizers for Photovoltaic Cells. Journal of the American Chemical Society 2009, 131 (17), 6050–6051.
(5) Kim, H.-S.; Lee, C.-R.; Im, J.-H.; Lee, K.-B.; Moehl, T.; Marchioro, A.; Moon, S.-J.; Humphry-Baker, R.; Yum, J.-H.; Moser, J. E. Lead Iodide Perovskite Sensitized All-Solid-State Submicron Thin Film Mesoscopic Solar Cell with Efficiency Exceeding 9%. Scientific Reports 2012, 2, 591.
(6) Lee, M. M.; Teuscher, J.; Miyasaka, T.; Murakami, T. N.; Snaith, H. J. Efficient Hybrid Solar Cells Based on Meso-Superstructured Organometal Halide Perovskites. Science 2012, 1228604.
(7) Burschka, J.; Pellet, N.; Moon, S.-J.; Humphry-Baker, R.; Gao, P.; Nazeeruddin, M. K.; Grätzel, M. Sequential Deposition as a Route to High-Performance Perovskite-Sensitized Solar Cells. Nature 2013, 499 (7458), 316.
(8) Yin, W.-J.; Yang, J.-H.; Kang, J.; Yan, Y.; Wei, S.-H. Halide Perovskite Materials for Solar Cells: A Theoretical Review. Journal of Materials Chemistry A 2015, 3 (17), 8926–8942.
(9) Liu, Y.; Zhao, T.; Ju, W.; Shi, S. Materials Discovery and Design Using Machine Learning. Journal of Materiomics 2017, 3 (3), 159–177.
(10) Butler, K. T.; Davies, D. W.; Cartwright, H.; Isayev, O.; Walsh, A. Machine Learning for Molecular and Materials Science. Nature 2018, 559 (7715), 547.
(11) Li, C.; Lu, X.; Ding, W.; Feng, L.; Gao, Y.; Guo, Z. Formability of ABX3 (X = F, Cl, Br, I) Halide Perovskites. Acta Crystallographica Section B: Structural Science 2008, 64 (6), 702–707.
(12) Roth, R. S. Classification of Perovskite and Other ABO3-Type Compounds. J. Res. Nat. Bur. Stand 1957, 58 (2), 75–88.


(13) Zhang, H.; Li, N.; Li, K.; Xue, D. Structural Stability and Formability of ABO3-Type Perovskite Compounds. Acta Crystallographica Section B: Structural Science 2007, 63 (6), 812–818.
(14) Filip, M. R.; Giustino, F. The Geometric Blueprint of Perovskites. Proceedings of the National Academy of Sciences 2018, 115 (21), 5397–5402.
(15) Travis, W.; Glover, E. N. K.; Bronstein, H.; Scanlon, D. O.; Palgrave, R. G. On the Application of the Tolerance Factor to Inorganic and Hybrid Halide Perovskites: A Revised System. Chemical Science 2016, 7 (7), 4548–4556.
(16) Pilania, G.; Balachandran, P. V.; Kim, C.; Lookman, T. Finding New Perovskite Halides via Machine Learning. Frontiers in Materials 2016, 3, 19.
(17) Balachandran, P. V.; Emery, A. A.; Gubernatis, J. E.; Lookman, T.; Wolverton, C.; Zunger, A. Predictions of New ABO3 Perovskite Compounds by Combining Machine Learning and Density Functional Theory. Physical Review Materials 2018, 2 (4), 043802.
(18) Bartel, C. J.; Sutton, C.; Goldsmith, B.; Ouyang, R.; Musgrave, C.; Ghiringhelli, L.; Scheffler, M. New Tolerance Factor to Predict the Stability of Perovskite Oxides and Halides. https://arxiv.org/pdf/1801.07700.
(19) Li, W.; Jacobs, R.; Morgan, D. Predicting the Thermodynamic Stability of Perovskite Oxides Using Machine Learning Models. Computational Materials Science 2018, 150, 454–463.
(20) Jain, A.; Ong, S. P.; Hautier, G.; Chen, W.; Richards, W. D.; Dacek, S.; Cholia, S.; Gunter, D.; Skinner, D.; Ceder, G. Commentary: The Materials Project: A Materials Genome Approach to Accelerating Materials Innovation. APL Materials 2013, 1 (1), 011002.
(21) Xie, T.; Grossman, J. C. Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties. Physical Review Letters 2018, 120 (14), 145301.
(22) Lu, S.; Zhou, Q.; Ouyang, Y.; Guo, Y.; Li, Q.; Wang, J. Accelerated Discovery of Stable Lead-Free Hybrid Organic-Inorganic Perovskites via Machine Learning. Nature Communications 2018, 9 (1), 3405.
(23) Shi, S.; Gao, J.; Liu, Y.; Zhao, Y.; Wu, Q.; Ju, W.; Ouyang, C.; Xiao, R. Multi-Scale Computation Methods: Their Applications in Lithium-Ion Battery Research and Development. Chinese Phys. B 2016, 25 (1), 018212.
(24) Liu, Y.; Zhao, T.; Yang, G.; Ju, W.; Shi, S. The Onset Temperature (Tg) of AsxSe1−x Glasses Transition Prediction: A Comparison of Topological and Regression Analysis Methods. Computational Materials Science 2017, 140, 315–321.
(25) Wang, Y.; Zhang, W.; Chen, L.; Shi, S.; Liu, J. Quantitative Description on Structure–Property Relationships of Li-Ion Battery Materials for High-Throughput Computations. Science and Technology of Advanced Materials 2017, 18 (1), 134–146.


(26) Saal, J. E.; Kirklin, S.; Aykol, M.; Meredig, B.; Wolverton, C. Materials Design and Discovery with High-Throughput Density Functional Theory: The Open Quantum Materials Database (OQMD). JOM 2013, 65 (11), 1501–1509.
(27) De Luna, P.; Wei, J.; Bengio, Y.; Aspuru-Guzik, A.; Sargent, E. Use Machine Learning to Find Energy Materials. Nature 2017, 552 (7683), 23–25.
(28) Bergerhoff, G.; Brown, I. D.; Allen, F. Crystallographic Databases. International Union of Crystallography, Chester 1987, 360, 77–95.
(29) Glazer, A. M. The Classification of Tilted Octahedra in Perovskites. Acta Crystallographica Section B: Structural Crystallography and Crystal Chemistry 1972, 28 (11), 3384–3392.
(30) Shojaei, F.; Yin, W.-J. Stability Trend of Tilted Perovskites. The Journal of Physical Chemistry C 2018.
(31) Bechtel, J. S.; Van der Ven, A. Octahedral Tilting Instabilities in Inorganic Halide Perovskites. Physical Review Materials 2018, 2 (2), 025401.
(32) Huang, Y.; Yin, W.-J.; He, Y. Intrinsic Point Defects in Inorganic Cesium Lead Iodide Perovskite CsPbI3. The Journal of Physical Chemistry C 2018, 122 (2), 1345–1350.
(33) Yin, W.-J.; Shi, T.; Yan, Y. Unique Properties of Halide Perovskites as Possible Origins of the Superior Solar Cell Performance. Advanced Materials 2014, 26 (27), 4653–4658.
(34) Sun, Q.; Yin, W.-J. Thermodynamic Stability Trend of Cubic Perovskites. Journal of the American Chemical Society 2017, 139 (42), 14905–14908.
(35) Zhao, X.-G.; Yang, J.-H.; Fu, Y.; Yang, D.; Xu, Q.; Yu, L.; Wei, S.-H.; Zhang, L. Design of Lead-Free Inorganic Halide Perovskites for Solar Cells via Cation-Transmutation. Journal of the American Chemical Society 2017, 139 (7), 2630–2638.


Table 1. Numbers of compounds, MP IDs and ICSD numbers for each category of compounds in the Materials Project.

ABX3 compounds
Category    Number of compounds    Number of MP IDs    Number of ICSD numbers
ABO3        379                    759                 3144
ABS3        47                     56                  76
ABSe3       26                     27                  32
ABTe3       3                      4                   9
ABF3        55                     95                  265
ABCl3       40                     57                  88
ABBr3       27                     34                  51
ABI3        13                     14                  21
Total       590                    1046                3686

A2BB′X6 compounds
Category    Number of compounds    Number of MP IDs    Number of ICSD numbers
A2BB′O6     438                    1073                1361
A2BB′S6     1                      1                   1
A2BB′Se6    0                      0                   0
A2BB′Te6    0                      0                   0
A2BB′F6     72                     86                  109
A2BB′Cl6    23                     24                  32
A2BB′Br6    4                      4                   6
A2BB′I6     0                      0                   0
Total       538                    1188                1509
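The three counts reported in Table 1 correspond to counting distinct formulas, distinct MP IDs and distinct ICSD numbers in a flat list of database records. A minimal sketch, assuming each record is a hypothetical (formula, MP ID, ICSD number) tuple; the sample records are made up for illustration and are not entries from the actual database.

```python
def summarize(records):
    """Count distinct compounds, MP IDs and ICSD numbers.

    records: iterable of (formula, mp_id, icsd_id) tuples (hypothetical layout).
    """
    formulas = {formula for formula, _, _ in records}
    mp_ids = {mp_id for _, mp_id, _ in records}
    icsd_ids = {icsd_id for _, _, icsd_id in records}
    return {"compounds": len(formulas), "mp_ids": len(mp_ids), "icsd": len(icsd_ids)}


# Made-up records: one MP ID may fold several ICSD entries,
# and one compound may appear under several MP IDs.
records = [
    ("ABO3-1", "mp-0001", 101),
    ("ABO3-1", "mp-0001", 102),
    ("ABO3-1", "mp-0002", 103),
    ("ABO3-2", "mp-0003", 104),
]

print(summarize(records))  # {'compounds': 2, 'mp_ids': 3, 'icsd': 4}
```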


Table 2. Perovskite aggregation regions in Figure 2, the probability that a compound in each region is a perovskite (perovskite ratio in the region), and the fraction of all perovskites that fall within each region (perovskite ratio).

Figure         Aggregation region                     Perovskite ratio in the region    Perovskite ratio
Figure 2(a)    57 < ZA < 71, 21 < ZB < 30             98.7%                             19.6%
Figure 2(b)    1.15 < rA < 1.5, 0.52 < rB < 0.7       85.6%                             62.6%
Figure 2(c)    0.80 < χA < 1.28, 1.10 < χB < 2.0      86.4%                             68.5%
Figure 2(d)    0.85 < t < 1.10, 0.42 < μ < 0.73       78.5%                             89.2%
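Table 2 reports two different ratios for each region: the fraction of compounds inside the region that are perovskites, and the fraction of all perovskites that fall inside the region. A minimal sketch of both quantities follows. The function name and the data points are hypothetical; only the region bounds are taken from the t-μ row of Table 2.

```python
def region_statistics(points, x_range, y_range):
    """Return (purity, coverage) for a rectangular region.

    points: list of (x, y, is_perovskite) tuples.
    purity   = perovskites inside the region / all compounds inside the region
    coverage = perovskites inside the region / all perovskites
    """
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    inside = [p for p in points if x_lo < p[0] < x_hi and y_lo < p[1] < y_hi]
    perov_inside = sum(1 for p in inside if p[2])
    perov_total = sum(1 for p in points if p[2])
    purity = perov_inside / len(inside) if inside else 0.0
    coverage = perov_inside / perov_total if perov_total else 0.0
    return purity, coverage


# Made-up (t, mu, label) points for illustration only.
points = [
    (0.95, 0.55, True),
    (1.00, 0.60, True),
    (0.90, 0.50, False),
    (0.80, 0.30, True),
    (1.20, 0.80, False),
]

purity, coverage = region_statistics(points, (0.85, 1.10), (0.42, 0.73))
print(f"purity={purity:.3f}, coverage={coverage:.3f}")
```

A region with high purity but low coverage, such as the one in Figure 2(a), is a reliable indicator only for a small part of the data, which is the imbalance discussed in the text.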


Table 3. Eleven compounds whose perovskite formability differs between our results and previous results.

Compounds                         Our results    Results in Ref. 17    Crystal structure
KBrO3, RbBrO3, CsBrO3, TlBrO3     yes            no                    Figure 4(a)(b)
LiReO3, LiTaO3, CdCO3             yes            no                    Figure 4(c)(d)
BaRuO3, SrIrO3, SrTeO3, ErMnO3    no             yes                   Figure 4(e), Figure 4(f), Figure 4(g)


Figure 1. Workflow of the procedure. In total, 3354 (2234 ICSD) such compounds were found by mp-number using the pymatgen library, including 759 kinds of ABO3, 287 kinds of ABX3 (X in the halogen family), 597 kinds of (AA′)(BB′)O6 and 115 kinds of (AA′)(BB′)X6 (X in the halogen family).
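The decision logic of the workflow described in the main text can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the coordination numbers, mean bond lengths and octahedra-sharing types are assumed to have been computed beforehand from the crystal structure, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class Cation:
    symbol: str
    coordination: int        # number of nearest anions (assumed precomputed)
    mean_bond_length: float  # mean cation-anion distance in Å (assumed precomputed)


def assign_sites(m1: Cation, m2: Cation) -> Optional[Tuple[Cation, Cation]]:
    """Return (A, B), or None if neither cation is octahedrally coordinated."""
    m1_oct = m1.coordination == 6
    m2_oct = m2.coordination == 6
    if not (m1_oct or m2_oct):
        return None
    if m1_oct and m2_oct:
        # Both octahedral: the cation with the larger mean bond length is A.
        a, b = sorted((m1, m2), key=lambda c: c.mean_bond_length, reverse=True)
        return a, b
    # Exactly one octahedral cation: that one is B.
    return (m2, m1) if m1_oct else (m1, m2)


def is_perovskite(m1: Cation, m2: Cation, sharing_types: Set[str]) -> bool:
    """sharing_types: connection types found between the B-centered octahedra."""
    if assign_sites(m1, m2) is None:
        return False
    return sharing_types == {"corner"}


# Hypothetical inputs for illustration only.
big = Cation("M1", coordination=12, mean_bond_length=2.8)
small = Cation("M2", coordination=6, mean_bond_length=2.0)
print(is_perovskite(big, small, {"corner"}))  # True
print(is_perovskite(big, small, {"face"}))    # False
```

Testing for a coordination number of exactly six is a simplification; a real analysis would also need to confirm that the six neighbors form an octahedron rather than, for example, a trigonal prism.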


Figure 2. Two-dimensional maps of perovskites (red) and non-perovskites (blue) for ABO3 compounds on (a) atomic numbers of the cations; (b) ionic radii of the cations; (c) electronegativities of the cations and (d) t-μ. The dashed lines indicate the clustered regions of perovskites, and the percentages of perovskite formability in those regions are also shown in the figure.
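The tolerance factor t and octahedral factor μ used in panel (d) are not defined explicitly in the text. The sketch below assumes the standard Goldschmidt definitions, t = (rA + rX) / (√2 (rB + rX)) and μ = rB / rX. The radii in the example are illustrative inputs rather than values from the paper's dataset, the function names are hypothetical, and the region bounds are those listed for the t-μ map in Table 2.

```python
import math


def tolerance_factor(r_a, r_b, r_x):
    """Goldschmidt tolerance factor t = (rA + rX) / (sqrt(2) * (rB + rX))."""
    return (r_a + r_x) / (math.sqrt(2) * (r_b + r_x))


def octahedral_factor(r_b, r_x):
    """Octahedral factor mu = rB / rX."""
    return r_b / r_x


def in_t_mu_region(t, mu):
    """Aggregation region for the t-mu map as listed in Table 2."""
    return 0.85 < t < 1.10 and 0.42 < mu < 0.73


# Illustrative radii in Å (assumed values, not from the paper's dataset).
r_a, r_b, r_x = 1.44, 0.605, 1.40
t = tolerance_factor(r_a, r_b, r_x)
mu = octahedral_factor(r_b, r_x)
print(round(t, 3), round(mu, 3), in_t_mu_region(t, mu))
```

Because both factors depend on the ionic radii, the choice of charge state for A and B discussed in issue (iii) directly changes where a compound lands on this map.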


Figure 3. Two-dimensional maps of perovskites (red) and non-perovskites (blue) for xxx kinds of A2B′B′′O6 compounds on (a) atomic numbers of cations B′ and B′′; (b) electronegativities of cations B′ and B′′; (c) ionic radii of cations B′ and B′′. The red circles and blue crosses indicate perovskites and non-perovskites, respectively. The colors are rendered according to the distribution of red/blue points to clearly indicate the formability regions of perovskites and non-perovskites.
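The point made in the text with P and Q can be checked numerically: two (rB′, rB′′) pairs with the same average radius can fall in different regions of the map. In the sketch below the coordinates are hypothetical stand-ins, since the actual positions of P and Q are not given in the text. The region bounds are the approximate ones quoted in the main text; the two rectangles overlap, so membership is reported for each separately.

```python
import math

# (rB' range, rB'' range) in Å, as quoted in the main text.
PEROVSKITE_REGION = ((1.80, 2.40), (0.80, 1.40))
NON_PEROVSKITE_REGION = ((1.50, 2.00), (0.70, 1.20))


def in_region(point, region):
    """True if point = (rB', rB'') lies strictly inside the rectangle."""
    (x_lo, x_hi), (y_lo, y_hi) = region
    return x_lo < point[0] < x_hi and y_lo < point[1] < y_hi


def mean_radius(point):
    """Averaged B-site radius (rB' + rB'') / 2."""
    return (point[0] + point[1]) / 2


# Hypothetical coordinates chosen to have the same average radius.
q = (2.05, 0.85)
p = (1.75, 1.15)

print(math.isclose(mean_radius(p), mean_radius(q)))
print(in_region(q, PEROVSKITE_REGION), in_region(q, NON_PEROVSKITE_REGION))
print(in_region(p, PEROVSKITE_REGION), in_region(p, NON_PEROVSKITE_REGION))
```

With these stand-in values the averaged radius is identical, yet only Q lies in the perovskite region, so a model fed only the average cannot separate the two.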


Figure 4. Two representative crystal structures that we predict to be perovskites but that previous literature predicted not to be.


Figure 5. The number of inconsistent predictions for each A2B′B′′O6 compound among the sixteen machine learning models. T/F in parentheses means perovskite/non-perovskite in our database.
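The bookkeeping behind Figure 5 can be sketched as counting, for each compound, how many models predict a label that differs from the database label, and then flagging compounds on which most models disagree with the database. This is a minimal sketch: the prediction table is made up, the compound and model names are placeholders, and the majority threshold is an assumption for illustration rather than a criterion stated in the paper.

```python
def count_inconsistent(labels, predictions):
    """Count, per compound, the models whose prediction differs from the label.

    labels: {compound: bool}; predictions: {model: {compound: bool}}.
    """
    return {
        compound: sum(1 for preds in predictions.values() if preds[compound] != label)
        for compound, label in labels.items()
    }


def flag_suspicious(labels, predictions, min_fraction=0.5):
    """Compounds on which more than min_fraction of the models contradict the label."""
    n_models = len(predictions)
    counts = count_inconsistent(labels, predictions)
    return sorted(c for c, n in counts.items() if n > min_fraction * n_models)


# Made-up labels (True = perovskite in the database) and model predictions.
labels = {"compound_1": True, "compound_2": False, "compound_3": True}
predictions = {
    "model_a": {"compound_1": True, "compound_2": True, "compound_3": True},
    "model_b": {"compound_1": True, "compound_2": True, "compound_3": False},
    "model_c": {"compound_1": True, "compound_2": True, "compound_3": True},
}

print(count_inconsistent(labels, predictions))
# {'compound_1': 0, 'compound_2': 3, 'compound_3': 1}
print(flag_suspicious(labels, predictions))
# ['compound_2']
```

In practice the per-compound predictions would be held-out predictions from cross-validation, so that no model is scored on a compound it was trained on.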
