
Letter

Machine-Learning for Understanding Compatibility of Organic-Inorganic Hybrid Perovskite with Post-Treating Amines

Yongze Yu, Xuanheng Tan, Shougui Ning, and Yiying Wu

ACS Energy Lett., Just Accepted Manuscript • DOI: 10.1021/acsenergylett.8b02451 • Publication Date (Web): 04 Jan 2019


Yongze Yu†, Xuanheng Tan‡, Shougui Ning§,†, and Yiying Wu†*

† Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States

‡ Department of Chemical Physics, School of Chemistry and Material Science, University of Science and Technology of China, Hefei, Anhui, China

§ Institute of Laser & Micro/Nano Engineering, College of Electronics & Information Engineering, Sichuan University, Chengdu, 610064, China

Corresponding Author
* Yiying Wu. E-mail: [email protected]. Fax: +1-614-292-1685. Tel.: +1-614-247-7810.


ABSTRACT

Post-treatment is a facile and effective approach to stabilizing organic-inorganic hybrid perovskites. In this work, we apply machine learning techniques to study the reactivity trends of different types of amines used for the post-treatment of organic-inorganic hybrid perovskite films. Fifty amines are classified based on their compatibility with methylammonium lead iodide films. Machine learning models are constructed from these classifications and the amines' molecular descriptor features. The best model achieves 86% accuracy in predicting whether perovskite films are maintained after post-treatment. Analysis of the constructed models shows that amines with fewer hydrogen bond donors and acceptors and more steric bulk, as well as secondary amines, tertiary amines, and pyridine derivatives, tend to have high compatibility with perovskite films.


Inorganic-organic halide perovskites have attracted great attention in photovoltaics1–6 and light-emitting diode devices7,8 owing to their long charge-carrier diffusion lengths,9–11 widely tunable band gap with strong light absorption,12,13 and low fabrication cost.14,15 The perovskite solar cell efficiency has rapidly grown to 23.3% in the past few years;16 however, challenges such as operational stability and toxicity must be overcome before these devices can be fabricated industrially. Various works have reported the instability of perovskites under moisture,17 oxygen,18 and ultraviolet light.19 Among the strategies used to stabilize the material against these factors, especially moisture,20 post-treatment with small molecules by dip-coating or spin-coating is a simple but effective approach.21–28 For example, benzylamine24 has been used to modify the surface of perovskites to increase moisture stability, and passivation of perovskite with pyridine as a Lewis base has significantly reduced nonradiative electron-hole recombination.21 Therefore, both functionalities could be achieved by exploring more molecules, such as amines. However, we observed experimentally that many other amines directly destroy methylammonium lead iodide (MAPbI3) perovskite films (vide infra). Compatibility between the perovskite film and the post-treatment molecule is a prerequisite for post-treatment techniques. Therefore, identifying rules that predict the compatibility between a perovskite film and a molecule based on its molecular structure is of fundamental and practical interest for the future development of post-treatment materials.
The emergence of machine learning approaches has brought great interest and potential to the chemical sciences.29-46 Machine learning is a powerful tool for finding complicated patterns in high-dimensional spaces; it employs algorithms that learn from empirical data by modeling linear or nonlinear relations between the physicochemical features and target properties of molecules or materials. With the machine-learning approach, researchers have made great progress in planning organic synthesis,33,42,44,46 product prediction of solvent thermal synthesis,36 and crystal structure prediction.38,39 With the coming of the big data era, machine learning has shown superior power in handling large datasets with up to millions of entries. Current discoveries are mostly based on existing databases such as the Inorganic Crystal Structure Database (ICSD),47 organic reaction databases, etc. Machine learning, however, is not limited to mining existing data in the literature; it also uses models to predict results that have not yet been realized experimentally. The methodology can shape how researchers approach new fundamentals and practices, and it can guide experimentalists in planning and designing their experiments so as to acquire curated data for machine learning. Therefore, an ingenious combination of experimental design and data analysis methods is crucial for applying the machine learning approach to broader scientific fields.

Scheme 1. Flowchart of machine-learning-assisted exploration

In this work, we present a systematic study, spanning experimental design through machine learning analysis, of the relations between the physicochemical properties of amines and their reactivity toward MAPbI3 perovskite. We aim to build a model that can predict the compatibility of untested amines for post-treatment of perovskite films.


Moreover, chemical knowledge and insights can be extracted from the trained model, which benefits the future design of molecules for post-treatment methods. To discover which types of amines do not destroy the MAPbI3 perovskite film, supervised learning algorithms are used to train the model and make predictions. In a supervised learning task, the algorithm learns a function that maps an input to an output based on example input-output pairs. Therefore, the features (input X) and labels (output Y) must be generated before training. Scheme 1 displays the flowchart of the machine-learning-assisted exploration. Each section is described in detail below.

Figure 1. a) UV-vis absorption spectra of MAPbI3 perovskite film with benzylamine (BA) and triethylamine (TEA) post treatment, insets: images of films with and without treatment. b) Residue Index extracted out from the UV-vis absorption for various amines. c) Molecular structures with Reactive and Nonreactive classes labeled with Amine Number.


Generate Label Y. The output variable in this work is whether the perovskite film is destroyed by the amine post-treatment. Quantifying film damage is necessary for building the training models. Two examples of UV-vis absorption spectra of MAPbI3 films before and after treatment are presented in Figure 1a, with benzylamine (BA) and triethylamine (TEA) as the treating molecules. After treatment with triethylamine, the morphology and UV-vis absorption of the film are maintained, whereas the benzylamine-treated films are entirely destroyed, so that a flat baseline is exhibited. The Residue Index for an amine is calculated as the ratio of the absorbance after the amine treatment to the absorbance before it. The Residue Index as a function of wavelength is plotted in Figure S3. Because this index represents the degree to which the perovskite absorbance is retained after the amine treatment, its average over the 500 nm to 700 nm wavelength range is reported as the evaluation index. An amine that is highly reactive with the MAPbI3 film damages the film, giving a Residue Index near 0, whereas a nonreactive amine preserves the film absorbance, so that the Residue Index approaches 1. The Residue Indexes for the 50 tested amines are plotted in Figure 1b. Supervised learning usually involves two types of tasks: regression and classification. In regression, the dependent variable takes continuous or ordered values; in classification, the dependent variable is categorical and unordered. Sometimes the two tasks can be transformed into each other.48 In this work, we focus on the classification task for two reasons: 1) the output is more intuitive, namely whether a molecule is reactive or nonreactive, and 2) classification algorithms are more diverse and relatively more developed. Therefore, to obtain categorical labels, we set a threshold on the Residue Indexes.
The amines with a Residue Index above 0.75 are labeled as the Nonreactive class, and those below 0.75 are labeled as the Reactive class. The threshold is based on the apparent gap in Residue Indexes shown in Figure 1b and is tunable for different intended uses. The amine molecules, grouped by class, are presented in Figure 1c. In this case, Nonreactive class amines can be used directly for further exploration in other tests, whereas Reactive class amines are no longer considered for further use.

Generate Feature X. With the development of computational chemistry and cheminformatics, the quantitative transformation of molecules into molecular descriptors has become feasible.49,50 A molecular descriptor is the result of a logical and mathematical procedure that transforms chemical information into a useful number, or the result of some standardized experiment.51 Molecular descriptors include not only experimental measurements such as molar refractivity, dipole moment, polarizability, and physicochemical properties in general, but also theoretical descriptors derived from symbolic representations, such as the topological Balaban Index, the Wiener Index, and the number of hydrogen donor sites. JChem for Office (Excel) was used for chemical database access, structure-based property calculation, search, and reporting. The Charge, Elemental Analysis, Geometry, Hydrogen-Bond Donor-Acceptor, and Topology Analysis plugins were used to calculate 53 features of the studied amines, which are listed alphabetically in Table S1. Two types of functions were used in these calculations. The first is applied directly to the molecule object, for example, Molecular Polarizability, Acceptor Count, and Balaban Index. The second involves both the molecule object and a specific atom, in this case nitrogen, and reports properties of that atom, for example, the Sigma Orbital Electronegativity on the N atom. These features are intended to capture as much information as possible, without human-biased screening, at an acceptable computational cost. They serve as the candidates for feature selection when building the supervised learning model.
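Returning to the labeling step (Generate Label Y), the following is a minimal sketch of how a Residue Index and a class label could be computed. It assumes spectra are stored as wavelength-to-absorbance dictionaries on a shared wavelength grid; the function names and the toy absorbance values are illustrative assumptions, not the authors' code or data. Only the after/before ratio, the 500 to 700 nm averaging window, and the 0.75 threshold come from the text.

```python
from statistics import mean

THRESHOLD = 0.75  # cutoff stated in the text


def residue_index(before, after, lo_nm=500, hi_nm=700):
    """Mean ratio of absorbance after treatment to absorbance before it.

    `before` and `after` map wavelength (nm) -> absorbance and are assumed
    to share the same wavelength grid (an assumption of this sketch).
    """
    ratios = [
        after[wl] / before[wl]
        for wl in before
        if lo_nm <= wl <= hi_nm and before[wl] > 0
    ]
    if not ratios:
        raise ValueError("no usable wavelengths in the averaging window")
    return mean(ratios)


def label(index, threshold=THRESHOLD):
    """Return 1 (Nonreactive) above the threshold, else 0 (Reactive).

    The text does not say how an index exactly equal to the threshold is
    handled; this sketch assigns it to the Reactive class.
    """
    return 1 if index > threshold else 0


# Toy spectra (made-up numbers): one film that survives, one that is destroyed.
wavelengths = range(450, 751, 50)
pristine = {wl: 1.0 for wl in wavelengths}
retained = {wl: 0.95 for wl in wavelengths}
destroyed = {wl: 0.02 for wl in wavelengths}

ri_retained = residue_index(pristine, retained)
ri_destroyed = residue_index(pristine, destroyed)
print(label(ri_retained), label(ri_destroyed))
```

Keeping the threshold as a parameter mirrors the remark that it is tunable for different intended uses.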


Understanding the structure of the dataset helps accelerate the machine learning process. Because the features are generated by enumerating the molecular descriptors available in the database, some of them are strongly correlated. For example, if the N atom is a ring atom, it cannot be a chain atom, and the acceptor site count is highly correlated with the acceptor count. Such highly correlated features contribute little to the learning process and can therefore be treated as duplicates. Removing these duplicate features reduces the redundancy of the input feature space. Pearson correlation coefficients (PCC)52 were calculated for each pair of features (Figure S4). Features whose PCC has an absolute value greater than 0.95 are grouped together; the resulting groups are displayed in Table S2. Only one feature is retained from each group. The correlation filter removes 22 features, and the remaining 31 features are used for training. The PCCs of the remaining selected features are plotted in Figure 2.
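The correlation filter described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions: the greedy rule (keep a feature only if it is not correlated above the cutoff with any feature already kept) is one simple way to retain a single representative per group and may differ from the grouping used for Table S2; the feature names and values in the toy table are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treated as uncorrelated in this sketch
    return cov / (sx * sy)


def correlation_filter(features, cutoff=0.95):
    """Keep one representative per group of features with |PCC| > cutoff.

    `features` maps feature name -> list of values (one per molecule).
    Greedy rule (an assumption): a feature is dropped if it correlates
    above the cutoff with any feature that has already been kept.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept


# Toy descriptor table with made-up values for five molecules.
toy = {
    "acceptor_count": [1, 2, 1, 3, 2],
    "acceptor_site_count": [1, 2, 1, 3, 2],  # duplicates acceptor_count
    "n_is_ring_atom": [0, 1, 0, 0, 1],
    "n_is_chain_atom": [1, 0, 1, 1, 0],      # exact complement of ring flag
    "steric_effect_index": [1.9, 2.5, 2.1, 1.7, 2.0],
}
selected = correlation_filter(toy)
print(selected)
```

Note that the absolute value matters: the ring/chain pair is perfectly anti-correlated (PCC of -1) and is still treated as a duplicate, matching the ring-atom example in the text.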


Figure 2. Correlation matrix of Pearson coefficients between 31 selected features.

Data training. Data training is the core of the machine learning analysis; it directly reflects how well the mathematical representations relate to real-world attributes. The central idea of supervised machine learning is to minimize, via parameter tuning, a cost function between the predicted targets and the observed data. The entire dataset is split into two subsets, a training set and a test set, in an 80:20 ratio.


The training set is used for model training, and the test set is used to evaluate the model's ability to predict unseen data.

Table 1. Training scores and test scores with different classifiers

Classifier                 Train score (a)   Test score     Optimized hyperparameters
Logistic Regression (l1)   0.97 ± 0.02       0.83 ± 0.11    'C': 10, 'penalty': 'l1'
Logistic Regression (l2)   0.93 ± 0.02       0.84 ± 0.13    'C': 1.0, 'penalty': 'l2'
SVM (rbf)                  0.94 ± 0.02       0.86 ± 0.12    'C': 10.0, gamma=1/n_feature
LinearSVM (l1)             0.96 ± 0.01       0.83 ± 0.13    'C': 1.0, 'penalty': 'l1'
LinearSVM (l2)             0.93 ± 0.02       0.85 ± 0.14    'C': 0.1, 'penalty': 'l2'
KNN                        0.91 ± 0.03       0.84 ± 0.13    'n_neighbors': 3
Decision Tree              0.97 ± 0.02       0.81 ± 0.14    'max_depth': 3
Gaussian Naive Bayes       0.80 ± 0.04       0.74 ± 0.10    N/A

a. The scores are reported as the averages and standard deviations over 20 random 80% training/20% test splits.

There are many popular machine learning algorithms for classification tasks, such as logistic regression, decision trees, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Naïve Bayes, and neural networks.53 Different algorithms have different specialties, advantages, and drawbacks. The training and test results are listed in Table 1, and the training functions are described in the Supporting Information (SI). The best test score, 0.86 ± 0.12, is achieved using SVM with a Radial Basis Function (rbf) kernel. Other models, namely Logistic Regression (l1 and l2), SVM with linear kernel (l1 and l2), K-Nearest Neighbors, and Decision Tree, give similar test scores of 0.83 ± 0.11, 0.84 ± 0.13, 0.83 ± 0.13, 0.85 ± 0.14, 0.84 ± 0.13, and 0.81 ± 0.14, respectively. However, Gaussian Naïve Bayes reaches a prediction accuracy of only about 0.74 ± 0.10.
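The repeated-split protocol behind Table 1 (20 random 80/20 splits, scores reported as mean and standard deviation) might look like the sketch below, written with scikit-learn since the paper uses that package. Several things here are assumptions rather than details from the source: the data are random synthetic numbers standing in for the 50 x 31 descriptor matrix, the splits are stratified, and the features are standardized before the SVM. The kernel, `C`, and `gamma` settings follow the SVM (rbf) row of Table 1.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def repeated_split_scores(X, y, n_repeats=20, test_size=0.2):
    """Train and test accuracy over repeated random 80/20 splits."""
    train_scores, test_scores = [], []
    for seed in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed
        )
        # gamma="auto" is 1 / n_features, matching the SVM (rbf) row of Table 1.
        # Standardization is an assumption of this sketch.
        model = make_pipeline(
            StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="auto")
        )
        model.fit(X_tr, y_tr)
        train_scores.append(model.score(X_tr, y_tr))
        test_scores.append(model.score(X_te, y_te))
    return np.array(train_scores), np.array(test_scores)


# Synthetic stand-in for the descriptor matrix (random numbers, NOT the
# published dataset): the label depends on the first two columns only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 31))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

train_scores, test_scores = repeated_split_scores(X, y)
print(f"train {train_scores.mean():.2f} ± {train_scores.std():.2f}")
print(f"test  {test_scores.mean():.2f} ± {test_scores.std():.2f}")
```

With only 10 molecules in each test split, a single misclassification changes the accuracy by 0.10, which is consistent with the large standard deviations on the test scores in Table 1.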

Figure 3. Confusion matrix between true label and predicted label using Support Vector Machine. The number in each section indicates the total counts over 20 splits of the tests.

Evaluation metrics. Several metrics are available for quantitatively evaluating the performance of machine learning models, such as classification accuracy, area under the curve (AUC), F1 score, and the confusion matrix. In this work, we use classification accuracy, defined as the number of correct predictions divided by the total number of predictions made, on the two-class classification task. We also plot the confusion matrix, which presents the total counts of the predicted classes versus the true classes over 20 random training/test splits. As shown in Figure 3, we achieve 86% accuracy using SVM with the rbf kernel. Precision also plays an important role when the model is deployed for production. Precision is defined as the number of true positives (TP), i.e., correctly predicted Nonreactive class amines, divided by the total number of predicted positives (TP plus false positives (FP), i.e., amines incorrectly predicted as Nonreactive). In the deployment stage, the predicted Nonreactive class is used for further investigation; therefore, higher precision reduces the cost of mispredictions by the constructed model. In this case, we achieve 92% precision.

Table 2. Predicted and test results of unseen test amine molecules

amine_name                   probability (a)   predict_label (b)   test_label
3-acetylpyridine             0.968             1                   1
1-aminopyrene                0.937             1                   1
bis(dimethylamino)-methane   0.669             1                   1
3-aminoheptane               0.267             0                   0
cyclooctylamine              0.170             0                   0

a. Probability is generated by the ‘predict_proba’ function in the sklearn package; it represents the probability of the positive (Nonreactive) class.
b. Nonreactive class is labeled as 1 and Reactive class is labeled as 0.
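The accuracy and precision definitions given under Evaluation metrics can be written out directly. The sketch below applies them to the five amines of Table 2, using the labels and probabilities listed there; the helper names are illustrative. The 0.5 probability cutoff is an assumption used only to show that the listed probabilities are consistent with the listed predicted labels; the source does not state how labels were derived from probabilities.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) with `positive` as the Nonreactive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Correct predictions divided by all predictions."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """True positives divided by all predicted positives."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Values copied from Table 2 (1 = Nonreactive, 0 = Reactive).
probabilities = [0.968, 0.937, 0.669, 0.267, 0.170]
predict_label = [1, 1, 1, 0, 0]
test_label = [1, 1, 1, 0, 0]

# Assumed 0.5 cutoff, used only as a consistency check on the table.
thresholded = [1 if p >= 0.5 else 0 for p in probabilities]

tp, fp, tn, fn = confusion_counts(test_label, predict_label)
print(accuracy(tp, fp, tn, fn), precision(tp, fp))
```

Precision is the metric tied to deployment cost here because every false positive is a reactive amine that would be carried forward into further experiments.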

Model Deployment. The SVM model is chosen for future predictions on new, unseen amine molecules. Here, we demonstrate how to employ the model in production: it is used to predict the outcomes for amines that were not seen in the previous model training and evaluation stages. Five amines are selected without bias for testing the model. The molecular descriptors are generated for each molecule, and the model then calculates the probability of the positive label and the predicted label. The results are displayed in Figure 4 and Table 2. The outcomes of all tested amines match the predicted results. This test illustrates the generalizability of the model in predicting the compatibility of amine-type post-treating molecules with MAPbI3 films.


Figure 4. Residue indexes of unseen test amines.

Figure 5. The coefficients fitted from the Logistic Regression with L1 penalty (LASSO).

Feature selection and filtration. Besides building a model with good prediction ability on unseen data, understanding and interpreting the model is another goal of machine-learning-assisted discovery. Even when many features are generated as input variables, feature importance can be extracted, and important features can be selected, from the learned models. Feature selection is a frequently discussed topic in machine learning and statistics; it is the process of selecting a subset of the most influential features for use in constructing a model with classification success. The Least Absolute Shrinkage and Selection Operator (LASSO) algorithm penalizes the regression coefficients with an L1 norm penalty, shrinking many of them to zero.52 It is an automatic algorithm that performs both regularization and feature selection. With LASSO, sparse coefficients are obtained by eliminating irrelevant variables and selecting one variable from each group of correlated variables.54 Any feature with a non-zero regression coefficient is ‘selected’ by the LASSO algorithm. We apply LASSO to the Logistic Regression model and plot the weights/coefficients in Figure 5. Specifically, 10 non-zero coefficients are obtained for the Nonreactive/Reactive classification. The magnitude and sign of the coefficients are used to rank feature importance. Besides the L1 penalty, the L2 penalty is another widely used regularization term in supervised learning.55 L2 regularization encourages the sum of the squares of the parameters to be small. It does not shrink the coefficients to zero but instead penalizes larger weights more heavily. The coefficients of the Logistic Regression model with L2 regularization are plotted in Figure S8. These coefficients can also be used to discover feature/label correlations.

Learning from the machine-learned model: chemical insights. The hidden trends uncovered by machine learning analysis can bring chemical insights that assist researchers in designing more complicated systems in the future. As shown in Figure 5 and Figure S8, the weights or coefficients for each feature can be calculated. In logistic regression, the probability of predicting one specific class y can be expressed as

$$P(y \mid X) = \frac{1}{1 + e^{-wX}}$$

where w is the weights/coefficients matrix, X is the input feature matrix, and y is the specific class studied, i.e., the Nonreactive class in this work. Therefore, features with positive weights contribute positively to the Nonreactive class.
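To make the link between the L1-penalized coefficients and the logistic formula concrete, here is a hedged sketch using scikit-learn. The data are random synthetic numbers, not the published descriptors, and the standardization step is an assumption. The `C` and `penalty` settings follow the Logistic Regression (l1) row of Table 1; the `liblinear` solver is chosen here because it supports the L1 penalty, and the source does not say which solver was used. The fitted model also includes an intercept term `b`, which the formula above omits.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 50 x 31 descriptor matrix (NOT the real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 31))
y = (X[:, 0] - X[:, 1] > 0).astype(int)  # 1 plays the role of "Nonreactive"

X_std = StandardScaler().fit_transform(X)  # assumption: standardized inputs

clf = LogisticRegression(penalty="l1", C=10, solver="liblinear")
clf.fit(X_std, y)

w = clf.coef_.ravel()          # one weight per feature
b = clf.intercept_[0]          # intercept, not shown in the text's formula
selected = np.flatnonzero(w)   # features "selected" by the L1 penalty

# Probability of class 1 computed by hand from the logistic function.
p_manual = 1.0 / (1.0 + np.exp(-(X_std @ w + b)))
p_sklearn = clf.predict_proba(X_std)[:, 1]

print("non-zero coefficients:", len(selected), "of", w.size)
print("manual and library probabilities agree:", np.allclose(p_manual, p_sklearn))
```

Because the exponent is the weighted sum of the features, increasing a feature with a positive weight raises the probability of class 1, which is the reading applied to the coefficients in Figure 5.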


By screening the coefficients of the classification model, we can gain chemical insight into the features of the amines and how these features affect an amine's reactivity with MAPbI3 perovskite films. For example, in Figure 5, Sigma Orbital Electronegativity (SOE) has a large positive weight, which means that amines with a large sigma orbital electronegativity on the nitrogen atom are less likely to be reactive. Electronegativity is a measure of the power of a chemically bonded atom to attract electrons to itself and can be defined in this way only for bonding orbitals.56 As the s character of a hybrid orbital increases, so does the apparent electronegativity of the atom that hosts it. Therefore, a double-bonded nitrogen with sp2 hybridization has a larger SOE than a single-bonded nitrogen with sp3 hybridization. For example, pyridine and piperidine, which both contain six-membered rings, have SOE values of 10.09 and 8.25, respectively; the former belongs to the Nonreactive class and the latter to the Reactive class. This indicates that pyridine derivatives are less likely to react with MAPbI3 perovskite.

Acceptor Site Count (ASC) is the number of hydrogen bond acceptor sites, which correlates strongly with the number of lone-pair electrons in the molecule. Typically, more nitrogen and oxygen atoms result in a larger ASC. The positive weight for the Reactive class and the negative weight for the Nonreactive class show that more hydrogen bonding sites increase the probability of being reactive. For example, triethanolamine, which consists of the triethylamine backbone with three additional hydroxyl groups, is classified in the Reactive class because of the substantial increase in hydrogen bond donor and acceptor sites.
Steric Effect Index (SEI) describes the steric hindrance around the nitrogen atom studied and is calculated from covalent radii values and topological distances.57 The large positive weights (Figure S8) indicate that amines with a larger number of connections and bulkier functional groups on nitrogen are more likely to be nonreactive. For instance, diisopropylamine and hexylamine are isomers, both containing six carbons and one nitrogen, yet they belong to the Nonreactive and Reactive classes, respectively. The SEI of the former is 2.5, whereas that of the latter is 1.9. In other words, primary amines are more reactive than secondary and tertiary amines.

Distance Degree (DD) is the sum of the corresponding row values in the distance matrix for each atom.50 Here it measures the sum of the bond distances from the nitrogen atom to all other atoms. For instance, dodecylamine and hexylamine both contain an alkyl chain with one amino group at the end, and they belong to the Nonreactive and Reactive classes, respectively. The former has a Distance Degree of 78 because it has more atoms farther from the nitrogen atom, whereas the latter has a much smaller Distance Degree of 21. This parameter indicates that large molecules tend to be in the Nonreactive class.

Besides the easily interpreted physicochemical properties above, some more complicated topological molecular descriptors stand out from the analysis. For example, the Balaban Index (BI), proposed by Balaban in 1982, is a topological index for a molecule based on distance sums as graph invariants, with consideration of bonding degeneracies.58 Generally, larger molecules have a greater BI than smaller molecules, branched molecules have a greater BI than chain-like molecules, and molecules with multiple bonds have a greater BI than those with only single bonds. Even though such rule-based descriptions do not fully capture the meaning of these high-level indices, we observe experimentally that amines with a larger BI tend to be in the Nonreactive class.

Although there are more features that we have not discussed here, we provide a short summary generated from the machine-learned model via weight analysis of the Logistic Regression model.
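As a side check on the Distance Degree values quoted for hexylamine and dodecylamine, the descriptor can be computed by a breadth-first search over the molecular graph. The sketch assumes a hydrogen-suppressed graph (heavy atoms only) and counts distance in bonds; the helper names are illustrative and this is not the JChem implementation. Under those assumptions the sums are 1 + 2 + ... + 6 and 1 + 2 + ... + 12.

```python
from collections import deque


def distance_degree(adjacency, start):
    """Sum of shortest-path bond counts from `start` to every other atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return sum(dist.values())


def linear_primary_amine(n_carbons):
    """Heavy-atom graph N-C-C-...-C; atom 0 is nitrogen, hydrogens omitted."""
    n_atoms = n_carbons + 1
    adjacency = {i: [] for i in range(n_atoms)}
    for i in range(n_atoms - 1):
        adjacency[i].append(i + 1)
        adjacency[i + 1].append(i)
    return adjacency


hexylamine = linear_primary_amine(6)
dodecylamine = linear_primary_amine(12)

dd_hexyl = distance_degree(hexylamine, 0)
dd_dodecyl = distance_degree(dodecylamine, 0)
print(dd_hexyl, dd_dodecyl)
```

For a branched isomer the same nitrogen would sit closer to more atoms, so the Distance Degree on nitrogen drops even though the atom count is unchanged; the descriptor therefore mixes molecular size with the position of the nitrogen in the skeleton.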


The amines that are less reactive with MAPbI3 perovskite film are suggested to possess, but are not limited to, the properties below:
1) Large steric hindrance around nitrogen
2) Fewer hydrogen bond acceptors and donors
3) Greater number of substituents on the nitrogen atom
4) Multiple bonding on nitrogen, as in pyridine derivatives
5) More branched isomers rather than chain-like isomers

In conclusion, we demonstrated in this work that machine learning techniques can assist material development. We studied the compatibility of MAPbI3 perovskite film with various types of post-treating amines. Amines with fewer hydrogen bond donors and acceptors, more steric bulk, and a greater number of substituents on the nitrogen atom, as well as pyridine derivatives, tend to have high compatibility with perovskite films. The compatible and reactive molecules studied in this work suggest that small primary amines with multiple hydrogen bonding sites tend to destroy the perovskite in the post-treatment process and should be used with special care in latent solution fabrication processes. The compatible molecules will be examined further to improve the moisture stability and electron-hole lifetimes of perovskite films. Unlike other machine learning work focused on mining existing databases, we show that a data-driven learning process can also guide researchers in designing new experiments from scratch, yielding curated data that is better suited to machine learning. With careful analysis, chemical knowledge and insight can be learned from the model even with a relatively small dataset. Furthermore, we make the dataset publicly available in the Supporting Information files, in the hope that more convenient and effective models will be discovered in future research.


ASSOCIATED CONTENT

Supporting Information. Experimental methods and data analysis: X-ray diffraction pattern, Pearson correlation of all generated features, grouped correlated features, cross-validation curve and learning curve of various estimators, coefficients for the logistic regression model with l2 penalty, and machine learning algorithm functions with the Scikit-learn package. Entire dataset including generated features and tested labels (Dataset.zip).

AUTHOR INFORMATION

Corresponding Author
*E-mail: [email protected]. Fax: +1-614-292-1685. Tel.: +1-614-247-7810.

Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENT

We acknowledge funding support from the U.S. Department of Energy (Award No. DE-FG0207ER46427). Y.Y. thanks Allison Curtze for suggestions on the manuscript.

REFERENCES


(1) Correa-Baena, J.-P.; Saliba, M.; Buonassisi, T.; Grätzel, M.; Abate, A.; Tress, W.; Hagfeldt, A. Promises and Challenges of Perovskite Solar Cells. Science 2017, 358 (6364), 739–744.
(2) Chen, H.; Ye, F.; Tang, W.; He, J.; Yin, M.; Wang, Y.; Xie, F.; Bi, E.; Yang, X.; Grätzel, M.; et al. A Solvent- and Vacuum-Free Route to Large-Area Perovskite Films for Efficient Solar Modules. Nature 2017, 550 (7674), 92–95.
(3) Liu, M.; Johnston, M. B.; Snaith, H. J. Efficient Planar Heterojunction Perovskite Solar Cells by Vapour Deposition. Nature 2013, 501 (7467), 395–398.
(4) Jeon, N. J.; Noh, J. H.; Yang, W. S.; Kim, Y. C.; Ryu, S.; Seo, J.; Seok, S. Il. Compositional Engineering of Perovskite Materials for High-Performance Solar Cells. Nature 2015, 517 (7535), 476–480.
(5) Tsai, H.; Nie, W.; Blancon, J. C.; Stoumpos, C. C.; Asadpour, R.; Harutyunyan, B.; Neukirch, A. J.; Verduzco, R.; Crochet, J. J.; Tretiak, S.; et al. High-Efficiency Two-Dimensional Ruddlesden-Popper Perovskite Solar Cells. Nature 2016, 536 (7616), 312–317.
(6) Yang, S.; Fu, W.; Zhang, Z.; Chen, H.; Li, C. Z. Recent Advances in Perovskite Solar Cells: Efficiency, Stability and Lead-Free Perovskite. J. Mater. Chem. A 2017, 5 (23), 11462–11482.
(7) Kumawat, N. K.; Gupta, D.; Kabra, D. Recent Advances in Metal Halide-Based Perovskite Light-Emitting Diodes. Energy Technol. 2017, 5 (10), 1734–1749.
(8) Sun, J.; Wu, J.; Tong, X.; Lin, F.; Wang, Y.; Wang, Z. M. Organic/Inorganic Metal Halide Perovskite Optoelectronic Devices beyond Solar Cells. Adv. Sci. 2018, 5 (5), 1700780.


(9) Dong, Q.; Fang, Y.; Shao, Y.; Mulligan, P.; Qiu, J.; Cao, L.; Huang, J. Electron-Hole Diffusion Lengths > 175 μm in Solution-Grown CH3NH3PbI3 Single Crystals. Science 2015, 347 (6225), 967–970.
(10) Xing, G.; Mathews, N.; Sun, S.; Lim, S. S.; Lam, Y. M.; Grätzel, M.; Mhaisalkar, S.; Sum, T. C. Long-Range Balanced Electron- and Hole-Transport Lengths in Organic-Inorganic CH3NH3PbI3. Science 2013, 342 (6156), 344–347.
(11) Stranks, S. D.; Eperon, G. E.; Grancini, G.; Menelaou, C.; Alcocer, M. J. P.; Leijtens, T.; Herz, L. M.; Petrozza, A.; Snaith, H. J. Electron-Hole Diffusion Lengths Exceeding 1 Micrometer in an Organometal Trihalide Perovskite Absorber. Science 2013, 342 (6156), 341–344.
(12) Adjokatse, S.; Fang, H. H.; Loi, M. A. Broadly Tunable Metal Halide Perovskites for Solid-State Light-Emission Applications. Mater. Today 2017, 20 (8), 413–424.
(13) Kovalenko, M. V.; Protesescu, L.; Bodnarchuk, M. I. Properties and Potential Optoelectronic Applications of Lead Halide Perovskite Nanocrystals. Science 2017, 358 (6364), 745–750.
(14) Li, Z.; Zhao, Y.; Wang, X.; Sun, Y.; Zhao, Z.; Li, Y.; Zhou, H.; Chen, Q. Cost Analysis of Perovskite Tandem Photovoltaics. Joule 2018, 1–14.
(15) Chang, N. L.; Ho-Baillie, A. W. Y.; Vak, D.; Gao, M.; Green, M. A.; Egan, R. J. Manufacturing Cost and Market Potential Analysis of Demonstrated Roll-to-Roll Perovskite Photovoltaic Cell Processes. Sol. Energy Mater. Sol. Cells 2018, 174, 314–324.


(16) National Renewable Energy Laboratory (NREL). https://www.nrel.gov/pv/assets/images/efficiency-chart-20180716.jpg, 2018 (accessed July 2018).
(17) Christians, J. A.; Miranda Herrera, P. A.; Kamat, P. V. Transformation of the Excited State and Photovoltaic Efficiency of CH3NH3PbI3 Perovskite upon Controlled Exposure to Humidified Air. J. Am. Chem. Soc. 2015, 137 (4), 1530–1538.
(18) Niu, G.; Li, W.; Meng, F.; Wang, L.; Dong, H.; Qiu, Y. Study on the Stability of CH3NH3PbI3 Films and the Effect of Post-Modification by Aluminum Oxide in All-Solid-State Hybrid Solar Cells. J. Mater. Chem. A 2014, 2 (3), 705.
(19) Li, W.; Zhang, W.; Van Reenen, S.; Sutton, R. J.; Fan, J.; Haghighirad, A. A.; Johnston, M. B.; Wang, L.; Snaith, H. J. Enhanced UV-Light Stability of Planar Heterojunction Perovskite Solar Cells with Caesium Bromide Interface Modification. Energy Environ. Sci. 2016, 9 (2), 490–498.
(20) Li, F.; Liu, M. Recent Efficient Strategies for Improving the Moisture Stability of Perovskite Solar Cells. J. Mater. Chem. A 2017, 5 (30), 15447–15459.
(21) Noel, N. K.; Abate, A.; Stranks, S. D.; Parrott, E. S.; Burlakov, V. M.; Goriely, A.; Snaith, H. J. Enhanced Photoluminescence and Solar Cell Performance via Lewis Base Passivation of Organic-Inorganic Lead Halide Perovskites. ACS Nano 2014, 8 (10), 9815–9821.
(22) Zhang, J.; Hu, Z.; Huang, L.; Yue, G.; Liu, J.; Lu, X.; Hu, Z.; Shang, M.; Han, L.; Zhu, Y. Bifunctional Alkyl Chain Barriers for Efficient Perovskite Solar Cells. Chem. Commun. 2015, 51 (32), 7047–7050.


(23) Tripathi, N.; Shirai, Y.; Yanagida, M.; Karen, A.; Miyano, K. Novel Surface Passivation Technique for Low-Temperature Solution-Processed Perovskite PV Cells. ACS Appl. Mater. Interfaces 2016, 8 (7), 4644–4650.
(24) Wang, F.; Geng, W.; Zhou, Y.; Fang, H. H.; Tong, C. J.; Loi, M. A.; Liu, L. M.; Zhao, N. Phenylalkylamine Passivation of Organolead Halide Perovskites Enabling High-Efficiency and Air-Stable Photovoltaic Cells. Adv. Mater. 2016, 28 (45), 9986–9992.
(25) Cao, J.; Yin, J.; Yuan, S.; Zhao, Y.; Li, J.; Zheng, N. Thiols as Interfacial Modifiers to Enhance the Performance and Stability of Perovskite Solar Cells. Nanoscale 2015, 7 (21), 9443–9447.
(26) Yang, S.; Wang, Y.; Liu, P.; Cheng, Y. B.; Zhao, H. J.; Yang, H. G. Functionalization of Perovskite Thin Films with Moisture-Tolerant Molecules. Nat. Energy 2016, 1 (2), 1–7.
(27) Zhang, H.; Ren, X.; Chen, X.; Mao, J.; Cheng, J.; Zhao, Y.; Liu, Y.; Milic, J.; Yin, W.-J.; Grätzel, M.; et al. Improving the Stability and Performance of Perovskite Solar Cells via Off-the-Shelf Post-Device Ligand Treatment. Energy Environ. Sci. 2018, 11, 2253–2262.
(28) Yoo, H. S.; Park, N. G. Post-Treatment of Perovskite Film with Phenylalkylammonium Iodide for Hysteresis-Less Perovskite Solar Cells. Sol. Energy Mater. Sol. Cells 2018, 179, 57–65.
(29) Butler, K. T.; Davies, D. W.; Cartwright, H.; Isayev, O.; Walsh, A. Machine Learning for Molecular and Materials Science. Nature 2018, 559 (7715), 547–555.
(30) Coley, C. W.; Green, W. H.; Jensen, K. F. Machine Learning in Computer-Aided Synthesis Planning. Acc. Chem. Res. 2018, 51 (5), 1281–1289.


(31) Pankajakshan, P.; Sanyal, S.; De Noord, O. E.; Bhattacharya, I.; Bhattacharyya, A.; Waghmare, U. Machine Learning and Statistical Analysis for Materials Science: Stability and Transferability of Fingerprint Descriptors and Chemical Insights. Chem. Mater. 2017, 29 (10), 4190–4201.
(32) Sun, Y. T.; Bai, H. Y.; Li, M. Z.; Wang, W. H. Machine Learning Approach for Prediction and Understanding of Glass-Forming Ability. J. Phys. Chem. Lett. 2017, 8 (14), 3434–3439.
(33) Zhou, Z.; Li, X.; Zare, R. N. Optimizing Chemical Reactions with Deep Reinforcement Learning. ACS Cent. Sci. 2017, 3 (12), 1337–1344.
(34) Liu, Y.; Zhao, T.; Ju, W.; Shi, S. Materials Discovery and Design Using Machine Learning. J. Mater. 2017, 3 (3), 159–177.
(35) Ramprasad, R.; Batra, R.; Pilania, G.; Mannodi-Kanakkithodi, A.; Kim, C. Machine Learning in Materials Informatics: Recent Applications and Prospects. npj Comput. Mater. 2017, 3 (1), 54.
(36) Raccuglia, P.; Elbert, K. C.; Adler, P. D. F.; Falk, C.; Wenny, M. B.; Mollo, A.; Zeller, M.; Friedler, S. A.; Schrier, J.; Norquist, A. J. Machine-Learning-Assisted Materials Discovery Using Failed Experiments. Nature 2016, 533 (7601), 73–76.
(37) Sumita, M.; Yang, X.; Ishihara, S.; Tamura, R.; Tsuda, K. Hunting for Organic Molecules with Artificial Intelligence: Molecules Optimized for Desired Excitation Energies. ACS Cent. Sci. 2018, 4 (9), 1126–1133.
(38) Ye, W.; Chen, C.; Wang, Z.; Chu, I.-H.; Ong, S. P. Deep Neural Networks for Accurate Predictions of Crystal Stability. Nat. Commun. 2018, 9 (1), 3800.


(39) Ryan, K.; Lengyel, J.; Shatruk, M. Crystal Structure Prediction via Deep Learning. J. Am. Chem. Soc. 2018, 140 (32), 10158–10168.
(40) Janet, J. P.; Chan, L.; Kulik, H. J. Accelerating Chemical Discovery with Machine Learning: Simulated Evolution of Spin Crossover Complexes with an Artificial Neural Network. J. Phys. Chem. Lett. 2018, 9 (5), 1064–1071.
(41) Zhang, Y.; Ling, C. A Strategy to Apply Machine Learning to Small Datasets in Materials Science. npj Comput. Mater. 2018, 4 (1), 28–33.
(42) Segler, M. H. S.; Preuss, M.; Waller, M. P. Planning Chemical Syntheses with Deep Neural Networks and Symbolic AI. Nature 2018, 555 (7698), 604–610.
(43) Han, X.; Wang, X.; Zhou, K. Develop Machine Learning Based Predictive Models for Engineering Protein Solubility. arXiv preprint arXiv:1806.11369, 2018.
(44) Skoraczyński, G.; Dittwald, P.; Miasojedow, B.; Szymkuć, S.; Gajewska, E. P.; Grzybowski, B. A.; Gambin, A. Predicting the Outcomes of Organic Reactions via Machine Learning: Are Current Descriptors Sufficient? Sci. Rep. 2017, 7 (1), 1–9.
(45) Goh, G. B.; Hodas, N. O.; Vishnu, A. Deep Learning for Computational Chemistry. J. Comput. Chem. 2017, 38 (16), 1291–1307.
(46) Coley, C. W.; Barzilay, R.; Jaakkola, T. S.; Green, W. H.; Jensen, K. F. Prediction of Organic Reaction Outcomes Using Machine Learning. ACS Cent. Sci. 2017, 3 (5), 434–443.
(47) Belsky, A.; Hellenbrandt, M.; Karen, V. L.; Luksch, P. New Developments in the Inorganic Crystal Structure Database (ICSD): Accessibility in Support of Materials Research and Design. Acta Crystallogr. Sect. B Struct. Sci. 2002, 58, 364–369.


(48) Salman, R.; Kecman, V. Regression as Classification. 2012 Proc. IEEE Southeastcon 2012, 1–6.
(49) Engel, T. Basic Overview of Chemoinformatics. J. Chem. Inf. Model. 2006, 46 (6), 2267–2277.
(50) Todeschini, R.; Consonni, V. Molecular Descriptors for Chemoinformatics: Volume I: Alphabetical Listing/Volume II: Appendices, References; Wiley-VCH Verlag GmbH & Co. KGaA: Weinheim, Germany, 2009.
(51) Todeschini, R.; Consonni, V. Handbook of Molecular Descriptors; Wiley-VCH Verlag GmbH: Weinheim, Germany, 2008.
(52) Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. Ser. B 1996, 267–288.
(53) Kotsiantis, S. B. Supervised Machine Learning: A Review of Classification Techniques. Emerg. Artif. Intell. Appl. Comput. Eng. 2007, 160, 3–24.
(54) Fonti, V. Feature Selection Using LASSO. VU Amsterdam 2017, 1–26.
(55) Ng, A. Y. Feature Selection, L1 vs. L2 Regularization, and Rotational Invariance. Twenty-First Int. Conf. Mach. Learn. - ICML ’04 2004, 78.
(56) Hinze, J.; Jaffé, H. H. Electronegativity. I. Orbital Electronegativity of Neutral Atoms. J. Am. Chem. Soc. 1962, 84 (4), 540–546.
(57) Cao, C.; Liu, L. Topological Steric Effect Index and Its Application. J. Chem. Inf. Comput. Sci. 2004, 44 (2), 678–687.


(58) Balaban, A. T. Highly Discriminating Distance-Based Topological Index. Chem. Phys. Lett. 1982, 89 (5), 399–404.
