
Machine Learning for Understanding Compatibility of Organic−Inorganic Hybrid Perovskites with Post-Treatment Amines

Yongze Yu,† Xuanheng Tan,‡ Shougui Ning,§,† and Yiying Wu*,†

†Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
‡Department of Chemical Physics, School of Chemistry and Material Science, University of Science and Technology of China, Hefei, Anhui 230022, China
§Institute of Laser & Micro/Nano Engineering, College of Electronics & Information Engineering, Sichuan University, Chengdu 610064, China





ABSTRACT: Post-treatment is a facile and effective approach to stabilizing organic−inorganic hybrid perovskites. In this work, we apply machine learning to study the reactivity trends of different types of amines used for the post-treatment of organic−inorganic hybrid perovskite films. Fifty amines are classified based on their compatibility with methylammonium lead iodide films, and machine learning models are constructed from this classification and the amines' molecular descriptor features. The best model achieves 86% accuracy in predicting whether perovskite films are maintained after post-treatment. Analysis of the constructed models shows that amines with fewer hydrogen bond donors and acceptors and more steric bulk (secondary and tertiary amines), as well as pyridine derivatives, tend to have high compatibility with perovskite films.

Inorganic−organic halide perovskites have attracted great attention in photovoltaics1−6 and light-emitting diode devices7,8 due to their remarkable advantages of long charge-carrier diffusion lengths,9−11 widely tunable band gaps with strong light absorption,12,13 and low fabrication cost.14,15 The perovskite solar cell efficiency has rapidly grown to 23.3% in the past few years;16 however, challenges such as operational stability and toxicity have to be overcome before industrial fabrication of these devices can be developed. Various works have documented the instability of perovskites under moisture,17 oxygen,18 and ultraviolet light.19 Among the strategies that have been used to stabilize the material against these factors, especially moisture,20 post-treatment with small molecules by dip-coating or spin-coating is one simple but effective approach.21−28 For example, benzylamine (BA)24 has been used to modify the surface of perovskites to increase moisture stability, and pyridine, acting as a Lewis base, has passivated perovskite and significantly reduced nonradiative electron−hole recombination.21 Therefore, both functionalities could be achieved by exploring more molecules, such as amines. However, we observed experimentally that many other amines directly destroy methylammonium lead iodide (MAPbI3) perovskite films (vide infra). The compatibility between the perovskite films and post-treatment molecules is the prerequisite for the post-treatment techniques. Therefore, identifying rules that can predict the compatibility between a perovskite film and a molecule from its molecular structure is of fundamental and practical interest for the future development of post-treatment materials.

The emergence of machine learning approaches has brought great interest and potential to the chemical sciences.29−46 Machine learning is a powerful tool for finding complicated patterns in high-dimensional spaces; it employs algorithms that learn from empirical data by modeling linear or nonlinear relations between the physicochemical features and target properties of molecules or materials. With machine learning approaches, researchers have made great progress in planning organic synthesis,33,42,44,46 predicting the products of solvothermal synthesis,36 and predicting crystal structures.38,39 With the coming of the big data era, machine learning has shown superior power in handling large data sets up to the order of millions of entries. Current discoveries are mostly based on existing databases such as the Inorganic Crystal Structure Database (ICSD),47 organic reaction databases, etc.

Received: December 16, 2018
Accepted: January 4, 2019
Published: January 4, 2019



Cite This: ACS Energy Lett. 2019, 4, 397−404


Scheme 1. Flowchart of Machine-Learning-Assisted Exploration

Figure 1. (a) UV−vis absorption spectra of the MAPbI3 perovskite film with benzylamine (BA) and triethylamine (TEA) post-treatment, insets: images of films with and without treatment. (b) Residue index extracted from the UV−vis absorption for various amines. (c) Molecular structures with reactive and nonreactive classes labeled with the amine number.

Machine learning, however, is not limited to mining existing literature data; it can also use models to predict results that have not yet been realized experimentally. The methodology can affect how researchers think about new fundamentals and practices and can guide experimentalists in planning and designing their experiments so as to acquire curated data for machine learning. Therefore, the ingenious combination of experimental design and data analysis methods is crucial for applying machine learning to broader scientific fields.

In this work, we present a systematic study, from experimental design to machine learning analysis, of the relations between the physicochemical properties of amines and their reactivity toward the MAPbI3 perovskite. We aim to build a model that can predict the compatibility of untested amines for post-treatment on perovskite films. Moreover, chemical knowledge and insights can be extracted from the trained model, which benefits the future design of molecules for post-treatment methods.

In order to discover which types of amines do not destroy the MAPbI3 perovskite film, supervised learning algorithms are implemented to train the model and make predictions. In a supervised learning task, the algorithm learns a function that maps an input to an output based on example input−output pairs. Therefore, features (input X) and labels (output Y) have to be generated for training. Scheme 1 displays the flowchart of the machine-learning-assisted exploration. Each step is described in detail below.




Figure 2. Correlation matrix of Pearson coefficients between 31 selected features.

Generate Label Y. The output variable in this example is whether the perovskite film is destroyed after the amine post-treatment. Quantifying the film damage is necessary for building the training models. Two examples of UV−vis absorption spectra of the MAPbI3 film before and after treatment are presented in Figure 1a, with BA and triethylamine (TEA) as the treating molecules. After treatment with TEA, the morphology and UV−vis absorption of the film are maintained, whereas the BA-treated film is entirely destroyed and exhibits a flat baseline. The residue index for an amine is calculated as the ratio of the film absorbance after amine treatment to that before treatment. The residue index as a function of wavelength is plotted in Figure S3; this index represents the degree to which the perovskite absorbance is retained after amine treatment, and its average over the 500−700 nm range is reported as the evaluation index. An amine with high reactivity toward the MAPbI3 film damages the film, giving a residue index close to 0, whereas a nonreactive amine preserves the film absorbance, so the residue index approaches 1. The residue indexes for the 50 tested amines are plotted in Figure 1b.

Supervised learning usually involves two types of tasks: regression and classification. Regression tasks involve dependent variables that are continuous or ordered values, whereas classification tasks involve dependent variables that are categorical and unordered. Sometimes the two tasks can be transformed into each other.48 In this work, we focus on the classification task for two reasons: (1) the output is more intuitive, i.e., whether a molecule is reactive or nonreactive, and (2) classification algorithms are more diverse and relatively more developed. Therefore, to obtain categorical labels, we set a threshold on the residue indexes. Amines with a residue index above 0.75 are labeled as the nonreactive class, and those below 0.75 are labeled as the reactive class. The threshold is based on the apparent gap in the residue indexes shown in Figure 1b and is tunable for different intended uses. The amine molecules with their categorical groups are presented in Figure 1c. Nonreactive-class amines can be used directly for further exploration in other tests; reactive-class amines are no longer considered.
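As a concrete illustration of this labeling step, the short sketch below computes the residue index from before/after absorption spectra and applies the 0.75 threshold; the column names and file layout are hypothetical, since the original data-processing script is not part of the paper.

```python
# Minimal sketch of the residue-index labeling, assuming one CSV per amine with
# hypothetical columns: wavelength, abs_before, abs_after (not the authors' layout).
import numpy as np
import pandas as pd

def residue_index(wavelength, abs_before, abs_after, lo=500.0, hi=700.0):
    """Average ratio of film absorbance after/before treatment over lo-hi nm."""
    wavelength = np.asarray(wavelength, dtype=float)
    ratio = np.asarray(abs_after, dtype=float) / np.asarray(abs_before, dtype=float)
    window = (wavelength >= lo) & (wavelength <= hi)
    return float(ratio[window].mean())

def label_from_index(index, threshold=0.75):
    """1 = nonreactive (film retained), 0 = reactive (film destroyed)."""
    return int(index >= threshold)

# spectra = pd.read_csv("amine_01.csv")  # hypothetical file name
# y = label_from_index(residue_index(spectra["wavelength"],
#                                    spectra["abs_before"], spectra["abs_after"]))
```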



Table 1. Training Scores and Test Scores with Different Classifiers

classifier | train score (a) | test score | optimized hyperparameters
Logistic Regression (l1) | 0.97 ± 0.02 | 0.83 ± 0.11 | "C": 10, "penalty": "l1"
Logistic Regression (l2) | 0.93 ± 0.02 | 0.84 ± 0.13 | "C": 1.0, "penalty": "l2"
SVM (rbf) | 0.94 ± 0.02 | 0.86 ± 0.12 | "C": 10.0, gamma = 1/n_feature
LinearSVM (l1) | 0.96 ± 0.01 | 0.83 ± 0.13 | "C": 1.0, "penalty": "l1"
LinearSVM (l2) | 0.93 ± 0.02 | 0.85 ± 0.14 | "C": 0.1, "penalty": "l2"
KNN | 0.91 ± 0.03 | 0.84 ± 0.13 | "n_neighbors": 3
Decision Tree | 0.97 ± 0.02 | 0.81 ± 0.14 | "max_depth": 3
Gaussian Naive Bayes | 0.80 ± 0.04 | 0.74 ± 0.10 | n/a

(a) Scores are reported as averages and standard deviations over 20 random (80%) training/(20%) test splits.

Generate Feature X. With the development of computational chemistry and cheminformatics, the quantitative transformation of molecules into molecular descriptors has become realizable.49,50 Molecular descriptors are the results of a logical and mathematical procedure that transforms chemical information into useful numbers, or the results of standardized experiments.51 These molecular descriptors include not only experimental measurements such as molar refractivity, dipole moment, polarizability, and physicochemical properties in general but also theoretical descriptors derived from symbolic representations, such as the topological Balaban Index (BI), the Wiener Index, the number of hydrogen-bond donor sites, etc. JChem for Office (Excel) was used for chemical database access, structure-based property calculation, search, and reporting. The Charge, Elemental Analysis, Geometry, Hydrogen-Bond Donor−Acceptor, and Topology Analysis plugins were used to calculate 53 features of the studied amines, listed alphabetically in Table S1. Two types of functions were used in these calculations. One is applied directly to the molecule object, for example, molecular polarizability, acceptor count, BI, etc. The other involves both the molecule object and a specific atom, in this case nitrogen, and reports properties of that atom, for example, the σ orbital electronegativity (SOE) on the N atom. These features are intended to capture as much information as possible, without human-biased screening, at acceptable computational cost. They serve as candidates for feature selection in building the supervised learning model.

Understanding the structure of the data set helps accelerate the machine learning process. Because the features are generated by enumerating molecular descriptors from the database, some of them are strongly correlated. For example, if the N atom is a ring atom, it cannot also be a chain atom, and the acceptor site count (ASC) is highly correlated with the acceptor count. Such highly correlated features contribute little to the learning process and can be treated as duplicates; removing them reduces the redundancy of the input feature space. Pearson correlation coefficients (PCCs)52 were calculated for each pair of features (Figure S4), and features with an absolute PCC greater than 0.95 were grouped together. The correlation-filtered groups are listed in Table S2, and only one feature is retained from each group. Twenty-two features are removed by the correlation filter, and the remaining 31 features are used for training. The PCCs of the remaining selected features are plotted in Figure 2.
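A compact version of this correlation filter is sketched below; it assumes the 53 descriptors sit in a pandas DataFrame (columns named as in Table S1) and illustrates the procedure rather than reproducing the authors' exact grouping.

```python
# Minimal sketch of the Pearson-correlation filter: keep one feature from each
# group whose pairwise |r| exceeds the cutoff (0.95 in this work).
import pandas as pd

def drop_correlated(X_raw: pd.DataFrame, cutoff: float = 0.95) -> pd.DataFrame:
    corr = X_raw.corr().abs()          # absolute Pearson correlation matrix
    kept = []
    for col in corr.columns:
        # keep `col` only if it is not highly correlated with a feature already kept
        if all(corr.loc[col, k] <= cutoff for k in kept):
            kept.append(col)
    return X_raw[kept]

# X = drop_correlated(X_raw)           # 53 descriptors -> 31 retained in this work
```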

Data Training. Data training is the core of the machine learning analysis and directly reflects how well the mathematical representations relate to real-world attributes. The central idea of supervised machine learning is to minimize the cost function between the predicted targets and the observed data via parameter tuning. The entire data set is split into two subsets, a training set and a test set, in an 80:20 ratio. The training set is used for model training, and the test set is used for evaluating the prediction ability of the model on unseen data. Many popular machine learning algorithms exist for classification tasks, such as Logistic Regression, Decision Tree, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Naive Bayes, Neural Networks, etc.53 Different algorithms have different specialties, advantages, and drawbacks. The training and test results are listed in Table 1, and the training functions are described in the Supporting Information (SI). The best test score, 0.86 ± 0.12, is achieved using SVM with a Radial Basis Function (rbf) kernel. Other models, such as Logistic Regression (l1 and l2), SVM with a linear kernel (l1 and l2), KNN, and Decision Tree, give quite similar test scores of 0.83 ± 0.13, 0.84 ± 0.13, 0.83 ± 0.13, 0.85 ± 0.13, 0.84 ± 0.13, and 0.81 ± 0.14, respectively. Gaussian Naive Bayes, however, reaches only about 0.74 ± 0.10 prediction accuracy.

Evaluation Metrics. Several metrics are available for quantitatively evaluating the performance of machine learning models, such as classification accuracy, area under the curve (AUC), F1 score, the confusion matrix, etc. In this work, we use classification accuracy, defined as the number of correct predictions divided by the total number of predictions, for the two-class classification task. We also plot the confusion matrix, which presents the total counts of the predicted classes versus the true classes over 20 random training/test splits. As shown in Figure 3, we achieve 86% accuracy using SVM with the rbf kernel. Precision also plays an important role when the model is deployed for production. Precision is defined as the number of true positives (TP, correctly predicted nonreactive amines) divided by the total number of positive predictions (TP plus false positives, FP). In the deployment stage, the predicted nonreactive class is used for further investigation; therefore, higher precision reduces the cost of misprediction by the constructed model. In this case, we achieve 92% precision.

Model Deployment. The SVM model is chosen for future prediction on new, unseen amine molecules. Here, we demonstrate how to employ the model for production. The model is used to predict the outcomes for amines that were not seen in the previous model training and evaluation stages. Five amines are selected without bias to test the model. The molecular descriptors are generated for each molecule, and the probability of the positive label and the predicted label are calculated from the model. The results are displayed in Figure 4 and Table 2.
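A minimal sketch of the training and evaluation loop described above (20 random 80/20 splits, accuracy, confusion matrix, and precision for the rbf-kernel SVM) is given below. The hyperparameters follow Table 1; the feature standardization and other details are assumptions, since the exact training functions are given in the SI.

```python
# Sketch only: X is the 31-descriptor feature matrix, y the 0/1 labels
# (1 = nonreactive). Standardizing the descriptors is an assumption here.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

def train_and_evaluate(X, y, n_splits=20):
    scores, pooled_cm = [], np.zeros((2, 2), dtype=int)
    for seed in range(n_splits):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=seed)
        model = make_pipeline(
            StandardScaler(),
            SVC(kernel="rbf", C=10.0, gamma="auto",  # gamma = 1/n_features, as in Table 1
                probability=True))                   # enables predict_proba for deployment
        model.fit(X_tr, y_tr)
        scores.append(model.score(X_te, y_te))       # test accuracy for this split
        pooled_cm += confusion_matrix(y_te, model.predict(X_te), labels=[0, 1])
    tn, fp, fn, tp = pooled_cm.ravel()
    print(f"test accuracy: {np.mean(scores):.2f} +/- {np.std(scores):.2f}")
    print(f"precision for the nonreactive class: {tp / (tp + fp):.2f}")
    print("pooled confusion matrix (rows = true, cols = predicted):\n", pooled_cm)
    return model   # last fitted pipeline, reusable for new amines
```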




All of the outcomes of the tested amines match up with the predicted results. This test demonstrates the generalizability of the model in predicting the compatibility of amine-type post-treatment molecules with MAPbI3 films.
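For the deployment step, a sketch along these lines scores previously unseen amines with a fitted pipeline (the `model` returned in the sketch above) and reports the probability of the positive (nonreactive) class, as tabulated in Table 2. Descriptor generation itself (JChem in this work) is assumed to have been done separately, and `X_new` is a hypothetical frame holding the same 31 descriptor columns used in training.

```python
# Sketch of model deployment on unseen amines; names and variables are illustrative.
import pandas as pd

def score_new_amines(model, X_new: pd.DataFrame, names) -> pd.DataFrame:
    proba = model.predict_proba(X_new)[:, 1]        # P(nonreactive), cf. Table 2
    return pd.DataFrame({
        "amine_name": names,
        "probability": proba.round(3),
        "predict_label": model.predict(X_new),      # 1 = nonreactive, 0 = reactive
    })

# score_new_amines(model, X_new, ["3-acetylpyridine", "cyclooctylamine"])
```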

Figure 3. Confusion matrix between the true label and predicted label using SVM. The number in each section indicates the total counts over 20 splits of the tests.

Feature Selection and Filtration. Besides building a model with good prediction ability on unseen data, understanding and interpreting the model is another goal of machine-learning-assisted discovery. Even though many features are generated as input variables, feature importance can be extracted and the most influential features can be selected from the learned models. Feature selection is a frequently discussed topic in machine learning and statistics; it is the process of selecting a subset of the most influential features for use in model construction. Least Absolute Shrinkage and Selection Operator (LASSO) algorithms penalize the regression coefficients with an l1 norm penalty, shrinking many of them to zero.52 LASSO is an automatic algorithm that performs both regularization and feature selection; it yields sparse coefficients by eliminating irrelevant variables and selecting one variable from each group of correlated variables.54 Any feature with a nonzero regression coefficient is "selected" by the LASSO algorithm. We implement LASSO on the Logistic Regression model and plot the weights/coefficients in Figure 5.
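A sketch of this l1-based selection is given below, assuming the same standardized 31-descriptor matrix; C follows the Logistic Regression (l1) entry in Table 1, and the helper name is illustrative.

```python
# Sketch of LASSO-style feature selection with an l1-penalized Logistic Regression.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

def l1_selected_features(X: pd.DataFrame, y, C=10.0) -> pd.Series:
    X_std = StandardScaler().fit_transform(X)
    clf = LogisticRegression(penalty="l1", C=C, solver="liblinear")
    clf.fit(X_std, y)
    coefs = pd.Series(clf.coef_.ravel(), index=X.columns)
    return coefs[coefs != 0].sort_values()   # nonzero weights; sign and size rank importance

# selected = l1_selected_features(X, y)
# Positive weights favor the nonreactive class, negative weights the reactive class.
```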

Figure 5. Coefficients fitted from the Logistic Regression on l1 penalty (LASSO).

Specifically, the 10 nonzero coefficients are obtained for the nonreactive/reactive classification. The amplitude and sign of these coefficients are used to rank feature importance. Compared to the l1 penalty, the l2 penalty is another popular regularization term in supervised learning.55 l2 regularization encourages the sum of the squares of the parameters to be small; it does not shrink coefficients to zero but instead penalizes larger weights more strongly. Coefficients obtained with l2 regularization in the Logistic Regression model are plotted in Figure S8. These coefficients can also be used to discover feature/label correlations.

Learning from the Machine-Learned Model: Chemical Insights. The hidden trends uncovered by the machine learning analysis can provide chemical insights that assist researchers in designing more complicated systems in the future. As shown in Figures 5 and S8, the weight or coefficient for each feature can be calculated. In logistic regression, the probability of predicting one specific class y can be expressed as

P(y|X) = 1 / (1 + e^(−wX))

where w is the weight/coefficient matrix, X is the input feature matrix, and y is the studied class, i.e., the nonreactive class in this work.
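As a small numeric illustration of this expression (with made-up weights, not the fitted ones), a positive product w_i x_i pushes the predicted probability of the nonreactive class toward 1, while a negative one pushes it toward 0:

```python
# Toy illustration of the logistic expression above; the weights are invented.
import numpy as np

def p_nonreactive(w, x):
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

w = np.array([1.2, -0.8])                       # e.g. [SOE weight, ASC weight], hypothetical
print(p_nonreactive(w, np.array([1.0, 0.0])))   # high SOE            -> ~0.77 (nonreactive-leaning)
print(p_nonreactive(w, np.array([0.0, 1.5])))   # many acceptor sites -> ~0.23 (reactive-leaning)
```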

Figure 4. Residue indexes of unseen test amines.

Table 2. Predicted and Test Results of Unseen Test Amine Molecules

amine_name | probability (a) | predict_label (b) | test_label
3-acetylpyridine | 0.968 | 1 | 1
1-aminopyrene | 0.937 | 1 | 1
bis(dimethylamino)methane | 0.669 | 1 | 1
3-aminoheptane | 0.267 | 0 | 0
cyclooctylamine | 0.170 | 0 | 0

(a) Probability is generated by the "predict_proba" function in the sklearn package; it represents the probability of the positive (nonreactive) class. (b) The nonreactive class is labeled as 1, and the reactive class is labeled as 0.




Therefore, positively weighted features make a positive contribution to the nonreactive class. By screening the coefficients of the classification model, we can gain chemical insight into the features of the amines and how these features affect an amine's reactivity with MAPbI3 perovskite films. For example, in Figure 5, the SOE has a large positive weight; this means that amines with a large SOE value on the nitrogen atom are less likely to be reactive. Electronegativity is a measure of the power of a chemically bonded atom to attract electrons to itself and can be defined in this way only for bonding orbitals.56 As the s character of a hybrid orbital increases, so does the apparent electronegativity of the atom that hosts this hybrid orbital. Therefore, a double-bonded nitrogen with sp2 hybridization has a larger SOE than a single-bonded nitrogen with sp3 hybridization. For example, pyridine and piperidine, which both contain six-membered rings, have SOE values of 10.09 and 8.25, respectively; the former belongs to the nonreactive class, whereas the latter belongs to the reactive class. This indicates that pyridine derivatives are less likely to be reactive with the MAPbI3 perovskite.

The ASC specifically counts the hydrogen bond acceptor sites and correlates strongly with the number of lone electron pairs within the molecule; typically, more nitrogen and oxygen atoms result in a larger ASC. Its positive weight for the reactive class and negative weight for the nonreactive class clearly show that more hydrogen bonding sites increase the probability of being reactive. For example, triethanolamine, which contains the TEA backbone with three additional hydroxyl groups, is classified in the reactive class because of the dramatic increase in hydrogen bond donor and acceptor sites.

The Steric Effect Index (SEI) describes the steric hindrance around the studied nitrogen atom, calculated from covalent radii and topological distances.57 The weights for the SEI (Figure S8) indicate that amines with a larger number of connections and bulkier functional groups on the nitrogen are less likely to be reactive. For instance, diisopropylamine and hexylamine are constitutional isomers, both containing six carbons and one nitrogen, that belong to the nonreactive and reactive classes, respectively; the former's SEI is 2.5, whereas the latter's is 1.9. In other words, primary amines are more reactive than secondary and tertiary amines.

The Distance Degree (DD) is defined as the sum of the corresponding row values in the distance matrix for each atom;50 it measures the sum of the bond distances from the nitrogen atom to all other atoms. For instance, dodecylamine and hexylamine both contain an alkyl chain with a terminal amino group and belong to the nonreactive and reactive classes, respectively. The former has a DD of 78 because more atoms lie farther from the nitrogen atom, whereas the latter has a much smaller DD of 21. This parameter indicates that large molecules tend to fall in the nonreactive class.

Besides the easily interpreted physicochemical properties above, some more complicated topological molecular descriptors stand out from the analysis.
For example, the BI, proposed by Balaban in 1982, is a topological index for a molecule based on distance sums as graph invariants, with consideration of bonding degeneracies.58 Generally, larger molecules have a greater BI than smaller molecules, branched molecules have a greater BI than chain-like molecules, and molecules with multiple bonds have a greater BI than single-bonded ones. Even though rule-based descriptions do not fully capture the meaning of these high-level indexes, we observe experimentally that amines with a larger BI tend to be in the nonreactive class.

Although there are more features that we have not discussed here, we can provide a short summary generated from the machine-learned model via weight analysis of the Logistic Regression model. Amines that have low reactivity with the MAPbI3 perovskite film are suggested to possess, but are not limited to, the following properties:

(1) Large steric hindrance around the nitrogen
(2) Fewer hydrogen bond acceptors and donors
(3) A greater number of substituents on the nitrogen atom
(4) Multiple bonding on nitrogen, such as in pyridine derivatives
(5) More branched rather than chain-like isomers

In conclusion, we demonstrated that machine learning techniques can be applied to assist materials development. We studied the compatibility of the MAPbI3 perovskite film with various types of post-treatment amines. Amines with fewer hydrogen bond donors and acceptors, more steric bulk, a greater number of substituents on the nitrogen atom, and pyridine derivatives tend to have high compatibility with perovskite films. The compatible and reactive molecules studied in this work suggest that small primary amines with multiple hydrogen-bonding sites tend to destroy the perovskite during post-treatment and should therefore be used with special care in later solution fabrication processes. The compatible molecules will be further examined in the future to improve the moisture stability and electron−hole lifetimes of perovskite films. Unlike other machine learning work focused on mining existing databases, we show that the data-driven learning process can also guide researchers to design new experiments from scratch, providing curated data that is more amenable to machine learning. Chemical knowledge and insight can be learned from the model, even with a relatively small data set, through careful analysis. Furthermore, we make the data set publicly available in the Supporting Information files, and we hope that more convenient and effective models will be discovered in future research.



ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acsenergylett.8b02451.



Experimental methods and data analysis, X-ray diffraction pattern, Pearson correlation of all generated features, grouped correlated features, cross-validation curve and learning curve of various estimators, coefficients for the Logistic Regression model with l2 penalty, and machine learning algorithm functions with the Scikit-learn package (PDF) Entire data set including generated features and tested labels (ZIP)

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Fax: +1-614-292-1685. Tel.: +1-614-247-7810.



ORCID
Yongze Yu: 0000-0001-8861-5322
Yiying Wu: 0000-0001-9359-1863

Notes
The authors declare no competing financial interest.




ACKNOWLEDGMENTS
We acknowledge funding support from the U.S. Department of Energy (Award No. DE-FG02-07ER46427). Y.Y. thanks Allison Curtze for suggestions on the manuscript.



REFERENCES

(1) Correa-Baena, J.-P.; Saliba, M.; Buonassisi, T.; Grätzel, M.; Abate, A.; Tress, W.; Hagfeldt, A. Promises and Challenges of Perovskite Solar Cells. Science 2017, 358 (6364), 739−744. (2) Chen, H.; Ye, F.; Tang, W.; He, J.; Yin, M.; Wang, Y.; Xie, F.; Bi, E.; Yang, X.; Grätzel, M.; et al. A Solvent-and Vacuum-Free Route to Large-Area Perovskite Films for Efficient Solar Modules. Nature 2017, 550 (7674), 92−95. (3) Liu, M.; Johnston, M. B.; Snaith, H. J. Efficient Planar Heterojunction Perovskite Solar Cells by Vapour Deposition. Nature 2013, 501 (7467), 395−398. (4) Jeon, N. J.; Noh, J. H.; Yang, W. S.; Kim, Y. C.; Ryu, S.; Seo, J.; Seok, S. Il Compositional Engineering of Perovskite Materials for High-Performance Solar Cells. Nature 2015, 517 (7535), 476−480. (5) Tsai, H.; Nie, W.; Blancon, J. C.; Stoumpos, C. C.; Asadpour, R.; Harutyunyan, B.; Neukirch, A. J.; Verduzco, R.; Crochet, J. J.; Tretiak, S.; et al. High-Efficiency Two-Dimensional Ruddlesden-Popper Perovskite Solar Cells. Nature 2016, 536 (7616), 312−317. (6) Yang, S.; Fu, W.; Zhang, Z.; Chen, H.; Li, C. Z. Recent Advances in Perovskite Solar Cells: Efficiency, Stability and Lead-Free Perovskite. J. Mater. Chem. A 2017, 5 (23), 11462−11482. (7) Kumawat, N. K.; Gupta, D.; Kabra, D. Recent Advances in Metal Halide-Based Perovskite Light-Emitting Diodes. Energy Technol. 2017, 5 (10), 1734−1749. (8) Sun, J.; Wu, J.; Tong, X.; Lin, F.; Wang, Y.; Wang, Z. M. Organic/Inorganic Metal Halide Perovskite Optoelectronic Devices beyond Solar Cells. Adv. Sci. 2018, 5 (5), 1700780. (9) Dong, Q.; Fang, Y.; Shao, Y.; Mulligan, P.; Qiu, J.; Cao, L.; Huang, J. Electron-Hole Diffusion Lengths > 175 μm in SolutionGrown CH3NH3PbI3 Single Crystals. Science 2015, 347 (6225), 967− 970. (10) Xing, G.; Mathews, N.; Sun, S.; Lim, S. S.; Lam, Y. M.; Grätzel, M.; Mhaisalkar, S.; Sum, T. C. Long-Range Balanced Electron-and Hole-Transport Lengths in Organic-Inorganic CH3NH3PbI3. Science 2013, 342 (6156), 344−347. (11) Stranks, S. D.; Stranks, S. D.; Eperon, G. E.; Grancini, G.; Menelaou, C.; Alcocer, M. J. P.; Leijtens, T.; Herz, L. M.; Petrozza, A.; Snaith, H. J. Electron-Hole Diffusion Lengths Exceeding 1 Micrometer in an Organometal Trihalide Perovskite Absorber. Science 2013, 342 (2013), 341−344. (12) Adjokatse, S.; Fang, H. H.; Loi, M. A. Broadly Tunable Metal Halide Perovskites for Solid-State Light-Emission Applications. Mater. Today 2017, 20 (8), 413−424. (13) Kovalenko, M. V.; Protesescu, L.; Bodnarchuk, M. I. Properties and Potential Optoelectronic Applications of Lead Halide Perovskite Nanocrystals. Science 2017, 358 (6364), 745−750. (14) Li, Z.; Zhao, Y.; Wang, X.; Sun, Y.; Zhao, Z.; Li, Y.; Zhou, H.; Chen, Q. Cost Analysis of Perovskite Tandem Photovoltaics. Joule 2018, 2, 1559. (15) Chang, N. L.; Ho-Baillie, A. W. Y.; Vak, D.; Gao, M.; Green, M. A.; Egan, R. J. Manufacturing Cost and Market Potential Analysis of Demonstrated Roll-to-Roll Perovskite Photovoltaic Cell Processes. Sol. Energy Mater. Sol. Cells 2018, 174, 314−324. (16) National Renewable Energy Laboratory (NREL). https://www. nrel.gov/pv/assets/images/efficiency-chart-20180716.jpg (accessed July 2018). 403

(17) Christians, J. A.; Miranda Herrera, P. A.; Kamat, P. V. Transformation of the Excited State and Photovoltaic Efficiency of CH3NH3PbI3 Perovskite upon Controlled Exposure to Humidified Air. J. Am. Chem. Soc. 2015, 137 (4), 1530−1538. (18) Niu, G.; Li, W.; Meng, F.; Wang, L.; Dong, H.; Qiu, Y. Study on the Stability of CH3NH3PbI3 Films and the Effect of Post-Modification by Aluminum Oxide in All-Solid-State Hybrid Solar Cells. J. Mater. Chem. A 2014, 2 (3), 705. (19) Li, W.; Zhang, W.; Van Reenen, S.; Sutton, R. J.; Fan, J.; Haghighirad, A. A.; Johnston, M. B.; Wang, L.; Snaith, H. J. Enhanced UV-Light Stability of Planar Heterojunction Perovskite Solar Cells with Caesium Bromide Interface Modification. Energy Environ. Sci. 2016, 9 (2), 490−498. (20) Li, F.; Liu, M. Recent Efficient Strategies for Improving the Moisture Stability of Perovskite Solar Cells. J. Mater. Chem. A 2017, 5 (30), 15447−15459. (21) Noel, N. K.; Abate, A.; Stranks, S. D.; Parrott, E. S.; Burlakov, V. M.; Goriely, A.; Snaith, H. J. Enhanced Photoluminescence and Solar Cell Performance via Lewis Base Passivation of Organic-Inorganic Lead Halide Perovskites. ACS Nano 2014, 8 (10), 9815−9821. (22) Zhang, J.; Hu, Z.; Huang, L.; Yue, G.; Liu, J.; Lu, X.; Hu, Z.; Shang, M.; Han, L.; Zhu, Y. Bifunctional Alkyl Chain Barriers for Efficient Perovskite Solar Cells. Chem. Commun. 2015, 51 (32), 7047−7050. (23) Tripathi, N.; Shirai, Y.; Yanagida, M.; Karen, A.; Miyano, K. Novel Surface Passivation Technique for Low-Temperature Solution-Processed Perovskite PV Cells. ACS Appl. Mater. Interfaces 2016, 8 (7), 4644−4650. (24) Wang, F.; Geng, W.; Zhou, Y.; Fang, H. H.; Tong, C. J.; Loi, M. A.; Liu, L. M.; Zhao, N. Phenylalkylamine Passivation of Organolead Halide Perovskites Enabling High-Efficiency and Air-Stable Photovoltaic Cells. Adv. Mater. 2016, 28 (45), 9986−9992. (25) Cao, J.; Yin, J.; Yuan, S.; Zhao, Y.; Li, J.; Zheng, N. Thiols as Interfacial Modifiers to Enhance the Performance and Stability of Perovskite Solar Cells. Nanoscale 2015, 7 (21), 9443−9447. (26) Yang, S.; Wang, Y.; Liu, P.; Cheng, Y. B.; Zhao, H. J.; Yang, H. G. Functionalization of Perovskite Thin Films with Moisture-Tolerant Molecules. Nat. Energy 2016, 1 (2), 15016. (27) Zhang, H.; Ren, X.; Chen, X.; Mao, J.; Cheng, J.; Zhao, Y.; Liu, Y.; Milic, J.; Yin, W.-J.; Grätzel, M.; et al. Improving the Stability and Performance of Perovskite Solar Cells via Off-the-Shelf Post-Device Ligand Treatment. Energy Environ. Sci. 2018, 11, 2253−2262. (28) Yoo, H. S.; Park, N. G. Post-Treatment of Perovskite Film with Phenylalkylammonium Iodide for Hysteresis-Less Perovskite Solar Cells. Sol. Energy Mater. Sol. Cells 2018, 179, 57−65. (29) Butler, K. T.; Davies, D. W.; Cartwright, H.; Isayev, O.; Walsh, A. Machine Learning for Molecular and Materials Science. Nature 2018, 559 (7715), 547−555. (30) Coley, C. W.; Green, W. H.; Jensen, K. F. Machine Learning in Computer-Aided Synthesis Planning. Acc. Chem. Res. 2018, 51 (5), 1281−1289. (31) Pankajakshan, P.; Sanyal, S.; De Noord, O. E.; Bhattacharya, I.; Bhattacharyya, A.; Waghmare, U. Machine Learning and Statistical Analysis for Materials Science: Stability and Transferability of Fingerprint Descriptors and Chemical Insights. Chem. Mater. 2017, 29 (10), 4190−4201. (32) Sun, Y. T.; Bai, H. Y.; Li, M. Z.; Wang, W. H. Machine Learning Approach for Prediction and Understanding of Glass-Forming Ability. J. Phys. Chem. Lett. 2017, 8 (14), 3434−3439. (33) Zhou, Z.; Li, X.; Zare, R. N. Optimizing Chemical Reactions with Deep Reinforcement Learning. ACS Cent. Sci. 2017, 3 (12), 1337−1344. (34) Liu, Y.; Zhao, T.; Ju, W.; Shi, S. Materials Discovery and Design Using Machine Learning. J. Mater. 2017, 3 (3), 159−177. (35) Ramprasad, R.; Batra, R.; Pilania, G.; Mannodi-Kanakkithodi, A.; Kim, C. Machine Learning in Materials Informatics: Recent Applications and Prospects. npj Comput. Mater. 2017, 3 (1), 54.

ACS Energy Letters (36) Raccuglia, P.; Elbert, K. C.; Adler, P. D. F.; Falk, C.; Wenny, M. B.; Mollo, A.; Zeller, M.; Friedler, S. A.; Schrier, J.; Norquist, A. J. Machine-Learning-Assisted Materials Discovery Using Failed Experiments. Nature 2016, 533 (7601), 73−76. (37) Sumita, M.; Yang, X.; Ishihara, S.; Tamura, R.; Tsuda, K. Hunting for Organic Molecules with Artificial Intelligence: Molecules Optimized for Desired Excitation Energies. ACS Cent. Sci. 2018, 4 (9), 1126−1133. (38) Ye, W.; Chen, C.; Wang, Z.; Chu, I.-H.; Ong, S. P. Deep Neural Networks for Accurate Predictions of Crystal Stability. Nat. Commun. 2018, 9 (1), 3800. (39) Ryan, K.; Lengyel, J.; Shatruk, M. Crystal Structure Prediction via Deep Learning. J. Am. Chem. Soc. 2018, 140 (32), 10158−10168. (40) Janet, J. P.; Chan, L.; Kulik, H. J. Accelerating Chemical Discovery with Machine Learning: Simulated Evolution of Spin Crossover Complexes with an Artificial Neural Network. J. Phys. Chem. Lett. 2018, 9 (5), 1064−1071. (41) Zhang, Y.; Ling, C. A Strategy to Apply Machine Learning to Small Datasets in Materials Science. npj Comput. Mater. 2018, 4 (1), 28−33. (42) Segler, M. H. S.; Preuss, M.; Waller, M. P. Planning Chemical Syntheses with Deep Neural Networks and Symbolic AI. Nature 2018, 555 (7698), 604−610. (43) Han, X.; Wang, X.; Zhou, K. Develop Machine Learning Based Predictive Models for Engineering Protein Solubility. arXiv Prepr. arXiv1806.11369 2018. (44) Skoraczyñski, G.; DIttwald, P.; Miasojedow, B.; Szymkuc, S.; Gajewska, E. P.; Grzybowski, B. A.; Gambin, A. Predicting the Outcomes of Organic Reactions via Machine Learning: Are Current Descriptors Sufficient? Sci. Rep. 2017, 7 (1), 1−9. (45) Goh, G. B.; Hodas, N. O.; Vishnu, A. Deep Learning for Computational Chemistry. J. Comput. Chem. 2017, 38 (16), 1291− 1307. (46) Coley, C. W.; Barzilay, R.; Jaakkola, T. S.; Green, W. H.; Jensen, K. F. Prediction of Organic Reaction Outcomes Using Machine Learning. ACS Cent. Sci. 2017, 3 (5), 434−443. (47) Belsky, A.; Hellenbrandt, M.; Karen, V. L.; Luksch, P. New Developments in the Inorganic Crystal Structure Database (ICSD): Accessibility in Support of Materials Research and Design. Acta Crystallogr., Sect. B: Struct. Sci. 2002, 58, 364−369. (48) Salman, R.; Kecman, V. Regression as Classification. 2012 Proc. IEEE Southeastcon; 2012; pp 1−6. (49) Engel, T. Basic Overview of Chemoinformatics. J. Chem. Inf. Model. 2006, 46 (6), 2267−2277. (50) Todeschini, R.; Consonni, V. Molecular Descriptors for Chemoinformatics: Vol. I: Alphabetical Listing/Vol. II: Appendices, References; Wiley-VCH Verlag GmbH & Co. KGaA: Weinheim, Germany, 2009. (51) Todeschini, R.; Consonni, V. Handbook of Molecular Descriptors; WILEY-VCH Verlag GmbH: Weinheim, Germany, 2008. (52) Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267−288. (53) Kotsiantis, S. B. Supervised Machine Learning: A Review of Classification Techniques. Proceedings of the 2007 Conference on Emerging Artificial Intelligence Applications in Computer Engineering: Real Word AI Systems with Applications in eHealth, HCI, Information Retrieval and Pervasive Technologies 2007, 160, 3−24. (54) Fonti, V. Feature Selection Using LASSO; VU Amsterdam, 2017; pp 1−26. (55) Ng, A. Y. Feature Selection, L1 vs. L2 Regularization, and Rotational Invariance. Twenty-First Int. Conf. Mach. Learn. - ICML ’04 2004, 78. (56) Hinze, J.; Jaffé, H. H. Electronegativity. I. Orbital Electronegativity of Neutral Atoms. J. Am. Chem. Soc. 1962, 84 (4), 540− 546. 
(57) Cao, C.; Liu, L. Topological Steric Effect Index and Its Application. J. Chem. Inf. Comput. Sci. 2004, 44 (2), 678−687. (58) Balaban, A. T. Highly Discriminating Distance-Based Topological Index. Chem. Phys. Lett. 1982, 89 (5), 399−404.
