Applying Bayesian Approach to Combinatorial Problem in Chemistry

Apr 18, 2017 - ABSTRACT: A Bayesian optimization procedure, in combination with density functional theory calculations, was applied to a combinatorial...
2 downloads 0 Views 3MB Size
Article pubs.acs.org/JPCA

Applying Bayesian Approach to Combinatorial Problem in Chemistry Yasuharu Okamoto* The IoT Devices Research Laboratories, NEC Corporation, 34 Miyukigaoka, Tsukuba, Ibaraki 305-8501 Japan ABSTRACT: A Bayesian optimization procedure, in combination with density functional theory calculations, was applied to a combinatorial problem in chemistry. As a specific example, we examined the stable structures of lithium−graphite intercalation compounds (Li−GICs). We found that this approach efficiently identified the stable structure of stage-I and -II Li−GICs by calculating 4−6% of the full search space. We expect that this approach will be helpful in solving problems in chemistry that can be regarded as a kind of combinatorial problem.



INTRODUCTION

such as the least absolute shrinkage and selection operator (LASSO) seem to become more and more important for automating the determination of explanatory variables or features from among a huge variable space without requiring specialized knowledge of the relevant materials. There have been a number of studies that were aimed at accelerating materials research, especially by using LASSO. These studies include searching for new structure classifiers of binary semiconductors,8 predicting the band gaps of chalcopyrite compounds14 and double perovskites,15 and modeling dielectric breakdown.16 It seems to be reasonable to assume that an increasing number of variables will be used in future studies to improve the power of the prediction of the regression model with the help of the dimensional compression technique used in sparse modeling. In an extreme case, anything will do for explanatory variables because sparse modeling selects suitable ones from a huge variable space. In this study, a somewhat different approach was examined: We show that the features of the regression model are almost uniquely determined when we deal with a problem that can be regarded as a kind of combinatorial problem. Determining stable crystal structures or stable adsorption structures seems to be a type of combinatorial problem for which the ML approach works well. This is especially true of the series of crystals obtained by topotactic reactions that conserve the host crystal structure. Lithium−graphite intercalation compounds (Li− GICs), where Li atoms are intercalated into graphite, are one of the best-known examples of the topotactic reaction. Li− GICs are industrially important because they are the most popular anode active materials of Li-ion batteries. In Li−GICs, Li atoms occupy the hollow site of carbon honeycombs and form a √3 × √3 superlattice structure in the interlayer of graphite.17 The nth stage Li−GIC corresponds to the structure that forms once every n interlayer of graphite is occupied by Li atoms, which forms the superlattice, and the remaining (n − 1)

The recent progress and prevalence of the machine learning (ML) approach in addition to sophisticated data science technology has had an impact on various fields of science. In fear of missing out on the wave of new technology, many researchers have managed to take advantage of the fruits of advanced data science in their studies. This is true of materials science, and the number of studies categorized into materials informatics (MI) has increased considerably in recent years.1−4 MI is a data-driven approach in materials science, and it is expected to become the fourth approach of chemical research after experimentation, theory, and computer simulation. It seems that applying data science to chemical problem falls roughly into two categories. The one has always been called as chemoinformatics where researchers elaborately design the features of machine learning based on the already known knowledge of chemistry. The other is a more data-centric approach where researchers extract useful information from big data by using statistical techniques that are suitable for handling a huge amount of data. As compared with other major fields of informatics such as bioinformatics or image recognition, MI might suffer from a relatively low availability of appropriate material data. The paucity of available data in MI seems to have an influence on its research styles in a way; deep learning (DL) that currently sweeps big data analysis does not necessarily have an overwhelming advantage in comparison with other conventional statistical methods such as regression analysis or support vector machine due to the small size of the available data. The former style of research, in particular combined with density functional theory (DFT) calculations to generate the explained and/or explanatory variables of a regression model, is more dominant than the latter style of research as it is now.5−13 However, the latter might become widely used in future because designing suitable features is sometimes too tedious and difficult in complicated problems. Actually, such a transition of research style was observed in the field of image classification in data science. Considering that labor-saving is a strong motivating factor in the use of information technology, sparse modeling methods © 2017 American Chemical Society

Received: February 20, 2017 Revised: April 14, 2017 Published: April 18, 2017 3299

DOI: 10.1021/acs.jpca.7b01629 J. Phys. Chem. A 2017, 121, 3299−3304

Article

The Journal of Physical Chemistry A

pseudopotential were used for C (C.pbe-rrkjus.UPF) and Li (Li.pbe-mt_fhi.UPF), respectively. Note that the pseudopotential of Li includes nonlinear core correction.22 Plane-wave basis sets with cutoff energies of 30 and 300 Ry were, respectively, used for the expansion of wave functions and the charge density. k-Point samplings (3 × 3 × 3) with Methfessel−Paxton smearing were used for Brillouin zone integration.

interlayers are empty. Thus, the chemical formula of stage-I and -II Li−GICs is represented as LiC 6 and (LiC 6 )(C 6 ), respectively. The characteristic structure of Li−GIC is suitable as a test bed for the ML approach because we can easily measure the success and failure of the approach. Moreover, it is natural to consider the presence or absence of Li in hollow sites as the explanatory variables that can be directly related to the stage structure of Li−GICs. It is worth noting that searching for the stable structures of Li intercalation to graphite becomes a combinatorial problem when we assume that the presence or absence of Li respectively corresponds to a value of 0 or 1 of explanatory variables.



RESULTS AND DISCUSSION First, we examined the stable structure of the stage-I Li−GIC. A computational cell with a periodicity of 2√3 × 2√3 × 1 was



COMPUTATIONAL METHODS In this study, we used a Bayesian optimization procedure by combining it with DFT calculation. Figure 1 illustrates this

Figure 2. Computational cell (2√3 × 2√3) of carbon plane of graphite and 12 hollow sites for Li intercalation.

Table 1. Coefficient of Determination (R2) of Training Data Set with Different Hyperparameters (σ and λ) Values

Figure 1. Schematic illustration of Bayesian optimization procedure used in this study.

R2

hyperparameters σ σ σ σ

procedure. Although the phrases “data-driven approach” or “inverse problem” are sometimes emphasized to explain the characteristics of MI, in fact, it is much easier to treat an optimization problem directly by emulating the full search space by constructing a suitable regression model as follows. First, DFT calculations are done with the structure of training data to calculate the relevant property, i.e., the objective function. Note that the change in Gibbs energy before and after Li intercalation into graphite was used as the objective function. Second, the calculated results are fitted to a regression model by using the kernel ridge method with a Gaussian process.18 Third, the interim optimal value of the full search space (except for the region that is already included in the training data) is determined by the regression model. Fourth, DFT calculation is done with the optimal structure, and the result is added into the training data set. Then, the regression model is reconstructed on the basis of the updated training data set. Steps three and four are repeated until the true optimal value is reached. All our first-principles calculations were done by using the Quantum ESPRESSO (ver. 5.0.2) program package.19 Calculations of the electronic structure were made in accordance with the DFT framework under periodic boundary conditions with the vdW-DF2-C09 exchange correlation functional.20 As shown in the previous study,21 the functional fairly well describes the van der Waals interaction between graphene layers in addition to the ionic interaction between Li and graphite. Both hexagonal cell parameters (a and c) and all ionic positions in the computational cell were fully relaxed in all calculations by using the Broyden−Fletcher−Goldfarb−Shanno (BFGS) algorithm. Ultrasoft pseudopotential and Troullier−Martins soft

= = = =

10.0 5.0 1.0 1.0

λ = 0.1

λ = 1.0 λ = 0.5

0.9917375 0.9920177 0.9986678 0.9727654 0.9900394

used for stage-I calculation, and AA stacking was assumed when Li atoms were inserted into graphite. As shown in Figure 2, the cell contains 24 carbon atoms and 12 hollow sites for Li intercalation. Note that the cell has three stable Li arrangements, i.e., solutions that satisfy the requirements for a √3 × √3 superlattice structure. We assumed that one Li atom always occupies site 1 in the figure in order to alleviate the frustration among the three solutions. This assumption reduces the number of possible Li arrangements from 212 to 211 (= 2048). The arrangements of Li atoms in the initial training data set were generated according to the number of Li atoms (NLi) included in the cell as follows: NLi = 1: Site 1 is occupied by a Li atom (1 arrangement). NLi = 2: Site 1 and one site among 2−12 are occupied by a Li atom (11 arrangements). NLi = 11: Site 1 and 10 sites among 2−12 are occupied by Li atoms (11 arrangements). NLi = 12: All hollow sites are occupied by Li atoms (1 arrangement). N Li = 3−10: Twelve arrangements are randomly generated at each NLi (96 arrangements). Thus, the initial training data set comprised 120 pieces of training data in total. 3300

DOI: 10.1021/acs.jpca.7b01629 J. Phys. Chem. A 2017, 121, 3299−3304

Article

The Journal of Physical Chemistry A

states in ΔG(NLi) are solid and the difference between the Gibbs energy and total energy is relatively small. Gaussian kernel ridge regression model contains two hyperparameters (σ and λ).8,18 These two hyperparameters were adjusted as follows: First, by using the initial 120 pieces of training data, we examined the dependence of the coefficient of determination (R2) on the parameter σ by changing its value (10.0, 5.0, and 1.0) with fixing the other parameter (λ = 0.1). Then, the dependence was also checked by changing the parameter λ (1.0, 0.5, and 0.1) with fixing the other parameter (σ = 1.0). The results were listed in Table 1. We found that R2 was close to 1.0 in all cases and it did not much depend on hyperparameters at least at the range examined here. In the following calculations, we used the value of 1.0 and 0.1 for σ and λ, respectively. It seems that although the value of R2 can be closer to 1.0 by further tuning of the parameters, this might cause an overfitting problem. To assess the fitness of the employed kernel ridge regression method, we compared the ΔG(NLi) obtained by DFT calculation with that by the regression model with respect to the initial 120 pieces of training data (Figure 3a). It is notable that all plotted circles are close to the 45°-line, which represents the perfect fitting of the regression model. In addition, we found that an ordinary multiple linear regression model did not fit well with the training data. Moreover, the linear model failed to predict the optimal value because all coefficients in the model were positive except for the constant term. This means that site 1 was only occupied by a Li atom as a stable structure. Figure 4a shows the predicted interim optimal structure after the first optimization process obtained by the kernel ridge regression method. Although there are local √3 × √3 Li arrangements in this, Li atoms occupied the nearest neighbor hollow sites that were energetically not stable because of the electrostatic repulsion between two Li atoms. Note that Li atoms become Li+ ions in Li−GICs by donating their electrons to graphite. This instability is shown in Figure 3b, where the ΔG(NLi) obtained by the DFT calculation at the predicted structure was less stable than that of the initial training data. In addition, NLi = 3 is short of the desired value of NLi = 4 for the √3 × √3 superlattice structure. The second optimization process (Figure 4b) resolved the energetically unfavorable Li arrangements that are caused by occupying the nearest neighbor hollow sites. However, the structure still lacked one Li atom. The third optimization process (Figure 4c) finally reached the √3 × √3 superlattice structure. It is notable that the ΔG(NLi) of the superlattice

Figure 3. (a) Comparison of ΔG(NLi) obtained by DFT calculation with that by regression models with respect to initial 120 pieces of training data. Solid blue circles and open red triangles correspond to results of Gaussian kernel regression model and linear regression model, respectively. (b) ΔG(NLi) obtained by DFT calculation in initial training data and three optimization process. Red arrow designates optimal value obtained by Bayesian optimization procedure.

As stated, the objective function employed in this study is the Gibbs energy change [ΔG(NLi )] before and after Li intercalation. This is defined as follows: ΔG(NLi) = NLi × G(Li) + 24 × G(C) − G(LiNLiC24)

where G(Li), G(C), and G(LiNLiC24) are the Gibbs energies of bulk Li, graphite, and Li−GIC containing NLi Li atoms in the computational cell, respectively. Note that we used the total energy change instead of the Gibbs energy change because all

Figure 4. (a) Predicted structure after first optimization process. Red and blue circles surround Li pairs in nearest neighbor hollow sites and local √3 × √3 Li structure, respectively. (b) Predicted structure after second optimization process. Red arrow designates site lacking Li atom. (c) Predicted structure after third optimization process. A √3 × √3 superlattice structure was reached. 3301

DOI: 10.1021/acs.jpca.7b01629 J. Phys. Chem. A 2017, 121, 3299−3304

Article

The Journal of Physical Chemistry A

Table 2. Ranking of True Optimal Arrangement after 1st Prediction Step of 10 Subsets (NR), Average Value, and Standard Deviation of NR ( NR and STD) with Respect to Number of Li Arrangements in Training Data (Ntr) Ntr 60 76 92 108

NR 126 14 5 25

23 9 5 3

15 19 31 6

16 7 3 10

36 21 6 6

13 28 5 5

6 11 25 34

103 7 16 6

179 21 16 3

13 85 30 8

NR

STD

53.0 22.2 14.2 10.6

57.5 22.0 10.5 9.9

Figure 5. Relation between Ntr (number of Li arrangements in training data set) and NR (averaged ranking of true optimal arrangement after first prediction step). Dashed line shows fitted curve in exponential form ( NR = 4.37 × 107 Ntr−3.323). Figure 7. ΔG(NLi) of Li4C48 with different Li arrangements obtained by DFT calculation in initial training data and five optimization process. Red arrow designates optimal value obtained by Bayesian optimization procedure.

structure was significantly lower than that in the training data set as shown in Figure 3b. We found that the true optimal arrangement of the stage-I Li−GIC was ranked the fourth lowest energy after the first prediction by the Gaussian kernel regression. As stated, three times optimization processes were needed to reach the optimal arrangement. Thus, the ranking of the true optimal arrangement after the first prediction step (NR) seems to provide an indication of the number of optimization process required to find a solution, which can be used to analyze a relation between the number of optimization process and the number of Li arrangements in a training data set (Ntr). In order to find such a relation, we constructed subsets of training data by randomly choosing Ntr (= 60, 76, 92, and 108)

arrangements of Li from the initial training data set composed of 120 arrangements. Ten data sets were made at each value of Ntr. We listed NR for each Ntr as well as the average value of NR among 10 subsets ( NR ) and its standard deviation (STD) in Table 2. Although the deviation of NR at each Ntr is considerably large, we observed a trend that NR decreases as Ntr becomes large. Figure 5 shows the relation between NR and Ntr, which clearly illustrates the trend. It is noteworthy that although a choice of smaller Ntr might contribute to reduce the total number of DFT calculations needed to find a solution, the

Figure 6. (a) Predicted structure after first optimization process. (b) Predicted structure after second optimization process. (c) Predicted structure after third optimization process. 3302

DOI: 10.1021/acs.jpca.7b01629 J. Phys. Chem. A 2017, 121, 3299−3304

The Journal of Physical Chemistry A computation of training data can be done in parallel, whereas Bayesian optimization must be done in serial manner in a simple implementation of the present scheme. Therefore, the use of large Ntr might be convenient in practical ways. Having confirmed that the Bayesian optimization procedure works well in finding the stable structure of a simple stage-I Li− GIC, we are now in a position to evaluate the usefulness of the procedure by applying it to a more complicated problem of stage-II. Our stage-II model was identical to the stage-I model with respect to the carbon plane along the a and b axes, but it had twice the length along the c axis. This model contained 48 carbon atoms and 24 hollow sites. The number of possible Li arrangements was 224 = 16,777,216 in total. We restricted the search space to NLi = 4 because huge computational resources seem to be needed to calculate an adequate number of training data without some sort of restriction. Similar to the stage-I calculation, we assumed that site 1 is always occupied by a Li atom. Thus, the number of possible Li arrangements was reduced from 16,777,216 to 1771. We randomly chose 74 samples from 1771 Li arrangements to generate an initial training data set. Figure 6a shows the predicted interim optimal structure after the first optimization process. Both carbon layers have two Li atoms, and the arrangement of Li on the second carbon layer is quite different from the √3 × √3 superlattice structure. The second optimization process (Figure 6b) moved one Li atom from the second layer to an appropriate site of the first layer. However, one Li atom still remained on the second layer. The third optimization process (Figure 6c) finally reached the stageII Li−GIC structure; one carbon plane formed the √3 × √3 superlattice structure, and the other carbon plane was empty. As shown in Figure 7, the ΔG(NLi) at the optimal structure was markedly lower than that at other structures. In summary, this study demonstrates the usefulness of the Bayesian optimization approach in combination with DFT calculation for searching for a stable structure of materials when the search is properly regarded as a combinatorial problem. We provided a specific example of the approach by applying it to stage-I and -II Li−GICs. Applicable problems of the approach are expected to go well beyond finding a stable structure of alkali metal intercalation into graphite. For example, finding a stable adsorption superlattice structure or finding a stable composition of binary alloy would be suitable for the approach. However, there are a few points of caution to consider regarding the approach. One is that the computational cell must have an appropriate periodicity of the targeted structure because we usually perform DFT calculations under periodic boundary conditions. The other is that it is difficult to judge whether the optimal structure is already obtained or not during the optimization process when the targeted structure is totally unknown. Although the former difficulty can be solved with a dollop of help from experiments such as X-ray diffraction, the latter difficulty might be more serious.



Article



ACKNOWLEDGMENTS



REFERENCES

This work was partially supported by the Japan Science and Technology Agency (Advanced Low Carbon Technology Research and Development Program). This work was also supported by “Materials research by Information Integration” Initiative (MI2I) from Japan Science and Technology Agency.

(1) Rajan, K., Ed. Informatics for Materials Science and Engineering; Butterworth-Heinemann, Inc.: Oxford, UK, 2013. (2) Kalidindi, S. R., Ed. Hierarchical Materials Informatics: Novel Analytics for Materials Data; Butterworth-Heinemann, Inc.: Oxford, UK, 2015. (3) Lookman, T.; Alexander, F. J.; Rajan, K., Eds. Information Science for Materials Discovery and Design; Springer International Publishing: Switzerland, 2016. (4) Hill, J.; Mulholland, G.; Persson, K.; Seshadri, R.; Wolverton, C.; Meredig, B. Materials science with large-scale data and informatics: Unlocking new opportunities. MRS Bull. 2016, 41, 399−409. (5) Hachmann, J.; et al. Lead candidates for high-performance organic photovoltaics from high-throughput quantum chemistry − the Harvard Clean Energy Project. Energy Environ. Sci. 2014, 7, 698−704. (6) Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, A. Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach. J. Chem. Theory Comput. 2015, 11, 2087−2096. (7) Castelli, I. E.; Hüser, F.; Pandey, M.; Li, H.; Thygesen, K. S.; Seger, B.; Jain, A.; Persson, K. A.; Ceder, G.; Jacobsen, K. W. New Light-Harvesting Materials Using Accurate and Efficient Bandgap Calculations. Adv. Energy Mater. 2015, 5, 1400915. (8) Ghiringhelli, L. M.; Vybiral, J.; Levchenko, S. V.; Draxl, C.; Scheffler, M. Big Data of Materials Science: Critical Role of the Descriptor. Phys. Rev. Lett. 2015, 114, 105503. (9) Jain, A.; Hautier, G.; Ong, S. P.; Persson, K. New opportunities for materials informatics: Resources and data mining techniques for uncovering hidden relationships. J. Mater. Res. 2016, 31, 977−994. (10) Jalem, R.; Kimura, M.; Nakayama, M.; Kasuga, T. InformaticsAided Density Functional Theory Study on the Li Ion Transport of Tavorite-Type LiMTO4F (M3+−T5+, M2+−T6+). J. Chem. Inf. Model. 2015, 55, 1158−1168. (11) Seko, A.; Togo, A.; Hayashi, H.; Tsuda, K.; Chaput, L.; Tanaka, I. Prediction of Low-Thermal-Conductivity Compounds with FirstPrinciples Anharmonic Lattice-Dynamics Calculations and Bayesian Optimization. Phys. Rev. Lett. 2015, 115, 205901. (12) Seko, A.; Maekawa, T.; Tsuda, K.; Tanaka, I. Machine learning with systematic density-functional theory calculations: Application to melting temperatures of single- and binary-component solids. Phys. Rev. B: Condens. Matter Mater. Phys. 2014, 89, 054303. (13) Huang, B.; von Lilienfeld, O. A. Understanding Molecular Representations in Machine Learning: The Role of Uniqueness and Target Similarity. J. Chem. Phys. 2016, 145, 161102. (14) Dey, P.; Bible, J.; Datta, S.; Broderick, S.; Jasinski, J.; Sunkara, M.; Menon, M.; Rajan, K. Informatics-aided bandgap engineering for solar materials. Comput. Mater. Sci. 2014, 15, 185−195. (15) Pilania, G.; Mannodi-Kanakkithodi, A.; Uberuaga, B. P.; Ramprasad, R.; Gubernatis, J. E.; Lookman, T. Machine learning bandgaps of double perovskites. Sci. Rep. 2016, 6, 19375. (16) Kim, C.; Pilania, G.; Ramprasad, R. From Organized HighThroughput Data to Phenomenological Theory using Machine Learning: The Example of Dielectric Breakdown. Chem. Mater. 2016, 28, 1304−1311. (17) Enoki, T.; Suzuki, M.; Endo, M., Eds. Graphite Intercalation Compounds and Applications; Oxford University Press: Oxford, 2003. (18) Rasmussen, C. E., Williams, C. K. I., Eds. Gaussian Processes for Machine Learning; MIT Press: Cambridge, 2006. (19) Giannozzi, P.; Baroni, S.; Bonini, N.; Calandra, M.; Car, R.; Cavazzoni, C.; Ceresoli, D.; Chiarotti, G. L.; Cococcioni, M.; Dabo, I.; et al. QUANTUM ESPRESSO: A Modular and Open-source Software

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. ORCID

Yasuharu Okamoto: 0000-0001-6751-2447 Notes

The author declares no competing financial interest. 3303

DOI: 10.1021/acs.jpca.7b01629 J. Phys. Chem. A 2017, 121, 3299−3304

Article

The Journal of Physical Chemistry A Project for Quantum Simulations of Materials. J. Phys.: Condens. Matter 2009, 21, 395502−19. (20) Cooper, V. R. Van der Waals Density Functional: An Appropriate Exchange Functional. Phys. Rev. B: Condens. Matter Mater. Phys. 2010, 81, 161104. (21) Okamoto, Y. Density Functional Theory Calculations of Alkali Metal (Li, Na, and K) Graphite Intercalation Compounds. J. Phys. Chem. C 2014, 118, 16−19. (22) Louie, S. G.; Froyen, S.; Cohen, M. L. Nonlinear Ionic Pseudopotentials in Spin-Density-Functional Calculations. Phys. Rev. B: Condens. Matter Mater. Phys. 1982, 26, 1738−1742.

3304

DOI: 10.1021/acs.jpca.7b01629 J. Phys. Chem. A 2017, 121, 3299−3304