
Atomic Energies from a Convolutional Neural Network
Xin Chen, Mathias Siggaard Jørgensen, Jun Li, and Bjørk Hammer
J. Chem. Theory Comput., Just Accepted Manuscript. DOI: 10.1021/acs.jctc.8b00149. Publication Date (Web): May 29, 2018.


Atomic energies from a convolutional neural network

Xin Chen†,§, Mathias S. Jørgensen§, Jun Li†, Bjørk Hammer§

† Department of Chemistry and Laboratory of Organic Optoelectronics & Molecular Engineering of the Ministry of Education, Tsinghua University, Beijing 100084, China

§ Interdisciplinary Nanoscience Center (iNANO) and Department of Physics and Astronomy, Aarhus University, Aarhus DK-8000, Denmark

Keywords: atomic energy, convolutional neural network, machine learning, global minimum


Abstract

Understanding interactions and structural properties at the atomic level is often a prerequisite to the design of novel materials. Theoretical studies based on quantum-mechanical first-principles calculations can provide this knowledge, but at an immense computational cost. In recent years, machine learning has been successful in predicting structural properties at a much lower cost. Here we propose a simplified structure descriptor with no empirical parameters, "k-Bags", together with a scalable and comprehensive machine learning framework that can deepen our understanding of the atomic properties of structures. The model readily predicts structure-energy relations with an accuracy close to that of ab initio methods. It also provides chemically meaningful atomic energies, enabling theoretical analysis of organic and inorganic molecular structures. Using the local information provided by the atomic energies to guide the stochastic steps in our evolutionary global structure optimization results in a much faster global minimum search for molecules, clusters, and surface-supported species.


1. Introduction

Understanding the interactions and stability of chemical systems at the atomic level is critical in many disciplines, ranging from molecular synthesis to the design of new drugs and materials. A synthesis chemist would – whether intentionally or not – combine basic chemistry textbook concepts, such as Lewis formulas, conceptual valence bond theory, electronegativity, and aromaticity counts, with life-long experience and be able to name, for instance, the characteristics of functional groups in organic compounds. Likewise, an inorganic chemist may discuss the stability and likely abundance of local defects in crystals based on ionic radii, crystal field theory, bond-order models, acidity/basicity, and the like. Knowledge of chemical systems at the atomic level allows for the rational design of new reaction routes and the invention of innovative materials. Such conventional concept-driven analytic chemistry has over the past two decades been supported by elaborate first-principles calculations using density functional theory (DFT) and other quantum-mechanical approaches, and reasonably reliable DFT calculations may now be done routinely, resulting in the buildup of computed structure-energy databases. This development has opened an unprecedented route to identifying stability in matter – the machine learning route.

In the machine learning (ML) approach1-25, existing structure-energy relations for entire atomistic structures (e.g., molecules, clusters, solids) are used to train models that may predict the total energies of unknown structures. Inspired by the Parrinello-Behler approach1, significant progress has been made with implementations of artificial neural networks that decompose the total energy into atomic contributions to achieve transferable predictions1-2, 4-5, 21, 25-29. However, this approach requires a significant number of empirical parameters to create the symmetry-function-based input features. In this work, we present a k-body convolutional neural network (kCON) that preserves the atomic energy decomposition, together with a new, simplified, spatially (translationally, rotationally, permutationally) invariant descriptor based on interatomic distances, without any empirical parameters to tune. The model can take full advantage of modern processors such as GPUs for faster training, while improving upon a previously proposed model by Zhai and Alexandrova9 by introducing transferability and atomic energy predictions. We demonstrate that the atomic energies provided by our model are chemically meaningful and can be used to guide the stochastic operations in evolutionary global structure optimization toward a faster global minimum search for structures of organic and inorganic molecules and clusters.

2. Methods

2.1 The many-body expansion theory

The many-body expansion (MBE) is one of the most widely used schemes for describing potential energy surfaces (PES)3-4, 9, 23-24, 30-31. In the many-body expansion scheme, the total energy of a system with N atoms is modeled as the sum of the contributions from all k-body terms,

$$E^{\mathrm{total}} = \sum_{a}^{C_1^N} E_a^{(k=1)} + \sum_{a,b}^{C_2^N} E_{ab}^{(k=2)} + \sum_{a,b,c}^{C_3^N} E_{abc}^{(k=3)} + \cdots \qquad (1)$$

where $k = 1, 2, \ldots, N$ is the many-body index, $C_k^N$ is the binomial coefficient that denotes the number of k-combinations from a set of N atoms, $E_a^{(k=1)}$ is the one-body contribution (depending on the type of the atom only), $E_{ab}^{(k=2)}$ represents the two-body (bond) contributions (depending on the atom types and the interatomic distance), and $E_{abc}^{(k=3)}$ are the contributions from triple-atom interactions (depending further on bond angles). Since two-body terms alone cannot uniquely describe a structure9, 32, higher-order terms ($k \geq 3$) should be included. The computational cost of the k-body terms roughly scales as $\mathcal{O}(N^k)$, so truncation becomes necessary for a proper compromise between accuracy and efficiency. Based on previous studies10, 14, 23-24 and our own experience, truncating the MBE equation at $k_{\mathrm{max}} = 3$ is often the most balanced choice, and our kCON model achieves sufficiently good performance with 1-, 2-, and 3-body terms (Figure S3). For the 1-body part a linear model (Figure S2) is used, and for the 2- and 3-body terms any nonlinear function $\mathbf{F}_k$ can be used to fit the energies. For example, the 3-body energy can be modeled as

$$E_{abc}^{(k=3)} = \mathbf{F}_3^{A_a A_b A_c}(r_{ab}, r_{ac}, r_{bc}) \qquad (2)$$

where $r_{ab}$ is the interatomic distance (using the minimum-image convention for periodic structures) between atoms $a$ and $b$, and $A_a$ is the element type of atom $a$. To normalize the interatomic distances to (0, 1), the covalent-radii33-based exponential kernel9, 34 is used:

$$z_{ab} = e^{-r_{ab}/L_{ab}} \qquad (3)$$

where the covalent bond distance $L_{ab}$ is the sum of the covalent radii of atoms $a$ and $b$. Thus, the total energy equation with the $k_{\mathrm{max}} = 3$ truncation simplifies to:

$$E^{\mathrm{total}} = \sum_{a}^{C_1^N} E_{A_a}^{(k=1)} + \sum_{a,b}^{C_2^N} \mathbf{F}_2^{A_a A_b}(z_{ab}) + \sum_{a,b,c}^{C_3^N} \mathbf{F}_3^{A_a A_b A_c}(z_{ab}, z_{ac}, z_{bc}) \qquad (4)$$
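To make equations (3) and (4) concrete, the following minimal NumPy sketch (our illustration, not the authors' published code) assembles a $k_{\mathrm{max}} = 3$ total energy for a non-periodic structure. The covalent radii values and the callables `e1`, `f2`, and `f3` are placeholders standing in for the tabulated radii of reference 33 and the learned 1-, 2-, and 3-body models:

```python
import itertools
import numpy as np

# Illustrative covalent radii in Angstrom; a real implementation would use
# the full tabulation cited in the text (reference 33).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}

def z_kernel(r_ab, l_ab):
    """Equation (3): normalize an interatomic distance into (0, 1)."""
    return np.exp(-r_ab / l_ab)

def total_energy(symbols, coords, e1, f2, f3):
    """Equation (4): 1-body terms plus learned 2- and 3-body contributions.

    symbols : list of element symbols, e.g. ["C", "H", "H", "N"]
    coords  : (N, 3) array of Cartesian coordinates
    e1      : dict element -> 1-body energy (the linear 1-body model)
    f2, f3  : callables standing in for the learned F2 / F3 networks
    """
    n = len(symbols)
    radii = np.array([COVALENT_RADII[s] for s in symbols])
    energy = sum(e1[s] for s in symbols)                      # 1-body sum
    z = np.zeros((n, n))
    for a, b in itertools.combinations(range(n), 2):          # C(N, 2) pairs
        r_ab = np.linalg.norm(coords[a] - coords[b])
        z[a, b] = z[b, a] = z_kernel(r_ab, radii[a] + radii[b])
        energy += f2(symbols[a], symbols[b], z[a, b])         # 2-body sum
    for a, b, c in itertools.combinations(range(n), 3):       # C(N, 3) triples
        energy += f3(symbols[a], symbols[b], symbols[c],
                     z[a, b], z[a, c], z[b, c])               # 3-body sum
    return energy
```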

The atomic energy $E_a$ can be derived by decomposing equation (4):

$$E^{\mathrm{total}} = \sum_{a}^{C_1^N} E_a = \sum_{a}^{C_1^N} \left( E_{A_a} + \frac{1}{2!} \sum_{b \neq a}^{C_1^N} \mathbf{F}_2^{A_a A_b}(z_{ab}) + \frac{1}{3!} \sum_{b \neq a}^{C_1^N} \sum_{\substack{c \neq b \\ c \neq a}}^{C_1^N} \mathbf{F}_3^{A_a A_b A_c}(z_{ab}, z_{ac}, z_{bc}) \right) \qquad (5)$$

This transformation assumes that dimers and trimers contribute evenly to the two-body and three-body energies, respectively. This assumption will prove to work well for the energy evaluation.
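Equation (5) amounts to crediting each pair energy equally to its two atoms and each triple energy equally to its three atoms. Below is a minimal sketch of this partitioning under the same assumptions as before, reusing the placeholder `e1`, `f2`, `f3` and the `z` matrix from the previous sketch:

```python
import itertools
import numpy as np

def atomic_energies(symbols, z, e1, f2, f3):
    """Equation (5): partition the total energy of equation (4) per atom.

    Each pair term is split evenly between its 2 atoms and each triple term
    between its 3 atoms, so atomic_energies(...).sum() reproduces the total
    energy of equation (4).
    """
    n = len(symbols)
    e_atom = np.array([e1[s] for s in symbols], dtype=float)   # 1-body
    for a, b in itertools.combinations(range(n), 2):
        e_ab = f2(symbols[a], symbols[b], z[a, b])
        e_atom[[a, b]] += e_ab / 2.0                           # even 2-body split
    for a, b, c in itertools.combinations(range(n), 3):
        e_abc = f3(symbols[a], symbols[b], symbols[c],
                   z[a, b], z[a, c], z[b, c])
        e_atom[[a, b, c]] += e_abc / 3.0                       # even 3-body split
    return e_atom
```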


2.2 The "k-Bags" Descriptor

To accurately predict structure-energy relations, one must carefully choose the model layout and the descriptor (i.e., the features that describe the input structure). Several descriptors have been proposed in previous works, including Bag-of-Bonds8, the Coulomb Matrix7, fingerprints35, and symmetry functions1; the latter have achieved significant success with simple fully connected (FC) neural networks (NNs) in predicting total energies1-2, 11-13. The introduction of descriptors reduces the complexity of designing and training machine learning models. However, constructing a descriptor typically requires a lot of empirical experience, and poorly constructed descriptors – ones that do not uniquely represent structures or that are not invariant to rotation, translation, and permutation of identical atomic elements – introduce noise into the model. Furthermore, NNs themselves are typically treated as black boxes, so a deeper understanding of how NNs learn is still required for the development of better models.

To overcome these problems, we introduce within our framework a new type of structure descriptor in which atomistic structures are described by interatomic distances directly9 and no empirical parameters are required. The descriptor, named "k-Bags", is constructed from all possible k-body distances, bagged according to atomic types, and conditionally sorted to provide permutational invariance. This descriptor is invariant to translation and rotation, and uniquely describes a single structure for sufficiently large k. The k-Bags descriptor is fairly easy to construct, and all the higher-order (2- and 3-body) terms can be stored in a single matrix (Figure S1). Since the k-Bags descriptor simply bags k-body terms, it is also simple to build an NN architecture that provides atomic energy contributions, thus providing a new and direct way to understand how the ML model works (see Section 2.3 and Figure 2 for details). The raw k-Bags descriptor scales as $\mathcal{O}(N^3)$; however, taking into account only k-body interactions within a certain cutoff radius of an atom can greatly reduce the dimension of the descriptor while sacrificing only little accuracy.

Our kCON model is built within the many-body expansion theory, which has proven successful in many theoretical chemical studies. Under the MBE scheme, one can decompose the MBE total energy into atomic energies with straightforward transformations (equation (5) of Section 2.1). The derived atomic energies, which are later shown to be chemically reasonable, lead to an efficient way of understanding chemical structures at the atomic level. As only interatomic distances are required, kCON naturally satisfies translational and rotational invariance, and the 1- and 2-body parts also uphold permutational invariance. However, while the elements (interatomic distances) of each k-body (k ≥ 3) interaction can be in arbitrary order, the parameters of the convolutional kernels are ordered, which breaks the permutational invariance. To solve this issue and restore permutational invariance, a conditional sorting policy is adopted: the columns of the input matrix (the input layer) of each k-body CNN are ordered according to the bond types, and within each k-body interaction (matrix row) we sort only the entries for the same atom types (see Figure S4 for a more detailed description).

We also tested the differentiability of the k-Bags descriptor. Figure S5 presents a coordinate scan of the stretch mode of a C-C bond in a sample molecule, C6NH15, from the QM7 database. The MAE within ±0.2 Å of the equilibrium distance is just 0.2 kcal/mol compared with the B3LYP/6-31G(d,p) results obtained from Gaussian 09, Revision D.01.36-37 Such a potential surface scan shows promise that the k-Bags descriptor may be extended to describe atomic forces in future applications.
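To make the bagging and conditional sorting concrete, the following sketch builds the 3-body bags from the $z_{ab}$ matrix of equation (3). It is a simplified illustration of the scheme in Figures S1 and S4, not the authors' exact matrix layout: atoms in each triple are first put in a canonical element order, and only the columns corresponding to equivalent bond types are sorted.

```python
import itertools
import numpy as np

def k_bags_3body(symbols, z):
    """Bag all 3-body terms by element triple, e.g. ("C", "H", "N").

    symbols : list of element symbols
    z       : (N, N) matrix of normalized distances from equation (3)
    Returns a dict mapping each bag key to a matrix with one row per
    3-body term and one column per bond within the triple.
    """
    bags = {}
    for triple in itertools.combinations(range(len(symbols)), 3):
        # Canonical atom order within the triple: sort atoms by element.
        a, b, c = sorted(triple, key=lambda i: symbols[i])
        key = (symbols[a], symbols[b], symbols[c])
        row = [z[a, b], z[a, c], z[b, c]]        # bond columns: ab, ac, bc
        # Conditional sorting: only entries of the same bond type are
        # sorted, which restores permutational invariance.
        if symbols[a] == symbols[b] == symbols[c]:
            row = sorted(row)                    # e.g. CCC: all bonds alike
        elif symbols[a] == symbols[b]:           # e.g. CCH: ac, bc alike
            row[1], row[2] = sorted((row[1], row[2]))
        elif symbols[b] == symbols[c]:           # e.g. CHH: ab, ac alike
            row[0], row[1] = sorted((row[0], row[1]))
        bags.setdefault(key, []).append(row)
    return {key: np.array(rows) for key, rows in bags.items()}
```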

2.3 kCON Architecture


Figure 1 shows the general architecture of the kCON model. kCON is a collection of independent convolutional neural networks (CNNs) with 1x1 kernels, where each CNN learns a specific k-body (k = 1, 2, 3) term. The total energy is the sum of the outputs of all k-body CNNs. 1D convolutional neural networks have proven successful in natural language processing38-39; their support for batched variable-length inputs allows us to construct a neural network capable of efficiently processing atomistic structures of varying size and composition in parallel, while modern GPUs can be adopted to dramatically increase the processing speed. 1x1 CNNs are mathematically equivalent to FC neural networks. However, using a CNN rather than an FC implementation makes batch training much easier, since external indexing is no longer required4 and CNNs natively operate on matrix inputs.

In addition, the architecture of kCON and the structure of the k-Bags descriptor provide an easy way to inspect and visualize how the model transforms the interatomic distances and learns chemical patterns from the training data. Figure 2 shows the 3-body CNN for some representative carbon-hydrogen-nitrogen (CHN) bodies collected with the affinity propagation clustering algorithm40. The CHN bodies in Figure 2a, which are extracted directly from the structures, form the input feature matrix shown in Figure 2b. Each row of the input matrix represents a CHN pattern and each column represents a bond (C-H, C-N, H-N). Figure 2c shows the responses of 8 kernels of the first convolutional layer. Each kernel of the first convolutional layer recognizes and responds strongly – positively (red) or negatively (blue) – only to certain types of input k-body distances. These responses form the abstract features for the subsequent layers, from which the energy contributions (Figure 2d) are finally learned. As an example, the motifs highlighted in red in Figure 2a are two extremes: the CHN motif at the bottom is rather stable, presumably because it is found in structures similar to the global minimum, while the one at the top is rather unstable. The kernels of each k-body convolutional neural network thus act as chemical pattern learners. Table S1 summarizes the required number of parameters of a k-body convolutional neural network.
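The equivalence of 1x1 convolutions and row-wise FC layers can be sketched in a few lines of Keras code (an illustrative stand-in for the authors' TensorFlow implementation; the layer sizes are arbitrary). Each row of a k-body bag is transformed independently by shared weights, and summing the per-row outputs yields that bag's contribution to the total energy:

```python
import tensorflow as tf

def k_body_cnn(n_bonds, hidden=(64, 32), alpha=0.01):
    """One k-body network: 1x1 convolutions applied row-wise to a bag.

    Input shape is (batch, n_terms, n_bonds); n_terms may differ between
    structures, which is what makes the convolutional layout convenient.
    """
    inputs = tf.keras.Input(shape=(None, n_bonds))
    x = inputs
    for units in hidden:
        # kernel_size=1 makes this an FC layer shared across all rows;
        # glorot_uniform is Keras' name for the Xavier initializer.
        x = tf.keras.layers.Conv1D(units, kernel_size=1,
                                   kernel_initializer="glorot_uniform")(x)
        # PReLU with a fixed alpha is equivalent to a leaky ReLU.
        x = tf.keras.layers.LeakyReLU(alpha)(x)
    per_term = tf.keras.layers.Conv1D(1, kernel_size=1)(x)   # energy per term
    energy = tf.keras.layers.Lambda(
        lambda t: tf.reduce_sum(t, axis=[1, 2]))(per_term)   # sum over the bag
    return tf.keras.Model(inputs, energy)

# Example: the 3-body CHN bag of Figure 2 has three bond columns
# (C-H, C-N, H-N), so its network takes n_bonds = 3.
chn_net = k_body_cnn(n_bonds=3)
```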

2.4 kCON Implementation

The kCON model is implemented using Google's TensorFlow41, an open-source numerical computation framework based on data-flow graphs, with the ability to derive gradients automatically and to accelerate evaluations with GPUs. The mini-batch Adam optimizer42 is used for optimizing the neural network. Table S2 summarizes the training configurations for all datasets used in this work. All convolutional layers are randomly initialized with the Xavier algorithm43; our tests suggest that the Xavier initializer outperforms other initialization algorithms (for example, the MSRA initializer44). The optimal learning rate parameters (the initial learning rate, the exponential decay step, and the decay rate) are obtained from grid searches.

The activation function is the parametric rectified linear unit (PReLU)44 with a fixed alpha value of 0.01. PReLU typically gives much better results than traditional activation functions such as the hyperbolic tangent (tanh) or sigmoid. In our test, adopting PReLU reduced the testing MAE on the C9H7N (PBE) dataset from ~0.34 eV (tanh) to ~0.27 eV (PReLU, $\alpha = 0.2$) and ~0.22 eV (PReLU, $\alpha = 0.01$). The results turn out to be rather insensitive to the sizes of the layers: in a few tests, both pyramid (128, 64, 32) and spindle (32, 64, 64, 32) NN structures proved acceptable.

To train the network, the root-mean-squared error (RMSE) loss,

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( E_i - E_i^{\mathrm{NN}} \right)^2}$$

is adopted as the objective function, where N is the total number of training samples and $E_i$ and $E_i^{\mathrm{NN}}$ are the quantum-chemistry and neural-network-predicted energies of sample $i$.
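As a rough sketch of this training setup (the hyperparameter values below are placeholders, not the grid-searched values of Table S2), the loss and optimizer might be wired up as follows:

```python
import tensorflow as tf

def rmse_loss(e_ref, e_nn):
    """RMSE between reference and neural-network-predicted total energies."""
    return tf.sqrt(tf.reduce_mean(tf.square(e_ref - e_nn)))

# Exponentially decayed learning rate, as described in the text; the initial
# rate, decay step, and decay rate here are illustrative placeholders.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=10_000, decay_rate=0.95)
optimizer = tf.keras.optimizers.Adam(schedule)

# A kCON model assembled from the k-body networks could then be trained
# with mini-batches in the usual Keras fashion, e.g.:
#   model.compile(optimizer=optimizer, loss=rmse_loss)
#   model.fit(train_bags, train_energies, batch_size=64, epochs=200)
```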

3. Selected Applications

3.1 kCON Performance

To measure the performance of the proposed kCON, we trained the model on various datasets (see the Datasets section of the Supplementary Information for details) for total energy predictions, including C9H7N (computed at the Perdew-Burke-Ernzerhof (PBE)45 level), periodic anatase TiO2 (001)46 at the DFTB level, and the publicly available datasets QM77, 47 and GDB-948, computed at the DFT level. QM7 and GDB-9 contain thousands of stoichiometries and are composed of stable organic molecules that can be synthesized, while C9H7N and TiO2 result from global optimization searches46 and contain many high-energy structures.

Table 1 summarizes the results from our kCON-based ML model. Before training, 20% of the samples were chosen randomly as independent test sets. For the chemically stable organic molecules in QM7 and GDB-9, kCON achieves highly accurate predictions: the mean absolute error (MAE) is 1.0 kcal/mol (0.043 eV, or 2 meV/atom) for both datasets. Thus, the performance of kCON is comparable with state-of-the-art deep neural networks; the Deep Tensor Neural Network (DTNN) model5 and the Bonds-In-Molecules Neural Network (BIMNN) model4 both achieved an MAE of ~1.0 kcal/mol on the GDB-9 dataset. The kernel ridge regression method16 can give slightly better results (MAE of 0.58 kcal/mol) when the features also include 3-body (angle) and 4-body (dihedral) terms, as in the Histogram of Distances, Angles and Dihedrals (HDAD)16 descriptor. SOAP-GAP (smooth overlap of atomic positions with a Gaussian approximation potential)15, another kernel-based method, also gives an MAE of 1.0 kcal/mol when learning the coupled-cluster (CC) energies directly. The training cost of kCON can be significantly reduced by applying a cutoff radius of 4 times the covalent bond length $L_{ab}$ from equation (3) (Table 2), while the performance is almost unchanged. One should also be aware that these two datasets contain only very stable structures; even the most simplified linear regression (LR) model gives acceptable performance, with an MAE of ~0.8 eV (18.4 kcal/mol), on these datasets (Figure S2). Two-body terms can significantly decrease the overall MAEs, but to reach the chemical accuracy (