Lattice Convolutional Neural Network Modeling of Adsorbate

Jul 12, 2019 - We develop a novel lattice convolutional neural network (LCNN) that ... of the crystal graph convolutional neural network (test RMSE of...
0 downloads 0 Views 4MB Size
Subscriber access provided by UNIV OF SOUTHERN INDIANA

C: Surfaces, Interfaces, Porous Materials, and Catalysis

Lattice Convolutional Neural Network Modelling of Adsorbate Coverage-Effects Jonathan Lym, Geun Ho Gu, Yousung Jung, and Dionisios G. Vlachos J. Phys. Chem. C, Just Accepted Manuscript • DOI: 10.1021/acs.jpcc.9b03370 • Publication Date (Web): 12 Jul 2019 Downloaded from pubs.acs.org on July 18, 2019

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Lattice Convolutional Neural Network Modelling of Adsorbate Coverage-Effects Jonathan Lym1+, Geun Ho Gu2+, Yousung Jung2*, Dionisios G. Vlachos1* 1Department

of Chemical and Biomolecular Engineering and Catalysis Center for Energy Innovation (CCEI), University of Delaware, Newark, DE 19716

2Korea

Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro 34141, Daejeon 305-335, South Korea

+These

authors contributed equally to this work.

Abstract Coverage effects, known also as lateral interactions, are often important in surface processes but their study via exhaustive density functional theory (DFT) is impractical due to the large configurational degrees of freedom. The cluster expansion (CE) is the most popular surrogate model accounting for coverage effects but suffers from slow convergence, its linear form, and its tendency to be biased toward the selection of smaller clusters. We develop a novel lattice convolutional neural network (LCNN) that improves upon some of CE’s limitations and exhibits better performance (test RMSE of 4.4 meV/site) compared to state-of-the-art methods, such as the CE assisted by a genetic algorithm and the convolution operation of the crystal graph convolution neural network (test RMSE of 5.5 and 6.8 meV/site, respectively) by 201 ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 50

30%. Furthermore, LCNN can outperform other methods with less training data, implying accuracy with less DFT calculations. We analyze the van der Waals (vdW) interaction via visualization of the hidden representation of the adsorbate lattice system in terms of individual site formation energies.

*Corresponding authors: [email protected]; [email protected]

2 ACS Paragon Plus Environment

Page 3 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Introduction Understanding the surface properties of materials is important for several applications including heterogeneous catalysts and semiconductors.1–4 Advancements in computing power enable the prediction of properties not easily measurable by experiments. One estimation tool that has gained significant attention is Density Functional Theory (DFT).5–8 However, DFT’s speed suffers for systems involving more than a couple 100 atoms9, and for exhaustively exploring a large chemical space. One such problem is the determination of energies and configurations of surface adsorbates.10,11 Adsorbates can occupy various sites, resulting in numerous configurational degrees of freedom. To overcome this challenge, a surrogate model can be trained, using DFT calculations, and used to predict properties of larger systems.12–14 One pioneering surrogate model used for many-body problems is the cluster expansion (CE).15 It is a lattice-based model that is effective at modelling multicomponent, periodic systems with short and long-range interactions. A system with N sites and M different types of components can be represented by a configuration vector, 𝛔, of length N, where each element is

the

occupancy

of

the

site

labeled

using

the

Ising

convention,

σi ∈

{ ± m, ± m ― 1, …, ± 1, (0)} where M = 2m (or 2m + 1). The general form of the Hamiltonian describing the energy of the system is expressed as: ECE(𝛔) =

∑∑V

(𝐬) (𝐬) α Φα (𝛔)

α

= 𝐕𝚽

(1)

(𝐬)

where ECE(𝛔) is the energy per site of configuration 𝛔, α is a symmetrically-distinct cluster, ( 𝐬) is a vector that specifies the basis functions to use for cluster α, V(𝐬) α is the interaction energy of the cluster using the specified basis functions, Φ(𝐬) α is the orbit-averaged cluster function, 𝐕 is a vector of the interaction energies, and 𝚽 is the correlation matrix whose rows correspond to configurations and columns correspond to symmetrically-distinct clusters with different basis functions.16 3 ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Point

1-1-1

1NN

2NN

1-2-3

3NN

1-1-2

Page 4 of 50

4NN

2-2-2

Figure 1. Examples of clusters on a hexagonal lattice. Site, pair, and triplet clusters are highlighted in red, green and blue respectively. The interaction energies, 𝐕, are estimated using regression from a known set of configurations and energies (referred to as the training dataset). There are theoretically infinite clusters that can be defined by varying the cluster size and pair-wise distances (as shown in Figure 1) but given the finite set of training data, the CE must be truncated to prevent overfitting. Choosing suitable clusters is important to ensure good model performance.17,18 As the number of clusters increases, the number of ways of arranging adsorbates scales exponentially. Previously, clusters were chosen heuristically where small clusters with shortranged interactions were favored.19–22 However, this approach is biased and does not necessarily lead to an optimal model. Techniques in the machine learning community have helped address the combinatorial problem of selecting clusters. Cluster expansion software, such as the Alloy-Theoretical Automated Toolkit (ATAT)22–25 and the Universal Cluster Expansion (UNCLE)26, uses a genetic algorithm (GA) to choose clusters and has been used with great success.17,27–32 Another approach for cluster selection is regularizing parameters using the Least Absolute Shrinkage and Selection Operator (LASSO).33 This approach has been used for various 4 ACS Paragon Plus Environment

Page 5 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

applications, such as the prediction of thermochemical properties of gas and surface-phase molecules34, the interactions between single metal atoms and oxide supports35, and cluster expansions.28 Although the CE is widely used in the literature and has been generally very successful for various solid state applications, it converges slowly when adsorbates move away from ideal lattice positions36 and lateral interactions are nonlinear. More sophisticated models, such as the convolutional neural network (CNN) can reduce this error. A number of machine learning models have recently demonstrated higher accuracy for predicting molecular properties than the hybrid DFT itself.37 The CNN is especially well suited for structure-property relationships where it has been used to screen inorganic solid electrolytes38 and predict molecular properties with high accuracy.39,40 In molecular graph theory, mathematical objects called nodes and edges represent atoms and pairs of atoms, respectively. The molecular graph theory-based CNN transforms the properties of atoms and pairs of atoms through various operations to correlate them with molecular properties.39,40 This type of model is expected to be applicable for lattices as well, as binding sites and distances between sites can be thought of as nodes and edges, respectively. This graph theory-based CNN approach, to best of our knowledge, has not been applied to lattices and adsorbates. In this work, we develop a new machine learning model called lattice convolutional neural network (LCNN) to model coverage effects. We compare the performance of five different approaches. The first three use the CE model with different cluster selection strategies: (1) heuristic cluster selection, (2) GA, and (3) LASSO regression. The last two use CNNs: (4) LCNN, and (5) LCNN model with the convolution function of crystal graph convolutional neural network (CGCNN)41. We show that our newly developed model shows the greatest accuracy, and outperforms the other models with fewer data. We also visualize the latent representation of the lattice/adsorbate system and postulate advantages of the deep learning 5 ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 50

over the CE method. Finally, we demonstrate the model’s ability to deconvolute the system’s formation energy to individual site formation energies by comparing the site energy with and without van der Waals (vdW) interactions, which is in accordance with the expected physical intuition. We compare the approaches for the co-adsorption of oxygen and nitrogen oxide on Pt(111) using published DFT data.

Methods Pt(111) Data Set All the data used to train, validate, and test the machine-learning models on the Pt(111) system was provided by Bajpai et al.27 We calculate the formation energy per site, EF(𝛔), in a similar manner using: nO

Pt + 2 O2 + nNONO→OθONOθNO/Pt EF(𝛔) =

E0(𝛔𝐢) N

― E0(𝟎) ―

θO 2

E0,O2 ― θNOE0, NO

(2) (3)

where E0(𝛔), E0(𝟎), E0,O2, E0,NO are the DFT-calculated electronic energy of configuration 𝛔, Pt(111) surface with no adsorbates, an oxygen molecule isolated in a vacuum, and a nitric oxide molecule isolated in a vacuum, respectively; nO and nNO are the numbers of atomic oxygen (O) and nitric oxide (NO) adsorbed; θO and θNO are the surface coverages of O and NO, respectively; and N is the number of fcc sites available on the surface. The provided database contains a total of 653 configurations. Cluster Expansion We use the Python library NetworkX42 to represent the clusters and configurations as graphs. This approach allows us to easily find symmetrically-distinct clusters located in configurations based on the relative distance between nodes. In each configuration, the surface atoms are directly read from the VASP output files and converted to a NetworkX graph. The nodes of 6 ACS Paragon Plus Environment

Page 7 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

these graphs contain information about the sites’ occupancies (three in total: 0 for a vacant site, 1 for O filled, 2 for NO filled). All nodes were connected via edges that represented the Euclidean distances between pair of atoms. The clusters were generated in an iterative fashion. For the Pt(111) system, clusters were restricted to a maximum of 4 nodes and the maximum pair-wise distances in nearest neighbors (NN), were 8, 6, and 4 for pairs, triplets, and quadruplets, respectively. We used a similar orthogonal basis set as ATAT22–25: Θ0(𝜎𝑖) = ― cos Θ1(𝜎𝑖) = ― sin

2𝜋𝜎𝑖

( ) ( ) 3

2𝜋𝜎𝑖 3

(4)

(5)

where σi is the occupancy of the ith site and Θj is the jth basis function. For each cluster in configuration, 𝛔, its weight can be calculated using: Φ(𝐬) α, i(𝛔) = Θn1(σ1)Θn2(σ2)…Θn|α|(σ|α|)

(6)

where α is a cluster with lattice points, α = {1, 2, …, |α|}, (𝐬) is a vector of indices which indicates the basis function to be used, (s) = {n1, n2…n|α|}, and i is an index of a particular arrangement of the cluster in the configuration. Each element of the correlation matrix, 𝚽, corresponds to a configuration, cluster, and arrangement of basis functions and can be calculated using: Φ(𝐬) 𝛼 (𝛔) =

1 N(𝐬) 𝛼 (𝛔)

N(𝐬) 𝛼 (𝛔)

∑Φ

(𝐬) α,i

(7)

i

where N(s) α (𝛔) is the number of symmetrically-distinct clusters in configuration, 𝛔. Given these restrictions, there is a total of 496 clusters. Below, we show an example of how to calculate an element of the 𝚽 matrix. We use a hypothetical unit cell with 0.25 monolayer (ML) O and 0.25 ML NO shown in Figure 2. The cluster evaluated is a pair of 1NN using the first basis function for the first site and the zeroth 7 ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 50

(1,0) basis function for the second site (i.e., Φ(𝒔) 𝛼 (𝝈) = Φ1NN ([0, 2, 0, 1])). Using Equation (7), the

element of the correlation matrix, Φ(2,1) 1NN ([0, 2, 0, 1]), is 0.

3'

4'

1

2'

3

4'

1'

2

3'

4

1'

3'

4'

1'

2'

Figure 2. Example configuration of 0.25 ML of O and 0.25 ML of NO. Red, blue, and green represent O, NO, and vacant sites respectively. The bold line indicates the unit cell. The sites from the original cell are surrounded by one periodic layer to show interactions across the unit cell. Faded colors represent periodic images whose interactions are considered only if at least one node in the cluster is from the original unit cell. Table 1. Example calculating the element of the correlation matrix that represents the configuration shown in Figure 2 for the 1 NN cluster using the second basis function for the first site and the first basis function for the second site. 𝜎1 𝜎2 Θ1(𝜎1) Θ0(𝜎2) N(1,0) 1𝑁𝑁,𝑖 (𝛔)

Graph σ1

Φ(1,0) 1𝑁𝑁,i(𝛔)

(1,0) N(1,0) 1NN,iΦ1𝑁𝑁,i

σ2

0

0

0

―1

4

0

0

σ1

σ2

0

1

0

1 2

4

0

0

σ1

σ2

0

2

0

1 2

4

0

0 8

ACS Paragon Plus Environment

Page 9 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

σ1

σ2

1

0



3 2

―1

4

σ1

σ2

1

1



3 2

1 2

0



3 4

σ1

σ2

1

2



3 2

1 2

2



3 4



σ1

σ2

2

0

3 2

―1

4



3 2

―2 3

σ1

σ2

2

1

3 2

1 2

2

3 4

3 2

σ1

σ2

2

2

3 2

1 2

0

3 4

0

Sum

24

3 2

2 3

0 3 2

0

Heuristic Cluster Expansion The heuristic approach was implemented using in-house Python code using Scikit-learn for the linear regression models.43 The vector of configuration energies and the complete correlation matrix with all clusters and configurations were read. Then, a tenth of the data was randomly chosen and separated from the initial vector and matrix. The training correlation matrices were created by iteratively varying the clusters included using the heuristics suggested by Walle et al.22: “a cluster can be included only if all its subclusters have already been included” and “an m-point cluster can be included only if all m-point clusters of a smaller diameter have already been included”. Linear models were fit to each set of clusters and the most effective set was identified by minimizing the 10-fold crossvalidation error. Finally, the clusters and correspondinging correlation energies were used to predict the energies of the test set.

9 ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 50

Genetic Algorithm-Assisted Cluster Expansion The cluster expansion model with the GA used for cluster selection was implemented using the multicomponent Alloy-Theoretical Automated Toolkit (ATAT) software.22–25 All the configurations were converted to ATAT format. The lattice was specified with lat.in, which stores the periodic cell size and the position of sites that can be occupied. The configuration structures were specified using str.out files, which also indicate the periodic cell sizes but carry information about the occupancy of each site. The symmetry and clusters of the lattice were sufficiently identified using default tolerances. The ATAT code was purposefully not integrated with any ab-initio software to prevent additional structures from being added to the training set. The multicomponent MIT ab-initio Phase Stability (mmaps) code was run using the default crossvalidation method until the default convergence criteria were met. The test error was calculated using the correlation dumper (corrdump) command with the test str.out files, the clusters (clusters.out) and corresponding energies (eci.out) predicted from the training data as inputs. LASSO-assisted Cluster Expansion LASSO reduces overfitting by eliminating parameters that do not contribute significantly to the regression. It builds upon the linear least squares regression by adding an L1 norm penalization term to the cost function. Using the notation from the cluster expansion, this can be represented as: 𝐕(λ) = min ‖𝐄𝐅 ― 𝚽𝐕‖22 + λ‖𝐕‖1 𝐕

(8)

Here 𝚽 is the correlation matrix, 𝐄𝐅 is a vector of the formation energies corresponding to the configurations calculated by Equation (3), 𝐕 is a vector of the interaction energies of the clusters, and λ is the penalization parameter. Due to the shape of the L1 norm in the 𝐕 space, minimization of the cost function tends to favor sparse solutions. As λ approaches infinity, the regression coefficients are set to 0 since 10 ACS Paragon Plus Environment

Page 11 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

the penalization cost is weighted greater than the least squares cost; and as λ approaches zero, the least squares regression solution is obtained. Tuning λ using cross-validation gives the optimal amount of penalization that will eliminate parameters that contribute to overfitting leading to an optimal model. The cluster expansion model with LASSO used for cluster selection was implemented using in-house Python code. LASSO regression was implemented using the Scikit-learn library in Python.43 A similar methodology was used as the heuristic approach except the clusters were selected using the LASSO cost function. Lattice Convolutional Neural Network Compared to a classical neural network, the operations and parameters are shared in the convolutional neural network (CNN), decreasing the number of parameters required. CNNs have been applied to molecular-graph learning with great success.39,41 The inputs for these models depend on atom feature vector (A), pair of atoms feature vector (P), and atom connectivity, given implicitly by the graph. The connectivity is important for model accuracy so defining mathematical operations to account for connectivity is key. In the CNN model of Kearnes et al.39, four convolutional transformation operations were defined: atom to atom (A→A), bond to bond (P→P), atom to bond (A→P), and bond to atom (P→A). By introducing effective transformation operations between the atom and bond properties (A→P, P→A), these operations were exploited to construct a large neural network. In Faber et al.37, atom features such as atom type, chirality, formal charge, ring sizes, hybridization, hydrogen bonding, and aromaticity are used to construct an initial atom feature vector. Bond features such as bond type, graph distance, whether two atoms are in the same ring, and spatial distance are used to construct an initial bond feature vector. The CNN is, then, used to correlate energies and lowest unoccupied molecular orbital energy, highest occupied molecular orbital energy, and other properties at near chemical accuracy. Recently, Xie et al.41 developed a convolution operation 11 ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 50

for crystal graphs. In their model, convolutions are performed on a connected pair of atoms, and new atom properties are computed by pooling the convoluted pair of atoms. The CNN model presented in this work resembles that of Kearnes et al. and Xie et al., but the lattice imposes different characteristics: (1) binding sites become nodes in the graph, (2) edges represent the immediate adjacency between the sites, and (3) the lattice is symmetric, which may be exploited for improved accuracy. In this model, we disregard edge properties and only consider node properties. In addition, the CNN model of this work exploits a new convolution operation to account for symmetry in the lattice. The overall design of the model is shown in Figure 3a. Given a periodic lattice system with ns number of sites, each site’s features can be extracted and made into a matrix. In this work, we adopt a one hot encoding method based on site occupation states, but other features can be used as well. Given nc number of possible site occupation states, a site is given a real vector and each value in the vector represents a different occupancy state. The position that corresponds to the occupancy of the site is set to one, whereas others are set to zero. A site representation vector is denoted as 𝒙𝑗𝑖, where i is the index of the site in the lattice system, and j is the number of convolutions that the vector has undergone. Initially, the dimension of this representation is ℝ𝑛c, but depending on the convolution hyperparameter, the dimension can change after convolution. All the binding sites in the system define the site layer, 𝑿𝑗 ∝ ℝ𝑛𝑠 × 𝑛c = [𝒙𝑗1;…;𝒙𝑗𝑛𝑠]. The schematic of the lattice convolution (LConv) of a site layer, 𝑿𝑗→𝑿𝑗 + 1, is shown in Figure 3b. This operation is formally written as: 𝒙𝑗𝑖 = 𝑔(𝑓(ℎ(𝑿𝑗,𝜈𝑖,1)),𝑓(ℎ(𝑿𝑗,𝜈𝑖,2)),𝑓(ℎ(𝑿𝑗,𝜈𝑖,3)),…,𝑓(ℎ(𝑿𝑗,𝜈𝑖,𝑛p)))

(9)

The function ℎ(𝑿𝑗,𝜈𝑖,𝑘) gathers the local environment around the site i. It returns 𝜶𝑗𝑖,𝑘, a flattened subset of 𝑿𝑗, that contains x of site i as well as neighbor sites of i. The order of x in 𝜶𝑗𝑖,𝑘 is defined in a way that each site’s geometric locations in relation to other sites are 12 ACS Paragon Plus Environment

Page 13 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

consistent. The geometrically defined ordering enables the model to learn the local structure information as the shifted softplus function, f, is applied to 𝜶𝑗𝑖,𝑘 in the next step after weights and biases are added. Due to the symmetry in the lattice, there are multiple ways to construct 𝜶𝑗𝑖,𝑘. We handle the degeneracy by applying all the degenerate sequences of x to f. k in 𝜈𝑖,𝑘 and 𝜶𝑗𝑖,𝑘 indicates the index of the permutation, and np is the number of possible permutations. The presented model is considered a CNN, as the weights for function f are shared. After activation, the commutative pooling function is applied to the result of f, in order to consider various permutations without bias. Particularly, we use the summation function. This operation is performed for all the sites in the lattice to construct a new site layer, 𝑿𝑗 + 1. For the lattice containing multiple site types, more convolution operations are constructed, so that each type of site has its own weights and biases for f.

13 ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 50

Figure 3. Flow chart of the lattice convolutional neural networks. (a) The overall design of the model with an example 2D hexagonal lattice. Features and each site’s neighbor lists are extracted (one hot encoding is used in this study). This information undergoes the lattice convolution (LConv) followed by site-wise activation and linear multiplication. The linear layer reduces the dimension of each site’s feature one, representing the individual site’s property which is average sum pooled to get property per site. (b) Flow chart of the lattice convolution for a site. The local environment of a site is gathered using the neighbor list, followed by permutations to account for symmetry in the lattice. Each gathered local environment undergoes activation followed by summation to calculate the new site features.

In the application to O, NO/Pt(111) system, we define the ordering of x in 𝒂𝑗𝑖,𝑘 as follows: the first element of the 𝒂𝑗𝑖,𝑘 is 𝒙𝑗𝑖, and the following element is a randomly selected neighboring site. The next site shares a surface Pt atom with the previous site. The rest of the sites are picked consecutively by continuing in the rotational direction of the previous two sites (clockwise or counterclockwise in relation to the site i) and adding sites in that direction. The algorithm stops when all nearest neighbors of a specified distance to the first site are found. This ordering is demonstrated in the diagram of the leftmost column of Figure 3b using colors. Particularly for O, NO/Pt(111) system, there are 6 degenerate ordering of sites, due to mirror symmetry and three-fold rotation symmetry. While the convolution operation allows the machine to learn the representation of the lattice, additional site-wise operations are performed to predict formation energy. We tested a number of different configurations following convolution: (1) linear multiplication, (2) softmax and linear multiplication operations, (3) activation and linear multiplication operations, (4) softmax, activation, and linear multiplication operations, and (5) activation, softmax, and linear multiplication operations. We find that configurations after convolution do not affect the model 14 ACS Paragon Plus Environment

Page 15 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

error (±0.5 meV/site error) as significantly as model hyperparameters do. After some experimentation, we chose configuration (3) as its accuracy is slightly better than the others. The final operation is linear multiplication, which reduces the site layer matrix to a ℝ𝑛𝑠 vector. The linear multiplication is applied in order to predict the formation energy of each site. Then the pooling operation averages the sites’ energies to compute the system’s formation energy per site. This model has five hyperparameters: (1) maximum distance of neighbors considered in function h, (2) number of convolutional layers, (3) the output size of f, (4) the output size of the rectified linear function, and (5) the L2 norm regularization strength. Figure 3b shows an example considering only the first nearest-neighbors, but farther neighbors can be considered as well. The output size of f and the shifted softplus function can be controlled where several f functions would be trained for output sizes larger than 1. For hyperparameter optimization, we perform random search (repeated 20 times), which has bee found to be more efficient than the grid search.44 For the cross-validation, the size of the test, validation, training sets is 10%, 10%, and 80%, respectively. For the learning curve analysis discussed below, the size of the validation set is retained at 10% and the remaining data is used as training sets, until its size becomes bigger than the training set, in which case, the size of the validation is set so that it is equal to the size of the training set. The CNN is implemented in Tensorflow45, and the model is trained using the Adam optimizer with a learning rate of 0.1.46 We implement exponential decay where learning rate is decayed by 50% when cross validation score did not improve after 5000 epochs. The optimization is stopped after learning rate decayed below 0.01. We tried also 0.01 and 0.001 initial learning decaying to an order of magnitude less, but the model RMSE did not improve. The initial representation of the lattice (one hot encoding layer) is standardized to improve convergence. We also use batch-normalization before applying the shifted softplus function.47 We find that 15 ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 50

batch-normalization is critical to the convergence of the model. The model is published in our GitHub Page (https://github.com/VlachosGroup/lcnn). We also use the convolution operation from the crystal graph convolutional neural network of Xie et al.41 with some modification: 𝒛𝑡(𝑖,𝑗)𝑘 = 𝒙𝑡𝑖 ⨁ 𝒙𝑡𝑗 𝒙𝑡𝑖 + 1 =

∑σ(𝒛

𝑡 𝑡 (𝑖,𝑗)𝑘𝑾𝜎

+ 𝒃𝑡𝜎)⨀θ(𝒛𝑡(𝑖,𝑗)𝑘𝑾𝑡θ + 𝒃𝑡θ)

(10)

𝑗,𝑘

Here, t is the number of convolutions performed to the site layer, ⨁ represents concatenation, ⨀ represents element-wise multiplication, σ is a sigmoid function, θ is a shifted softplus function, and k indicates the kth bond between sites i and j. Compared to Xie et al., concatenation with bond properties for z is omitted as there are no edge properties in this work, and 𝒙𝑡𝑖 is not added together with the summation over j and k to compute 𝒙𝑡𝑖 + 1, as the dimension of 𝒙𝑡𝑖 + 1 and 𝒙𝑡𝑖 can be different in this work. Model Assessment All the approaches above were evaluated using the test set’s root-mean-square error (RMSE) and mean absolute error (MAE): 1 RMSEmodel = 𝑁 1 MAEmodel = 𝑁

𝑁

∑(𝐸

model(𝛔i)

― 𝐸F(𝛔i))2

(11)

𝑖 𝑁

∑ |𝐸

model(𝛔i)

― 𝐸F(𝛔i)|

𝑖

(12)

where N is the number test configurations, Emodel(𝛔i) and EF(𝛔i) are the formation energies of configuration 𝛔i predicted using the model and DFT respectively.

16 ACS Paragon Plus Environment

Page 17 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Results and Discussion Model Evaluation In order to optimize LCNN, we perform 20 random searches for hyperparameters as shown in Table 2. Here we refer LCNN-cgcnn as the LCNN framework but with the CGCNN convolution function. Table 3 shows each model’s test RMSE and MAE. The residual plots are shown in Figures S1 – S5. Here, LCNN shows accuracy better by >20% than other methods. Notably, LCNN-cgcnn shows the worst accuracy. We attribute this to the pairwise convolution operation of the cgcnn while all the other methods can recognize patterns beyond two-body interactions.

Table 2. Hyperparameter search for the two machine learning models. Hyperparameters

Considered Optimal Parameters Range LCNN LCNN-cgcnna

Maximum Neighbor Distance

1–2

2

2

Number of convolution layers

1–2

2

2

Output size of f in convolution

2 – 50

44

14

4

3

6.7×10-5

1.7×10-5

Output size activation

of

site-wise 2 – 4 10-3 – 10-5

L2 Norm Penalty a Convolution

is operated using the CGCNN convolution function.

Table 3. Root mean squared error and mean absolute error of test sets as well as their standard errors for each model. Model

RMSE (meV/site)

MAE (meV/site)

Heuristic CE

5.8 ± 0.3

3.9 ± 0.2 17

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

LASSO CE 5.3 ± 0.2

3.4 ± 0.2

GA CE

5.5 ± 0.3

3.4 ± 0.1

LCNNcgcnn

6.8 ± 0.5

4.6 ± 0.2

LCNN

4.4 ± 0.3

3.0 ± 0.3

Page 18 of 50

Latent Space Visualization In order to understand the mechanism behind the machine-learning model, we visualize the site layers at various stages of the model as shown in Figure 4. Here, we construct a 30 × 30 site rectangular lattice and randomly assign 270 O and 270 NO to sites. Our model architecture uses site-wise convolutions, allowing for the quantification of the machine’s representation of each site. The top left diagram in Figure 4 shows the visualization of one-hot encoding input layer where red, green and blue colors represent O occupied, vacant and NO occupied sites. Upon convolutions, the initial ℝ900 × 3 site layer becomes a ℝ900 × 44matrix. We perform principal component analysis to reduce the site layer to ℝ900 × 3 using the first three principal components. Similar to one-hot encoding input layer visualization, the site layer projected to the first, the second, and the third eigenspace are treated as red, green and blue colors, respectively. We observe that, after one convolution, the red, green, blue colors of the site layer represent the local density of NO and O, and vacant sites, respectively (Figure 4b and Figure S6). Figure 4c, and d shows the reduced site layer after subsequent operations. As the convolution progresses, LCNN effectively correlates these local environments to the site formation energies (Figure 4e). For these site layers, projection to the first principle component explains >99% of the variance in the layer and highly resembles the formation energy visualization, indication of the site formation energy information encoding (see Figure S7 for heatmap visualized projection). In the formation energy site layer (Figure 4f), we observe that NO dense regions 18 ACS Paragon Plus Environment

Page 19 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

result in low formation energy, whereas vacant site dense regions are high in energy. This is in agreement with Bajpai et al.27 where high NO coverage and high vacancy coverage results in large and small exothermic formation energy, respectively. To further understand the behavior of LCNN, two NO have been placed at varying distance and the layers have been visualized (Figure S8). It seems that the first convolution recognizes the occupancy patterns around each site signified by diverse arrays of colors around the occupied sites. After the second convolution, the visualization resembles the site-contributions, and hence the second convolution and following operations transform the recognized local patterns into energy. We note that visualization in Figure 4 is useful but also can be sensitive to the number of convolutions and radius considered by the model.

19 ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 50

Figure 4. Visualization of site layer at (a) one-hot encoding input, after (b) one, (c) two, (d) after site-wise activation, and (e) linear multiplication in neural network architecture for a randomly created periodic lattice. Each hexagon represents a hollow binding site. Site layers for (b), (c), (d), and (e) are projected to their three principal components, and each projection used as red, blue, and green. The ratio of explained variance by red, green and blue colors are listed in parentheses below the figure. The heat color in (f) represents site formation energy in eV/site. 20 ACS Paragon Plus Environment

Page 21 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Effect of Van-der Waals forces The vdW forces can be important for adsorbate-adsorbate interactions in surface modelling (particularly when the density functional used cannot account for such forces properly either empirically or theoretically). In order to estimate the effect of adding empirical vdW corrections for the present coverage effects, we plotted the formation energy convex hull diagram as shown in Figure 5. We generate unique configurations in a unit cell containing up to 12 sites using the Hart et al. algorithm48, resulting in a total of 1,581,607 configurations. We note that the convex hull diagram in Figure 5b reproduces the result in Bajpai et al.27, validating the model. Figure 5 shows that vdW interactions can change the trend in the convex hull. Notably, the global ground state configuration changes with and without vdW corrections. PBE results show that the minimum configuration is observed at 0.78 ML of NO whereas it is 1.0 ML of NO for PBE-D3. This is due to the enhanced attractive interactions between NO with D3 corrections. Out of the 169 and 192 ground state configurations predicted by the PBE-D3 and PBE-based convex hulls, only 63 configurations intersect. This signifies that vdW corrections are significant enough to change stable configurations. We also find that 66 coverages are common ground state coverages between PBE and PBE-D3-based convex hull, but the corresponding configurations are different. Differences in configurations of these ground state coverages are analyzed to understand the effect of vdW corrections. The PBE-D3based model was used to compare the formation energies of the PBE-D3 ground state configurations to the PBE ground state configurations and we found the former configurations to be more stable by 2.1±0.4 meV/site. The same comparison using the PBE model shows the latter configurations to be more stable by 1.7±0.4 meV/site.

21 ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 50

Figure 5. The convex hull of LCNN computed formation energies in eV/site trained using (a) PBE-D3 and (b) PBE. All possible unit cells containing up to 12 binding sites are considered (a total of 1,581,607 configurations). The vertices of convex hull facets indicate the ground state configurations. The green circle indicates the configuration with the global minimum formation energy.

We extract the configurations of a ground state coverage with the largest disparity in the preferred configuration by formation energy that is, configurations of coverage that has the PBED3 PBE largest value of |𝐸PBED3 + 𝐸PBE PBED3 ― 𝐸PBE PBE ― 𝐸PBED3|, where E is the formation energy and

superscript indicates the functional for which the configuration is preferred and subscript indicates the functional used to calculate energy. We visualize site formation energies of these configurations as shown in Figure S9. The PBE-D3 ground state configuration has O-NO-NO arranged in a 1-1-1 triplet (cluster representation shown in Figure 1) as the most stable sites in red color, whereas it is the least stable pattern in the PBE-based model. We suspect that this pattern is not stable in short-range interactions, but is stabilized due to the enhanced vdW interactions of NO molecules. For the configuration preferred by the PBE-based model, the visualization of site formation energies shows that the vacant sites are stabilized by short-range interactions. However, the PBE-D3-based model shows that this effect is outweighed by the 22 ACS Paragon Plus Environment

Page 23 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

vdW interaction favoring the adsorbate occupied sites. In summary, vdW corrections are significant for the preferred configurations given a coverage. As emphasized in kinetic Monte Carlo (KMC) literature, differences in the configurations observed can have a large impact on kinetics.49–51 This analysis also demonstrates the utility of the visualization method. Learning Curves In order to assess how quickly each model learns from the data, we vary the amount of training data and observe the impact on the training and test error. The steps taken are as follows: (1) separate 10% of the dataset for testing, (2) sample a fraction of the remaining data for training (e.g., 10%), (3) train and cross-validate all the models using the sampled fraction, (4) evaluate the training and test error of the models, (5) repeat steps 1-4 four times by randomly sampling different fractions of training data, (6) average the training and test errors for the five sets, and (7) increment the fraction of training data sampled and repeat steps 2-6. The results are shown in Figure 6.

23 ACS Paragon Plus Environment

The Journal of Physical Chemistry

25

Heuristic CE LASSO CE GA CE LCNN-cgcnn LCNN

20

Test RMSE (meV/site)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 50

15

10

5

0 0.0

0.2

0.4

0.6

0.8

1.0

Training Fraction Figure 6. The test error as a function of the fraction of the training data used. Similar to the results indicated in Table 3, the LCNN outperforms the other methods. The LCNN model has better prediction accuracy with only 50% of the training data. We note that LCNN only starts to outperform around 50% of the training set, which is attributed to the complexity of the model requiring larger data sets. Thus, various methods should be tested to decide whether the model error is satisfactory or more data should be collected to use complex models. The LCNN performs activation operations to the local environment, whereas the LCNN-cgcnn performs activation operations to pairs of sites. Thus, the LCNN activation operation is more complex and requires a larger set of data for effective training.

Conclusions In this work, we have developed a novel lattice convolutional neural network to model the coverage effects of an O, NO/Pt(111) system. We compared it to cluster expansions aided with different cluster selection methods, and a LCNN model constructed with the convolution 24 ACS Paragon Plus Environment

Page 25 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

function of crystal graph convolutional neural network and found the new approach performs the best with a test RMSE of 4.4 meV/site. Upon calculating the learning curves of each model, we found the LCNN outperforms the other models using less training data. The visualization of latent space demonstrates that the LCNN captures the local environment well. We attribute this learning of the local environment to its excellent performance over the cluster expansion method as the cluster expansion prefers long range interactions with smaller clusters. The LCNN architecture also allows estimation of site formation energy, which can reveal physical insights into the surface adsorbate system. Finally, we note that LCNN framework is highly flexible and applicable to other lattice-based systems.

Conflicts of Interest The authors declare no competing financial interest.

Supporting Information The following files are available free of charge. LCNN-CoverageEffects_SI shows the residual plots of each model using 90% of the data for training and the eigenspace visualization of the LCNN (PDF).

Acknowledgments The authors would like to thank the Schneider group for providing the structures used to train and test the models. JL and DGV acknowledge support from the Catalysis Center for Energy Innovation, an Energy Frontier Research Center funded by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences under Award no. DE-SC0001004 and a Delaware Energy Institute Fellowship to JL. GG and YJ acknowledge support from BK21

25 ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 50

PLUS program funded by National Research Foundation of Korea, Ministry of Education, Republic of Korea.

References (1)

Salciccioli, M.; Stamatakis, M.; Caratzoulas, S.; Vlachos, D. G. A Review of Multiscale Modeling of Metal-Catalyzed Reactions: Mechanism Development for Complexity and Emergent

Behavior.

Chem.

Eng.

Sci.

2011,

66

(19),

4319–4355.

https://doi.org/10.1016/j.ces.2011.05.050. (2)

Greeley, J.; Nørskov, J. K.; Mavrikakis, M. Electronic Structure and Catalysis on Metal Surfaces.

Annu.

Rev.

Phys.

Chem.

2002,

53

(1),

319–348.

https://doi.org/10.1146/annurev.physchem.53.100301.131630. (3)

Diebold, U. The Surface Science of Titanium Dioxide. Surf. Sci. Rep. 2003, 48 (5–8), 53–229. https://doi.org/10.1016/S0167-5729(02)00100-0.

(4)

Batzill, M.; Diebold, U. Surface Studies of Gas Sensing Metal Oxides. Phys. Chem. Chem. Phys. 2007, 9 (19), 2307. https://doi.org/10.1039/b617710g.

(5)

Nørskov, J. K.; Abild-Pedersen, F.; Studt, F.; Bligaard, T. Density Functional Theory in Surface Chemistry and Catalysis. Proc. Natl. Acad. Sci. U. S. A. 2011, 108 (3), 937–943. https://doi.org/10.1073/pnas.1006652108.

(6)

Ziegler, T. Approximate Density Functional Theory as a Practical Tool in Molecular Energetics

and

Dynamics.

Chem.

Rev.

1991,

91

(5),

651–667.

https://doi.org/10.1021/cr00005a001. (7)

Geerlings, P.; De Proft, F.; Langenaeker, W. Conceptual Density Functional Theory. Chem. Rev. 2003, 103 (5), 1793–1874. https://doi.org/10.1021/cr990029p. 26 ACS Paragon Plus Environment

Page 27 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

(8)

Cohen, A. J.; Mori-Sánchez, P.; Yang, W. Challenges for Density Functional Theory. Chem. Rev. 2012, 112 (1), 289–320. https://doi.org/10.1021/cr200107z.

(9)

Ratcliff, L. E.; Mohr, S.; Huhs, G.; Deutsch, T.; Masella, M.; Genovese, L. Challenges in Large Scale Quantum Mechanical Calculations. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2017, 7 (1), 1–24. https://doi.org/10.1002/wcms.1290.

(10)

Meredig, B.; Agrawal, A.; Kirklin, S.; Saal, J. E.; Doak, J. W.; Thompson, A.; Zhang, K.; Choudhary, A.; Wolverton, C. Combinatorial Screening for New Materials in Unconstrained Composition Space with Machine Learning. Phys. Rev. B - Condens. Matter Mater. Phys. 2014, 89 (9), 1–7. https://doi.org/10.1103/PhysRevB.89.094104.

(11)

Saal, J. E.; Kirklin, S.; Aykol, M.; Meredig, B.; Wolverton, C. Materials Design and Discovery with High-Throughput Density Functional Theory: The Open Quantum Materials

Database

(OQMD).

Jom

2013,

65

(11),

1501–1509.

https://doi.org/10.1007/s11837-013-0755-4. (12)

Ulissi, Z. W.; Singh, A. R.; Tsai, C.; Nørskov, J. K. Automated Discovery and Construction of Surface Phase Diagrams Using Machine Learning. J. Phys. Chem. Lett. 2016, 7 (19), 3931–3935. https://doi.org/10.1021/acs.jpclett.6b01254.

(13)

Rupp, M.; Tkatchenko, A.; Müller, K.-R.; von Lilienfeld, O. A. Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning. 2011, 058301 (February), 1–5. https://doi.org/10.1103/PhysRevLett.108.058301.

(14)

Jinnouchi, R.; Asahi, R. Predicting Catalytic Activity of Nanoparticles by a DFT-Aided Machine-Learning Algorithm. J. Phys. Chem. Lett. 2017, 8 (17), 4279–4283. https://doi.org/10.1021/acs.jpclett.7b02010.

27 ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(15)

Page 28 of 50

Sanchez, J. M.; Ducastelle, F. Generalized Cluster Description of Multicomponent Systems.

Phys.

A

Stat.

Mech.

its

Appl.

1984,

128

(1–2),

334–350.

https://doi.org/10.1016/0378-4371(84)90096-7. (16)

Sanchez, J. M.; Ducastelle, F.; Gratias, D. Generalized Cluster Description of Multicomponent Systems. Phys. A Stat. Mech. its Appl. 1984, 128 (1–2), 334–350. https://doi.org/10.1016/0378-4371(84)90096-7.

(17)

Herder, L. M.; Bray, J. M.; Schneider, W. F. Comparison of Cluster Expansion Fitting Algorithms for Interactions at Surfaces. Surf. Sci. 2015, 640, 104–111. https://doi.org/10.1016/j.susc.2015.02.017.

(18)

Miller, S. D.; Kitchin, J. R. Uncertainty and Figure Selection for DFT Based Cluster Expansions for Oxygen Adsorption on Au and Pt (111) Surfaces. Mol. Simul. 2009, 35 (10–11), 920–927. https://doi.org/10.1080/08927020902833137.

(19)

Tang, H.; Van Der Ven, A.; Trout, B. L. Phase Diagram of Oxygen Adsorbed on Platinum (111) by First-Principles Investigation. Phys. Rev. B - Condens. Matter Mater. Phys. 2004, 70 (4), 1–10. https://doi.org/10.1103/PhysRevB.70.045420.

(20)

Tang, H.; Van Der Ven, A.; Trout, B. L. Lateral Interactions between Oxygen Atoms Adsorbed on Platinum (111) by First Principles. Mol. Phys. 2004, 102 (March 2015), 273–279. https://doi.org/10.1080/0026897042000178088.

(21)

Lazo, C.; Keil, F. J. Phase Diagram of Oxygen Adsorbed on Ni(111) and Thermodynamic Properties from First-Principles. Phys. Rev. B - Condens. Matter Mater. Phys. 2009, 79 (24), 1–18. https://doi.org/10.1103/PhysRevB.79.245418.

(22)

van de Walle, A.; Ceder, G. Automating First-Principles Phase Diagram Calculations. 28 ACS Paragon Plus Environment

Page 29 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

J. Phase Equilib. 2002, 23 (4), 348–359. https://doi.org/10.1361/105497102770331596. (23)

van de Walle, A.; Asta, M. D. Self-Driven Lattice-Model Monte Carlo Simulations of Alloy Thermodynamic Properties and Phase Diagrams. Model. Simul. Mater. Sc. 2002, 10, 521. https://doi.org/10.1088/0965-0393/10/5/304.

(24)

van de Walle, A.; Asta, M.; Ceder, G. The Alloy Theoretic Automated Toolkit: A User Guide. Calphad 2002, 26 (4), 539–553. https://doi.org/10.1016/S0364-5916(02)800062.

(25)

van de Walle, A. Multicomponent Multisublattice Alloys, Nonconfigurational Entropy and Other Additions to the Alloy Theoretic Automated Toolkit. Calphad 2009, 33, 266– 278. https://doi.org/10.1016/j.calphad.2008.12.005.

(26)

Lerch, D.; Wieckhorst, O.; Hart, G. L. W.; Forcade, R. W.; Muller, S. UNCLE: A Code for Constructing Cluster Expansions for Arbitrary Lattices with Minimal User-Input. Model. Simul. Mater. Sci. Eng. 2009, 17 (5). https://doi.org/10.1088/09650393/17/5/055003.

(27)

Bajpai, A.; Frey, K.; Schneider, W. F. Binary Approach to Ternary Cluster Expansions: NO-O-Vacancy System on Pt(111). J. Phys. Chem. C 2017, 121 (13), 7344–7354. https://doi.org/10.1021/acs.jpcc.7b00914.

(28)

Nelson, L. J.; Ozoliņš, V.; Reese, C. S.; Zhou, F.; Hart, G. L. W. Cluster Expansion Made Easy with Bayesian Compressive Sensing. Phys. Rev. B - Condens. Matter Mater. Phys. 2013, 88 (15), 1–10. https://doi.org/10.1103/PhysRevB.88.155105.

(29)

Schmidt, D. J.; Chen, W.; Wolverton, C.; Schneider, W. F. Performance of Cluster Expansions of Coverage-Dependent Adsorption of Atomic Oxygen on Pt(111). J. Chem. 29 ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 30 of 50

Theory Comput. 2012, 8 (1), 264–273. https://doi.org/10.1021/ct200659c. (30)

Levy, O.; Hart, G. L. W.; Curtarolo, S. Uncovering Compounds by Synergy of Cluster Expansion and High-Throughput Methods. J. Am. Chem. Soc. 2010, 132 (13), 4830– 4833. https://doi.org/10.1021/ja9105623.

(31)

Chen, W.; Schmidt, D.; Schneider, W. F.; Wolverton, C. Ordering and Oxygen Adsorption in Au-Pt/Pt(111) Surface Alloys. J. Phys. Chem. C 2011, 115 (36), 17915– 17924. https://doi.org/10.1021/jp205995j.

(32)

Wu, C.; Schmidt, D. J.; Wolverton, C.; Schneider, W. F. Accurate CoverageDependence Incorporated into First-Principles Kinetic Models: Catalytic NO Oxidation on Pt (1 1 1). J. Catal. 2012, 286 (2), 88–94. https://doi.org/10.1016/j.jcat.2011.10.020.

(33)

Tibshirani, R. Regression Selection and Shrinkage via the Lasso. Journal of the Royal Statistical Society B. 1996, pp 267–288. https://doi.org/10.2307/2346178.

(34)

Gu, G. H.; Plechac, P.; Vlachos, D. G. Thermochemistry of Gas-Phase and Surface Species via LASSO-Assisted Subgraph Selection. React. Chem. Eng. 2018. https://doi.org/10.1039/C7RE00210F.

(35)

O’Connor, N. J.; Jonayat, A. S. M.; Janik, M. J.; Senftle, T. P. Interaction Trends Between Single Metal Atoms and Oxide Supports Identified with Density Functional Theory

and

Statistical

Learning.

Nat.

Catal.

2018,

1

(7),

531–539.

https://doi.org/10.1038/s41929-018-0094-5. (36)

Nguyen, A. H.; Rosenbrock, C. W.; Reese, C. S.; Hart, G. L. W. Robustness of the Cluster Expansion: Assessing the Roles of Relaxation and Numerical Error. Phys. Rev. B 2017, 96 (1), 1–11. https://doi.org/10.1103/PhysRevB.96.014107. 30 ACS Paragon Plus Environment

Page 31 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

(37)

Faber, F. A.; Hutchison, L.; Huang, B.; Gilmer, J.; Schoenholz, S. S.; Dahl, G. E.; Vinyals, O.; Kearnes, S.; Riley, P. F.; Von Lilienfeld, O. A. Prediction Errors of Molecular Machine Learning Models Lower than Hybrid DFT Error. J. Chem. Theory Comput. 2017, 13 (11), 5255–5264. https://doi.org/10.1021/acs.jctc.7b00577.

(38)

Ahmad, Z.; Xie, T.; Maheshwari, C.; Grossman, J. C.; Viswanathan, V. Machine Learning Enabled Computational Screening of Inorganic Solid Electrolytes for Suppression of Dendrite Formation in Lithium Metal Anodes. ACS Cent. Sci. 2018, 4 (8), 996–1006. https://doi.org/10.1021/acscentsci.8b00229.

(39)

Kearnes, S.; McCloskey, K.; Berndl, M.; Pande, V.; Riley, P. Molecular Graph Convolutions: Moving Beyond Fingerprints. J. Comput. Aided. Mol. Des. 2016, 30 (8), 595–608. https://doi.org/10.1007/s10822-016-9938-8.

(40)

Li, Y.; Tarlow, D.; Brockschmidt, M.; Zemel, R. Gated Graph Sequence Neural Networks. 2015, No. 1, 1–20. https://doi.org/10.1103/PhysRevLett.116.082003.

(41)

Xie, T.; Grossman, J. C. Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties. Phys. Rev. Lett. 2018, 120 (14), 145301. https://doi.org/10.1103/PhysRevLett.120.145301.

(42)

Hagberg, A. A.; Schult, D. A.; Swart, P. J. Exploring Network Structure, Dynamics, and Function Using NetworkX. In Proceedings of the 7th Python in Science Conference; Varoquaux, G., Vaught, T., Millman, J., Eds.; Pasadena, CA USA, 2008; pp 11–15.

(43)

Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning

in

Python.

J.

Mach.

Learn.

Res.

2012,

12,

2825–2830.

https://doi.org/10.1007/s13398-014-0173-7.2. 31 ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(44)

Page 32 of 50

Bergstra, J.; Bengio, Y. Random Search for Hyper-Parameter Optimization. J. Mach. Learn. Res. 2012, 13, 281–305.

(45)

Abadi, M. M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. Tensorflow: A System for Large-Scale Machine Learning. In OSDI; 2016; Vol. 16, pp 265–283. https://doi.org/10.1016/00766879(83)01039-3.

(46)

Konur, O.; Kingma, D. P.; Ba, J. Adam: A Method for Stochastic Optimization. CoRR 2014, abs/1412.6 (2), 365–380. https://doi.org/10.1063/1.4902458.

(47)

Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. CoRR 2015, abs/1502.0.

(48)

Hart, G. L. W.; Forcade, R. W. Algorithm for Generating Derivative Structures. Phys. Rev.

B

-

Condens.

Matter

Mater.

Phys.

2008,

77

(22),

1–12.

https://doi.org/10.1103/PhysRevB.77.224115. (49)

Temel, B.; Meskine, H.; Reuter, K.; Scheffler, M.; Metiu, H. Does Phenomenological Kinetics Provide an Adequate Description of Heterogeneous Catalytic Reactions? J. Chem. Phys. 2007, 126 (20). https://doi.org/10.1063/1.2741556.

(50)

Pineda, M.; Stamatakis, M. Beyond Mean-Field Approximations for Accurate and Computationally Efficient Models of on-Lattice Chemical Kinetics. J. Chem. Phys. 2017, 147 (2). https://doi.org/10.1063/1.4991690.

(51)

Chen, Z.; Wang, H.; Su, N. Q.; Duan, S.; Shen, T.; Xu, X. Beyond Mean-Field Microkinetics: Toward Accurate and Efficient Theoretical Modeling in Heterogeneous Catalysis.

ACS

Catal.

2018,

8

(7),

5816–5826. 32

ACS Paragon Plus Environment

Page 33 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

https://doi.org/10.1021/acscatal.8b00943.

33 ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 34 of 50

TOC Graphic

34 ACS Paragon Plus Environment

Page 35 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Examples of clusters on a hexagonal lattice. Site, pair, and triplet clusters are highlighted in red, green and blue respectively. 249x161mm (300 x 300 DPI)

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Example configuration of 0.25 ML of O and 0.25 ML of NO. Red, blue, and green represent O, NO, and vacant sites respectively. Bold line indicates the unit cell. The sites from the original cell are surrounded by one periodic layer to show interactions across the unit cell. Faded colors represent periodic images whose interactions are considered only if at least one node in the cluster is from the original unit cell. 111x78mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 36 of 50

Page 37 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Flow chart of the lattice convolutional neural networks. (a) The overall design of the model with an example 2D hexagonal lattice. Features and each site’s neighbor lists are extracted (one hot encoding is used in this study). This information undergoes the lattice convolution (LConv) followed by site-wise activation and linear multiplication. The linear layer reduces the dimension of each site’s feature one, representing the individual site’s property which is average sum pooled to get property per site. (b) Flow chart of the lattice convolution for a site. The local environment of a site is gathered using the neighbor list, followed by permutations to account for symmetry in the lattice. Each gathered local environment undergoes activation followed by summation to calculate the new site features. 198x163mm (300 x 300 DPI)

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Visualization of site layer at (a) one-hot encoding input, after (b) one, (c) two, (d) after site-wise activation, and (e) linear multiplication in neural network architecture for a randomly created periodic lattice. Each hexagon represents a hollow binding site. Site layers for (b), (c), (d), and (e) are projected to their three principal components, and each projection used as red, blue, and green. The ratio of explained variance by red, green and blue colors are listed in parentheses below the figure. The heat color in (f) represents site formation energy in eV/site. 175x247mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 38 of 50

Page 39 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

The convex hull of LCNN computed formation energies in eV/site trained using (a) PBE-D3 and (b) PBE. All possible unit cells containing up to 12 binding sites are considered (a total of 1,581,607 configurations). The vertices of convex hull facets indicate the ground state configurations. The green circle indicates the configuration with the global minimum formation energy. 267x104mm (300 x 300 DPI)

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The test error as a function of the fraction of the training data used. 209x178mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 40 of 50

Page 41 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

29x8mm (300 x 300 DPI)

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

35x11mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 42 of 50

Page 43 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

35x12mm (300 x 300 DPI)

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

35x11mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 44 of 50

Page 45 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

37x11mm (300 x 300 DPI)

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

37x12mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 46 of 50

Page 47 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

35x12mm (300 x 300 DPI)

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

37x12mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 48 of 50

Page 49 of 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

38x12mm (300 x 300 DPI)

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

TOC Graphic 232x137mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 50 of 50