Assessing Gaussian Process Regression and ... - ACS Publications

Permutationally Invariant Polynomial Approaches to Represent ..... xi/l and xj/l, where l is the length-scale parameter that is optimized in training,...
1 downloads 0 Views 4MB Size
Subscriber access provided by Kaohsiung Medical University

Dynamics

Assessing Gaussian Process Regression and Permutationally Invariant Polynomial Approaches to Represent High-Dimensional Potential Energy Surfaces Chen Qu, Qi Yu, Brian L. Van Hoozen, Joel M Bowman, and Rodrigo A. Vargas-Hernandez J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.8b00298 • Publication Date (Web): 30 May 2018 Downloaded from http://pubs.acs.org on June 2, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Assessing Gaussian Process Regression and Permutationally Invariant Polynomial Approaches to Represent High-Dimensional Potential Energy Surfaces Chen Qu ,† Qi Yu,† Brian L. Van Hoozen, Jr.,† Joel M. Bowman,∗,† and Rodrigo A. Vargas-Hern´andez‡ †Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, USA ‡Department of Chemistry, University of British Columbia, Vancouver, BC V6T 1Z1, Canada E-mail: [email protected]

1

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Abstract The mathematical representation of large data sets of electronic energies has seen substantial progress in the past ten or so years. The so-called Permutationally Invariant Polynomial (PIP) representation is one established approach. This approach dates from 2003, when a global potential energy surface (PES) for CH5+ was reported using a basis of polynomials that are invariant with respect to the 120 permutations of the five equivalent H atoms. More recently, several approaches from “machine learning” have been applied to fit these large data sets. Gaussian Process (GP) regression is such an approach. Here we consider the implementation of the (full) GP due to Krems and co-workers, with a modification that renders it permutationally invariant, which we denote by PIP-GP. This modification uses the approach of Guo and co-workers and later extended by Zhang and co-workers, to achieve permutational invariance for neural-network fits. The PIP, GP and PIP-GP approaches are applied to four case studies for fitting data sets of electronic energies: H3 O+ , OCHCO+ and H2 CO/cisHCOH/trans-HCOH with the goal of assessing precision, accuracy in normal mode analysis and barrier heights, and timings. We also report an application to (HCOOH)2 , where the full PIP approach is possible but where the PIP-GP one is not feasible. However, by replicating data, which is feasible in this case, the GP approach is able to represent the data with comparable precision to the PIP one. We examine these assessments for varying sizes of data sets in each case to determine the dependence of properties of the fits on the training data size. We conclude with some comments on the different aspects of computational effort of the PIP, GP and PIP-GP approaches and also challenges these methods face for more “rugged” PESs, exemplified here by H2 CO/cis-HCOH/trans-HCOH.

2

ACS Paragon Plus Environment

Page 2 of 48

Page 3 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1

Introduction

The potential energy surface (PES) plays a central role in theoretical and computational chemistry, ranging from calculations of reaction dynamics to rovibrational energy levels of molecules to conformations of macromolecules. Though it can be obtained on the fly, i.e., by obtaining the electronic energy directly from electronic structure packages at every molecular configuration, an accurate analytical representation of the PES is usually preferred, since the great reduction in the cost of potential evaluation is worth the effort to obtain it. The challenge is to develop efficient and robust mathematical representations of PESs using data sets of electronic energies that span a high-dimensional space that describes large-amplitude molecular motions. Broadly speaking, there are two types of molecular interactions of interest, covalent and non-covalent. The former describes chemical reactions and chemical isomerizations. The latter describes weaker (hence non-covalent) interactions, e.g., H-bonding in water clusters. Ideally, the chosen representation should be capable of accurately describing both types of interactions. There has been substantial progress in the past ten or so years in developing such representations. The one based on permutationally invariant polynomials (PIPs) has produced many PESs for both non-covalent and covalent interactions. 1–3 This approach dates from 2003, when a global PES for CH5+ was reported using a basis of polynomials that are invariant with respect to the 120 permutations of the five equivalent H atoms. 4 More recently, Gaussian Process (GP) regression has been proposed by several groups as another method to represent high-dimensional PESs. 5–12 The GP approach is very general and has many applications, other than representing PESs. 13 At the risk of oversimplifying, it is a method that interpolates from known data (the training set) by using a weighted sum of the known data. If the training set is treated as noise-free, the GP interpolation goes through the data. However, as we note below and has been noted previously, the noise term is almost always added to avoid ill-conditioning of the covariance matrix. The PIP approach is a least-squares fit to known data and so in general does not reproduce them. 3

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Here we apply the GP approach due to Krems, 8,9 and an extension of it, introduced here, to four molecular case studies. The original approach was applied to several four-atom systems 8,9 and also tested in quantum reactive scattering applications, namely H+H2 , 10,12,14 H+H2 O and H+CH4 , 10 and OH+H2 . 14 This implementation will be referred to as “full GP” later in the discussion section. We extend this full GP approach to render it permutationally invariant. We follow the approach proposed by Guo and co-workers 15–17 and extended by Zhang and co-workers 18,19 for achieving this for Neural Network (NN) fits. In this approach the “design variables” are a set of PIPs in all the internuclear distances and so the fit is permutationally invariant. Borrowing from the notation “PIP-NN” introduced by the Guo group, we denote the analogous GP approach by “PIP-GP”. For these four studies, the data sets consist of thousands of CCSD(T) electronic energies for H3 O+ , OCHCO+ , (HCOOH)2 and MRCI energies for H2 CO/HCOH. PIP PESs have already been reported for these molecules by us 20–23 and used in dynamics calculations. Here we examine the performance of PIP, GP and the new PIP-GP approaches. We consider the dependence of the fitting error, the computational effort to train and use the fits, and standard properties of the PES such as stationary points and harmonic frequencies on the training set of data. The paper is organized as follows. The next section presents a review of the theory of the PIP, GP and PIP-GP approaches. Applications and assessments of these approaches to the four molecular systems are then given, followed by a general discussion. Finally, a summary and concluding remarks about future work are given.

2 2.1

Theory of PIP, GP and PIP-GP representations Permutationally Invariant Polynomial Fitting (PIP)

Permutational symmetry is one of the essential properties in molecular systems. When the system has identical atoms, the constructed PES should be invariant with respect to permu4

ACS Paragon Plus Environment

Page 4 of 48

Page 5 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

tations of like atoms. To address this symmetry, we introduced PIPs of all the internuclear distances (actually transformed to Morse variables, see below) as a fitting basis to construct the PES. 1,4,24 For a N -atom molecule, the N (N −1)/2 internuclear distances, typically transformed to so-called Morse variables, are used in the permutationally invariant polynomial approach. Morse variables are given by yij = exp(−rij /a), where rij is the internuclear distance between atom i and j, and a is a range parameter, which is typically 2–3 bohr. However, for precise fitting in the long range, we have proposed a two-component PES, in which the long-range component uses a larger value of a. 3,25,26 In principle, this non-linear range parameter could be optimized; however, our limited investigation of optimizing it did not result in a significant, i.e., a factor of two or more, reduction in the root-mean-square (rms) fitting error. Below is a brief review of the approaches we used over the past decade to obtain the permutationally invariant fitting basis. One, which we termed the “pedestrian” approach, is to start with a single monomial and apply a permutation symmetrization operator to it to generate a symmetrized monomial. 1,3,27 This is pedagogically useful and is also very simple to develop the basis for molecules with small symmetric groups, e.g., A2 B, A3 , A2 B2 , A3 B, A3 B2 . 1 A more sophisticated, factored approach, based on monomial symmetrization to recursively generate high-order polynomials from lower-order ones was introduced by Xie and Bowman. 27 This software is freely available. 28 Another, and more efficient, factored representation of PIPs comes from invariant polynomial theory. 29 In this approach, the PIPs are given in terms of polynomials of so-called primary invariants, p(y), times secondary invariants, q(y). For a molecule of N atoms, there are N (N − 1)/2 internuclear distances/Morse variables, and thus this is the dimension of the space. From invariant theory, the number of primary invariants equals the dimensionality of the space, and thus N (N − 1)/2.

5

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 48

The factored expression for the potential in terms of primary and secondary invariants is

V (y) =

M X

cα hα (p(y))qα (y),

(1)

α=1

where hα (p) is a polynomial in the primary invariants and cα are the unknown linear coefficients. This summation is truncated at M terms, which depends on the total polynomial order. The coefficients are determined by solving the usual equations of linear least-squares fitting. The number of terms, M , for a given total polynomial order in all of the above approaches is the same and can be determined from the Molien series. 1,24 Tables giving the number for a variety of molecule types can be found in refs. 1,24,27. Typically, the total polynomial order is between 5 and 10 and M is of the order of hundreds to roughly 2000. A relevant example here is for OCHCO+ , with a corresponding symmetric group of order 2!2!. The dataset of 8613 CCSD(T) energies was fit using a total polynomial order of 5, which, using permutational symmetry, contains 904 terms. Without symmetry the number of terms is 3003. The factored approach has been used to generate a large library of representations of potentials for molecules with as many as ten atoms. 1

2.2

Gaussian Process (GP)

The Gaussian process model, which is a kernel-based machine learning algorithm, can be well served for Bayesian non-linear and non-parametric regression problem. This approach was recently introduced to represent high-dimensional PESs. Suppose there are N molecular configurations {x1 , · · · , xN }, where xi is a vector that specifies the molecular configuration, for example, the internal coordinates of a molecule, and N potential energy values associate with them {V1 , · · · , VN }. In the GP approach, these potential energies follow a multivariate

6

ACS Paragon Plus Environment

Page 7 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

normal distribution 





 V1   .   ..  ∼ N     VN

 



 µ1   k(x1 , x1 ) · · · k(x1 , xN )   .    .. .. ...  ..  ,   , . .         µN k(xN , x1 ) · · · k(xN , xN )

(2)

and in matrix form, V ∼ N (µ, K) ,

(3)

where µ is the mean vector, and the covariance matrix K consists of kernel function k(xi , xj ) that measures the similarity between two configurations xi and xj . For simplicity, the mean vector µ is assumed to be zero afterwards. Most of the kernel functions depend on the Euclidean distance dij between the two vectors xi /l and xj /l, where l is the length-scale parameter that is optimized in training, and it could be a single value or one for each element in x. There are several choices of the kernel function available, such as the squared-exponential kernel, which is used here. It is  2 dij k(xi , xj ) = σ exp − + δij σnoise , 2 2

(4)

where σ is a parameter to be optimized. The δij σnoise is the noise term that can be added to the diagonal of the covariance matrix. In principle, this term is not necessary for potential fitting, because the electronic energies should not contain noise. However, adding noise can avoid the ill-conditioning of the covariance matrix, and, more generally it can be considered as a parameter that is part of the GP optimization procedure. In the training step, the hyper-parameters l, σ, and σnoise are optimized such that the log-likelihood function

log L = −

 1 V > K −1 V + log |K| + N log 2π 2

7

ACS Paragon Plus Environment

(5)

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 48

is maximized. 7,9,13 Once the optimal hyper-parameters are determined, the GP model can predict the potential energy V0 for an arbitrary molecular configuration x0 . The potential energies still follow the same multivariate normal distribution, so that       V  0  K K0   ,   ∼ N   ,  > V0 0 K0 K00

(6)

where K0 = [k(x0 , x1 ) · · · k(x0 , xN )]> , and K00 = k(x0 , x0 ). The conditional distribution of V0 given the training data V is also a normal distribution, and we use its conditional mean V = K0> K −1 V

(7)

as the prediction for the potential, and the conditional variance

σ0 = K00 − K0> K −1 K0

(8)

as the uncertainty of the predicted energy. So the GP model predicts the potential energy based on the trained covariance matrix which gives the relevance between different molecule structures. Training with this approach scales non-linearly with the number of training data, N . Also, the determination of hyperparameters using Equation 5 requires an iterative process in which the inverse of the covariance matrix K is needed in each iteration, thus the training scales as O(tN 3 ), where t is the number of iterations and N is the number of training data. This is more demanding than our PIP approach, because the PIP is a linear regression problem and thus only requires a “oneshot” computation with complexity of O(N c2 ), where c is the number of linear coefficients. For prediction of the energy, considering that K −1 V can be pre-calculated in training, the computational cost scales as O(N ). The computational cost of GP could be greatly reduced by making sparse approximations. 7,30–32 These methods choose subsets or pseudo-input of 8

ACS Paragon Plus Environment

Page 9 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

training data, and thus the computational cost can reduced to O(m2 N ) for training, and O(m) for prediction, where m is the number of chosen sparse points and it could be much smaller than N . In this article our tests and conclusions are based on non-sparse, full GP, because the Python code we use does not have the sparse option, and the full GP version has been and is still actively used in various applications. 9–12 Another aspect for Gaussian process regression is the incorporation of permutational symmetry into the kernels. In the full GP version mentioned above this is not done. In this version, the vector x contains the internal coordinates of the molecule, for example internuclear distances or reciprocal internuclear distances, so that the invariance with respect to translations and rotations of the molecules is guaranteed. However, these internal coordinates do not contain any information about the permutational symmetry of the molecule, so the fitted PES is not permutationally invariant. We present a “remedy” to this below using PIPs. However, we note that one straightforward approach to numerically introduce permutational invariance is to replicate the training data according to the permutation operations. Clearly, this approach significantly increases the cost, since much more training data are added. The second approach to ensure the permutational symmetry involves the symmetrization of the kernels over all the permutations of identical atoms. 11,33,34 A third one is similar to the atomistic-NN. 35 In this approach, the “SOAP” kernel is used to describe the environment of each atomic species, 6 and the final predicted potential energy is a sum of all atomic energies. 7 This method can also automatically ensure rotational and permutational invariance. These three approaches are not considered here.

2.3

Permutationally invariant polynomial Gaussian process (PIPGP)

Inspired by the PIP-NN approach, 15–18 we incorporate the permutational symmetry into GP by using permutationally invariant polynomials of the internal coordinates as the “designed variables” x. Since each entry of the vector is one of the PIPs of the molecule, x is also 9

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 48

permutationally invariant, and so is the predicted energy. However, as pointed out by Zhang and co-workers, 18 the number of polynomials would quickly become too large when the system is large, if all the primary and secondary invariants were to be used. Since the computational cost of the kernel function, and thus the potential prediction, usually grow linearly with the size of the vector x, the vector must be kept as small as possible. For this purpose, we adopt the fundamental invariants (FIs) proposed by Zhang and co-workers. 18 The FIs can be considered as the minimal “basis” that can generate all the PIPs, and they can be found using the algebra software Singular. 36 (These polynomials were introduced in 2013 a different context by Opalka and Domcke. 37 ) In fact the fundamental invariants and the primary/secondary invariants we used in our PIP method are closely related. The secondary invariants can be further divided into reducible and irreducible invariants. All reducible invariants can be represented by the irreducible ones. So the FIs only include the irreducible secondary and primary invariants (some redundant primary invariants will also be excluded). Most secondary invariants are reducible ones, so the number of the FIs is significantly smaller than all the primary and secondary invariants. For example, there are 1118 primary and secondary invariants for an A5 type molecule, but only 56 fundamental invariants. 18 However, we note that the number of FIs for A5 is still much larger than its degree of freedom, which is 9. Therefore, the FIs also becomes too expensive for complex molecules with high permutational symmetry, such as the formic acid dimer (FAD), (HCOOH)2 , which we discuss in detail below. The explicit expressions of FIs for 3–5 atom molecules can be obtained from https://github.com/kjshao.

3

Case Studies

Here we consider data sets of high-level electronic energies for H3 O+ , OCHCO+ , H2 CO/HCOH, and (HCOOH)2 . The corresponding equilibrium structures are shown in Figure 1. As explained in detail below, the order is given in terms of increasing complexity. PIP PESs for

10

ACS Paragon Plus Environment

Page 11 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

all of these molecules have already been reported; 20–23 however, here we re-fit these datasets with various training sets using GP and the PIP-GP approach. We note that the full datasets are “scattered” and very sparse compared to data generated on a direct-product grid. For example, for ten-atom (HCOOH)2 , the data set consists of 13475 energies, which in a directproduct grid would correspond to 1.48 configurations per degree of freedom. Details on how these data are obtained are given in review articles; 1,3 however, pruning of data from low-level direct dynamics calculations at a variety of total energies is the main strategy used.

Figure 1: Structures of H3 O+ , OCHCO+ , H2 CO and formic acid dimer. The software used in these studies is as follows. For the GP calculations the training is done with the Python code of Krems and co-workers, 14 which requires the Python plug-in scikit-learn. 38 Our own Fortran software is used to read the output of Python, evaluate the potential energy, and perform geometry optimization and normal mode analyses. The PIP fits are done using our home-grown software, 1 which makes extensive use of the MAGMA 11

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Computational Algebra package. 39 All the calculations used a single core on a single compute node (Intelr Xeonr CPU E5-2630 v3 @ 2.40 GHz Base Clock Frequency). For all the test cases, the squared-exponential kernel was employed in GP and PIP-GP. Other kernels were tested, and we did not find a significant difference, so we used the squaredexponential one. However, testing a few different kernel functions for best performance is always a good idea when using the GP approach. For the first three test cases, each component of the vector x has its length-scale parameter, while in the case of formic acid dimer, due to large dimensionality and computational cost, only one single length parameter was used. For GP, we also tested three different variables for the vector x: internuclear distances, inverse of these distances, and Morse variables of the distances. Morse variables worked best except for the formic acid dimer case, where the inverse of the internuclear distances gave the smallest fitting error. Therefore, Morse variables were used for H3 O+ , OCHCO+ , and H2 CO/HCOH, while the inverse of internuclear distance was used for (HCOOH)2 . (Note that for our PIP fit, we always use the Morse variables.) When PIP-GP was applied, in two test cases (H3 O+ and OCHCO+ ) described below, we first transformed the internuclear distance into Morse variable and then constructed the fundamental invariants in terms of Morse variables; in H2 CO/HCOH, however, we first constructed the FIs in terms of internuclear distances and then transformed the FIs to Morse variables. We did not find significant difference between these two approaches. We tested the GP model with and without the noise term for H3 O+ . For GP we were able to perform the training but for PIP-GP numerical problems were found, due to the ill-conditioned covariance matrix. Therefore, noise terms were included in all the other test cases in order to avoid numerical instability.

12

ACS Paragon Plus Environment

Page 12 of 48

Page 13 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

3.1

H 3 O+

The first test case we investigate is H3 O+ , which is a four-atom system with C3v symmetry at global minimum and D3h symmetry at the saddle point separating two equivalent minima. The structures of minimum and saddle point are shown in Figure 1. We constructed a global potential energy surface of H3 O+ that dissociates to H+ and H2 O. 20 The fitting data set consists of 79,213 CCSD(T)-F12/aVQZ and 2284 MRCI/aVTZ energies with the energy up to 200 kcal/mol above the global minimum. The PIP approach was used for PES fitting, where the maximum polynomial order is set as 10 and the total number of linear coefficients is 1506. The rms error for the entire data set is 6.7 cm−1 . More details can be found in ref. 20. Here, we do not consider the dissociation of H3 O+ and restrict the energy range to a maximum of 60 kcal/mol (around 21,000 cm−1 ) above the global minimum. Thus, 32,141 data points are generated from original data set. This new set of data is also used as the test data after PES construction. We created five sets of training data with 500, 1000, 2000, 5000, and 10,000 points, by choosing different step sizes (1 point in 5 steps for example). All of the data sets have the same energy range and a similar distribution of energies. Histograms of the data sets are shown in Figure 2. Morse variables, yij = exp(−rij /a), were used in the GP fits with a as 2.5 bohr. Previous studies have used the inverse of the atomic distances. 10,11 However, we found that Morse variables result in significantly smaller test set rms errors. The relevant fitting results are shown in Table 1.

13

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 48

Figure 2: Histograms of the hydronium data sets. The size of each data set is given in the top left corner of each plot. Table 1: The rms error (in cm−1 ) on the training and test data set, the timing (in seconds) for training and 100,000 potential evaluations

GP-500 GP-1000 GP-2000 GP-5000 GP-10000 PIP-500 (348) PIP-1000 (348) PIP-2000 (590) PIP-5000 (960) PIP-10000 (1506) PIP-32141 (1506)

rms(train)

rms(test)

t(train)

t(evaluate)

0.00 0.00 0.00 0.00 0.00 5.49 10.27 6.47 7.64 6.96 7.09

238.74 125.76 36.33 11.17 5.62 116.99 39.20 28.13 11.81 9.99 7.09

49 352 751 8717 48662 0.03 0.06 0.21 1.73 10.25 33.47

2.81 5.63 11.19 28.03 55.58 0.16 0.16 0.21 0.29 0.40 0.40

The number of polynomials in the PIP basis is given in parentheses. In the GP approach used in this case, the noise term in the kernel function is zero and so the GP PES goes through all the training data set, as it should. The rms error for the test data decreases to 5.62 cm−1 when increasing the size of training data set. However, the

14

ACS Paragon Plus Environment

Page 15 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

computation cost for training process scales as O(N 3 ) in this full GP version, and so the data size of ∼10,000 points becomes the practical limit for training. As to the time of potential calls, a linear relation between cost and the size of training data can be easily verified. We also conducted PIP fitting for each set of training data, using an appropriate maximum polynomial order. The polynomial order is determined to improve fitting accuracy but to also avoid overfitting the data. Comparing the rms errors of test data from two approaches, the PIP method has better performance when using smaller training data. The two approaches reach similar rms values when the number of training data goes above 5000. In Figure 3, we plot the difference between predicted energies and ab initio ones from different approaches with 1000 training data. In the energy region below 15,000 cm−1 , both GP and PIP approaches give very accurate predictions with few outliers. Beyond that region, more outliers are observed in the GP fit and this is the main reason that GP approach has a larger rms error. As seen in Table 1, for different sizes of training data, PIP training only costs several seconds while GP training may cost much longer. This is mainly a consequence of the efficiency of linear vs non-linear optimization. Also, the PES evaluation using PIP depends on the number of linear coefficients and the speed of 100,000 potential calls is around 20 times faster than GP approach even when 500 training data was used.

Figure 3: Difference between predicted energies and ab initio energies using validation data. Each point in the figure represents one data point We next applied the PIP-GP approach to construct the PES. There are 9 fundamental 15

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 48

invariants for H3 O+ , an A3 B-type molecule. So we replaced the input vector of six Morse variables with 9 fundamental invariants in terms of the Morse variables. In this case, noise terms are added to reduce ill-conditioning of the covariance matrix and we constructed the PES using different sets of training data. We do not show the detailed rms errors and computation cost using PIP-GP approach as these are just marginally different from the GP results shown in Table 1. As expected, the computational cost to evaluate the PIP-GP PES is greater than GP owing to the increase in the dimensionality of the input vector from 6 to 9. We next test the properties of the PESs using GP, PIP-GP and PIP approaches with 1000 training data, since 1000 training data points already provide accurate predictions below 15,000 cm−1 . Specifically, the harmonic frequencies of H3 O+ global minimum and saddle point are shown in Table 2. Table 2: Harmonic frequencies and barrier height (in cm−1 ) of H3 O+ from different PESs and CCSD(T)-F12b/aVQZ calculations Minimum

Saddle point

GP-1k

PIP-GP-1k

PIP-1k

ab initio

GP-1k

PIP-GP-1k

PIP-1k

ab initio

888 1693 1696 3608 3703 3710

889 1696 1696 3605 3707 3707 GP-1k 672.3

885 1696 1696 3607 3708 3708

886 1698 1698 3605 3705 3705 PIP-GP-1k 667.1

669i 1631 1634 3661 3805 3814

668i 1635 1635 3660 3814 3814 PIP-1k 672.6

671i 1632 1632 3657 3813 3813

671i 1634 1634 3659 3810 3810 ab initio 671.8

barrier

Hydronium has two degenerate vibration modes which are the bend and asymmetric stretches. In principal, the constructed PES should display this behavior. However, the calculated harmonic frequencies from the GP PES are not exactly degenerate for these modes, although the frequencies are relatively close to the high-level ab initio calculation. As expected, the PIP-GP PES does produce degenerate frequencies at both the global minimum

16

ACS Paragon Plus Environment

Page 17 of 48

and saddle point and the harmonic frequencies for all modes are slightly improved compared to those from the GP PES. The PIP-PES produces frequencies in very good agreement, and marginally the best overall, with the ab initio results. The ab initio saddle point barrier height is also well reproduced by all the PESs. Thus, these properties of the PES are well reproduced by the PIP-GP and PIP approaches even with a modest data size of 1000. 40 30 20

PIP-GP-1k 30 GP-1k

2000

1000

10 0 -10 -20 -30

40

PIP-1k

400

673

5000 10000

20 10 0 -10

Qasym-str (a.u.)

Qsym-str (a.u.)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

400

1000

673 2000 5000 10000 15000

-20 -30

15000 20000

20000

-40

-40 -80 -60 -40 -20 0 20 Qim (a.u.)

40

60

80

-80 -60 -40 -20 0 20 Qim (a.u.)

40

60

80

Figure 4: Contour plot of PES along imaginary-frequency mode, sym-stretch and asymstretch motion at saddle point Figure 4 shows two contour plots of potential energy surfaces from PIP, PIP-GP and GP methods. The imaginary-frequency mode, symmetric-stretch and asymmetric-stretch motions at the saddle point are used for the contour plots. As seen, along the imaginaryfrequency mode (umbrella motion), the potential displays a symmetric double well for all three PESs. The potentials calculated from PIP, GP, PIP-GP are graphically virtually superimposable and the contours are smooth. It is worth noting that for the 1000 training set the rms fitting errors for the test set (125 and 39 cm−1 for the GP and PIP fits, respectively) are much larger than the errors in the harmonic frequencies at the stationary points and the barrier height. Clearly, the rms fitting error for the test set is not a good indicator of the quality of the fits for these properties. Perhaps a better indicator is a plot of the energy dependence of the rms fitting error, as we did in 2003 for CH5+ . 4 However, the plots given here of the error vs energy are typically found in the recent literature and so we adopt this way to display the error. One advantage of these plots is that they clearly indicate the presence of “outliers”. These indicate that 17

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

there are configurations in the test set that are “far” from the configurations in the training sets and can help the “learning process” in a further iteration of training. Also, while these outliers can contribute heavily to the overall rms fitting error, they may have minimal effect on the harmonic analysis at the minimum and saddle point.

3.2

OCHCO+

OCHCO+ is a five-atom cation that has global minimum structure of C∞v symmetry. Its saddle point structure has the proton equidistant between two CO groups which is of D∞h symmetry. The structures of the global minimum and saddle point are shown in Figure 1. We reported a semi-global potential energy surface of OCHCO+ using the PIP method. 21 This PES was constructed with 8613 energies calculated at the CCSD(T)-F12/aVTZ level of theory and the fit was done with a maximum total polynomial order of 5 (904 coefficients). 21 This PES was used in rigorous diffusion Monte Carlo and full dimensional vibrational configuration interaction calculations, which demonstrated the important delocalization behavior of the proton. Here we use this dataset to construct PESs with GP, PIP-GP and PIP approaches. For this case, we selected 7800 data points from the full data set between proton transfer saddle point and one minimum. Unlike H3 O+ , where the two minima are related exactly by an umbrella inversion, in this case, the permutational symmety of the atoms is needed be describe the two minima. Thus, this case demonstrates the failure of the GP approach but the success of the PIP-GP one. There are 10 internuclear distances, and thus Morse variables, of OCHCO+ and 16 fundamental invariant polynomials which are given elsewhere. 40 We transformed the internuclear distance into a set of Morse variables in the same expression as in H3 O+ case. The fundamental invariants were constructed using Morse variables. In both GP and PIP-GP approaches, we included the noise term in the covariance matrix. The rms errors for training data, indicated by the numbers after the dash for the type 18

ACS Paragon Plus Environment

Page 18 of 48

Page 19 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

of fit and 7800 test data points are reported in Table 3. The times to perform training and 100,000 potential calls are also listed. Table 3: The rms error (in cm−1 ) on the training and test set, the timing (in seconds) for training and 100,000 potential evaluations of PESs for OCHCO+

GP-520 GP-780 GP-1560 GP-2600 PIP-GP-520 PIP-GP-780 PIP-GP-1560 PIP-GP-2600 PIP-520 (323) PIP-780 (323) PIP-1560 (904) PIP-2600 (904) PIP-8613 (904)

rms(train)

rms(test)

t(train)

t(evaluate)

7.65 7.33 6.31 5.30 8.27 7.74 6.36 5.48 3.88 7.09 4.4 0.7 0.4

36.65 24.08 13.37 11.00 32.09 26.62 15.78 15.55 259.18 293.53 128.65 97.40 0.4

121 289 1070 5094 271 652 2044 8599 0.03 0.05 0.29 0.49 3.07

4.08 6.03 11.81 19.36 5.14 7.62 14.99 25.52 0.2 0.2 0.36 0.36 0.36

The number of polynomials in the PIP basis is listed in parentheses. Since we added noise terms to the kernel function, the predicted energies do not exactly go through the training ab initio data. Thus, the fitting error of GP and PIP-GP approach is very close for both training and test data. Also, as more training data is used, the test rms error decreases, as expected. In this example, the rms on the test data is smaller using GP and PIP-GP than PIP with the opposite trend seen for the rms on the training data. As to the computational cost in training, as seen, the timing increases nonlinearly with the size of training data and it took 2.38 hours to train the PIP-GP model using the largest training set. However, for the PIP approach, the fitting time is only 3.07 s using the full 8613 ab initio data set. It should also be noted that the training time using PIP-GP method is almost twice than the time using GP method. The increase of time is mainly due to the dimensionality of the input variables, where 16 invariant polynomials are used rather than 10 internuclear distance. For the time of PES evaluation, using two GP approaches, the evaluation time is much more than the PIP-PES, where 100,000 potential calls only cost 19

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

0.36 s. So in the case of OCHCO+ , the PIP approach is more computationally efficient than the GP approach for both training and prediction process. This is mainly due to relatively small number and simplicity of the polynomials in the PIP PES compared to the complexity of the GP and PIP-GP evaluations. Using 1560 training data for further analyses, we plot the difference of prediction and ab initio energies based on 7800 test points, shown in Figure 5. As seen, and in contrast to the situation for H3 O+ , there are more outliers in the PIP than the PIP-GP predictions. As a result, we see in Table 3, that using only 1560 training data, the PIP approach has much larger rms error for the test set comparing with PIP-GP approach. The PES from PIP-GP approach presents better results with much less outliers. It might be of interest to delve into the cause/source of these outliers; however, as we show below all the PESs trained with the 1560 dataset produce accurate results for the properties examined here. So, the outliers (and the rms fitting errors of roughly 30 and 259 cm−1 , for the GP and PIP PESs, respectively) evidently have a small impact on these properties.

Figure 5: Difference between predicted energies and ab initio energies for OCHCO+ using validation data. Each point in the figure represents one data point We optimized the structure of global minimum and saddle point using different PESs. As stated above, when generating the training data, we only include geometries where the proton is closer to one CO group. Thus, only one global minimum structure is included in the training data and we choose that minimum structure to optimize. The harmonic frequencies 20

ACS Paragon Plus Environment

Page 20 of 48

Page 21 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

are calculated at optimized structures and given in Table 4. Table 4: Harmonic frequencies and barrier height (in cm−1 ) of OCHCO+ from different PESs (using 1560 training data) and direct ab initio CCSD(T)-F12b/aVTZ calculations Minimum

Saddle point

GP

PIP-GP

PIP-PES

ab initio

GP

PIP-GP

PIP-PES

ab initio

130 130 200 268 268 1127 1127 1749 2260 2489

134 134 203 271 271 1130 1130 1759 2258 2490 GP 366.5

131 131 202 269 269 1130 1130 1752 2256 2488

132 132 202 270 270 1131 1131 1752 2255 2487 PIP-GP 384.0

993i 136 136 285 285 393 1270 1270 2306 2326

848i 152 152 293 293 394 1274 1274 2291 2328 PIP-PES 382.8

858i 143 143 291 291 394 1273 1273 2287 2326

859i 144 144 291 291 394 1276 1276 2292 2324 ab initio 393.6

barrier

Using 1560 training data, both GP and PIP-GP predict accurate harmonic frequencies of the chosen global minimum. The PIP-GP frequencies are slightly improved, compared with GP ones. Different from the harmonic frequencies of H3 O+ , GP provides the correct degeneracy for the three bending modes. This is a direct (and trivial) consequence of getting the correct linear structures of a single minimum and the saddle point. The advantage of PIP-GP approach appears in frequencies of the saddle point. High-level ab initio calculation shows that there exists an imaginary mode at 859 cm−1 . This motion is the proton transfer stretch and, using GP PES, the calculated frequency, 993 cm−1 , is far from the benchmark one. However, when PIP-GP approach is used the predicted frequencies are improved and are much closer to ab initio results. The PIP PES with this training set produces marginally the best results in comparison with the ab initio ones. The last test is the relaxed potential curve connecting the two equivalent minima and saddle point, as a function of the imaginary-frequency normal coordinate, Qim , and denoted V (Qim ). This path is shown in Figure 6. The GP and PIP-GP results are obtained using 21

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1560 training data. Clearly, the GP PES is an extrapolation into the other well and cannot reproduce it, since the training data does not include any information in that well. After introducing permutationally invariant polynomials as input vector, the issue gets almost perfectly solved such that the PIP-GP and PIP potentials are virtually superimposable. 1500 PIP-PES PIP-GP GP 1000

V (cm-1)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 48

500

0

-500 -40

-30

-20

-10

0 10 Q (a.u.)

20

30

40

Figure 6: V(Qim) potential of OCHCO+ using different PESs

3.3

H2 CO/HCOH

The third assessment focuses on the capability of GP and PIP to represent a PES that contains several distinct minima and saddle points. The isomerization of formaldehyde (H2 CO) to cis and trans-hydroxycarbene (HCOH) and the saddle points between cis- and trans-HCOH are used in this test. Recently, GP was applied to H2 CO, but focused for the global minimum configuration. 41 In that work, GP and NN were used to refit a quartic force field of the formaldehyde minimum, 42 and the accuracy of the two methods was compared by computing the vibrational energies. The training data sets were “symmetry unique” and permutation of the H-atoms was enforced by choosing distances depending on the HH-CO dihedral angle. Clearly, describing the high-energy isomers HCOH, global minimum and high-energy saddle points is a much more complex problem. For this case both PIP and 22

ACS Paragon Plus Environment

Page 23 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

PIP-GP are used. An energy schematic diagram of the three minima and two saddle points is shown in Figure 7. As one can see, even without considering the dissociation of H2 CO, the PES is complex. A PIP PES, including the dissociation saddle point and exit channel to the H2+CO products and the H+HCO products was recently reported by Wang et al., 23 using 67,193 energies, calculated at MRCI+Q/VTZ level of theory. For the application here, which is just to the stationary points indicated in Figure 7, points in these dissociation regions were eliminated and this resulted in a data set of 34,750 energies.

Figure 7: Energy schematic diagram of the stationary points for the H2 CO isomerization. The values are potential energies in cm−1 . A histogram of the energy distribution of these points is shown in Fig. 8. One can see that the points span a large energy range, and most of the points are clustered at about 20,000 cm−1 due to the cis- and trans-HCOH isomers in this range. These points were further separated into 5 regions based on their Euclidean distance to the 5 stationary geometries. For each subset, the data were sorted in increasing energy and one was picked every n

23

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 8: Histograms of the H2 CO data sets. points. In this test, we considered n = 7, 4, 2, and added a few additional points (about tens of them) that were randomly sampled in high-energy region for better coverage. Since the 34,750 points have been sorted according to the energy, the training set obtained basically has the same energy distribution as the one shown in Fig. 8, and therefore it is not plotted. The final three training sets contain 5104, 8703, and 17,383 points, respectively. For each training set, points that are not included form the corresponding test set. This approach of selecting training points from the full set, while quite effective for this particular problem, is not sophisticated and optimal, and there is clearly room for advanced sampling methods. Also, the smallest data set is significantly larger than the smallest data sets considered in the two previous case studies. In this test, we applied the PIP-GP method, using the 7 fundamental invariants fi for A2 BC-type molecules, which are available on Github. 40 The FIs were constructed using all the internuclear distances, and then they were transformed to Morse variables, as the input for the GP model: xi = exp(−fi /a), where a = 3.0 ˚ A. This is different from the approach applied for H3 O+ and OCHCO+ , where the internuclear distances were first transformed to the Morse variables and then symmetrized to get the FIs. We have tested that the fitting 24

ACS Paragon Plus Environment

Page 24 of 48

Page 25 of 48

error only has insignificant dependency on these two approaches. The squared-exponential kernel function with 7 length-scale parameters (one for each fundamental invariant) was employed. A small noise term, whose upper limit is 5 × 10−4 , was added to the diagonal elements of the kernel matrix, and this noise-level parameter was also optimized in the training process. For comparison, PIP fits were also done for the three training sets, as well as for all the data (34,750 points), using a maximum polynomial order of 8 or 9, with 1589 or 2625 linear coefficients, respectively. Table 5: The rms error (in cm−1 ) on the training and test set, and the timing (in seconds) for training and 100,000 potential evaluations, for different fits for H2 CO/HCOH. rms(train)

rms(test)

t(train)

t(evaluate)

63 57 58 119 139 145 105

238 229 143 432 322 239 NA

6716 22135 108124 4.4 9.0 17.8 93.4

30 52 103 0.4 0.4 0.4 0.6

PIP-GP-5104 PIP-GP-8703 PIP-GP-17383 PIP-5104 (1589) PIP-8703 (1589) PIP-17383 (1589) PIP-34750 (2625)

The number of polynomials in the PIP basis is listed in parentheses.

8000

8000 PIP-GP-17383

Epred - Eai (cm-1)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

PIP-17383

6000

6000

4000

4000

2000

2000

0

0

-2000

-2000

-4000

-4000

-6000

-6000 0

10000 20000 30000 40000 50000

0

10000 20000 30000 40000 50000

-1

Ab initio energy (cm-1)

Ab initio energy (cm )

Figure 9: Difference between predicted and ab initio energies for H2 CO/HCOH on the test set. Each point in the figure represents one data point.

25

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 48

Table 5 shows the rms error on the training and test sets, as well as the computer time for the training and potential calls. Due to the complexity of the PES, both training and test errors are roughly one order of magnitude larger than the previous test cases, and PIPGP performs slightly better than our PIP fit in terms of rms error. However, the fit using polynomial order of 9 with the full dataset of 34750 points produces the smallest rms error. Of course, there are not “test” data in this case. However, in order to be consistent with the PIP-GP fits we use the training data of 17383 for the PIP fit and the tests of properties of the PESs. With respect to timing, as in the previous test cases, the PIP-GP training times are substantially greater than the PIP ones. This is partially due to the efficiency of the programing language (Fortran is more efficient than Python), but mostly due to the scaling of the nonlinear optimization of the GP method. As in the two previous cases, the evaluation times of the PIP-GP PES are much larger than the PIP PES. Again this is due to the smaller number of polynomials in the PIP PES compared to the training data size and also differences in the complexity of the PIP-GP and PIP representations. Figure 9 shows the difference between the predicted potentials for H2 CO/HCOH and the ab initio energies for the test set. One can see that the PIP-GP has slightly fewer outliers, and that’s why it has smaller test error than the PIP method. Table 6: Energies of the five stationary points (in cm−1 ) obtained from different potential fits, and comparison with MRCI+Q/VTZ calculation.

PIP-GP-5104 PIP-GP-8703 PIP-GP-17383 PIP-5104 PIP-8703 PIP-17383 PIP-34750 MRCI+Q/VTZ

H2 CO

trans-HCOH

cis-HCOH

TS1

TS2

0 0 0 0 0 0 0 0

18185 18195 18201 18181 18184 18192 18205 18190

19811 19797 19798 19788 19798 19784 19815 19803

28662 28676 28674 28671 28691 28676 28670 28657

30117 30124 30126 30122 30122 30106 30135 30124

26

ACS Paragon Plus Environment

Page 27 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Table 7: Harmonic frequencies (in cm−1 ) of the five stationary points from indicated PESs and direct MRCI+Q/VTZ calculations. PIP-GP-5104

PIP-GP-17383

PIP-5104

PIP-17383

PIP-34750

MRCI+Q

1196 1274 1528 1772 2950 2982

1191 1274 1556 1775 2944 2973

1199 1274 1562 1764 2927 2987

1196 1270 1565 1769 2930 2986

1191 1281 1561 1765 2934 2977

1201 1284 1555 1782 2945 3040

1080 1189 1324 1536 2850 3850

1067 1197 1327 1509 2847 3837

1142 1247 1346 1533 2811 3873

1152 1242 1304 1530 2806 3803

1101 1225 1321 1494 2829 3783

1102 1224 1333 1522 2857 3753

1056 1214 1300 1419 2753 3612

1024 1231 1324 1459 2791 3736

1065 1154 1279 1417 2856 3627

1042 1272 1296 1414 2824 3590

963 1175 1315 1448 2808 3666

1028 1249 1338 1492 2765 3662

1367i 695 1035 1243 2613 3861

1393i 564 1083 1177 2618 3771

1328i 511 1065 1262 2698 3732

1324i 526 1100 1199 2598 3464

1372i 646 1100 1333 2545 3721

1501i 780 1185 1410 2816 3858

2141i 730 1232 1468 2715 2932

2186i 758 1264 1467 2598 2862

2140i 551 1219 1458 2795 2954

2146i 652 1223 1417 2787 3089

2102i 648 1291 1435 2538 2872

2168i 759 1310 1419 2623 2980

H2 CO

trans-

cis-

TS1

TS2

27

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Table 6 shows the energies of the five stationary points on the PES, relative to the global minimum. All the fits reproduce the ab initio energies very well, and the differences never exceed 40 cm−1 . The harmonic frequencies are listed in Table 7. Due to the complexity of the PES, the agreement between the fits and ab initio frequencies is not as good as the other tests presented in this work. Nevertheless, the frequencies can be reproduced reasonably well using the fits, and PIP-GP performs slightly better than PIP fits when the same training data is used. Note in particular, that the frequencies at the global minimum are generally more accurate than those at the two high-energy minima and saddle-point transition states. This case, although for only a tetraatomic, is a (small) example of a rugged PES and clearly it presents challenges for both the GP and PIP approaches. The properties of the PIP PES do improve with the largest data set of 34750; however, this data size is not practical to consider with the current full PIP-GP approach. We return to this case in the Discussion section as it does, in our opinion, offer a important challenge to both the PIP-GP and PIP methods.

3.4

Formic acid dimer

The first three assessments involve molecules that contain 4 and 5 atoms. Here we consider a much larger system and one with a large order symmetric group, the cyclic formic acid dimer (FAD). FAD contains 10 atoms, and therefore has 45 internuclear distances and 24 vibrational degrees of freedom. It is currently the largest system that has a full-dimensional PES using the PIP approach. Here we apply the full GP method to it and compare the GP and PIP fits in terms of fitting error, optimized geometries, harmonic frequencies, barrier for the double-proton transfer, and the computational cost. We recently reported a full-dimensional PES (and dipole moment surface) for FAD. This is a PIP fit to 13475 configurations and corresponding CCSD(T)-F12b/haTZ electronic energies. The details of data sampling and the PIP fit can be found in our early publications; 22,43 however, we note that configurations that span both equivalent minima and the saddle points 28

ACS Paragon Plus Environment

Page 28 of 48

Page 29 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

between are included in the data set. The full permutational symmetry group of FAD is of order 4!4!2! = 1152, and this was used in the PIP basis for the fit. This does result in the most compact basis; however, the polynomials are relatively complex, compared to those for small systems. We investigated using FIs for FAD and, the number of FIs in this case is prohibitively large to use in the PIP-GP approach. Even considering the lowest order symmetry, which only requires the permutation of two HCOO groups (without the two hydrogen-bonded H) the number of FIs is 235. Using an input vector of order 235 becomes impractical in GP fitting, so for FAD, we simply replicate the data, by permuting the two HCOO groups. This is feasible in this case, but is not a general strategy for the GP approach. For the GP fit, we prepared two training sets of size 2695 and 4491 from the full (13,475) data set, by first sorting the original data in increasing energy and picking one every 5 or 3 points. Then we replicated the data by permuting the positions of the two HCOO groups while keeping the two H-bonded hydrogen unchanged. The resulting two data sets for the GP fit contain 5390 and 8982 points, respectively, which are small enough for the GP approach to be feasible. We used all (45) inverse internuclear distances as the vector x. The squared-exponential kernel function with only one length-scale parameter was applied, because using one lengthscale for each component of x would greatly increase the computational cost of the training. A small noise term (< 10−4 ) was added to the diagonal elements of the kernel matrix. Clearly, the approach of replicating the data for the GP training will result in a correct description of the double well and double-proton saddle point. However, the the PIP approach does not need nor benefit from replication. In fact, it doubles the effort of the fit with no benefit. So to make a fair comparison for the same size (5390 and 8982) data sets, we obtained 5390 and 8982 points from the original full data. The rms fitting errors of both GP and PIP fits were tested on all the points below 25,000 cm−1 , and this test set consists of 12,844 points.

29

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

Table 8 shows the rms errors on the training and test sets, the timing for the training and 100,000 potential calls, and the barrier heights of the double-proton transfer, from indicated fits. The last entry is the PIP fit using all the data. Due to the small noise term added to the kernel matrix, the training errors of GP are not zero, but they are small, i.e., 7–8 cm−1 . The training errors of the PIP fits are larger, as seen. The rms errors for the full data set are nearly the same for all fits. All the fits reproduce the barrier height of the double-proton transfer very well, compared to the ab initio value of 2853 cm−1 . Table 8: The rms error (in cm−1 ) on the training and test set, the timing (in seconds) for training and 100,000 potential evaluations, and the proton-transfer barrier (in cm−1 ) of different fits for the formic acid dimer.

GP-5390 GP-8982 PIP-5390 PIP-8982 PIP-13475

rms(train)

rms(test)

t(train)

t(evaluate)

Barrier

7 8 42 57 74

107 53 82 58 NA

6538 13523 16 20 42

34 57 29 29 29

2853 2853 2850 2849 2848

Figure 10 shows fitting error on the test set for the two fits using 8982 points. The fitting errors are similar, and consistent with the overall rms on the test set. 1500

1500 GP-8982

Epred - Eai (cm-1)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 30 of 48

PIP-8982

1000

1000

500

500

0

0

-500

-500

-1000

-1000 0

5000 10000 15000 20000 25000

0

5000 10000 15000 20000 25000

-1

Ab initio energy (cm-1)

Ab initio energy (cm )

Figure 10: The difference between the predicted and the ab initio energies of FAD on the test set. Each point in the figure represents one data point.

30

ACS Paragon Plus Environment

Page 31 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

In terms of the computational complexity, as shown in Table 8, GP is much slower in training due to the scaling, in consistency with the previous cases. In contrast to the previous cases, however, the cost of potential evaluations is virtually the same for all the fits, although as expected the GP cost scales linearly with the size of dataset and the PIP cost is constant for three data sets since the number of fitting coefficients is the same. This near equality is really a coincidence, but it does illustrate that the computational cost of the PIP fit depends on the complexity and the number of the polynomials used for the fit. In this case the complexity is large owing to the large order of the symmetric group. Table 9: Harmonic frequencies (in cm−1 ) of the formic acid dimer from the indicated PESs and direct CCSD(T)/aug-cc-pVQZ calculations. 44 Minimum

Saddle point

GP-8982 PIP-8982 PIP-13475 CCSD(T)/aVQZ 70 167 175 210 253 274 689 715 950 979 1083 1098 1257 1265 1404 1413 1450 1479 1719 1786 3105 3112 3242 3314 a

70 168 171 209 255 276 693 717 953 971 1087 1104 1257 1258 1400 1406 1443 1483 1716 1777 3089 3101 3227 3329

70 167 170 209 254 275 693 716 956 970 1084 1100 1255 1258 1406 1408 1448 1481 1715 1780 3095 3097 3232 3326

a

72 167 177 209 255 275 684 713 963 986 1079 1099 1252 1256 1405 1409 1455 1481 1713 1777 3099 3103 3204 3306

Ref. 44 31

ACS Paragon Plus Environment

GP-8982 PIP-8982 PIP-PES 1364i 80 224 228 309 517 595 747 802 1081 1088 1237 1311 1358 1407 1412 1414 1415 1573 1678 1748 1765 3124 3127

1355i 81 219 228 319 512 591 745 813 1069 1080 1248 1339 1389 1398 1403 1407 1414 1588 1690 1743 1748 3097 3107

1355i 80 219 226 317 514 592 744 814 1065 1079 1241 1341 1395 1397 1400 1404 1408 1604 1691 1743 1749 3101 3106

Journal of Chemical Theory and Computation

Next, consider the predicted harmonic frequencies of the minimum and saddle points, given in Table 9 for the larger data set. As seen that both GP and PIP PESs can reproduce the harmonic frequencies very well. Based on the rms error on the test set and the harmonic frequencies, GP fits have overall similar performance as the PIP ones. We also give benchmark CCSD(T) results of Miliordos and Xantheas at the minimum. The excellent agreement with these is one indicator of the quality of these PESs. 1500 PIP-PES GP-8982

1000 500 Potential (cm-1)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 32 of 48

0 -500 -1000 -1500 -2000 -2500 -3000 -100

-50

0 Qim value (a.u.)

50

100

Figure 11: (a) The V (Qim ) path of FAD, obtained from two different fits (GP-10107 and PIP-PES); (b) 1D wavefunction for the GP-10107 potential curve shown in (a), calculated using DVR. Next, we show that by replicating the training data, the GP model can describe the double proton transfer correctly. The potential energy surface for the double proton transfer in FAD has a symmetric double-well feature, with two equivalent C2h minima connected by a D2h saddle points. With PIP approach, this symmetry in PES is guaranteed. When the GP model is trained using the replicated data, this symmetry is also reproduced. In Figure 11, we plot the V (Qim ) curve, which is the minimum energy path along the imaginary-frequency mode of the saddle point, by relaxing the remaining 3N − 7 modes. The V (Qim ) curve from

32

ACS Paragon Plus Environment

Page 33 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

the PIP PES is perfectly symmetric, and the GP-8982 one almost overlaps with the PIP PES. For the GP-8982 fit, the energy difference between V (+Q) and V (−Q) is less than 0.05 cm−1 , so the potential curve could be considered as completely symmetric. In addition, We calculated the splitting of this 1D potential curves, using the DVR method. For PIP PES, the 1D splitting is 0.44 cm−1 , 22 while for GP-8982, this splitting is 0.45 cm−1 , in perfect agreement with the PIP PES. To summarize this last case study, we showed that replicating data by permuting the two HCOO groups enabled the GP approach to succeed and provide a precise representation of the PES of FAD. However, replicating the data significantly slows down the speed for potential evaluation. Furthermore, as noted in the Introduction, this procedure is not general one, since the replicating data by factors of more than two can quickly overwhelm the computational cost of GP and even PIP training. Nevertheless, given the sizeable increase in the dimensionality of FAD compared to the other cases, it is gratifying to demonstrate that data set of less than 10,000 are sufficient to obtain PESs that accurately reproduce the harmonic frequencies of the minima and saddle point, barrier height and the “Qim ” reaction path. That this can be achieved with the GP approach is perhaps expected, given previous studies showing the efficiency of the method for small data sets. 10,14 However, we also demonstrated that the PIP approach can produce PESs of comparable precision with the same small data sets. (Whether even smaller data sets can achieve comparable accuracy may be of interest for future work.)

4

Discussion

To begin this section, we recall the differences in the PIP and the full GP and PIP-GP approaches considered here as well as their implementations to obtain precise representations of datasets of electronic energies. The full GP approach uses training, in the language of machine learning. This requires

33

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

the inverse of a kernel matrix, and so the computational effort scales as O(N 3 ), where N is the size of the training set. So practically, N is limited to around 104 , and ideally much smaller. For non-zero σnoise the resulting GP does not exactly reproduce the training data set and so there is an rms error for the training data. Typically, as here, there is a test data set, where the trained GP is evaluated to determine the precision of the GP representation of that dataset. Ultimately it is the precision of the GP representation of the large data set that is of primary importance, since that indicates the likely precision of the GP for an arbitrary input. The GP evaluation for an arbitrary input is O(N ). This describes the scaling of the computational effort of the GP routines contained in the open source scikit-learn we have used. The value of N ranged from a few hundred to a few thousand in the study here. Other, more efficient options are possible for training and we discuss these briefly below. The computational effort of training in the PIP approach, which is based on linear leastsquares optimization, is much less than full GP, for a given size dataset. Thus, the PIP approach can be trained on larger datasets, i.e., tens of thousands or more, at virtually trivial cost. For example, the optimization of the PIP approach for data sets of the order of 105 have been done for many molecules and chemical reactions. 3 Thus, the size of the training data set is not really a limitation for the PIP approach. Of course, testing the PIP fit on new data is important and this is typically done. However, these new data are routinely incorporated into a new fit. Nevertheless, here the GP and PIP approaches were trained on the same size data sets to make a consistent comparisons of the two approaches. The PIP approach performed comparably to the GP and PIP-GP approaches, even for data sizes as small as 520 for the five atom OCHCO+ cation. This had not been demonstrated before. Consider next the evaluation costs of the PIP and the full GP approaches. These depend on very different parameters of these approaches. GP depends linearly on the data size. For PIP this cost scales with size of the basis, which depends in a fairly complicated (although calculable way via the Molien series) way on the symmetric group and the maximum total

34

ACS Paragon Plus Environment

Page 34 of 48

Page 35 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

polynomial order. Examples of this are given elsewhere, here we have given this number for the specific cases. In general, this number is much less than the data set size, to avoid overfitting. Still, as indicated here, the size of the PIP basis varies from several hundred to roughly two thousand. Both GP and PIP have important pre-factors to consider as well. In GP it is the complexity of the kernel and in PIP it is the complexity of the polynomials. For the case studies here, with the exception of FAD, the modest number of polynomials and their relative simplicity results in a very fast evaluation of the PIP representation, relative to the GP evaluation. FAD is an important exception. In this case the GP and PIP evaluation times are similar. For FAD, with 45 variables, the number of terms in the PIP basis is 1784 and the polynomials are complex and thus expensive to evaluate and so the cpu time is considerably longer than for the other examples. However, with this PIP basis a larger data size would not result in an increase in computational effort but the GP evaluation would of course. Next, consider the computational effort of the PIP-GP approach. In general, it is larger than the effort of GP because the number of variables, i.e., the fundamental invariants, is larger than the number of internuclear distances. The exact number of FIs depends in a complex way on the number of atoms and the symmetric group. In general, for a given number of atoms the number of FIs increases with the order of the symmetric group, e.g., see ref. 18. The OCHCO+ case clearly showed this increase, although it was modest because the symmetric group is only of order 4 and there are 16 FIs compared to 10 internuclear distances. For FAD, the PIP-GP approach is not practical, even using a maximally reduced symmetric group of order 2 (which can describe the symmetry of the double well) as there are 235 FIs (which are polynomials in the internuclear distances) compared to 45 internuclear distances and so the computational effort would increase by roughly an order of magnitude. For this reason, we resorted to the expedient of replicating the data. Clearly for any higher order symmetric group the PIP-GP approach would be infeasible as would replicating data. (By contrast, in the PIP approach the full symmetric group, of order 1152, is used.)

35

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Given that the computational bottleneck of the full GP approach is the data size, there is a major effort by the developers and chemical physics users of GP to deal with this. So-called “sparsification” is a powerful approach. 7,30–32 In this approach a large training set is made sparse to obtain an approximate kernel matrix that can be trained much more efficiently and also where the evaluation time scales linearly with the size of the sparse data set. A very recent demonstration of this can be found in ref. 45, where a different version of GP fitting, developed by Cs´anyi and co-workers, 5,7,33 than used here was performed on water 2-body and 3-body electronic energies. The sparsified data sets were roughly 5 000 in size, compared to initial full data sets of roughly 42,000 and 12,000, which were reduced in a first step to 9000 and 10,000, respectively for the 2 and 3-body cases. Excellent results were obtained with this approach. Krems and co-workers are actively working on methods to greatly reduce the size of the data set in the full GP approach. 8,9,41 For example, in very recent work for the OH+H2 reaction, a data set of only 290 energies was shown to be sufficient to obtain the J = 0 cumulative reaction probability (CRP). This data set was determined via an iterative approach in which the exact CRP was in hand and used to guide the generation of the data for the GP fit. Nevertheless, the final result clearly indicates that such a small number of data with GP can produce accurate scattering results. The approach we took here to generate small datasets was to start with relatively sparse data set and to simply reduce them further by eliminating data while preserving the energy distribution of the data. It would of be interesting to test the above methods to systematically further reduce the datasets used here and so that is a possible future research project for us or others. To conclude this discussion, we make some remarks about PESs that describe chemical reactions. One property of such PESs is that they should separate into non-interacting fragments. This separability is not exact in either the PIP or GP/PIP-GP approaches, as illustrated in the following AB + CD example. For convenience, we set y1 = exp(−rAB /a) and y2 = exp(−rCD /a). For non-interacting fragments (the distance between the two fragments is large), the total potential of AB + CD

36

ACS Paragon Plus Environment

Page 36 of 48

Page 37 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

should be the sum of the potentials of the two fragment, i.e.,

V (y1 , y2 ) = V (y1 ) + V (y2 ).

(9)

The other four Morse variables, y3 –y6 , are all zero at the non-interacting limit, so they are omitted in the above equation. In the PIP approach, however, due to cross terms y1a y2b (with non-zero a and b) in the fitting basis, Equation 9 cannot be satisfied. For the GP (and PIP-GP) approach, Equation 9 is true only if the kernel function is separable: (i)

(i)

(j)

(i)

(j)

(j)

(i)

(j)

k({y1 , y2 }, {y1 , y2 }) = k(y1 , y1 ) + k(y2 , y2 ),

(10)

where the superscripts (i) and (j) indicates two different geometries. But the kernels commonly used, such as the squared-exponential kernel applied in this work, does not have such property. This rigorous separability condition is worth further investigation for the GP approach. For the PIP method based on monomial symmetrization there is a remedy to restore this exact separability. For the AB + CD example above, one can simply remove all the cross terms y1a y2b for non-zero a and b, and the resulting fitting basis are rigorously separable in fragment coordinates. This has been done by Truhlar and co-workers for fitting a PES of N4 , 46 and by us for the PESs of methane-water interactions. 47,48 The “purification” can result in reduction of the fitting basis by only around 10 percent terms, so the fitting accuracy is basically not sacrificed. By contrast, for the current PIP-GP this is still problematic. First of all, the input vector for these two methods use PIPs (or FIs), that may already be nonseparable. Furthermore, even in regular GP, where the input vector is completely separable because each entry is just one internuclear distance, the kernel function in GP can mix these distances and make the final potential inseparable in fragment coordinates. Therefore, it will take further work to ensure the exact separability for PIP-GP approach. Another point to raise about fragmentation is the precise representation of the long-range 37

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

interaction of the fragments, which is of the order of wavenumbers or tenths of wavenumbers. This is clearly a challenge for a global PES, and so we proposed a 2-component PES approach that utilizes the purified fitting basis in the long range, 25,26 In this approach the fitting accuracy for the fragments was greatly enhanced. Finally, some comments on the extendability of the GP and PIP-GP approaches to complex chemical reactions. Specifically, consider as examples the unimolecular dissociation of CH3 CHO and the bimolecular chemical reaction O( 3P)+C2 H2 . PIP PESs for both have been reported, using data sets of roughly 150 000 CCSD(T)/aug-cc-pVTZ energies. 49,50 There are numerous minima and saddle points on these PESs and also numerous reaction channels. We certainly don’t review these PESs in detail here; however, the permutational invariance of like atoms is important in order to describe the dynamics and so we ask how PIP-GP (and also PIP-NN) might deal with this. The symmetric group of CH3 CHO is of order 4!2! = 48, which although not large (compared to CH5+ where the symmetric group is of order 120), it is almost certainly too big to use the PIP approach for GP (and NN). The complexity of such reaction systems is a challenge as well for the PIP approach, as illustrated here for the relatively “simple” H2 CO/cis-/trans-HCOH case. In this case both the PIP and PIP-GP PES gave harmonic frequencies, especially for the high-energy isomers and TS barriers, that are substantially less accurate than those for the three other cases. (The energetics of these stationary points, by contrast, were accurately obtained in these PESs.) It seems reasonable that a single, global fit to a surface with chemically distinct multiple minima and saddle points would be less successful than to a surface with a single minimum and/or saddle point. One straightforward approach to deal with this complexity is to develop “local” fits that can accurately describe limited regions of configuration space and then develop the global PES by combining the local ones, using “switching functions”, coordinate-dependent “weights”, etc. This is actually an old idea, perhaps dating from 1986 51 but with more recent major advances in the implementation (“Shepard Interpolation”) by the Collins group. 52 A recent example of this approach using PIP PESs from our group is a 2-component PES, 25,26

38

ACS Paragon Plus Environment

Page 38 of 48

Page 39 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

describing dissociation. In this approach, one component describes the long-range interaction to sub wavenumber precision. Zhang and co-workers have recently reported a 13-component PES for the O( 1D) + CH4 reaction which each component is a NN fit to local data. 53 Clearly, this is a fruitful line of research that can immediately be applied to GP and PIP-GP PESs.

5

Summary and Conclusions

Borrowing ideas to incorporate permutational invariance in neural-network representations of potential energy surfaces, proposed by Guo and co-workers and extended later by Zhang and co-workers, we presented an extension of the GP approach of Krems and co-workers to represent high-dimensional potential energy surfaces to incorporate permutational invariance. The new approach, as well as the standard GP one, and the established permutationally invariant polynomial approach were applied to four case studies, i.e., H3 O+ , OCHCO+ , H2 CO/cis-HCOH/trans-HCOH and (HCOOH)2 , using pre-existing data sets of electronic energies. Overall, the assessments of the PIP and GP/PIP-GP approaches described here are the following. PIP and PIP-GP provide fits of electronic energies for the four and 5-atom systems that are essentially equivalent and of high precision. With respect to extrapolation into regions not well explored by the training data, both approaches display outliers; these are more numerous for training datasets that are roughly an order of magnitude smaller than the test datasets, as expected. However, even with training datasets of the order of only 1000 energies both the PIP and GP and especially PIP-GP do very well in describing properties of the PES, namely, barrier heights and harmonic frequencies at minima and saddle points. Both methods provide reasonable fits even for modest-sized, e.g., several hundred energies datasets. While this has been noted previously for the GP method, 9 it was not for the PIP approach, which routinely uses 10,000 or more electronic energies, owing to the ease in dealing with such large datasets.

39

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

We demonstrated that the overall rms fitting errors for the test set can be much larger than the errors in the harmonic frequencies and at the stationary points and the barrier height. We also noted that although outliers can contribute to the overall rms fitting error, they may have minimal effects on PES properties of interest. With respect to computational effort, the PIP, GP and PIP-GP methods have different bottlenecks. The training in the full GP approach scales non-linearly with the number of data, owing to the effort to obtain the inverse of the GP covariance matrix, and the computational effort to evaluate the GP PES scales linearly with the size of the training data set. These are well-known. The computational effort in the PIP approach arises in different ways. First, the training consists in performing a linear least-squares fit of the the data. This is a fast procedure that depends weakly on the size of the dataset. So, the training step of the PIP approach is negligible; however the size of the training set and the complexity of the PIP representation are (indirectly) related. This is because when the data set is large, a large polynomial order is usually used, so that the fitting error can be reduced. Since the number of terms in the PIP representation scales non-linearly with the total polynomial order, the time to evaluate the PIP PESs can scale non-linearly with the size of the data set. However, if the same polynomial order is used, the cost to evaluate the potential does not depend on the size of the data set. The PIP-GP approach we have suggested is generally slower in both the training and evaluation steps than GP. The main reason is that the input is more complex in that approach. In addition to an increase in size of the input, since these variables are permutationally invariant polynomials, their evaluation is more complex than just Morse variables, for example. With this in mind, the conclusion based on the timing tests for the two 4-atom and one 5-atom case is that GP and PIP-GP is a factor of 10-100 slower to evaluate than PIP for the same size training set. Clearly, this is a large range and the case-by-case details were described above. The conclusions about the 10-atom formic acid dimer deserve special commentary. This system is a major jump in size and permutational complexity than the other cases. The PIP-

40

ACS Paragon Plus Environment

Page 40 of 48

Page 41 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

GP approach for this case is not feasible, as the number of PIP Fundamental Invariants even with the greatest reduction in the order of the symmetric group (2!) is 235. The standard GP approach is successful because the training data could be replicated. However, this is not a general solution as replication of data can quickly lead to data sets of the order of 105 - 106 configurations. Approaches to incorporate permutational symmetry into the GP have been proposed, 7 however, these make use of a special decomposition of the total energy in terms of atomic energies and so are quite different from the approach used here. Another issue to be explored with the GP and PIP-GP approach is the application to PESs that describe reaction channels. As was briefly noted, the fragments in these channels asymptotically become non-interacting and describing this rigorously with the general GP approach has not been demonstrated, to the best of our knowledge. Of course switching functions can be used to switch between a GP fit and non-interacting fragments, however, at first glance at least, this does not appear capable of accurately describing the long-range interaction that may be important in say low energy collisions. Finally, we note that the present data sets are available upon request to the authors.

Acknowledgement The authors thank Roman V. Krems for helpful discussions, especially about the Python code developed by one of co-authors (RAVH). We also thank G´abor Cs´anyi for helpful discussions on numerous aspects of the GP approach. JMB thanks the National Science Foundation (CHE-1463552) and RAVH thanks the Natural Sciences and Engineering Research Council of Canada (NSERC) for financial support.

References (1) Braams, B. J.; Bowman, J. M. Permutationally invariant potential energy surfaces in high dimensionality. Int. Rev. Phys. Chem. 2009, 28, 577–606. 41

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(2) Bowman, J. M.; Czak´o, G.; Fu, B. High-dimensional ab initio potential energy surfaces for reaction dynamics calculations. Phys. Chem. Chem. Phys. 2011, 13, 8094–8111. (3) Qu, C.; Yu, Q.; Bowman, J. M. Permutationally invariant potential energy surfaces. Annu. Rev. Phys. Chem. 2018, 69, 6.1–6.25. (4) Brown, A.; Braams, B. J.; Christoffel, K.; Jin, Z.; Bowman, J. M. Classical and quasiclassical spectral analysis of CH5+ using an ab initio potential energy surface. J. Chem. Phys. 2003, 119, 8790–8793. (5) Bart´ok, A. P.; Payne, M. C.; Kondor, R.; Cs´anyi, G. Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons. Phys. Rev. Lett. 2010, 104, 136403. (6) Bart´ok, A. P.; Kondor, R.; Cs´anyi, G. On representing chemical environments. Phys. Rev. B 2013, 87, 184115. (7) Bart´ok, A. P.; Cs´anyi, G. Gaussian approximation potentials: A brief tutorial introduction. Int. J. Quantum Chem. 2015, 115, 1051–1057. (8) Cui, J.; Krems, R. V. Gaussian Process Model for Collision Dynamics of Complex Molecules. Phys. Rev. Lett. 2015, 115, 1–5. (9) Cui, J.; Krems, R. V. Efficient non-parametric fitting of potential energy surfaces for polyatomic molecules with Gaussian processes. J. Phys. B: At. Mol. Opt. Phys. 2016, 49 . (10) Guan, Y.; Yang, S.; Zhang, D. H. Construction of reactive potential energy surfaces with Gaussian process regression: Active data selection. Mol. Phys. 2017, 116, 823–834. (11) Uteva, E.; Graham, R. S.; Wilkinson, R. D.; Wheatley, R. J. Interpolation of intermolecular potentials using Gaussian processes. J. Chem. Phys. 2017, 147, 161706.

42

ACS Paragon Plus Environment

Page 42 of 48

Page 43 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(12) Kolb, B.; Marshall, P.; Zhao, B.; Jiang, B.; Guo, H. Representing Global Reactive Potential Energy Surfaces Using Gaussian Processes. J. Phys. Chem. A 2017, 121, 2552–2557. (13) Rasmussen, C. E.; Williams, C. K. I. Gaussian processes for machine learning; the MIT Press, 2006. (14) Vargas-Hern´andez, R. A.; Guan, Y.; Zhang, D. H.; Krems, R. V. A machine-learning approach for the inverse scattering problem in quantum reaction dynamics. ArXiv eprints arXiv:1711.06376 2018, (15) Jiang, B.; Guo, H. Permutation Invariant Polynomial Neural Network Approach to Fitting Potential Energy Surfaces. J. Chem. Phys. 2013, 139, 054112. (16) Li, J.; Jiang, B.; Guo, H. Permutation invariant polynomial neural network approach to fitting potential energy surfaces. II. Four-atom systems. J. Chem. Phys. 2013, 139, 204103. (17) Jiang, B.; Li, J.; Guo, H. Potential energy surfaces from high fidelity fitting of ab initio points: The permutation invariant polynomial - neural network approach. Int. Rev. Phys. Chem. 2016, 35, 479–506. (18) Shao, K.; Chen, J.; Zhao, Z.; Zhang, D. H. Communication: Fitting potential energy surfaces with fundamental invariant neural network. J. Chem. Phys. 2016, 145, 071101. (19) Fu, B.; Zhang, D. H. Ab initio potential energy surfaces and quantum dynamics for polyatomic bimolecular reactions. J. Chem. Theory Comput. 2018, 14, 2289–2303. (20) Yu, Q.; Bowman, J. M. Ab initio potential for H3 O+ → H+ + H2 O: A step to a many-body representation of the hydrated proton? J. Chem. Theory Comput. 2016, 12, 5284–5290.

43

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(21) Fortenberry, R. C.; Yu, Q.; Mancini, J. S.; Bowman, J. M.; Lee, T. J.; Crawford, T. D.; Klemperer, W. F.; Francisco, J. S. Communication: Spectroscopic consequences of proton delocalization in OCHCO+ . J. Chem. Phys. 2015, 143, 071102. (22) Qu, C.; Bowman, J. M. An ab Initio Potential Energy Surface for the Formic Acid Dimer: Zero-point Energy, Selected Anharmonic Fundamental Energies, and Groundstate Tunneling Splitting Calculated in Relaxed 1–4-mode Subspaces. Phys. Chem. Chem. Phys. 2016, 18, 24835. (23) Wang, X.; Houston, P. L.; Bowman, J. M. A new (multi-reference configuration interaction) potential energy surface for H2 CO and preliminary studies of roaming. Phil. Trans. R. Soc. A 2017, 375, 20160194. (24) Huang, X.; Braams, B. J.; Bowman, J. M. Ab initio potential energy and dipole moment surfaces for H5 O2+ . J. Chem. Phys. 2005, 122, 044308. (25) Wang, Q.; Bowman, J. M. Two-component, ab initio potential energy surface for CO2 H2 O, extension to the hydrate clathrate, CO2 @(H2 O)20 , and VSCF/VCI vibrational analyses of both. J. Chem. Phys. 2017, 147, 161714. (26) Yang, B.; Zhang, P.; Qu, C.; Wang, X. H.; Stancil, P. C.; Bowman, J. M.; Balakrishnan, N.; McLaughlin, B. M.; Forrey, R. C. Full-Dimensional Quantum Dynamics of SiO in Collision with H2 . J. Phys. Chem. A. 2018, 122, 1511–1520. (27) Xie, Z.; Bowman, J. M. Permutationally Invariant Polynomial Basis for Molecular Energy Surface Fitting via Monomial Symmetrization. J. Chem. Theory Comput. 2010, 6, 26–34. (28) Wang, Q.;

Qu, C. MSA Software. 2016;

https://github.com/Kee-Wang/

PES-Fitting-MSA.

44

ACS Paragon Plus Environment

Page 44 of 48

Page 45 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(29) Derksen, H.; Kemper, G. Computational Invariant Theory; Springer Verlag: Berlin, Heidelberg, New York, 2002. (30) Csat´o, L.; Opper, M. Sparse On-Line Gaussian Processes. Neural Comput. 2002, 14, 641–668. (31) Qui˜ nonero Candela, J.; Rasmussen, C. E. A Unifying View of Sparse Approximate Gaussian Process Regression. J. Mach. Learn. Res. 2005, 6, 1939–1959. (32) Snelson, E.; Ghahramani, Z. In Advances in Neural Information Processing Systems 18 ; Weiss, Y., Sch¨olkopf, B., Platt, J. C., Eds.; MIT Press, 2006; pp 1257–1264. (33) Bart´ok, A. P.; Gillan, M. J.; Manby, F. R.; Cs´anyi, G. Machine-learning approach for one- and two-body corrections to density functional theory: Applications to molecular and condensed water. Phys. Rev. B 2013, 88, 054104. (34) Grisafi, A.; Wilkins, D. M.; Cs´anyi, G.; Ceriotti, M. Symmetry-Adapted MachineLearning for Tensorial Properties of Atomistic Systems. Phys. Rev. Lett. 2017, 120, 36002. (35) Behler, J. Constructing high-dimensional neural network potentials: A tutorial review. Int. J. Quantum Chem. 2015, 115, 1032–1050. (36) Decker, W.; Greuel, G.-M.; Pfister, G.; Sch¨onemann, H. Singular 4-1-1 — A computer algebra system for polynomial computations. http://www.singular.uni-kl.de, 2018. (37) Opalka, D.; Domcke, W. Interpolation of multi-sheeted multi-dimensional potentialenergy surfaces via a linear optimization procedure. The Journal of Chemical Physics 2013, 138, 224103. (38) Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Courna-

45

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

peau, D.; Brucher, M.; Perrot, M.; Duchesnay, E. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. (39) Bosma, W.; Cannon, J.; Playoust, C. Efficient implementation of high dimensional model representations. J. Comp. Symb. Comp. 1997, 24, 235–245. (40) Shao, K. Fundamental Invariants. 2016; https://github.com/kjshao/FI. (41) Kamath, A.; Vargas-Hern´andez, R. A.; Krems, R.; Carrington, T.; Manzhos, S. Neural networks vs Gaussian process regression for representing potential energy surfaces: A comparative study of fit quality and vibrational spectrum accuracy. J. Chem. Phys. 2018, 148, 241702. (42) Carter, S.; Handy, N. C.; Demaison, J. The rotational levels of the ground vibrational state of formaldehyde. Mol. Phys. 1997, 90, 729–737. (43) Qu, C.; Bowman, J. M. High-dimensional fitting of sparse datasets of CCSD(T) electronic energies and MP2 dipole moments, illustrated for the formic acid dimer and its complex IR spectrum. J. Chem. Phys. 2018, 148, 241713. (44) Miliordos, E.; Xantheas, S. S. On the Validity of the Basis Set Superposition Error and Complete Basis Set Limit Extrapolations for the Binding Energy of the Formic Acid Dimer. J. Chem. Phys. 2015, 142, 094311. (45) Nguyen, T. T.; Sz´ekely, E.; Imbalzano, G.; Behler, J.; Cs´anyi, G.; Ceriotti, M.; G¨otz, A. W.; Paesani, F. Comparison of permutationally invariant polynomials, neural networks, and Gaussian approximation potentials in representing water interactions through many-body expansions. J. Chem. Phys. 2018, 148, 241725. (46) Paukku, Y.; Yang, K. R.; Varga, Z.; Truhlar, D. G. Global ab initio ground-state potential energy surface of N4 . J. Chem. Phys. 2013, 139, 044309.

46

ACS Paragon Plus Environment

Page 46 of 48

Page 47 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(47) Qu, C.; Conte, R.; Houston, P. L.; Bowman, J. M. “Plug and Play” Full-Dimensional ab Initio Potential Energy and Dipole Moment Surfaces and Anharmonic Vibrational Analysis for CH4 −H2 O. Phys. Chem. Chem. Phys. 2015, 17, 8172. (48) Conte, R.; Qu, C.; Bowman, J. M. Permutationally Invariant Fitting of Many-Body, Noncovalent Interactions with Application to Three-Body Methane-Water-Water. J. Chem. Theory Comp. 2015, 11, 1631–1638. (49) Heazlewood, B. R.; Jordan, M. J. T.; Kable, S. H.; Selby, T. M.; Osborn, D. L.; Shepler, B. C.; Braams, B. J.; Bowman, J. M. Roaming is the dominant mechanism for molecular products in acetaldehyde photodissociation. Proc. Natl. Acad. Sci. U.S.A 2008, 105, 12719–12724. (50) Fu, B.; Han, Y.-C.; Bowman, J. M.; Angelucci, L.; Balucani, N.; Leonori, F.; Casavecchia, P. Intersystem crossing and dynamics in O( 3P) + C2 H4 multichannel reaction: Experiment validates theory. Proc. Natl. Acad. Sci. U.S.A 2012, 109, 9733–9738. (51) Bowman, J. M.; Bittman, J. S.; Harding, L. B. Ab initio calculations of electronic and vibrational energies of HCO and HOC. J. Chem. Phys. 1986, 85, 911–921. (52) Collins, M. A. Molecular potential-energy surfaces for chemical reaction dynamics. Theor. Chem. Acc. 2002, 108, 313–324. (53) Guan, Y.; Yang, S.; Zhang, D. H. Application of Clustering Algorithms to Partitioning Configuration Space in Fitting Reactive Potential Energy Surfaces. J. Phys. Chem. A 2018, 122, 3140–3147.

47

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

PIP-GP PES

PIP PES

ACS Paragon Plus Environment

Page 48 of 48