Searching the Force Field Electrostatic Multipole ... - ACS Publications

Feb 29, 2016 - Searching the Force Field Electrostatic Multipole Parameter Space. Sofie Jakobsen and Frank Jensen*. Department of Chemistry, Aarhus ...
1 downloads 0 Views 2MB Size
Article pubs.acs.org/JCTC

Searching the Force Field Electrostatic Multipole Parameter Space Sofie Jakobsen and Frank Jensen* Department of Chemistry, Aarhus University, DK-8000 Aarhus, Denmark S Supporting Information *

ABSTRACT: We show by tensor decomposition analyses that the molecular electrostatic potential for amino acid peptide models has an effective rank less than twice the number of atoms. This rank indicates the number of parameters that can be derived from the electrostatic potential in a statistically significant way. Using this as a guideline, we investigate different strategies for deriving a reduced set of atomic charges, dipoles, and quadrupoles capable of reproducing the reference electrostatic potential with a low error. A full combinatorial search of selected parameter subspaces for N-methylacetamide and a cysteine peptide model indicates that there are many different parameter sets capable of providing errors close to that of the global minimum. Among the different reduced multipole parameter sets that have low errors, there is consensus that atoms involved in π-bonding require higher order multipole moments. The possible correlation between multipole parameters is investigated by exhaustive searches of combinations of up to four parameters distributed in all possible ways on all possible atomic sites. These analyses show that there is no advantage in considering combinations of multipoles compared to a simple approach where the importance of each multipole moment is evaluated sequentially. When combined with possible weighting factors related to the computational efficiency of each type of multipole moment, this may provide a systematic strategy for determining a computational efficient representation of the electrostatic component in force field calculations.



INTRODUCTION Molecular dynamics simulations of large biomolecular systems rely on computationally efficient parametrized energy functions for calculating the intra- and intermolecular interactions.1,2 The total energy is written as a sum of bonded and nonbonded terms, where the latter normally is partitioned into an electrostatic and a van der Waals component. The combination of the mathematical functions for each energy term and the associated parameters is denoted a force field. The computationally most efficient description of the electrostatic energy is based on a Coulomb interaction between partial atomic charges, while the van der Waals interaction typically is parametrized by a Lennard-Jones potential. The accuracy of the electrostatic interaction can be improved by including either higher order atomic multipole moments3−5 or additional point charges located at nonatomic positions, as for example in the TIP5P water model6 or in the OPLS-AAx force field for describing the sigma hole in halogenated compounds.7 The advantage of off-atomic partial charges is that the force field model remains within the point charge framework, which is computationally efficient and requires no new implementations in standard simulation software, but has the disadvantage that it is difficult to derive a systematic improvement in the accuracy. The advantage of including higher order atomic multipoles is that they provide a systematic way of increasing the accuracy © 2016 American Chemical Society

but carry the disadvantage that new types of electrostatic interactions must be implemented in the computational codes, and they are computationally more demanding than interactions between partial charges. Both strategies improve the accuracy by extending the number of electrostatic parameters, but we have in previous work shown that the parameter space rapidly becomes near-redundant.8 The redundancy is present already in models employing a partial charge for each atomic site, and ∼1/4 of the charges can consequently be removed with only a small loss in accuracy for molecules with 20−40 atoms. The parameter space in a model with partial charges, dipole, and quadrupole moments on all atomic sites is so redundant that ∼1/2 of the parameters can be removed with essentially no loss of accuracy. These results were obtained by a procedure where electrostatic parameters were added or removed sequentially, but this may be suboptimum for rigorously establishing the minimum number of parameters for a given target accuracy and for selecting the optimum combination of charge, dipole, and quadrupole parameters. In the present work we elaborate on the last two points by assessing the inherent dimension of the electrostatic parameter Received: December 15, 2015 Published: February 29, 2016 1824

DOI: 10.1021/acs.jctc.5b01187 J. Chem. Theory Comput. 2016, 12, 1824−1832

Article

Journal of Chemical Theory and Computation

Figure 1. Illustration of the decomposition of a third-order tensor, where the λ weighting factors are absorbed into the vectors for simplicity.

followed by a sequential removal of parameters until a target value of 10% above the limiting value was reached. For our reference data set consisting of ∼1000 conformations of 20 amino acid peptide models, this was able to consistently reduce the number of multipoles to ∼40% of the full set of electrostatic parameters.8 In the present paper we present a more thorough analysis of the effective rank of the electrostatic potential using tensor decomposition. We furthermore test whether the internal correlation between multipoles is important for the selection of a subset of nonredundant multipoles by applying a full combinatorial search of small parameter subsets. The conclusion is that there is a very large degree of freedom in selecting a reduced set of multipoles capable of reproducing the ̈ approach accuracy of a full multipole model and that the naive of sequentially adding or removing single parameters is sufficient for identifying a near-optimum set of parameters.

space and investigate how it can efficiently be spanned by atomic centered multipole moments up to quadrupoles. The goal of the present work is to systematically improve the representation of the molecular electrostatic potential (ESP), which is responsible for the long-range interactions in a force field description of molecular systems. The ESP will depend on the molecular conformation and intermolecular interactions, and this geometry dependence will necessitate inclusion of polarization effects. In order to avoid complications arising from adding polarization parameters to an already redundant set of electrostatic parameters, we have elected to focus on the somewhat simpler problem of reproducing the ESP for a single conformation of a given molecular system and address the geometry dependence at a later stage. Given that there may be many different sets of electrostatic parameters capable of achieving the same target accuracy, the objective is to identify the one with the lowest computational cost and preferably by a method that can be applied to any system with only minimal user involvement. The computational cost relates to both the number of interaction terms and the complexity of each term, and the former increases significantly when higher order multipole moments are introduced. The interaction between two atoms described with multipoles up to quadrupoles requires 9 × 9 = 81 terms, compared to a single term for calculating the Coulomb interaction between two atomic point charges. Charge−charge interactions only depend on the distance between the two sites, but interactions between higher order moments in addition depend on their relative orientations, which leads to computationally more demanding expressions. These factors make multipole force fields significantly more demanding compared to standard fixed charge force fields and have been a barrier for the application of such models. Our previous work showed that ∼1/2 of the atomic charge, dipole, and quadrupole moments are near-redundant,8 implying that significant computational resources are spent without a gain in accuracy. The parameter redundancy is present already at the atomic point charge level, where it causes problems of rank deficiency in the fitting algorithm.9−13 As a step toward designing computational efficient force fields with systematic improved accuracies, it would be useful to have a strategy for identifying and eliminating near-redundant parameters. In the previous work we addressed the redundancy problem for the charge-only model by sequentially removing charge parameters and monitoring the error of the resulting optimum fit to a reference ESP. This top-down elimination could not be used for a model with charge, dipole, and quadrupole parameters on all atomic sites, as any parameter could be removed without increasing the error function in the first of many eliminations rounds. We instead determined a reduced set of electrostatic parameters by a two-stage procedure, where dipole and quadrupole parameters were added sequentially to a model with atomic charges on all sites until the error function was within 5% of the limiting value,



METHODS AND RESULTS Tensor Decomposition. The problem of near-redundancy in the electrostatic parameter space makes it desirable to develop a method for selecting a subset of nonredundant parameters, which (optimally) span the full multipole parameter space. As a first step we assess a lower limit for the number of nonredundant parameters, i.e., the dimensionality of the parameter space. Since the electrostatic parameters aim at reproducing a reference ESP, this is equal to the number of parameters that can be extracted from the ESP in a statistically significant way, and this can be estimated using tensor decomposition.14−16 A tensor is defined as an Ndimensional array, i.e., a data structure with N indices. The 3dimensional arrangement of the ESP surrounding a molecule can be considered a third-order tensor. In general, a tensor of order N can be decomposed into a weighted sum of vector outer products, where each outer product uses N vectors (tensors of order one). For a third order tensor, ? ∈ I × J × K , the decomposition can be written as R

?≈

∑ λr ar ⊗ br ⊗ cr r

(1)

Here, λr is a weight factor, and ar ∈ I , br ∈ J , and cr ∈ K are the decomposing vectors for r = 1, ..., R. The decomposition is schematically illustrated in Figure 1. The rank of a tensor ? , denoted R? , is defined as the smallest number of vector outer products that generates ? as their sum to within a predefined threshold. The rank of a tensor can be estimated by generating decompositions of increasing rank, until the difference between the decomposed and the original tensor is converged below a predefined limit. We have in the present work used the canonical decomposition/parallel factors method, abbreviated CP-decomposition. The ESP of cysteine capped with acetyl and methyl amide groups (denoted CYS) have been calculated at the ωB97X1825

DOI: 10.1021/acs.jctc.5b01187 J. Chem. Theory Comput. 2016, 12, 1824−1832

Article

Journal of Chemical Theory and Computation

Figure 2. Convergence of the tensor error as a function of the decomposition rank. Vertical lines correspond to one and two times the number of atoms, respectively.

D17/aug-pc-118 level using the cubegen facility in Gaussian09.19 The ESP is calculated in a cubic grid containing a total of 800,000 points with a grid spacing of 0.20 a0 (100, 100, and 80 points respectively in the x-, y-, and z-dimensions). This ensures that all points within 5.2 a0 of any atom are included in the box. At the surface of the box the electron density has a maximum value of 1.8 × 10−5 e/a03. Grid points within the van der Waals radius of any atom are set to zero in the tensor decomposition analysis, since it is irrelevant to analyze the potential close to the nuclei in a force field context. This leaves approximately 675,000 points in the ESP tensor. Tensor decompositions are constructed for CYS by an alternating least-squares algorithm20 using CP-ALS from the MATLAB Tensor Toolbox.21,22 Figure 2 shows the normalized errors, ε, for decompositions in the rank range R from 1 to 100. The errors are calculated as the Frobenius norm between the original (; ) and the decomposed (; CP R ) tensors and normalized by the norm of the original tensor as shown in eq 2 ε=

||; − ; CP R || ||;||

parameters consisting of atomic and off-center charges only (i.e., no multipole moments); however, finding the optimum positions for the off-center charges is likely to be a complicated nonlinear multiminima optimization problem.23 A potential viable strategy could be to find an optimum set of atomic multipoles and in a subsequent step model each of these multipoles by a limited set of off-center charges in order to achieve a better computational efficiency.24,25 Such improvements may be considered at a later stage. Atomic point charges and multipoles are not uniquely defined, and they can therefore be determined in various ways. In a force field context the purpose of point charges is to generate accurate electrostatic interaction energies, meaning that the model should aim at reproducing the true ESP for relevant intermolecular distances. The distributed multipole analysis (DMA) developed by Stone26 provides a route to derive a finite number of multipoles capable of reproducing the ESP exactly, but the finite number is unfortunately several orders of magnitude larger than the number of atoms. A popular approach is to limit the multipole expansion to at most quadrupoles and only located at atomic positions. The truncation of the multipole expansion and limiting the expansion points necessarily implies a loss of accuracy, but a significant part of the accuracy can be recovered by f itting the values of the multipole moments to the actual ESP, rather than deriving them from the wave function, as in the DMA approach. Fitting is the approach taken in the present work, and the parameters are obtained by minimizing an error function defined as the squared deviation between the quantum mechanical ESP (ϕQM) and the ESP approximated by a set of multipoles (ϕapp), as shown in eq 3

(2)

Figure 2 shows that the error is reduced to ∼15% for a rank corresponding to the number of atoms and is converged to within a few percent around twice the number of atoms. These ranks indicate how many independent parameters that can be extracted from the ESP data in a statistically significant way for a given threshold and serve as lower bounds for the number of parameters. The actual number of parameters that is needed can be larger due to constraints in the electrostatic model, such as using an atom-based multipole expansion. Essentially the same rank convergence as shown in Figure 2 has been found for other capped amino acids as well. Multipole Analysis. The results in the previous section indicate that a set of ∼NAtom optimum parameters should be sufficient for reproducing the molecular ESP with a ∼15% error, and ∼2NAtom parameters are sufficient for reducing the error to within a few percent. This can be compared to the 3NAtom parameters in a full multipole model with a charge, dipole, and quadrupole on each atomic site, indicating that approximately half of these parameters are redundant. The question we address in this section is how to select a nonredundant set of parameters that simultaneously are capable of a good accuracy and are computationally efficient. A search within a parameter set of multipole moments on atomic sites is a combinatorial problem, which will be analyzed in the following. Alternatively, a search could be made for a set of

χ2 =

1 P

P

∑ [ϕQM(rp) − ϕapp(rp)]2 p=1

(3)

As in our previous work,8 the ESPs are sampled in a large number of reference points placed on an isodensity surface corresponding to a value of ρ = 10−3e/a30, which ensure that we capture all the available information.27 The fitting procedure for point charges has often been discussed in the literature10,11,28−33 and can easily be extended to include higher order multipole moments, see ref 8 for details. Even though the fitting of multipoles involves inversion of a matrix with the dimension of the number of independent multipole components and manipulation of a large number of fitting points (on the order of ∼100,000), most of the information needs to be calculated only once which makes the actual fitting procedure 1826

DOI: 10.1021/acs.jctc.5b01187 J. Chem. Theory Comput. 2016, 12, 1824−1832

Article

Journal of Chemical Theory and Computation

independent quadrupole components are optimum fitted individually. The parameter space of atomic point charges and dipoles for NMA is of dimension 24, and selecting the optimum set of 12 parameters requires searching K24,12 = 2,704,156 combinations. These fits provide RMS-errors in the range from 3.89 × 10−3 to 21.27 × 10−3 Eh/e with the distribution shown in Figure 4a. The RMS-error for the best 12-parameter solution can be compared to the value of 3.23 × 10−3 Eh/e when using the full set of 24 point charges/dipoles. Removing half of the parameters thus increases the error by (only) 20%. Including the quadrupoles results in a total of 36 parameters, which is sufficiently large that restrictions must be put on the possible value of k for full combinatorial searches. Selecting the optimum 8-parameter solution requires searching K36,8 = 30,260,340 combinations which provides the distribution shown in Figure 4b. The RMS-error for the best 8-parameter solution is 4.21 × 10−3 Eh/e, which can be compared to the value of 3.14 × 10−3 Eh/e for the full set of 36 parameters. The best 8-parameter solution thus has errors of ∼35% compared to the limiting accuracy, which is consistent with the rank-8 tensor decomposition value in Figure 2. This in turn indicates that atomic multipoles up to quadrupoles represent a useful set of parameters for expanding the representation of the ESP. Figures 4a and 4b show that there are a large number of solutions that provide a low RMS-error at both the dipole and quadrupole level. In Figure 4a there are 106 fits with an RMSerror within 5% of the global minimum (below 4.08 × 10−3 Eh/ e) and 737 fits with an RMS-error within 10% of the global minimum (below 4.28 × 10−3 Eh/e). The top-1% (27,041 fits) of all solutions has an RMS-error below 4.72 × 10−3 Eh/e. For the full combinatorial search of the quadrupole parameter space in Figure 4b, there are 55 and 1129 fits with RMS-error within 5% and 10% of the global minimum, respectively, while the top1% (302,603 fits) of all solutions has an RMS-error below 5.52 × 10−3 Eh/e. When considering the top-10 fits for both NMA searches (i.e., the 10 fits providing the lowest RMS-errors), there is a clear trend in the selection of multipoles. In the charge/dipole space the parameters are approximately equally distributed between charges and dipoles. For the amide group the nitrogen atom is consistently populated with a dipole, while the CO group is populated by one dipole (on either C or O) and two charges. For the up-to-quadrupole search in Figure 4b, no charges are selected in the top-10 fits, while dipoles and quadrupoles are approximately equally populated. Quadrupoles are found consistently on the carbon, oxygen, and nitrogen atoms of the amide group, except for one of the ten minima, where only the nitrogen and oxygen atoms have quadrupoles, and furthermore a dipole populates the amide carbon in nine of the ten minima. The full combinatorial searches thus clearly indicate that atoms involved in π-systems require higher order multipole moments, as was also observed earlier.8 However, when this criterion is met, there is still significant freedom in selecting parameters for achieving a high accuracy in reproducing a reference ESP. The CYS system has 23 atoms, which is sufficiently large that a full combinatorial multipole search is infeasible for any realistic sized subspaces. Instead all 10-charge solutions among the 23 charge parameters have been determined, which is a total of K23,10 = 1,144,066 combinations requiring approximately 20 h CPU-time and provides the error distribution shown in Figure 4c. The RMS-errors are in general shifted

computationally inexpensive and thus allows probing millions of possible parameter combinations. The goodness of fit is quantified by the root-mean-squared (RMS) deviation from the quantum mechanical ESP sampled at all the fitting points. We have in the present work used an unbiased approach of allowing all types of multipole moments on all atomic sites and determining the optimum value for all parameters by fitting to the reference ESP. In an actual design stage of a new force field, it would be necessary to enforce parameters on atoms that are easily interchanged (e.g., hydrogens on a methyl group) to be identical. One could similarly restrict parameter searches to selected subsets, for example not allowing quadrupole moments on hydrogen. Our present approach is to let the data dictate the optimum set of parameters, rather than a priori enforcing restrictions. Parameter Selection by Full Combinatorial Searches. Our starting point is a single molecular conformation with an optimum set of charges, dipoles, and quadrupoles on all atomic centers, obtained by fitting each individual component to the reference ESP under the total charge constraint, and this will be referred to as a full multipole model. A production type force field will require the dipole and quadrupole moments to be converted into a local coordinate frame in order to handle the conformational dependency, but this is irrelevant for fitting to a single conformation. The conformational dependence will in addition, as already mentioned, require inclusion of polarization. The parameter reduction problem can be formulated as selecting the best k-subset solution from a full set of N parameters at a given k level. This is a combinatorial problem, where the number of possibilities is given by the binomial coefficient, KN,k. The factorial growth restricts full searches to situations where N is relatively small or k is either small or close to N. We will in the following discuss results for two systems: N-methylacetamide (NMA, C3H7NO) with 12 atoms and acetyl-N-methyl amine capped cysteine (CYS, C6H12N2O2S) with 23 atoms, as shown in Figure 3. NMA is chosen because it

Figure 3. Structures of a) NMA and b) CYS model systems. Carbon atoms are black, hydrogen atoms are white, oxygen atoms are red, nitrogen atoms are blue, and the sulfur atom is yellow.

is the smallest compound containing the amide bond, which is a key feature in protein structures, and it contains a sufficiently small number of atoms to allow exhaustive parameter searches. CYS has been selected because it in previous work8 emerged as one of the amino acids where multipoles are particularly important. Essentially identical results have been obtained using an alanine peptide model, and there is no reason to expect a different behavior for the remaining 18 natural amino acids. Each charge, dipole, or quadrupole is considered as one parameter in the following, but each of the three dipole and five 1827

DOI: 10.1021/acs.jctc.5b01187 J. Chem. Theory Comput. 2016, 12, 1824−1832

Article

Journal of Chemical Theory and Computation

Figure 4. Distribution of RMS-errors for a) all 12-parameter solutions among the 24 charges/dipoles of NMA, b) all 8-parameter solutions among the 36 charges/dipoles/quadrupoles of NMA, and c) all 10-parameter solutions among the 23 charges of CYS.

toward larger values compared to the NMA distributions in Figures 4a and 4b, which is a result of not including higher order multipole moments for CYS. The best 10-charge combination for CYS provides an RMS-error of 8.84 × 10−3 Eh/e, compared to an error of 6.88 × 10−3 Eh/e when employing all 23 charges. In the next section, a combinatorial algorithm is presented, which aims at identifying a nearoptimum set of multipoles without performing a complete brute force search. With this method, a 10-charge combination is found that has an RMS-error of 9.32 × 10−3 Eh/e, which is within ∼10% of the global minimum. The top-1% best fits provide errors below 11.56 × 10−3 Eh/e. These numbers combined with a visual inspection of the left-hand side of Figure 4c suggests that there are fewer “good” solutions in the CYS charge parameter search compared to the NMA multipole searches. This is a result of the larger multipole parameter space showing a higher degree of near-redundancy compared to the charge parameter space. The fact that there is an abundance of minima in the parameter space with RMS-error values close to the global minimum implies that it is less critical to find the best reduced subset of parameters than to develop a method for finding a consistent set of reduced parameters that generates an ESP with

a low RMS-error. Hence, less elaborate procedures may be sufficient for identifying a suitable low-error set of parameters, rather than focusing on the parameter set corresponding to the global minimum. The RMS-error from a fit using the full multipole model in any case serves as the limit that can be obtained. Parameter Selection by a Combinatorial Bottom-up Approach. We have in previous work shown that for a multipole parametrization using up to quadrupoles, the number of multipoles can be reduced to ∼40% with only a minor reduction (∼10%) in the ESP accuracy.8 The previous work employed a two-stage procedure starting from a model with charges on all atoms, followed by a sequential addition of the most important dipoles and quadrupoles until the RMS-error was within 5% of the limiting value. In a second stage, parameters were sequentially removed until the error exceeded 10% of the limiting value. The reduction stage primarily eliminates charge parameters. This sequential procedure does not account for possible correlated effects between two or more parameters.34 Given that the parameter space covered by an atomic dipole moment can effectively be covered also by two charge parameters on different atomic sites and an atomic quadrupole moment similarly by four charges at different sites, 1828

DOI: 10.1021/acs.jctc.5b01187 J. Chem. Theory Comput. 2016, 12, 1824−1832

Article

Journal of Chemical Theory and Computation

Figure 5. RMS-errors for different reduced parameter fits upon addition of multipoles for the CYS model. The graphs show which combinations that are considered (doubles, triples, or quadruples) as well as whether quadrupoles are included from the beginning or not (Rank 2 vs Rank 1+).

rather than only 2-, 3-, and 4-parameter fits as in the first iteration. As a preliminary test we assess the dependency on how large combinations that are considered and whether quadrupoles are included from the beginning or at an intermediate stage. The reason for not including quadrupoles from the beginning is to both decrease the number of combinations in the first iterations, which is the bottleneck in these calculations, and to increase the fraction of computational efficient low rank multipoles in the final set. The results are presented in Figure 5, where the RMS-errors for six different runs are compared. “Doubles”, “triples”, and “quadruples” refer to the largest combinations of parameters that are covered (i.e., 2-, 3-, or 4parameter combinations). Quadrupoles are included from the beginning for the runs labeled ‘Rank 2′, while ‘Rank 1+’ runs initially search only the charge and dipole space and include quadrupoles when the RMS-error drops below a threshold of three times the limiting error from the full multipole model. This threshold corresponds to quadrupoles being introduced when roughly 2/3 NAtom parameters are populated. For the ‘Doubles, Rank 1+’ a total of ∼25,000 fits is required for CYS to converge the RMS-error to within 10% of the limiting value. The same procedure for the ‘Quadrupoles, Rank 2’ approach results in a total of ∼13 million fits, and thus nearly 3 orders of magnitude more computer time (an increase from 50 min to 460 h). Figure 5 clearly shows that there is no advantage of including triple or quadruple combinations of parameters. For the reduced multipole sets containing below ∼25 parameters, the Rank 2 method with the quadrupoles included from the beginning is preferable, as a consequence of a more free selection of parameters and a larger ratio of quadrupoles. However, for reduced multipole sets containing more than ∼25 parameters (i.e., ∼ NAtom parameters), which is the interesting region with RMS-errors close to the limiting value, the two approaches are practically identical. Thus, the late introduction of quadrupoles provides a computationally attractive solution for reducing the large number of combinations in the first iterations, while at the same time favoring a larger ratio of low rank parameters.

it is possible that this mutual correlation between the atomic multipoles makes the sequential two-stage approach less than optimum. For CYS we have tested this by a combinatorial bottom-up process where the effect of all possible 2-, 3-, or 4combination of parameters (i.e., any double, triple, or quadruple set of parameters) on the RMS-error is evaluated at each level by fully optimizing the parameters for each combination. Note that all possible 2-, 3-, or 4-combination of parameters include both cases where they are distributed on different atomic sites and cases where two or three parameters are located on the same site. In principle, one could go higher in the combinatorial level, but it turns out not to be necessary. The effect of single parameters is not included in this bottomup approach, since a single charge has a zero-contribute to the ESP when added as the first parameter under the total-chargeconstraint, and therefore charges would be artificially deprioritized when including singles. The full charge, dipole, quadrupole parameter space for CYS has 69 parameters, and searching all possible 2-, 3-, and 4combinations thus requires 919,241 (K69,2 + K69,3 + K69,4) parameter fits in the first iteration. Among these combinations, a subset of solutions with small RMS-errors is selected separately for the doubles, triples, and quadruples. The subset includes the top-100 best fits from each group, unless the total number is smaller than 2000, in which case the top-5% fits have been selected. While this selection is somewhat arbitrary, the idea is to select a small set of low-error combinations. The importance of a given parameter is quantified by how often it is included in the total of ∼300 parameter combinations. The population numbers in this step can be divided by a weighting factor to take into account that higher order multipole parameters are computationally more expensive than low rank multipoles, as discussed below. The most abundant multipole from the ∼300 fits is permanently included in the parameter set, and the next iteration proceeds. Since one multipole moment is now populated, there are only 68 remaining parameters to search among, resulting in fewer combinations and a faster computation time for each subsequent iteration. The values of the permanent parameters are optimized for each fit, such that iteration two involves 3-, 4-, and 5-parameter fits, 1829

DOI: 10.1021/acs.jctc.5b01187 J. Chem. Theory Comput. 2016, 12, 1824−1832

Article

Journal of Chemical Theory and Computation

errors up to ∼NAtom parameters, this is of little interest, as the accuracy here is lower than from a full charge-only model. Beyond ∼NAtom parameters the two-stage approach is capable of finding parameter sets with the same accuracy as the more sophisticated combinatorial models. The final parameter subset of the two-stage approach employs 29 multipoles (∼1.3NAtom) of which 17 are charges, 3 are dipoles, and 9 are quadrupoles. For the combinatorial bottom-up approach which includes parameter correlation, on the other hand, the 29-parameter set has a composition of 0 charges, 17 dipoles, and 12 quadrupoles without any weighting factors (1:1:1); 6 charges, 15 dipoles, and 8 quadrupoles for the 1:3:5 weighting factors; and 19 charges, 8 dipoles, and 2 quadrupoles using weighting factors of 1:9:25. The atomic multipoles for these four reduced multipole models are provided in the Supporting Information, as well as the atomic coordinates for CYS. The large penalty on quadrupoles in the 1:9:25 weighting scheme results in larger RMS-errors compared to the other models, but on the other hand provide a parameter set consisting mostly of charges, and is therefore computationally efficient. It is notable, however, that even the strong favoring of charge parameters by the 1:9:25 weighting results in a significant number of dipole and quadrupole parameters being selected. This indicates that there are components of the ESP that cannot be represented by atomic charges only and that quadrupole moments are required on at least some types of atoms.

The overall goal in the selection procedure is to obtain a given accuracy with as low a computational cost as possible. Since a single quadrupole parameter is computationally more demanding than a single charge parameter, this suggests that among the many low-error parameter sets, a preference should be made for those containing a larger number of low-order multipoles. We have incorporated such a preference by introducing weighting factors to bias the composition of different ranks of multipoles. Using the number of independent components in each type of multipole moment as a guideline suggests weighting factors of 1:3:5 for charges, dipoles, quadrupoles, while the number of interaction between each types of multipole moments suggests weighting factors of 1:9:25. The 1:3:5 weighting scheme is the one used in Figure 5. Other weighting factors based on actual computational time could also be considered, but this will of course depend on the actual implementation. The relative weighting is implemented by dividing the population numbers (i.e., how often a given multipole is selected in the ∼300 “good” fits) with the respective weighting factors depending on the rank. For the 1:3:5 weighting scheme as an example, this implies that a dipole has to be selected 3 times as often as a charge, in order to be found equally important. Figure 6 compares the bottom-up approach using doubles of parameters and introducing quadrupoles during the process for



CONCLUSION The present work provides a systematic investigation of the abundance of minima in the electrostatic force field parameter space. Based on the results from tensor decomposition the dimensionality of the parameter space has been estimated, and this analysis suggests that the number of parameters should not exceed twice the number of atoms. Hence, a multipole parametrization using up to quadrupoles on all atoms (i.e., three multipole parameters per atom) provides a redundant set of parameters, and computational effort is consequently spent without a gain in accuracy. N-methylacetamide is used as a model system for a full combinatorial search of the multiple minima in the parameter space at both the dipole and quadrupole levels. It is found that the parameter space has a large number of low-error minima and that essentially the same accuracy can be obtained in multiple ways. Reducing the number of parameters is therefore not an issue about identifying the best solution but rather about developing an algorithm, which will identify a near-optimum subset of parameters in a consistent and computationally efficient way. The top-10 best solutions suggest that dipoles and quadrupoles are important for atoms involved in π-bonding, in agreement with earlier results.35 This is not a surprising result, but it is satisfying that the algorithm automatically identifies multipole parameters as important for atoms involved in πbonding. Single amino acid systems are often used in the parametrization of a protein force field; however, these systems are too large for full combinatorial searches of the multipole parameter space. We have instead searched small subspaces consisting of up to four parameters for a cysteine peptide model. With this method we identify the overall most important multipoles and test whether correlated effects between multipoles are important when reducing the parameter space. It is found that this combinatorial approach is not ̈ method, where individual multipoles superior to the more naive

Figure 6. RMS-errors for reduced parameter sets for the CYS model comparing different weighting schemes of the correlated parameter method presented in this paper together with the two-stage approach presented in ref 8. The inset shows a close-up in the range from 23 to 40 multipoles. The multipole moments corresponding to a reduced set of 29 multipoles (i.e., the end-point of the two-stage approach) for all four selection algorithms are found in Tables SI.2−SI.5 in the Supporting Information.

three different sets of weighting factors1:1:1, 1:3:5, and 1:9:25, respectively. The previously employed two-stage result is also shown for comparison. In the two-stage approach the selection is not directly biased using weighting factors, but since the method is initiated from a set of charges only, it will automatically provide a bias toward a relatively larger ratio of charges in the final multipole set, as discussed earlier.8 One could consider a weighting scheme where the error reduction by a given parameter is modulated by a weighting factor, and this will increase the dipole to quadrupole ratio. While the combinatorial bottom-up procedures lead to slightly lower 1830

DOI: 10.1021/acs.jctc.5b01187 J. Chem. Theory Comput. 2016, 12, 1824−1832

Article

Journal of Chemical Theory and Computation

(12) Zeng, J.; Duan, L.; Zhang, J. Z. H.; Mei, Y. A numerically stable restrained electrostatic potential charge fitting method. J. Comput. Chem. 2013, 34 (10), 847−853. (13) Woods, R. J.; Khalil, M.; Pell, W.; Moffat, S. H.; Smith, V. H. Derivation of Net Atomic Charges from Molecular Electrostatic Potentials. J. Comput. Chem. 1990, 11 (3), 297−310. (14) Kolda, T. G.; Bader, B. W. Tensor Decompositions and Applications. SIAM Rev. 2009, 51 (3), 455−500. (15) Håstad, J. Tensor rank is NP-complete. J. Algs. 1990, 11 (4), 644−654. (16) Grasedyck, L. Hierarchical Singular Value Decomposition of Tensors. SIAM J. Matrix Anal. Appl. 2010, 31 (4), 2029−2054. (17) Chai, J.-D.; Head-Gordon, M. Long-Range Corrected Hybrid Density Functionals with Damped Atom–Atom Dispersion Corrections. Phys. Chem. Chem. Phys. 2008, 10, 6615−6620. (18) Jensen, F. Polarization consistent basis sets. III. The importance of diffuse functions. J. Chem. Phys. 2002, 117 (20), 9234−9240. (19) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Scalmani, G.; Barone, V.; Mennucci, B.; Petersson, G. A.; Nakatsuji, H.; Caricato, M.; Li, X.; Hratchian, H. P.; Izmaylov, A. F.; Bloino, J.; Zheng, G.; Sonnenberg, J. L.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Vreven, T.; Montgomery, J. A., Jr.; Peralta, J. E.; Ogliaro, F.; Bearpark, M.; Heyd, J. J.; Brothers, E.; Kudin, K. N.; Staroverov, V. N.; Kobayashi, R.; Normand, J.; Raghavachari, K.; Rendell, A.; Burant, J. C.; Iyengar, S. S.; Tomasi, J.; Cossi, M.; Rega, N.; Millam, J. M.; Klene, M.; Knox, J. E.; Cross, J. B.; Bakken, V.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Martin, R. L.; Morokuma, K.; Zakrzewski, V. G.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Dapprich, S.; Daniels, A. D.; Farkas, Ö .; Foresman, J. B.; Ortiz, J. V.; Cioslowski, J.; Fox, D. J. Gaussian 09 Revision A.02; Wallingford, CT, 2009. (20) Bader, B. W.; Kolda, T. G. Algorithm 862: MATLAB tensor classes for fast algorithm prototyping. ACM Trans.Math. Softw. 2006, 32 (4), 635−653. (21) MATLAB 2012b; The MathWorks, Inc.: Natick, Massachusetts, United States, 2012. (22) Bader, B. W.; Kolda, T. G.; Sun, J.; Acar, E.; Dunlavy, D. M.; Mørup, M.; Mayo, J. R.; Chi, E. C. MATLAB Tensor Toolbox Version 2.6; The MathWorks, Inc.: 2015. (23) Karamertzanis, P. G.; Pantelides, C. C. Optimal Site Charge Models for Molecular Electrostatic Potentials. Mol. Simul. 2004, 30 (7), 413−436. (24) Anandakrishnan, R.; Baker, C.; Izadi, S.; Onufriev, A. V. Point Charges Optimally Placed to Represent the Multipole Expansion of Charge Distributions. PLoS One 2013, 8 (7), e67715. (25) Devereux, M.; Raghunathan, S.; Fedorov, D. G.; Meuwly, M. A Novel, Computationally Efficient Multipolar Model Employing Distributed Charges for Molecular Dynamics Simulations. J. Chem. Theory Comput. 2014, 10 (10), 4229−4241. (26) Stone, A. J.; Alderton, M. Distributed multipole analysis Methods and applications. Mol. Phys. 2002, 100 (1), 221−233. (27) Tsiper, E. V.; Burke, K. Rules for Minimal Atomic Multipole Expansion of Molecular Fields. J. Chem. Phys. 2004, 120, 1153−1156. (28) Singh, U. C.; Kollman, P. A. An Approach to Computing Electrostatic Charges for Molecules. J. Comput. Chem. 1984, 5 (2), 129−145. (29) Besler, B. H.; Merz, K. M.; Kollman, P. A. Atomic Charges Derived from Semiempirical Methods. J. Comput. Chem. 1990, 11 (4), 431−439. (30) Bayly, C. I.; Cieplak, P.; Cornell, W.; Kollman, P. A. A WellBehaved Electrostatic Potential Based Method using Charge Restraints for Deriving Atomic Charges: the RESP Model. J. Phys. Chem. 1993, 97 (40), 10269−10280. (31) Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Kollmann, P. A. Application of RESP Charges to Calculate Conformational Energies, Hydrogen Bond Energies, and Free Energies of Solvation. J. Am. Chem. Soc. 1993, 115 (21), 9620−9631.

are evaluated based on their ability to reproduce a reference ESP. Hence, it is not necessary to take into account the correlated effect of various multipoles, when selecting a subset of nonredundant parameters, and the two-stage method presented earlier seems to be sufficient for finding a subset of electrostatic parameters that provide a low RMS-error. If combined with a weighting scheme to account for differences in the computational efficiency, this may provide a consistent strategy for selecting a reduced set of atomic multipoles capable of representing a reference electrostatic potential to a given target accuracy.



ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jctc.5b01187. Atomic coordinates and multipole moments for different fitting procedures for the CYS peptide model (PDF)



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Notes

The authors declare no competing financial interest.

■ ■

ACKNOWLEDGMENTS Support from the Danish Natural Science Research Council is gratefully acknowledged. REFERENCES

(1) Cramer, C. J. Essentials of Computational Chemistry; Wiley: Chichester, U.K., 2004. (2) Jensen, F. Introduction to Computational Chemistry; Wiley: Chichester, U.K., 2007. (3) Kramer, C.; Gedeck, P.; Meuwly, M. Atomic Multipoles: Electrostatic Potential Fit, Local Reference Axis Systems and Conformational Dependence. J. Comput. Chem. 2012, 33, 1673−1688. (4) Shi, Y.; Xia, Z.; Zhang, J.; Best, R.; Wu, C.; Ponder, J. W.; Ren, P. Polarizable Atomic Multipole-Based AMOEBA Force Field for Proteins. J. Chem. Theory Comput. 2013, 9 (9), 4046−4063. (5) Jakobsen, S.; Kristensen, K.; Jensen, F. Electrostatic Potential of Insulin: Exploring the Limitations of Density Functional Theory and Force Field Methods. J. Chem. Theory Comput. 2013, 9 (0), 3978− 3985. (6) Mahoney, M. W.; Jorgensen, W. L. A five-site model for liquid water and the reproduction of the density anomaly by rigid, nonpolarizable potential functions. J. Chem. Phys. 2000, 112 (20), 8910−8922. (7) Jorgensen, W. L.; Schyman, P. Treatment of Halogen Bonding in the OPLS-AA Force Field: Application to Potent Anti-HIV Agents. J. Chem. Theory Comput. 2012, 8 (10), 3895−3901. (8) Jakobsen, S.; Jensen, F. Systematic Improvement of PotentialDerived Atomic Multipoles and Redundancy of the Electrostatic Parameter Space. J. Chem. Theory Comput. 2014, 10 (12), 5493−5504. (9) Francl, M. M.; Carey, C.; Chirlian, L. E.; Gange, D. M. Charges Fit to Electrostatic Potentials. II. Can Atomic Charges be Unambiguously Fit to Electrostatic Potentials? J. Comput. Chem. 1996, 17 (3), 367−383. (10) Sigfridsson, E.; Ryde, U. Comparison of Methods for Deriving Atomic Charges from the Electrostatic Potential and Moments. J. Comput. Chem. 1998, 19 (4), 377−395. (11) Francl, M. M.; Chirlian, L. E. The Pluses and Minuses of Mapping Atomic Charges to Electrostatic Potentials. In Rev. Comput. Chem.; John Wiley and Sons, Inc.: New York, 2000; Vol. 14, pp 1−31. 1831

DOI: 10.1021/acs.jctc.5b01187 J. Chem. Theory Comput. 2016, 12, 1824−1832

Article

Journal of Chemical Theory and Computation (32) Chirlian, L. E.; Francl, M. M. Atomic Charges Derived from Electrostatic Potentials: A Detailed Study. J. Comput. Chem. 1987, 8 (6), 894−905. (33) Breneman, C. M.; Wiberg, K. B. Determining Atom-Centered Monopoles from Molecular Electrostatic Potentials. The Need for High Sampling Density in Formamide Conformational Analysis. J. Comput. Chem. 1990, 11 (3), 361−373. (34) Sokalski, W. A.; Shibata, M.; Rein, R.; Ornstein, R. L. Cumulative atomic multipole moments complement any atomic charge model to obtain more accurate electrostatic properties. J. Comput. Chem. 1992, 13 (7), 883−887. (35) Koch, U.; Popelier, P. L. A.; Stone, A. J. Conformational dependence of atomic multipole moments. Chem. Phys. Lett. 1995, 238 (4−6), 253−260.

1832

DOI: 10.1021/acs.jctc.5b01187 J. Chem. Theory Comput. 2016, 12, 1824−1832