Assessing Force Field Potential Energy Function Accuracy via a

2 days ago - Industry initiatives can help women's careers, survey shows. Efforts by companies to encourage women in science, technology, engineering,...
0 downloads 0 Views 1MB Size
Subscriber access provided by UNIV OF NEW ENGLAND ARMIDALE

Computational Chemistry

Assessing Force Field Potential Energy Function Accuracy via a Multipolar Description of Atomic Electrostatic Interactions in RNA Yongna Yuan, Zhuangzhuang zhang, Matthew J. L. Mills, Rongjing Hu, and Ruisheng Zhang J. Chem. Inf. Model., Just Accepted Manuscript • Publication Date (Web): 26 Oct 2018 Downloaded from http://pubs.acs.org on October 26, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Assessing Force Field Potential Energy Function Accuracy via a Multipolar Description of Atomic Electrostatic Interactions in RNA Yongna Yuan*, †, Zhuangzhuang Zhang†, Matthew J. L. Mills‡, Rongjing Hu*, †, and Ruisheng Zhang† † School of Information Science & Engineering, Lanzhou University, Lanzhou, Gansu 730000, China * Corresponding Authors Yongna Yuan: [email protected] Rongjing Hu: [email protected] ‡ Present Address: 3M Corporate Research Analytical Laboratory, Saint Paul, Minnesota, United States

Abstract Computational investigations of RNA properties often rely on a molecular mechanical approach to defining molecular potential energy. Force fields for RNA typically employ on a point charge model of electrostatics, which does not provide a realistic quantum-mechanical picture. In reality, electron distributions around nuclei are not spherically symmetric, and are geometrydependent. A multipole expansion method which allows incorporation of polarizability and anisotropy in a force field is described, and its applicability to modelling the behavior of RNA molecules is investigated. Transferability of the model, critical for force field development, is also investigated.

ACS Paragon Plus Environment

1

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 48

Introduction Ribonucleic acids (RNAs) play many key roles in organisms beyond simply acting as a carrier of genetic information. These molecules are known to be accountable for a wide variety of functions in biology, and more and more such functions are still being discovered. For example, RNA molecules catalyze reactions1– 3, participate in post-translational gene regulation4, control protein localization5, and guide post-transcriptional 6,7 modification . The typical RNA macromolecule is a single-stranded, polynucleotide chain composed of adenine (A), cytosine (C), guanine (G), and uracil (U) nucleotide units, a nucleotide being composed of phosphate, pentose (sugar), and base moieties. The set of phosphate and pentose moieties of each nucleotide make up the backbone, and the bases are also commonly referred to as side chains. RNA backbone conformation and atom-atom interactions are crucial for RNA catalysis, drug and aptamer binding, and for protein/RNA interactions. All-atom conformational detail is difficult to achieve for the backbone, but it is a result much to be desired. Molecules interacting with RNA see the backbone in full atomic detail, whether or not experimental techniques are able to. Due to the complex variability of the numerous torsion angles per moiety, accurately locating the backbone can be challenging in X-ray or NMR structure determination of RNA molecules8, and computational methods can be turned to for insight. Computational methods - including molecular dynamics (MD) simulations and their associated empirical potential energy functions (typically force fields) - are therefore important in the atomistically detailed investigation of the structure-function relationships of RNA9. Although the first MD simulations of proteins were published in the late 1970s10–14 the first restrained simulations of nucleic acids did not appear until the mid-1980s15,16. An important factor in determining the precision of results computed via MD simulations is how well the force field (and its set of parameters) represents the underlying physical reality. There are many and varied force fields commonly used for simulations of RNA, including the class I Amber17 and CHARMM18 potentials. In these force fields, the potential energy of a system is defined to include bonded terms for interactions between atoms that are closely linked by covalent bonds (i.e. separated by 4 or fewer bonds), and non-bonded terms that describe the electrostatic and van der Waals interactions between more distant pairs of atoms (usually with a 1,4 and higher relationship). Complex nucleic acid

ACS Paragon Plus Environment

2

Page 3 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

systems are dominated by intermolecular interactions - especially the electrostatic interaction - so herein focus is on the latter, which plays a pivotal role in the modeling of RNA structure and dynamics. Although a range of MD simulations of RNAs have been performed using Amber or CHARMM force fields, few simulation results are satisfactory. Currently, the improvements are mostly based on adjusting the dihedral parameters, such as α/β19, ε/ζ20, χ21, etc. At present, the majority of popular force fields calculate the electrostatic energy of a system through fixed, atom-centered point charges. The connection to physical reality of the electrostatic term in typical force fields is therefore lacking because atomic electron densities are, in reality, anisotropic and are polarizable. Isotropic, fixed point charges provide a poor description of the electronic distribution around an atom, particularly for computation of short to medium range interactions22,23, and while parameterization procedures can produce point charge RNA force fields that perform well24, physical accuracy is still desirable. Instead of atomic point charges, atomic multipole moments (arising from a series expansion detailed later) are applied herein to accurately represent atomic electron densities so that the electrostatic interaction energy between pairs of atoms may be computed reliably. Polarization is handled naturally through solving the Schrodinger equation for each nuclear configuration of interest. The use of atomic multipole moments to calculate electrostatic interaction energies is increasingly common, as is the inclusion of polarization. For example, the AMOEBA force field employs a multipolar electrostatic model with polarization25. A recent polarizable AMBER force field has also been described25. Several groups perform research focused on representation and interaction of multipole moments and have published important contributions to the literature in this field26–33. Multipole moments subsequently are used in molecular dynamics simulations and geometry optimizations22,34–38. In Popelier’s group, the electrostatic potential generated by the atomic multipole moments has been extensively studied23,39–43. In previous work, an extensive investigation of the requirements for the successful use of multipolar electrostatics in the description of proteins was carried out. The results of that work have proven useful in guiding development of the FFLUX44 force field and to independent researchers45–48. The work described herein extends this approach to understand similar question for RNA molecules. This communication focuses on the ability of the described force field to reproduce

ACS Paragon Plus Environment

3

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 48

atom-atom electrostatic energies defined via quantum mechanics. The expressions needed to apply the potential within simulations have been derived and implemented in a computer program49 which has been used in a geometry optimization application as proof of concept for future simulations44,50. Future work is planned to apply the potential in RNA simulations.

Background and Methods Quantum Chemical Topology (QCT) Prior to proceeding with calculation of atom-atom interactions, it is necessary to provide a precise definition of an atom and its multipole moments. The atoms discussed in this work are defined via the theory of Quantum Chemical Topology (QCT)51,52, which partitions the electron density 𝜌(𝐫) of a molecule into unique atomic contributions through consideration of its gradient vector field, giving rise to well defined, nonoverlapping atoms53. This field consists of an infinite number of gradient paths, the majority of which originate at 𝑟 = ∞ where 𝜌(𝐫) = 0, and terminate at nuclei where 𝜌(𝐫) reaches a maximum. Atomic regions (termed "basins" and labelled Ω) are separated by surfaces of zero flux in the electron density. The atoms are normally capped by the 𝜌(𝐫) = 1 × 10 ―6 a. u. isodensity surface - this value has been validated through continued use and captures > 98% of the total atomic electron density. Atomic properties can be defined by integration over the 3D space of an atomic basin. For example, the electron density of each individual atom can be integrated over the volume of its atomic subspace to yield an atomic population. Further details of QCT are provided in Bader’s monograph. The Multipole Expansion The description of the electrostatic interactions in a chemical system can be made more realistic by increasing the detail in the description of the electrostatic properties of individual atoms. From such a description will follow a more accurate calculation of the electrostatic interactions between atoms. As noted above, QCT partitions the electron density into discrete regions Ω that can be identified with atoms. The exact Coulomb interaction energy between two such atoms 𝐴 and 𝐵 is then given by a 6-dimensional integral. 𝐸𝐴𝐵 =

∑∑∫ 𝐴 𝐵≠𝐴

𝑑𝐫𝐴 Ω𝐴



𝑑𝐫𝐵

𝜌𝑡𝑜𝑡(𝐫𝐴)𝜌𝑡𝑜𝑡(𝐫𝐵)

Ω𝐵

ACS Paragon Plus Environment

(1)

𝑟𝐴𝐵

4

Page 5 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Here, 𝑟𝐴𝐵 is the distance between the nuclei of atoms 𝐴 and 𝐵, and 𝜌tot is the total ground state molecular charge density, which consists of the sum of the electron density and the nuclear contribution. To reduce computational expense, the internuclear distance 𝑟𝐴𝐵 can be expanded in a Taylor series of 1/|𝑅𝐴𝐵 + (𝑟𝐴 ― 𝑟𝐵)|. Substitution of this expansion into eq 1 results in the following equation for the Coulombic interaction energy54 which may converge to the exact interaction energy depending on the internuclear separation. 𝓁max 𝓁max

𝐸𝐴𝐵 =

𝓁𝐴

∑∑ ∑ ∑ 𝓁𝐴

(2)

𝓁𝐵

𝑚𝐵 𝑄𝐴(Ω𝐴)𝑇𝓁𝓁𝐵𝐴𝑚 𝑄𝐵(Ω𝐵) 𝐴

𝓁𝐵 𝑚 𝐴 = ― 𝓁𝐴 𝑚 𝐵 = ― 𝓁𝐵

where 𝑄𝓁𝐴𝑚𝐴(Ω𝐴)and 𝑄𝓁𝐵𝑚𝐵(Ω𝐵) represent the multipole moments of topological atoms Ω𝐴 and Ω𝐵, respectively, while 𝑇 is the interaction tensor between a pair of atomic multipole moments. Letters 𝓁 and 𝑚 denote the rank and component of multipole moments. The rank of the multipole moment determines the particular spherical harmonic function used in the integration. There are 2𝓁 + 1 independent components 𝑚 for each atomic moment, where 𝓁 is the rank of that moment. For example, the dipole moment is of rank 1 and has 2𝓁 + 1 = 3 components, generally denoted by 𝑥, 𝑦 and 𝑧. An atomic multipole moment is obtained by integrating the electron density multiplied by a spherical harmonic over the 3D volume of the atom in the manner described earlier, avoiding the 6dimensional integration procedure at the expense of potentially introducing convergence issues. The (atom-atom) interaction rank, denoted by 𝐿, measures the number of multipole moments that are used to compute the electrostatic energy between two interacting atoms (i.e. the number of terms retained in eq 2. This quantity is defined by eq 3, 𝐿 = 𝓁𝐴 + 𝓁𝐵 + 1

(3)

where 𝓁𝐴 and 𝓁𝐵 are the ranks of the multipole moments for atoms 𝐴 and 𝐵, respectively. For example, computing energies for interaction rank 𝐿 = 3 requires the calculation of monopole, dipole, and quadrupole moments, and therefore includes 9 ( = 1 + 3 + 5) multipole moments for each atom. Interaction rank 3 means that the interactions evaluated to compute eq 2 consist of monopolemonopole (𝓁𝐴 = 𝓁𝐵 = 0), monopole-dipole (𝓁𝐴 = 0, 𝓁𝐵 = 1) dipole- dipole (𝓁𝐴 = 𝓁𝐵 = 0), and monopole-quadrupole (𝓁𝐴 = 0, 𝓁𝐵 = 2) interactions, the latter 2 types having inverse counterparts (e.g. 𝓁𝐵 = 0 and 𝓁𝐴

ACS Paragon Plus Environment

5

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 48

= 1 for monopole-dipole). Previous works have shown that an interaction rank of 𝐿 = 5 results in a satisfactory description of the structural and dynamic characteristics of a system22,35. To compute the Coulomb interaction energy corresponding to 𝐿 = 5 the value of 𝓁max must be set to 4 and requires monopole, dipole, quadrupole, octupole and hexadecapole multipole moments, meaning that the electron density of a single topological atom is described by 1 + 3 + 5 + 7 + 9 = 25 multipole moments. In this work an interaction rank of 10 is employed to generate reference values. The QCT approach to the definition of multipole moments for simulation is not the only possible approach. The DMA method of Stone and coworkers represents a second route55, as used in the AMOEBA force field25. Test System: 2MVY For this work, a 10-nucleotide RNA (PDB code: 2MVY) was selected as a representative system for study56. The particular RNA molecule chosen is expected to have no bearing on the ultimate results, and 2MVY was simply the latest released RNA when this work was initiated. The particular biological role of the RNA is irrelevant to the research described herein and the system will simply be referred to as 2MVY. The structure of 2MVY was originally obtained by NMR experiments and in total 30 molecular conformers were located; only one was deposited in the 2MVY PDB entry56. From the molecular structure, one of the two chains (B) was randomly selected, as the results of the work are also not expected to depend heavily on the exact geometry of 2MVY used. The chosen chain of 2MVY is depicted and its main features are highlighted in Figure 1. All bases were protonated such that the total system was neutral. This avoids difficulty with performing the quantum chemical calculations but ultimately will have to be avoided in future work. 2MVY contains 9 phosphate, 10 pentose and 10 base moieties in total (combining to form 9 nucleotides). The choice to retain a single chain of the double helix was made so that the described work would be computationally tractable. This has the effect of removing both the interchain backbone-backbone and basebase interactions from consideration. A set of interactions representative of the elements and distances present in the complete system is retained, however, and it is assumed that results will apply to double-helix and hairpin systems at medium to long range. It will likely be necessary to investigate shortrange RNA base-base interactions as has been carried out for DNA37. Figure 1 is near here

ACS Paragon Plus Environment

6

Page 7 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Molecular Fragmentation The RNA molecule is too large a chemical system for wavefunction evaluation and the computation of atomic multipole moments to be feasible. Because of this, 2MVY must be cut into smaller systems to be studied. Scission into single phosphate, pentose and base moieties is consistent with the procedure followed in the improvement of the Amber force field57 and this approach is adopted (and later tested) herein. In 2MVY, these molecular fragments are separated and then "capped" to account for the charge accumulated by deletion of atoms at the inter-fragment links. A methyl [-CH3] group is used to cap phosphate at each side. The pentose moiety is modeled by replacing the base with a N-dimethyl group [-N(CH3)2] to mimic the RNA environment (see Figures 2 and 3 for examples of each). Note that this is slightly different from the work reported in reference57, in which the base was replaced by a methyl group. A methyl group is also added to the base moiety to replace the pentose. Figure 2 shows examples of the three smallest molecules (phosphate, pentose and base) that were cut from RNA and capped. Figure 2 is near here In order to verify the reliability of the described fragmentation and capping scheme, larger moieties were also cut out from 2MVY; each set of phosphate, pentose and base moieties was cut out from 2MVY as phosphate-pentose, pentose-base (collectively referred to as medium in the foregoing), and nucleotide (referred to as large) fragments. These medium and large molecules were capped in the previously described manner and then used in a comparative analysis with the small fragments. To maintain the original relative geometries of the phosphate, pentose and base fragments from the 2MVY PDB structure, wavefunctions for all capped groups were computed without geometry optimization. Capping was also performed without subsequent restrained geometry optimization - the automatic GaussViewgenerated nuclear coordinates were used for capping groups. All molecules thus capped are saturated and neutral, and it is feasible to evaluate wavefunctions for these small, medium and large systems and to compute all necessary atomic multipole moments. Figure 3 shows the common fragments of the three sizes of molecules in detail. The phosphate moiety - the common fragment of phosphate, phosphate-pentose, and nucleotide, is highlighted with a red rectangle. The pentose moiety - the common fragment of pentose, phosphate-pentose, pentose-base, and nucleotide - is marked with a blue polygon, and finally the fragments marked with a green

ACS Paragon Plus Environment

7

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 48

polygon are the base moieties - the common fragment of base, pentose-base, and nucleotide systems. Figure 3 is near here The dependence of results obtained when the RNA molecule is fragmented into differently sized capped moieties will be assessed through consideration of the effect of the fragment size on the atom-atom electrostatic interactions. Computational Details Editing of fragmented chemical structures by capping was performed with GaussView58. Geometry optimization was carried out and energies and wavefunctions of systems were evaluated using Gaussian0959. Three levels of theory were used in this work: HF/631G(d,p), B3LYP/aug-cc-pVTZ, and MP2/aug-cc-pVTZ. In order to ensure the accuracy of the calculated energies, the convergence limit of the energy after successive SCF cycles was set to 10 ―10 a. u. This stringent condition increases the computational cost but yields the more accurate energies and wavefunctions necessary to generate accurate multipole moment values and produce smooth plots of atomic multipole moments across all molecular geometries60. AIMAll61 was used to integrate all atomic multipole moments in the described systems, with the atomic basins capped by a 𝜌(𝐫) = 10 ―6 a. u. isodensity envelope. Integration procedures failed for some of the atoms in these molecular geometries due to their complex topologies62. Only successfully integrated atoms were used to investigate the convergence behavior of atom-atom interactions in this work. In preliminary testing, AIMAll was also used to compute values of the exact 6D interaction energy as defined in eq 1. Atomatom high-rank interaction energies were computed with the computer program NYX, which was written in the Popelier group63. NYX implements a recursive and open-ended formula64 to obtain values of the interaction tensor 𝑇 in eq 2. A sufficient set of atomic multipole moments were computed to achieve an interaction rank of 𝐿 = 10 for each atom, meaning that the highest computed atomic multipole moment was of rank 𝓁 = 9 (eq 3). For any atom-atom interaction, the electrostatic interaction energy was defined to be convergent at a multipolar series expansion of rank 𝐿 if the two following conditions were satisfied: |𝐸(𝐿) ― 𝐸(𝐿 ― 1)| < 0.1 kJ mol-1 and

ACS Paragon Plus Environment

(4) 8

Page 9 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

|𝐸(𝐿 ― 1) ― 𝐸(𝐿 ― 2)| < 0.1 kJ mol-1 where 𝐸(𝐿) is the atom-atom electrostatic energy obtained at rank 𝐿. This is a severe convergence criterion compared to that used by existing force fields and has also been used in other work41,42. It should be noted that the phenomenon of "pseudoconvergence" may emerge39,64 for some atom-atom interactions when the electrostatic energy from a truncated multipole expansion hovers closely around the exact electrostatic energy (6D value). If the criterion in eq 4 is satisfied, then energy calculated at rank 𝐿 is regarded as the "reference" or "exact" multipolar energy. The suitability of this assumption will be tested. All QCT images were generated 65,66 with Rhorix , a plugin for Blender, using data computed with AIMAll. Molecular graphics were created using the program Visual Molecular Dynamics (VMD)67,68 and its internal ray tracer.

Results and Discussion The research ideas and methods in this paper are consistent with those used in the aforementioned previous study of the protein Crambin41. Firstly a central question regarding the particular detail required in the multipole expansion for interacting pairs of atoms will be answered for 2MVY, and an assessment of transferability will be made based on a different RNA structure with PDB code 1ELH. Addressing this central concern - understanding the requirements of an accurate multipolar electrostatic potential for use in force fields for RNA - several questions must be answered. For a given pair of atoms 𝐴 and 𝐵, which rank 𝐿 is needed to meet a given energy error criterion ∆𝐸 (the energy difference between the energy calculated at a rank 𝐿 < 10, and the "exact" energy calculated at 𝐿 = 10)? Also, how does the observed rank 𝐿 change with increasing internuclear distance 𝑅𝐴𝐵? This is a function of the variables 𝑅, 𝐿, ∆𝐸, and the atoms 𝐴 and 𝐵. The transferability of answers to these questions amongst RNA molecules must be assessed. Following this, several ancillary questions that assess the effect of the choices made will be investigated. Several assumptions and approximations have to be made in order to study atomic electrostatic properties in the complete system. In order to test the validity of the results, the effect of each choice on the conclusions drawn must be understood. In particular, it is necessary to evaluate the influence of the level of theory on observations by comparing to results obtained at higher levels of theory. The fragmentation procedure, which permits the study to be completed with available computational resources, has an effect on

ACS Paragon Plus Environment

9

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 48

conclusions that must be assessed by comparison to procedures which produce larger and therefore more representative fragments. Finally, the actual 3D environment experienced by a particular atom in an RNA molecule is not simply determined by sequence but also by the helical arrangement of molecular chains. The opposite is inherently assumed by the fragmentation procedures and this assumption’s validity can be assessed by comparing to a neighborhood generation scheme that takes larger structural information into account, i.e. that assesses how atomic neighbors in the helical structure affect the electrostatic properties of a particular atom. Preliminary Results - 6D vs. L = 10 Multipole Expansion Energies In order to justify the use of atom-atom interaction energies calculated at 𝐿 = 10 as "exact", the energy differences between truly exact (within numerical integration error) 6D energies (see eq 1) and the rank 10 multipolar energies (eq 2) are compared in Table 1 for some representative short-range interactions. A single uracil fragment (Figure 3) was taken as an example and a total of eleven atoms (excluding the cap) were chosen for testing: N1, C2, O3, N4, C5, O6, C7, C8, H9, H10, and H11. There are 37 interactions between the chosen atoms: eighteen 1,4 interactions ("1" and "4" refer to the beginning and end of a sequence of atoms, respectively), twelve 1,5 and seven 1,6. The specific comparison of this energy to the 6D result is shown in Table 1. It can be seen that the rank 10 multipolar energy for 31 of the 37 interactions is within 0.1 kJ mol-1 (∆𝐸) of the 6D result. Of the other six interactions, the ∆𝐸 values for OC, OC and NC are all 0.3 kJ mol-1, for NC and CC, the ∆𝐸 values are 0.5 kJ mol-1 and 0.7 kJ mol-1, respectively, and NC displays the largest energy difference among the 37 interactions at ∆𝐸 = 1.2 kJ mol-1. Table 1 is near here Figure 4 shows the results of the investigation of the 6 worst interactions in uracil according to errors in Table 1. In these plots, the multipolar interaction energy between each pair is plotted versus the expansion rank 𝐿 and shows that each expansion has already converged at 𝐿 = 5. Clearly, 𝐿 = 5 is sufficient to obtain an accurate energy value for all tested interactions, including NC. As an NC interaction is, in fact, the most difficult kind of interaction to converge, the remaining nine types of interaction including NA, CA, OA or HA (where A can be any of the four (C, N, O and H) atoms) should be convergent at rank 𝐿 = 10. In fact, it has been proved in Popelier’s studies41,42 that 𝐿 = 10 yields

ACS Paragon Plus Environment

10

Page 11 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

an accurate energy value and the energy error (with respect to the 6D result) at this rank is small enough to be treated as exact. The small differences remaining between the 6D energy and the converged multipolar energy are ascribed to differences in the integration approaches used to evaluate either the energy (a double integration over 2 atomic basins) or those atomic multipole moments (2 single integrations over an atomic basin). Figure 4 is near here In summary, an energy computed at 𝐿 = 10 is taken as the "exact" (reference) energy in the remainder of this study. The great advantage of using the energy calculated based on a multipole expansion at rank 𝐿 as the exact energy (compared to the a 6D integration value) is the significant saving of time through avoidance of the expensive 6D integration procedure, which is replaced with two 3D integrations followed by evaluation of an interaction tensor (eq 2). These 6D integrations represent a major expense in larger fragments. Addressing the Central Question To address the central question, the convergence behavior of the complete set of atom-atom interactions in 2MVY was studied first. Analysis of the system begins by looking at all interactions at their "exact" (i.e. 𝐿 = 10) value to assess convergence of the multipole expansion. In order to be able to discuss this large number of interactions, that number must be reduced by forming subsets of the interactions, for which an element-wise partitioning is natural. All atom-atom interaction calculations were performed at the HF/6-31G(d,p) level of theory on small molecule fragments (phosphates, pentoses and bases). In the RNA structure, there are 5 elements and thus 15 element-wise types of atom-atom interaction: HH, OH, CH, NH, PH, OO, OC, ON, OP, CC, CN, CP, NN, NP, and PP. To obtain a high-resolution set of internuclear distances for each of these 15 atom-atom interaction energy types, all possible examples of each type of interaction within each capped fragment (i.e. phosphate, pentose, and base) were calculated, along with all interactions between atoms of different fragments. A broad set of internuclear distances were studied for all element-wise atom-atom interaction types, with the exception of those involving phosphorus (PA, where A can be any of H, O, C, N or P). Phosphorus only exists in a limited set of chemical environments in RNA due to its appearance being limited to -PO4 groups and thus its interactions may be described directly. In 2MVY, all PB interactions (where B is any of H, N and P, i.e. not O and C) are at least 1-4 in nature and are convergent with shortest

ACS Paragon Plus Environment

11

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 48

internuclear distances of PH = 2.8 Å (1-4), PN = 5.0 Å (1-6), and PP = 5.9 Å (1-7). The bonded (1-2) PO pairs in phosphate have divergent expansions, with bond distance PO = 1.5 Å, while at the shortest non-bonded PO distance in 2MVY (3.3 Å, 1-4) the PO interaction is convergent. The shortest internuclear PC distance is 2.6 Å (1-3), at which the multipole expansion is divergent; no PC internuclear distance below 3.2 Å is present where the multipole expansion is convergent. To summarize, all PA interactions in RNA are convergent, with the exception of those between bonded (1-2) PO pairs in phosphate and 18 (1-3) PC interactions. In the following more detailed discussion, the focus is placed on interactions between atoms of elements H, O, C and N. Table 2 is near here Table 2 lists the 10 remaining types of atom-atom interaction studied, along with the total number of each present in 2MVY and the corresponding minimum internuclear distance. The minimum internuclear distance is defined as the internuclear distance that delineates the boundary between the regions in which an interaction between a pair of atoms of given elements is expected to have a corresponding convergent multipole expansion or not. In order to successfully replace a point-charge model with a multipolar model the internuclear distance must be below 𝑅𝐶. To be able to conveniently analyze the convergence of all the interaction types, these 10 types can be further grouped into three sets of interactions by separating hydrogen from non-hydrogen atoms: HH interactions, H𝐴 interactions, and 𝐴𝐵 interactions results, where 𝐴 and 𝐵 denote non-hydrogen atoms. Not surprisingly, the HH interaction, whose inter- atomic multipole expansions are convergent at distances greater than 2.1 Å, has the smallest convergence distance amongst the 10 interaction types 𝐴H (where 𝐴 represents N, O and C) interactions rank next behind the HH interactions, convergent beyond 2.4, 2.5, 2.7 Å respectively. The 𝐴𝐵 interactions have higher 𝑅𝐶 values; the minimum internuclear distances are the largest of the three kinds of interactions, starting with OO at 2.8 Å and ending with the CC interaction as hardest to converge with minimum internuclear distance of 3.8 Å. Connection Between Interaction Rank and Internuclear Distance for a Given Energy Error The investigation can now be focused on those interactions which do have convergent multipole expansions. The relationship

ACS Paragon Plus Environment

12

Page 13 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

between the properties investigated in this work can be expressed in eq 5. (5) 𝐿 = 𝑓(𝐴,𝐵,∆𝐸,𝑅𝐴𝐵) where L stands for the interaction rank between atomic multipole moments. 𝐴 and 𝐵 are element labels. ∆𝐸 is the specified "convergence energy error" the maximum acceptable deviation between exact and multipolar interaction energies, for which a strict threshold value of 0.1 kJ mol-1 is set in this work. 𝑅𝐴𝐵 is the distance between the nucleus of atoms 𝐴 and 𝐵. In eq 5, the four properties on the right side are treated as independent variables. The complexity of eq 5 can be reduced by systematically fixing three of the four independent variables (𝐴, 𝐵 and ∆𝐸), and then studying the relationship between the dependent variable 𝐿 on the left-hand side and the remaining independent variable, 𝑅𝐴𝐵. In the following analysis, a pair of interacting atoms are designated as 𝐴 and 𝐵 at a certain distance 𝑅, and the rank 𝐿 needed for the multipole expansion to meet the energy accuracy threshold ∆𝐸 can be determined. NN pairs were chosen as an example to explore this question, i.e. 𝐿 = 𝑓(𝑅𝐴𝐵)|𝐴 = N,𝐵 = N,∆𝐸 = 0.1. Figure 5 is near here Figure 5 shows how the lowest rank 𝐿 at which each NN interaction’s multipole expansion meets the convergence energy criterion changes with increasing internuclear distance 𝑅NN. There are two labelled regions: a divergent region, 𝐷 ∈ {𝑅NN;𝑅NN < 𝑅𝐶} containing interactions with distances too short for the multipole expansion to converge (at 𝐿 = 10), and a convergent region, 𝐶 ∈ {𝑅NN;𝑅NN ≥ 𝑅𝐶} containing distances for which a converged expansion is obtained. For NN pairs, interactions with short internuclear distances (𝑅NN < 3 Å) appear in the region 𝐷. On the other hand, interactions lie in the region 𝐶 when their internuclear distances are greater than 𝑅𝐶 = 3 Å (see Table 2) and remain stable at long distances (up to 12.0 Å). Actually, Figure 5 shows that, in the 𝑅𝐶 region, all of the atom-atom interactions converge at a rank 𝐿 ≤ 9. When the internuclear distance is beyond 5.5 Å the minimum acceptable rank is always as low as 𝐿 = 5 and for 𝑅 > 7.6 Å always as low as 𝐿 = 4. Therefore, rank 𝐿 = 9 is always high enough to recover the correct energy for an NN interaction provided the multipole expansion converges. As has been proven before, convergent interactions all appear in the region 𝐶 when 𝐿 = 10, if 𝐿 > 10 then in this work the multipole expansion is considered to be divergent (see Methods) and the interaction is in the region 𝐷.

ACS Paragon Plus Environment

13

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 48

Connection Between Acceptable Energy Error and Interaction Rank and Distance To further explore the electrostatic picture in 2MVY, the relationship of the five variables in eq 5 can alternatively be expressed as ∆𝐸 = 𝑔(𝐴,𝐵,𝐿,𝑅𝐴𝐵). The relationship ∆𝐸 = 𝑔(𝐿,𝑅OH)|𝐴 = O,𝐵 = H can be analyzed via Figure 6, where OH interactions have been chosen as a representative example. The dependence of the energy difference between the reference energy calculated at rank 𝐿 = 10 and energies computed at 𝐿 = 1, 2, 3, 4 and 5 are shown in Figure 6 for a series of values of 𝑅OH corresponding to individual interactions in 2MVY. Figure 6 is near here From the axis drawn at ∆E = 0.1 kJ mol-1, it is possible to visually determine the value of 𝑅OH at which the desired energy accuracy is reached. For example, for 𝐿 = 3, the plot intersects the horizontal energy line for the final time at 𝑅OH = 4.6 Å, which means when 𝑅OH is larger than 4.6 Å, 𝐿 = 3 is sufficient for a convergent OH interaction energy to reach the exact value with accuracy ∆𝐸 = 0.1 kJ mol-1. On the contrary, if 𝑅OH is less than 4.6 Å, then a higher rank (𝐿 = 4 or 5) is needed to meet the preset ∆𝐸. From Figure 6 it can be seen that energy accuracy at 𝐿 = 2 (monopolemonopole and monopole-dipole interactions) cannot consistently satisfy ∆𝐸 = 0.1 kJ mol-1 until 𝑅OH is greater than 8.9 Å. When the rank 𝐿 is set to 5, almost all the OH interactions are convergent with ∆𝐸 < 0.1 kJ mol-1. Finally, as 𝑅OH increases, the lines spread apart on the 𝑦-axis. At short range a high-rank multipole expansion is necessary, and the use of higher multipole moments improves the error less so at long range. Table 3 is near here Figure 7 is near here The distribution of internuclear distances in 2MVY is of clear interest. As only convergent atom-atom interactions are studied in this work, the multipolar electrostatic energies of interactions with internuclear distances in the region 𝐶 were computed, but not those interactions with internuclear distances in region 𝐷. Fortunately, atom-atom interactions in the divergent region 𝐷 make up a very small proportion of the total. There are in total 51040 atom-atom interactions of all 15 possible element-wise interaction types in 2MVY. Table 3 gives the total number for each of the possible atom-atom interactions, as well as the number of convergent and divergent interactions and their average

ACS Paragon Plus Environment

14

Page 15 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

internuclear distance. Taking the CH interaction as an example, there are in total 10450 CH interactions (which is the largest number amongst all the 15 possible interaction types) in 2MVY and only 2.6% are located in the 𝑅𝐷 region. Figure 7 shows the distribution of CH internuclear distances. Transferability of Results from 2MVY to other RNA Molecules Although results justifying the inclusion of multipolar electrostatics in force fields have been found based on 2MVY, there is no evidence that these conclusions hold for all RNA molecules. To test the transferability of the minimum distances 𝑅𝐶 for atom pair types 𝐴𝐵 which have been calculated based on 2MVY, another RNA chain (PDB code: 1ELH) was taken as a test case and relevant results were computed using the same methodology as applied to 2MVY. 1ELH is a well-studied small RNA (composed of 11 phosphates, 12 pentoses, and 12 bases) with helical structure. As for 2MVY, the convergence of the multipole expansion for atom-atom interactions in 1ELH was considered converged at an interaction rank 𝐿 < 10 by comparison to energies computed at rank 𝐿 = 10. The minimum internuclear distances of all 10 possible atom-atom interactions were computed and compared with the values that were calculated for 2MVY (see Table 2). If the minimum internuclear distances for 1ELH are smaller than or equal to the corresponding distance for 2MVY, then conclusions about RNA drawn from the convergence behavior of 2MVY may be considered to successfully apply to 1ELH. The results are listed in Table 4. When ∆𝐸 = 0.1 kJ mol-1, there are just 9 failing interactions out of a total of 67468. The minimum internuclear distance of the NH interactions in 2MVY is 𝑅 = 2.4 Å. There is one NH interaction that produces absolute energy error slightly larger than 0.1 kJ mol-1, that is, only 0.13 kJ mol-1. The absolute energy error for the failed ON interaction is 0.12 kJ mol-1 at the minimum internuclear distance of 3.0 Å in Table 2. The minimum internuclear distance of an NC interaction in 1ELH is 3.5 Å, which is 0.2 Å longer than the value in Table 2. Table 4 is near here In summary, when rank 𝐿 ≥ 5, the same behavior is observed as for the highest rank, 𝐿 = 10. Conclusions about convergence behavior drawn from 2MVY therefore apply to 1ELH successfully with only very few and minor exceptions and may be expected to apply to RNA molecules in general.

Assessing the Conclusions

Effect

of

the

Employed

ACS Paragon Plus Environment

Methodology

on

15

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 48

The Effect of the Level of Theory Calculations on 2MVY and 1ELH were performed at the HF/631G(d,p) level of theory to take advantage of high calculation speed. To prove the reliability of the HF/6-31G(d,p) level of theory (and therefore the validity of this assumption) for calculating minimum internuclear distances of all 10 atom-atom interaction values (in Table 2), the higher levels of theory B3LYP/aug-cc-pVTZ and MP2/aug-cc-pVTZ were applied to recompute values in the 𝑅𝐶 region. From Table 2, all the minimum internuclear distances of these 10 interaction types were smaller than 4.0 Å at HF, therefore, only interactions where 𝑅 ≤ 4.0 Å were included. The restriction saves computation time at higher levels of theory without having an effect on the observations on convergence made, provided 𝑅𝐶 does not extend to greater than 4.0 Å, as will be observed. Consequently, the numbers of each atom-atom interaction type at B3LYP/aug-cc-pVTZ and MP2/aug-cc-pVTZ levels smaller than at HF/6-31G(d,p) level. The final results are presented in Table 5. Table 5 is near here Table 5 shows the minimum convergent interaction distances of 10 types of interactions calculated at the two higher levels of theory. The 𝑅𝐶 values of four interactions of HH, OH, CH, and OC computed at B3LYP/aug-cc-pVTZ level are 0.1 Å larger than the values obtained at MP2/aug-cc-pVTZ level, whereas the remaining atom-atom interaction types yield the same minimum values. By comparison with Table 2 it can be seen that the 𝑅𝐶 values of all interactions have higher minimum values (from 0.1 Å to 0.5 Å) at B3LYP and MP2 compared to those in Table 2 (HF), with the exception of the CC interaction type which yields the same value at all three levels of theory. In summary, the minimum values calculated at B3LYP/aug-cc-pVTZ are very close to those computed at MP2/aug-cc-pVTZ, and both values are slightly larger than the values obtained at HF/631G(d,p). In this work, the Hartree-Fock method with a small 631G(d,p) basis set was used because of its computational speed and simplicity - all calculations finished in times on the order of minutes for all systems and resulted in the feasibility of the complete study presented. As a basic comparison, for the calculation of phosphate, the computational time with HF is 20 secs, B3LYP/aug-cc-pVTZ required 30 mins, while MP2/aug-cc-pVTZ took up to 5 hrs. In other words, the MP2 method takes 10 times as long as the B3LYP method (with the same basis set) for essentially

ACS Paragon Plus Environment

16

Page 17 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

no change in conclusions. By considering the calculation accuracy and speed together, the B3LYP method is the best choice for performing investigations of electrostatics in RNA, and likely in other biological macromolecules. The Effect of the Fragmentation Procedure As previously discussed, technologically, 2MVY is still too large for computation of atomic multipole moments via the full system wavefunction to be feasible. Because of this, the pilot system 2MVY was fragmented into small moieties of phosphate, pentose, and bases (see Figures 2 and 3). To confirm the reliability of calculations based on these small moieties, "medium molecules" of phosphate-pentose and pentose-base were produced from 2MVY (representative molecular graphs are shown in Figures 8(a) and 8(b)). In total there are 9 phosphate-pentose molecules and 10 pentose-base molecules in 2MVY. In addition, 9 large molecules corresponding to complete nucleotides were also cut out; the molecular graph of an example is shown in Figure 8(c). It was feasible to compute atomic multipole moments in each of the small, medium and large molecular fragments from wavefunctions obtained at the previously used HF/6-31G(d,p), B3LYP/aug-cc-pVTZ and MP2/aug-cc-pVTZ levels of theory. As such, MP2/aug-cc-pVTZ results are reported as this was the highest level used. The minimum internuclear distance of all 10 types of (nonphosphorus) atom-atom interactions in the 𝑅𝐶 region was recalculated based on the medium and large molecules. Table 6 lists these values, with the interaction types in ascending order of minimum values in the 𝑅𝐶 region. By comparing with the minimum values in Table 2, positive results have been achieved in Table 6, as the differences are quite small and the minimum internuclear distances of OH, NH and CC are equal to the corresponding values in Table 2. For HH, the minimum value for large molecules is equal to the values in Table 2, while it is smaller than the values obtained for medium molecules (with a difference of 0.1 Å); for CH, OO, and NC, the values for medium molecules are all identical to the values in Table 2, whereas larger values were obtained based on the large fragments, with differences of 0.1 Å, 0.2 Å, and 0.2 Å, respectively; the minimum values of NN, ON, and OC interactions are the same for both medium molecules and large molecules, which are slightly larger than the values in Table 2. In summary, the observed minimum internuclear distances of all 10 interactions in the 𝑅𝐶 region establish the transferability of results across fragmentations - values calculated based on small molecules can be transferred to medium and large molecules, and it

ACS Paragon Plus Environment

17

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 48

appears safe to assume that the same results will apply to entire RNA molecules. Table 6 is near here Figure 8 is near here To further understand the transferability of atomic properties (i.e. the electron distribution) from a small moiety to a large moiety, changes in interatomic electrostatic energy between small molecules (phosphate, pentose, and base), medium molecules (phosphate- pentose and pentose-base) and large nucleotide molecules (phosphate-pentose-base) were investigated for a single nucleotide of 2MVY. Example molecular geometries of phosphatepentose, pentose-base, and nucleotide are shown in Figure 8 (for comparison, molecular geometries of the smaller phosphate, pentose, and base fragments are shown in Figure 2). Atom-atom interactions which occur within these three kinds of molecules were investigated. The electrostatic energies of the atom-atom interactions in the common fragments were calculated separately, in phosphate, in pentose, in base, in phosphatepentose, in pentose-base, and in nucleotide. Those energy values calculated based on small fragments, however, are somewhat different from the values obtained for the medium fragments and differ from the values calculated from the larger nucleotide as well. For each interaction, the absolute energy deviations between small and medium molecules is calculated ∆𝐸1 = |∆𝐸small ― ∆𝐸medium|, between small and large molecules is denoted ∆𝐸2 = |∆𝐸small ― ∆𝐸large|, and between medium and large molecules is denoted ∆𝐸3 = |∆𝐸mediuml ― ∆𝐸large|, respectively. Figure 9 plots the proportion of interactions between two or three fragments within a given energy deviation ∆𝐸. Table 7 lists the maximum and mean deviations along with the proportion of interactions within an absolute energy deviation of 1.0 and 4.0 kJ mol-1. Table 7 is near here Figure 9 is near here The five line charts of Figure 9 can be classified into two subgroups according to the maximum absolute energy deviation ∆𝐸max. One contains Figure 9(a) and Figure 9(e) for which ∆𝐸max values are in the range 10 to 16 kJ mol-1, the other includes Figure 9(b), Figure 9(c) and Figure 9(d) whose ∆𝐸max values extend from 35 up to 50 kJ mol-1. The key factor that causes this significant difference is whether the connecting bond between pentose and base is cut off in the fragmentation procedure. For example, in Figure

ACS Paragon Plus Environment

18

Page 19 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

9(a) and Figure 9(e), neither the difference between phosphate, pentose and phosphate-pentose nor between pentose-base and nucleotide involves cutting off the bond between pentose and base, however this bond is cut off in the common fragments in Figure 9(b), Figure 9(c) and Figure 9(d). In Figure 9(a), the proportion of interactions within an absolute energy deviation of less than 1.0 kJ mol-1 is 58%. The proportion rises to 85% when the energy threshold is 4.0 kJ mol-1. In other words, compared to phosphatepentose, the atomic multipole moments of phosphate and pentose are 85% accurate if an absolute energy deviation of 4.0 kJ mol-1 is acceptable. The interaction that yields the highest absolute energy deviation of 15.7 kJ mol-1 is OH, which is a 1,4-interaction common to pentose and phosphate-pentose. The mean absolute energy deviation of these 66 atom-atom interactions is 1.9 kJ mol-1. In Figure 9(e), the proportion of interactions is 61% within an absolute energy deviation of 1.0 kJ mol-1. For an absolute energy deviation of 4.0 kJ mol-1 the corresponding proportion is 94%, which is very close to that for Figure 9(a). The highest absolute energy deviation is 10.5 kJ mol-1, which occurs between nitrogen and hydrogen in base of pentose-base. The mean absolute energy deviation of these 196 atom-atom interactions is 1.3 kJ mol-1. The absolute energy deviations in Figure 9(b), Figure 9(c) and Figure 9(d) are a slightly larger than in Figure 9(a) and Figure 9(e). In Figure 9(b), the proportion of interactions is 25% within an absolute energy deviation of 1.0 kJ mol-1. For an absolute energy deviation of 4.0 kJ mol-1 the corresponding proportion is rises to 60%. The highest absolute energy deviation occurs in the base, which is of type NH, with an energy threshold of 54.6 kJ mol-1. The mean absolute energy deviation of these 68 atom-atom interactions is 7.1 kJ mol-1. In Figure 9(c), the proportion of interactions within an absolute energy deviation of less than 1.0 kJ mol-1 is 21%. The proportion is 50% when the energy threshold is 4.0 kJ mol-1. The interaction that yields the highest absolute energy deviation of 47.0 kJ mol-1 is of type NH, which is a 1,5-interaction common to base and nucleotide. The mean absolute energy deviation of these 98 atom-atom interactions is 6.5 kJ mol-1. In Figure 9(d), the proportion of interactions is 12% within an absolute energy deviation of 1.0 kJ mol-1. For an absolute energy deviation of 4.0 kJ mol-1 the corresponding proportion is rises to 62%. The highest absolute energy deviation is 35.7 kJ mol-1, which occurs between oxygen and hydrogen in phosphate-pentose and nucleotide. The mean absolute energy deviation of these 99 atom-atom interactions is 5.0 kJ mol-1.

ACS Paragon Plus Environment

19

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 48

In summary, in the first subgroup of Figure 9(a) and Figure 9(e), electrostatic energies between two atoms in a small/medium molecule differ by less than 2.0 kJ mol-1 on average from the energy between the same atoms in a medium/large molecule containing this fragment. However, in the second subgroup of Figures 9(b)-9(d), the absolute energy deviations are much larger than values in the first subgroup. This means that the torsion angle between the pentose and the base, which is accounting for the conformation of the sugar as well as the back bone of the RNA69–72, is very important for accurately predicting the structure of RNA. Thus, it would be better to not perform computations on the pentose and the base separately; it is better to not parameterize a force field using separate pentose and base fragments. The Influence of the Atomic Neighborhood Although RNA molecules are represented as chains of individual entities, in reality their 3-dimensional structure places atoms which are distant in sequence space close together in real 3D space. When producing fragments of the complete RNA in the manner applied previously in this work (and typically in force field development), there is an implicit assumption that only atoms within the same molecule (or at most within the same nucleotide) can have a strong effect on the electrostatic properties of an atom in question. In opposition to this assumption, the electrostatic properties of a selected atom may exhibit a strong dependence on other atoms much further along the RNA chain. To investigate the extent of this phenomenon it is necessary to explore specifically how neighboring atoms influence the multipole moments of a central atom. In particular it is important to understand the minimum distance at which the effects of a neighboring atom may be safely ignored in terms of the central atom multipole moment values. In addition, the distribution of this distance for an atom of a particular element across differing atomic neighborhoods is of interest. An understanding of this is vital for predicting a selected atom’s multipole moments based on its atomic neighborhood. In order to minimize the computational expense of the investigation and focus on the simplest example of the problem, the application is limited to Hydrogen atoms herein. Three such atoms appearing in 2MVY were chosen. Atom H1 is located on the hydroxyl group in pentose (see Figure 10), atom H2 connects to one carbon atom of pentose, and atom H3 is bound to a carbon atom in base(A).

ACS Paragon Plus Environment

20

Page 21 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

To generate expanding atomic neighborhoods in 3-dimensions, the following approach was assumed. The nuclear coordinates of the atom in question were taken as the center of a sphere of initial radius 𝑟 (chosen to include the closest neighboring atom). The radius of the sphere was then increased in steps of 0.1 Å up to a maximum distance of 12.0 Å (a value chosen for the sake of computational expense). After each radius increase, new atoms were added to the local environment of the atom in question if their nuclear coordinates lay inside the sphere (i.e. if 𝑟𝐴𝐵 < 𝑟). If no new nucleus was found in the expanded sphere then expansion continued, however if a new atom was to be added to the local environment, a structure was produced containing all atoms currently in the environment. A set of structures is ultimately obtained by this procedure, each containing the central atom and a subset of the complete set of its neighbors. A capping procedure is again required as unnatural structures are generated - in this case the structures are saturated to form neutral molecules. Hydrogen causes the smallest perturbation to the system when used to complete an unsaturated system, hence, the influence on the electron distribution of the original atoms in RNA is reduced to the lowest degree when it is used in capping. Where a generated structure would be open-shell, hydrogens were added until the standard valences of all atoms were satisfied (the standard valence used for phosphorus was 5, carbon 4, nitrogen 3, oxygen 2, and that of hydrogen was 1). For example, if a terminal oxygen atom connects to another atom with a single bond [-O], then one hydrogen atom will be added to this oxygen atom generating [OH]. If a terminal oxygen atom is doubly bonded to a carbon atom [-C(=O)], then no hydrogen must be added to this terminal oxygen. In fact, some molecules may end up with groups that do not occur in RNA. For example, the phosphate fragment occurring in the original RNA is [-C-O-P(=O)(-OH)2], which becomes [-C-O-PH2(=O)] in the sphere. Although this situation is not entirely desirable, this appears the optimal way to model the environment of the central atom. Rather than trace the change in the multipole moments of the central atom as the local environment becomes larger, it is illustrative to instead monitor the value of an interaction rank 10 multipolar electrostatic interaction. As more atoms are added to the local environment of the central atoms, it is reasonable to expect that atom’s multipole moments to settle on final values. By choosing a "probe atom" - an atom of the system whose multipole moments remain fixed at their value in the largest sphere - it is possible to compute the electrostatic interaction energy between the probe atom and the central atom and monitor this value as a function of the radius of the sphere (and hence the extent of the local atomic neighborhood). This allows interpretation of the

ACS Paragon Plus Environment

21

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 48

overall effect of the environment in terms of energy accuracy the quantity of highest interest in force field development. Firstly, atom H1 (see Figure 10) was taken as the center of a sphere in order to observe how its electrostatic properties change with increasing sphere radius 𝑟. The initial 𝑟 value was set to 2.0 Å, such that another atom covalently bonded to H1 (1,2 interaction, here just oxygen) was included in the initial sphere. The value of 𝑟 was then increased as described, resulting in the generation of 74 structures. Figure 10 is near here The wavefunctions of this series of hydrogen-capped structures were computed at the HF/6-31G(d,p) level of theory, and the multipole moments of the H1 atom in each structure were integrated. The nearest neighbors of the H1 atom can be expected to have the strongest influence on the multipole moments of H1. The largest structure with 𝑟 = 12.0 Å is taken as the reference structure and is shown on the right of Figure 10. Atom C1 was selected as the probe atom (see above); the internuclear distance between H1 and C1 is 𝑅H1C1 = 8.3 Å allowing the assumption that the multipole expansion energy of this interaction will safely converge. The multipole moments of H1 were calculated in each of 74 structures as discussed, while the multipole moments of C1, were only computed in the reference ( 𝑟 = 12.0 Å) structure and remain invariant to 𝑟. These combine with the varying multipole moments of H1 atom to yield the electrostatic interaction energy between H1 and C1 by eq 2. A sequence of electrostatic interaction energies of H1C1 is computed using the individual multipole moments of the H1 atom occurring in each of the set of generated structures. Figure 11(a) shows the interaction energy of H1C1 as a function of the increasing radius 𝑟 of the sequence of structures. Plotted on the 𝑦-axis is the interaction energy of H1C1 (denoted as 𝐸), which changes according to the varying multipole moments of the H1 atom. The expansion process is considered converged when the interaction energy between the central and probe atom is within 0.4 kJ mol-1 (0.1 kcal mol-1) of its value in the environment generated by the reference sphere. By this standard, the multipole moments of the H1 atom are judged to remain stable over the last 20% of the total number of structures (see Figure 11(a)). In other words, the electrostatic properties appear to converge for structures generated by a sphere with a radius between 10.4 Å and 12.0 Å.

ACS Paragon Plus Environment

22

Page 23 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Table 8 is near here Figure 11 is near here Figure 11(a) shows that the H1C1 interaction energy changes dramatically when 𝑟 increases from about 2.6 Å to 2.9 Å. This means that in this range of 𝑟, new atoms entering the sphere have a great influence on the electron distribution (and thus the multipole moments) of H1; newly introduced interactions in this range are 1,3 in nature. A short, flat region between 2.9 Å and 3.2 Å follows, as no new atom enters the sphere and thus the multipole moments of the H1 atom remain constant. From 𝑟 = 3.2Å onward, the absolute value of the H1C1 interaction energy slowly rises and the energy difference from the energy value of the reference sphere is smaller than 0.4 kJ mol-1. This indicates that the influence of atoms entering the environment when 𝑟 ≥ 3.2 Å, on the multipole moments of H1 is negligible, or the multipole moments of H1 will not be influenced by other atoms where the internuclear distance is greater than 3.2 Å. There are in total 161 atoms in the final sphere. The same procedure was performed for H2 and H3. Figures 11(b) and 11(c) show the energy of the H2C2 and H3O3 interactions as a function of the increasing radius 𝑟 of the sequence of structures. For H2, a plateau is reached very soon after 𝑟 = 2.5 Å - the effect of the local environment on H diminishes rapidly with distance. For H3, after an initial level region, a sharp drop is observed at 4.0 Å. A slow increase to a plateau value is observed for 𝑟 > 4.0 Å interrupted by a significant spike at 𝑟 = 7.0 Å. This spike (and its sudden correction) can be ascribed to the entrance of two atoms at similar distances with near opposite influence on the electrostatics of H3. The minimum radius values for H1, H2 and H3 are listed in Table 8 and despite a small sample clearly vary significantly for atoms of this element. In summary, the results above show that hydrogen atoms located at different positions in RNA lead to observation of different minimum radius values. Therefore, atoms (even of the same element) in different locations possess significantly different biochemical properties. Hence, in the development of force fields for RNA molecular simulations, it is necessary to improve two points: (1) atoms in RNA should be divided into different types to study their properties accurately, and (2) a significant environment not based only on sequence should be considered in order to obtain precise atomic multipole moments, as well as their dependent atom-atom electrostatic interaction energies.

ACS Paragon Plus Environment

23

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 48

Conclusions In this work, high-rank topological atomic multipole moments were employed to calculate atom-atom electrostatic interaction energies in an RNA molecule where atom-centered point charges are more commonly employed. The convergence behavior of 10 possible types of atom-atom interactions in 2MVY was studied. The multipolar electrostatic energy calculated at 𝐿 = 10 was taken as a reference energy, which is computationally faster than calculating exact 6D interaction energies. Although the convergence properties were initially computed only based on 2MVY, which is a somewhat arbitrary choice, the values were tested on another RNA molecule (1ELH) and the predicted results were satisfactory. The observed minimum internuclear distances 𝑅𝐶 for all 10 types of interactions can be safely transferred from small to medium and large fragments. The electrostatic energies of atom-atom interactions in a small fragment differ by less than 2.0 kJ mol-1 on average from the energies of the same interactions in a medium fragment containing the small fragment, provided the fragmentation does not cut off the connecting bond between pentose and base, or the energy difference is even up to 7.1 kJ mol-1. The minimum internuclear distances calculated at the B3LYP/aug-cc-pVTZ and MP2/aug-cc-pVTZ levels of theory are mutually slightly different when compared to the corresponding values computed at HF/6-31G(d,p) level, indicating that the conclusions drawn at the lower level can be extrapolated to higher levels of theory. The influence of neighboring atoms on the electron distribution of three hydrogen atoms was investigated by computing the minimum radius of a sphere of atomic environment at which the influence of additional atoms on the multipole moments of the center atom can be ignored. Three hydrogen atoms located at different positions were studied using a consistent methodology to see if the same element located at different positions in RNA resulted in the same minimum radius. The minimum radius values are large for the three hydrogen atoms. Thus, both nearby neighbors and distant atoms affect the multipole moments of the center atom, something not considered in atom typing schemes for force fields, which tend to use very local information about the atomic environment. The environment of the center atom must therefore be large enough to obtain accurate atomic multipole moments as well as their dependent atom-atom electrostatic interaction energies. It is therefore necessary to divide atoms in RNA into more finely detailed types than are usually assumed to study their properties

ACS Paragon Plus Environment

24

Page 25 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

accurately, and thus improve the computational accuracy of force fields for RNA. Funding Sources This work was supported by the National Natural Science Foundation of China (Grant No. 21503101), the National Science Foundation of Gansu province, China (Grant No. 1506RJZA223), the Fundamental Research Funds for the Central Universities (Grant No. lzujbky-2017-189), and the Project-sponsored by SRF for ROCS, SEM (Grant No. SEM[2015]311). Acknowledgment The authors are grateful to Prof. Paul L. A. Popelier for kindly providing the program NYX.

References (1) Doudna, J. A.; Cech, T. R. The Chemical Repertoire of Natural Ribozymes. Nature 2002, 418, 222. (2) Nissen, P.; Hansen, J.; Ban, N.; Moore, P. B.; Steitz, T. A. The Structural Basis of Ribosome Activity in Peptide Bond Synthesis. Science (80-. ). 2000, 289 (5481), 920 LP-930. (3) Scott, W. G. Ribozymes. Curr. Opin. Struct. Biol. 2007, 17 (3), 280–286. (4) Tucker, B. J.; Breaker, R. R. Riboswitches as Versatile Gene Control Elements. Curr. Opin. Struct. Biol. 2005, 15 (3), 342–348. (5) Walter, P.; Blobel, G. Signal Recognition Particle Contains a 7S RNA Essential for Protein Translocation across the Endoplasmic Reticulum. Nature 1982, 299, 691. (6) Bachellerie, J.-P.; Cavaillé, J.; Hüttenhofer, A. The Expanding SnoRNA World. Biochimie 2002, 84 (8), 775–790. (7) Wahl, M. C.; Will, C. L.; Lührmann, R. The Spliceosome: Design Principles of a Dynamic RNP Machine. Cell 2009, 136 (4), 701–718. (8) Moore, P. B. 1 - A Spectroscopist’s View of RNA Conformation: RNA Structural Motifs; Söll, D., Nishimura, S., Moore, P. B. T.-R. N. A., Eds.; Pergamon: Oxford, 2001; pp 1–19. (9) Bailor, M. H.; Sun, X.; Al-Hashimi, H. M. Topology Links RNA Secondary Structure with Global Conformation, Dynamics, and Adaptation. Science (80-. ). 2010, 327 (5962), 202 LP-206. (10) Levitt, M.; Warshel, A. Computer Simulation of Protein Folding. Nature 1975, 253, 694.

ACS Paragon Plus Environment

25

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 48

(11)

Warshel, A.; Levitt, M. Theoretical Studies of Enzymic Reactions: Dielectric, Electrostatic and Steric Stabilization of the Carbonium Ion in the Reaction of Lysozyme. J. Mol. Biol. 1976, 103 (2), 227–249. (12) McCammon, J. A.; Gelin, B. R.; Karplus, M. Dynamics of Folded Proteins. Nature 1977, 267, 585. (13) Lifson, S.; Warshel, A. Consistent Force Field for Calculations of Conformations, Vibrational Spectra, and Enthalpies of Cycloalkane and n-Alkane Molecules. J. Chem. Phys. 1968, 49 (11), 5116. (14) McCammon, J. A. Models for Protein Dynamics. CECAMUniversite Paris IX, Orsay, Fr. 1976. (15) Levitt, M. Computer Simulation of DNA Double-Helix Dynamics. Cold Spring Harb. Symp. Quant. Biol. 1983, 47, 251–262. (16) Tidor, B.; Irikura, K. K.; Brooks, B. R.; Karplus, M. Dynamics of DNA Oligomers. J. Biomol. Struct. Dyn. 1983, 1 (1), 231–252. (17) Case, D. A. AMBER 2018, . University of California, San Francisco. 2018. (18) Brooks, B. R.; Brooks, C. L.; Mackerell, A. D.; Nilsson, L.; Petrella, R. J.; Roux, B.; Won, Y.; Archontis, G.; Bartels, C.; Boresch, S.; Caflisch, A.; Caves, L.; Cui, Q.; Dinner, A. R.; Feig, M.; Fischer, S.; Gao, J.; Hodoscek, M.; Im, W.; Kuczera, K.; Lazaridis, T.; Ma, J.; Ovchinnikov, V.; Paci, E.; Pastor, R. W.; Post, C. B.; Pu, J. Z.; Schaefer, M.; Tidor, B.; Venable, R. M.; Woodcock, L. H.; Wu, X.; Yang, W.; York, D. M.; Karplus, M. CHARMM: The Biomolecular Simulation Program. J. Comput. Chem. 2009, 30 (10), 1545–1614. (19) Pérez, A.; Marchán, I.; Svozil, D.; Sponer, J.; Cheatham III, T. E.; Laughton, C. A.; Orozco, M. Refinement of the AMBER Force Field for Nucleic Acids: Improving the Description of α/γ Conformers. Biophys. J. 2007, 92 (11), 3817–3829. (20) Mlýnský, V.; Kührová, P.; Zgarbová, M.; Jurečka, P.; Walter, N. G.; Otyepka, M.; Šponer, J.; Banáš, P. Reactive Conformation of the Active Site in the Hairpin Ribozyme Achieved by Molecular Dynamics Simulations with ε/ζ Force Field Reparametrizations. J. Phys. Chem. B 2015, 119 (11), 4220–4229. (21) Zgarbová, M.; Otyepka, M.; Šponer, J.; Mládek, A.; Banáš, P.; Cheatham, T. E.; Jurečka, P. Refinement of the Cornell et al. Nucleic Acids Force Field Based on Reference Quantum Chemical Calculations of Glycosidic Torsion Profiles. J. Chem. Theory Comput. 2011, 7 (9), 2886–2902. (22) Shaik, M. S.; Devereux1, M.; Popelier, P. L. A. The

ACS Paragon Plus Environment

26

Page 27 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Importance of Multipole Moments When Describing Water and Hydrated Amino Acid Cluster Geometry. Mol. Phys. 2008, 106 (12–13), 1495–1510. (23) Cardamone, S.; Hughes, T. J.; Popelier, P. L. A. Multipolar Electrostatics. Phys. Chem. Chem. Phys. 2014, 16 (22), 10367–10387. (24) Tan, D.; Piana, S.; Dirks, R. M.; Shaw, D. E. RNA Force Field with Accuracy Comparable to State-of-the-Art Protein Force Fields. Proc. Natl. Acad. Sci. 2018. (25) Zhang, C.; Lu, C.; Jing, Z.; Wu, C.; Piquemal, J.-P.; Ponder, J. W.; Ren, P. AMOEBA Polarizable Atomic Multipole Force Field for Nucleic Acids. J. Chem. Theory Comput. 2018, 14 (4), 2084–2108. (26) Gresh, N.; Cisneros, G. A.; Darden, T. A.; Piquemal, J.-P. Anisotropic, Polarizable Molecular Mechanics Studies of Inter- and Intramolecular Interactions and Ligand−Macromolecule Complexes. A Bottom-Up Strategy. J. Chem. Theory Comput. 2007, 3 (6), 1960–1986. (27) Price, S. L.; Hamad, S.; Torrisi, A.; Karamertzanis, P. G.; Leslie, M.; Catlow, C. R. A. Applications Of Dl_poly And Dl_multi To Organic Molecular Crystals. Mol. Simul. 2006, 32 (12–13), 985–997. (28) Leslie, M. DL_MULTI—A Molecular Dynamics Program to Use Distributed Multipole Electrostatic Models to Simulate the Dynamics of Organic Crystals. Mol. Phys. 2008, 106 (12– 13), 1567–1578. (29) Sagui, C.; Pedersen, L. G.; Darden, T. A. Towards an Accurate Representation of Electrostatics in Classical Force Fields: Efficient Implementation of Multipolar Interactions in Biomolecular Simulations. J. Chem. Phys. 2003, 120 (1), 73–87. (30) Pengyu, R.; W., P. J. Consistent Treatment of Interand Intramolecular Polarization in Molecular Mechanics Calculations. J. Comput. Chem. 2002, 23 (16), 1497–1506. (31) Day, G. M.; Motherwell, W. D. S.; Jones, W. Beyond the Isotropic Atom Model in Crystal Structure Prediction of Rigid Molecules:  Atomic Multipoles versus Point Charges. Cryst. Growth Des. 2005, 5 (3), 1023–1033. (32) Cole, D. J.; Vilseck, J. Z.; Tirado-Rives, J.; Payne, M. C.; Jorgensen, W. L. Biomolecular Force Field Parameterization via Atoms-in-Molecule Electron Density Partitioning. J. Chem. Theory Comput. 2016, 12 (5), 2312– 2323. (33) Lee, L. P.; Cole, D. J.; Skylaris, C.-K.; Jorgensen, W. L.; Payne, M. C. Polarized Protein-Specific Charges from Atoms-in-Molecule Electron Density Partitioning. J. Chem. Theory Comput. 2013, 9 (7), 2981–2991.

ACS Paragon Plus Environment

27

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 28 of 48

(34)

Liem, S. Y.; Popelier, P. L. A. Properties and 3D Structure of Liquid Water:  A Perspective from a High-Rank Multipolar Electrostatic Potential. J. Chem. Theory Comput. 2008, 4 (2), 353–365. (35) Y., L. S.; A., P. P. L.; M., L. Simulation of Liquid Water Using a High-Rank Quantum Topological Electrostatic Potential. Int. J. Quantum Chem. 2004, 99 (5), 685–694. (36) Liem, S. Y.; Popelier, P. L. A. The Hydration of Serine: Multipole Moments versus Point Charges. Phys. Chem. Chem. Phys. 2014, 16 (9), 4122–4134. (37) Joubert, L.; Popelier, P. L. A. Improved Convergence of the ‘Atoms in Molecules’ Multipole Expansion of Electrostatic Interaction. Mol. Phys. 2002, 100 (21), 3357– 3365. (38) Liem, S. Y.; Popelier, P. L. A. High-Rank Quantum Topological Electrostatic Potential: Molecular Dynamics Simulation of Liquid Hydrogen Fluoride. J. Chem. Phys. 2003, 119 (8), 4560–4566. (39) Popelier, P. L. A.; Rafat, M. The Electrostatic Potential Generated by Topological Atoms: A Continuous Multipole Method Leading to Larger Convergence Regions. Chem. Phys. Lett. 2003, 376 (1), 148–153. (40) Rafat, M.; Popelier, P. L. A. The Electrostatic Potential Generated by Topological Atoms. II. Inverse Multipole Moments. J. Chem. Phys. 2005, 123 (20), 204103. (41) Yongna, Y.; L., M. M. J.; A., P. P. L. Multipolar Electrostatics for Proteins: Atom–Atom Electrostatic Energies in Crambin. J. Comput. Chem. 2013, 35 (5), 343–359. (42) Michel, R.; A., P. P. L. Long Range Behavior of HighRank Topological Multipole Moments. J. Comput. Chem. 2007, 28 (4), 832–838. (43) Fletcher, T. L.; Popelier, P. L. A. Polarizable Multipolar Electrostatics for Cholesterol. Chem. Phys. Lett. 2016, 659, 10–15. (44) Thacker, J. C. R.; Wilson, A. L.; Hughes, Z. E.; Burn, M. J.; Maxwell, P. I.; Popelier, P. L. A. Towards the Simulation of Biomolecules: Optimisation of Peptide-Capped Glycine Using FFLUX. Mol. Simul. 2018, 44 (11), 881–890. (45) Jakobsen, S.; Jensen, F. Systematic Improvement of Potential-Derived Atomic Multipoles and Redundancy of the Electrostatic Parameter Space. J. Chem. Theory Comput. 2014, 10 (12), 5493–5504. (46) Silvia, C.; Alessandro, E.; Jacopo, B.; Roberto, O. Electron Density Analysis of Large (Molecular and Periodic) Systems: A Parallel Implementation. J. Comput. Chem. 2015, 36 (26), 1940–1946. (47) Kuster, D. J.; Liu, C.; Fang, Z.; Ponder, J. W.;

ACS Paragon Plus Environment

28

Page 29 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Marshall, G. R. High-Resolution Crystal Structures of Protein Helices Reconciled with Three-Centered Hydrogen Bonds and Multipole Electrostatics. PLoS One 2015, 10 (4), e0123146. (48) Lošdorfer Božič, A.; Podgornik, R. PH Dependence of Charge Multipole Moments in Proteins. Biophys. J. 2017, 113 (7), 1454–1465. (49) Mills, M. J. L.; Popelier, P. L. A. Electrostatic Forces: Formulas for the First Derivatives of a Polarizable, Anisotropic Electrostatic Potential Energy Function Based on Machine Learning. J. Chem. Theory Comput. 2014, 10 (9), 3840–3856. (50) Zielinski, F.; Maxwell, P. I.; Fletcher, T. L.; Davie, S. J.; Di Pasquale, N.; Cardamone, S.; Mills, M. J. L.; Popelier, P. L. A. Geometry Optimization with Machine Trained Topological Atoms. Sci. Rep. 2017, 7 (1), 12817. (51) Popelier, P. L. A.; Brémond, É. A. G. Geometrically Faithful Homeomorphisms between the Electron Density and the Bare Nuclear Potential. Int. J. Quantum Chem. 2009, 109 (11), 2542–2553. (52) Popelier, P. L. A.; Aicken, F. M. Atomic Properties of Amino Acids: Computed Atom Types as a Guide for Future Force-Field Design. ChemPhysChem 2003, 4 (8), 824–829. (53) Popelier, P. L. A. Atoms in Molecules. An Introduction; Pearson Education: London, 2000. (54) Popelier, P. L. A.; Kosov, D. S. Atom–Atom Partitioning of Intramolecular and Intermolecular Coulomb Energy. J. Chem. Phys. 2001, 114 (15), 6539–6547. (55) Misquitta, A. J.; Stone, A. J.; Fazeli, F. Distributed Multipoles from a Robust Basis-Space Implementation of the Iterated Stockholder Atoms Procedure. J. Chem. Theory Comput. 2014, 10 (12), 5405–5418. (56) Roost, C.; Lynch, S. R.; Batista, P. J.; Qu, K.; Chang, H. Y.; Kool, E. T. Structure and Thermodynamics of N6-Methyladenosine in RNA: A Spring-Loaded Base Modification. J. Am. Chem. Soc. 2015, 137 (5), 2107–2115. (57) Aduri, R.; Psciuk, B. T.; Saro, P.; Taniga, H.; Schlegel, H. B.; SantaLucia, J. AMBER Force Field Parameters for the Naturally Occurring Modified Nucleosides in RNA. J. Chem. Theory Comput. 2007, 3 (4), 1464–1475. (58) Dennington, R.; Keith, T. A.; Millam, J. M. GaussView Version 5.0.8. 2016. (59) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Scalmani, G.; Barone, V.; Mennucci, B.; Petersson, G. A.; Nakatsuji, H.; Caricato, M.; Li, X.; Hratchian, H. P.; Izmaylov, A. F.; Bloino, J.; Zheng, G.; Sonnenberg, J. L.; Hada, M.; Ehara,

ACS Paragon Plus Environment

29

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 30 of 48

M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Vreven, T.; Montgomery, J. A.; Peralta, J. E.; Ogliaro, F.; Bearpark, M.; Heyd, J. J.; Brothers, E.; Kudin, K. N.; Staroverov, V. N.; Kobayashi, R.; Normand, J.; Raghavachari, K.; Rendell, A.; Burant, J. C.; Iyengar, S. S.; Tomasi, J.; Cossi, M.; Rega, N.; Millam, J. M.; Klene, M.; Knox, J. E.; Cross, J. B.; Bakken, V.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Martin, R. L.; Morokuma, K.; Zakrzewski, V. G.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Dapprich, S.; Daniels, A. D.; Farkas; Foresman, J. B.; Ortiz, J. V; Cioslowski, J.; Fox, D. J. Gaussian 09, Revision E.01. Gaussian 09, Revision E.01, Gaussian, Inc., Wallingford CT. Wallingford CT 2015. (60) Mills, M. J. L.; Popelier, P. L. A. Intramolecular Polarisable Multipolar Electrostatics from the Machine Learning Method Kriging. Comput. Theor. Chem. 2011, 975 (1), 42–51. (61) Keith, T. AIMAll 15.09.27. TK Gristmill Software 2011. (62) Rafat, M. Towards a Force Field Based on Quantum Chemical Topology, University of Manchester, 2005. (63) Rafat, M.; Popelier, P. L. A. A Convergent Multipole Expansion for 1,3 and 1,4 Coulomb Interactions. J. Chem. Phys. 2006, 124 (14), 144102. (64) Hättig, C. Recurrence Relations for the Direct Calculation of Spherical Multipole Interaction Tensors and Coulomb-Type Interaction Energies. Chem. Phys. Lett. 1996, 260 (3), 341–351. (65) Mills, M. J. L.; Sale, K. L.; Simmons, B. A.; Popelier, P. L. A. Rhorix: An Interface between Quantum Chemical Topology and the 3D Graphics Program Blender. J. Comput. Chem. 2017, 38 (29), 2538–2552. (66) Mills, M. J. L. Rhorix. 2017. (67) Humphrey, W.; Dalke, A.; Schulten, K. VMD: Visual Molecular Dynamics. J. Mol. Graph. 1996, 14 (1), 33–38. (68) Stone, J. E. An Efficient Library for Parallel Ray Tracing and Animation, 1998. (69) Riccardi, L.; Nguyen, P. H.; Stock, G. Free-Energy Landscape of RNA Hairpins Constructed via Dihedral Angle Principal Component Analysis. J. Phys. Chem. B 2009, 113 (52), 16660–16668. (70) Yildirim, I.; Stern, H. A.; Tubbs, J. D.; Kennedy, S. D.; Turner, D. H. Benchmarking AMBER Force Fields for RNA: Comparisons to NMR Spectra for Single-Stranded r(GACC) Are Improved by Revised χ Torsions. J. Phys. Chem. B 2011, 115 (29), 9261–9270.

ACS Paragon Plus Environment

30

Page 31 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(71)

Yildirim, I.; Stern, H. A.; Kennedy, S. D.; Tubbs, J. D.; Turner, D. H. Reparameterization of RNA χ Torsion Parameters for the AMBER Force Field and Comparison to NMR Spectra for Cytidine and Uridine. J. Chem. Theory Comput. 2010, 6 (5), 1520–1531. (72) Aytenfisu, A. H.; Spasic, A.; Grossfield, A.; Stern, H. A.; Mathews, D. H. Revised RNA Dihedral Parameters for the Amber Force Field Improve RNA Molecular Dynamics. J. Chem. Theory Comput. 2017, 13 (2), 900–915.

ACS Paragon Plus Environment

31

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 32 of 48

Figures

Figure 1. Chain B of the molecular structure of 2MVY (as determined by NMR spectroscopy). There are 9 phosphate, 10 pentose and 10 base moieties in total (9 nucleotides, 320 atoms). The longest internuclear distance corresponds to a CO interaction at a distance of 33.2 Å, which is marked with a dashed line. Examples of each different moiety and side- chain are labelled; the phosphatepentose-guanine group (far left) comprises one of the 9 nucleotides of 2MVY.

ACS Paragon Plus Environment

32

Page 33 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 2. The interatomic surfaces (IAS) of selected atoms in the smallest molecular fragments cut from 2MVY: (a) phosphate, atom P; (b) pentose, atoms O and C; (c) base, atoms N and H. The bond critical points and (non-bonded) AILs are shown in green and the ring critical points are in pink. Cage critical points (in b) are shown in orange. Grey rectangles mark the capping groups added to each fragment.

ACS Paragon Plus Environment

33

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 34 of 48

Figure 3. Structures and names of different-sized molecular fragments and their common moieties. The red rectangle encloses the phosphate moiety - the common fragment of (a), (d), and (f). The green polygon encloses a sidechain (U) moiety - the common fragment among (c), (e), and (f). The blue polygon encloses the pentose moiety - the common fragment of (b), (d), (e), and (f).

ACS Paragon Plus Environment

34

Page 35 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 4. The electrostatic energy between six pairs of atoms in uracil (structure inset) calculated based on QCT-partitioned atomic multipole moments (at rank 𝐿). The dashed line represents the exact 6D energy of each interaction.

ACS Paragon Plus Environment

35

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 36 of 48

Figure 5. The relationship between the lowest rank (𝐿) at which a particular NN interaction meets the convergence energy error criterion and internuclear distance (𝑅NN, Å) for NN interactions in 2MVY. All interactions in the region labelled 𝐷 are divergent, while the region labelled 𝐶 contains convergent interactions with a rank 𝐿 always less than or equal to 9. 𝑅𝐶 is the shortest distance beyond which all interactions converge at some rank 𝐿 < 10. Shorterrange interactions require a more detailed electrostatic description.

ACS Paragon Plus Environment

36

Page 37 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 6. The energy difference (∆𝐸, kJ mol-1) between the reference energy calculated at rank 𝐿 = 10 and the values calculated at ranks 𝐿 < 1, 2, 3, 4 and 5 for OH interactions in 2MVY. The energy difference decreases with increasing rank 𝐿 as well as with increasing internuclear distance 𝑅OH (Å). Only converged OH interactions are shown, on the right of the vertical line at 2.6 Å (see Table 2).

ACS Paragon Plus Environment

37

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 38 of 48

Figure 7. The distribution of CH internuclear distances 𝑅 (Å) in 2MVY. The total number of interactions is 10450.

ACS Paragon Plus Environment

38

Page 39 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 8. Example molecular graphs of (a) phosphate-pentose, (b) pentose-base, and (c) nucleotide fragments as cut out from 2MVY. Green spheres represent bond critical points, ring critical points and ring lines are shown in pink and tan spheres are cage critical points. AILs between atoms normally considered "bonded" are depicted with sufficient thickness to eclipse their BCPs to enhance clarity.

ACS Paragon Plus Environment

39

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 40 of 48

Figure 9. Proportion of interactions (%) within a given energy deviation (kJ mol-1) occurring in the common fragments in Figure 3. (a) The common fragments between phosphate, pentose and phosphatepentose. (b) the common fragments between pentose, base and pentose-base. (c) the common fragments between phosphate, pentose, base and nucleotide. (d) the common fragments between phosphatepentose and nucleotide. (e) the common fragments between pentosebase and nucleotide.

ACS Paragon Plus Environment

40

Page 41 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 10. The largest sphere (𝑟 = 12.0 Å) centered at H1 (H2 and H3, which were also separately selected as center atoms, are labelled in the same figure). Atoms C1, C2 and O3 are selected as "probe atoms" for H1, H2 and H3, respectively. Their corresponding internuclear distances are 𝑅H1…C1 = 8.3 Å, 𝑅H2…C2 = 8.7 Å and 𝑅H3…O3 = 6.8 Å.

ACS Paragon Plus Environment

41

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 42 of 48

Figure 11. The electrostatic energy (kJ mol-1) between H at the center of the sphere and one possible probe atom versus the radius 𝑟 (Å) of the sphere: (a) between H of a hydroxyl in pentose and one C atom, HC, internuclear distance 𝑅 = 8.3 Å; (b) between H connected to a carbon in pentose and one C atom, HC, 𝑅 = 8.7 Å; (c) between H connected to one nitrogen in a base and one O atom, HO, 𝑅 = 6.8 Å.

ACS Paragon Plus Environment

42

Page 43 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Tables Table 1. Comparison of exact (6D) and multipolar energy (kJ mol-1) for thirty-seven representative short-range interactions in uracil (for diagram see Figure 4). Atom-atom interactions H11H12 H10H1 H10H1 H9H12 H9H11 C8H10 C8H9 C7H10 C7H9 N6H12 N6H11 N6C8 C5H12 N4H12 N4H11 N4H10 N4H9 N4C8 O3H12 O3H11 O3H10 O3H9 O3C8 O3C7 O3N6 O3C5 C2H12 C2H11 C2H10 C2H9 C2C7 C2N6 N1H11 N1H10

1,𝑛[a]

𝐸(6D)[b]

𝐸 (L = 10)[c]

∆𝐸[d]

1,4 1,6 1,4 1,6 1,5 1,5 1,5 1,4 1,4 1,5 1,4 1,4 1,4 1,5 1,4 1,4 1,4 1,4 1,5 1,6 1,6 1,6 1,4 1,5 1,5 1,4 1,4 1,5 1,5 1,5 1,4 1,4 1,4 1,6

0.3 5.2 -6.2 5.1 -3.5 93.4 97.5 28.4 27.4 -18.6 11.9 -342.3 23.6 -24.3 3.6 -301.1 -409.6 -521.9 -18.3 4.5 -172.1 -215.5 -442.3 -70.2 664.4 -837.0 43.6 -6.0 350.0 429.3 182.5 -1384.4 3.6 -234.0

0.3 5.2 -6.2 5.1 -3.5 93.3 97.4 28.5 27.5 -18.6 11.8 -342.4 23.6 -24.3 3.6 -301.1 -409.6 -520.7 -18.3 4.5 -172.1 -215.5 -442.3 -69.9 664.4 -836.7 43.6 -6.0 350.0 429.2 183.2 -1384.1 3.7 -234.0

0.0 0.0 0.0 0.0 0.0 0.1 0.1 0.1 0.1 0.0 0.1 0.1 0.0 0.0 0.0 0.0 0.0 1.2 0.0 0.0 0.0 0.0 0.0 0.3 0.0 0.3 0.0 0.0 0.0 0.1 0.7 0.3 0.1 0.0

ACS Paragon Plus Environment

43

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 44 of 48

N1H9 1,6 -256.7 -256.8 0.1 N1N6 1,5 871.8 871.9 0.1 N1C5 1,4 -1135.5 -1135.0 0.5 [a] “1” and “𝑛” refer to the beginning and end of a sequence of atoms, respectively. [b] The exact atom-atom interaction energy calculated by AIMAll (6D, eq 1). [c] Energy calculated by NYX using multipole moments of QCT atoms at interaction rank 𝐿 = 10 (eq 2). [d] Energy difference between exact (6D) and multipolar energy (kJ mol-1). Table 2. The number (𝑁) of atom-atom interactions of each the 10 possible non-phosphorus-containing element-wise interaction types occurring in 2MVY and the corresponding minimum internuclear distances (𝑅𝐶, Å) delineating the internuclear distance regions in which their interactions converge or diverge. Types are sorted by 𝑅𝐶 in ascending order. Type

𝑁 [a]

𝑅𝐶

HH

39174

2.2

NH

13347

2.4

OH

24010

2.5

CH

39791

2.7

OO

3599

2.8

NN

1087

3.0

ON

4103

3.0

OC

12135

3.2

NC

6706

3.3

CC

9915

3.8

[a] 𝑁 represents the number of atom-atom interactions.

Table 3. Statistical data on all 15 possible atom-atom interaction types. Type

𝑁 [a]

Divergent[b]

Convergent[c]

Average[d]

Percentage[e]

HH NH OH CH

5995 4180 7480 10450

39 50 129 273

5956 4130 7351 10177

15.1 13.9 15.1 14.4

0.7% 1.2% 1.7% 2.6%

ACS Paragon Plus Environment

44

Page 45 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

HP 990 0 990 15.2 0.0% OC 6460 246 6214 14.6 3.8% OO 2278 67 2211 15.3 2.9% ON 2584 26 2558 14.3 1.0% OP 612 36 576 15.0 5.9% CC 4465 312 4153 14.0 7.0% CN 3610 173 3437 13.2 4.8% CP 855 18 837 14.8 2.1% NN 703 31 672 12.2 4.4% NP 342 0 342 14.7 0.0% PP 36 0 36 16.2 0.0% [a] 𝑁 represents the total number of each atom-atom interaction type in 2MVY. [b] denotes the number of interactions in the divergent region 𝑅𝐷. [c] is the number of interactions in the convergent region 𝑅𝐶. [d] means the average internuclear distance (Å) for each interaction type. [e] refers to the number of interactions in the 𝑅𝐷 region over the total number of interactions 𝑁. Table 4. Predictions made for each of the 10 possible interaction types in 1ELH based on the data obtained from 2MVY. Type HH NH OH CH OO NN ON OC NC CC

𝑁[a] 8481 5577 10985 14470 3495 867 3620 9332 4661 5980

𝑁fail[b] 0 3 2 2 0 0 1 0 1 0

[a] 𝑁 is the number of total atom-atom convergent interactions occur in the 𝑅𝐶 region in 1ELH. [b] 𝑁fail is the number of interactions that failed in the prediction (the interactions for which the absolute energy error is greater than 0.1 kJ mol-1 according to eq 4).

Table 5. The minimum internuclear distance (𝑅𝐶, Å) calculated at B3LYP/aug-cc-pVTZ and MP2/aug-cc-pVTZ levels of theory. 𝑁 is the number of atom-atom interactions of a given type.

ACS Paragon Plus Environment

45

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

2395

B3LYP/aug-ccpVTZ 2.3

MP2/aug-ccpVTZ 2.2

NH

709

2.8

2.8

OH

1446

2.7

2.6

CH

2626

2.8

2.7

OO

171

3.1

3.1

NN

99

3.1

3.1

ON

88

3.1

3.1

OC

629

3.3

3.2

NC

489

3.8

3.8

CC

649

3.8

3.8

type

𝑁

HH

Page 46 of 48

Table 6. The minimum internuclear distance (𝑅𝐶, Å) calculated for different sized (middle and large) molecular fragments. All calculations are performed at MP2/aug-cc-pVTZ level. Type

Medium[b]

Large[c]

HH

2.3

2.2

OH

2.6

2.6

CH

2.7

2.6

NH

2.8

2.8

OO

3.1

2.9

NN

3.3

3.3

ON

3.3

3.3

OC

3.3

3.3

NC

3.8

3.6

CC

3.8

3.8

[a] Phosphate, pentose and base fragments cut out from 2MVY. [b] Phosphate-pentose and pentose-base fragments cut out from 2MVY. [c] Nucleotide fragments cut out from 2MVY.

ACS Paragon Plus Environment

46

Page 47 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Table 7. Maximum and mean deviations along with the proportion of interactions within an absolute energy deviation of 1.0 and 4.0 kJ mol-1. Figures[a] Proportion(%)[b] Proportion(%)[c]

Max (kJ mol-1)

Mean(kJ mol-1)

[d]

[e]

a 58 85 15.7 1.9 c 21 50 47.0 6.5 d 12 62 35.7 5.0 b 25 60 54.6 7.1 e 61 94 10.5 1.3 [a] The corresponding label of Figure 9 [b] proportion of interactions within an absolute energy deviation of 1.0 kJ mol-1 [c] proportion of interactions within an absolute energy deviation of 4.0 kJ mol-1 [d] the highest absolute energy deviation [e] the mean absolute energy deviation. Table 8. The minimum radius values (Å) for H at which the influence on its atomic multipole moments can be ignored when new atoms enter the sphere. Atom

𝑁[a]

H1

161

H2

141

H3

224

Location[b] on the hydroxyl in pentose connected to a carbon atom in pentose connected to a carbon atom in Base (A)

𝑟

∆𝐸[c] (kJ mol-1)

2.9

0.4

7.4

0.4

8.9

0.4

[a] 𝑁 is the total number of atoms in the largest sphere. [b] The location of the H atom in 2MVY (see Figure 10 and the details in the main text). [c] ∆𝐸 is the

acceptable energy discrepancy from the reference sphere energy.

ACS Paragon Plus Environment

47

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 48 of 48

For Table of Contents Use Only Assessing force field accuracy via a multipolar description of atomic electrostatic interactions in RNA Yongna Yuan,* Zhuangzhuang Zhang, Matthew J. L. Mills, Rongjing Hu*, and Ruisheng Zhang 1.375 inches high and 3.5 inches wide

ACS Paragon Plus Environment

48