Investigation of Nascent Base Pair and Polymerase Behavior in the

Dec 27, 2017 - Optimizing DNA polymerases for a broad range of tasks requires an understanding of the factors influencing polymerase fidelity, but man...
1 downloads 12 Views 1MB Size
Subscriber access provided by UNIV OF TASMANIA

Article

Investigation of Nascent Base Pair and Polymerase Behavior in the Presence of Mismatches in DNA Polymerase I Using Molecular Dynamics Andrew Vincent Yeager, Kathryn Humphries, Ellen Farmer, Gene Cline, and Billy Miller J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.7b00516 • Publication Date (Web): 27 Dec 2017 Downloaded from http://pubs.acs.org on December 28, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 36 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Investigation of Nascent Base Pair and Polymerase Behavior in the Presence of Mismatches in DNA Polymerase I Using Molecular Dynamics Andrew Yeager, Kathryn Humphries, Ellen Farmer, Gene Cline, Bill R. Miller III* Department of Chemistry, Truman State University, 100 E. Normal Ave. Kirksville, MO 63501 Corresponding author e-mail: [email protected] Abstract Optimizing DNA polymerases for a broad range of tasks requires an understanding of the factors influencing polymerase fidelity, but many details of polymerase behavior remain unknown, especially in the presence of mismatched nascent base pairs. Using molecular dynamics, the large fragment of Bacillus stearothermophilus DNA polymerase I is simulated in the presence of all sixteen possible standard nucleoside triphosphate-template (dNTP-dN) pairs, including four Watson-Crick pairs and twelve mismatches. The pre-catalytic steps of nucleotide addition from nucleotide insertion to immediately preceding catalysis are explored using three starting structures representing different stages of nucleotide addition. From these simulations, interactions between dNTPs and the DNA-protein complex formed by the polymerase are elucidated. Patterns of large-scale conformational shifts, classification of nucleotide pairs based on composition, and investigation of the roles of residues interacting with dNTPs are completed on 50+ µs of simulation. The role of molecular dynamics in studies of polymerase behavior is discussed.

1 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Introduction DNA polymerases play a central role in the continued survival of organisms by replicating, maintaining, and repairing the genome. Many organisms produce a number of polymerases to perform these tasks.1-8 Natural systems often retain a balance between highfidelity behaviors that promote sequence preservation in individuals and low-fidelity behaviors that maintain variation in populations. Such a balance may be unnecessary in laboratories, however, and researchers may have a preference for high- or low-fidelity behaviors. In applications such as polymerase chain reaction (PCR), high-fidelity polymerases capable of preserving the original DNA sequence are often preferred, while low-fidelity polymerases that frequently introduce mutations may be preferred when attempting to maximize variation in populations. Specially designed polymerases with tailored fidelity represent an opportunity to improve experimental results through the precise control of polymerase behavior, but a deeper understanding of the mechanisms behind pre-catalytic nucleotide discrimination is necessary to design polymerases in a targeted manner. DNA polymerase I is often used in studies of DNA polymerase behavior and activity,1-17 Capable of 3’→5’ exonuclease, 5’→3’ exonuclease, and polymerase activity, DNA polymerase I is commonly crystallized with an N-terminal deletion of the 5’→3’ exonuclease domain, as the remaining structure still allows for polymerase activity.12,14,16-17 In particular, the DNA polymerase I large fragment of Bacillus stearothermophilus (BF) has been frequently crystallized.12,14-16 Analogous to the Klenow fragment in Escherichia coli, BF resembles a human hand with thumb, palm, and fingers subdomains, as shown in Figure 1A. The polymerase active site is contained within the palm, while the DNA strand is held in place by the thumb. The highly mobile fingers are associated with the binding and chemical addition of dNTPs, and may

2 ACS Paragon Plus Environment

Page 2 of 36

Page 3 of 36 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

be used to distinguish between conformations of the enzyme. The ternary, pre-catalytic form of BF is associated with three conformations suggested by multiple studies: the “open” conformation allows new dNTPs to bind,16 catalytic addition of nucleotides occurs in the “closed” conformation,12-13,17 and the intermediate “ajar” conformation represents a proposed fidelity checkpoint between initial binding of dNTPs and catalysis.13-14 A single α-helix, referred to as the O-helix, located within the fingers subdomain and the active site, may be used to differentiate between these conformations (Fig. 1B).16-17

Figure 1. 3D structures of the large fragment of B. stearothermophilus DNA polymerase I (BF). A) Polymerase subdomains include the palm (purple), fingers (green), and thumb (blue). The O-helix (yellow) is an active site region of the fingers subdomain. B) Comparison of three conformations of BF using the position of the O-helix, including closed (blue, 1LV5), ajar (yellow, 3HPO) and open (red, 4YFU).

Solution fluorescence studies used to characterize these conformations4,13,18-19 demonstrate that the active site closes partially after a dNTP and associated magnesium(II) ion bind.4,20 Further conformational change is restricted by the identity of the nascent base pair: Watson-Crick (WC) pairs promote continued closing of the active site, while mismatches hamper further closing.13-14,21 A second Mg2+ is required for catalysis,4 though the binding order of metals and dNTP is unknown. Mg2+ ions help stabilize the negative charge concentrated in the 3 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

active site during nucleotide addition and are located near the dNTP. One Mg2+ occupies “metal site A” and coordinates with the alpha-phosphate (Pα) of the dNTP and the hanging 3’-OH, while the Mg2+ in “metal site B” coordinates with the Pβ and Pγ of the dNTP and two nearby acidic amino acids on the enzyme (Asp 653 and Asp 830 in BF) (Fig. 2).22-23

Figure 2. Active site of B. stearothermophilus DNA polymerase I showing two magnesium(II) ions occupying distinct sites coordinated with the nucleoside triphosphate, growing primer strand, and enzyme.

Although the transition from open to closed is accepted as an essential part of dNTP integration,13,16 the mechanism behind mismatch recognition has yet to be fully elucidated. A number of amino acids identified previously in conservation and mutation studies likely play a role in nucleotide discrimination,1-8,10-11 but mutation studies often explore single position variants1-3,5,8,10 and may be unable to predict the effects of combining mutations. Studies focusing on mutants with modifications at multiple positions4,6 better detect combined effects, but still rely on conventional kinetic data and mutation rates to classify variants. 4 ACS Paragon Plus Environment

Page 4 of 36

Page 5 of 36 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Integrating structural information, often from static crystal structures, can provide contextual information to help clarify changes in dNTP-polymerase interactions, but such studies often underestimate the dynamic behavior of the polymerase.3-6 Interpretation is made more difficult by the variable effects of mutations, which can modify fidelity for specific mismatches yet leave others unaffected.1,3,7-8,10-11 Such differential impacts imply that mismatch recognition is a highly complex process, and an approach which captures the dynamic behavior of the polymerase at high resolution may be necessary to further determine key aspects of nucleotide discrimination. Experiment has provided an abundance of kinetic data detailing nucleotide addition and conformational transitions, but such data provide little information on the atomistic details facilitating polymerase activity. Molecular dynamics (MD) simulations offer atomic-level resolution of dynamic enzymatic behavior using known three-dimensional structures and the assumptions of physical and chemical laws. BF has been simulated previously using MD.16-17 One study employed unrestrained MD to simulate the active site closing around a dCTP-dG base pair and revealed a number of details of pre-catalytic nucleotide addition, but focused on a limited set of possible WC base pairs.16 In this study, MD is used to simulate all sixteen possible standard dNTP-dN base pairs in the active site of wild-type (WT) BF. As WT BF preferentially adds WC pairs over mismatches, a full transition to the closed conformation is unlikely in the presence of a mismatch. Three µsscale simulations for each pair corresponding to the open,16 closed,12 and ajar14 initial conformations were completed, allowing for a thorough investigation of pre-catalytic polymerase behavior. Additional simulations of an E658A BF mutant were used to clarify the role of E658 in preferential addition of WC pairs over other purine-pyrimidine pairs. By comparing polymerase behavior in the presence of all possible nascent base pairs, the mechanistic roles of a number of

5 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

amino acids are explored using a variety of analytical approaches including measures of conformational transitions, principal component analysis (PCA), and binding free energy calculations. The potential role of MD in elucidating the mechanistic details of mismatch recognition needed to dependably engineer polymerases is discussed. Methods System Preparation and Simulation Three X-ray crystal structures were obtained from the Protein Data Bank corresponding to the open (4YFU),16 ajar (3HPO),14 and closed (1LV5)12 conformations of BF. Structures were prepared by reverting mutations to wild-type, adding a 3’-OH overhang to dideoxynucleotides, removing excess ligands, and adding magnesium(II) ions where necessary, all in silico, as previously described.16-17 Parameters for magnesium ions were obtained from Allner et al.24 Using standard DNA nucleotides, sixteen possible pairs may be formed, including four Watson-Crick (WC) base pairs and twelve mismatches. When considering the active site during the pre-catalytic steps of nucleotide addition, all sixteen permutations should be considered distinct, as the nascent base pair is formed from a nucleoside triphosphate (dNTP) and a template base (dN) with differing chemical properties. Using these sixteen permutations and the three prepared structures from above, 48 separate simulations were prepared through in silico mutation of the nascent base pair. Na+ ions were introduced to neutralize extraneous charge and systems were solvated with explicit TIP3P water molecules25 and periodic boundary conditions in a truncated octahedron using tleap.26 An additional simulation of the dGTP-dG pair was completed for which the dGTP base was flipped from the anti to the syn orientation in the closed conformation, bringing the total number of wild-type simulations to 49.

6 ACS Paragon Plus Environment

Page 6 of 36

Page 7 of 36 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

All trajectories were minimized, heated, equilibrated, and simulated using the Amber ff12SB force field27 and Amber 14 software package26 as previously described by Miller et al.17 Using the GPU-accelerated version of Amber 14, a minimum of 1.0 µs unrestrained MD was completed for each of the 49 systems, with exact durations indicated in Table 1, for a total of 50.18 µs. Table 1. Total simulation times for individual WT BF simulations in the presence of all 16 possible dNTP-dN pairs started from three conformations. The time in parentheses represents a closed syn-dGTP-dG simulation completed in addition to the closed anti-dGTP-dG simulation.

Nucleotide Template Adenine Adenine Adenine Adenine Cytosine Cytosine Cytosine Cytosine Guanine Guanine Guanine Guanine Thymine Thymine Thymine Thymine

dNTP Adenine Cytosine Guanine Thymine Adenine Cytosine Guanine Thymine Adenine Cytosine Guanine Thymine Adenine Cytosine Guanine Thymine Total

Closed 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.16 (1.06) 1.05 1.09 1.07 1.02 1.07 17.52

Simulation Length (µs) Ajar 1.00 1.00 1.00 1.00 1.04 1.01 1.01 1.00 1.00 1.02 1.08 1.08 1.14 1.00 1.00 1.00 16.38

Open 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.01 1.00 1.00 1.09 1.08 1.08 1.02 16.28

A set of mutant systems was also prepared with the E658A mutation in the closed conformation. Only 8 pairs were used for these simulations, including 4 WC base pairs and 4 purine-pyrimidine mismatches. System preparation and simulation were identical to the wildtype systems, but simulation length was restricted to 100 ns. Each mutant system was simulated three times, resulting in a total of 2.4 µs of simulation for the mutant systems.

7 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analysis General analyses and visualization were completed using cpptraj28 and Visual Molecular Dynamics (VMD),29 respectively. Protein backbone RMSD values were used to monitor the magnitude of deviations over the course of simulations and compare among simulations. Principal component analysis (PCA) was performed on individual and selected groups of simulations. All analyses used protein backbone Cα,, C, and N atoms, while analyses of individual systems also incorporated phosphorous atoms and the nitrogen atoms most distant from the DNA backbone in nucleotides. Analyses completed on four or fewer combined simulations involved 10 frames for each nanosecond of simulation, while larger combinations employed 2 frames for each nanosecond. Per-residue root mean square fluctuations (RMSf) were calculated from the first and last frames of pseudo-trajectories corresponding to the first five principal components for each system or set of systems analyzed. The conformation of each system over time was determined using distance between the Cα atoms of R629 and P699, as previously described,16-17 by comparing R629-P699 distances from simulations to the values obtained from the starting crystal structures (closed: 9.04 Ǻ; ajar: 14.84 Ǻ; open: 17.49 Ǻ).12,14,16 The catalytic distance between the Pα of the dNTP and hanging 3’-OH was used as a secondary measure of conformational state, utilizing previously established cutoffs in nucleophilic attack distance.16 Distances less than 4.8 Å were treated as closed, while distances greater than 7.0 Å were treated as open. Intermediate values were treated as ajar. A transition between conformations was considered successful if both R629-P699 and nucleophilic attack distances indicated simulations spent more than 5% of their total time in a conformation other than their starting conformation.

8 ACS Paragon Plus Environment

Page 8 of 36

Page 9 of 36 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Free energy analysis was performed using the Molecular Mechanics-Generalized Born Surface Area (MM-GBSA) theory for all systems using 10 frames for each nanosecond of simulation under the assumptions of the Generalized Born model described by Onufriev, Bashford, and Case30 and using the MPI installation of MMPBSA.py.31 Solvent and nonmagnesium ions were stripped before calculations were completed. The ligand was defined as the dNTP, while the receptor was defined as both magnesium(II) ions and the protein-DNA complex. MM-GBSA provided single residue decomposition energies, per frame free energies of binding, and average free energy of binding for each simulation. Nucleotide interactions were examined quantitatively through hydrogen bond analysis and per-residue decomposition energies. Hydrogen bond analysis used default cutoffs and did not consider interactions with the solvent. Nucleotides remaining perfectly paired for the duration of a simulation were expected to have an average number of hydrogen bonds approaching the total number of possible hydrogen bonds, while limited pairing was determined by highly reduced or zero average hydrogen bonds. Nascent base pair interactions were additionally compared to previously crystallized post-catalytic mismatches.15 The identity of the template base (N) necessarily varied with nascent base pair, but bases in other positions, such as the last-catalyzed pair (primer: M-1; template: N-1) varied instead by starting conformation, based on the crystalized nucleotide sequence. Average interactions across starting conformations between dNTPs and non-nascent nucleotides are considered to remove the impact of nucleotide identity, focusing instead on position.

Results

9 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Wild-type and mutant BF simulations totaling 52.58 µs modeling sixteen dNTP-dN pairs during the pre-catalytic steps of nucleotide addition were completed. Protein backbone RMSD values across simulations remained below 5.5 Å in all cases (Supporting information (SI): Figures S1-S2). WT simulations generally had greater RMSD values as they progressed, consistent with the µs timescale used, while mutant simulations stayed below 3.5 Å during their 100 ns duration. Few relevant trends in RMSD values were identified, although systems started in the closed conformation often had reduced RMSD values relative to systems started in the other two conformations. Conformational Change Conformational change was highly dependent on both starting conformation and nascent base pair identity. PCA identified a range of residues useful in distinguishing between simulations with different starting conformations. Combined analysis of all WT systems revealed a core range of 19 residues (M687-A705) with RMSf values greater than 2.5 Ǻ in the projected pseudo-trajectory corresponding to the first principal component (SI: Fig. S3). A similar peak was identified in analysis of systems sharing a template base (e.g. all dNTP-dA simulations), and when comparing different starting conformations of the same nascent base pair (e.g. all dATPdA simulations), suggesting that residues in that range may be used to distinguish conformations. Previously, the distance between R629 and P699 has been used to characterize conformations,16-17 and use of this distance was supported by PCA. P699, a member of the above-identified range and the first member of the O-helix, had higher RMSf values on average (4.23 ± 0.94 Ǻ) compared to R629 (0.71 ± 0.34 Ǻ) in pseudo-trajectories of the first principal component. While the motions of R629 had reduced correlation to conformation relative to

10 ACS Paragon Plus Environment

Page 10 of 36

Page 11 of 36 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

P699, the consistently elevated RMSf values of P699 indicate that the R629-P699 distance represents a reasonable estimate of conformation. Nucleophilic attack distance between the hanging 3’-OH and Pα of the dNTP has also been used previously the characterize conformations.16 Transitions between conformations were detected by combining R629-P699 and nucleophilic attack distance. 13 simulations underwent conformational transitions, including 11 transitions from the ajar to open conformation. The remaining 5 ajar simulations (dTTP-dA, dATP-dC, dATP-dG, dCTP-dG, dGTP-dT) retained their conformation for the duration of the simulation. Only two closed simulations transitioned to a more open conformation: the dCTP-dT system transitioned to an ajar conformation, and the dATP-dC system fully opened. The presence or absence of conformational change was reflected by average binding affinities between dNTPs and the remaining protein-DNA-Mg2+ complex (Table 2). The five ajar conformation systems that remained in the ajar conformation had increased binding affinities relative to other ajar simulations. Adjusting binding affinities using the ajar conformation to normalize across base pairs revealed that the two closed simulations that transitioned to more open conformations had the poorest binding affinities of closed simulations.

11 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Table 2. ∆Gbind and ∆∆Gbind values derived from MM-GBSA for each dNTP-dN simulation. Values are relative and do not represent absolute ∆G values as they were calculated using the assumptions of a Generalized Born model. ∆∆Gbind values were obtained by normalizing closed and open simulations to the value reported for the ‘intermediate’ ajar simulation. Base pairs for which ajar simulations transitioned to open are bolded, while closed simulations that transitioned to more open conformations are italicized.

Base Pair dNTP dN A A C A G A T A A C C C G C T C A G C G G G T G A T C T G T T T Mean Std. Dev.

∆Gbind (kcal/mol) Closed Ajar Open -143.31 -47.89 -72.28 -143.46 -88.48 -79.51 -165.99 -51.51 -66.50 -157.34 -105.59 -55.35 -21.55 -109.39 -17.25 -146.38 -57.66 -65.78 -145.28 -83.14 -72.82 -139.59 -89.26 -70.66 -149.42 -101.86 -63.74 -156.60 -115.12 -76.28 -149.48 -75.15 -74.08 -136.35 -51.68 -102.96 -157.66 -86.15 -89.24 -129.83 -99.71 -72.31 -144.12 -99.92 -74.44 -122.39 -80.57 -63.20 -138.05 -83.94 -69.77 32.91 21.82 17.80

∆∆Gbind (kcal/mol) Closed Ajar Open -54.83 0.00 -24.39 -91.956 0.00 8.97 -60.40 0.00 -14.99 -47.95 0.00 50.24 36.11 0.00 92.14 -63.24 0.00 -8.11 -56.03 0.00 10.32 -37.73 0.00 18.60 -34.30 0.00 38.12 -81.45 0.00 38.84 -97.80 0.00 1.07 -50.21 0.00 -51.28 -57.95 0.00 -3.09 -29.91 0.00 27.40 -63.55 0.00 25.49 -122.39 0.00 17.37 -51.85 0.00 14.17 30.47 0.00 33.19

The average ∆Gbind values for the dATP-dC pair suggested that conformations sampled in simulation were more important than initial conformation in determining dNTP binding affinity. Of all dATP-dC WT simulations, the ajar conformation promoted the highest binding affinity, while the closed simulation – which fully opened – and the open simulation had very similar binding affinities. Mean average binding affinities across nascent base pairs indicated that binding affinity and conformation were closely correlated: closed conformations were associated with higher binding affinities than more open conformations. The standard deviations of these values suggested that closed simulations have the widest variation in reported average binding affinity, but without the influence of the “closed” dATP-dC simulation, the mean average binding affinity and its standard deviation in the closed conformation drop, signaling that the 12 ACS Paragon Plus Environment

Page 12 of 36

Page 13 of 36 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

closed conformation had reduced variation in average binding affinity for nascent base pairs. The ajar conformation had the greatest variability. Energetic trends linked to dNTP binding affinity during the transition from open to closed conformations were reconstructed in the presence of each nascent pair. Per frame binding energies were binned based on Arg629-Pro699 distances, then adjusted based on the value of the bin containing the ajar distance (or nearest bin), setting the ajar bin to a value of 0 kcal/mol and modifying the other bins accordingly. Comparison across simulations allowed an average energy path to be constructed across conformations for each base pair (SI: Fig. S4). The quality of the reconstructed path depended largely on whether a simulation sampled the ajar conformation, with improved agreement between simulations when the ajar conformation was sampled suggesting that better sampling along the conformational space between open and closed could provide improved reconstructed patterns in binding affinity. Conformations with reduced space in the active site compared to the closed conformation experienced reduced binding affinity relative to the closed conformation, suggesting the presence of an optimal range of structures surrounding the closed conformation that maximize binding affinity for nascent base pairs. Nucleotide Interactions Interactions between dNTPs, template bases, and surrounding nucleotides varied with nascent base pair identity. Enzyme-influenced pair formation, characterized by the spatial orientations and interactions of paired nucleotides, can be represented in the closed conformation, when nucleotide interactions are constrained by the surrounding amino acids, metal ions, and DNA strands in the active site. Representative spatial orientations of base pairs (Fig. 3) revealed patterns in behavior linked to base pair composition.

13 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 3. Prominent nucleotide pair conformations for all 16 dNTP-dN pairs as predicted by simulations beginning in the closed conformation. Magnesium(II) ions (pink) are shown as spheres, while phosphorous (gold), oxygen (red), carbon (teal), nitrogen (blue) and polar hydrogen (white) atoms are shown as sticks. van der Waals radii of dNTPs (green) and template nucleotides (purple) are shown as surfaces. The subtitle indicates the general category of each pair based on behavior in simulation. Details for each category are provided in the text. WC pairs are included for comparison.

Base pair formation in simulations suggested four sub-categories exist among mismatches. Treating WC pairs as a fifth distinct sub-category, the remaining 12 pairs may be classified as crowded (CP), stretched (SP), disordered (DP), or wobble pairs (WP). CPs, including all purine-purine (Pu-Pu) pairs, were more sterically restricted in the active site than other mismatches. Finer distinctions may be made based on the orientation of the nucleotide relative to the sugar, such that dATP-dA, dATP-dG, and dGTP-dA may be considered anti-anti CPs and dGTP-dG a syn-anti CP. Hydrogen bonding between guanines was greatly improved in the syn-anti orientation compared to the anti-anti orientation. SPs, conversely, consisted of all pyrimidine-pyrimidine (Py-Py) pairs and required nucleotides to move further into the active site for pair formation. The remaining four mismatches were distinguished using base pair stability.

14 ACS Paragon Plus Environment

Page 14 of 36

Page 15 of 36 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

DPs (dATP-dC and dCTP-dA ) failed to form stable associations for longer than a few nanoseconds, while WPs (dGTP-dT and dTTP-dG) were capable of pairing stably for longer periods. Quantitative characterization of nascent base pair formation stemmed from hydrogen bonding analysis and per-residue decomposition energies. The average number of hydrogen bonds between nucleotides provided an estimate of pair stability using Figure 4 above to determine the expected possible number of hydrogen bonds. Pair strength was inferred from perresidue decomposition energies (SI: Table S1), which estimate the energetic impact of specific residues on the binding affinity of the dNTP. Stability and strength correlated well with categorization. WC pairs experienced the strongest and most stable interactions, followed by WPs and CPs. DPs and SPs had the weakest interactions, and often failed to maintain hydrogen bonds even in the constrained closed conformation. Interactions between dNTPs and other surrounding nucleotides were similarly characterized using decomposition energies (SI: Table S2) and hydrogen bond analysis. Of all non-nascent nucleotides, only the adjacent base pair noticeably contributed to dNTP stability, and the primer strand base (M-1) contributed to dNTP stability more than the corresponding template strand base (N-1) across conformations, regardless of dNTP-dN pair. Comparing average numbers of hydrogen bonds and average decomposition energies revealed that hydrogen bonding was less important in these interactions than in interactions with the template base (N), as the ratio of decomposition energies to average number of hydrogen bonds was elevated for the M-1 and N-1 positions relative to the N position. Interactions between dNTPs and Mg2+ ions were consistent with expectations derived from the roles of these magnesium ions. The stabilizing Mg2+ occupying metal site B (MSB) had

15 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

the strongest impact on dNTP binding, though the magnitude varied widely between dNTP-dN pairs, and was ultimately most beneficial for the dCTP-dG pair and most problematic for the dATP-dC pair. The catalytic Mg2+ in MSA had a destabilizing effect in many simulations, though the effect of MSA was generally smaller than that of MSB. Analysis of Select Amino Acids Amino acids putatively linked to fidelity were identified by analyzing MM-GBSA decomposition results. Five amino acids with average decomposition energy values of magnitude 0.6 kcal/mol or greater were identified using heightened variation (e.g. standard deviations across simulations) in decomposition energies as a criterion (Fig. 4).

Figure 4. Amino acids identified through analysis of decomposition energies as putatively linked to nucleotide discrimination. Details concerning the behavior and roles of these amino acids in simulation are discussed in the text.

Two amino acids, Y654 (SI: Fig. S5) and K876 (SI: Fig. S6) displayed conformation specific behavior. Y654 had a destabilizing influence on dNTPs in the closed conformation, but often had a stabilizing influence in the ajar conformation. Only dNTPs from two simulations started in the 16 ACS Paragon Plus Environment

Page 16 of 36

Page 17 of 36 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

ajar conformation, dTTP-dA and dCTP-dG, were destabilized byY654, and both spent more than 85% of their simulations sampling the conformational space between the closed and ajar conformations, rather than opening, which was expected for WC pairs. MM-GBSA calculations treating the dNTP and magnesium ions as a combined ligand revealed that Y654, although a strong contributor to magnesium stability in the closed conformation, contributed considerably less to magnesium binding in the ajar and open conformations. This pattern was reflected in MM-GBSA calculations treating only the dNTP as the ligand, where Y654 acted to destabilize the dNTP in the closed conformation, but had reduced impact in the ajar and open conformations. K876 had a strong stabilizing effect in the ajar conformation but little effect in the closed or open conformations, consistent with the finding that dNTPs spent about ¼ of their time, on average, hydrogen bonding with K876 in simulations started in the ajar conformation, but generally failed to hydrogen bond in the closed or open conformations. Two aromatic residues located on the O-helix, Y714 (SI: Fig. S7) and F710 (SI: Fig. S8), had very different patterns in decomposition energies despite the similarity of their side chains and positions. Y714 often had the strongest stabilizing effect in the open conformation. On average, dNTPs spent less than 7% of their time hydrogen bonding with Y714 in the closed conformation, but more than 25% in the open conformation. F710, conversely, ubiquitously stabilized the dNTP, but the magnitude of this stabilizing effect did not correlate with hydrogen bonding, as all but a single simulation (open dTTP-dC) spent less than 7% of production runs hydrogen bonding with F710. Variation in decomposition energy for both residues additionally depended on dNTP-dN pair identity. The final amino acid, E658 (SI: Fig. S9), varied in behavior across conformations and nascent base pairs, often destabilizing the dNTP. Hydrogen bonding analysis revealed that

17 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

increased hydrogen bonding frequency was associated with reduced destabilization of the dNTP, ultimately resulting in a switch to a stabilizing effect in nine simulations. For those nine simulations, the dNTPs maintained an average of 1.1 hydrogen bonds with E658, more than double the overall average (0.5 hydrogen bonds). The link was strongest in the ajar conformation, where hydrogen bonding was enriched across simulations that retained the ajar conformation, resulting in a switch for E658 to a stabilizing factor in four of five cases. Based on the highly variable impact of E658 on dNTP stability and its possible role in maintaining the ajar conformation, E658 was selected for further investigation. E658A: In-depth investigation All Pu-Py pairs, including WC pairs, DPs, and WPs, were simulated in the presence of an E658A mutation, beginning in the closed conformation. All pairs were simulated three times, with each simulation restricted to 100 ns. Of the resulting 24 simulations, no conformational transitions were observed, resulting in 3 independent replicates of behavior in the mutant closed conformation for each Pu-Py pair. MM-GBSA was completed for all 24 mutant simulations, and average binding affinities for each base pair were calculated (SI: Table S3). By comparing to average values for closed WT simulations of the correct dNTP-dN pairs (SI: Fig. S9), and using the standard deviation for the three replicates of each base pair as a guideline of expected variability, all but one base pair (dATP-dC) had an average binding affinity for the dNTP within or around one standard deviation from the WT closed conformation value. Considering the free energy reported for the dATP-dC system reflected significant conformational change, however, it was expected that the dATP-dC system would experience the greatest change in binding affinity. The E658A mutation appeared to have little to no effect on overall dNTP binding affinity.

18 ACS Paragon Plus Environment

Page 18 of 36

Page 19 of 36 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Unlike binding affinity, there were some differences in nucleotide and magnesium interactions between WT and E658A systems (SI: Table S4). While there was very little to no change in interactions between the dNTP and N-1 base, interactions with both the M-1 and N bases were altered. On average dNTPs in mutant systems spent 55% of their time hydrogen bonding with the M-1 base, compared to only 16% in WT systems. Despite the increase in hydrogen bonding, decomposition energies were relatively unaffected, indicating that hydrogen bonding played a more significant role in dNTP-M-1 interactions in the E658A systems. Conversely, dNTP-N decomposition energies were elevated in mutant systems relative to the associated increase in hydrogen bonding, suggesting that hydrogen bonding contributed less to interactions between dNTPs and template bases in E658A systems compared to WT. Interactions with the magnesium in MSB were much stronger in mutant systems, and the magnesium in MSA destabilized the dNTP to a greater degree in mutant systems. The effect of the E658A mutation on interactions between the dNTP and residue at position 658 were systematic rather than base pair specific, in that variation among base pairs was minimized in the mutant system. Comparing the average number of hydrogen bonds and the decomposition energy (Fig. 5) revealed the effect of the E658A mutation on the dNTP. In the WT systems, dNTP-E658 interactions involved a broad range of average inter-residue hydrogen bonds and most commonly had a destabilizing or mildly stabilizing effect on the dNTP. Mutant systems instead clustered tightly together, within a narrow range of average inter-residue hydrogen bonds and a much more consistent decomposition energy. dNTP-A658 interactions in mutant systems were always stabilizing. The E658A mutation had the general effect of eliminating base pair specific behavior at residue 658, instead providing a stabilizing effect across base pairs.

19 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 5. Effect of E658A mutation on interactions between dNTPs and E658/A658. Icons in black represent values from closed wild-type simulations. Icons in gray represent values averaged from three 100 ns simulations of the E658A mutant in the closed conformation. Wild-type and mutant systems form distinct clusters.

Discussion Although DNA replication has been deeply explored,1-23 many of the mechanistic details of nucleotide recognition and rejection in DNA polymerase remain unknown. Fidelity studies provide quantitative information detailing the effect of mutations on pair specific mutation rates,3-6 but often fail to determine how interactions differ between the nascent base pair and mutated residues relative to the wild-type protein. Molecular dynamics allows for an atomiclevel view of the behavior of biological molecules by using classical physical and chemical laws to predict the motion of atoms through time. Polymerase and nucleotide behavior in simulations of WT and mutant BF allow the mechanistic roles of a number of residues, and of conformational changes in polymerase, to be explored and compared to findings of more traditional experimental methods.

20 ACS Paragon Plus Environment

Page 20 of 36

Page 21 of 36 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Conformational Fidelity Checkpoint Conformational transitions from fully open to closed – or the reverse – are difficult to simulate without applying a biasing potential, even when µs timescales are employed.17 Previous attempts to simulate a full transition from the closed to open conformation revealed the presence of an intermediate conformation linked to fidelity,17 crystallized as the ajar conformation.14 A single-molecule fluorescence resonance energy transfer (FRET) study describing the ajar fidelity checkpoint suggested that this intermediate should represent a peak in the energy landscape of the open-to-closed transition, favoring transition to the closed conformation in the presence of WC pairs (Fig. 6).13 Simulating an open to closed transition, even with a WC nascent base pair, has previously required multiple µs of simulation using unrestrained MD.16 Among 49 WT and 24 mutant simulations, only 13 conformational transitions occurred, and of these, only one involved a complete transition from closed to open. The rarity of conformational transition events in general is unsurprising, considering previous timescales needed when using unrestrained MD.

Figure 6. Proposed roles of conformations used as starting structures in simulations. Following insertion of a dNTP and two Mg2+ ions, DNA polymerase transitions to the ajar conformation, and may reopen to eject mismatched dNTPs or proceed to the closed conformation when Watson-Crick pairs are present. An energetic barrier to reopening from the closed conformation is expected, as discussed in the text.

The frequency of ajar to open transitions relative to other transition events is consistent with the notion that the ajar conformation represents a fidelity checkpoint. As a peak in the

21 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

energy landscape in the transition from open to closed, the ajar conformation should represent a barrier to further closing in the presence of a mismatch. Within the µs simulations employed in this study, 85% of conformational transitions involved opening of the active site from the ajar conformation. Both remaining conformational transitions involved movement from the closed conformation to a more open conformation, while no successful transitions from a more open to more closed state occurred. 12 of 16 WT simulations started in the closed conformation contained a mismatch, yet only two of those simulations transitioned to a more open conformation, suggesting that the ajar conformation not only prevents transitions to the fully closed conformation, but additionally serves as a barrier to reopening after the enzyme has reached the closed conformation. The ajar conformation may therefore represent one of the most important fidelity checkpoints in replication, as a failure to reopen after closing around a mismatch would result in incorporation of a mismatched nucleotide. Mismatch Categorization Interaction strength between possible base pairs has previously been calculated in the absence of enzyme using quantum calculations and a water solvent model.32 These calculations provide an estimate of base pairing potential when the identity of the nitrogenous base is the primary source of interaction. Under these conditions, WC pairs are most strongly favored, though A/G and G/T pairs have nearly identical affinities for each other as the A/T pairs.32 The remaining mismatches vary in affinity, though A/C pairs constitute the lowest affinity pair.32 In the active site, however, the influence of the protein can greatly impact base pair affinity. In the closed conformation, where many pairs experienced the most stable pairing interactions, base pair affinities are modified such that although the strongest eight pairs in the active site match the eight most favored pairs in the absence of enzyme, the order is altered. Unlike the quantum

22 ACS Paragon Plus Environment

Page 22 of 36

Page 23 of 36 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

calculations, were reciprocal pairing did not impact binding affinity, the behaviors of reciprocal dNTP-dN pairs in the active site can vary, suggesting that the active site imposes constraints on the nucleotides absent in solution. Nascent base pair interactions in the active site are governed primarily by Pu/Py composition. Five categories of base pairs are suggested by patterns of Pu/Py composition and dNTP-dN behavior, consisting of Watson-Crick pairs (WC), disordered pairs (DPs), wobble pairs (WPs), stretched pairs (SPs), and cramped pairs (CPs). Previous crystallization of mismatched pairs provided a baseline of expected behavior. Johnson and Beese15 crystallized dNTP-dN mismatches in BF and reported Pu/Py composition plays an important role in determining mismatch stability. The Pu/Py composition of the dNTP-dN pair determines how much space in the active site is needed and how much the DNA backbone must shift to accommodate pair formation. Further effects are determined by variable interactions of nitrogenous bases, such as pair formation. Although the categories outlined above differ from those reported by Johnson and Beese, the two schemes are expected to differ when considering their scheme refers to post-catalytic interactions, where both bases are constrained by their connection to their respective strands. WPs behave much like traditional WC pairs in the closed conformation, maintaining an average of 1.1 hydrogen bonds. As the active site opens, however, interactions decrease dramatically, such that the ajar dTTP-dG simulation was one of the few examples of a successful nucleotide ejection. The decomposition energies of WPs, on average, are second only to WC pairs, and have a strong stabilizing effect in the closed conformation. WPs fit well into the active site and are capable of consistently pairing.

23 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

CPs fit into the active site reasonably well, despite their name, but become more constrained as the polymerase closes. These constraints appear reduced relative to the constraints described by Johnson and Beese,15 consistent with the additional mobility of the unbound precatalytic dNTP. Anti-anti CPs interact surprisingly favorably in the closed conformation, maintaining an average 1.2 hydrogen bonds, though interactions are reduced in the ajar and open conformations. Average decomposition energies further support the idea that CPs are capable of pairing stably. The single syn-anti CP, dGTP-dG, is supported as a syn-anti pair by Johnson and Beese,15 and was initially discovered after the dGTP in the open dGTP-dG simulation spontaneously flipped and began pairing with the dG, driving the R629-P699 distance closer to the ajar distance. The behavior of the dGTP-dG pair is highly variable depending on whether the syn-anti or anti-anti orientation is used, but the formation of the syn-anti pair both in the open dGTP-dG simulation and in crystal structures supports classification as a syn-anti pair. When in the closed conformation, the syn-anti orientation allows the dG to stabilize the dGTP to a similar extent as A/T and G/T pairs, while the anti-anti orientation prevents stable pairing. DPs, despite their similarity to WPs in Pu/Py composition, pair poorly. On average, these pairs hydrogen bond rarely across conformations and have net decomposition values of near zero, neither stabilizing nor destabilizing dNTPs. Johnson and Beese similarly described these pairs.15 Pairing between DPs was rare, and short-lived when it occurred. Both bases were mobile in each pair, often moving independently of the other. Similar behavior was described by Johnson and Beese, who had difficulty resolving structures of post-catalytic DPs.15 SPs experience greater strain than that described by Johnson and Beese, maintaining a single hydrogen bond only 10% of time simulated on average. Johnson and Beese separated these pairs into different categories,15 but similarity in behavior in pre-catalytic steps suggests

24 ACS Paragon Plus Environment

Page 24 of 36

Page 25 of 36 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

these pairs behave somewhat uniformly. With the highest decomposition energies on average, the presence of an SP serves to destabilize the dNTP more frequently than not, suggesting that SPs form unstable pairs. Residue Specific Interactions with dNTPs Adjacent nucleotides play a predominantly stabilizing role for the dNTP. The previously catalyzed nucleotide at the M-1 position has a strong stabilizing effect on the dNTP, often contributing more to the binding of the dNTP than the template base at the N position. The M-1 base appears to interact primarily through π-stacking. The nucleotide at the N-1 position has a reduced effect in comparison, and may serve to improve or depreciate dNTP binding. Variation in the role of the N-1 nucleotide was best linked to the strength of the dNTP-N-1 pair. In some cases, a more stable dNTP-N-1 pair was possible relative to the dNTP-dN pair, and the dNTP interacted more strongly with the N-1 base, forming a base pair across layers. Magnesium ion behavior was consistent with expectations stemming from the current understanding of the roles of these metal ions in nucleotide addition.4,22 A number of options for force field parameters of Mg2+ ions are available, each of which can contribute to different sampled coordination patterns in simulation,33 making the selection of appropriate magnesium ion parameters essential in MD. The parameters developed by Allner et al24 have been used previously in MD simulation of BF.16-17 Generally, use of the parameters developed by Allner et al24 has been suggested for simulations where Mg2+ ions must bind to small nucleic acids,33 which, in light of consistency between simulation and expectation, implies that the selection of the Allner parameters here was appropriate. The Mg2+ in MSA is associated with the catalytic addition of new nucleotides. The consistent destabilizing effect of this Mg2+ on the dNTP across base pairs could derive from its catalytic, rather than structural, function. The magnesium in

25 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

MSB plays a different role, serving to help offset the concentrated negative charge of the phosphates in the dNTP. This Mg2+ demonstrates a conformation specific dependence, often serving as the strongest stabilizing influence for the dNTP in the closed (and sometimes ajar) conformation, but contributing to destabilization in the open (and often ajar) conformation. As magnesium behaviors meet expectations in other regards, such a discrepancy may indicate that the active site modulates Mg2+-dNTP interactions, helping to improve dNTP binding in the closed conformation when approaching catalysis, and reducing dNTP binding in the ajar and open conformations. A number of amino acids near the dNTP appear to stabilize the dNTP in the active site by interacting with the magnesium ions and the phosphates of the dNTP. Two such residues, Y654 and K876, were identified from their enriched variability in decomposition energies relative to other residues in and around the active site. Although both residues interact with the region of the active site containing concentrated charge, these interactions differ from expectation and demonstrate the importance of a dynamic approach when discerning the roles of amino acids. Y654, located near MSB, consistently interacts with the magnesium at MSB using the carbonyl oxygen on the protein backbone rather than the hydroxyl group affixed to the aromatic side chain. Y654 often has the opposite effect on dNTP stability compared to the Mg2+ at MSB, serving to destabilize the dNTP when it is tightly bound to MSB. Structural studies employing static crystal structure or mutation studies may fail to identify such an interaction due to the use of a backbone carbonyl group rather than a side chain. While the impact of Y654 on the dNTP and magnesium may be a general characteristic of residues at that location rather than specific to tyrosine, the importance of the residue in stabilizing the magnesium at MSB was easily identified through molecular dynamics, where more traditional methods may have failed.

26 ACS Paragon Plus Environment

Page 26 of 36

Page 27 of 36 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

K876 is the terminal amino acid of BF, and of unfragmented B. stearothermophilus DNA polymerase I, and experiences a great deal of mobility in simulation. This mobility is often heightened in the ajar and open conformations relative to the closed conformation, allowing K876 to hydrogen bond with the oxygen atoms in Pβ and Pγ of the dNTP in a number of simulations started in the ajar conformation. In these simulations, K876 contributes to dNTP stability by increasing interactions between the protein and the dNTP, bringing K876 close enough to the dNTP to form hydrogen bonds more than half the time in some cases. In each of the starting structures, however, K876 is located fairly distant from the dNTP, such that the distance between Cα of K876 and Pγ in the closed conformation begins at 12.96 Å,16 ajar at 13.68 Å,14 and at 13.83 Å in the open conformation.12 Given the distance between these residues in the crystal structures, it is unlikely that such an interaction would be predicted by studies working from immobile crystal structures. Amino acids that interact strongly with the nitrogenous base of the dNTP, rather than the sugar or phosphates, were also identified using variability in decomposition energies. These two residues, Y714 and F710, have been previously identified by a number of studies focusing on nucleotide fidelity, as they often have large impacts on polymerase behavior when mutated.7 Y714 is highly conserved and has been studied a number of times.1-2,6-7,11,13 The residue is linked to diverse effects on fidelity when mutated, including increased misincorporation of ribonucleotides38 and severely reduced fidelity when mutated to a non-phenylalanine residue.1-2 F710, also highly conserved, has been repeatedly studied as well.7,10-11,13 Mutation to tyrosine at that location is associated with increased incorporation of dideoxynucleotides,10 but loss of the aromatic ring at that position, although linked to a reduction in nucleotide binding and catalysis,7,21 has no effect on polymerase fidelity.3

27 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The functions of these two residues, as suggested by simulation, are consistent with the central importance of these amino acids in literature. In simulations, Y714 generally impacted the nascent base pair in one of two ways. When template bases were present fully in the active site, rather than in the pre-insertion site, Y714 occasionally maneuvered to partake in π-stacking interactions. This interaction is consistent with the observation that fidelity is greatly impacted by the loss of an aromatic residue at that location, as an aromatic residue is needed to π-stack with the template nitrogenous base. When the template base is located in the pre-insertion site, however, Y714 replaces the template base in the active site. Such replacements are most common in the open conformation, as this conformation provides the template base with access to the pre-insertion site.17 Y714 is then free to hydrogen bond with the dNTP, as shown in Figure 7A. F710 primarily serves to stabilize the dNTP through π-stacking, but can also serve to prevent reinsertion of nucleotides by positioning itself in the binding site following dNTP ejection (Fig. 7B).

Figure 7. Behavior of aromatic O-helix residues. A) Tyr714 occasionally enters the active site and forms hydrogen bonds with dNTPs, even mismatches, once the template base (here adenine) has moved into the pre-insertion site, which occurs before ejection of a dNTP. B) Phe710 can “stand’ in the active site and prevent ejected dNTPs from re-entering the active site, as well as keeping template bases in the pre-insertion site. Note that neither interaction disrupts the WC pair at the n-1 position.

28 ACS Paragon Plus Environment

Page 28 of 36

Page 29 of 36 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Ultimately, although some novel behaviors can be identified from simulations, many interactions between these two residues and dNTPs can be explained primarily as stabilizing interactions which improve dNTP binding, which is consistent with the highly conserved status of these residues, as it implies that these stabilizing behaviors are inherent properties of these amino acids, regardless of nascent base pair identity. The ability for MD to repeatedly replicate such stabilizing interactions supports the idea that molecular dynamics offers a reasonably accurate representation of biological systems. Mutant Study: Mechanistic role of E658A E658 has an unexpected interaction pattern with dNTPs across WT mismatches, serving a stabilizing role in only a handful of cases. Previous experimental work by Minnick et al. demonstrated that the E658A mutation was associated with changes in WP and DP fidelity: there was a 10-fold increase in dTTP-dG integration, a 75-fold increase in dCTP-dA mutations, but only a 1.1-fold increase and 0.77-fold decrease in the reciprocal mutation rates.3 To better understand the role of E658 in WP and DP fidelity, all WP, DP, and WC pairs were simulated in the closed conformation of BF with an E658A mutation. The closed conformation was chosen as the active site is most constrained, and spatial restrictions have been previously linked to the putative role of E658.3 Minnick et al. used kinetics data and static structures to suggest that preferential integration of dPyTP-dPu mismatches in the presence of the E658A mutation is driven by differences in spatial orientation expected for dPyTP-dPu mismatches relative to dPuTP-dPy mismatches.3 Minnick et al. suggest that the absence of the glutamate side chain in the mutated form allows dPuTP-dPy pairs, whose dPuTPs sterically clash with E658 in the WT, to adopt spatial orientations that promote pairing but reduce catalytic efficiency, resulting in very little overall

29 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

change in mutation rates for those mismatches.3 In simulations, both dPuTP-dPy mismatches have decreased hydrogen bonding frequency with residue 658, but improved decomposition energies in mutant systems relative to WT. Decreased hydrogen bonding suggests fewer electrostatic interactions, as expected by the removal of the glutamate side chain, but the increased stability provided by residue 658 in the absence of the bulky side chain, despite the loss of hydrogen bonding, suggests that destabilizing interactions, such as steric clashing, are reduced in mutant systems. The assertion by Minnick et al. that steric clashing dominates dPuTPE658 interactions in mismatches is therefore supported by simulation. dPyTP-dPu pairs, alternatively, are expected to have reduced interactions between dPyTPs and E658 in mismatches relative to similar interactions in WC pairs, as the dPyTPs shift away from E658.3 Mutation to alanine would then augment these expected patterns of interaction, ultimately resulting in an inability to distinguish between dPyTP-dPu pairs and increased incorporation of mismatches. Consistent with this conclusion, dTTP-dG experiences very little change in average hydrogen bond frequency between WT and mutant systems, suggesting that interactions between residue 658 and the dTTP are unaffected by side chain, and are limited even in the WT system. dCTP-dA, however, has reduced hydrogen bonding frequency in mutant systems, indicating that the dCTP interacts with the glutamate side chain specifically, and that the loss of this side chain reduces interactions between dCTP and residue 658. The behavior of dCTP is inconsistent with the expectations proposed by Minnick et al., as dCTP should have limited interactions with E658 when paired with adenine. Minnick et al. used the assumption that DPs and WPs form stable pairs in the active site to propose their model,3 but that assumption does not match nucleotide behavior in simulation, where adenine/cytosine pairs fail to form over long periods of time, even in the closed

30 ACS Paragon Plus Environment

Page 30 of 36

Page 31 of 36 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

conformation. Despite these discrepancies in behavior, the role of E658 in mismatch recognition suggested by simulation supports the mechanism proposed by Minnick et al., with some modification. The largest modification involves the dCTP-dA pair, where the dCTP interacted more frequently with E658 than the dA opposite. The disordered quality of the dCTP-dA pair allows for more frequent interaction between E658 and the dCTP than in the corresponding dTTP-dG pair, but this same chaotic behavior also prevents these interactions from enduring. The suggestion that reduced dPyTP-E658 interactions in mismatches allow for the specific changes in fidelity linked to the E658A mutation is supported by the above described changes in hydrogen bonding. However, the assumption that adenine/cytosine pairs remain together stably should be reassessed, as pair interactions in simulation suggest that for dCTP-dA, interactions with E658 are reduced relative to dCTP-dG not because the dCTP is consistently displaced from its position relative to the WC pair, but rather because the mobility of the dCTP is increased, even in the closed conformation.

Conclusions By simulating dNTP-dN pairs in the active site of BF, aspects of conformational change, pre-catalytic nucleotide interaction, and specific dNTP-residue interactions can be assessed at the atomic level and compared to the results of more traditional experimental methods. Simulated conformational transitions are consistent with expectations derived from previous studies of DNA polymerase, supporting a central role for the ajar conformation in maintaining fidelity. Nucleotide interactions are modified when bound to DNA polymerase, which imposes steric and electrostatic constraints that can modulate pairing in the nascent base pair. Residues identified based on interactions with the dNTP behave as indicated in previous literature, and simulations

31 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

of a mutant BF reveal an alternative explanation for changes in fidelity linked to the E658A mutation. The results of the molecular dynamics simulations discussed above are supported by a number of experimental studies with a broad range of focuses, consistent with the diverse information that can be extracted from such simulations. The general agreement between experiment and simulation also implies that molecular dynamics has become a robust method for studying large biological molecules. The ability to design DNA polymerases with very specific patterns of fidelity could provide researchers with an opportunity for more precise results by eliminating variable behavior or perhaps further improve fidelity in applications such as PCR. The current understanding of the mechanistic roles of many amino acids in and around the active site is insufficient to fully predict the effect of mutating those residues on fidelity, but molecular dynamics simulations can help improve mechanistic understanding of those residues. As the speed and accuracy of molecular dynamics simulations continues to improve, researchers should consider the possible benefits of incorporating such information into more traditional studies. Acknowledgments This work used Extreme Science Engineering Discovery Environment (XSEDE) allocation TGMCB140073, which is supported by National Science Foundation grant number OCI-1053575. Additional equipment and funding was provided by Truman State University. Supporting Information Supplemental figures and tables are available in a separate document. This material is available free of charge via the Internet at http://pubs.acs.org.

References

32 ACS Paragon Plus Environment

Page 32 of 36

Page 33 of 36 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

1. Carroll, S.S.; Cowart, M.; Benkovic, S.J. A Mutant of DNA Polymerase I (Klenow Fragment) with Reduced Fidelity. Biochem. 1991, 30, 804-813. 2. Bell, J.B.; Eckert, K.A.; Joyce, C.M.; Kunkel, T.A. Base Miscoding and Strand Misalignment Errors by Mutator Klenow Polymerases with Amino Acid Substitutions at Tyrosine 766 in the O Helix of the Fingers Subdomain. J Biol Chem. 1997, 11, 7345-7351. 3. Minnick, D.T.; Liu, L.; Grindley, N.D.F.; Kunkel, T.A.; Joyce, C.M. Discrimination Against Purine-Pyrimidine Mispairs in the Polymerase Active Site of DNA Polymerase I: A structural Explanation. Proc Natl Acad Sci USA. 2001, 99, 1194-1199. 4.Bermek, O.; Grindley, N.D.F.; Joyce, C.M. Distinct Roles of the Active-Site Mg2+ Ligands, Asp882 and Aps705, of DNA Polymerase I (Klenow Fragment) During the Prechemistry Conformational Transitions. J Biol Chem. 2010, 286, 3755-66. 5. Patel, P.H.; Kawate, H.; Adman, E.; Ashbach, M.; Loeb, L.A. A Single Highly Mutable Catalytic Site Amino Acid is Critical for DNA Polymerase Fidelity. J Biol Chem. 2001, 276, 5044-5051. 6. Polesky, A.H.’ Steitz, T.A.; Grindley, N.D.; Joyce, C.M. Identification of Residues Critical for the Polymerase Activity of the Klenow Fragment of DNA Polymerase I from Escherichia coli. J Biol Chem. 1990, 265, 14579-14591. 7. Minnick, D. T.; Bebenek, K.; Osheroff, W.P.; Turner Jr, R.M.; Astatke, M.; Liu, L.; Kunkel, T.A.; Joyce, C.M. Side Chains that Influence Fidelity at the Polymerase Active Site of Escherichia coli DNA Polymerase I (Klenow Fragment). J Biol Chem. 1999, 274, 3067-3075. 8. McCain, M.D.; Meyer, A.S.; Schultz, S.S.; Glekas, A.; Spratt, T.E. Fidelity of Mispair Formation and Mispair Extension is Dependent on the Interaction Between the Minor Groove of the Primer Terminus and Arg668 of DNA Polymerase I of Escherichia coli. Biochem. 2005, 44, 5647-5659. 9. Patel, P.H.; Suzuki, M.; Adman, E.; Shinnkai, A.; Loeb, L.A. Prokaryotic DNA Polymerase I: Evolution, Structure, and “Base Flipping” Mechanism for Nucleotide Selection. J Mol Biol. 2001, 308, 823-837. 10. Tabor, S.; Richardson, C.C. A Single Residue in DNA Polymerase in the Escherichia coli DNA Polymerase I Family is Critical for Distinguishing Between Deoxy- and Dideoxyribonucleotides. Proc Natl Acad Sci USA. 1995, 92, 6339-6343.

33 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

11. Loh, E.; Loeb, L.A. Mutability of DNA Polymerase I: Implications for the Creation of Mutant DNA Polymerases. DNA Repair. 2005, 4, 1390-1398. 12. Johnson, S.J.; Taylor, J.S.; Beese, L.S. Processive DNA Synthesis Observed in a Polymerase Crystal Suggests a Mechanism for the Prevention of Frameshift Mutations.Proc Natl Acad Sci USA. 2003, 100, 3895-3900. 13.

Hohlbein, J.; Aigrain, L.; Craggs, T.D.; Bermek, O.; Potapova, O.; Shoolizadeh, P.; Grindley, N.D.F.; Joyce, C.M.; Kapanidis, A.N. Conformational Landscapes of DNA Polymerase I and Mutator Derivatives Establish Fidelity Checkpoints for Nucleotide Insertion. Nature Comm. 2013, 4, 2131-2140.

14. Wu, E.Y.; Beese, L.S. The Structure of High Fidelity DNA Polymerase Bound to a Mismatched Nucleotide Reveals an “Ajar” Intermediate Conformation in the Nucleotide Selection Mechanism. J Biol Chem. 2011, 286, 19758-19767. 15. Johnson, S.J.; Beese, L.J. Structures of Mismatch Replication Errors Observed in a DNA Polymerase. Cell. 2004, 116, 803-816. 16. Miller III, B.R.; Beese, L.S.; Parish, C.A.; Wu, E.Y. The Closing Mechanism of DNA Polymerase I at Atomic Resolution. Structure. 2015, 23, 1609-1620. 17. Miller III, B.R.; Parish, C.A.; Wu, E.Y. Molecular Dynamics Study of the Opening Mechanism for DNA Polymerase I. PLoS Comp Biol. 2014, 10, e1003961. 18. Kiefer, J.R.; Mao, C.; Braman, J.C.; Beese, L.S. Visualizing DNA Replication in a Catalytically Active Bacillus DNA Polymerase Crystal. Nature. 1998, 391, 304-307. 19. Kiefer, J.R.; Mao, C.; Hansen, C.J.; Basehore, S.L.; Hogrefe, H.H.; Braman, J.C.; Beese, L.S. Crystal Structure of a Thermostable Bacillus DNA Polymerase I Large Fragment at 2.1 Å Resolution. Structure. 1997, 5, 95108. 20. Johnson, K..A. The Kinetic and Chemical Mechanism of High-Fidelity DNA Polymerases. Biochim Biophys Acta. 2010, 1804, 1041-1048. 21. Joyce, C.M. Techniques Used to Study the DNA Polymerase Reaction Pathway. Biochim Biophys Acta. 2010, 1804, 1032-1040. 22. Mullen, G.P.; Serpersu, E.H.; Ferrin, L.J.; Loeb, L.A.; Mildvan, A.S. Metal Binding to DNA Polymerase I, its Large Fragment, and two 3’,5’-Exonuclease Mutants of the Large Fragment. J Biol Chem. 1990, 265, 14327-14334.

34 ACS Paragon Plus Environment

Page 34 of 36

Page 35 of 36 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

23. Beese, L.S.; Steitz, T.A. Structural Basis for the 3’-5’ Exonuclease Activity of Escherichia coli DNA Polymerase I: a Two Metal Ion Mechanism. The EMBO J. 1991, 10,25-33. 24. Allner, O.; Nilsson, L.; Villa, A. Magnesium Ion-Water Coordination and Exchange in Biomolecular Simulations. J Chem Theory Comput. 2012, 8, 1493-1502. 25. Price, D.J.; Brooks, C. A Modified TIP3P Water Potential for Simulation with Ewald Summation. J Chem Phys. 2004, 121, 10096-10103. 26. Case, D.A.; Babin, V.; Berryman, J.T.; Betz, R.M.; Cai, Q.; Cerutti, D.S.; Cheatham III, T.E.; Darden, T.A.; Duke, R.E.; Gohlke, H.; Goetz, A.W.; Gusarov, S.; Homeyer, N.; Janowski, P.; Kaus, J.; Kolossvary, I.; Kovalenko, A.; Lee, T.S.; LeGrandd, S.; Luchko, T.; Luo, R.; Madej, B.; Merz, K.M.; Paesani, F.; Roe, D.R.; Roitberg, A.; Sagui, C.; Salomen-Ferrer, R.; Seabra, G.; Simmerling, C.L.; Smith, W.; Swails, J.; Walker, R.C.; Wang, J.; Wolf, R.M.; Wu, X., Kollman, P.A. AMBER 14. University of California, San Francisco. 2014. 27. Hornak, V.; Abel, R,; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Comparison of Multiple Amber Force Fields and Development of Improved Protein Backbone Parameters. Proteins. 2006, 65, 712-725. 28. Roe, D.R.; Cheatham III, T.E. PTRAJ and CPPTRAJ: Software for Processing and Analysis of Molecular Dynamics Trajectory Data. J Chem Theor Comput. 2013, 9, 3084-3095. 29. Humphrey, W.; Dalke, A.; Schulten, K. VMD: Visual Molecular Dynamics. J. Mol. Graphics. 1996, 14, 33-38. 30. Onufriev, A.; Bashford, D.; Case, D.A. Exploring Protein Native States and Large-Scale Conformational Changes with a Modified Generalized Born Model. Proteins. 2004, 55, 383-394. 31. Miller III, B.R.; McGee Jr, T.D.; Swails, J.M.; Homeyer, N.; Gohlke, H,; Roitberg, A.E. MMPBSA.py: An Efficient Program for End-State Free Energy Calculations. J Chem Theor Comput. 2012, 8, 3314-3321. 32. Poater, J.; Swart, M.; Guerra, C.F.; Bickelhaupt, F.M. Selectivity in DNA Replication. Interplay of Steric Shape, Hydrogen Bonds, π-stacking and Solvent Effects. Chem Commun 2011, 47, 7326-7328. 33. Casalino, L.; Palermo, G.; Abdurakhmonova, N.; Rothlisberger, U.; Magistrato, A. Development of SiteSpecific Mg2+-RNA Force Field Parameters: A Dream of Reality? Guidelines from Combined Molecular Dynamics and Quantum Mechanics Simulations. JCTC 2017, 13, 340-352.0

35 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Table of Contents Graphic:

36 ACS Paragon Plus Environment

Page 36 of 36