Prediction of Active Site and Distal Residues in E. coli DNA

Publication Date (Web): January 17, 2018 ...... Support of this work by the National Science Foundation under grant MCB-1517290 to M.J.O. and P.J.B., ...

Article

Prediction of active site and distal residues in E. coli DNA polymerase III alpha polymerase activity Ramya Parasuram, Timothy A Coulther, Judith M Hollander, Elise Keston-Smith, Mary Jo Ondrechen, and Penny J. Beuning Biochemistry, Just Accepted Manuscript • DOI: 10.1021/acs.biochem.7b01004 • Publication Date (Web): 17 Jan 2018 Downloaded from http://pubs.acs.org on January 18, 2018


Biochemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Prediction of active site and distal residues in E. coli DNA polymerase III alpha polymerase activity

Ramya Parasuram#, Timothy A. Coulther#, Judith M. Hollander, Elise Keston-Smith, Mary Jo Ondrechen*, Penny J. Beuning*

Department of Chemistry & Chemical Biology, Northeastern University, Boston, MA 02115 USA #Equal contributions *Address correspondence to: Penny J. Beuning Phone: 617-373-2865 Email: [email protected]

Mary Jo Ondrechen Phone: 617-373-2856 Email: [email protected]

Department of Chemistry and Chemical Biology Northeastern University 360 Huntington Ave 102 Hurtig Hall Boston, MA 02115 USA Fax: 617-373-8795


Abstract

The process of DNA replication is carried out with high efficiency and accuracy by DNA polymerases. The replicative polymerase in E. coli is DNA Pol III, which is a complex of 10 different subunits that coordinates simultaneous replication on the leading and lagging strands. The 1160-residue Pol III alpha subunit is responsible for the polymerase activity and copies DNA accurately, making one error per 10^5 nucleotide incorporations. The goal of this research is to determine the residues that contribute to the activity of the polymerase subunit. Homology modeling and the computational methods of THEMATICS and POOL were used to predict functionally important amino acid residues through their computed chemical properties. Site-directed mutagenesis and biochemical assays were used to validate these predictions. Primer extension, steady-state single-nucleotide incorporation kinetics, and thermal denaturation assays were performed to understand the contribution of these residues to the function of the polymerase. This work shows that the top 15 residues predicted by POOL, a set that includes the three previously known catalytic aspartate residues, seven remote residues, plus five previously unexplored first-layer residues, are important for function. Six previously unidentified residues, R362, D405, K553, Y686, E688, and H760, are each essential to Pol III activity; three additional residues, Y340, R390, and K758, play important roles in activity.


Table of Contents graphic


INTRODUCTION

Enzymes are natural catalysts that accelerate chemical reactions that would occur very slowly under uncatalyzed conditions. The rate of the catalyzed reaction can be 10^20 times faster than the uncatalyzed reaction in water.1 DNA polymerases are enzymes that catalyze the addition of nucleotides to growing DNA strands, allowing genetic information to be passed on to the next generation. Replicative polymerases must carry out this process efficiently and faithfully for survival. The speed and specificity of enzymatic reactions are derived from specific interactions that take place between the amino acid residues and the substrate molecule(s) at the active site. The chemical and geometric properties of the active site residues with respect to their position in the protein three-dimensional (3D) structure determine the activity and function of the enzyme. Hence, knowledge of the active site is one of the first steps toward understanding the functions of enzymes.

A number of different computational methods are commonly used to predict the active site of an enzyme. For protein sequences of high homology to those of well-characterized proteins, transfer-based methods like sequence alignment or structural alignment, if a 3D structure is available, are often used to transfer information about functional residues from a well-characterized protein to an uncharacterized protein. Additionally, bioinformatics-based and ab initio methods utilize sequence information or the chemical and physical properties of the amino acid residues, such as sequence conservation scores, phylogenetic information, electrostatic properties, geometric features, and computed buffer range, to identify the active site residues.2-5 THEMATICS (Theoretical Microscopic Anomalous Titration Curve Shapes) is a protein functional site prediction method that requires only the protein 3D structure as input.6-8


THEMATICS takes advantage of the observation that active sites in enzymes generally contain residues that are capable of transferring protons, and that residues essential for catalysis tend to show perturbed theoretical titration curves when compared to non-active-site residues. POOL (Partial Order Optimum Likelihood) is a machine learning method that rank-orders all residues according to a computed probability of functional importance.9-11 POOL can use multiple input features, but requires that the outcome (in this case, the probability that a residue is functionally important) from each input feature depends monotonically on that feature. POOL currently uses electrostatic information from THEMATICS, ligand binding pocket information from ConCavity, and phylogenetic information from INTREPID.12, 13 POOL has been validated to accurately predict residues important for function3, 10 using the CSA-100, which is a benchmark set of 100 well-characterized enzymes with experimentally verified important residues from the Catalytic Site Atlas.14, 15

It is known that in the active site, mutations of residues that contact the substrate can have significant effects depending on the function of the residue, but recent work on enzyme active sites has shown that long-range interactions can also play a role in the biochemical function of the enzyme.4, 16 For instance, in dihydrofolate reductase, which catalyzes the reduction of 7,8-dihydrofolate to 5,6,7,8-tetrahydrofolate, M42 is located 10 Å from the active site and showed a 42-fold reduction in rate when mutated to tryptophan.17 In the present work, remote residues are classified into 'shells' based on their distance from the active site. Residues within 5 Å of the substrate are denoted as 'First shell,' other residues within 5 Å of the first shell are denoted as 'Second shell,' and so on. It has been shown that enzymes can have either extended or compact active sites. For ketosteroid isomerase (KSI) from Pseudomonas putida, POOL predicted a compact active site with little remote residue participation. However, for a different isomerase, human phosphoglucose isomerase, an extended active site was predicted and mutations at POOL-predicted positions in the second and third shells result in significant decreases in catalytic efficiency of the enzyme. There was also no evidence that the remote residue variants significantly alter the structure of the enzyme.18 Similarly, it has been reported that second- and third-shell residues in cobalt-dependent nitrile hydratase from Pseudomonas putida participate significantly in catalysis, as predicted.19 DinB, a DNA damage bypass polymerase from E. coli, was also shown to have an extended active site, as predicted by POOL.20 In the case of DinB, mutations of residues in contact with active site residues, the so-called second-shell residues, cause marked defects in the extension step of damage bypass.20 Notably, the validated predictions of important residues by POOL in DinB were based on a homology model of DinB, since at the time of the study the crystal structures of DinB had not yet been reported.21

DNA Pol III is the major replicative polymerase in E. coli, carrying out the crucial step of DNA replication. The Pol III holoenzyme is a complex of 10 subunits, wherein the alpha subunit is the catalytic subunit that carries out the polymerization reaction.22, 23 The alpha subunit (Pol III alpha) operates in the cell as part of the polymerase core, which includes the epsilon proofreading subunit and the theta subunit that helps to stabilize the core.24 The C family polymerases, of which DNA Pol III alpha is a member, do not share homology with other polymerase families except for a similar overall structural architecture, which is in the form of a right hand (Figure 1a). It was only in 1999 that Pritchard et al. identified the positions of the three catalytic aspartate residues through alignments of 20 Pol III sequences from different organisms.25 These residues were subsequently identified to play a catalytic role in the mechanism of phosphoryl transfer, as Pol III shows severe loss of activity upon single mutation of each of these three residues.25 The three Asp residues identified, D401, D403 and D555, are located in the palm domain of Pol III, along with the two divalent Mg2+ ions that are essential for catalysis. The DNA is held in place by the finger and thumb domains. The polymerase also has a PHP (polymerase and histidinol phosphate phosphatase) domain at the N terminus, which has been shown to have pyrophosphatase activity.26-28 The OB fold domain is responsible for binding to single-stranded DNA.29, 30

Though several studies have identified mutator and antimutator phenotypes of alpha to understand its fidelity,31, 32 less is known about the role of additional residues surrounding the catalytic aspartate residues. Recent studies on enzyme active sites have shown that residues remote from the canonical active site may also contribute significantly to catalysis.18-20, 33, 34 The aim of this study is to use the validated computational tools THEMATICS and POOL to predict important residues within, and distal to, the active site and to investigate experimentally their role in the functioning of DNA Pol III alpha. Indeed, among the top-ranked predicted residues are second-shell residues D405, Y686, and E688 that are ≥7 Å from the incoming nucleotide and result in loss of activity upon mutation.


Figure 1. Eco Pol III structure and sequence overview. a: Surface representation of the different domains, colored according to panel c. b: POOL-predicted residues evaluated experimentally for functional importance. The known catalytic aspartates are in red, other first shell residues in purple, and remote residues are green. c: Sequence map of the POOL-predicted residues relative to the different domains. Remote residues are labeled above the bar, while first shell residues are below.

MATERIALS AND METHODS

Homology Modeling of Ternary Complex. The structure of the truncated alpha subunit (residues 1-917) has been determined (PDB ID 2HNH).28 In order to study the interactions between the protein and its substrate, a homology model for the truncated alpha with DNA bound, available as Supporting Information, was built with the YASARA suite of programs from two templates: Thermus aquaticus Taq Pol III alpha (PDB ID 3E0D) and Geobacillus kaustophilus Gka Pol C (PDB ID 3F2B).35-37 The two templates share 40% and 25% sequence identity with Eco Pol III alpha, respectively. The quality of the model was evaluated using Z-scores calculated by YASARA and also through analysis of a Ramachandran plot generated using the MolProbity server; 96.5% of all residues are in favored regions and 99.4% of all residues are in allowed regions.38,39

Active Site Prediction and Remote Residue Analysis. THEMATICS and POOL calculations were carried out for alpha (PDB 2HNH) excluding residues 1-270 (PHP domain).7,9 The input features used for POOL were the electrostatic features of THEMATICS and phylogenetic information from INTREPID. The top 15 residues in the rank-ordered POOL predictions have been used in this study. Although this cutoff was chosen somewhat arbitrarily to keep this study a manageable size, most residues in the top 15 are important for activity; it is likely that some lower-ranked predicted residues will also be important for activity. The distances between the incoming nucleotide and the top POOL-ranked residues in the model were measured using YASARA. Since DNA polymerases have two substrates, DNA and the incoming nucleotide, the incoming nucleotide was chosen as the substrate due to its proximity to the reaction center. Any residue that lies within 5 Å of the incoming nucleotide was classified as "First shell" and any other residue that lies within 5 Å of one or more first shell residues was termed "Second shell."

Site-directed Mutagenesis. Plasmid pET28a (a generous gift of Meindert Lamers and John Kuriyan) encoding kanamycin resistance and His-tagged alpha (UniProt P10443) was used as a template to design the variants, which were then constructed by QuikChange Lightning Mutagenesis (Agilent).28 The variants were confirmed by sequencing (MGH Core Facility or Eton Biosciences). In general, 700-900 nucleotides surrounding the region of the mutation were sequenced; a selection of active and inactive variants (R390A, D405N, K553A, and K758L) was chosen for complete gene sequencing, which confirmed that the gene is intact.

Protein Purification. The variants were expressed in E. coli BL21 Tuner pLysS cells in 1 L Luria Broth with kanamycin (30 µg/mL).28 After the cells reached OD600 ~ 0.8, expression of the alpha subunit was induced by adding IPTG to a final concentration of 1 mM with shaking at 200 rpm for 4 h at 30 °C, after which cells were collected by centrifugation. The cell pellets were resuspended in buffer containing 50 mM HEPES, 250 mM NaCl, 10% glycerol and 2 mM β-mercaptoethanol and lysed by sonication and addition of lysozyme. The clarified lysate was purified using a nickel column (HisTrap HP, GE Healthcare) and further with a heparin column (HiTrap Heparin HP, GE Healthcare). The fractions containing Pol III alpha were confirmed by SDS-PAGE. The fractions were concentrated using Vivaspin concentrators with a 10,000 molecular weight cutoff membrane and the concentration of alpha was determined by Bradford assay. SDS-PAGE analysis of purified proteins is shown in Figure S1 of the Supporting Information.

Thermal stability assay. Samples were prepared with 5 µM alpha variants in buffer (30 mM HEPES, pH 7.5, 20 mM NaCl, 10 mM MgSO4, 1 mM DTT) and 10X concentration SYPRO® Orange (Invitrogen). Each sample (15 µL) was added to a 96-well PCR plate, which was sealed and briefly centrifuged. The fluorescence values were measured from 4 °C to 75 °C for each well using a Bio-Rad CFX96 Real-Time PCR Detection System and melting temperatures (Tms) were calculated from the melt curve. Values reported are the average of at least three trials. Errors are reported as the standard deviations.


Primer Extension Activity Assay. Template and 32P 5'-end labeled primer DNA were annealed in annealing buffer (20 mM HEPES, pH 7.5, 5 mM Mg(OAc)2) by heating at 95 °C for 2 min to denature the two strands, followed by 50 °C to anneal the strands and finally cooling to 37 °C, to give a final concentration of 500 nM DNA. Each reaction mixture contained the reaction buffer (30 mM HEPES, pH 7.5, 20 mM NaCl, 10 mM MgSO4, 1 mM DTT, 100 µg/mL BSA, and 4% glycerol), 100 nM 32P-labeled annealed DNA and 25 nM alpha variant in 10 µL. The reaction was initiated with 1000 µM dNTP mixture. The reaction was quenched in 20 µL loading buffer (90% formamide, 50 mM EDTA, 0.025% xylene cyanol, and 0.025% bromophenol blue) at the 2, 5, 15 and 30 min time points. A zero-min time point aliquot was removed prior to adding the mixture of all four standard dNTPs. The reaction products were analyzed by separation on a 14% polyacrylamide gel and imaged on a Molecular Dynamics storage phosphor imaging screen with a Storm 860 imager. Quantification was performed using ImageQuant. The percent of primers extended to any length beyond the primer was calculated from the ratio of primers extended to the total amount of DNA. The DNA template and primer sequences are shown. Variants were assayed at least three times; representative data are presented here.

Annealed Primer Template DNA:
5' GGT TAC TCA GAT CAG GCC TGC GAA GAC CTG GGC GTC CGG CTG CAG CTG TAC TAT CAT ATG C 3'
                                        3' CCG CAG GCC GAC GTC GAC ATG ATA GTA TAC G 5'
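As an illustrative check that is not part of the original methods, the annealed pair listed above can be tested for Watson-Crick complementarity and the first templating base can be read off. The sketch below uses only the Python standard library; the constant and function names are assumptions introduced here, and the sequences are copied from the listing above.

```python
# Sequences as printed: template 5'->3', primer 3'->5' (antiparallel to the template).
TEMPLATE = ("GGT TAC TCA GAT CAG GCC TGC GAA GAC CTG GGC GTC CGG CTG CAG CTG "
            "TAC TAT CAT ATG C").replace(" ", "")
PRIMER_3_TO_5 = "CCG CAG GCC GAC GTC GAC ATG ATA GTA TAC G".replace(" ", "")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base complement, without reversing the strand."""
    return seq.translate(_COMPLEMENT)


def next_template_base(template, primer_3_to_5):
    """Return the first template base to be copied by the polymerase.

    The primer is assumed to pair with the 3' end of the template, so the
    first templating base is the one immediately 5' of the duplex region.
    """
    duplex = template[-len(primer_3_to_5):]
    if complement(duplex) != primer_3_to_5:
        raise ValueError("primer is not complementary to the template 3' end")
    return template[-len(primer_3_to_5) - 1]


if __name__ == "__main__":
    base = next_template_base(TEMPLATE, PRIMER_3_TO_5)
    print("first templating base:", base, "-> incoming dNTP pairs as", complement(base))
```

Under these assumptions the 31-nucleotide primer pairs with the 3' end of the 61-nucleotide template and the first templating base is G, consistent with dCTP serving as the incoming nucleotide in the single-nucleotide incorporation assays.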

Steady-State Kinetics. 32P-labeled DNA was annealed in the same manner as for the primer extension assays. The same reaction buffer and concentration of labeled DNA were used as above, but the reaction mixture contained the alpha variant at 1 nM to 25 nM, depending on the variant, in 20 µL reactions. The varied substrate was dCTP, the incoming nucleotide, because the template base is G. Four time points were taken at intervals of 15, 30, or 120 seconds and each was quenched in 10 µL loading buffer. Analysis was performed as above, except that in this case only one nucleotide is added, so the amount of +1 primer was determined relative to the total amount of DNA to obtain the percent extension at each time point. Initial velocities at each substrate concentration were used to obtain Michaelis-Menten kinetic parameters. Values reported are the average of at least three trials. Errors are reported as the standard deviations. Although both of the assays described immediately above involve primer extension, we will refer to the former as primer extension assays and to the latter, the single-nucleotide incorporation experiments, as kinetics assays.

Complementation Assay. DNA plasmids encoding E. coli (Eco) Pol III alpha were mixed with DV17 competent cells, a strain that is temperature-sensitive due to a mutation in the dnaE gene that encodes alpha.31, 40 The mixture was placed on ice for 10 min, then held at 30 °C for 5 min followed by another 10 min on ice. LB broth was added to 10X the original volume and the culture was incubated at 30 °C for 2 hours while shaking at 200 rpm. Equal volumes of cultures were plated on kanamycin (30 µg/mL) LB-agar plates and incubated at three separate temperatures: 30 °C, 37 °C, and 42 °C. For those grown at higher temperatures, colonies were counted after 1 day, while those grown at 30 °C were counted after 2 days. Assays were completed in triplicate for variants.

Active site comparison with other polymerases. The three structures available for C family polymerases, 2HNH, 3E0D, and 3F2B, were aligned using the Matchmaker tool in UCSF Chimera software.28, 35, 36, 41 The local active site of 2HNH was manually aligned with the active site of human DNA polymerase β (PDB ID 1BPX) using PyMOL software.42-44

RESULTS


POOL predicts an extended active site for DNA Pol III alpha. The top 15 POOL predictions using THEMATICS and INTREPID as input features were chosen for further analysis (Table 1). The 15 residues were designated into shells based on their distance from the incoming nucleotide. As can be seen for DNA Pol III alpha, POOL predicts an extended active site, with seven of the top 15 residues located beyond the first shell (Figure 1). Seventeen single-site variants were created to probe the contributions of the predicted residues to the biochemical function of alpha. Conservative mutations were chosen to test the role of charge or size. Alternatively, the residue was mutated to alanine to change both size and charge. In order to determine possible effects of the point mutations on protein stability, the variants were subjected to thermal shift assays to obtain their melting temperatures (Table 2). All seventeen variants show melting temperatures that are similar to that of wild-type (WT) Pol III alpha, with the largest difference being less than 3 °C. Thus, none of the variants constructed here cause a major change in overall stability of alpha.
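The shell designation described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' YASARA workflow: the function names, the toy coordinates, and the use of the minimum atom-to-atom distance are assumptions made here; only the 5 Å cutoff comes from the text.

```python
import math


def min_distance(atoms_a, atoms_b):
    """Smallest pairwise distance (in Å) between two lists of (x, y, z) atoms."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def assign_shells(substrate_atoms, residues, cutoff=5.0):
    """Map residue name -> shell number (1 = first shell, 2 = second shell, ...).

    A residue joins shell n if it lies within `cutoff` of any member of
    shell n-1 (the substrate counts as shell 0). Residues that are never
    reached are left out of the result.
    """
    shells = {}
    frontier = [substrate_atoms]
    shell = 0
    while frontier:
        shell += 1
        reached = [
            name for name, atoms in residues.items()
            if name not in shells
            and any(min_distance(atoms, group) <= cutoff for group in frontier)
        ]
        for name in reached:
            shells[name] = shell
        frontier = [residues[name] for name in reached]
    return shells


if __name__ == "__main__":
    # Hypothetical one-atom "residues" placed along the x axis, for illustration only.
    substrate = [(0.0, 0.0, 0.0)]
    toy_residues = {
        "A": [(3.0, 0.0, 0.0)],
        "B": [(7.0, 0.0, 0.0)],
        "C": [(11.0, 0.0, 0.0)],
        "D": [(30.0, 0.0, 0.0)],
    }
    print(assign_shells(substrate, toy_residues))
```

In the toy input, residue B is 7 Å from the substrate but only 4 Å from the first-shell residue A, so it is assigned to the second shell; residue D is never reached and would correspond to a more remote position.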


Table 1. Top 15 POOL-ranked residues in order of rank

Residue no.  Residue  Shell   Distance from nucleotide (Å)  Domain
403          ASP      First   2.7                           Palm
401          ASP      First   2.4                           Palm
340          TYR      Second  8.1                           Palm
555          ASP      First   4.6                           Palm
405          ASP      Second  7.1                           Palm
390          ARG      First   1.8                           Palm
686          TYR      Second  10.5                          Finger
709          ARG      First   9.5 (a)                       Finger
553          LYS      First   6.9 (a)                       Palm
758          LYS      Second  8.1                           Finger
630          ASP      Remote  16.9                          Finger
362          ARG      First   1.8                           Palm
547          GLU      Second  7.8                           Palm
760          HIS      First   2.7                           Finger
688          GLU      Second  8.0                           Finger

Columns "Interaction w 1st shell residues" and "Other interacting residues" (entries in source order; the row alignment is not preserved in this copy):
Ranks 1-8: D401, F402; R362, D403; F756; E688, K758; -
Ranks 9-15: N757; Y686, E688; K553, D401; Q687; Y686, K758

Residues indicated in red in the original table are distal residues. (a) Although these residues are >5 Å from the dNTP substrate, they are considered first-shell residues since there are no contacting residues between them and the incoming dNTP.

Table 2. Melting temperature of Pol III alpha variants

Variant  Melting Temperature (Tm, °C)
WT       42.7 ± 0.6
Y340S    40.0 ± 0.0
Y340F    40.7 ± 0.3
D405N    41.5 ± 0.0
R390A    41.5 ± 0.0
Y686F    40.5 ± 0.0
Y686A    42.5 ± 0.0
R709A    42.8 ± 0.3
K553A    42.7 ± 0.3
K758L    43.0 ± 0.0
D630N    41.8 ± 0.3
D630A    42.0 ± 0.0
R362K    41.5 ± 0.0
R362A    41.5 ± 0.0
E547Q    42.7 ± 0.3
E547A    41.8 ± 0.3
H760L    40.5 ± 0.0
E688Q    42.0 ± 0.0

The average melting temperature along with the associated standard deviation is displayed for each variant.
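The statement that no variant differs from WT by 3 °C or more can be checked directly from the mean Tm values in Table 2. The following is a small sketch for that comparison; the dictionary and function names are assumptions introduced here, and the standard deviations are ignored for simplicity.

```python
# Mean melting temperatures (°C) as listed in Table 2.
TM_C = {
    "WT": 42.7, "Y340S": 40.0, "Y340F": 40.7, "D405N": 41.5, "R390A": 41.5,
    "Y686F": 40.5, "Y686A": 42.5, "R709A": 42.8, "K553A": 42.7, "K758L": 43.0,
    "D630N": 41.8, "D630A": 42.0, "R362K": 41.5, "R362A": 41.5, "E547Q": 42.7,
    "E547A": 41.8, "H760L": 40.5, "E688Q": 42.0,
}


def tm_shifts(tm, reference="WT"):
    """Return variant -> (Tm(variant) - Tm(reference)), rounded to 0.1 °C."""
    return {
        name: round(value - tm[reference], 1)
        for name, value in tm.items()
        if name != reference
    }


if __name__ == "__main__":
    shifts = tm_shifts(TM_C)
    largest = max(shifts, key=lambda name: abs(shifts[name]))
    print(f"largest shift: {largest} ({shifts[largest]:+.1f} °C)")
```

With the tabulated means, the largest shift is for Y340S at 2.7 °C below WT, in line with the "less than 3 °C" statement in the text.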


Primer extension assays verify POOL predictions. WT Pol III alpha and variant enzymes were assayed for DNA primer extension activity with a mixture of all four standard nucleotides. Variants harboring mutations at residues 401, 403 and 555 were previously characterized and are not included in our analysis.25 Pol III alpha variants D405N, Y686A, K553A, R362A, H760L and E688Q show complete loss of activity (Figure S2 and Table S1 of the Supporting Information). The mutations Y340S, Y340F, R362K, R390A, R709A, K758L and D630N resulted in less primer extension activity compared to WT alpha (Figure 2). Interestingly, Pol III alpha variants harboring either of the two non-conservative mutations, D630A and E547A, were as active as WT alpha (Figure 2).

For the variants with detectable activity in primer extension assays, single-nucleotide incorporation assays were performed with dCTP as the incoming nucleotide to obtain their Michaelis-Menten kinetic parameters. Five of the seven variants with lowered primer extension activity showed a corresponding decrease in catalytic efficiency, ranging from 2.1- to 33-fold relative to WT alpha (Table 3). Whereas the variant D630N showed a decrease in primer extension, its kinetic parameters were similar to those of WT alpha. Only one POOL-predicted residue, E547, showed essentially no deviation from WT alpha upon mutation in both the primer extension and overall catalytic efficiency in single-nucleotide incorporation assays. The variants R709A and Y686F showed a slight increase in the catalytic efficiency of single-nucleotide incorporation relative to WT alpha.

Y686 is a remote residue (Figure 1, 3a) that shows complete loss of activity upon mutation to alanine. However, placing a phenylalanine at the position results in slightly increased activity, in both the primer extension and single-nucleotide incorporation kinetics assays. This shows that the phenyl ring is necessary for activity and also implicates the hydroxyl group in reducing basal activity. Genetic studies have shown that Y686H confers a temperature-sensitive mutator phenotype to the cell, but the mutator effect of this mutation was attributed to a decreased ability of alpha to compete for the 3'-primer terminus with low-fidelity DNA Pol V, a Y-family polymerase.31 K758 and E688 are also second-shell residues, located 8 Å from the incoming nucleotide. Y686, E688, and K758 form an interacting network (Figure 3b). The K758L variant resulted in reduced activity, while the E688Q variant showed complete loss of activity. Disruption of the Y686-E688-K758 interaction may alter the electrostatic environment of the active site, suggesting that this network plays a role in catalysis. H760 likely packs against G363, which is part of a loop harboring R362. In addition, the homology model suggests that H760 is adjacent to the ribose of the incoming nucleotide.


Figure 2. Percent primer extension activity of Pol III variants for which any activity was detected (Table S1 of the Supporting Information); PAGE analysis (Figure S2) is included in the Supporting Information. Plot of the percentage of the primer that has been extended after 2, 5, 15, and 30 min. The percent of primers extended beyond the primer was calculated from the ratio of primers extended to the total amount of primer.
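The percent-extension calculation in the caption above amounts to a ratio of band intensities within one gel lane. A minimal sketch follows, assuming intensities keyed by product length; the function name and the example lane are hypothetical and do not come from the paper's data.

```python
def percent_extended(band_intensities, primer_length):
    """Percent of labeled primer extended beyond its starting length.

    band_intensities maps product length (nt) to integrated band intensity
    for a single lane; any band longer than the primer counts as extended.
    """
    total = sum(band_intensities.values())
    if total <= 0:
        raise ValueError("lane has no signal")
    extended = sum(
        intensity
        for length, intensity in band_intensities.items()
        if length > primer_length
    )
    return 100.0 * extended / total


if __name__ == "__main__":
    # Hypothetical lane: unextended 31-mer plus two longer products.
    lane = {31: 75.0, 32: 15.0, 35: 10.0}
    print(percent_extended(lane, primer_length=31))
```

The same function covers the single-nucleotide incorporation assays, where the only extended band is the +1 product.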


Table 3. Kinetic Parameters of Alpha and Variants

Variant  Residue Shell  kcat (min^-1)  KM (µM)     Catalytic Efficiency (µM^-1 min^-1)  Fold Decrease
WT       -              50 ± 10        120 ± 30    0.42 ± 0.1                           -
D405N    Second         No Activity Detected
Y686A    Second         No Activity Detected
K553A    First          No Activity Detected
R362A    First          No Activity Detected
H760L    First          No Activity Detected
E688Q    Second         No Activity Detected
Y340F    Second         25 ± 5         130 ± 20    0.20 ± 0.03                          2.1
Y340S    Second         12 ± 9         150 ± 6     0.08 ± 0.05                          5.4
R390A    First          14 ± 9         900 ± 200   0.015 ± 0.008                        27
Y686F    Second         120 ± 50       79 ± 10     1.6 ± 0.7                            0.26
R709A    First          60 ± 20        100 ± 10    0.58 ± 0.2                           0.73
K758L    Second         1.4 ± 0.7      120 ± 30    0.012 ± 0.01                         33
D630A    Remote         55 ± 8         170 ± 43    0.34 ± 0.1                           1.24
D630N    Remote         40 ± 10        140 ± 50    0.31 ± 0.03                          1.4
R362K    First          5 ± 2          400 ± 200   0.014 ± 0.008                        29
E547A    Second         90 ± 70        180 ± 76    0.43 ± 0.21                          0.93
E547Q    Second         120 ± 20       230 ± 30    0.51 ± 0.03                          0.79

The measured kinetic parameters (Figures S3 and S4 of the Supporting Information) for all variants are listed. The six variants at the top had no detectable primer extension activity. All catalytic efficiencies are compared to that of WT. Variants are listed in order of POOL rank of the respective residues, except that the inactive variants are grouped separately from the active variants.
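The quantities in Table 3 are related by the standard Michaelis-Menten expressions: initial velocity v0 = kcat[E][S] / (KM + [S]), catalytic efficiency = kcat / KM, and fold decrease = (kcat/KM) of WT divided by (kcat/KM) of the variant. The sketch below illustrates these relationships; the function names are assumptions introduced here, and recomputing from the rounded table entries only approximates the reported values, which were presumably derived from unrounded fits.

```python
def mm_velocity(substrate, kcat, km, enzyme):
    """Michaelis-Menten initial velocity (µM/min) for kcat in min^-1 and
    substrate, km, and enzyme concentrations in µM."""
    return kcat * enzyme * substrate / (km + substrate)


def catalytic_efficiency(kcat, km):
    """kcat / KM, in µM^-1 min^-1."""
    return kcat / km


def fold_decrease(wt_efficiency, variant_efficiency):
    """Values above 1 mean the variant is less efficient than WT."""
    return wt_efficiency / variant_efficiency


if __name__ == "__main__":
    wt = catalytic_efficiency(kcat=50, km=120)   # rounded WT entries from Table 3
    print(f"WT efficiency from rounded entries: {wt:.2f}")
    print(f"Y340F fold decrease from tabulated efficiencies: {fold_decrease(0.42, 0.20):.1f}")
```

A fold decrease below 1, as listed for Y686F and R709A, corresponds to a catalytic efficiency higher than that of WT.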

R390 and R709 are part of the conserved positively-charged residues in the fingers domain that interact with the phosphate groups of the incoming nucleotide (Figure 3a). R390A shows substantial loss of activity in both general primer extension and in single-nucleotide incorporation assays, but R709A shows only a slight decrease in primer extension activity and a slight decrease in single-nucleotide incorporation kinetics activity. Most of the 30-fold reduction in catalytic efficiency in the R390A variant is due to an increase in KM. The modest activity that is observed could be due to the presence of R710 in the vicinity, which could compensate for the absence of a single arginine. We were unable to detect binding interactions of the WT or any of the variant enzymes with a gel-shift assay; previous attempts by Yanagihara to detect DNA binding with a filter binding assay were also unsuccessful.45


Figure 3. Active site views of the pol III alpha subunit. Residues (a) R709, R390, and Y686; (b) E688 and K758; (c) K553 and Y340; and (d) R362 and D405. In all images, the active site Asp residues are in red and the incoming dNTP is colored by atom identity. Distal residues are colored purple while first-shell residues are blue. Residues essential for activity are circled.


Complementation assays probe the ability of variants to function in vivo

Specific mutations in Eco Pol III can cause temperature sensitivity, eliminating growth at higher temperatures.31, 32 Strains harboring temperature-sensitive alleles of alpha were subjected to complementation by WT and the different variants to determine which variants were able to rescue growth at the nonpermissive temperature. None of the inactive variants were able to restore growth at the nonpermissive temperatures (Figure 4). In vitro activity alone did not guarantee growth at higher temperatures; for example, variants Y340S and K758L both had some primer extension activity but failed to complement the temperature-sensitive strain. While activity alone is not sufficient to alleviate the sensitivity, there may be a minimum level of activity needed, as variants showing 5-fold or greater decreases in catalytic efficiency fail to complement the temperature-sensitive strain; the Y340S, R390A, K758L, and R362K variants are in this category.

Y340 is a second-shell residue located behind D401 with respect to the substrate. The Y340F variant retains its activity and complements the temperature-sensitive strain for growth. The Y340S variant, which shows a somewhat larger 5.4-fold loss of activity, fails to complement the temperature-sensitive strain, suggesting that the aromatic ring on the side chain contributes to catalysis. After 5 min, alpha Y340S extends only 4% of the primers compared to 34% for WT alpha. In the structure, Y340 shows a stacking interaction with F554. This stacking interaction could be important for the correct orientation of residues D555 and K553 and would be disrupted on mutation of Y340 to S (Figure 3c).
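The activity threshold described above can be written as a simple filter over the fold decreases reported in Table 3. The following sketch assumes a cutoff of exactly 5-fold, as stated in the text; the names are illustrative, and the rule is an empirical observation on this data set rather than a predictive model. Variants with no detectable activity also fail to complement and are not listed here.

```python
# Fold decreases in catalytic efficiency for the active variants (Table 3).
FOLD_DECREASE = {
    "Y340F": 2.1, "Y340S": 5.4, "R390A": 27, "Y686F": 0.26,
    "R709A": 0.73, "K758L": 33, "D630A": 1.24, "D630N": 1.4,
    "R362K": 29, "E547A": 0.93, "E547Q": 0.79,
}

# Cutoff taken from the text: a 5-fold or greater decrease fails to complement.
THRESHOLD = 5.0


def below_activity_threshold(fold):
    """True when the loss of catalytic efficiency is 5-fold or greater."""
    return fold >= THRESHOLD


failing = sorted(v for v, f in FOLD_DECREASE.items() if below_activity_threshold(f))
print(failing)  # ['K758L', 'R362K', 'R390A', 'Y340S']
```

The four variants returned are the same four named in the text as failing to complement despite having measurable activity.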

[Figure 4 image: relative CFU (37 °C/30 °C) for each alpha variant, with the fold decrease from Table 3 listed beneath the plot.]

Figure 4. Complementation of temperature-sensitive DV17 cells by alpha variants. Shown is the ratio of colony-forming units (cfu) for each variant at the higher, non-permissive temperature of 37 °C relative to the permissive temperature of 30 °C. The ratio of cfu at 42 °C relative to 30 °C is given in Supporting Information Figure S5. The table at the bottom indicates the fold decrease in catalytic efficiency from steady-state kinetics, relative to WT alpha, given in Table 3. The notation “nd” indicates that no activity was detected.
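The quantity plotted in Figure 4 is a ratio of colony-forming units at two temperatures. As a rough illustration of how such a ratio might be computed from plate counts, the sketch below uses the standard conversion from colonies to cfu per mL; the colony counts, dilution, and plated volume are hypothetical and are not taken from the paper, and the function names are illustrative.

```python
def cfu_per_ml(colonies, dilution, volume_ml):
    """Convert a plate count to cfu/mL (dilution given as a fraction, e.g. 1e-5)."""
    if dilution <= 0 or volume_ml <= 0:
        raise ValueError("dilution and volume must be positive")
    return colonies / (dilution * volume_ml)


def relative_cfu(cfu_nonpermissive, cfu_permissive):
    """Ratio of cfu at the non-permissive temperature to cfu at the permissive one."""
    if cfu_permissive <= 0:
        raise ValueError("permissive-temperature cfu must be positive")
    return cfu_nonpermissive / cfu_permissive


# Hypothetical counts for one variant, same dilution and volume at both temperatures.
at_37 = cfu_per_ml(colonies=135, dilution=1e-5, volume_ml=0.1)
at_30 = cfu_per_ml(colonies=150, dilution=1e-5, volume_ml=0.1)
print(f"relative CFU (37 °C/30 °C) = {relative_cfu(at_37, at_30):.2f}")
```

A ratio near 1 indicates that the variant supports growth at the non-permissive temperature about as well as at the permissive one; a ratio near 0 indicates failure to complement.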

DISCUSSION

DNA pol III is the replicative DNA polymerase in E. coli and therefore is of central importance to survival. We applied the computational active site predictor POOL to identify residues that contribute to pol III activity, focusing on the alpha polymerase subunit of pol III. POOL correctly predicted the active site aspartic acid residues.25 We focused on the top 15 POOL-predicted residues, which included residues in close contact with the incoming nucleotide substrate as well as residues more remote from the active site. Residues in contact with the incoming nucleotide substrate are defined as the first shell. Residues that are >5 Å from the substrate are considered second-shell residues. By constructing and assaying variants based on the POOL predictions, we identified six variants resulting in a complete loss of activity, and three additional mutations that resulted in substantial losses in activity. Notably, several second-shell residues, including D405, Y686, E688, and K758, were among those showing the largest decreases in activity. Determination of overall protein stability, as measured by melting temperature, revealed that all variants had similar stability.

The crystal structure (PDB 2HNH) of alpha was obtained without DNA bound and our model was built based on a DNA-bound template; upon ligand binding some changes in conformation and in rotameric states are expected. While there is some uncertainty in the side chain positions of a model structure, the active site residues of the model align reasonably well with those of the Eco Pol III crystal structure. Figure 5 shows an alignment of the active site region of Pol III in the apo Eco crystal structure (PDB 2HNH) and our homology model constructed from a DNA-bound template structure. The three previously known catalytic aspartate residues25 are shown in red for the crystal structure and green for the model. Newly identified residues are colored yellow for the crystal structure and by element for the model. Note that the relative positions of the residues are generally similar, although second-shell residues D405 and R362 are shifted a little closer to each other in the model. The exception is second-shell residue K553, which forms a salt bridge to the catalytic D555 in the apo crystal structure but is shifted and forms a salt bridge with the catalytic D401 in the model. In the Taq structure, the distance between the NZ atom of K616 (equivalent to K553 of Eco) and the OD2 atom of D463 (equivalent to D401 of Eco) is 3.5 Å; the NZ atom of K616 is also 4.8 Å from the OD2 atom of D618 (equivalent to D555 of Eco). The proximity of the second-shell K553 (Taq K616) to both catalytic aspartate residues suggests possible dynamic shifts in these contacts, as in the case of pol β R254,46 which aligns structurally with Eco K553.

In addition to a structure of truncated Eco Pol III alpha (PDB ID 2HNH), only two other structures of C family polymerases are currently available. Thermus aquaticus (Taq) alpha (PDB ID 3E0D) is a homolog of Eco Pol III alpha; PolC (PDB ID 3F2B) is a replicative C family polymerase from Gram-positive Geobacillus kaustophilus (Gka).28, 35, 36 The active site residues in the Eco and Taq structures are identical, and the two proteins share an overall 40% sequence identity. Gka PolC belongs to a different class of C family polymerases that are found in low G+C Gram-positive bacteria and shares a low sequence identity of 25% with Eco alpha.47 Even with the low sequence identity, 10 of the 15 POOL-predicted positions are conserved in the two proteins (Figure 5).

Pol β is a human DNA repair polymerase that belongs to the X family of polymerases. It has been observed that the palm domains of Pol III alpha and Pol β share a similar nucleotidyltransferase fold.28 In a previous study, the catalytic aspartate residues of Taq alpha were shown to align structurally with the respective aspartate residues in rat Pol β.48 A manual structure alignment of the top 15 POOL-predicted residues of Pol III alpha with human Pol β identified a number of other conserved positions that may help to infer functional roles for these residues. The aligned spatial positions across the active sites of these polymerases can be found in Figure 5b.
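The interatomic distances quoted above for the Taq structure (3.5 Å and 4.8 Å) are plain Euclidean distances between atom coordinates. The sketch below shows that calculation together with a simple distance test for a salt bridge. The coordinates are placeholders chosen to reproduce the two quoted distances and are not taken from PDB 3E0D, and the 4.0 Å cutoff is a commonly used convention assumed here, not a value stated in the text.

```python
import math

# Assumed cutoff (Å) for calling an N...O contact a salt bridge; not from the paper.
SALT_BRIDGE_CUTOFF = 4.0


def distance(atom_a, atom_b):
    """Euclidean distance between two (x, y, z) coordinates in Å."""
    return math.dist(atom_a, atom_b)  # requires Python 3.8+


def is_salt_bridge(dist_angstrom, cutoff=SALT_BRIDGE_CUTOFF):
    """True when the N...O distance is within the assumed cutoff."""
    return dist_angstrom <= cutoff


# Placeholder coordinates that reproduce the quoted distances.
k616_nz = (0.0, 0.0, 0.0)
d463_od2 = (3.5, 0.0, 0.0)
d618_od2 = (0.0, 4.8, 0.0)

for label, atom in (("D463 OD2", d463_od2), ("D618 OD2", d618_od2)):
    d = distance(k616_nz, atom)
    print(f"K616 NZ to {label}: {d:.1f} Å, salt bridge: {is_salt_bridge(d)}")
```

Under this assumed cutoff, the lysine would be within salt-bridge range of one aspartate while remaining close to the other, which is consistent with the suggestion that the contact could shift between the two.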


The D405N and R362A variants show complete loss of activity. D405 is a second-shell residue, behind first-shell R362 with respect to the substrate. In the homology model, they lie at a distance of 1.8 Å from each other, which could indicate formation of a salt bridge (Figure 3d). The R362K variant retains detectable activity but displays almost a 30-fold reduction in catalytic efficiency and fails to complement the temperature-sensitive strain for growth. While the charged character of the lysine side chain affords activity, it cannot fully replace the arginine. In the Eco alpha structure, R362 is flipped away from D405, but in human Pol β, the residue R258 (corresponding to Eco R362) forms a salt bridge with D192 (Eco D403) in the inactive form. After a conformational change in the presence of the correct nucleotide, R258 forms a salt bridge with E295 (Eco D405) to release D192 to carry out catalysis.46, 49 In the manual alignment (Figure 5) D405 and E295 do not appear to be aligned, but D405 could still play a similar role in modulating the active conformation.


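| Polymerase | PDB ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Eco alpha | 2HNH | D403 | D401 | Y340 | D555 | D405 | R390 | Y686 | R709 | K553 | K758 | D630 | R362 | E547 | H760 | E688 |
| Taq alpha | 3E0D | D465 | D463 | Y402 | D618 | D467 | R452 | Y743 | R766 | K616 | K815 | D689 | R424 | E610 | H817 | E745 |
| Gka alpha | 3F2B | D975 | D973 | F869 | D1098 | N977 | T962 | C1212 | M1234 | K1096 | K1273 | E1177 | R893 | H1091 | H1275 | D1214 |
| pol β | 1BPX | D192 | D190 | M155 | D256 | E295* | R183 | F278 | R40 | R254 | R328 | H285 | R258 | - | Y271 | E335 |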

Figure 5. a: Alignment of the active site region of Pol III in the crystal structure (PDB 2HNH) and our homology model. The three previously known catalytic aspartate residues are shown in red for the crystal structure and green for the model. Newly identified residues are colored yellow for the crystal structure and by element for the model. A calcium ion is rendered as a yellow ball. Image rendered in YASARA. b: (Top) Active site comparison of Eco Pol III and human Pol β. Most active site residues of Pol III (blue) have direct counterparts in Pol β (green). (Bottom) Active site comparison for the top 15 POOL predictions across related polymerases. *E295 does not align perfectly at that position but may play a similar role.
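The residue identities listed in the bottom panel of Figure 5b can be compared position by position to see how many of the 15 POOL-predicted positions are conserved between Eco alpha and Gka PolC. The sketch below is illustrative only: treating D/E, K/R, and F/W/Y as similar pairs is an assumption made here, not a definition given in the text, and the names are hypothetical. With that grouping, the count is 7 identical plus 3 similar positions, which matches the 10 conserved positions cited in the Discussion.

```python
from collections import Counter

# Aligned residues at the 15 positions of Figure 5b (Eco alpha vs Gka PolC).
ECO = "D403 D401 Y340 D555 D405 R390 Y686 R709 K553 K758 D630 R362 E547 H760 E688".split()
GKA = "D975 D973 F869 D1098 N977 T962 C1212 M1234 K1096 K1273 E1177 R893 H1091 H1275 D1214".split()

# Assumed similarity groups (acidic, basic Lys/Arg, aromatic); not from the paper.
SIMILAR_GROUPS = [set("DE"), set("KR"), set("FWY")]


def classify(aa_a, aa_b):
    """Label a pair of one-letter residue codes as identical, similar, or different."""
    if aa_a == aa_b:
        return "identical"
    if any(aa_a in group and aa_b in group for group in SIMILAR_GROUPS):
        return "similar"
    return "different"


# The first character of each label (e.g. "D" in "D403") is the residue type.
counts = Counter(classify(e[0], g[0]) for e, g in zip(ECO, GKA))
conserved = counts["identical"] + counts["similar"]
print(dict(counts), "conserved:", conserved)
```

A different choice of similarity groups would change the count, so the agreement with the cited number should be read as consistent with, not proof of, the criterion the authors used.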

As mentioned above, alpha K553 is comparable to a similarly positioned residue in human Pol β, R254, which upon mutation to alanine reduces the kcat of Pol β 50-fold.46 Pol β R254 (Eco K553) is also believed to form a salt bridge with D256 (Eco D555) in the binary complex and to switch to a salt bridge with D190 (Eco D401) that helps position D190 to coordinate the Mg2+ ion in the ternary complex.46 The Eco alpha mutation K553A resulted in a complete loss of activity and the variant failed to complement the temperature-sensitive strain. Human Pol β R183 (Eco R390) interacts with the β-phosphate group of the incoming nucleotide in the structure, and Pol β R183A showed a ~43-fold decrease in kpol compared to WT.49

The POOL-predicted Eco alpha residues also align with some known tumor-associated Pol β mutations.50 E295K in human Pol β is associated with gastric cancer51 and colorectal cancer;52 E295 is analogous to Eco D405, a second-shell residue for which the conservative mutation D405N results in no detectable activity. Another Pol β mutation, R183G, is associated with esophageal cancer.53 The analogous residue in Eco Pol III is R390, which displays significantly reduced activity upon mutation to alanine.

In this work we used POOL, with THEMATICS and INTREPID as input features, to predict the catalytically active residues in the alpha subunit of DNA Pol III. Though this enzyme was first identified in 1971,54 only the three catalytic aspartate residues were previously identified as critical for activity.25 POOL predicted an extended active site for alpha, with seven of the top 15 residues being distal to or remote from the active site. Primer extension assays with variants showed that, of the 15 predicted residues, mutation of six resulted in a complete loss of activity, while mutations of the other nine predicted residues conferred varying degrees of activity compared to WT alpha. The six previously unexplored residues shown herein to be essential for catalysis are R362, K553, and H760 in the first shell and D405, Y686, and E688 in the second shell. A threshold activity was needed for variants to restore growth to a temperature-sensitive strain at the non-permissive temperatures in vivo. All the constructs have comparable melting temperatures, which indicates that they are approximately as stable as WT alpha.

Remote residues can play a number of different roles in enzyme function. They can help modulate the pKa of the catalytic residues, influence the electrical potential in the binding pocket, control dynamic processes required for catalysis, or help maintain the structural integrity of the enzyme. By comparing the active site of Pol III alpha with that of Pol β, a well-characterized enzyme, it can be inferred that the remote residues are located such that they can help position the catalytic residues in the correct orientation and provide the electrostatic environment for carrying out the reaction. Hence, mutations to these remote residues can sometimes cause the enzyme to adopt an inactive conformation that cannot carry out its reaction. Active site and remote residues in DNA Pol III alpha can be part of aromatic stacking interactions (Y340, Y686), can modulate active site geometry via salt bridge formation (D405, R362, K553, K758, E688), can influence the electrical potential in the active site (D405, E688, K758), or can interact with the incoming nucleotide (R390, R709, H760).

Acknowledgments


We thank Meindert Lamers and John Kuriyan (UC-Berkeley) for the expression clone for Eco alpha. We acknowledge technical assistance from John T. Lambert, Christopher Joshi, and especially Nicole M. Antczak and Hannah R. Stern.

Funding Sources

Support of this work by the National Science Foundation under grant MCB-1517290 to M.J.O. and P.J.B., American Cancer Society grant RSG-12-161-01-DMC to P.J.B., and by National Institute of Justice predoctoral fellowship 2015-R2-CX-0011 awarded to T.A.C., is gratefully acknowledged.

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.biochem.xxxxxx. Figures S1-S5 and Table S1 (PDF) and a file containing the coordinates for the homology model described in this paper.


References

1. Ringe, D., and Petsko, G. A. (2008) How Enzymes Work, Science 320, 1428-1429.
2. Gao, Y.-F., Li, B.-Q., Cai, Y.-D., Feng, K.-Y., Li, Z.-D., and Jiang, Y. (2013) Prediction of active sites of enzymes by maximum relevance minimum redundancy (mRMR) feature selection, Mol. BioSyst. 9, 61-69.
3. Somarowthu, S., Yang, H., Hildebrand, D. G., and Ondrechen, M. J. (2011) High-performance prediction of functional residues in proteins with machine learning and computed input features, Biopolymers 95, 390-400.
4. Brodkin, H. R., DeLateur, N. A., Somarowthu, S., Mills, C. L., Novak, W. R., Beuning, P. J., Ringe, D., and Ondrechen, M. J. (2015) Prediction of distal residue participation in enzyme catalysis, Protein Sci. 24, 762-778.
5. Mills, C. L., Beuning, P. J., and Ondrechen, M. J. (2015) Biochemical functional predictions for protein structures of unknown or uncertain function, Comput. Struct. Biotechnol. J. 13, 182-191.
6. Wei, Y., Ko, J., Murga, L. F., and Ondrechen, M. J. (2007) Selective prediction of interaction sites in protein structures with THEMATICS, BMC Bioinf. 8, 119.
7. Ondrechen, M. J., Clifton, J. G., and Ringe, D. (2001) THEMATICS: a simple computational predictor of enzyme function from structure, Proc. Natl. Acad. Sci. U.S.A. 98, 12473-12478.
8. Ringe, D., Wei, Y., Boino, K. R., and Ondrechen, M. J. (2004) Protein structure to function: insights from computation, Cell. Mol. Life Sci. 61, 387-392.
9. Somarowthu, S., Yang, H., Hildebrand, D. G. C., and Ondrechen, M. J. (2011) High-performance prediction of functional residues in proteins with machine learning and computed input features, Biopolymers 95, 390-400.
10. Tong, W., Wei, Y., Murga, L. F., Ondrechen, M. J., and Williams, R. J. (2009) Partial order optimum likelihood (POOL): maximum likelihood prediction of protein active site residues using 3D Structure and sequence properties, PLoS Comput. Biol. 5, e1000266.
11. Somarowthu, S., and Ondrechen, M. J. (2012) POOL server: machine learning application for functional site prediction in proteins, Bioinformatics 28, 2078-2079.
12. Sankararaman, S., and Sjölander, K. (2008) INTREPID—INformation-theoretic TREe traversal for Protein functional site IDentification, Bioinformatics 24, 2445-2452.
13. Capra, J. A., Laskowski, R. A., Thornton, J. M., Singh, M., and Funkhouser, T. A. (2009) Predicting Protein Ligand Binding Sites by Combining Evolutionary Sequence Conservation and 3D Structure, PLoS Comput. Biol. 5, e1000585.
14. Bartlett, G. J., Porter, C. T., Borkakoti, N., and Thornton, J. M. (2002) Analysis of catalytic residues in enzyme active sites, J. Mol. Biol. 324, 105-121.
15. Porter, C. T., Bartlett, G. J., and Thornton, J. M. (2004) The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data, Nucleic Acids Res. 32, D129-133.
16. Lee, J., and Goodey, N. M. (2011) Catalytic Contributions from Remote Regions of Enzyme Structure, Chem. Rev. 111, 7595-7624.
17. Rajagopalan, P. T. R., Lutz, S., and Benkovic, S. J. (2002) Coupling Interactions of Distal Residues Enhance Dihydrofolate Reductase Catalysis: Mutational Effects on Hydride Transfer Rates, Biochemistry 41, 12618-12628.


18. Somarowthu, S., Brodkin, H. R., D'Aquino, J. A., Ringe, D., Ondrechen, M. J., and Beuning, P. J. (2011) A tale of two isomerases: compact versus extended active sites in ketosteroid isomerase and phosphoglucose isomerase, Biochemistry 50, 9283-9295.
19. Brodkin, H. R., Novak, W. R. P., Milne, A. C., D'Aquino, J. A., Karabacak, N. M., Goldberg, I. G., Agar, J. N., Payne, M. S., Petsko, G. A., Ondrechen, M. J., and Ringe, D. (2011) Evidence of the participation of remote residues in the catalytic activity of Co-type nitrile hydratase from Pseudomonas putida, Biochemistry 50, 4923-4935.
20. Walsh, J. M., Parasuram, R., Rajput, P. R., Rozners, E., Ondrechen, M. J., and Beuning, P. J. (2012) Effects of non-catalytic, distal amino acid residues on activity of E. coli DinB (DNA polymerase IV), Environ. Mol. Mutagen. 53, 766-776.
21. Sharma, A., Kottur, J., Narayanan, N., and Nair, D. T. (2013) A strategically located serine residue is critical for the mutator activity of DNA polymerase IV from Escherichia coli, Nucleic Acids Res. 41, 5104-5114.
22. Kornberg, A., and Baker, T. A. (1992) DNA replication, 2nd ed., W.H. Freeman, New York.
23. Kim, D. R., Pritchard, A. E., and McHenry, C. S. (1997) Localization of the active site of the alpha subunit of the Escherichia coli DNA polymerase III holoenzyme, J. Bacteriol. 179, 6721-6728.
24. McHenry, C. S., and Crow, W. (1979) DNA polymerase III of Escherichia coli. Purification and identification of subunits, J. Biol. Chem. 254, 1748-1753.
25. Pritchard, A. E., and McHenry, C. S. (1999) Identification of the Acidic Residues in the Active Site of DNA Polymerase III, J. Mol. Biol. 285, 1067-1080.
26. Barros, T., Guenther, J., Kelch, B., Anaya, J., Prabhakar, A., O'Donnell, M., Kuriyan, J., and Lamers, M. H. (2013) A structural role for the PHP domain in E. coli DNA polymerase III, BMC Struct. Biol. 13, 8.
27. Lapenta, F., Monton Silva, A., Brandimarti, R., Lanzi, M., Gratani, F. L., Vellosillo Gonzalez, P., Perticarari, S., and Hochkoeppler, A. (2016) Escherichia coli DnaE Polymerase Couples Pyrophosphatase Activity to DNA Replication, PLoS One 11, e0152915.
28. Lamers, M. H., Georgescu, R. E., Lee, S.-G., O'Donnell, M., and Kuriyan, J. (2006) Crystal Structure of the Catalytic alpha Subunit of E. coli Replicative DNA Polymerase III, Cell 126, 881-892.
29. McCauley, M. J., Shokri, L., Sefcikova, J., Venclovas, Č., Beuning, P. J., and Williams, M. C. (2008) Distinct Double- and Single-Stranded DNA Binding of E. coli Replicative DNA Polymerase III α Subunit, ACS Chem. Biol. 3, 577-587.
30. Georgescu, R. E., Kurth, I., Yao, N. Y., Stewart, J., Yurieva, O., and O'Donnell, M. (2009) Mechanism of polymerase collision release from sliding clamps on the lagging strand, EMBO J. 28, 2981-2991.
31. Vandewiele, D., Fernández de Henestrosa, A. R., Timms, A. R., Bridges, B. A., and Woodgate, R. (2002) Sequence analysis and phenotypes of five temperature sensitive mutator alleles of dnaE, encoding modified α-catalytic subunits of Escherichia coli DNA polymerase III holoenzyme, Mutat. Res., Fundam. Mol. Mech. Mutagen. 499, 85-95.
32. Fijalkowska, I. J., and Schaaper, R. M. (1993) Antimutator mutations in the alpha subunit of Escherichia coli DNA polymerase III: identification of the responsible mutations and alignment with other DNA polymerases, Genetics 134, 1039-1044.


33. Jacewicz, A., Trzemecka, A., Guja, K. E., Plochocka, D., Yakubovskaya, E., Bebenek, A., and Garcia-Diaz, M. (2013) A Remote Palm Domain Residue of RB69 DNA Polymerase Is Critical for Enzyme Activity and Influences the Conformation of the Active Site, PLoS One 8, e76700.
34. Eckenroth, B. E., Towle-Weicksel, J. B., Nemec, A. A., Murphy, D. L., Sweasy, J. B., and Doublie, S. (2017) Remote Mutations Induce Functional Changes in Active Site Residues of Human DNA Polymerase beta, Biochemistry 56, 2363-2371.
35. Wing, R. A., Bailey, S., and Steitz, T. A. (2008) Insights into the replisome from the structure of a ternary complex of the DNA polymerase III alpha-subunit, J. Mol. Biol. 382, 859-869.
36. Evans, R. J., Davies, D. R., Bullard, J. M., Christensen, J., Green, L. S., Guiles, J. W., Pata, J. D., Ribble, W. K., Janjic, N., and Jarvis, T. C. (2008) Structure of PolC reveals unique DNA binding and fidelity determinants, Proc. Natl. Acad. Sci. U.S.A. 105, 20695-20700.
37. Krieger, E., Joo, K., Lee, J., Lee, J., Raman, S., Thompson, J., Tyka, M., Baker, D., and Karplus, K. (2009) Improving physical realism, stereochemistry, and side-chain accuracy in homology modeling: Four approaches that performed well in CASP8, Proteins: Struct., Funct., Bioinf. 77, 114-122.
38. Ramachandran, G. N., Ramakrishnan, C., and Sasisekharan, V. (1963) Stereochemistry of polypeptide chain configurations, J. Mol. Biol. 7, 95-99.
39. Lovell, S. C., Davis, I. W., Arendall, W. B., de Bakker, P. I. W., Word, J. M., Prisant, M. G., Richardson, J. S., and Richardson, D. C. (2003) Structure validation by Cα geometry: ϕ,ψ and Cβ deviation, Proteins: Struct., Funct., Bioinf. 50, 437-450.
40. Wechsler, J. A., and Gross, J. D. (1971) Escherichia coli mutants temperature-sensitive for DNA synthesis, Mol. Gen. Genet. 113, 273-284.
41. Meng, E. C., Pettersen, E. F., Couch, G. S., Huang, C. C., and Ferrin, T. E. (2006) Tools for integrated sequence-structure analysis with UCSF Chimera, BMC Bioinf. 7, 339.
42. The PyMOL Molecular Graphics System, Version 1.2r3pre, Schrödinger, LLC.
43. Sawaya, M. R., Prasad, R., Wilson, S. H., Kraut, J., and Pelletier, H. (1997) Crystal structures of human DNA polymerase beta complexed with gapped and nicked DNA: evidence for an induced fit mechanism, Biochemistry 36, 11205-11215.
44. DeLano, W. L., Ultsch, M. H., de Vos, A. M., and Wells, J. A. (2000) Convergent solutions to binding at a protein-protein interface, Science 287, 1279-1283.
45. Yanagihara, F., Yoshida, S., Sugaya, Y., and Maki, H. (2007) The dnaE173 mutator mutation confers on the alpha subunit of Escherichia coli DNA polymerase III a capacity for highly processive DNA synthesis and stable binding to primer/template DNA, Genes Genet. Syst. 82, 273-280.
46. Menge, K. L., Hostomsky, Z., Nodes, B. R., Hudson, G. O., Rahmati, S., Moomaw, E. W., Almassy, R. J., and Hostomska, Z. (1995) Structure-function analysis of the mammalian DNA polymerase β active site: role of aspartic acid 256, arginine 254, and arginine 258 in nucleotidyl transfer, Biochemistry 34, 15934-15942.
47. Lahiri, I., Mukherjee, P., and Pata, J. D. (2013) Kinetic characterization of exonuclease-deficient Staphylococcus aureus PolC, a C-family replicative DNA polymerase, PLoS One 8, e63489.


48. Bailey, S., Wing, R. A., and Steitz, T. A. (2006) The Structure of T. aquaticus DNA Polymerase III Is Distinct from Eukaryotic Replicative DNA Polymerases, Cell 126, 893-904.
49. Kraynov, V. S., Showalter, A. K., Liu, J., Zhong, X., and Tsai, M.-D. (2000) DNA Polymerase β: Contributions of Template-Positioning and dNTP Triphosphate-Binding Residues to Catalysis and Fidelity, Biochemistry 39, 16008-16015.
50. Starcevic, D., Dalal, S., and Sweasy, J. B. (2004) Is there a link between DNA polymerase beta and cancer?, Cell Cycle 3, 998-1001.
51. Iwanaga, A., Ouchida, M., Miyazaki, K., Hori, K., and Mukai, T. (1999) Functional mutation of DNA polymerase beta found in human gastric cancer--inability of the base excision repair in vitro, Mutat. Res. 435, 121-128.
52. Donigan, K. A., Sun, K. W., Nemec, A. A., Murphy, D. L., Cong, X., Northrup, V., Zelterman, D., and Sweasy, J. B. (2012) Human POLB gene is mutated in high percentage of colorectal tumors, J. Biol. Chem. 287, 23830-23839.
53. Li, M., Zang, W., Wang, Y., Ma, Y., Xuan, X., Zhao, J., Liu, L., Dong, Z., and Zhao, G. (2014) DNA polymerase beta mutations and survival of patients with esophageal squamous cell carcinoma in Linzhou City, China, Tumor Biol. 35, 553-559.
54. Kornberg, T., and Gefter, M. L. (1971) Purification and DNA synthesis in cell-free extracts: properties of DNA polymerase II, Proc. Natl. Acad. Sci. U.S.A. 68, 761-764.
