Using Graphs of Dynamic Hydrogen-Bond Networks To Dissect

Apr 30, 2019 - DExD/H-box proteins are soluble enzymes that couple binding and hydrolysis of adenosine triphosphate (ATP) with reactions involving RNA...
0 downloads 0 Views 9MB Size
Article Cite This: J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

pubs.acs.org/jcim

Using Graphs of Dynamic Hydrogen-Bond Networks To Dissect Conformational Coupling in a Protein Motor Konstantina Karathanou and Ana-Nicoleta Bondar* Freie Universität Berlin, Department of Physics, Theoretical Molecular Biophysics Group, Arnimallee 14, D-14195 Berlin, Germany

J. Chem. Inf. Model. Downloaded from pubs.acs.org by ALBRIGHT COLG on 05/01/19. For personal use only.

S Supporting Information *

ABSTRACT: DExD/H-box proteins are soluble enzymes that couple binding and hydrolysis of adenosine triphosphate (ATP) with reactions involving RNA metabolism or bind and push newly synthesized proteins across bacterial cell membranes. Knowledge of the reaction mechanism of these enzymes could help the development of new therapeutics. In order to explore the mechanism of longdistance conformational coupling in SecA, the DEAD-box motor of the Sec protein secretion in bacteria, we implemented algorithms that provide simplified graph representations of the protein’s dynamic hydrogen-bond networks. We find that mutations near the nucleotide-binding site or changes of the nucleotidebinding state of SecA associate with altered dynamics at the preprotein binding domain and identify extended networks of hydrogen bonds that connect the active site of SecA to the region where SecA binds newly synthesized secretory proteins. Water molecules participate in hydrogen-bonded water chains that bridge functional domains of SecA and could contribute to long-distance conformational coupling.



INTRODUCTION Proteins with the conserved DExD/H motif (Asp-Glu-x-Asp/ His, with x indicating any amino acid residue) are helicases involved in RNA metabolism.1−3 Proteins from this family have been implicated in cancer.4 SecA, a well-studied model system for the DEAD-box proteins, functions as the protein motor of the Sec protein secretion pathway in bacteria, where it uses the energy from binding and hydrolysis of ATP to push secretory proteins across the membrane-embedded SecYEG protein translocon.5 Hydrogen (H) bonds between functional domains of the protein are thought to participate in longdistance allosteric coupling.6−9 Here, we present an algorithm to derive simple graphical representations of a protein’s Hbond networks and used these graphs to explore how SecA responds to mutations and changes in the nucleotide-binding state. SecA is a relatively large protein that consists of several distinct functional domains (Figure 1A). The long helix of the Helical Scaffold Domain (HSD) connects to all other domains of SecA, with which it participates in dynamic H-bond clusters.8−10 The two short helices of the HSD, denoted as the two-helix finger (2HF, Figure 1A), could participate in pushing preproteins through the translocon,11,12 but the precise role of the 2HF in SecA function remains a matter of investigation.13−15 The nucleotide binds at the interface between the two Nucleotide-Binding Domains (NBD1 and NBD2, Figure 1A). The secretory protein binds at the Protein Binding Domain (PBD), which sprouts from NBD1 (Figure 1). The PBD is a mobile region of SecA,16 and during the conformational dynamics of SecA it is thought to sample © XXXX American Chemical Society

three main orientations relative to NBD2 and the Helical Wing Domain (HWD).5 The PBD couples allosterically to the nucleotide-binding site: binding of preprotein at the PBD is thought to loosen the conformation of the DEAD motor and thus impact release of ADP.17 Crystal structures of SecA provided invaluable information about the three-dimensional structure of SecA from different organisms − Bacillus subtilis,18−20 Escherichia coli,21 Mycobacterium tuberculosis,22 Thermotoga maritima,23,24 and Thermus thermophilus25 (for a review see, e.g., ref 5). Most of these crystal structures captured SecA in its apo or ADP-bound states. A glimpse into the three-dimensional structure of ATPbound SecA is provided by the crystal structure of E. coli SecA from ref 21, which indicates coordinates for the ATP molecule, but it lacks coordinates for the magnesium ion and for most of the PBD. The structure of E. coli SecA solved by Nuclear Magnetic Resonance (NMR)7 contains the PBD and a signal peptide model, but it lacks coordinates for the nucleotide. A model of SecA bound to ADP and a peptide model was derived recently for B. subtilis SecA using computer simulation approaches;9 in this model, the signal peptide makes hydrophobic contacts and dynamic H bonds with SecA, and its presence at the interface between the PBD, HWD, and HSD associates with altered motions of NBD2.9 To dissect the mechanism of long-distance conformational coupling in SecA, here we study the response of SecA to Special Issue: Women in Computational Chemistry Received: December 31, 2018

A

DOI: 10.1021/acs.jcim.8b00979 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

Figure 1. Architecture and conformational dynamics of SecA. (A) Overlap of ADP-bound B. subtilis SecA in crystal structures PDB ID:1TF219 (cyan) and PDB ID:1M7418 (red), which were used as starting coordinates for computations discussed here. NBD1 and NBD2 are also referred to as the Nucleotide-Binding Folds NBF1 and NBF2, respectively;26 NBD2 is also referred to as the Intramolecular Regulator of ATPase 2, IRA2,27 and NBD1 as the NBD. The PBD is also denoted as the preprotein cross-linking domain, PPXD. Note the different orientation of the PBD in the two structures. (B) H bonds between functional domains of SecA (red lines) and H bonds between groups of the same protein domain (gray lines). On average, at any moment of time during Sim1 there are ∼200 intradomain H bonds and ∼60 interdomain H bonds. (C) Close view of the Hbond network at the interface between NBD1 and NBD2. Unless specified otherwise, all molecular graphics were prepared with VMD.28

on starting conditions used for the simulations.8 H-bond maps can also be computed from sets of protein crystal structures.10 The H-bond maps of SecA revealed that the NBDs connect to other regions of the protein via networks of H bonds8,9 and highlighted the importance of molecular dynamics for probing dynamic H bonds that might not be observed in a limited set of crystal structures.10 The algorithms present here aim to facilitate mapping results of H-bond analyses onto protein structures and calculating Hbond paths that can connect sites of interest for the functioning of the protein. To this aim, we rely on concepts from graph theory and social networks. We use these algorithms to calculate graphs of the H-bond networks of SecA and identify regions of SecA where dynamics of H bonds changes with mutation or with the nucleotide-binding state, and protein groups that might be particularly important for long-distance conformational coupling. Analyses of graphs computed for SecA indicate that perturbed dynamics in the vicinity of the nucleotide-binding site associates with altered H-bond dynamics of the PBD. Extensive H-bond networks that span SecA could facilitate coupling between different regions of the protein. Water molecules bound at the interfaces between protein domains may contribute to inter-domain and long-distance conformational coupling.

mutations that alter H bonding in the vicinity of the nucleotide-binding site and model B. subtilis SecA bound to ATP. The four mutations studied here − T107N, E208Q, R489K, and R517A, are known to impact the functioning of SecA. In E. coli SecA, the T109N mutant (corresponding to B. subtilis T107) has higher ATPase activity than the wild-type protein;29 by contrast, the E210Q and R509K mutations (B. subtilis E208 and R489, respectively) inhibit the ATP activity;29 E210Q also abolishes conformational coupling between SecA and the SecY translocon.16 The salt bridge between D217 and R566 (E. coli SecA numbering, corresponding to D215 and R517 in B. subtilis SecA), denoted as Gate1, is essential for the coupling between the PBD and the DEAD motor,30 and the R566A mutation inhibits translocation of preprotein.30 The complexity of SecA, with its numerous H bonds9 and long-distance conformational coupling between functional domains, calls for specialized algorithms to characterize the response of SecA to perturbations such as mutations or changes in the nucleotide-binding state. Network analyses are increasingly being used to study the dynamics of complex biosystems (for reviews see, e.g., refs 31 and 32). Networks (or graphs) represent interactions denoted as edges between elements of the network usually referred to as nodes in the graph.31 Networks can contain important elements, denoted as hubs, which might have many neighbors or impact protein conformational dynamics;31 nodes of a network can also represent local energy minima of the energy landscape33−35 or protein H bonds.36 Algorithms specialized for analyses of dynamics and H bonding in biomolecules include, e.g., PySFD37 (Significant Feature Differences analyzer for Python), HBAT,38 HBonanza39 (Hydrogen-Bond analyzer), the MDAnalysis toolkit,40,41 algorithms for identifying protein/water Hbond networks relevant to proton binding and proton transfer,42,43 and lipid/water H-bond clusters at membrane surfaces.44 In previous analyses of SecA H bonds we used twodimensional H-bond maps, in which H bonds between selections of H-bond donors and H-bond acceptors are given as dots on the map, color coded according to the occupancy computed from the simulation trajectory.8,9 Difference H-bond maps computed between two simulation trajectories can facilitate the identification of sites where H-bonding depends



METHODS

ADP-Bound Wild-Type and Mutant SecA. The ADPbound SecA structure 1M7418 was prepared as follows. H atom coordinates for the protein, ADP, and the 46 crystal-structure waters were constructed using CHARMM.45 All titratable protein groups were considered with their standard protonation states, i.e., Asp and Glu side chains are negatively charged, Arg and Lys are positively charged, and all His side chains are singly protonated on Nε. This SecA structure included coordinates for amino acid residues from M1 to G802. SecA was placed in the center of a water box, and waters of the water box overlapping with groups solved in the crystal structure were deleted; we have further deleted waters of the water box within 6 Å of the ADP molecule. Sodium ions were added for charge neutrality. The resulting simulation system, denoted as Sim1 (Table 1), includes a total of 277,124 atoms. An independent simulation system for ADP-bound SecA (Sim8) was prepared from PDB ID:1TF219 using the same B

DOI: 10.1021/acs.jcim.8b00979 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

to geometry optimization, heating, equilibration, and production runs. A limitation of the protocol used for docking ATP to B. subtilis SecA is that details of how ATP interacts with the protein, and of how the protein conformation couples to the nucleotide-binding state, could depend on the sequence of SecA and on the crystal structures used for docking. Moreover, the crystal structure of E. coli ATP-bound SecA,21 which has coordinates for two SecA monomers, indicates variability in how specific groups of SecA interact with ATP: The dihedral angle PA-O3A-PB-O3E of ATP is −52.1° in monomer (chain) A, as compared to −170.6° in monomer (chain) B (Figure S1); and, as it had been noted, there are differences in how specific protein groups interact with the ATP molecule in the two protein monomers from the crystal structure of ATPbound E. coli SecA.21 We thus suggest that the approach we used to model ATP-bound B. subtilis SecA is reasonable. Force-Field and Molecular Dynamics Simulations. We used the CHARMM force field parameters for the protein,47,48 ions and nucleic acids,49−51 and the TIP3P water model.52 Molecular dynamics simulations were performed using NAMD.53,54 Following geometry optimization, heating was performed in the NVT ensemble (constant number of particles N, constant volume V, and constant temperature T). The simulations continued with production runs in the NPT ensemble (constant N, constant pressure P = 1 bar, and constant T = 300 K), performed using a Langevin dynamics scheme and a Nose-Hoover piston,55,56 with isotropic pressure coupling. During heating and the first 1 ns of equilibration we used an integration step of 1 fs; for all remaining production runs, we used a multiple-time-step integration scheme57 with integration steps of 2 fs for short-range nonbonded forces and 4 fs for long-range electrostatics. Short-range real space interactions were treated with a smooth switch function between 8 and 12 Å. Coulomb interactions were described with smooth-particle mesh Ewald.58,59 Covalent bonds to H atoms were kept fixed.60 Coordinates were saved each 10 ps. Unless otherwise specified, all average values were computed from the last 100 ns of each simulation. To investigate water dynamics on the surface of SecA we performed additional simulations in which we used the last snapshot of Sim1 to initiate simulations in the NVE ensemble (constant volume V and constant energy E). The NVE simulation, labeled as Sim1′, was prolonged to 1 ns using an integration step of 1 fs; coordinates were saved every 10 fs. Analysis of H-Bond Networks. We consider that two groups are H bonded when the distance between the H and the acceptor heavy atom, dHA, is ≤2.5 Å. To identify H-bond networks we implemented in VMD28 an algorithm (Scheme 1) that allows graphical visualization of the network and quantification of the H-bond connections. In step 1 of the algorithm (Scheme 1) we derive lists of the H-bonded pairs of atoms and their interaction distances. We then track the H-bond occupancy, which is defined as the percentage of the length of the analyzed trajectory in which two groups are H bonded. In step 2, the information derived at step 1 is used to draw the H-bond network. For clarity, only one line connects a given pair of H-bonding groups, and the lines interconnect Cα atoms of the amino acid residues. Cα atoms are thus used as the representative node of each protein group, and lines are color-coded to indicate the occupancy values between amino acid residue pairs. We then group H bonds according to their average length as calculated from the

Table 1. Summary of Simulations Performed protein sim

structure

type

Sim1, 1′ Sim2 Sim3 Sim4 Sim5 Sim6 Sim7 Sim8 Sim9

1M74

wild-type T107N E208Q1 E208Q2 R489K R517A wild-type wild-type wild-type

1TF2 1TF2

nucleotide-binding pocket

lengtha (ns)

rmsdb (Å)

ADP, Mg2+

203.8 203.6 204.5 200.6 199.6 203.9 203.8 176.0 222.0

1.5 1.7 1.7 1.5 1.5 1.6 1.4 1.0 1.1

ATP, Mg2+ ADP, Mg2+ ATP, Mg2+

“Length” refers to the NPT production runs performed without any constraints on the coordinates of the atoms. The 1 ns NVE simulation Sim1′ was initiated from the last snapshot of Sim1. b“rmsd” gives the average rmsd of all structured regions of SecA computed from the last 100 ns of Sim1−Sim9; as structured regions we considered NBD1, NBD2, the SD and 2HF regions of the HSD, and the structured regions of the PBD and HWD. Decompositions of the rmsd profiles for the various domains of SecA are given in Figures S2 and S3. a

approach as for Sim1 above. In Sim8 the SecA structure includes coordinates for amino acid residues from M1 to I780, the ADP molecule, magnesium ion, 26 crystallographic water molecules, bulk water, and sodium ions added for charge neutrality, for a total of 196,703 atoms. The mutant SecA systems Sim2−Sim6 were prepared starting from the wild-type simulation system Sim1 by using CHARMM45 to replace the mutated protein side chain as needed. The following mutations were modeled for independent SecA simulations: T107N (Sim2), E208Q (Sim3, Sim4), R489K (Sim5), and R517A (Sim6, see Table 1). For E208Q we performed two independent simulations, distinguished by which of the carboxylate oxygen atoms of E208 (Oε1 or Oε2) were replaced with -NH2. For clarity, we refer to these two E208Q mutants as E208Q1 and E208Q2, respectively. ATP-Bound, Wild-Type B. subtilis SecA. The crystal structure PDB ID:2FSG21 gives coordinates for E. coli SecA bound to ATP. This E. coli SecA structure is, however, difficult to use in atomistic simulations, because it lacks coordinates for most of the PBD.21 To model SecA bound to ATP, we used Coot46 to overlap ATP-bound E. coli SecA (PDB ID:2FSG, chain A) onto the B. subtilis ADP-bound SecA and then replaced ADP from the B. subtilis structure with the ATP molecule taken from the overlapped E. coli structure. Overlaps were prepared separately for the ADP-bound structures of B. subtilis SecA 1M7418 (Sim7) and 1TF219 (Sim9). Since the crystal structure PDB ID:2FSG of ATP-bound E. coli SecA lacks coordinates for a magnesium ion, we kept the magnesium ion from the corresponding ADP-bound B. subtilis crystal structure. There were no severe clashes between atoms of the ATP molecule and B. subtilis SecA in either of the two overlaps. In the overlap between 2FSG and 1M74 (for Sim7), the shortest distances between protein heavy atoms and ATP are ATP-O2γ:R136-NH2 (1.74 Å), ATP-O2γ:G490-O (1.77A), and ATP-Cδ:M79-Sδ (1.82 Å, Figure S1A). In the overlap between 2FSG and 1TF2 (for Sim9), a somewhat short distance of 1.4 Å is observed only for ATP-O2γ:G490-O; the distance between ATP-O2γ and R136-NH2 is 3.8 Å, and M79 is within 2 Å of the ATP molecule (Figure S1B). These two models of ATP-bound B. subtilis SecA were then subjected C

DOI: 10.1021/acs.jcim.8b00979 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

Step 3 of the algorithm computes occupancies for all Hbond pairs. The occupancy is defined as the percentage of time, during the trajectory segment used for analyses, for which the H-bond criterion is met. A pair of amino acid residues can H bond via different atoms−for example, a Thr hydroxyl group could H bond to either of the carboxylate oxygen atoms of Asp/Glu, to the both of oxygen atoms, or to the oxygen atom of the backbone carbonyl group. We calculate the occupancy of each of these H bonds. Then, for simplicity of the graphical representations, the strength of H bonding between the two amino acid residues is calculated based on the H bond with largest occupancy. The information about the overall strength of the H bonds is then mapped onto the network of the protein H bonds. Step 4 of the algorithm involves specialized analyses of the H-bond networks (Scheme 1), as summarized below. Finding Shortest-Distance Pathways between Domains of SecA. To find out paths of connections in the protein, we calculate an adjacency matrix (Scheme 1); this is a square matrix that indicates whether the protein pairs of vertices are connected or not by H bonds in the graph. The usage of the adjacency matrix allows us to monitor all possible connections of each node and create paths of connections along the simulation time used for analysis. We used Dijkstra’s algorithm61 to derive the shortestdistance H-bond paths between domains of SecA. Dijkstra’s algorithm finds the shortest distance (least cost) path between two nodes of a graph G = (V,E) with vertices (nodes) V and edges E >0.61 Our implementation below of Dijkstra’s algorithm is according to ref. 61. In the case of the SecA analyses here, we used as edge weights the distances between pairs of amino acid residues, taken according to the strength definition found when analyzing the H bonds (see section above on Analysis of H-Bond Networks). The algorithm starts with an initial (source) and end node and finds the shortest path between these two nodes.61 Each node is first assigned a tentative distance value that is zero for the source node and infinite for all other nodes, and all nodes are set as unvisited−i.e., no edges connect the nodes.61 The tentative distance from the current node to the nearest neighbor is the shortest distance from the source to each node. A tentative distance being infinite means absence of a direct connection.61 The algorithm calculates the direct distance between each unvisited node and the source node. This becomes the new tentative distance, and the source node is marked as visited. Then, the node with the smallest tentative distance−i.e., the smallest distance from the source, is set as the current node. For each of the unvisited neighbors of the current node, the algorithm calculates the sum of two distances: the distance from the source node to the current node and the direct distance between each unvisited node and the current node. If this sum is smaller than the previous distance from neighbor to source, the sum becomes the new tentative distance to each neighbor.61 When all neighbors of the node are considered, the current node is marked as visited, and it is removed from the set of unvisited nodes. The node with the smallest new tentative distance is now set as the current node, and the procedure is continued until the final destination node has been marked as visited. Graph Connectivity: Betweenness Centrality and Degree Centrality. In graph theory, betweenness centrality,62

Scheme 1. Algorithm Used for Simplified Representations of Protein H-Bond Networksa

a Briefly, in step 1 of the algorithm the simulation trajectory is read, and all H bonds are identified for each coordinate set of the simulation. In step 2, we use a distance criterion to group H bonds as strong, medium, and weak; the sums of these H bonds are computed separately and used to choose the representation of the H bond in molecular graphics. Step 3 consists of computing occupancies of the H bonds and the adjacency matrix. In step 4, the adjacency matrix is used to compute H-bond paths of shortest distance, calculate graphs of the protein H bonds, and characterize the H-bond network.

simulation trajectory. Strong, medium, and weak H bonds are denoted when dHA ≤ 1.7 Å, 1.7 Å < dHA ≤ 1.9 Å, and 1.9 Å < dHA ≤ 2.5 Å, respectively. For simplicity of the H-bond representation according to the average length of the H bond, we first calculate, separately, the sum of the number of strong, medium, vs weak H bonds for each pair of heavy atoms that H bond, as sampled during the trajectory segment used for data analysis. The largest of these three sums is then taken to characterize that particular H bond as being a strong, medium-strength, or weak H bond. In the case where the two or all three sums above have values close to each other, we proceeded as follows. First, we compare the sum of the number of H bonds considered to be strong with the 25% of either of the other two sums. If this comparison indicates that the sum of the number of strong H bonds is larger than 25% of the sum obtained for the weak and medium H bonds, we consider that the H bond is strong; if this condition for the sums is not met, but the sum of the number of medium H bonds is larger than 25% of the sum of strong and weak H bonds, the H bond is considered medium strength. Otherwise, the H bond is considered as weak. In Scheme 1, this analysis within Step 2 of the algorithm is denoted as “strength control”. D

DOI: 10.1021/acs.jcim.8b00979 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling abbreviated here by BC, gives a measure of the communication flow between network nodes based on shortest paths. In other words, BC measures how often each graph node appears on a shortest path between two nodes in the graph. The BC is also related to the overall connectivity of the network, as removing nodes with high BC values could disconnect the network. The betweenness of a node V linking nodes (v1, v2) is given by63 c(V ) =

∑ v1, v2 # V

ADP-bound SecA (Table 1). The implementation of water residence time estimates was done by first performing test simulations on native-state α-lactalbumin (PDB ID:1A4V66) in a box of water molecules, using the simulation setup as described in ref 65. We could reproduce the mean residence time of waters solvating α-lactalbumin within the first solvation layer - we found, 25.1 ps, as compared to 25.9 ps in ref 65. Water-Mediated H-Bonds between Functional Domains of SecA. To identify water-mediated H bonds between domains of SecA, we first inspected the results of water residence times performed for each amino acid residue and selected for further analyses only those amino acid residues for which water molecules remain close for a long simulation time. We then searched for H-bonded water bridges that can interconnect any two groups from this list that are part of different domains of SecA. The search for protein/water bridges was performed with an algorithm similar to that used by us to analyze water/lipid Hbond networks44 and water/carboxylate bridges on a protein’s surface.42 In this algorithm, we search for the shortest-distance connection between two protein groups, mediated by Hbonded water chains with maximum five water molecules. We start by selecting water molecules within the H-bond distance from oxygen or H atoms of each of the protein groups of interest. If two protein groups H bond to the same water molecule, a water bridge with length L = 1 has been found. For water bridges with L > 1, the algorithm continues by treating waters found in the first step as nodes for a new search for Hbonded waters. The algorithm checks, at each step, whether water molecules of H-bonded chains with L ≤ 5 interconnect two protein groups from two different domains of SecA. For each pair of protein groups bridged by water chain, we keep only one, shortest distance bridge. To derive an overall picture of the dynamics of water bridges between protein domains, we calculate the occupancy of these bridges. The occupancy is defined as the percentage of the length of the analyzed trajectory during which a water bridge of certain length is present. Unless specified otherwise, analyses of protein dynamics and protein H-bond networks were performed using the last 100 ns of each Sim (Table 1). Analyses of water residence times and protein/water bridges were performed using the entire length of Sim1′ (Table 1).

nv1v2(V ) NV1V2

(1)

where nv1v2(V) is the number of shortest paths between nodes v1 and v2 passing through node V, and NV1V2 is the total number of shortest paths from v1 to v2.63 BC values can depend on the number of nodes in the graph, e.g., large graphs with many nodes might have nodes with high BC values. To account for this, the BC can be normalized by the number of pairs of nodes in the graph (excluding node V). 1 The BC is divided by 2 (|N | − 1)(|N | − 2) in the case of undirected graphs and by (|N|−1)(|N|−2) in the case of directed graphs, where N is the number of nodes in the graph. The normalization factor for directed graphs is twice that of undirected graphs, as the path from v1 to v2 could differ from that from v2 to v1. The normalized BC value c(V) ranges from 0 to 1, indicating zero probability and, respectively, 100% probability that the shortest path between two random nodes is present through a given node. The degree centrality (or valency, abbreviated here as DC) of a graph G counts how many edges E are connected to a node, or how many connected neighbors a node has.64 The value of the DC ranges from 0 to N−1; DC = N − 1 indicates that the graph contains one central node to which all other nodes are connected−this graph is denoted as the star graph. We construct here graphs G = (V, E), where nodes V are the amino acid residues of SecA, and E are the edges which represent the H-bond connections between the nodes V. As the edges have no orientation, our network of H-bond connections is an undirected graph. We normalize the DC centrality scores by 1 ( |N | − 1)(|N | − 2), such that the normalization represents 2 the probability that the shortest path between two random nodes will be present through a given node. Water Dynamics on the Surface of SecA. To estimate the water lifetime at the first hydration shell of the protein, we used the NVE simulations to compute the residence time correlation function, CR(t), as described in ref. 65: CR(t ) =

1 N (t )



RESULTS AND DISCUSSION We report on results from nine independent Sims with a total sampling time of 1.8 μs (Table 1). To probe the response of SecA to mutations that alter H bonding and to changes in the nucleotide-binding state, the Sims were performed starting from two crystal structures of ADP-bound SecA (Figure 1A). The time scales of our atomistic simulations, which range from ∼176 ns to ∼222 ns, might be insufficient for sampling the full extent of conformational changes caused by the perturbations we study. Nevertheless, the molecular dynamics simulations provide clues about molecular interactions of SecA that contribute to long-distance conformational couplings. Inspection of the Cα root-mean-square distances (rmsd, Figures S2, S3, S4) indicates good structural stability of SecA in all simulations. The NBDs have small values of the Cα rmsd, which indicates that the structure of these domains remains close to the starting crystal structures (Figures S2, S3). For the other domains of SecA, we find relatively small values of the rmsd for the structured regions ( 7 H-bond connections. Note that these H-bond connections account for all unique H bonds that are sampled during the last 50 ns of Sim1; each of these unique H bonds has a different occupancy. They are shown as red spheres, and their direct connections are shown as red edges. The subgraph that contains the paths starting from those protein groups is shown in gray.

(Figures 6C,D). PBD groups with high DC values include Y250 (DC = 10), which locates close to K255 (Figure 6C). By contrast, there are only a few groups with high DC at NBD1 and NBD2 (Figures 6C,D); this is consistent with NBD1 and NBD2 having numerous high-occupancy H bonds, i.e., groups that tend to keep their H-bond partners (Figures 2A, S5A). The analysis of the DC values thus supports analyses discussed above for the H-bond networks, whereby the PBD and its interfaces are characterized by dynamic H bonds−and thus it can reorient, as known to occur during protein function,27,68 whereas NBD1 and NBD2 tend to have more stable H bonds and have less conformational dynamics (Figures 2, S2, S5).10 Water-Mediated Bridges between Functional Domains of SecA. In membrane proteins, protein-bound water molecules that H bond to functionally important groups can have essential roles in controlling local protein structure and dynamics and can directly participate in chemical reactions.69−77 At protein interfaces, waters could form networks that promote protein binding.78 We wondered whether water molecules could bind at the surface of SecA and participate in H-bond chains that help couple domains of SecA.

An initial computation of the residence times of water molecules in the first hydration shell of each amino acid residue of SecA indicates that that most (91%) protein amino acids have close water molecules with short residence times of