Protein Residue Networks from Energetic and Geometric Data: Are

Nov 30, 2018 - Accounts Evolves to Meet the Ever-Changing Landscape of Research. Accounts of Chemical Research is a known for publishing concise ...
2 downloads 0 Views 3MB Size
Article pubs.acs.org/JCTC

Cite This: J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Protein Residue Networks from Energetic and Geometric Data: Are They Identical? Vladimir Sladek,*,†,‡ Hiroaki Tokiwa,‡,§ Hitoshi Shimano,‡,∥ and Yasuteru Shigeta⊥ †

Institute of Chemistry − Centre for Glycomics, Dubravska cesta 9, 84538 Bratislava, Slovakia Agency for Medical Research and Development (AMED), Chiyoda-ku, Japan § Department of Chemistry, Rikkyo University, Nishi-Ikebukuro, Toshima, Tokyo 171-8501, Japan ∥ Department of Internal Medicine, Faculty of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8575, Japan ⊥ Center for Computational Sciences, University of Tsukuba, Tennodai 1-1-1, Tsukuba, Ibaraki 305-8577, Japan

Downloaded via UNIV OF WINNIPEG on December 2, 2018 at 17:12:36 (UTC). See https://pubs.acs.org/sharingguidelines for options on how to legitimately share published articles.



S Supporting Information *

ABSTRACT: Protein residue networks (PRN) from energetic and geometric data are probably not identical. PRNs constructed from ab initio pair interaction energies are analyzed for the first time and compared to PRN based on center of mass separation. We use modern, previously unused algorithms such as global and local efficiencies to quantitatively confirm that both types of PRNs do exhibit small-world character. The main novelty finding is that interaction energy-based PRNs preserve smallworld character even when clustered. A node hierarchy independent of the cutoff energy used for the edge creation is characteristic for them. Efficiency centrality identifies hubs responsible for such behavior. The interaction energy-based PRNs seem to comply with the scale-free network model with respect to efficiency centrality distribution as opposed to distance based PRNs. Community detection is introduced into protein network research as an extension beyond cluster analysis to study tertiary and quaternary structures. discuss protein binding.17 Similar aspects were presented by Yeates and Beeby in 2006.1 In 2001, Latora and Marchiori proposed an improved graph measure to characterize smallworld properties of a networkthe global efficiency.18 The susceptibility of the network to the loss of small-world character upon node/edge removal can be measured by the local efficiency.18 The efficiency centrality Ceff was proposed by Wang19 and later extended by one of us.20 It is intended to rank nodes by their importance in the preservation of the small-world properties of the network. These modern graph metrics were so far not used for qualitative and quantitative evaluation of small-world character of protein networks. This work is a step in that direction. Our main goal is to introduce a protein residue network (PRN) based on full interaction maps of pair interaction energies (PIE) obtained from ab initio calculations. Successful attempts to study PRN based on interaction energies are known from literature, e.g., the work of Vijayabaskar et al.21 They used PRN based on attractive interactions energies based on force field data averaged over short molecular dynamic simulations of the proteins thermal motion. In this regard, we differ, as we use the experimentally determined structure from crystallography. However, in a sense, this also presents an

1. INTRODUCTION Network analysis utilizing graph theory has been used in proteomics and molecular biology for some time now. Protein−protein signaling networks are often studied,1 single proteins less so, yet it is not uncommon.2,3 Spectral analysis of the graph adjacency or Laplacian matrix3 with subsequent evaluation of degree, closeness, and/or betweenness centrality are the most common techniques.4 The protein network construction is most commonly based on distance information either between centers of mass or beta carbons of interacting residues. In principle, two main branches of research may be identifiedfirst application research aimed at particular systems and techniques5−11 and second the more theoretical research focused on identification of general protein properties. Several works of Estrada et al.12,13 concentrate on the latter. A small-world network is such that even though most nodes are not directly connected, i.e., it can be highly clustered, any two nodes can be reached by few steps, i.e., the characteristic path length is relatively short.14 Often such networks contain several hubshighly connected nodes.12 Proteins have been described and studied as small-world systems in the past perhaps most notably by del Sol and O’Meara who have adopted characteristic path length L and clustering coefficient C to do so.15,16 They focused on protein clusters. Tailor discussed small-world properties of proteins also with the emphasis on protein clusters.4 In 2011, Pons used closeness to © XXXX American Chemical Society

Received: July 17, 2018

A

DOI: 10.1021/acs.jctc.8b00733 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation Table 1. PRN Properties and Partition in UL141···TR2 Dimer for Different Edge Criteriaa Nnodd

Nedg

D

Ncome

Q



UL141 UL141b UL141b UL141b UL141b UL141b UL141b UL141b UL141b UL141b

Eint < − 0.01 Eint < − 1.0 Eint < − 3.0c Eint < − 5.0c Eint < − 8.0c Eint < − 12.0c Eint < − 15.0c RCM < 10 RCM < 7c RCM < 5c

293; 293; 293; 293; 293; 293; 293; 293; 293; 293;

35% 35% 35% 35% 35% 35% 35% 35% 35% 35%

2431 1311 718 497 356 246 175 2334 808 195

0.0568 0.0306 0.0168 0.0116 0.0083 0.0058 0.0041 0.0546 0.0189 0.0046

60 64 17/1 35/5 62/19 104/29 164/25 13 41/4 113/37

0.463 0.469 0.609 0.627 0.655 0.214 0.206 0.571 0.626 0.903

3.06 3.72 − − − − − 3.71 − −

UL141

Eint < − 0.01

188; 36%

1460

0.0830

37

0.524

2.70

TR2

Eint < − 0.01

105; 32%

638

0.1168

19

0.856

3.06

system b

edge crit.

a

Number of nodes Nnod, number of edges Nedg, graph density D, mumber of communities Ncom and clusters Nclu, modularity of the graph partition to communities Q, and average shortest path length d̅. bComplex UL141···TR2. cGraph is disconnected (d̅ is infinity). dNnod and percent of nodes in secondary structure. eNcom/Nclust (number of clusters with at least two nodes).

sheets) and dominantly consists of loops and only few beta sheets. Kre2p is a balanced mix of beta sheets and alpha helices. PPAR has the majority of its sequence organized in alpha helices with no beta sheets and only short connecting loops. 2.2. Graph Construction. A protein with Nnod residues has potentially Nnod(Nnod−1)/2 pair interactions. It is not sensible to consider all of them physically meaningful; e.g., interaction over a too long distance will unlikely be strong, thus irrelevant for the protein structure/topology. We apply two types of criteria for the creation of an edge between two nodes in the proteins’ graph representation: the distance criterion based on the separation between the centers of mass (CM) of the side chains for the two interacting residues i, j and the (interaction) energy criterion based on how strongly the residues interact. In the first case, the edge weight wij is proportional (equal) to RCM. In the latter case, wij ∼ 1/|Eint|. For the energy criterion, we apply an additional condition RvdW ≤ 3, meaning that the distance between the closest atoms in the two interacting residues must be less than, or equal to, three times their combined vdW diameter. This assignment may be deemed physically meaningful, as only negative interaction energies are considered in the edge creation, and lower weight is related to stronger interaction. Thus, stronger attraction relates to the shorter distance between the nodes in graph-theoretic terms as for weighted graphs wij = dij. The long-range attractive interaction energy is often described as an inverse power function of separation, e.g., ∼r−6 in LennardJones potential.29 This establishes a physical relationship between the two edge criteria; however, they are not equivalent. Only the side-chain atoms are considered in the evaluation of the center of mass separation RCM, i.e., excluding the backbone atoms. The reasons for using only negative interaction energies, i.e., the part of the potential surface that can support a bound state, are two-fold. First, the secondary and tertiary and arguably also quaternary structures in proteins and protein complexes are primarily governed by attractions. Second, there is no simple way of discriminating between attractive and repulsive interactions with positive interaction energy if these should be represented as edge weights (they are qualitatively different). Moreover, the calculation of shortest paths may be problematic due to negative circles, in the case if

average over the structure ensemble present in the crystal used for diffraction. In general, the goal is to relate observable biological protein properties (e.g., communication channels,22 important residues responsible for ternary and higher structures and/or for ligand binding) to the network topology and being able to make conclusive prediction from such models of proteins. Such is also our goal with the PRN based on first-principles interaction energies. On the practical side, we show that community detection algorithms such as the Louvain method are useful extensions to the spectral analysis and that in such a case the network does not need to be clustered. The theoretical contribution is the comparison of small-world properties of pair interaction energy and distance-based protein residuum networks, PIE-PRN and D-PRN, respectively. Identification of central hubs responsible for such behavior is provided.

2. METHODS 2.1. Protein Models. Six model proteins were selected for this work. Three smaller ones were chosen because they were analyzed as networks in other works: Trx (Thioredoxin, PDB entry 1QUW23) by Viloria,2 bovine angiogenin 1AGI,24 and human lysozyme 1LZ125 by Kannan.3 Three additional systems were selected: human cytomegalovirus UL141 binding to TRAIL-R226 (PDB code 4I9X), herein labeled as UL141, Peroxisome proliferator-activated receptor-α (PPARα) complex with coactivator PGC1α and its novel selective modulator Pemafibrate (Pema),27 herein labeled as PPAR, and α-1,2mannosyltransferase Kre2p in the 2B state along the reaction coordinate with Ca2+ as the metal,28 labeled as Kre2p. The latter three were selected for specific reasons. First, they are larger than common-sized proteins used as model systems for network analysis. Second, their secondary and tertiary structures have some specificity worth examiningall are complexes of protein with smaller molecules: ligands in Kre2p and PPAR, peptides coactivator PGC1α in PPAR, and/or protein complexes (UL141). Kre2p is the most complex in this sense as our model contains three larger ligands: donor, acceptor, uridine diphosphate; one metal ion, Ca2+; and nine water molecules in the vicinity of these ligands. Finally, their secondary and tertiary structures exhibit also considerable differences; UL141 has little secondary structure (helices or B

DOI: 10.1021/acs.jctc.8b00733 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation both positive and negative edge weights would be used (see Supporting Information). One node in the network corresponds to one residue in the protein. Unless specifically stated, pair interactions (i.e., edges) between two residues connected by covalent bonds were excluded from analysis; for discussion, see the Supporting Information. Graph analysis was carried out in ProGA, Protein Graph Analyzer,30 utilizing the NetworX 2.1 routines.31 Ab initio FMO-PIEDA (Fragment Molecular Orbital−Pair Interaction Energy Decomposition Analysis)32−37 calculations were performed in GAMESS38,39 to obtain pair interaction energies.27,40,41 Solvent effects were included implicitly by PCM (Polarisable Continuum Model) model.42,43

Figure 1. Left: UL141 (red)···TR2 (green) protein complex is bound through relatively discrete contact domains indicated by blue spots. Right: Highlighted three respective communities.

3. RESULTS We discuss whether, and if, then how PRN constructed from distance data, nowadays rather common,44 and ab initio interaction energy data differ. First we study the network topology and its global properties. Later, we analyze the role and importance of individual nodes and relate these to physical properties−residue contact and interaction energy. 3.1. Global Network Properties. 3.1.1. Network Topology. Some properties characterizing the network topologies are collected in Table 1 and Table S2. The smallest network is Trx with 105 nodes, and the largest is Kre2p with 352 nodes. The number of edges varies with the threshold for edge acceptance. The weakest condition when all attractive interactions with Eint < −0.01 kcal mol−1 are accepted as edges in the PRN results in a graph density D from 0.0568 in UL141 to 0.1546 in Trx. All densities D for the small proteins without ligands, i.e., Trx, 1AGI and 1LZ1, are comparable for given edge criteria and thresholds and are generally higher than for the structures which contain ligands. The higher densities are also reflected in the shorter average weighted paths lengths d̅. The lowest D and largest d̅ is for UL141, which also contains least secondary structure, some 35%. A hydrogen bond between residues k and k + 3(4) is characteristic for alpha helices; these are not in UL141. Similarly, beta sheets are stabilized by hydrogen bonds. Naturally, one may conclude that the less secondary structure a protein contains, the less dense its PRN will be. However, upon closer inspection, we find otherwise. If we analyze D of UL141 and TR2 of the UL141···TR2 complex separately, we see that D(Tr2) > D(UL141) > D(UL141···TR2). It is actually Tr2 which has the least secondary structure. This falsifies the assumption that lower densities are due to a lack of secondary structure and apparently also proteins consisting mostly from loops can have dense PRN. The true reason for the lower D in the complex compared to its separated constituents is that the two proteins are in contact through relatively few strong interactions concentrated into discrete contact domains as illustrated in Figure 1. On the other hand, the removal of the well-connected ligand Pema and its 54 edges in the PPAR complex leads to a decrease in the density because a cavity opens in the protein, through which interactions are not transmitted; see Table S2. 3.1.2. Small-World Properties, Clustering, and Communities. The work of Latora and Marchiori18 proposed the global efficiency Eglob as a useful quantitative measure for the smallworld properties of a network/graph. It was designed and demonstrated to replace the clustering coefficient and characteristic path length with a single descriptor.

E (G ) =

1 N (N − 1)

∑ i≠j∈G

dij−1 (1)

It accounts for the shortest path lengths dij and number of nodes N. It is convenient to use the scaled values such that Eglob = E(G)/E(Gid), where Gid is the ideal, fully connected graph. Then, Eglob attains values between 0 and 1, which facilitates mutual comparison of different graphs. The closer Eglob is to 1, the more the network behaves as a small-world network. Additionally, the local efficiency centrality was introduced to quantify the graphs’ local fault tolerance toward node removal. It is evaluated for each node’s local subgraph (ego graph) Gi, i.e., a subgraph of G consisting of only the neighbors of node i. E loc =

1 N

∑ E(Gi) i∈G

(2)

The graph can be weighted, and the ideal graph is constructed from the actual graph by adding all missing edges with the appropriate weights, if existent. This last remark is very important and is the main difference between PRN constructed from distance data and interaction energy data. We always can measure the physical distance between any two residues centers of mass and assign it as an edge weight when creating the ideal graph Gid. Therefore, Gid for a distance base network can/will be complete, fully connected. However, when dealing with interaction energies, the situation is different. Distant residues will have interaction energy zero as this is distance dependent. Therefore, we can formally treat these long-range interactions with some arbitrary high weight or omit the edge to reflect the physically correct behavior, i.e., that there is no interaction between the distant residues. In our case when the interaction energy is zero or repulsive, we set wij = 10,000, and for attractive edges, we apply the wij = 1/|Eint| formula. This way also Gid for the interaction energy-based graph will be fully connected. Looking at Table 2 and Table S1, we see that Eglob attains the highest values, well above 0.9, for the criterion RCM < 10 Å. High efficiency is also attained for other edge acceptance criteria. The criterion RCM < 5 Å is the recommended cutoff distance for identification of clusters.2 This relates to the fact that at this point the network is already considerably disconnected creating several larger connected components; see Table S2. This is why the network loses its small-world C

DOI: 10.1021/acs.jctc.8b00733 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation Table 2. Small-World Properties of PPARα···PGC1α···Pema PRNa system

edge crit.

Eglobd

b

PPAR PPARb PPARb PPARb PPARb PPARb PPARb PPARb PPARb PPARb

Eint < −0.01 Eint < −1.0 Eint < −3.0f Eint < −5.0f Eint < −8.0f Eint < −12.0f Eint < −15.0f RCM < 10 RCM < 7f RCM < 5f

0.86 0.86 0.86 0.84 0.69 0.39 0.27 0.95 0.81 0.04

PPARc

Eint < −0.01

0.82 (0.41)

Elocd

(0.41) (0.34) (0.27) (0.22) (0.16) (0.07) (0.04) (0.35) (0.22) (0.01)

0.92 0.86 0.57 0.32 0.14 0.05 0.03 0.96 0.69 0.06

(0.64) (0.48) (0.24) (0.14) (0.06) (0.02) (0.01) (0.75) (0.53) (0.06)

0.92 (0.63)

C̅ cce 0.342 0.266 0.177 0.122 0.060 0.024 0.017 0.513 0.373 0.054

Ctre

(0.070) (0.041) (0.029) (0.009) (0.017) (0.008) (0.000) (0.062) (0.022) (0.005)

0.339 (0.070)

0.319 0.223 0.137 0.085 0.048 0.024 0.029 0.461 0.359 0.186

(0.069) (0.040) (0.025) (0.007) (0.018) (0.008) (0.000) (0.061) (0.025) (0.016)

0.322 (0.071)

a

Global efficiency Eglob, local efficiency Eloc, average clustering coefficient C̅ cc, and global clustering coefficient (transitivity) Ctr. bComplex PPARα··· PGC1α···Pema. cComplex PPARα···PGC1α. dValues for unweighted graph are in parentheses. eValues for random graph G(N,p) are in parentheses (graph density of corresponding PRN was taken as the probability p). fGraph is disconnected.

character. The edge criterion Eint < −8 kcal mol−1 was chosen with the intention to filter out weak hydrogen bonds as it is somewhat lower interaction energy than the typical hydrogen bond in proteins.45,46 Even for the criterion Eint < −8 kcal mol−1, the graph is less clustered than for RCM < 5 Å. This indicates that the PIE-PRN graphs behave more like small worlds even if the “nonessential” interactions are filtered out. Also, for Eint < −8 kcal mol−1, the density and edge count are roughly 1.5 to 2 times higher than for the recommended RCM < 5 Å; see Table S2. In order to achieve a comparable number of connected components in the PIE-PRN graph to the D-PRN graph, one has to use the very strict criterion Eint < −15 kcal mol−1. Even at this point, most of the PIE-PRN graphs retain Eglob between 0.2 and 0.4. The D-PRN graphs have Eglob some 0.1 or less for RCM < 5 Å. However, at this level of disconnectedness, both types of PRN have Eloc < 0.1, which indicates high susceptibility to loss of small-world character upon further edge removal. Interestingly, for the criteria Eint < −5 kcal mol−1 and RCM < 7 Å, the clustering of the PRN is comparable, as indicated by the number of connected components (Table S2). In both cases, only a few separate nodes are disconnected, and the bulk of the nodes are still in one large cluster. The edge count Nedge is roughly 1.5 to 2 times lover for Eint < −5 kcal mol−1 than for RCM < 7 Å. Despite this, the global efficiencies are comparable. Even though the local efficiencies for the D-PRN are considerably higher, they are also much closer to those of random Erdő s−Rényi graphs with the same edge probability.47 What is the reason behind such difference? The answer may lie in the different network topology and node hierarchy of the two respective models. Central nodes with respect to the existence of the networks small-world character can be identified in the PIE-PRN graph even for the most benevolent edge acceptance criterion of Eint < −0.01 kcal mol−1; see Figure 2. The centrality of these nodes is only consistently highlighted for stricter edge criteria. Such hierarchy is not eminent for the distance-based network, and central nodes emerge clearly only after clustering. This conclusion is supported by the analysis of the distribution of center of mass, interaction energy, and edge weights in Figure 3. Obviously, RCM and the corresponding distance-based weight histograms are bell shaped, while the energy distribution resembles more an inverse power distribution. Such RCM distribution can be anticipated intuitively, given that the maximum dimension of the PPAR

Figure 2. 1LZ1 graph for different edge acceptance criteria. Clusters are differentiated by color, and node size is proportional to efficiency centrality Ceff.

Figure 3. Histograms of edge properties in the PPARα···PGC1α··· Pema complex: (a) RCM distribution (also weight distribution in the distance-based graph), (b) Eint distribution (all PIE, not just accepted as edge), and (c) weight distribution in the energy-based graph (w = 10,000 represent Eint = 0, i.e., nonexistent interaction).

D

DOI: 10.1021/acs.jctc.8b00733 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation protein is ∼60 Å. More clearly, the Ceff distribution in Figure 4 reflects the fact that the nodes in the PIE-PRNs are more

definitions are possible.51 As stated before, the secondary, tertiary, and quaternary structures of proteins are dictated by weak noncovalent attractions. Alpha helices and beta sheets present well definable units of secondary structure. In order to facilitate a particular tertiary structure, there must be interactions between the secondary units. It seems a reasonable challenge to put community detection in PRN to work for identification of such interactions. Interestingly, it appears that it is indeed useful for such a purpose. Here, it is illustrated on the PPARα···PGC1α···Pema complex, and the remaining systems are in the Supporting Information. Figure 5 depicts

Figure 4. Efficiency centrality Ceff histograms in the PPARα··· PGC1α···Pema complex for different edge criteria. Values for the Eint < −0.01, Eint < −1.0 and Eint < −3.0 kcal mol−1 criteria are almost identical (overlapping).

consistently ranked over a wired range of Ceff values, and similar inverse power law probability distribution is achieved only for the highly clustered D-PRNs. We can, thus, conclude that in the energy-based networks contain a “skeleton” of nodes central to the small-world character as opposed to the distance-based networks. Nonetheless, it also indicates that the PIE-PRNs behave much more as scale-free networks with respect to the Ceff distribution even when not disconnected, i.e., possessing a natural hierarchy of hubs and subhubs of different importance. This, in turn, indicates that the energy-based PRN are much more susceptible to loss of small-world character upon removal of key hubs as indicated by the lower Eloc values. The transitivity (global clustering coefficient) is also higher in the PIE-PRN. This opens up a question: does the PRN created with the relatively strict criterion RCM < 5 Å still realistically represent the protein topology? We try to answer this by evaluation of local node centralities, in particular, the efficiency centrality. We performed spectral analysis of the Laplacian matrices of the model systems for all edge criteria and edge acceptance thresholds. This was pioneered in 1999 for PRN.3 We can confirm the 5 Å distance cutoff proposed by Viloria2 as a useful value to achieve good splitting of the PRN into multinode clusters also known as connected components. The existence of clusters can be checked by the algebraic connectivity introduced by Fiedler.48 We show here that this requirement of a strict cutoff criterion is mostly due to the way clusters are identified in the spectral graph analysis of PRN. This method works reliably only on disconnected graphs as is exemplified in the supplementary file on the original example by Kannan.3 The performance of community detection methods based on the optimization of the graph partition modularity49 are tested in this work. The results for the Louvain algorithm50 are summarized in Table 1 and Table S2. The modularity Q, a value from −1 to1, indicating the quality of the partition tends to increase with the decreasing graph density D. The main advantage of the community detection based on the maximization of the partition modularity is that it can be applied on connected graphs, i.e., graphs which do maintain the small-world character natural for PRN. A community may be viewed as a group of nodes in which there is higher interconnectedness as in the whole graph, although other

Figure 5. Color-coded community structure in the PPARα··· PGC1α···Pema complex. One color hue corresponds to one community. Communities were detected in the energy-based PRN for edge acceptance criterion Eint < −1 kcal mol−1. The magnified part depicts the community structure of the coactivator PGC1α and the interaction partners (thicker lines) and corresponding community (lighter lines) in the protein. The sequence numbering is 1−270 for PPARα, 271−282 for PGC1α, and 283 for Pema.

the color-coded community structure. It is clearly visible that the communities reach across secondary structure units such as helices (also visible in Figure 1). The community structure in the coactivator PGC1α is particularly interesting. It has been found experimentally that the LXXLL motif in PGC1α is crucial for binding to PPARγ.52,53 We show that it is similarly important in the binding to PPARα. The community structure reveals three main interaction domains between PPARα and PGC1α, colored red, cyan, and green in Figure 5. Pro271 and Leu273 in PGC1α interact with Glu264 in PPARα with −97 and −28 kcal mol−1, respectively. These form the core of the red community. The cyan community centers around the Lys275 interaction with Asn105 and Asp103 with −39 and −27 kcal mol−1, respectively. Finally, the core of the green community is the interaction of Leu277 and Leu278 of PGC1α with Lys94 and Gln107 and Ala282 with Glu91. Even though Leu89 and Phe92 are in the same helix as Glu91 and Lys94, they are not in the green community spanning from PGC1α. They form one community with the bottom of helix1 (dark blue color in the forefront in Figure 5). While the community structure appears similar for different edge acceptance thresholds, the fine differentiation is lost for stricter thresholds; see the Supporting Information for PPAR. Therefore, it appears that for community detection it is desirable to analyze PIE-PRN with the weaker interactions included as opposed to cluster analysis. We argue that community structure based on interaction energy PRN should be viewed as superior since it is based on E

DOI: 10.1021/acs.jctc.8b00733 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation

comparable as well, albeit lower than for the more benevolent edge criteria. This leads to the conclusion that the radius of ∼10 Å around the side-chain center of mass covers most of the interactions stronger than −0.01 kcal mol−1, and the radius of ∼7 Å contains most of the residues which have Eint < −1 kcal mol−1 with the given amino acid. In both cases, the CCLS values for the edge criterion are marginally higher. Some conclusion can be drawn from the closeness centrality distribution. This seems to be bell shaped for connected graphs as seen from the plots in the Supporting Information. We recognize that the maxima of the CCLS distributions for the edge criteria Eint < −0.01 kcal mol−1 and RCM < 10 Å tend to be very close, as are the maxima for the criteria Eint < −1.0 kcal mol−1 and RCM < 7 Å. The distribution for the RCM < 5 Å criterion is significantly skewed to a declining shape with most nodes acquiring low values and only a few nodes with higher values. This obviously ties into the fact that for the latter criterion the graph is always disconnected. This is also reflected in the edge degree distribution, which becomes discontinuous. The number of nodes in the last bin of the CCLS histograms for all edge criteria is roughly similar, and the nodes which are identified as central are roughly the same; as seen from the CCLS vs k plot. PDB gradient (heat) maps of the structures were created in order to be able to identify the central nodes visually within the structure (Supporting Information). As expected, the most central nodes correlate with their degree, and thus, residues oriented toward the center of the protein attain higher CCLS values. This may be best visible on helices where we observe an almost periodical alternation of color hue between inward- and outward-facing residues. Large ligands bound in cavities tend to have high CCLS as they are in contact with many residues, best visible in the PPARα···PGC1α··· Pema complex. 3.2.2. Efficiency Centrality. The efficiency centrality Ceff was proposed by Wang19 and later extended.20 It is intended to rank nodes by their importance in the preservation of the small-world properties of the network. Conceptually, it is similar to the local efficiency Eloc,18 albeit Ceff is a metric comparing the global efficiency of the whole network with the global efficiency of the whole network after the deletion of the ith node, G̅ i. So, unlike Eloc, it is not evaluated just in the neighborhood (ego) graph of the ith node.

actual attractive interactions responsible for the stability of the folded structure. In distance-based PRN, one inevitably accounts also for possible repulsive interactions. In general, the number of communities Ncom for PIE-PRN is higher than for D-PRN, indicating finer differentiation; see Table 1 and Table S2. 3.2. Local Node Properties and Centralities in PRN Analysis. By local properties, we understand parameters that are evaluated for each node. The edge weights wij in the graph analysis of the PRN are assigned to real physical properties such as distance and interaction energy associated with the given edge representing the interaction of two residues: nodes i and j. Different algorithms for node ranking by means of centralities that were proposed so far rely predominantly on the analysis of shortest paths or the topology of the graph in the neighborhood of a given node. Thus, they are dependent on the edge weights associated with the node. Without some clear physical background, it is not sensible to sum the edge weights of a particular node and draw conclusions on its importance. For example, distance is not an additive quantity in the sense as used here; so, it makes little sense to calculate the sum of distances of a given residuum from all other residues. On the other hand, the sum of all pair interaction energies (PIE) of the ith node, i.e., Ei = Σj∈G, j≠i Eij may give some image on how strongly a residuum connects to the network. The δk vs k plots for all analyzed PRNs for all edge criteria reveal the presence of nodes with δk considerably higher than average. The degree and average neighbor degree magnitudes are however relatively inconsistent for various thresholds of the edge acceptance criteria (shown in the Supporting Information). Hence, the effort of Viloria2 to standardize the distance cutoff was a sensible approach. Taylor named degree, betweenness, and closeness centrality as the most commonly used graph descriptors for PRN.4 We introduce efficiency centrality into the toolkit for PRN analysis and relate it to physical properties of the network. 3.2.1. Closeness Centrality. Closeness centrality CCLS is inversely proportional to the sum of the length of the shortest paths between the given node and all remaining nodes in the graph.54,55 Thus, the more central a node is, the closer it is to all other nodes; in other words, it participates in more shortest paths in the whole graph. CCLS is in this analysis calculated in such way that the node distance is equal to the center of mass separation RCM. The data in the plots labeled as Eint force the evaluation of CCLS only for edges fulfilling the energetic edge criteria, but the node distance is still taken equal to RCM. For a disconnected graph, CCLS is evaluated in each connected component with n reachable nodes within the same connected component to which node i belongs and normalized to (n−1)/ (N−1).31 CCLS(i) =

n−1 n−1 N − 1 ∑nj =−11 dij

Ceff (i) = 1 −

Eglob(Gi̅ ) Eglob(G)

(4)

Ceff seems to correlate with the total interaction energy Ei of a given node (sum of its PIE) rather well: compare the gradient maps of Ei and Ceff in the Supporting Information. If some residuum has larger Ei, its largest contribution is usually from just one particularly strong interaction with some other residuum. These are quite often interactions responsible for the tertiary protein structure, and the residues are in a very favorable relative orientation and distance, ideally close to the (global) minimum on their potential energy surface (PES). On the other hand, one cannot be sure whether short RCM unambiguously reflects strong attraction, because subequilibrium distance PESs tend to have very steep repulsive walls. The problem with the evaluation of Ceff in the D-PRN is that if the network is connected, i.e., we do not impose the strict edge acceptance criterion RCM < 5 Å, then the distribution of the efficiency centrality is roughly symmetrical. Only if the PRN becomes clustered, then the Ceff distribution is similar to

(3)

The closeness centrality seems to correlate very well with the node degree δk as well as with the average weighted neighbor degree δ|k, which is as expected. After applying stricter rules to the edge acceptance, we see a drop in the magnitudes of CCLS for all nodes. Interestingly enough, CCLS for the edge criteria Eint < −0.01 kcal mol−1 and RCM < 10 Å seem to attain very similar values in all studied proteins. Analogically, values for the criteria Eint < −1 kcal mol−1 and RCM < 7 Å are very F

DOI: 10.1021/acs.jctc.8b00733 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation

the concept of efficiency centrality as a measure for node’s contribution to the small-world character. The Ceff ranking in the distance-based PRN resembles more the closeness centrality ranking CCLS. Leu12 is on rank 15 in CCLS. Some others are Leu31, Gly55, Leu84, and Ser36 rank on positions 4, 5, 6, and 7 in Ceff and 6, 9, 12, and 11 in CCLS, respectively. Let us take a look at the interactions of Pema in the PPAR network. Pema itself is the highest ranking residue with respect to the efficiency centrality due to mediating contact between many of the PPAR helices. Its interacting partners with its COO− group are Tyr464, His440, Tyr314, and Ser280.27 These are at positions 11, 39, 40, and 43 in the Ceff ranking. Looking at their interactions with other than Pema residues, we find that Tyr464 has one strong interaction around −38 kcal mol−1, one of some −12 and two with −4 kcal mol−1. His440 has interactions of one −34, two −10, and one −5 kcal mol−1. Tyr314 has one −15, two −5, and three −3 kcal mol−1. Lastly, Ser280 has one −18 and two −3 kcal mol−1. Lys358 is the second highest ranking after Pema with which it shares a −45 kcal mol−1 interaction. It also interacts with His440 by −34 kcal mol−1. Its good interconnectedness with the highest ranking residuum and other good ranking residue results in such a good score in efficiency centrality. Other high scoring residues are Asp372, Glu251, and Arg271 at ranks 4, 5, and 3, all mediating contact between helices. Also Glu289, Lys292, and Glu462 score at 18, 20, and 16, respectively. The latter mediate contact to the coactivator peptide PGC-1α.27 Similarly, strong and numerous interactions are the reason why the diphosphate nucleotide in the Kre2p network is the top scoring residue.

that in the interaction energy-based PRN; see Figure 4. This behavior probably relates to the different weight distribution in the PIE-PRN and D-PRN, as discussed earlier in Figure 3. Also, if the D-PRN is clustered, the identification of the most central nodes with respect to Ceff is inconsistent with the nonclustered D-PRN. This is visible in Figure 6 for PPAR and Figure 2 for 1LZ1. The energy-based PIE-PRNs seem to be more consistent in this respect.

Figure 6. Ceff for all nodes in the PPARα···PGC1α···Pema complex for different edge criteria. Two top panels are for the interaction energy PRN, and the bottom panel for the distance PRN; here, the values for RCM < 5 Å are on the right axis.

4. CONCLUSIONS Both the pair interaction energy and distance-based PRN can be considered small world, even though their topologies differ. To tie this to some physical interpretations, one may say that even in such simple physical models as PRN, the basis of allostery, a well-known property of proteins, is inherently contained. The energy-based PRNs seem to have a reasonable hierarchy of hubs even for benevolent edge acceptance criteria, i.e., when the PRN is dense and connected. For stricter energy cutoffs this hierarchy remains consistent. From this also follows the lower local efficiency of energy-based PRN compared to distance-based PRN for less dense networksthe removal of fewer more central nodes would cause significant loss of smallworld properties in the energy-based PRN. This relates to the fact that energy-based PRNs seem to be scale free with respect to the Ceff distribution as opposed to distance-based PRN. Hence, efficiency centrality successfully identifies such hubs. These are usually residues with strong interactions acting as funnels that mediate interactions across geometrically more distant parts of the protein. Naturally, Coulomb interactions tend to be stronger and further reaching than dispersiondominated van der Waals interactions. In light of our findings, it seems that even though scale-free networks are, in general, resilient to random mutations, if the high ranking nodes are removed, the topology and/or stability of the network can suffer considerably. This may obviously strongly relate to mutational stability of proteins. Protein domains stabilized by weaker but numerous van der Waals interactions, such as, e.g., hydrophobic domains, i.e., domains stabilized by desolvatation effects, should be less susceptible to single residuum mutation. Similarly, in ligand binding pockets, the mutation of a single residuum breaking some highly specific Coulomb electrostatic

It is interesting to compare the highest scoring residues for both types of PRN. 1LZ1 is discussed in more detail. Lys1 and Glu7 score at the first and second rank in the PIE-PRN. Leu12 ranks highest in the D-PRN with RCM < 7 Å edge acceptance criterion; however, is on place 109 in the energy-based PRN. Interestingly, Lys1 and Glu7 share one interaction of some −121 kcal mol−1. Figure 7 displays also the other interaction partners of Lys1 and Glu7 satisfying the Eint < −8 kcal mol−1 condition. It appears that both residues mediate contact between two distant domains in the 1LZ1 protein through a very short edge (in graph terms), which is in agreement with

Figure 7. 1LZ1 and the residues with highest Ceff. Left: PIE-PRN; Lys1(dark blue) interacts with Glu7 (dark red). Their remaining interaction partners with Eint < −8 kcal mol−1 in cyan and light red, respectively Right: distance-based PRN; Leu12 (red) with all connected residues for RCM < 7 Å highlighted. From these, only Ala9 and Leu15 (green) satisfy the condition Eint < −8 kcal mol−1. G

DOI: 10.1021/acs.jctc.8b00733 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation

computer Center, Institute for Solid State Physics, The University of Tokyo, and the Computing Centre of the Slovak Academy of Sciences using the supercomputing infrastructure acquired in projects ITMS 26230120002 and 26210120002, supported by the Research and Development Operational Programme funded by the ERDF.

interaction may jeopardize the binding specificity to a ligand. Efficiency centrality can pick up such sites. We have shown that the edge density is not severely affected by less content of the secondary structure, and such proteins also behave as small worlds. The ab initio energy threshold value at which transition of the PIE-PRN from one connected component to several smaller components occurs seems to be at around −3 to −8 kcal mol−1. This is roughly in agreement with the finding of Vijayabaskar et al.21 who found a transition region between −2 to −5 kcal mol−1. The difference probably accounts for the method of interaction energy calculation: FMO/MP2/PCM vs force field. Finally, we show that community detection is useful in the identification of tertiary and quarterly protein structures as communities do stretch across secondary structures such as helices, sheets, and loops if there is strong interaction governing the relative positions of such secondary elements. Denser, nonclustered networks are more suitable for community analysis. Thus, this is a natural complementary technique to the spectral analysis.





(1) Yeates, T. O.; Beeby, M. Proteins in a Small World. Science 2006, 314 (5807), 1882−1883. (2) Viloria, J. S.; Allega, M. F.; Lambrughi, M.; Papaleo, E. An optimal distance cutoff for contact-based Protein Structure Networks using side-chain centers of mass. Sci. Rep. 2017, 7, 2838−2848. (3) Kannan, N.; Vishveshwara, S. Identification of side-chain clusters in protein structures by a graph spectral method. J. Mol. Biol. 1999, 292 (2), 441−464. (4) Taylor, N. R. Small world network strategies for studying protein structures and binding. Comput. Struct. Biotechnol. J. 2013, 5 (6), e201302006. (5) Yin, Y.; Maisuradze, G. G.; Liwo, A.; Scheraga, H. A. Hidden Protein Folding Pathways in Free-Energy Landscapes Uncovered by Network Analysis. J. Chem. Theory Comput. 2012, 8 (4), 1176−1189. (6) Ricard, T. C.; Haycraft, C.; Iyengar, S. S. Adaptive, Geometric Networks for Efficient Coarse-Grained Ab Initio Molecular Dynamics with Post-Hartree−Fock Accuracy. J. Chem. Theory Comput. 2018, 14 (6), 2852−2866. (7) Setny, P.; Zacharias, M. Elastic Network Models of Nucleic Acids Flexibility. J. Chem. Theory Comput. 2013, 9 (12), 5460−5470. (8) VanWart, A. T.; Eargle, J.; Luthey-Schulten, Z.; Amaro, R. E. Exploring Residue Component Contributions to Dynamical Network Models of Allostery. J. Chem. Theory Comput. 2012, 8 (8), 2949− 2961. (9) Orellana, L.; Rueda, M.; Ferrer-Costa, C.; Lopez-Blanco, J. R.; Chacon, P.; Orozco, M. Approaching Elastic Network Models to Molecular Dynamics Flexibility. J. Chem. Theory Comput. 2010, 6 (9), 2910−2923. (10) Raimondi, F.; Felline, A.; Seeber, M.; Mariani, S.; Fanelli, F. A Mixed Protein Structure Network and Elastic Network Model Approach to Predict the Structural Communication in Biomolecular Systems: The PDZ2 Domain from Tyrosine Phosphatase 1E As a Case Study. J. Chem. Theory Comput. 2013, 9 (5), 2504−2518. (11) Ribeiro, A. A. S. T.; Ortiz, V. Determination of Signaling Pathways in Proteins through Network Theory: Importance of the Topology. J. Chem. Theory Comput. 2014, 10 (4), 1762−1769. (12) Estrada, E. Universality in Protein Residue Networks. Biophys. J. 2010, 98 (5), 890−900. (13) Estrada, E. Characterization of the folding degree of proteins. Bioinformatics 2002, 18 (5), 697−704. (14) Watts, D. J.; Strogatz, S. H. Collective dynamics of ‘small-world’ networks. Nature 1998, 393, 440−442. (15) del Sol, A.; O’Meara, P. Small-world network approach to identify key residues in protein−protein interaction. Proteins: Struct., Funct., Genet. 2005, 58 (3), 672−682. (16) del Sol, A.; Fujihashi, H.; O’Meara, P. Topology of small-world networks of protein−protein complex structures. Bioinformatics 2005, 21 (8), 1311−1315. (17) Pons, C.; Glaser, F.; Fernandez-Recio, J. Prediction of proteinbinding areas by smallworld residue networks and application to docking. BMC Bioinf. 2011, 12, 378−387. (18) Latora, V.; Marchiori, M. Efficient Behavior of Small-World Networks. Phys. Rev. Lett. 2001, 87 (19), 198701. (19) Wang, S.; Du, Y.; Deng, Y. A new measure of identifying influential nodes: Efficiency centrality. Commun. Nonlinear Sci. Numer. Simul. 2017, 47 (Supplement C), 151−163. (20) Sladek, V. A note on the interpretation of the efficiency centrality. Commun. Nonlinear Sci. Numer. Simul. 2018, 61, 225−229. (21) Vijayabaskar, M. S.; Vishveshwara, S. Interaction Energy Based Protein Structure Networks. Biophys. J. 2010, 99 (11), 3704−3715.

ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jctc.8b00733. Extensive graphical documentation of various graph metrics for each of the six studied protein complexes (plots similar to those in the text and additional ones) and computational details. (PDF)



REFERENCES

AUTHOR INFORMATION

Corresponding Author

*E-mails: [email protected]; vladimir.sladek86@gmail. com. ORCID

Vladimir Sladek: 0000-0001-5332-7558 Hiroaki Tokiwa: 0000-0002-3790-1799 Yasuteru Shigeta: 0000-0002-3219-6007 Author Contributions

The manuscript was written through contributions of all authors. V.S. proposed the methodology, wrote the ProGA code, and performed the graph theoretical analysis. H.T., H.S., and Y.S. built the PPAR model and did FMO calculations on it. Funding

This work has been supported by the Agency of the Ministry of Education of Slovak Republic and the Slovak Academy of Sciences (project VEGA 2/0035/16), AMED-CREST #JP18gm0910003, and MEXT Supported Program for the Strategic Research Foundation at Private Universities 2013− 2018. Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS V.S. thanks Vladimir Sladek (Sr.) and Igor Tvaroska from SAS and Slovakia and Petr Kulhanek from CEITEC, Czech Republic, for consultations and Juraj Kóňa for the geometry of Kre2p. The computations were performed using the Center for Computational Sciences, University of Tsukuba, Research Center for Computational Science, Okazaki, and the SuperH

DOI: 10.1021/acs.jctc.8b00733 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation

Inhibitors of Golgi α-Mannosidase II. ChemMedChem 2018, 13 (4), 373−383. (42) Fedorov, D. G.; Kitaura, K.; Li, H.; Jensen, J. H.; Gordon, M. S. The polarizable continuum model (PCM) interfaced with the fragment molecular orbital method (FMO). J. Comput. Chem. 2006, 27 (8), 976−985. (43) Fedorov, D. G.; Kitaura, K. Subsystem Analysis for the Fragment Molecular Orbital Method and Its Application to Protein− Ligand Binding in Solution. J. Phys. Chem. A 2016, 120 (14), 2218− 2231. (44) Emerson, I. A.; Amala, A. Protein contact maps: A binary depiction of protein 3D structures. Phys. A 2017, 465, 782−791. (45) Langkilde, A.; Kristensen, S. M.; Leggio, L. L.; Mølgaard, A.; Jensen, J. H.; Houk, A. R.; Navarro Poulsen, J.-C.; Kauppinen, S.; Larsen, S. Short strong hydrogen bonds in proteins: a case study of rhamnogalacturonan acetylesterase. Acta Crystallogr., Sect. D: Biol. Crystallogr. 2008, 64 (8), 851−863. (46) Hubbard, R. E.; Haider, M. K. Hydrogen Bonds in Proteins: Role and Strength. In eLS; American Cancer Society, 2010. (47) Erdő s, P.; Rényi, A. On Random Graphs I. Publicationes Mathematicae 1959, 6, 290−297. (48) Fiedler, M. Algebraic connectivity of graphs. Czechoslovak Math. J. 1973, 23 (98), 298−305. (49) Newman, M. E. J.; Girvan, M. Finding and evaluating community structure in networks. Phys. Rev. E 2004, 69 (2), 026113. (50) Blondel, V. D.; Guillaume, J.-L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech.: Theory Exp. 2008, 2008, P10008. (51) Fortunato, S.; Hric, D. Community detection in networks: A user guide. Phys. Rep. 2016, 659, 1−44. (52) Li, Y.; Kovach, A.; Suino-Powell, K.; Martynowski, D.; Xu, H. E. Structural and Biochemical Basis for the Binding Selectivity of Peroxisome Proliferator-activated Receptor γ to PGC-1α. J. Biol. Chem. 2008, 283 (27), 19132−19139. (53) Viswakarma, N.; Jia, Y.; Bai, L.; Vluggens, A.; Borensztajn, J.; Xu, J.; Reddy, J. K. Coactivators in PPAR-Regulated Gene Expression. 2010, 2010, 250126. (54) Freeman, L. C. Centrality in social networks conceptual clarification. Social Networks 1978, 1 (3), 215−239. (55) Wasserman, S.; Faust, K. Social Network Analysis: Methods and Applications; Structural Analysis in the Social Sciences; Cambridge University Press, 1994.

(22) Ghosh, A.; Vishveshwara, S. A study of communication pathways in methionyl- tRNA synthetase by molecular dynamics simulations and structure network analysis. Proc. Natl. Acad. Sci. U. S. A. 2007, 104 (40), 15711−15716. (23) Nicastro, G.; De Chiara, C.; Pedone, E.; Tato, M.; Rossi, M.; Bartolucci, S. NMR solution structure of a novel thioredoxin from Bacillus acidocaldarius possible determinants of protein stability. Eur. J. Biochem. 2000, 267, 403−413. (24) Acharya, K. R.; Shapiro, R.; Riordan, J. F.; Vallee, B. L. Crystal structure of bovine angiogenin at 1.5-A resolution. Proc. Natl. Acad. Sci. U. S. A. 1995, 92 (7), 2949−2953. (25) Artymiuk, P. J.; Blake, C. C. F. Refinement of human lysozyme at 1.5 Å resolution analysis of non-bonded and hydrogen-bond interactions. J. Mol. Biol. 1981, 152 (4), 737−762. (26) Nemčovičová, I.; Benedict, C. A.; Zajonc, D. M. Structure of Human Cytomegalovirus UL141 Binding to TRAIL-R2 Reveals Novel, Non-canonical Death Receptor Interactions. PLoS Pathog. 2013, 9 (3), 1−14. (27) Yamamoto, Y.; Takei, K.; Arulmozhiraja, S.; Sladek, V.; Matsuo, N.; Han, S.; Matsuzaka, T.; Sekiya, M.; Tokiwa, T.; Shoji, M.; et al. Molecular association model of PPARα and its new specific and efficient ligand, pemafibrate: Structural basis for SPPARMα. Biochem. Biophys. Res. Commun. 2018, 499 (2), 239−245. (28) Bobovska, A.; Tvaroska, I.; Kona, J. A theoretical study on the catalytic mechanism of the retaining alpha-1,2-mannosyltransferase Kre2p/Mnt1p: the impact of different metal ions on catalysis. Org. Biomol. Chem. 2014, 12 (24), 4201−4210. (29) Kaplan, I. K. Intermolecular Interactions: Physical Picture, Computational Methods and Model Potentials; John Wiley and Sons, 2006. (30) Sladek, V. ProGA - Protein Graph Analyser (C) software; Institute of Chemistry, Centre for Glycomics, Slovakia & Rikkyo University, Tokyo, 2016. Freely available upon request at vladimir. [email protected]. (31) Hagberg, A. A.; Schult, D. A.; Swart, P. J. Exploring Network Structure, Dynamics, and Function using NetworkX. Proc. SciPy 2008, 11−16. (32) Fedorov, D. G.; Nagata, T.; Kitaura, K. Exploring chemistry with the fragment molecular orbital method. Phys. Chem. Chem. Phys. 2012, 14 (21), 7562−7577. (33) Fedorov, G. Dimitri; Kitaura, K. The Fragnemt Molecular Orbital Method - Practical Applications to Large Molecular Systems; CRC Press, Taylor and Francis Group, 2009. (34) Green, M. C.; Fedorov, D. G.; Kitaura, K.; Francisco, J. S.; Slipchenko, L. V. Open-shell pair interaction energy decomposition analysis (PIEDA): Formulation and application to the hydrogen abstraction in tripeptides. J. Chem. Phys. 2013, 138 (7), 074111. (35) Fedorov, D. G.; Kitaura, K. Pair interaction energy decomposition analysis. J. Comput. Chem. 2007, 28 (1), 222−237. (36) Gordon, M. S.; Fedorov, D. G.; Pruitt, S. R.; Slipchenko, L. V. Fragmentation Methods: A Route to Accurate Calculations on Large Systems. Chem. Rev. 2012, 112 (1), 632−672. (37) Fedorov, D. G. The fragment molecular orbital method: theoretical development, implementation in GAMESS, and applications. WIRES 2017, 7 (6), e1322. (38) Schmidt, M. W.; Baldridge, K. K.; Boatz, J. A.; Elbert, S. T.; Gordon, M. S.; Jensen, J. H.; Koseki, S.; Matsunaga, N.; Nguyen, K. A.; Su, S.; et al. General atomic and molecular electronic structure system. J. Comput. Chem. 1993, 14 (11), 1347−1363. (39) Gordon, M. S.; Schmidt, M. W. Theory and Applications of Computational Chemistry; Dykstra, C. E., Frenking, G., Kim, K. S., Scuseria, G. E., Eds.; Elsevier, 2005; pp 1167−1189. (40) Sladek, V.; Kona, J.; Tokiwa, H. In silico analysis of interaction pattern switching in ligand···receptor binding in Golgi alphamannosidase II induced by the protonated states of inhibitors. Phys. Chem. Chem. Phys. 2017, 19 (19), 12527−12537. (41) Š esták, S.; Bella, M.; Klunda, T.; Gurská, S.; Džubák, P.; Wöls, F.; Wilson, I. B. H.; Sladek, V.; Hajdúch, M.; Poláková, M.; et al. NBenzyl Substitution of Polyhydroxypyrrolidines: The Way to Selective I

DOI: 10.1021/acs.jctc.8b00733 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX