Underestimated Noncovalent Interactions in Protein Data Bank



Underestimated noncovalent interactions in Protein Data Bank Zhijian Xu, Qian Zhang, Jiye Shi, and Weiliang Zhu J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.9b00258 • Publication Date (Web): 11 Jul 2019 Downloaded from pubs.acs.org on July 17, 2019



Journal of Chemical Information and Modeling

Underestimated noncovalent interactions in Protein Data Bank

Zhijian Xu,*,† Qian Zhang,‡ Jiye Shi,† Weiliang Zhu*,†

†CAS Key Laboratory of Receptor Research, Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China

‡Department of Computer Science and Technology, East China Normal University, Shanghai 200241, China.

Open Studio for Druggability Research of Marine Natural Products, Pilot National Laboratory for Marine Science and Technology (Qingdao), Qingdao, 266237, China

*To whom correspondence should be addressed. Phone: +86-21-50806600-1304 (Z.X.), +86-21-50805020 (W.Z.), Fax: +86-21-50807088 (W.Z.), E-mail: [email protected] (Z.X.), [email protected] (W.Z.).

ACS Paragon Plus Environment


ABSTRACT

Noncovalent interactions (NCIs) play essential roles in the structure and function of biomacromolecules. There are various NCIs, e.g., hydrogen bonds (HBs), cation-π and π-π interactions, and ionic bonds, among which HBs are the most widespread and well studied. By utilizing the ratio of the observed HBs over pseudo HBs (1.0 Å longer than the HB distance criteria without angle constraints), we demonstrated that HBs in both protein-ligand and protein-protein interfaces are overlooked in structures deposited in the PDB. After the QM/MM optimization of 12 protein-ligand complexes, we showed that the overlooked HBs could be recovered. With a systematic search in the PDB, we found that the HB number per residue (NHB/R) in proteins decreases as structural resolution becomes lower, implying that HBs are overlooked even today, regardless of the type of refinement approach used. Similarly, cation-π, π-π and ionic interactions were found to be significantly lost, manifesting the universal underestimation of various NCIs. Considering the vital role of NCIs, it is important to recover the NCIs to facilitate drug design, to explore protein-protein interactions and to study protein structure and function.


Introduction

Noncovalent interactions (NCIs) play essential roles in the structure and function of biomacromolecules. There are different NCIs, e.g., hydrogen bonds (HBs), cation-π interactions, π-π interactions, CH–π interactions1, and ionic bonds, among which HBs are the most widespread and best studied. The key role of HBs in protein-ligand interfaces is revealed by crystal structure and database surveys. For example, crystal structures revealed that buried HBs contribute to the high potency of complement factor D inhibitors.2 Protein structures derived from poor electron density maps might have more overlooked NCIs because of the imperfect refinement process. The electron density map showed that an extensive HB network contributes to the great potency of an indoleamine 2,3-dioxygenase 1 (IDO1) inhibitor, while the loss of this HB network completely eliminates the inhibitory capacity against IDO1.3 Ultrahigh-resolution X-ray crystal structures of CTX-M β-lactamase directly showed that the binding of a ligand induces a proton transfer and the formation of a low-barrier hydrogen bond (LBHB).4 A single HB donation from the cofactor flavin N5 to a nearby asparagine ensures the reduction of FAD in DNA photolyase.5 A comprehensive investigation of HBs from 65,266 protein-ligand PDB structures yielded interaction geometries to improve current computational models, which is expected to be useful in designing new chemical structures for biological applications.6 HBs are also essential to protein-protein interactions.7 For instance, HB networks could be utilized to design a wide range of protein homo-oligomers with structural specificity.8 HB preference plays a vital role in the evolution of protein-protein


interfaces, which might be utilized in antibody modeling and protein-protein docking.9 Protein function may be tuned by a single HB.10-11 For example, non-adiabatic quantum mechanics/molecular dynamics simulations showed that a single HB is responsible for off-switching in fluorescent proteins.12 Both protein and water hydrogen bond networks facilitate the conformational fluctuations of the amyloid protein Aβ30-35.13 HBs between the protein amide proton and the carbonyl oxygen of the same residue, affecting approximately 94% of proteins, contribute significantly to protein folding.14 Backbone HB formation was found to be uniform in the β-barrel membrane proteins and synchronized with the formation of tertiary protein structure.15 Akin to Ramachandran plots, overall HB patterns could be used to depict the relative positions of peptide units and annotate protein secondary and tertiary structure.16 The strengths of backbone HBs could vary widely to facilitate transmembrane helix breaking and bending to satisfy functional imperatives.17 Similarly, other noncovalent interactions, viz., cation-π interactions18-21, π-π interactions22-24, and ionic bonds25-26, are also common and important to protein function and structure. The HB distance (length) distributions are different between high- and low-resolution structures.27-28 However, whether the number of HBs is underestimated remains unknown. Recently, we found that the numbers of halogen bonds (XBs) between organohalogens and proteins are overlooked in the PDB29-30. For example, in the binding site of cytochrome P-450CAM (PDB ID: 1PHA), the XB formed between Thr101 and the Cl atom of the inhibitor was lost, which could be recovered by flipping the sidechain of Thr101.29 To the best of our knowledge, it is unknown whether other


NCI numbers are overlooked. Therefore, we defined the ratio of observed NCIs over pseudo NCIs (1.0 Å longer than the NCI distance criteria without angle constraints) as a parameter in this study to describe the saturation of NCI formation. We demonstrated that HBs are overlooked both in protein-ligand and protein-protein interfaces in the PDB. After the QM/MM optimization of 12 protein-ligand complexes, we showed that the overlooked HBs could be recovered. In addition, more than 2% of HBs, or an average of five HBs per protein, are lost when the resolution is worse than 2.0 Å even today, regardless of the type of refinement approach used. Similarly, cation-π interactions, π-π interactions and ionic bonds were found to be significantly lost, manifesting the universal underestimation of various NCIs, even when state-of-the-art refinement methods were used. Therefore, it is important to recover NCIs in drug design, molecular modeling, and crystallography.

Results

HBs in protein-ligand interfaces are overlooked as resolution decreases. Although fuzzy electron density and current refinement software lead to lost HBs at low resolution, HB donors and acceptors should still occur near each other due to geometrical constraints. Accordingly, we define a term, namely, the pseudo HB number (𝑁𝑝𝐻𝐵), which is counted based on loose criteria, viz., no constraint on the hydrogen bond angle and a maximum hydrogen bond distance extended by a tolerance of 1.0 Å. That is to say, HB donors and acceptors might be pushed 1.0 Å (the tolerance value) away from each other in the refinement process.30 Then, the ratio of observed NHB (𝑁𝑜𝐻𝐵) over 𝑁𝑝𝐻𝐵


is calculated and called the apparent saturation of HB formation (ASHB). In addition, different tolerance values and another popular program, PyMOL, were also evaluated, which yielded consistent results (Figures S1 and S2). A total of 71,006 protein-ligand structures that could form at least one HB were analyzed. The ASHB values in the protein-ligand interface were calculated to be 0.34, 0.33, 0.32, 0.30, 0.27 and 0.25 for resolution ranges of ≤1.5, 1.5 to ≤2.0 (1.5-2.0), 2.0 to ≤2.5 (2.0-2.5), 2.5 to ≤3.0 (2.5-3.0), 3.0 to ≤3.5 (3.0-3.5) and >3.5 Å, respectively (Figure 1). The data revealed that as the resolution becomes worse, the ASHB values in the protein-ligand interface become markedly lower. The clear decreasing trend in the ASHB values suggests that the HBs in protein-ligand interfaces are underestimated in the PDB when the resolution is worse than 1.5 Å. Furthermore, the 95% confidence intervals (CIs) of the ASHB are (0.33, 0.34), (0.33, 0.33), (0.32, 0.32), (0.29, 0.30), (0.26, 0.27), and (0.24, 0.26) for the resolution ranges of ≤1.5, 1.5-2.0, 2.0-2.5, 2.5-3.0, 3.0-3.5 and >3.5 Å, respectively, demonstrating the significant underestimation of HBs in protein-ligand interfaces in the >1.5 Å resolution range.
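The counting scheme behind ASHB can be sketched in a few lines of Python. This is only an illustration of the definition given above, not the authors' implementation: the 3.5 Å donor-acceptor cutoff and the 120° donor-H-acceptor angle are assumed placeholder criteria (the actual criteria are described in the Methods), and the function names and toy coordinates are hypothetical.

```python
import math

def angle_deg(a, b, c):
    """Return the angle a-b-c in degrees, with b at the vertex."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))

def count_hbs(triples, d_max=3.5, angle_min=120.0, tolerance=1.0):
    """Count observed and pseudo HBs for (donor, hydrogen, acceptor) triples.

    d_max and angle_min are ASSUMED criteria used for illustration only.
    A pseudo HB needs only distance <= d_max + tolerance (no angle check);
    an observed HB must also satisfy the strict distance and angle criteria.
    """
    n_obs = n_pseudo = 0
    for donor, hydrogen, acceptor in triples:
        d = math.dist(donor, acceptor)
        if d <= d_max + tolerance:
            n_pseudo += 1
            if d <= d_max and angle_deg(donor, hydrogen, acceptor) >= angle_min:
                n_obs += 1
    return n_obs, n_pseudo

# Toy coordinates (hypothetical): one proper HB, one pair pushed too far apart,
# one pair with a poor angle, and one pair beyond even the extended cutoff.
triples = [
    ((0, 0, 0), (1, 0, 0), (2.9, 0, 0)),
    ((0, 0, 0), (1, 0, 0), (4.2, 0, 0)),
    ((0, 0, 0), (0, 1, 0), (3.0, 0, 0)),
    ((0, 0, 0), (1, 0, 0), (6.0, 0, 0)),
]
n_obs, n_pseudo = count_hbs(triples)
as_hb = n_obs / n_pseudo  # apparent saturation of HB formation
print(n_obs, n_pseudo, round(as_hb, 2))
```

Because every observed HB also satisfies the looser pseudo criteria, the ratio always lies between 0 and 1, which is what makes it usable as a saturation measure across resolution ranges.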

Figure 1. The ASHB in protein-ligand and protein-protein interfaces.


In our previous work29, a simple parameter, the ratio of the number of HBs between ligands and protein sidechain over the number of HBs between ligands and the whole protein (sidechain and backbone), was found to be stable as the resolution decreases, implying that the sidechain HBs are not overlooked compared to the backbone HB. However, this ratio could not reveal the underestimation of the backbone HB. Therefore, the ASHB defined in this study is a comprehensive indicator for investigating the potential effect of resolution on HB numbers.

QM/MM calculations to recover HBs in protein-ligand interfaces. If the lost HBs could be recovered, it would be helpful for designing new drugs or explaining protein-ligand interactions. To investigate whether the HBs overlooked at low resolution could be recovered by QM/MM optimization, 6 pairs of protein-ligand complexes (i.e., the same complex solved at a higher and at a lower resolution) were retrieved from the PDB for QM/MM calculation. The average HB number of the 6 protein-ligand complexes is 3.2 at high resolution and 1.5 at the corresponding low resolution (Table 1). After QM/MM optimization, the average HB number is 3.3 for both the high- and low-resolution structures (Table 1), which is comparable to the original HB number at high resolution. The additional 0.1 HB after QM/MM optimization at high resolution may be attributed to an artifact of the removal of water molecules in the QM/MM calculation.

Table 1. The HB number from the original PDB and after QM/MM optimization in protein-ligand interfaces.

Protein                            Ligand structure   PDB ID (resolution) a   Original HBs a   HBs after QM/MM a   PDB ID (resolution) b   Original HBs b   HBs after QM/MM b
Androgen receptor                  [structure image]  4oh5 (2.00 Å)           3                3                   4oh6 (3.56 Å)           1                4
Androgen receptor                  [structure image]  2piv (1.95 Å)           3                3                   2piw (2.58 Å)           2                4
Androgen receptor                  [structure image]  2piu (2.12 Å)           4                4                   1xj7 (2.70 Å)           3                3
Fatty acid-binding protein, liver  [structure image]  3b2i (1.86 Å)           3                4                   3vg4 (2.50 Å)           1                3
Gag-Pol polyprotein                [structure image]  1cxq (1.02 Å)           3                3                   1vsh (1.95 Å)           1                3
Nuclear autoantigen Sp-100         [structure image]  5pwt (1.58 Å)           3                3                   5pyi (2.29 Å)           1                3
Average HB number                                                             3.2              3.3                                         1.5              3.3

a for high resolution, b for low resolution.

Androgen receptor is a validated drug target for prostate cancer31, and

hydroxyflutamide is a nonsteroidal antiandrogen (NSAA) and the major active metabolite of flutamide, the first approved NSAA. Taking the androgen receptor-hydroxyflutamide complex as an example, there are 3 HBs in the high-resolution structure (Figure 2a) and 1 HB in the low-resolution structure (Figure 2c), which implies that 2 HBs might be lost at low resolution. After QM/MM optimization, the lost HBs in the low-resolution structure are recovered (Figure 2d). In addition, it should be noted that the HBs in the high-resolution structure are stable: after QM/MM optimization, the same 3 HBs remain (Figure 2a and 2b). A spurious HB involving Gln711 (Figure 2d) is also formed, mainly due to QM/MM calculation error. Taken together, the lost HBs in protein-ligand interfaces could be recovered by QM/MM optimization.
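As a quick consistency check, the averages quoted for Table 1 can be re-derived from the per-complex HB counts. The short Python sketch below copies the counts from Table 1; the dictionary layout and the helper name `column_means` are illustrative choices, not part of the original work.

```python
# (original HBs, HBs after QM/MM) keyed by PDB ID, copied from Table 1
high_res = {"4oh5": (3, 3), "2piv": (3, 3), "2piu": (4, 4),
            "3b2i": (3, 4), "1cxq": (3, 3), "5pwt": (3, 3)}
low_res = {"4oh6": (1, 4), "2piw": (2, 4), "1xj7": (3, 3),
           "3vg4": (1, 3), "1vsh": (1, 3), "5pyi": (1, 3)}

def column_means(table):
    """Return (mean original HBs, mean HBs after QM/MM), rounded to 0.1."""
    n = len(table)
    original = sum(orig for orig, _ in table.values()) / n
    optimized = sum(opt for _, opt in table.values()) / n
    return round(original, 1), round(optimized, 1)

print(column_means(high_res))  # (3.2, 3.3)
print(column_means(low_res))   # (1.5, 3.3)
```

The low-resolution structures gain on average 1.8 HBs after optimization, while the high-resolution structures are essentially unchanged, which is the pattern the text describes.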


Figure 2. The HBs between hydroxyflutamide and androgen receptor in the original PDB structures (a, c) and after QM/MM optimization (b, d). (a) The HBs in the high-resolution structure (PDB ID: 4oh5, resolution: 2.00 Å). (b) The HBs for (a) after QM/MM optimization. (c) The HBs in the low-resolution structure (PDB ID: 4oh6, resolution: 3.56 Å). (d) The HBs for (c) after QM/MM optimization. The androgen receptor is shown as a gray cartoon, the key residues as gray sticks, and hydroxyflutamide as yellow sticks. HBs are shown as yellow dashed lines and other distances as black dashed lines. All distances are labeled in Å.

HBs in protein-protein interfaces are overlooked as resolution decreases. A total of 162,939 protein-protein complex structures that could form at least one HB were analyzed. The ASHB values in the protein-protein interfaces are 0.51, 0.50, 0.46, 0.43, 0.38 and 0.33 for the resolution ranges of ≤1.5, 1.5-2.0, 2.0-2.5, 2.5-3.0, 3.0-3.5 and >3.5 Å, respectively (Figure 1), revealing that as the resolution becomes worse,


the ASHB values in protein-protein interfaces become markedly lower. The clear decreasing trend in the ASHB values suggests that HBs in protein-protein interfaces are underestimated in the PDB when the resolution is worse than 1.5 Å. Furthermore, the 95% CIs of the ASHB are (0.51, 0.52), (0.49, 0.50), (0.46, 0.46), (0.42, 0.43), (0.37, 0.38), and (0.33, 0.34) for the resolution ranges of ≤1.5, 1.5-2.0, 2.0-2.5, 2.5-3.0, 3.0-3.5 and >3.5 Å, respectively, revealing the significant underestimation of HBs in protein-protein interfaces with resolution >1.5 Å.
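Confidence intervals of this kind can be obtained by resampling structures with replacement. The following Python sketch shows a generic percentile bootstrap for a pooled ratio of observed to pseudo HB counts. It is a minimal illustration under stated assumptions: the pooling (sum of observed over sum of pseudo counts), the number of resamples, and the synthetic per-structure counts are all hypothetical and are not taken from the paper.

```python
import random

def bootstrap_ci_ratio(observed, pseudo, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for sum(observed) / sum(pseudo).

    Structures are resampled with replacement; each resample yields one
    pooled ratio, and the CI is read from the sorted ratios.
    """
    rng = random.Random(seed)
    n = len(observed)
    ratios = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        denominator = sum(pseudo[i] for i in idx)
        if denominator > 0:
            ratios.append(sum(observed[i] for i in idx) / denominator)
    ratios.sort()
    lower = ratios[int((alpha / 2) * len(ratios))]
    upper = ratios[int((1 - alpha / 2) * len(ratios)) - 1]
    return lower, upper

# Synthetic per-structure counts (hypothetical): each pseudo HB is
# "observed" with probability 0.5.
data_rng = random.Random(1)
pseudo = [data_rng.randint(5, 30) for _ in range(200)]
observed = [sum(data_rng.random() < 0.5 for _ in range(p)) for p in pseudo]

point = sum(observed) / sum(pseudo)
lower, upper = bootstrap_ci_ratio(observed, pseudo)
print(f"AS = {point:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

With tens of thousands of structures per resolution range, as in the analyses above, intervals from such a procedure become very narrow, which is consistent with the tight CIs reported in the text.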

Insignificant change in HB donor and acceptor numbers at different resolutions. Some flexible residues may be omitted from models, especially in structures with low resolution. To examine whether the missing atoms would lead to significant deviation in the statistical results, the PDB (January 2018 release, refer to Methods for details) was explored. Taking the resolution range of ≤1.5 Å as a reference (Table 2), if zero is within the 95% confidence interval (CI) of the effect size32-33 (d) between the high resolution range (≤1.5 Å) and a low resolution range (>1.5 Å), the difference is not statistically significant at the 5% level (p>0.05); otherwise, the difference is statistically significant (p<0.05). For the donor, the difference between the high resolution range and >2.5 Å is not statistically significant. Although the difference is statistically significant between the high resolution range and 1.5-2.5 Å, the effect sizes are no


greater than 0.06 (Table 2), which is far less than 0.2, and the magnitude of the difference is therefore negligible. For the acceptor, although the difference between the high resolution range and >1.5 Å is statistically significant, the effect sizes are no more than 0.06 in the 1.5-3.0 Å resolution range, showing that the magnitude of the difference is negligible. The effect size is 0.21 for 3.0-3.5 Å, which is comparable to 0.20, implying that the difference is small. Although the effect size is 0.58 for >3.5 Å, implying that the difference is large, the absolute difference is no greater than 3.7%, showing that the difference is small. Therefore, the missing atoms themselves in the 0-3.5 Å resolution range should have a negligible effect on the HB numbers.

Table 2. The HB number per residue (NHB/R), HB donor number per residue (ND/R) and acceptor number per residue (NA/R) from PDB structures (n=329,841) throughout the resolution range, with the effect sizes (d) between the high and low resolution ranges and their 95% confidence intervals (CI).

             ≤1.5 Å      1.5-2.0 Å       2.0-2.5 Å      2.5-3.0 Å      3.0-3.5 Å       >3.5 Å
ND/R         1.51±0.10   1.51±0.09       1.51±0.09      1.52±0.10      1.52±0.12       1.51±0.16
d (a)        ref.        0.05            0.06           -0.03          -0.08           0
95% CI (a)   ref.        (0.02, 0.08)    (0.03, 0.09)   (-0.06, 0)     (-0.11, 0.04)   (-0.05, 0.05)
NA/R         1.53±0.08   1.53±0.08       1.53±0.08      1.53±0.09      1.51±0.10       1.48±0.13
d (b)        ref.        0.04            0.06           0.06           0.21            0.58
95% CI (b)   ref.        (0.01, 0.07)    (0.03, 0.09)   (0.03, 0.10)   (0.18, 0.25)    (0.53, 0.63)
NHB/R        0.88±0.14   0.88±0.13       0.86±0.14      0.81±0.15      0.75±0.17       0.68±0.18
d (c)        ref.        0               0.17           0.47           0.88            1.37
95% CI (c)   ref.        (-0.03, 0.03)   (0.14, 0.20)   (0.43, 0.50)   (0.84, 0.92)    (1.32, 1.42)

a for ND/R, b for NA/R, c for NHB/R.
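For readers unfamiliar with effect sizes, the Python sketch below shows one common way to compute Cohen's d with a pooled standard deviation and a large-sample 95% CI. It is an assumption that this matches the procedure in the Methods, and that the ± values in Table 2 are standard deviations. The group sizes used here are hypothetical, so the output is not expected to reproduce the d values in Table 2.

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d using the pooled standard deviation of the two groups."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate CI for d from its large-sample standard error."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se

# NHB/R means and spreads from Table 2 (<=1.5 Å vs >3.5 Å);
# the group sizes of 1000 each are HYPOTHETICAL placeholders.
d = cohens_d(0.88, 0.14, 1000, 0.68, 0.18, 1000)
ci_low, ci_high = d_confidence_interval(d, 1000, 1000)
significant = not (ci_low <= 0.0 <= ci_high)  # zero outside the CI
print(round(d, 2), (round(ci_low, 2), round(ci_high, 2)), significant)
```

The decision rule in the last lines mirrors the one used in the text: a difference is called statistically significant at the 5% level when zero falls outside the 95% CI of d, and its magnitude is then judged against the conventional small-effect threshold of 0.2.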

Significant decrease in NHB/R as the resolution decreases. The value of NHB/R within protein chains at different resolutions was calculated in two steps to avoid undue weighting towards certain types of proteins due to redundancy (refer to Methods for details). NHB/R is approximately 0.88 for the ≤2.0 Å resolution


range, 0.86 for 2.0-2.5 Å, 0.81 for 2.5-3.0 Å, 0.75 for 3.0-3.5 Å, and 0.68 for >3.5 Å (Table 2), with a difference as large as 22.7%. Zero is outside the 95% CI of effect sizes32-33 between the high resolution range and the >2.0 Å resolution range, showing that the difference is statistically significant, and the effect size is not small for the >2.5 Å resolution range (Table 2) (see Effect Size (d) in Methods for details). The apparent decreasing trend in NHB/R suggests that HBs in proteins are markedly underestimated in the PDB as resolution decreases. It is natural to ask whether the decreasing trend is merely a historical issue. Hence, we split the data into different time periods. Overall, NHB/R rises steadily before 1996 (Figure 3), implying that the underestimation of the HBs was improving before 1996. Since then, NHB/R has not improved significantly. However, the overall decreasing trend in NHB/R with resolution exists during all time periods. The two outliers in >3.5 Å during 1991-2000 should be attributed to the sparse data (Figure 3). All the results demonstrate that HBs are still significantly underestimated even now when the resolution is worse than 2.0 Å.
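The 22.7% figure is the relative drop in NHB/R between the best and the worst resolution ranges. A short Python sketch makes the arithmetic explicit for every range, using the NHB/R values quoted from Table 2; the dictionary keys are just illustrative labels.

```python
# NHB/R per resolution range, as quoted in the text (Table 2)
n_hb_per_residue = {
    "<=2.0": 0.88,
    "2.0-2.5": 0.86,
    "2.5-3.0": 0.81,
    "3.0-3.5": 0.75,
    ">3.5": 0.68,
}

reference = n_hb_per_residue["<=2.0"]
# Relative decrease (in percent) with respect to the <=2.0 Å value
relative_drop = {
    label: round(100 * (reference - value) / reference, 1)
    for label, value in n_hb_per_residue.items()
}
print(relative_drop)
```

The drop grows monotonically with worsening resolution and reaches 22.7% for structures worse than 3.5 Å.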


Figure 3. Observed HB number per residue (NHB/R) in proteins published during different time periods. *The difference between the high resolution range (≤1.5 Å) and the low resolution range is statistically significant at the 5% level (p<0.05) … 0.20). * and # have the same meaning in the following figures.

In addition, NHB/R from different refinement software was calculated (Figure 4). All the software was found to show an overall decreasing trend (Figure 4) as the resolution decreases, demonstrating that all the refinement software underestimates the HBs. Among the software programs, TNT showed a higher recovery rate of NHB/R.


Figure 4. Observed HB number per residue (NHB/R) in proteins solved by using different refinement software.

Recoverable HBs in proteins at different resolutions. The intrinsic saturation of HB formation (ISHB, hereinafter) is calculated as the ratio of the sum of the observed HB number (𝑁𝑜𝐻𝐵) and the number of lost HBs (𝑁𝑙𝐻𝐵) over the pseudo HB number (𝑁𝑝𝐻𝐵) (equation 1). ISHB should be constant at different resolutions and independent of the refinement software, whereas ASHB should decrease if increasing numbers of HBs are lost by imperfect refinement software as resolution decreases. We proposed that ASHB at high resolution (≤1.5 Å) could be used to reflect the ISHB inside proteins (equation 2). Accordingly, the sum of 𝑁𝑜𝐻𝐵 and 𝑁𝑙𝐻𝐵 at low resolution (>1.5 Å), which we call the total recovered HB number (𝑁𝑟𝐻𝐵), could be estimated with equation 3. The corresponding 𝑁𝑟𝐻𝐵/𝑅 (total recovered HBs per residue) could be estimated with equation 4.


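$$IS_{\mathrm{HB}} = \frac{N^{o}_{\mathrm{HB}} + N^{l}_{\mathrm{HB}}}{N^{p}_{\mathrm{HB}}} \tag{1}$$

$$IS_{\mathrm{HB}} \approx AS_{\mathrm{HB}}^{\le 1.5\,\text{Å}} = \left(\frac{N^{o}_{\mathrm{HB}}}{N^{p}_{\mathrm{HB}}}\right)_{\le 1.5\,\text{Å}} \tag{2}$$

$$N^{r}_{\mathrm{HB}} = N^{o}_{\mathrm{HB}} + N^{l}_{\mathrm{HB}} = \left(\frac{N^{o}_{\mathrm{HB}}}{N^{p}_{\mathrm{HB}}}\right)_{\le 1.5\,\text{Å}} \times N^{p}_{\mathrm{HB}} \tag{3}$$

$$N^{r}_{\mathrm{HB}/R} = N^{o}_{\mathrm{HB}/R} + N^{l}_{\mathrm{HB}/R} = \left(\frac{N^{o}_{\mathrm{HB}/R}}{N^{p}_{\mathrm{HB}/R}}\right)_{\le 1.5\,\text{Å}} \times N^{p}_{\mathrm{HB}/R} \tag{4}$$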
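Equations 3 and 4 amount to scaling the pseudo HB count of a low-resolution structure by the high-resolution saturation. A minimal Python sketch of that estimate follows. The function names and the example counts (150 observed and 500 pseudo HBs) are hypothetical; only the reference saturation of 0.34 is taken from the text.

```python
def recovered_hbs(n_pseudo, as_reference):
    """Equation 3: total recovered HBs = AS(<=1.5 Å) * pseudo HB number."""
    return as_reference * n_pseudo

def lost_fraction(n_observed, n_pseudo, as_reference):
    """Fraction of the recovered HBs that is missing from the model."""
    n_recovered = recovered_hbs(n_pseudo, as_reference)
    return (n_recovered - n_observed) / n_recovered

AS_REFERENCE = 0.34  # ASHB for the high resolution range, from the text

# HYPOTHETICAL low-resolution structure: 150 observed and 500 pseudo HBs,
# i.e. an apparent saturation of 0.30
n_recovered = recovered_hbs(500, AS_REFERENCE)
fraction = lost_fraction(150, 500, AS_REFERENCE)
print(round(n_recovered, 1), round(100 * fraction, 1))
```

Written this way, the lost fraction is simply 1 minus the ratio of the low-resolution ASHB to the reference ASHB, so it depends only on the two saturation values and not on protein size.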

The results calculated with the equations show that ASHB in 329,841 PDB chains is 0.34 for the ≤2.0 Å resolution range, 0.33 for 2.0-2.5 Å, 0.32 for 2.5-3.0 Å, 0.30 for 3.0-3.5 Å, and 0.28 for >3.5 Å, implying that ASHB is a suitable indicator for the lost HBs. The total recovered NHB/R (𝑁𝑟𝐻𝐵/𝑅), shown in blue in Figure 5a, is 0.88 for 1.5-2.5 Å, 0.87 for 2.5-3.0 Å, 0.85 for 3.0-3.5 Å, and 0.81 for >3.5 Å. The loss of HBs by the refinement software, calculated from bootstrapping (see 95% CI of percentage/AS calculated by bootstrapping in Methods for details), is 0.2% for 1.5-2.0 Å, 2.2% for 2.0-2.5 Å, 6.6% for 2.5-3.0 Å, 12.1% for 3.0-3.5 Å, and 16.2% for >3.5 Å (Figure 5a). The 95% CIs for the loss of HBs are 0-0.3% for 1.5-2.0 Å, 2.1-2.4% for 2.0-2.5 Å, 6.4-6.8% for 2.5-3.0 Å, 11.8-12.5% for 3.0-3.5 Å, and 15.8-16.7% for >3.5 Å. The average length of the 329,841 PDB chains is 256 amino acids. Therefore, an average of 0.5 hydrogen bonds (256*0.88*0.2%=0.5) is lost per protein at resolutions between 1.5 and 2.0 Å, while the lost HB numbers are 5.0, 14.7, 26.3 and 33.6 for proteins with resolutions of 2.0-2.5 Å, 2.5-3.0 Å, 3.0-3.5 Å and >3.5 Å, respectively. This result illustrates the significant loss of HBs as the resolution decreases. To explore the effect of tolerance distance on 𝑁𝑝𝐻𝐵/𝑅 and ASHB, different extension distances were tested, which demonstrated that 1.0 Å should be an ideal


tolerance for this study (Table S1).
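The per-protein losses quoted for the whole PDB follow from multiplying the average chain length by the recovered HBs per residue and by the loss percentage of each resolution range. The Python sketch below reproduces that arithmetic from the values reported in the text; the variable names are illustrative.

```python
MEAN_CHAIN_LENGTH = 256  # average number of residues per chain, from the text

# resolution range: (recovered HBs per residue, loss of HBs in percent)
whole_pdb = {
    "1.5-2.0": (0.88, 0.2),
    "2.0-2.5": (0.88, 2.2),
    "2.5-3.0": (0.87, 6.6),
    "3.0-3.5": (0.85, 12.1),
    ">3.5": (0.81, 16.2),
}

lost_per_protein = {
    label: round(MEAN_CHAIN_LENGTH * n_recovered * loss_pct / 100, 1)
    for label, (n_recovered, loss_pct) in whole_pdb.items()
}
print(lost_per_protein)
```

The result matches the numbers given above: roughly half an HB per protein at 1.5-2.0 Å, rising to more than 30 HBs per protein beyond 3.5 Å.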

Figure 5. Observed HB number per residue (𝑁𝑜𝐻𝐵/𝑅) in proteins and its recovery (𝑁𝑟𝐻𝐵/𝑅). (a) 𝑁𝑜𝐻𝐵/𝑅 in the whole PDB (n=329,841). (b) 𝑁𝑜𝐻𝐵/𝑅 in a refined dataset. (c) 𝑁𝑜𝐻𝐵/𝑅 in PDB-REDO and the corresponding PDB (n=280,678). Blue columns represent the restoration of HBs from 329,841 PDB chains (a), the refined dataset (b), and PDB-REDO (c).


NHB/R in a refined dataset. NHB/R could be affected by amino acid composition, protein shape, etc. To exclude these factors, we compiled a small dataset of 31 proteins. Each protein has at least one structure in each of the 6 resolution ranges. The value of 𝑁𝑜𝐻𝐵/𝑅 for the 31 proteins is 0.88 ± 0.09 for 0-1.5 Å, 0.87 ± 0.09 for 1.5-2.0 Å, 0.84 ± 0.10 for 2.0-2.5 Å, 0.80 ± 0.10 for 2.5-3.0 Å, 0.74 ± 0.14 for 3.0-3.5 Å, and 0.72 ± 0.12 for >3.5 Å (Figure 5b). The apparent decreasing trend for the 31 proteins is in good agreement with the data from all proteins (Figure 5a), which further confirms that HBs in proteins are underestimated in the PDB. On the basis of the refined dataset, 𝑁𝑟𝐻𝐵/𝑅 is 0.88 for 1.5-2.0 Å, 0.87 for 2.0-3.0 Å, 0.86 for 3.0-3.5 Å, and 0.85 for >3.5 Å. The loss of HBs by the refinement software is 1.2% for 1.5-2.0 Å, 4.1% for 2.0-2.5 Å, 8.6% for 2.5-3.0 Å, 13.8% for 3.0-3.5 Å, and 15.4% for >3.5 Å. The 95% CIs for the loss of HBs from bootstrapping are -0.5 to 3.0% for 1.5-2.0 Å, 2.1 to 6.1% for 2.0-2.5 Å, 6.9 to 10.2% for 2.5-3.0 Å, 12.0 to 15.6% for 3.0-3.5 Å, and 12.2 to 18.0% for >3.5 Å. Therefore, there are 2.7 hydrogen bonds (256*0.88*1.2%=2.7) lost per protein on average at resolutions between 1.5 and 2.0 Å, while the lost HB numbers are 9.1, 19.2, 30.4 and 33.5 for proteins with resolutions of 2.0-2.5 Å, 2.5-3.0 Å, 3.0-3.5 Å and >3.5 Å, respectively. The percentages of lost HBs based on the refined dataset are larger than those from the whole PDB, but the fluctuations are also larger due to the smaller sample size.
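The selection rule for the refined dataset (at least one structure in each of the 6 resolution ranges) can be expressed as a simple grouping step. The Python sketch below illustrates it on made-up records. How structures are grouped into "the same protein" is not stated in this section, so the protein identifiers here are hypothetical placeholders.

```python
from bisect import bisect_left
from collections import defaultdict

# Upper bounds (inclusive) of the first five ranges; the sixth is >3.5 Å
BIN_EDGES = [1.5, 2.0, 2.5, 3.0, 3.5]

def resolution_bin(resolution):
    """Map a resolution in Å to a range index 0..5 (0 is <=1.5 Å)."""
    return bisect_left(BIN_EDGES, resolution)

def proteins_covering_all_bins(structures):
    """Return the proteins that have a structure in every resolution range.

    structures: iterable of (protein_id, resolution_in_angstrom) pairs.
    """
    bins_seen = defaultdict(set)
    for protein_id, resolution in structures:
        bins_seen[protein_id].add(resolution_bin(resolution))
    n_bins = len(BIN_EDGES) + 1
    return sorted(p for p, bins in bins_seen.items() if len(bins) == n_bins)

# HYPOTHETICAL records: P1 covers all six ranges, P2 only three of them
records = [
    ("P1", 1.2), ("P1", 1.8), ("P1", 2.3), ("P1", 2.8), ("P1", 3.2), ("P1", 3.9),
    ("P2", 1.0), ("P2", 2.0), ("P2", 2.4),
]
selected = proteins_covering_all_bins(records)
print(selected)  # ['P1']
```

Comparing the same proteins across all ranges removes differences in amino acid composition and shape between ranges, at the price of a much smaller sample, which explains the wider confidence intervals reported for this dataset.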

HB after re-refinement. PDB-REDO is a popular re-refinement software for improving PDB structures.34-35 A total of 280,678 PDB chains have been re-refined with PDB-REDO version 7.15, which applies homology-based hydrogen bond restraints.27 As shown in Figure 5c, the value of 𝑁𝑜𝐻𝐵/𝑅 from PDB-REDO is always higher than that from the corresponding PDB structures, suggesting that PDB-REDO does excellent work and recovers some HBs. Impressively, in the high resolution range (≤1.5 Å), the HB number per residue from PDB-REDO (0.90) is higher than that from the PDB (0.88), which is partly attributed to the recovery of missing sidechain atoms in PDB-REDO (Table S2), suggesting that the HBs in the highest resolution range in the PDB are still underestimated. After recovering HBs in PDB-REDO, we found that HBs are still lost at resolutions worse than 2.0 Å, although the loss is less than that from the PDB (Figure 5c and Figure 5a). Remarkably, there are no missing sidechain N/O/S atoms after re-refinement in the resolution range of 1.5-3.0 Å (Table S2), indicating that all the lost HBs result from the inaccuracy of the refinement software.

Other overlooked interactions. Cation-π interactions in proteins are overlooked in the PDB. Of the 332,954 protein structures, 194,232 (58.3%) possess at least one cation-π interaction. The observed number of cation-π interactions per residue (𝑁𝑜Cπ/𝑅) in proteins is 0.0087 ± 0.0055 for the ≤1.5 Å resolution range, 0.0082 ± 0.0052 for the 1.5-2.0 Å resolution range, 0.0079 ± 0.0050 for 2.0-2.5 Å, 0.0077 ± 0.0050 for 2.5-3.0 Å, 0.0077 ± 0.0049 for 3.0-3.5 Å, and 0.0077 ± 0.0059 for >3.5 Å (Figure 6a). Similar to the calculation of the recovered HBs (equation 3), the recovered NCπ/R (𝑁𝑟Cπ/𝑅) is 0.0086 for 1.5-2.0 Å,


0.0085 for 2.0-3.0 Å, 0.0084 for 3.0-3.5 Å, and 0.0082 for >3.5 Å (Figure 6a). ASC-π is 0.16 for the ≤1.5 Å resolution range, 0.15 for 1.5-3.0 Å, and 0.14 for >3.0 Å, and the corresponding 95% CIs are (0.16, 0.16), (0.15, 0.15), (0.14, 0.15), (0.14, 0.14), and (0.13, 0.14) for the resolution ranges of ≤1.5, 1.5-2.5, 2.5-3.0, 3.0-3.5 and >3.5 Å, respectively. The loss of cation-π interactions by the refinement software is 4.8% for 1.5-2.0 Å, 8.0% for 2.0-2.5 Å, 8.7% for 2.5-3.0 Å, 8.2% for 3.0-3.5 Å, and 6.2% for >3.5 Å. The 95% CIs of the loss of cation-π interactions are 4.0-5.1% for 1.5-2.0 Å, 7.5-8.6% for 2.0-2.5 Å, 7.8-9.9% for 2.5-3.0 Å, 6.4-9.7% for 3.0-3.5 Å, and 3.8-8.4% for >3.5 Å, respectively.


Figure 6. Numbers of three other types of NCIs per residue (NNCI/R) in the PDB and their recovery (𝑁𝑟NCI/𝑅). (a) 𝑁𝑜Cπ/𝑅 for cation-π interactions. (b) 𝑁𝑜ππ/𝑅 for π-π interactions. (c) 𝑁𝑜IB/𝑅 for ionic bonds. Blue columns represent the restoration of these interactions (𝑁𝑟NCI/𝑅).


π-π interactions in proteins are overlooked in the PDB. Of the 332,954 protein structures, 69,459 (20.9%) possess at least one π-π interaction. The observed number of π-π interactions per residue (𝑁𝑜ππ/𝑅) in proteins is 0.037 ± 0.024 for the ≤1.5 Å resolution range, 0.036 ± 0.023 for 1.5-2.0 Å, 0.035 ± 0.022 for 2.0-2.5 Å, 0.034 ± 0.022 for 2.5-3.0 Å, 0.033 ± 0.026 for 3.0-3.5 Å, and 0.026 ± 0.018 for >3.5 Å (Figure 6b). The recovered numbers of π-π interactions per residue (𝑁𝑟ππ/𝑅) are 0.036 for 1.5-2.0 Å, 0.035 for 2.0-3.0 Å, 0.033 for 3.0-3.5 Å, and 0.026 for >3.5 Å (Figure 6b). ASπ-π is 0.71 for the ≤1.5 Å resolution range, 0.70 for 1.5-2.5 Å, 0.69 for 2.5-3.0 Å, 0.70 for 3.0-3.5 Å, and 0.68 for >3.5 Å, and the corresponding 95% CIs are (0.70, 0.71), (0.70, 0.70), (0.69, 0.70), (0.69, 0.71), and (0.67, 0.69) for the resolution ranges of ≤1.5, 1.5-2.5, 2.5-3.0, 3.0-3.5 and >3.5 Å, respectively. The loss of π-π interactions by the refinement software is 1.0% for 1.5-2.0 Å, 0.6% for 2.0-2.5 Å, 1.8% for 2.5-3.0 Å, 0.4% for 3.0-3.5 Å, and 2.3% for >3.5 Å. The 95% CIs of the loss of π-π interactions are -0.7-2.6% for 1.5-2.0 Å, -1.3-2.4% for 2.0-2.5 Å, -0.6-3.9% for 2.5-3.0 Å, -3.6-4.3% for 3.0-3.5 Å, and -2.4-7.5% for >3.5 Å, respectively.
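The per-residue values above are reported as mean ± SD within six resolution ranges. The following is a minimal sketch of how such a summary could be computed, not the authors' actual pipeline. It assumes that each chain contributes one ratio (interactions divided by residues), that the upper edge of each range is inclusive, and that the SD is the sample SD; the function names and the example records are hypothetical.

```python
from statistics import mean, stdev

# Upper edges (in Å) of the first five resolution ranges; anything above 3.5 Å
# falls into the last range. Treating each upper edge as inclusive is an assumption.
BIN_EDGES = [1.5, 2.0, 2.5, 3.0, 3.5]
BIN_LABELS = ["≤1.5", "1.5-2.0", "2.0-2.5", "2.5-3.0", "3.0-3.5", ">3.5"]


def resolution_bin(resolution):
    """Return the label of the resolution range that contains `resolution`."""
    for edge, label in zip(BIN_EDGES, BIN_LABELS):
        if resolution <= edge:
            return label
    return BIN_LABELS[-1]


def per_residue_stats(chains):
    """Group (resolution, n_interactions, n_residues) records by range and
    return {label: (mean, sample SD, count)} of interactions per residue."""
    grouped = {label: [] for label in BIN_LABELS}
    for resolution, n_interactions, n_residues in chains:
        grouped[resolution_bin(resolution)].append(n_interactions / n_residues)
    stats = {}
    for label, values in grouped.items():
        if values:  # skip ranges that received no chains
            sd = stdev(values) if len(values) > 1 else 0.0
            stats[label] = (mean(values), sd, len(values))
    return stats


# Hypothetical records, not taken from the PDB:
# (resolution in Å, number of π-π interactions, chain length in residues)
example_chains = [(1.2, 8, 200), (1.4, 12, 300), (2.2, 7, 200), (3.8, 5, 200)]
example_stats = per_residue_stats(example_chains)
```

Each entry of `example_stats` holds the mean, SD, and number of chains for one range, which is the form in which the observed values are quoted in the text.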

Ionic bonds in proteins are overlooked in the PDB. Of the 332,954 protein structures, 272,132 (81.7%) possess at least one ionic bond. The number of ionic bonds per residue (𝑁𝑜IB/𝑅) in proteins is 0.0230 ± 0.011 for the ≤1.5 Å resolution range, 0.0225 ± 0.011 for 1.5-2.0 Å, 0.0209 ± 0.010 for 2.0-2.5 Å, 0.0191 ± 0.009 for 2.5-3.0 Å, 0.0170 ± 0.008 for 3.0-3.5 Å, and 0.0149 ± 0.008 for >3.5 Å (Figure 6c). The recovered numbers of ionic bonds per residue (𝑁𝑟IB/𝑅) are 0.0235 for 1.5-2.0 Å, 0.0231 for 2.0-2.5 Å, 0.0223 for 2.5-3.0 Å, 0.0207 for 3.0-3.5 Å, and 0.0190 for >3.5 Å (Figure 6c). ASIB is 0.52 for the ≤1.5 Å resolution range, 0.50 for 1.5-2.0 Å, 0.47 for 2.0-2.5 Å, 0.44 for 2.5-3.0 Å, 0.42 for 3.0-3.5 Å, and 0.38 for >3.5 Å, and the corresponding 95% CIs are (0.51, 0.52), (0.49, 0.50), (0.46, 0.47), (0.44, 0.44), (0.41, 0.42), and (0.38, 0.39) for the resolution ranges of ≤1.5, 1.5-2.0, 2.0-2.5, 2.5-3.0, 3.0-3.5 and >3.5 Å, respectively. The loss of ionic bonds by the refinement software is 4.2% for 1.5-2.0 Å, 9.7% for 2.0-2.5 Å, 14.3% for 2.5-3.0 Å, 17.8% for 3.0-3.5 Å, and 21.5% for >3.5 Å. The 95% CIs of the loss of ionic bonds are 3.8-5.0% for 1.5-2.0 Å, 9.2-10.3% for 2.0-2.5 Å, 13.8-15.0% for 2.5-3.0 Å, 16.7-18.7% for 3.0-3.5 Å, and 19.7-23.4% for >3.5 Å.
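The reported loss percentages can be checked against the observed and recovered numbers of ionic bonds per residue. The sketch below assumes that the loss is the fraction of the recovered value missing from the observed value, (𝑁𝑟 − 𝑁𝑜)/𝑁𝑟. This form is inferred from the numbers quoted above and is not stated in this section, and the helper name `percent_loss` is hypothetical.

```python
# Ionic bonds per residue as reported in the text, keyed by resolution range (Å).
observed = {"1.5-2.0": 0.0225, "2.0-2.5": 0.0209, "2.5-3.0": 0.0191,
            "3.0-3.5": 0.0170, ">3.5": 0.0149}
recovered = {"1.5-2.0": 0.0235, "2.0-2.5": 0.0231, "2.5-3.0": 0.0223,
             "3.0-3.5": 0.0207, ">3.5": 0.0190}


def percent_loss(n_observed, n_recovered):
    """Share of the recovered interactions that is missing from the observed count.

    Assumed form: (recovered - observed) / recovered, expressed in percent.
    """
    return 100.0 * (n_recovered - n_observed) / n_recovered


# Loss per resolution range, computed from the rounded values quoted in the text.
ionic_bond_loss = {rng: percent_loss(observed[rng], recovered[rng]) for rng in observed}
```

Under this assumption the computed losses agree with the reported 4.2%, 9.7%, 14.3%, 17.8%, and 21.5% to within a few tenths of a percentage point; the remaining differences are consistent with the inputs having been rounded before publication.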

Overlooked halogen bonds (XBs) calculated by newly defined parameters. A total of 2,453 structures of protein-organohalogen complexes that could form at least one XB were analyzed. The ASXB values in the protein-ligand interface are 0.13, 0.12, 0.11, and 0.13 for the resolution ranges of ≤1.5, 1.5-3.0, 3.0-3.5, and >3.5 Å, and the corresponding 95% CIs are (0.12, 0.14), (0.11, 0.12), (0.11, 0.13), (0.08, 0.12), and (0.08, 0.21), respectively. The ASXB value in the >3.5 Å resolution range is an outlier because the data are sparse (only 17 structures). The overall decreasing trend in the ASXB values suggests that XBs are overlooked in the PDB when the resolution is worse than 1.5 Å, further confirming the conclusion of our previous studies.29-30

Discussion


Noncovalent interactions (NCIs) play essential roles in the structure and function of biomacromolecules. Among the different NCIs, hydrogen bonds (HBs) are the most common and interesting type of interaction in biological systems. Despite extensive research over the past 100 years, many questions about HBs remain unresolved. We previously showed that the halogen bond is underestimated in the PDB.29-30 In this study, for HBs at protein-ligand and protein-protein interfaces, the apparent saturation (ASHB) was calculated for all 6 resolution ranges. Clear decreasing trends were observed, showing that HB numbers are underestimated. The HBs lost at low resolution could be recovered by QM/MM optimization. The HBs in protein-ligand and protein-protein interfaces are essential to structure and function, as molecular recognition occurs at the interface. Based on the calculated ASHB, structures with resolutions worse than 1.5 Å should be examined carefully in any study related to protein-ligand or protein-protein interactions. For HBs in proteins, the HB number per residue (NHB/R) decreases as resolution becomes worse, and every refinement program shows an overall decreasing trend. Most importantly, NHB/R is still largely underestimated for resolutions worse than 2.0 Å from 1996 until now, showing that HB refinement has lagged over the past three decades. Because NHB/R is a very simple parameter that could be affected by amino acid composition and protein shape, we compiled a refined dataset in which each protein has at least one structure in each of the 6 resolution ranges. The clear decreasing trend observed in the refined dataset is in good agreement with the data from all proteins, indicating that HB formation in proteins is underestimated in the PDB.


For proteins in the PDB, at least 2.2% of HBs are lost at resolutions worse than 2.0 Å, which corresponds to 5.0 HBs per protein on average. Although the re-refinement software could recover some HBs, more than 1.1% of HBs are still lost at resolutions worse than 2.0 Å after re-refinement. In addition, 8.0% of cation-π interactions, 0.6% of π-π interactions, and 9.7% of ionic bonds were lost in the 2.0-2.5 Å resolution range. In their crystal structures, small organic molecules usually adopt a conformer that is energetically close to the global minimum (lowest energy).36 On the basis of these observations, we hypothesize that biological systems might also be close to a global minimum in their crystal structures. All refinement methods aim to approach the global minimum, but this goal is difficult to achieve. With a clear electron density map (≤1.5 Å), the refinement software has to fit the X-ray model to the electron density map and is therefore more likely to reach the global minimum, which shows the maximum number of NCIs. With a fuzzy electron density map (>2.0 Å), the refinement software places the heavy atoms more arbitrarily, and the model becomes trapped in a local minimum, yielding fewer NCIs.

Conclusion

In summary, based on analysis in different resolution ranges, the HBs in protein-ligand and protein-protein interfaces are underestimated at resolutions >1.5 Å. The lost HBs could be recovered by QM/MM optimization. In addition, HBs in proteins are underestimated at resolutions >2.0 Å, and all refinement software used in the past three decades underestimates HBs. Moreover, cation-π interactions, π-π interactions, and ionic bonds are also lost at resolutions >2.0 Å. Furthermore, we hypothesized that the biological system might be close to a global minimum (lowest energy) with the maximum number of NCIs, which is hard to achieve with current refinement software. Considering the essential roles of NCIs in protein function and structure, molecular recognition, and drug design, it is urgent to recover the NCIs during X-ray crystallography, molecular modeling and drug design.

Methods

Protein Data Bank (PDB). The PDB (January 2018 release, 136,594 biological macromolecular structures in total) was explored. For HBs in the protein-ligand interface, the criteria are similar, though not identical, to those used by Matthias Rarey et al.6: experimental method = X-ray, molecule type = protein (exclusion of RNA and DNA), and ligand = true. This selection resulted in 93,714 structures, which were further filtered in PyMOL with the criterion “ligand is organic and contains only the elements C/H/N/O/S”, yielding 77,167 protein-ligand structures. The biggest ligand in each structure was used for the ASHB calculation. For HBs in proteins, 332,954 PDB chains with UniProt IDs were analyzed, which corresponded to 37,160 unique proteins and 117,795 PDB structures. Of these chains, 99.1% (329,841 of 332,954) could form at least one HB. The chains that could not form HBs either contain only Cα atoms (e.g., PDB ID: 1A1Q) or are very short (e.g., chain C in 1A5H has only 7 residues).


Protein-protein interface (PPI). The most probable solutions to the biological assembly (the functional form of the macromolecules) were downloaded from PDBePISA (http://www.ebi.ac.uk/pdbe/prot_int/pistart.html),37 which yielded 77,319 coordinate (PDB-formatted) files. The buried area in a complex is defined as the solvent accessible surface area of one subunit plus that of the other subunit minus that of the complex.38 On average, the buried area of a PPI is 1320 ± 520 Å2 (mean ± SD) of protein surface.39 Therefore, a buried area above 1320 - 520 = 800 Å2 was used as the criterion to define a PPI. Biological assemblies comprise all kinds of oligomeric states, e.g., homodimers, heterodimers, trimers, and hexamers. Within a biological assembly, each pair of subunits with a buried area above 800 Å2 is defined as a PPI. In total, 167,956 PPIs were generated for the HB calculation, of which 162,939 could form at least one HB.
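The PPI criterion described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes the solvent accessible surface areas (SASA) of each isolated subunit and of each two-subunit complex have already been computed by some external tool, and the function names, dictionary layout, and example values are hypothetical.

```python
from itertools import combinations

# Threshold in Å²: mean buried area minus one SD, as stated in the text.
PPI_THRESHOLD = 1320.0 - 520.0


def buried_area(sasa_a, sasa_b, sasa_complex):
    """Buried area (Å²): SASA of both isolated subunits minus SASA of the complex."""
    return sasa_a + sasa_b - sasa_complex


def find_ppis(subunit_sasa, pair_sasa):
    """Return (chain_a, chain_b, buried area) for pairs above the threshold.

    subunit_sasa: {chain_id: SASA of the isolated subunit}
    pair_sasa:    {(chain_a, chain_b): SASA of the two-subunit complex}
    """
    ppis = []
    for a, b in combinations(sorted(subunit_sasa), 2):
        if (a, b) not in pair_sasa:
            continue  # no complex SASA available for this pair
        area = buried_area(subunit_sasa[a], subunit_sasa[b], pair_sasa[(a, b)])
        if area > PPI_THRESHOLD:
            ppis.append((a, b, area))
    return ppis


# Hypothetical SASA values (Å²) for a trimer, for illustration only.
example_subunits = {"A": 9000.0, "B": 9500.0, "C": 8700.0}
example_pairs = {("A", "B"): 17000.0, ("A", "C"): 17300.0, ("B", "C"): 17000.0}
example_ppis = find_ppis(example_subunits, example_pairs)
```

In this hypothetical trimer, the A-C pair buries only 400 Å2 and is therefore not counted as a PPI, whereas A-B and B-C exceed the 800 Å2 threshold.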

Detection of hydrogen bonds (HBs). The hydrogen atoms for all the structures are constructed by HBPLUS v. 3.2.40 The HBs in the structures of protein-ligand interfaces, protein-protein interfaces, and proteins, resolved using X-ray crystallography, were determined by LIGPLOT v.4.0,41 DIMPLOT,41 and HBPLUS v. 3.2,40 respectively, with the following parameters, i.e., d(D-A)90, and (H-AAA)>90 where AA stands for acceptor antecedents (Figure S3).


Apparent saturation of HB (ASHB). For pseudo HBs in protein-ligand interfaces, protein-protein interfaces, and proteins, the criteria are d(D-A)3.5 Å).

$$d = \frac{m_{high} - m_{low}}{s_{pooled}}$$

$$s_{pooled} = \sqrt{\frac{(n_{high} - 1)\,s_{high}^{2} + (n_{low} - 1)\,s_{low}^{2}}{n_{high} + n_{low} - 2}}$$

where $m_{high}$ is the mean number of interactions per residue in the high resolution range; $m_{low}$ is the mean number of interactions per residue in the low resolution range; $n$ is the sample size; and $s^{2}$ is the sample variance. Cohen considered effect sizes of d = 0.2, 0.5, and 0.8 as small, medium, and large, respectively.33, 57 Therefore, a d value greater than 0.2 is considered not small in this study. The uncertainty of d, σ(d), is calculated as:

$$\sigma(d) = \sqrt{\frac{n_{high} + n_{low}}{n_{high} \times n_{low}} + \frac{d^{2}}{2\,(n_{high} + n_{low})}}$$
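The effect size calculation can be sketched in a few lines. The code below follows the expressions for d, the pooled standard deviation, and σ(d), and forms the 95% interval as d ± 1.96 σ(d). The function names are hypothetical, and the example uses the cation-π means and SDs quoted in the Results for the ≤1.5 Å and 2.0-2.5 Å ranges together with placeholder sample sizes, because the actual sample sizes are not given in this section.

```python
import math


def cohens_d(m_high, s_high, n_high, m_low, s_low, n_low):
    """Cohen's d between the high- and low-resolution groups, using the pooled SD."""
    pooled_var = ((n_high - 1) * s_high**2 + (n_low - 1) * s_low**2) / (n_high + n_low - 2)
    return (m_high - m_low) / math.sqrt(pooled_var)


def d_standard_error(d, n_high, n_low):
    """sigma(d): uncertainty of the effect size for the two sample sizes."""
    return math.sqrt((n_high + n_low) / (n_high * n_low) + d**2 / (2 * (n_high + n_low)))


def d_confidence_interval(d, n_high, n_low, z=1.96):
    """Interval (d - z*sigma(d), d + z*sigma(d)); z = 1.96 gives the 95% CI."""
    sigma = d_standard_error(d, n_high, n_low)
    return d - z * sigma, d + z * sigma


# Means and SDs: cation-π interactions per residue reported for ≤1.5 Å and 2.0-2.5 Å.
# Sample sizes of 1000 are hypothetical placeholders, not values from the study.
d_example = cohens_d(0.0087, 0.0055, 1000, 0.0079, 0.0050, 1000)
ci_low, ci_high = d_confidence_interval(d_example, 1000, 1000)
```

Because the sample sizes are placeholders, the resulting interval only illustrates the mechanics; with the real group sizes the interval would be narrower or wider accordingly.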

The 95% confidence interval (CI) of the effect size is:


$$\left(d - 1.96 \times \sigma(d),\; d + 1.96 \times \sigma(d)\right)$$

A 95% CI for effect size equals a 5% alpha error level for effect size.58 If the confidence interval of an effect size includes zero, then the result is not statistically significant at the 5% level (p>0.05).58 On the other hand, if zero is outside the range, then it is statistically significant at the 5% level (p