Chiral Ramachandran Plots II: General Trends and Protein Chirality



Huan Wang, David Avnir, and Inbal Tuvi-Arad. Biochemistry, Just Accepted Manuscript • DOI: 10.1021/acs.biochem.8b00974 • Publication Date (Web): 22 Oct 2018




Biochemistry

Chiral Ramachandran Plots II: General trends and proteins chirality spectra

Huan Wang,†,‡ David Avnir,†,* and Inbal Tuvi-Arad‡,*

† Institute of Chemistry, The Hebrew University of Jerusalem, Jerusalem, 9190401, Israel

‡ Department of Natural Science, The Open University of Israel, Raanana, 4353701, Israel

ABSTRACT: The degree of chirality of protein backbone residues is used to enrich the Ramachandran Plot (RP), and create 3D chiral RPs with much more structural information. Detailed comparative analysis of the four classical RPs (general, glycine, proline and pre-proline) is provided, including statistical analysis of quantitative chirality distributions in the maps and in the secondary structures. Our results show that points with outlier chirality levels represent special transitional points in the folded protein such as α-helix kinks, twists of β-strands, and transition points between secondary structures. A protein chirality spectrum, in which the degree of chirality of each residue is plotted against the sequence number, is used to explore these special points. Over 65,000 residues extracted from 200 high-quality proteins are used for this study, which shows that quantitative chirality is a general and useful structural parameter for protein conformational studies.


INTRODUCTION

The three-dimensional (3D) structure of proteins plays a key role in determining their biological functions. Understanding the complexity of the structure is particularly challenging because of the extremely rich library of conformers that characterize the folded backbone. Furthermore, even in regular repeated secondary structures such as α-helices and β-strands, distorted points exist, e.g. kinks in α-helices and bends/twists in β-strands. One of the more widely used tools for protein conformational analysis and backbone geometry validation has been the Ramachandran plot (RP), with five decades of intensive use.1, 2 RPs are two-dimensional (2D) maps that represent the relation between two dihedral angles, ϕ (defined by the atoms C(i−1)−N(i)−Cα(i)−C(i)) and ψ (defined by the atoms N(i)−Cα(i)−C(i)−N(i+1)) of the polypeptide backbone, involving three adjacent amino acid residues (Figure 1). RPs may present the data of a single selected amino acid (such as glycine (Gly)3 or proline (Pro)4), or a group of amino acids (such as the pre-Pro4, 5 amino acids, or all amino acids except Gly, Pro and pre-Pro, referred to as the general (Gen) plot4), or libraries of proteins,6-8 and sub-sets such as specific secondary structures.9 The minimalism of these plots is their strength: they are easily obtained and provide, at a glance, structural and qualitative information. However, this is also their weakness, because the conformational information provided by the RP is partial and relies only on two dihedral angles. Enriching the information content of the RP is therefore needed. Selecting an additional structural parameter poses a challenge, because a full description of a specific conformer requires taking into account all angles and bond lengths, and the question is, of course, which one to choose. We have recently suggested that a suitable parameter for that purpose should be based on a structural property which is common to all of the protein building blocks, namely that all are chiral, including the Gly residue.10 The central characteristic of this conformational chirality is the fact that its degree varies according to the specific local set of angles and lengths. Calculations of the degree of chirality are based on the Continuous Chirality Measure (CCM)11-14 methodology (described in Methods) that expands the concept of chirality from a binary property to a continuous and robust geometrical descriptor, and is by now a well-developed methodology which has been intensively used in a wide range of chirality-associated phenomena.15-18 It is well suited to our purpose as a third RP parameter, as it solves the problem of multiple specific geometrical parameters by wrapping all bond angles and lengths into a single parameter. Our implementation of the CCM methodology involves the evaluation of the degree of chirality (the CCM value) of the Ramachandran unit (RU), namely the atoms C(i−1)−N(i)−Cα(i)−C(i)−N(i+1) of each amino acid (see Figure 1), which is common to all of the protein building blocks, thus extending the RP from a two-parameter 2D plot to a richer three-parameter 3D presentation. A preliminary exploration of this quantitative approach was carried out on the classical Gly RP.10 It was found that Gly, which is routinely tagged as the only achiral amino acid, is in fact always conformationally chiral, with degrees of chirality comparable to the CCM values of RUs of the (chiral) amino acids.

Figure 1. The Ramachandran unit (RU) in red and a pair of dihedral angles (ϕ, ψ).
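As an illustration of the two RP coordinates, the following minimal Python sketch computes a signed dihedral angle from four atomic positions and applies it to ϕ and ψ of one residue. The function names and the use of plain coordinate tuples are assumptions made for this example and are not taken from the original work; the sign is the one produced by the standard atan2 formulation of the dihedral angle.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    x = _dot(n1, n2)
    y = _dot(b2, _cross(n1, n2)) / math.sqrt(_dot(b2, b2))
    return math.degrees(math.atan2(y, x))

def phi_psi(c_prev, n, ca, c, n_next):
    """(phi, psi) of residue i from the five atoms of its Ramachandran unit."""
    phi = dihedral(c_prev, n, ca, c)      # C(i-1)-N(i)-CA(i)-C(i)
    psi = dihedral(n, ca, c, n_next)      # N(i)-CA(i)-C(i)-N(i+1)
    return phi, psi
```

Note that the five atoms passed to `phi_psi` are exactly the atoms of the RU, so the same input can be reused for a chirality calculation.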


Here we first provide, in a comparative way, the chiral maps of the four classical RPs for the General (Gen), Gly, Pro, and pre-Pro cases. More than 65,000 amino acid residues extracted from 200 high-quality protein structures are used for that purpose. In the Methods section we specify the criteria used for selecting these proteins and the procedures for cleaning them prior to the CCM calculation. In what follows, we describe the theory of the continuous chirality measure, and provide detailed analyses of the chiral RPs, of the distributions of the chirality values and of their relation to the secondary structures. By extracting general trends of the RU, we emphasize unique features of the four plots. Importantly, these statistical analyses also serve to point out clearly the interesting chirality outlier points in the RPs, that is, points representing residues that have unexpectedly high or low chirality values, which deviate largely from the expected statistical observations. These outliers (which should not be confused with the common RP outliers representing unusual values of the dihedral angles7) are generally buried in the chiral RPs but are clearly seen in the statistical analyses, and contain conformational information on critical points in the folded structure. To highlight their significance, we set out to focus on the potential predictive structural information they might carry. For that purpose, we introduce the protein chirality spectrum, a graph which follows the variations in the chirality values along the entire backbone chain with its secondary structures. Finally, we show the predictive power of that spectrum in identifying joint points of conformational changes, such as the edges between the secondary structures or kinks in α-helices,19, 20 which play important functional roles in ion channels.21, 22


METHODS

Continuous Chirality

The continuous chirality measure (CCM) determines the distance of a molecular structure from its nearest achiral counterpart.11, 12 The calculation is based on searching for all structures with the same connectivity that belong to one of the achiral point groups (G = Cs, Ci, or higher order Sn with n = 4, 6, 8, ...), choosing the one with a minimal distance to the original structure, according to:

$$S(G) = \frac{100}{d^2} \min \sum_{k=1}^{N} \left| Q_k - P_k \right|^2 \tag{1}$$

Here {Q_k} is the set of coordinates of the studied molecule, {P_k} is the set of coordinates of the corresponding achiral structure, N is the number of atoms in the molecule, and d² is a size normalization factor, namely the sum of the squared distances of each atom of the original molecule from the center of mass, Q_0:

$$d^2 = \sum_{k=1}^{N} \left| Q_k - Q_0 \right|^2 \tag{2}$$

The CCM is a special case of the more general Continuous Symmetry Measure (CSM),11, 13, 14 which determines the deviation from symmetry of molecules for a given G in Eq. (1) on a continuous scale (note that for the CSM, G is any point group). The values of the CCM range between 0, for a structure with G achiral symmetry, and 100 when the nearest achiral structure reduces to the center of mass. In practice, it is more probable that the nearest achiral structure of a highly chiral molecule will reduce to a planar structure and not to a point in space, therefore CCM values are generally smaller than 100. For the RU of amino acids in proteins the range of CCM is between 0 and ca. 7 and the nearest achiral structure is planar. In other words, it is enough to set G = Cs in Eq. (1), since searching for higher symmetries (i.e., Ci, S4 and so on) yields higher CCM values for the RU. Several properties of the CCM are important to note: (1) As described in the Introduction, it is a collective variable of the geometry that takes into account all of the bond lengths, bond angles and dihedral angles of a molecule. As such it is a non-injective surjective function, that is, different molecular geometries can be described by the same CCM value. From that point of view, the CCM resembles an intensive thermodynamic property. (2) It is size-normalized and therefore its values are comparable between different molecules. (3) It is always a non-negative parameter which does not indicate a specific handedness. (4) Being a function of the coordinates, the CCM can be applied to the whole molecule as well as to its fragments, as applied here (see the section on Cutting Proteins into Units below).
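To make Eqs. (1) and (2) concrete for the planar case mentioned above, the following Python sketch evaluates the normalized distance of a set of atoms from its best-fit plane. It is not the CCM software used in this work: it assumes that the nearest achiral structure is the orthogonal projection onto the least-squares plane (every atom lying on the mirror plane), it uses the unweighted centroid as Q_0, and all function names are invented for the example. In general this quantity is only an upper bound on the CCM, since a full calculation also searches over atom permutations and other achiral point groups.

```python
import math

def _smallest_eigenvalue_sym3(a):
    """Smallest eigenvalue of a symmetric 3x3 matrix (closed-form trigonometric solution)."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:                      # matrix is already diagonal
        return min(a[0][0], a[1][1], a[2][2])
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))
    phi = math.acos(r) / 3.0
    return q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)

def planar_distance_measure(coords):
    """100 * (sum of squared distances to the best-fit plane) / d^2, cf. Eqs. (1)-(2)."""
    n = len(coords)
    center = [sum(p[i] for p in coords) / n for i in range(3)]   # Q_0 (unweighted)
    centered = [[p[i] - center[i] for i in range(3)] for p in coords]
    # Scatter matrix; its trace is d^2 and its smallest eigenvalue is the
    # minimal sum of squared out-of-plane displacements.
    scatter = [[sum(q[i] * q[j] for q in centered) for j in range(3)] for i in range(3)]
    d2 = scatter[0][0] + scatter[1][1] + scatter[2][2]
    if d2 == 0.0:
        raise ValueError("all atoms coincide; the measure is undefined")
    return 100.0 * max(_smallest_eigenvalue_sym3(scatter), 0.0) / d2
```

For a five-atom RU, `coords` would hold the positions of C(i−1), N(i), Cα(i), C(i) and N(i+1); a perfectly planar unit returns 0.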

The Proteins Database

Selection Criteria. Coordinates of the 200 proteins used in this work were extracted from the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB).23, 24 In order to ensure that only high-quality proteins with minimal statistical bias were selected, several filtering criteria were set: (i) The experimental method was X-ray crystallography with a resolution of 1.6 Å or better, and an “Excellent” grade as defined by FirstGlance in Jmol25; (ii) Only monomeric chains were used, and if the PDB file included multiple (oligomeric) peptide chains, the longest chain was chosen, or the first chain was used in cases of equal chain lengths; (iii) DNA, RNA or hybrid chains were filtered out; (iv) PDB files with averaged B-factors, representing the mean square isotropic displacement of each atom, higher than 30 Å² were excluded; (v) The Rfree grade, which measures the quality of fitting a simulated diffraction pattern to the analyzed experimental diffraction pattern, was at least “Much better than the average at their resolution” according to FirstGlance in Jmol25; (vi) The database was further filtered by checking the organism source; if proteins obtained from the same organism source shared identical leading codes of the PDB IDs, we double-checked their structures and took one of them. For instance, both 5ARC and 5ARD, which fulfilled the above five criteria, were obtained from Bacillus lentus and shared quite similar structures. We selected 5ARC as the candidate. A full list of the proteins used in this work is provided in the Supporting Information (SI).

Cleaning the PDB Files – The “PDB_cleaner”. Before calculating and analyzing data, each protein in our database was cleaned by means of our home-built Python script, “PDB_cleaner” (available on https://github.com/LePingKYXK/PDB_cleaner), for deleting ligands, solvents, and non-coordinate lines (e.g. ANISOU data representing anisotropic temperature factors) from the ATOM section in the PDB file. Specifically, if some amino acids were labeled with alternate locations, the PDB_cleaner chooses the first one. Additionally, amino acids with missing atoms were removed.
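The kind of filtering described above can be sketched in a few lines of Python. This is not the PDB_cleaner script itself, whose internals are not described here; it is a simplified stand-in that relies only on the fixed-column layout of PDB ATOM records, and the function name is hypothetical. It keeps ATOM records only (dropping HETATM, ANISOU and all other record types) and keeps the first alternate location of each atom; the removal of residues with missing atoms is not covered.

```python
def clean_pdb_lines(lines):
    """Keep ATOM records only, and only the first alternate location of each atom."""
    seen = set()
    kept = []
    for line in lines:
        if not line.startswith("ATOM  "):
            continue                      # ligands, solvent, ANISOU, remarks, ...
        # Fixed columns: atom name 13-16, altLoc 17, chain 22, resSeq + iCode 23-27
        key = (line[21:22], line[22:27], line[12:16])
        if key in seen:
            continue                      # a later alternate location of the same atom
        seen.add(key)
        kept.append(line[:16] + " " + line[17:])   # blank the altLoc flag
    return kept
```

Usage is a plain read-filter-write loop, for example `clean_pdb_lines(open(path).read().splitlines())`, where `path` is whatever file the reader wishes to process.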

Cutting Proteins into Units – The PDBslicer. Our home-built Perl program, PDBslicer, was used to extract the Ramachandran Units (RU) of Gly (5,440), Pro (3,214), pre-Pro (2,871), and Gen (54,052) from the 200 proteins in our database and calculate their CCM. Terminal residues were discarded since either their ϕ or ψ angles are not defined. Similarly, residues adjacent to a sequence gap in the PDB file were discarded.
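The slicing step can be illustrated with the short Python sketch below. It is not the PDBslicer program (which is written in Perl and not reproduced here); the data layout, the function names, and the gap test based on consecutive residue numbers are all assumptions of this example. The ordering of the classification (Gly, then Pro, then pre-Pro, then Gen) reflects our reading of the text, in which the pre-Pro and Gen groups contain only the other 18 amino acids.

```python
def classify(resname, next_resname):
    """Assign a residue to one of the four classical RP groups."""
    if resname == "GLY":
        return "Gly"
    if resname == "PRO":
        return "Pro"
    if next_resname == "PRO":
        return "pre-Pro"
    return "Gen"

def ramachandran_units(residues):
    """Extract RUs from an ordered list of residue dicts of one chain.

    Each residue is {"resseq": int, "resname": str, "atoms": {name: (x, y, z)}}.
    Terminal residues and residues next to a numbering gap are skipped.
    """
    units = []
    for prev, cur, nxt in zip(residues, residues[1:], residues[2:]):
        if cur["resseq"] != prev["resseq"] + 1 or nxt["resseq"] != cur["resseq"] + 1:
            continue                                  # adjacent to a sequence gap
        try:
            coords = [prev["atoms"]["C"], cur["atoms"]["N"], cur["atoms"]["CA"],
                      cur["atoms"]["C"], nxt["atoms"]["N"]]
        except KeyError:
            continue                                  # a backbone atom is missing
        units.append({"resseq": cur["resseq"],
                      "group": classify(cur["resname"], nxt["resname"]),
                      "coords": coords})
    return units
```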

The Chiral Ramachandran Plots of the full set of 200 proteins were constructed by plotting the RP and coloring each point according to the CCM value of the corresponding RU. A 3D version of these plots was constructed by using the CCM as the Z-axis. The dihedral angles ϕ and ψ, and the secondary structure assignments of each residue, were retrieved using the Dictionary of Secondary Structures of Proteins (DSSP).26, 27

The Chirality Spectrum was obtained by plotting, for a single protein, the CCM of each RU as a function of the corresponding sequence number. All data figures were plotted by means of OriginPro 2018.28

RESULTS AND DISCUSSION

The Chiral Ramachandran Plots

We begin with a typical 3D chiral RP of an RU carrying a specific amino acid from the Gen group, and selected serine (Ser) for that purpose; see Figure 2. A collection of several other chiral RPs is provided in Figure S1 of the SI. The newly added conformational information, the degree of chirality of the RU (the CCM value), ranging between close to 0 (near achirality) and ~7 (highly chiral), is represented by the color code on the right, from blue to red, respectively. The projection of the 3D presentation is the familiar 2D RP, to which the CCM values were added with the same color code. Let us now overview the features seen in Figure 2; it will also serve for the analysis of the four classical RPs (Gly, Pro, pre-Pro, and Gen), below. Figure 2 clearly shows that the degree of chirality of the Ser-RU is far from being uniform throughout the map, and that there are regions where the conformational chirality is high (yellow and red, top of quadrant III and bottom of quadrant I), large regions which have medium level chirality (green), while near-achirality (blue) is concentrated mainly on the top left of quadrant II. Typical actual conformers, which are represented by the three parameters ϕ, ψ and the CCM, are shown as well. It is seen that, in general, near achirality is characterized by an RU that is near-planar, that the high chirality values originate mainly from helical conformers, and that intermediate values (green) indeed display transitional conformers, i.e. from planar to helical. Secondary structure analysis shows that most of the highly chiral conformers reside in the α-helix region around (ϕ, ψ) = (−65°, −45°) (which is therefore at the focus of the final section of this report), and that most of the low chirality conformers reside in the near-planar β-ladders region where ϕ ∈ (−180°, −55°), ψ ∈ (60°, 180°); that is, these secondary structures impose their characteristic structural features, helicity and planarity, on the RU, resulting in high and low chirality values, respectively. We note that perfect achirality is obtained when ϕ and ψ each equal either 0° or 180°. However, the former region is not populated in Figure 2 due to steric constraints.10 When the RUs of the other amino acid residues of the Gen group are similarly analyzed, one finds generally similar chirality distribution maps, and several examples are presented in Figure S1 of the SI.

It is interesting that despite the diversity of side chains of the 18 amino acid residues, the similarity in both the (ϕ, ψ) distributions and the corresponding chirality value distributions at the (ϕ, ψ) coordinates indicates the insensitivity of the conformational chirality of the RUs to the side chain. That apparent similarity therefore enables one to construct and use the chiral Gen RP as a unified general representation of the chiral nature of the protein building blocks, much as the original black-and-white 2D RP has been used as a general unifying reference for protein studies and structure evaluation. This general chirality map is shown in Figure 3C, based on 54,052 RUs extracted from 200 proteins, as described in the Methods. The 3D version of the chiral RP of the Gen RU is shown in Figure S2 of the SI.

Figure 2. The 3D presentation of the chiral Ramachandran plot of Ser (3763 RUs from 200 proteins). Typical chiral RU conformers are shown, along with the (color) CCM value, and the secondary structure where they reside. H = α-helices; E = extended β-strands; T = hydrogen-bonded turns; C = loops and irregular elements. The projection (the “floor”) shows the familiar 2D RP presentation to which the chirality color code was added.


Excluded from the Gen RP are the highly flexible Gly, the rigid Pro, and the pre-Pro residues, whose features lead to unique data point distributions in the RP that markedly differ from those of the other 18 amino acids. The chiral RP of Gly has been studied and analyzed in detail previously10 and is reproduced here (with an improved larger set of data) for comparative purposes (Figure 3A). In brief, it shows that Gly is almost always conformationally chiral at least to some degree, with a nearly 2D C2-symmetry, not only for the distribution of the data points but also for the chirality values. The reason is that for a given specific (ϕ, ψ) point, a near-enantiomer is possible (although relatively rare) with the opposite, near (−ϕ, −ψ), values. This map, unlike the others, has data points in all four quadrants, a result of the lack of a side chain in Gly, which leads to the almost unrestricted flexibility of the Gly-RU. For the other three chiral RPs (Figure 3B to 3D) such symmetry does not exist, i.e., their geometry is confined to certain regions of the RP. The chiral RP of the Pro RU is shown in Figure 3B, and it is clear that its significantly lower conformational flexibility results in a much more limited distribution. Again, we see that even for these extreme limits of flexibility-rigidity in the amino acids family, the RU chiral conformations seem to be affected by the nature of the side chain in only a limited way. Completing the set of the classical RPs is the chiral map of pre-Pro (Figure 3D). In this map we are back to the set of the 18 amino acids as in the Gen RP (Figure 3C), but now they are located in the position preceding Pro (that is, linked to the N atom of Pro). Due to the steric hindrance of the Pro ring, the distribution of the pre-Pro map is relatively restricted compared to the Gen case.


Figure 3. The Chiral RP of the Ramachandran units (RUs) of (A) Gly (5440 RUs), (B) Pro (3214 RUs), (C) Gen (54052 RUs) and (D) pre-Pro (2871 RUs). Insets are the corresponding RUs (in red). Their 3D presentations are shown in Figures S2 to S5 of the SI.

Let us now summarize some of the common features of the four chiral RPs: (i) The range of CCM values is similar in all four maps, spanning from close to zero up to 7. (ii) Similar to the Ser map (Figure 2), the most chiral conformers reside in a restricted region (in red) around the α-helix area with (ϕ, ψ) = (−65°, −45°), and from there the chirality of the conformers gradually decreases (from orange to green), roughly parallel to the diagonal. (iii) Many low chirality conformers (in green) spread over the relatively wide area of ϕ ∈ (−180°, −55°), ψ ∈ (60°, 180°), characterized by near-planar β-ladder geometries. (iv) The most striking similarity of the four chiral RPs is apparent when one notes that they can roughly be overlapped, in their limited span and in their color codes (chirality values). In other words, the chiral RPs are advantageous as they provide an intuitive presentation of the correlation between conformational chirality of the backbone building blocks and their secondary structures, in agreement with various secondary structure assignments on the classical RPs.9

Statistical Analyses of the Chirality Values

The chiral RPs provide an overall presentation of the general features of chirality distributions. Higher resolution statistical analyses of the CCM values are now in order, as these may reveal finer details on the conformational interpretation of the CCM values of the RUs. In particular, we are interested in identifying outliers of the CCM distributions, if such exist. Our hypothesis is that the regular repeated structural patterns along the protein backbone, i.e., α-helices and β-strands, should impose constraints on the CCM values, and therefore residues with an outlier CCM value may carry interesting structural information on the protein backbone in their specific location, detectable by this chirality analysis. The statistical analyses are carried out in two ways: in relation to the chiral RPs and in relation to the secondary structures. Figure 4 presents the CCM distributions calculated for the RUs of the four types of the chiral RPs both as relative frequency curves (Figure 4A) and as box-and-whisker plots (Figure 4B).

Figure 4. (A) The CCM distributions of the RU for the four types of chiral RPs: Gly (red), Pro (blue), Gen (black), and pre-Pro (green). (B) Box and whisker plots of the CCM distributions. The boxes show the distribution density range between the 25% and 75% percentiles (the interquartile range, IQR); median values are represented by the horizontal line inside the boxes. The lower and upper whiskers are the horizontal lines located at the bottom and top of the boxes, respectively. The minimum and maximum values are marked with ▲ and ▼ below and above each box, respectively. Bin size of all data was set to 0.2. Lower and upper whiskers were defined by Eq. (3). The values of the 25% percentile, median, 75% percentile, and lower and upper whiskers are labeled aside. The number of residues in each distribution appears on the top of the corresponding box. All values are collected in Table S1 of the SI.


The lower and upper edges of the boxes in Figure 4B are the first (25%) and third (75%) quartiles, denoted as Q1 and Q3, respectively. The whiskers are defined by:

$$\begin{cases} \text{Upper Whisker} = Q_3 + 1.5 \times (Q_3 - Q_1) \\ \text{Lower Whisker} = Q_1 - 1.5 \times (Q_3 - Q_1) \end{cases} \tag{3}$$
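Eq. (3) can be evaluated directly with the Python standard library, as in the sketch below. The function names are hypothetical, and the quartile interpolation used here (the "inclusive" method of `statistics.quantiles`) is an assumption; the plotting software used for the figures may interpolate quartiles differently, which would shift the whiskers slightly.

```python
import statistics

def whisker_bounds(values):
    """Lower and upper whiskers of Eq. (3) for a list of CCM values."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                        # interquartile range, Q3 - Q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outlier_indices(values, lower, upper):
    """Indices of the values that fall outside the whiskers."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]
```

Keeping the bounds and the outlier test in separate functions allows the whiskers to be computed once on a reference distribution and then applied to the residues of a single protein.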

It is seen that each curve in Figure 4A has two peaks around the CCM values of 3 and 6. These represent the two most abundant CCM values in the chiral RPs, as can be seen also by a general inspection of Figure 3. Apart from the double-peak feature, which is common to all curves, several differences are noticeable: (i) The Gly (red) curve is spread wide and is relatively evenly distributed, compared to the other three. This is a result of the flexibility of the Gly RU, as discussed above. We again draw attention to the clear fact that, contrary to its standard label (Gly is an achiral amino acid), conformationally it is at least as chiral as the other amino acids. (ii) Compared to Gly, the double-peak of Gen (black curve) is shifted to higher CCM values, while the double-peak of Pro (blue curve) is shifted to lower CCM values. These shifts apparently reflect the effects of the specific side chains: a whole library of many side chains for Gen, but a restrictive planar side chain for Pro. (iii) Although pre-Pro is similar to Gen in its collection of 18 amino acids, the effect of the neighboring Pro is visible: the double-peak of pre-Pro is shifted to lower CCM values, and the peak centered around 3 is much more dominant.

Next, we present and analyze the CCM distributions within the secondary structures. As seen in Figure 5, the median CCM in α-helix structures (H, red bar in the box) is generally higher than in any other secondary structure, regardless of the type of amino acid residue, with values of 5.6 for the RU of Pro, 5.7 for the RU of pre-Pro, 5.8 for the RU of Gly and 5.9 for the RU of Gen (see Table S2 in the SI for details). Referring to Figure 4A, the peaks of high CCM values are therefore clearly assigned to the α-helices. On the other hand, the median CCM in β-strands (E, blue bar in the box) is much lower and less confined, ranging from 2.6 for the RU of Gly up to 3.5 for the RU of Gen (see Table S2 in the SI for details). Values for other secondary structures (e.g., coils, turns and bends) are similar to those of the β-strands. We also note that the median values are relatively similar to the mode in each secondary structure, except for the cases of a bimodal distribution such as for the bend and turn structures of Pro. These observations provide an interpretation for the bimodal distribution seen in Figure 4A: the lower peak is a mixture of β-strands, coils, turns, etc., while the higher peak is mainly attributed to the helices.

Figure 5. CCM distribution analysis of RU within the secondary structures presented by box and whisker plots. (A) Gly, (B) Pro, (C) Gen, and (D) pre-Pro. Secondary structure letter/color codes are: C: irregular coil (black), E: extended β-strands (blue), S: bend (green), T: turns (light blue), H: α-helix (red), and O: other secondary structures (orange) with a small amount of data, including B: β-bridge, G: 3₁₀ helix, and I: π-helix. The total number of RUs for each group is presented above each box. See Figure 4 for explanation of box and whiskers symbols. Detailed statistical data are listed in Table S2 of the SI.

It is interesting to note the general resemblance of Figures 5A (RU of Gly) and 5C (RU of Gen), which shows that from the point of view of chirality distribution profiles, these two classical RPs share common features except in the lower CCM region of C, E, and S. Bearing in mind that Gly has no side chain, this similarity, on the one hand, indicates that the chirality of the backbone after the folding has been achieved is robust to variations of the side chain. On the other hand, the differences in the CCM distribution in C, E, and S reveal that Gly has a higher population in the flexible secondary structures, resulting in a much wider interquartile range (Q3 − Q1). This should be stated with caution, though, as the Gen type is a mixture of 18 different amino acids. In contrast, the CCM distributions of the RUs of Gly (Figure 5A) and Pro (Figure 5B) are very different, again reflecting the flexibility of Gly and the rigidity of Pro. This rigidity confines the RU of Pro to a limited range of the dihedral angle ϕ on the RP (Figure 3B), and likewise limits the range of CCM values in all secondary structures. Regarding the RUs of pre-Pro (see Figure 5D and Table S2 in the SI), the median of the CCM values in each secondary structure is close to that of Gen. However, the limiting rigidity of Pro shows up in the interquartile ranges (Q3 − Q1) of the CCM values of pre-Pro, which are much narrower.


The Predictive Power of Proteins' Chirality Spectra

The identification of general chiral features in the secondary structures (previous sections) leads now to a closer look at conformations with atypical chirality, which appear as outlier points in Figure 5. These points clearly deviate from the general trends (they lie outside the lower and upper whiskers), which may indicate special locations in the backbone folding. We performed a statistical analysis of the CCM values of the RUs in order to distinguish outliers of specific secondary structures, focusing on α-helices and extended β-strands. To understand the conformational information provided by the outliers, we now introduce the protein chirality spectrum (see Figure 6A): a plot of the CCM value of the analyzed unit (the RU in our case) as a function of the sequence number of the residue in the backbone of a single protein, with a simplified three-color coding for the secondary structures and with colored stripes denoting the CCM boundaries (defined by Eq. 3) for α-helices and β-strands. Outliers of a specific secondary structure are readily seen outside these boundaries. The protein taken for this example is formylglycinamide synthetase (PDB ID: 4LGY). Figures 6B to 6D show enlarged segments of the chirality spectrum in which α-helices or extended β-strands feature outliers. It is seen in Figures 6B (Ser-398, Gly-401, and Gly-406) and 6C (Ile-723) that the outliers mark border points at the edge of a secondary structure segment, where the secondary structure is changing. Interestingly, kink points in α-helices, usually found in transmembrane and soluble proteins and identified by kink angles,19,20,29 are clearly detected by the CCM outliers of the α-helix (see Trp-985 and Ala-986 in Figure 6D). Similarly, a bend conformation in a β-strand is detected by the Val-973 CCM outlier (Figure 6D).
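The way outliers are read off a chirality spectrum can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the `Residue` record and `find_outliers` function are hypothetical names, the residues and CCM values are made up, and the CCM bands are supplied as inputs because Eq. 3 is not reproduced in this section. The H band (4.8, 6.9) follows the helix outlier criterion quoted in this paper, while the E band is a placeholder.

```python
from dataclasses import dataclass

@dataclass
class Residue:
    seq_num: int   # sequence number of the residue in the backbone
    name: str      # three-letter residue name
    dssp: str      # DSSP secondary-structure code, e.g. "H" or "E"
    ccm: float     # CCM value of the residue's Ramachandran unit (RU)

def find_outliers(spectrum, bounds):
    """Return residues whose CCM lies outside the band of their own secondary structure."""
    outliers = []
    for res in spectrum:
        band = bounds.get(res.dssp)
        if band is None:
            continue  # no band defined for this structure type (gray dots in Figure 6)
        low, high = band
        if res.ccm < low or res.ccm > high:
            outliers.append(res)
    return outliers

# H band from the helix outlier criterion in the text; E band is a hypothetical placeholder.
toy_bounds = {"H": (4.8, 6.9), "E": (1.0, 5.0)}

# Made-up spectrum, for illustration only.
toy_spectrum = [
    Residue(1, "Ala", "H", 5.9),
    Residue(2, "Leu", "H", 6.1),
    Residue(3, "Gly", "H", 4.2),   # below the H band
    Residue(4, "Ser", "C", 2.0),   # coil: no band, never flagged
    Residue(5, "Val", "E", 3.1),
    Residue(6, "Thr", "E", 7.5),   # above the placeholder E band
]
toy_outliers = [r.seq_num for r in find_outliers(toy_spectrum, toy_bounds)]
```

Each residue is compared only with the band of its own DSSP class, which mirrors the red-shaded and blue-shaded stripes of the spectrum.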


The generality of this structural analysis tool is shown in Figure 7, for which we picked four different polypeptide segments (two α-helices and two β-strands) from four different proteins. Note again that the CCM outliers are capable of detecting transitions at the edges of the secondary structures, e.g., residues Gly-189 and Ala-198 of the α-helix in Figures 7A1 and 7A2, and Asp-908 and Ser-911 of the β-strands in Figures 7D1 and 7D2. In addition, kink points in α-helices, e.g., residue Asp-379 in Figures 7B1 and 7B2, and bends/twists in β-strands, e.g., Thr-912 and Tyr-913 in Figures 7D1 and 7D2, are also clearly identified.

Why do α-helix outliers represent edge or kink points? The helix is the most chiral geometry among the secondary structures, and it imposes its pronounced chirality on the RU (see Katzenelson et al.16 for a discussion of the CCM of helices). Any conformational adjustment needed at a kink requires a flattening disruption of this maximal helicity, thus reducing its chirality. Likewise, residues at the edges of a helix (some of them still defined as helical by the DSSP algorithm) already start to stretch out in order to adjust to the structural change toward lower chirality, since all other secondary structures are generally less chiral than helices.

The case of β-strands is richer owing to the higher diversity of their conformations.30 The ideal planar β-strand (i.e., the achiral conformer) is relatively rare. Most β-strand conformers are bent or twisted in 3D space, as in the well-known β-barrel structure of the green fluorescent protein. This is why the median CCM of β-strands is around 3. A bend or twist in a β-strand requires a conformational change of an RU that pushes it away from planarity and increases its chirality. Conversely, the residues at the edges of a β-strand or at the edges of a twisted ribbon are generally flatter, leading to lower CCM values.
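The distinction drawn above between edge outliers and kink-like outliers inside a helix can be illustrated with a short Python sketch. It is a hypothetical post-processing step, not a procedure described in this paper. The function name, the `edge_width` parameter, and the toy input are assumptions, and the sketch treats the residues as one gap-free chain.

```python
from itertools import groupby

def classify_helix_outliers(codes, is_outlier, edge_width=1):
    """Label each helix outlier as "edge" or "interior".

    codes      -- DSSP codes per residue, in sequence order
    is_outlier -- booleans per residue (True if the CCM is outside the helix band)
    edge_width -- number of residues at each end of a helix run counted as its edge
    """
    labels = {}
    pos = 0
    for code, run in groupby(codes):
        length = len(list(run))
        if code == "H":
            for i in range(pos, pos + length):
                if is_outlier[i]:
                    # Distance of residue i from the nearer end of its helix run.
                    offset = min(i - pos, pos + length - 1 - i)
                    labels[i] = "edge" if offset < edge_width else "interior"
        pos += length
    return labels

# Made-up example: two coil residues, an eight-residue helix, two coil residues.
toy_codes = ["C"] * 2 + ["H"] * 8 + ["C"] * 2
toy_flags = [False] * 12
for idx in (0, 2, 6, 9):   # index 0 is a coil residue and is ignored
    toy_flags[idx] = True
toy_labels = classify_helix_outliers(toy_codes, toy_flags)
```

Interior outliers found this way are only kink candidates. A real analysis would also have to handle sequence gaps such as the one in 4LGY.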


In general, different protein datasets, as well as different methods of predicting kink points along a protein backbone, result in very different percentages of kinks in helical structures. The phenomenon is thus described as either rare or very common, with reported values ranging from 6% to 64%.28 Kink abundance is commonly estimated as the percentage of helix segments with a kink point relative to the total number of helix segments. Kinks are generally determined by the angle of the helix (which at a kink point is larger than 20°), yet the accuracy of its calculation depends strongly on the length of the helix segment, the resolution of the protein structure, and other factors. Here we provide a simpler estimate in terms of the number of kinked residues out of the total number of residues with α-helix structure. Our dataset of 200 proteins contains 20410 RUs that are classified as H residues (31% of the total number of RUs), out of which 1597 RUs have an outlier chirality degree (CCM below 4.8 or above 6.9). Under the very crude approximation that all of these represent kinks, we obtain an upper limit of 1597/20410 = 7.8% of helix residues located at kink points. A more reasonable estimate assumes that each helix segment includes three outliers: two at the two edges of the segment and one at a kink (e.g., panel D in Figure 6). Dividing the number of outliers by three, we obtain an estimate of 2.6% kinks. This number should still be considered an upper limit, as not all helices are kinked (e.g., panel C in Figure 6 and panel A1 in Figure 7). Finally, we note that in Figure 5B Pro has very few outlier points in the α-helix, which is consistent with its being found near the kink point rather than at the kink point itself.29 In this sense, the CCM is a sensitive collective variable of the geometry that can be used to predict and describe conformational changes of specific residues, as well as structural changes of a complete protein sequence, which are hidden in the DSSP data. Quantifying the predictive power of the protein chirality spectrum is a topic of ongoing research in our group.
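The two kink estimates quoted above follow from simple arithmetic on the counts given in the text. The Python sketch below reproduces them. The function name and its `outliers_per_segment` parameter are hypothetical, introduced here only to make the "three outliers per helix segment" assumption explicit.

```python
def kink_estimates(helix_rus, outlier_rus, outliers_per_segment=3):
    """Return (crude upper limit, refined estimate) for the fraction of kinked helix residues."""
    # Crude upper limit: every helix outlier is counted as a kink.
    upper = outlier_rus / helix_rus
    # Refined estimate: only one outlier in every `outliers_per_segment` is a kink.
    refined = upper / outliers_per_segment
    return upper, refined

# Counts taken from the text: 20410 H-type RUs, 1597 of them outliers.
upper, refined = kink_estimates(20410, 1597)
print(f"upper limit: {upper:.1%}, refined estimate: {refined:.1%}")
# prints "upper limit: 7.8%, refined estimate: 2.6%"
```

Changing `outliers_per_segment` shows how sensitive the refined figure is to the assumed number of outliers per helix segment.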


Figure 6. (A) The chirality spectrum of formylglycinamide synthetase (PDB ID: 4LGY) is shown in the middle panel (the break refers to a sequence gap in the original PDB file, between residues 448 and 466). Enlarged sections of it appear in the top and bottom panels. (B) A β-strand segment ranging from residue 382 to 422; (C) an α-helix segment ranging from residue 705 to 730; (D) a segment featuring both an extended β-strand and an α-helix, between residues 969 and 998. Red dots: α-helix (H) residues; blue dots: extended β-strand (E) residues; gray dots: all other secondary structures. Horizontal red-shaded and blue-shaded areas represent the boundaries of H and E, respectively. Ball-and-stick models denote the RUs at the joint points: bends (e.g., Glu-394 and Lys-395 in panel B), transitions to a new secondary structure (e.g., Ser-398 in panel B, Ile-723 in panel C, and Glu-976 and Arg-996 in panel D), and kink points (e.g., Trp-985 and Ala-986 in panel D).


Figure 7. The chirality spectra of selected segments from four different proteins: (A1) phosphate-binding protein PstS (PDB ID: 2Q9T), sequence segment 187 to 201; (B1) fructose-1,6-bisphosphate aldolase/phosphatase (PDB ID: 3T2C), sequence segment 372 to 389; (C1) non-haem bromoperoxidase BPO-A2 (PDB ID: 1BRT), sequence segment 115 to 124; and (D1) endo-alpha-N-acetylgalactosaminidase (PDB ID: 5A57), sequence segment 902 to 917. The corresponding secondary structure segments are shown on the right-hand side in panels A2 to D2. See Figure 6 for explanations of the colored dots and cartoons.

CONCLUSIONS

In this report we have used quantitative chirality, expressed by the Continuous Chirality Measure (CCM), as a predictor of joint points in the backbone of folded proteins, where a conformational variation indicates a change in the direction, torsion, or type of the secondary structure. The use of the CCM in protein research addresses an inherent problem in the structural description of protein conformations, namely that too many specific structural parameters (dihedral angles, bond angles, and bond lengths) are needed for that description. The CCM approach is advantageous here because it condenses many parameters into a single one: the degree of chirality. The usefulness of this parameter has been shown in this report, first, by expanding the classical RPs with an additional conformational descriptor, showing that chiral RPs can be used as unifying protein plots, much as the regular RPs are. It is a non-trivial observation that the CCM allows unification of structural protein information, as the RP dihedral angles ϕ and ψ do. The second, related tool is the protein chirality spectrum, which is a graph of the local CCM value as a function of the location of the investigated unit in the backbone sequence. When secondary structure information is superimposed on it, a fast analytical tool is obtained that shows at a glance how ordered a secondary structure unit is and by how much it deviates from its average characteristics. As the CCM calculations are done per residue, the ability of the chirality spectrum to predict joint points is independent of the length of a secondary structure segment, which contributes to the accuracy and precision of the method. The analysis described in this report is by no means limited to the Ramachandran backbone unit, and the principles described here can be applied to any other unit of choice. In that spirit, our next efforts are aimed at an even greater challenge: the conformational analysis of the side chains of the amino acids in proteins. There the complexity is far greater than for the backbone unit, but we believe that the tools described here can meet that challenge.

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: XXXXX

A PDF file with a list of the 200 PDB IDs used in this work; 3D chiral RPs of four selected residues (Arg, Glu, Phe, Val); Gen, Gly, Pro, and pre-Pro 3D chiral RPs with the RUs; and two tables of the statistical analysis results.

AUTHOR INFORMATION

Corresponding Authors
*(I.T.-A.) E-mail: [email protected].
*(D.A.) E-mail: [email protected].

ORCID
Huan Wang: 0000-0002-4931-3898
David Avnir: 0000-0001-6307-1023
Inbal Tuvi-Arad: 0000-0003-0418-9915

Funding
Supported by the Israel Science Foundation (grant no. 411/15).

Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENT

The authors thank Sagiv Barhoom (The Open University of Israel) for his work on developing the PDBslicer program, and Itay Zandbank and Devora Witty (The Scientific Software Company, Israel) for their crucial help in developing and improving the CSM and CCM programs.

REFERENCES

(1) Ramachandran, G. N., Ramakrishnan, C., and Sasisekharan, V. (1963) Stereochemistry of polypeptide chain configurations, J. Mol. Biol. 7, 95-99.
(2) Carugo, O., and Carugo, K. D. (2013) Half a century of Ramachandran plots, Acta Crystallogr. D Biol. Crystallogr. 69, 1333-1341.
(3) Ho, B. K., and Brasseur, R. (2005) The Ramachandran plots of glycine and preproline, BMC Struct. Biol. 5, 14.
(4) Ho, B. K., Thomas, A., and Brasseur, R. (2003) Revisiting the Ramachandran plot: Hard-sphere repulsion, electrostatics, and H-bonding in the α-helix, Protein Science 12, 2508-2522.
(5) MacArthur, M. W., and Thornton, J. M. (1991) Influence of proline residues on protein conformation, J. Mol. Biol. 218, 397-412.
(6) Beck, D. A. C., Alonso, D. O. V., Inoyama, D., and Daggett, V. (2008) The intrinsic conformational propensities of the 20 naturally occurring amino acids and reflection of these propensities in proteins, PNAS 105, 12259-12264.
(7) Lovell, S. C., Davis, I. W., Arendall, W. B., de Bakker, P. I. W., Word, J. M., Prisant, M. G., Richardson, J. S., and Richardson, D. C. (2003) Structure validation by Cα geometry: ϕ,ψ and Cβ deviation, Proteins: Structure, Function, and Bioinformatics 50, 437-450.
(8) Porter, L. L., and Rose, G. D. (2011) Redrawing the Ramachandran plot after inclusion of hydrogen-bonding constraints, PNAS 108, 109-113.
(9) Hollingsworth, S. A., and Karplus, P. A. (2010) A fresh look at the Ramachandran plot and the occurrence of standard structures in proteins, BioMolecular Concepts 1, 271-283.
(10) Baruch-Shpigler, Y., Wang, H., Tuvi-Arad, I., and Avnir, D. (2017) Chiral Ramachandran Plots I: Glycine, Biochemistry 56, 5635-5643.
(11) Pinsky, M., Dryzun, C., Casanova, D., Alemany, P., and Avnir, D. (2008) Analytical methods for calculating Continuous Symmetry Measures and the Chirality Measure, J. Comput. Chem. 29, 2712-2721.
(12) Zabrodsky, H., and Avnir, D. (1995) Continuous Symmetry Measures. 4. Chirality, Journal of the American Chemical Society 117, 462-473.
(13) Zabrodsky, H., Peleg, S., and Avnir, D. (1992) Continuous symmetry measures, Journal of the American Chemical Society 114, 7843-7851.
(14) Alon, G., and Tuvi-Arad, I. (2018) Improved algorithms for symmetry analysis: structure preserving permutations, J. Math. Chem. 56, 193-212.
(15) Keinan, S., and Avnir, D. (1998) Quantitative chirality in structure-activity correlations. Shape recognition by trypsin, by the D-2 dopamine receptor, and by cholinesterases, Journal of the American Chemical Society 120, 6152-6159.
(16) Katzenelson, O., Edelstein, J., and Avnir, D. (2000) Quantitative chirality of helicenes, Tetrahedron: Asymmetry 11, 2695-2704.
(17) Yogev-Einot, D., and Avnir, D. (2006) The temperature-dependent optical activity of quartz: from Le Châtelier to chirality measures, Tetrahedron: Asymmetry 17, 2723-2725.
(18) Wang, X., Huang, F.-T., Yang, J., Oh, Y. S., and Cheong, S.-W. (2015) Interlocked chiral/polar domain walls and large optical rotation in Ni3TeO6, APL Materials 3, 076105.
(19) Law, E. C., Wilman, H. R., Kelm, S., Shi, J., and Deane, C. M. (2016) Examining the Conservation of Kinks in Alpha Helices, PLOS ONE 11, e0157553.
(20) Meruelo, A. D., Samish, I., and Bowie, J. U. (2011) TMKink: A method to predict transmembrane helix kinks, Protein Science 20, 1256-1264.
(21) Rauh, O., Urban, M., Henkes, L. M., Winterstein, T., Greiner, T., Van Etten, J. L., Moroni, A., Kast, S. M., Thiel, G., and Schroeder, I. (2017) Identification of Intrahelical Bifurcated H-Bonds as a New Type of Gate in K+ Channels, Journal of the American Chemical Society 139, 7494-7503.
(22) Reddy, T., Ding, J., Li, X., Sykes, B. D., Rainey, J. K., and Fliegel, L. (2008) Structural and Functional Characterization of Transmembrane Segment IX of the NHE1 Isoform of the Na+/H+ Exchanger, J. Biol. Chem. 283, 22018-22030.
(23) Berman, H., Henrick, K., and Nakamura, H. (2003) Announcing the worldwide Protein Data Bank, Nature Structural and Molecular Biology 10, 980.
(24) Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N., and Bourne, P. E. (2000) The Protein Data Bank, Nucleic Acids Res. 28, 235-242.
(25) FirstGlance in Jmol, 2.5.1, Martz, E., http://www.bioinformatics.org/firstglance/fgij/.
(26) Kabsch, W., and Sander, C. (1983) Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features, Biopolymers 22, 2577-2637.
(27) Touw, W. G., Baakman, C., Black, J., te Beek, T. A. H., Krieger, E., Joosten, R. P., and Vriend, G. (2015) A series of PDB-related databanks for everyday needs, Nucleic Acids Res. 43, D364-D368.
(28) OriginPro 2018, OriginLab Corporation, https://www.originlab.com/.
(29) Wilman, H. R., Shi, J., and Deane, C. M. (2014) Helix kinks are equally prevalent in soluble and membrane proteins: Helix Kinks in Soluble and Membrane Helices, Proteins: Structure, Function, and Bioinformatics 82, 1960-1970.
(30) Daffner, C., Chelvanayagam, G., and Argos, P. (1994) Structural characteristics and stabilizing principles of bent β-strands in protein tertiary architectures, Protein Science 3, 876-882.
