
Sep 27, 2017 - Swiss-Prot entries were then mapped to Protein Data Bank (PDB) structures.(48, 49) The glycosylation sites in the PDB ... To account fo...
Article pubs.acs.org/jcim

Predictions for α-Helical Glycopeptide Design from Structural Bioinformatics Analysis

Julia R. Rogers,† Sean M. McHugh, and Yu-Shan Lin*

Department of Chemistry, Tufts University, Medford, Massachusetts 02155, United States

Supporting Information

ABSTRACT: Glycosylation not only impacts the functions of glycoproteins but can also improve glycoprotein stability and folding efficiency, characteristics that are desirable for protein engineering and therapeutic design. To further elucidate the effects of N-glycosylation on protein structure and to provide principles useful for the rational design of α-helical glycopeptides, we investigate stabilizing protein−sugar interactions in α-helical glycosylation sites using an integrated structural bioinformatics analysis and molecular dynamics simulation approach. We identify two glycan conformations with an Asn χ1 of 180° or 300° that are amenable to α-helical structure in natural α-helical glycosylation sites in the Protein Data Bank. A combination of sterics and favorable intraglycopeptide enthalpy explains the existence of only these two conformations. Furthermore, we catalog all known protein−sugar interactions that utilize these conformational modes. The most common interactions involve either a Glu residue at the −4 position interacting with the GlcNAc whose Asn has χ1 = 300° or a Glu residue at the +4 position interacting with the GlcNAc whose Asn has χ1 = 180°. Via metadynamics simulations of model α-helical glycopeptides with each of these two interactions, we find that both interactions are stabilizing as a result of favorable electrostatic intraglycopeptide interactions. Thus, we suggest that incorporating a Glu at either the −4 or +4 position relative to an N-linked glycan may be a useful strategy for engineering stable α-helical glycoproteins.

Received: February 27, 2017

DOI: 10.1021/acs.jcim.7b00123 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Journal of Chemical Information and Modeling

Diseases that are currently difficult to treat by small molecules may be impacted by protein- and peptide-based therapeutics because of the high target specificity and potency of such therapeutics.1 However, poor structural and metabolic stabilities often plague the pharmaceutical development of such drugs.2 To ameliorate problems of chemical and physical instability both in vitro and in vivo, glycoengineering has been used with some success.3−9 Peptide and/or protein modification with sugars can help remedy issues such as precipitation, aggregation, protease sensitivity, and chemical and thermal denaturation.6−9 However, the biophysical details underlying these benefits of glycosylation have not been fully elucidated. A more complete molecular-level understanding of the effects of glycosylation would enable the robust, rational design of novel glycopeptide structures and stable glycoprotein therapeutics.

One of the most common types of glycosylation is N-linked glycosylation.10,11 N-Linked glycosylation occurs cotranslationally in the endoplasmic reticulum, creating a β-glycosidic bond between C1 of the core N-acetylglucosamine (GlcNAc) of the glycan and the nitrogen of an asparagine (Asn) side chain (Table 1). N-Glycosylation typically occurs within an Asn-Xxx-Ser/Thr (serine/threonine) motif, where Xxx is any amino acid except proline (Pro).12 Naturally, glycosylation impacts the biological functions of glycoproteins,13−16 their folding efficiencies,17−20 structures,21−25 dynamics,26,27 and stabilities.28−31 N-Glycosylation can extrinsically enhance protein folding by enabling engagement of the lectin-based chaperones and quality control machineries in the endoplasmic reticulum,16,17,32 and can intrinsically enhance protein stability and folding rate through excluded volume effects, hydrophobic effects, and specific protein−sugar interactions.16,18,33−37 For example, glycosylation can stabilize the natively N-glycosylated human CD2 protein through essential protein−sugar interactions.30,36 Properly folded human CD2 contains a cluster of five positively charged lysine (Lys) residues that is destabilizing in the absence of a glycan to relieve Coulombic repulsion.30 Furthermore, the core GlcNAc attached to Asn at position i + 2 of a type-I β-bulge on CD2 is involved in CH−π interactions with phenylalanine (Phe) at position i. The formation of a compact hydrophobic core consisting of the GlcNAc, Phe, and Thr at position i + 4 further stabilizes the β-bulge motif.36 These specific protein−sugar interactions have been successfully engineered into non-glycosylated proteins to stabilize β-turns,29,38−40 clearly illustrating how natural protein−sugar interactions can be applied to rationally design glycopeptides with desired secondary structures.

Table 1. Structure of an Asn That Is Glycosylated by GlcNAc and Conformers of Each Dihedral Angle Used To Create Model Glycopeptide Structures for Steric Analysisᵃ

ᵃ The GlcNAc backbone is shown in purple. References that discuss the relevance of the conformers used are also noted.

N-Linked glycans are most often located on turns, bends, and sites of secondary structure change.41−44 Glycosylation sites are not as commonly located on α-helices, and glycosylation is often considered α-helix-breaking.31,45 The close proximity of the side chains on α-helices allows for little tolerance of a bulky glycan. Indeed, N-glycan incorporation in the middle of α-helices found in the bacterial immunity protein Im7 was destabilizing. In contrast, glycosylation sites introduced at the

termini of those same α-helices did not alter the protein’s stability.45

Despite evidence that glycosylation destabilizes α-helices, there are nonetheless instances of naturally glycosylated α-helices.41−44 Herein we identify naturally glycosylated α-helices and investigate how they may form relatively stable α-helical structures when glycosylated. Ultimately, we aim to provide general approaches to engineer stable α-helical glycopeptides. First, we utilize structural bioinformatics to identify two sugar orientations amenable to α-helical structure that are characterized by an Asn χ1 of either 180° or 300°. We provide a physical explanation for both conformational modes through a combination of steric arguments and metadynamics simulations. Second, using structural bioinformatics we discover protein−sugar interactions in α-helical N-glycoproteins that utilize each conformational mode. The most common interactions result from a glutamate (Glu) at either position −4 or +4 (relative to the Asn at position 0) interacting with the GlcNAc. A −4 Glu−GlcNAc interaction utilizes the conformational mode with Asn χ1 of 300° to mediate α-helices N-terminal to the glycosylation site, and a +4 Glu−GlcNAc interaction utilizes the conformational mode with Asn χ1 of 180° to mediate α-helices C-terminal to the glycosylation site. We further probe the energetics of −4 and +4 Glu−GlcNAc interactions using metadynamics simulations and thermodynamic decomposition. Cumulatively, our data suggest that the incorporation of a Glu residue at either the −4 or +4 position relative to an N-glycan may provide stability to α-helical glycopeptides.

METHODS

Structural Bioinformatics Analysis. Glycoprotein Data Set. Glycoproteins were first identified in the nonredundant, manually annotated Swiss-Prot sequence database.46,47 Glycosylation sites were identified from a feature line key of “CARBOHYD”, excluding sites with the “glycation” modifier in order to include only natural glycosylation sites. Swiss-Prot entries were then mapped to Protein Data Bank (PDB) structures.48,49 The glycosylation sites in the PDB were verified to have structural information for the core GlcNAc by checking the CONECT record for a β-glycosidic bond between Asn and the GlcNAc (residue name NAG in the PDB). PDB structures with unreasonably long glycosidic bond lengths (>3.5 Å) were excluded from further analysis. These errors were also verified using pdb-care.50,51 To ensure accurate evaluations of interactions between the protein and the core GlcNAc in later analysis, PDB sites lacking atoms from the core GlcNAc or from an amino acid within 5 Å of the core GlcNAc were also excluded from further analysis.

Determination of Protein Secondary Structure. The secondary structure of each protein was determined using both the DSSP52 and STRIDE53 algorithms. Only residues classified as α-helical by both DSSP and STRIDE were considered α-helical. To investigate glycosylation sites at the center of α-helices, glycosylation sites on α-helical Asn’s with at least four additional α-helical residues toward both termini (hereafter termed 9-residue α-helices) were identified (Figure 1). α-Helical glycosylation sites with at least four α-helical residues toward the N-terminus (termed N-terminal α-helices) and sites with at least four α-helical residues toward the C-terminus (termed C-terminal α-helices) were also analyzed. By definition, glycosylation sites classified as 9-residue α-helices are also classified as both N-terminal and C-terminal α-helices. These three structural classifications are illustrated in Figure 1.

Figure 1. Illustration of N-terminal, 9-residue, and C-terminal α-helix classifications. DSSP and STRIDE secondary structure assignments are given below the corresponding structures. Residues considered for N-terminal, 9-residue, and C-terminal α-helix classification are colored red, black, and blue, respectively. Underlined residues indicate the positions of the glycosylation sites. Secondary structure assignments for DSSP: H = α-helix, S = bend, ∼ = unspecified structure. Secondary structure assignments for STRIDE: H = α-helix, T = turn, C = coil.

Analysis of GlcNAc Orientation. The orientation of the core N-linked GlcNAc can be characterized by the dihedral angles χ1, χ2, ψN, ϕN, ω2, and ω6 (Table 1). To determine common GlcNAc orientations, dihedral principal component analysis (dPCA)54,55 of these dihedrals and subsequent clustering were performed. Following dPCA, a grid-based density peak-based clustering56 in two-dimensional (2D) principal subspace was utilized to distinguish common orientations of the GlcNAc and to quantify the number of sites with a given GlcNAc orientation. To further characterize the GlcNAc structure, the ring conformation of the GlcNAc was also determined with the Privateer software.57

Evaluation of Protein−Sugar Interactions. To identify the most favorable protein−sugar interactions within the α-helical secondary structure, the interaction energy of each amino acid with any heavy atom within 3.5 Å of the heavy atoms of the core GlcNAc was calculated using GROMACS 4.6.1.58 Because of the lack of hydrogen in many PDB structures, hydrogen was first added to the structures, and the positions of the hydrogens were then refined through steepest-descent energy minimization. The interaction energy between each individual amino acid and the core GlcNAc was calculated as the difference between the energy of the sugar−amino acid dimer and the sum of the energies of the individual sugar and amino acid monomers. For the energy calculation, parameters from the Charmm22* force field59 were utilized for the amino acids. Parameters from the Charmm36 force field60 were used for the GlcNAc. To account for solvation effects, generalized Born with solvent accessible surface area (GBSA) implicit solvent61 was utilized in the energy calculations. Protein−sugar interactions with negative values were considered favorable interactions.
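The interaction-energy definition above can be sketched in a few lines. This is only an illustration of the arithmetic (dimer energy minus the sum of the monomer energies, with negative values counted as favorable); the function names, residue labels, and energy values are hypothetical, and in practice the three energies would come from the force-field calculation described in the text.

```python
def interaction_energy(e_dimer, e_sugar, e_residue):
    """Return E_int = E(dimer) - [E(sugar) + E(residue)], all in kJ/mol."""
    return e_dimer - (e_sugar + e_residue)


def favorable_contacts(energies):
    """Keep residues whose interaction energy with the GlcNAc is negative.

    `energies` maps a residue label to (E_dimer, E_sugar, E_residue).
    """
    result = {}
    for label, (e_dimer, e_sugar, e_residue) in energies.items():
        e_int = interaction_energy(e_dimer, e_sugar, e_residue)
        if e_int < 0.0:
            result[label] = e_int
    return result


# Hypothetical energies (kJ/mol), invented purely for illustration.
example = {
    "Glu(-4)": (-150.0, -100.0, -30.0),  # dimer is lower than the monomer sum
    "Lys(+3)": (-120.0, -100.0, -30.0),  # dimer is higher than the monomer sum
}

print(favorable_contacts(example))
```

Only the first residue survives the filter in this toy input, because its dimer energy lies below the sum of the two monomer energies.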
Redundancy Weighting of Glycosylation Sites. Similar to a per-feature weighting scheme developed previously,62 we weighted each PDB site such that PDB sites identified from the same Swiss-Prot site were only counted as one glycosylation site in analyses. By computation of the weighted count for each individual PDB site, all of the possible structures for a given Swiss-Prot site were accurately evaluated without bias. We note that the total weighted count of PDB sites that have a given characteristic, such as being an N-terminal α-helix or having a −4 Glu−GlcNAc interaction, can be a nonwhole number because all PDB sites mapped to the same Swiss-Prot site may not have exactly equivalent structures and because each of these PDB sites may not individually meet the requirements for a given characteristic. For this reason, the total weighted count of PDB sites with a given characteristic is less than or equal to the number of unique, nonredundant glycosylation sites with that same characteristic.

Steric Analysis of Model α-Helical Glycopeptides. Ideal α-helices (ϕ = −57°, ψ = −47°) of 21 residues with a central Asn (residue 11) were built with Chimera.63 For comparison, model β-hairpins with a type-I β-turn were created by specifying residues 1−9 and 12−21 to adopt dihedrals for an extended, antiparallel structure (ϕ = −139°, ψ = 135°). The type-I β-turn was positioned at residues 10 and 11 (ϕi+1 = −60°, ψi+1 = −30°, ϕi+2 = −90°, ψi+2 = 0°), such that position i + 2 of the β-turn was glycosylated. Only backbone atoms were included, except for the central Asn, which was glycosylated with a GlcNAc using the Glycam glycoprotein builder.64 Different glycan orientations were specified by the dihedral angles χ1, χ2, ψN, ϕN, ω2, and ω6. The values of each dihedral used to generate different model glycopeptide structures are given in Table 1. For each dihedral, we included various conformers observed in previous bioinformatics and experimental studies.41,65−70 The values used for χ1 and χ2 were selected on the basis of conformers of Asn observed in rotamer libraries generated from the PDB.65−67 In comparison with the distinct conformers observed for χ1 in these libraries, the χ2 distribution does not exhibit specific preferences and is widely distributed about the Nt conformer (χ2 = 180°) since it involves the rotation of a planar amide relative to a tetrahedral Cβ.66,68 Because ψN characterizes the amide bond between Asn and GlcNAc, the trans and cis conformers were used.41,69 Conformers of ϕN intermediate between 180° and 300° are likely preferable for GlcNAc because unfavorable sterics between the N-acetyl group on C2 and the Asn side chain are limited. An intermediate conformer with ϕN = 240°−300° has previously been observed to be the most populated conformation.41,69 The conformers used for ω2 and ω6 were chosen on the basis of previous bioinformatic69 and experimental results.70

Combinations of all of the dihedrals in Table 1 resulted in a total of 864 model glycopeptides with various Asn/GlcNAc orientations for each of the α-helical and β-hairpin structures. Clashes (overlaps of the van der Waals radii) of the peptide backbone with the Asn side chain and the GlcNAc were calculated using Chimera’s Find Clashes63 to discern conformations that would be sterically forbidden. For comparison, 18 non-glycosylated model peptides for each of the α-helical and β-hairpin structures were also built with all combinations of χ1 and χ2 and evaluated for clashes between the peptide backbone and the Asn side chain.

Molecular Dynamics Simulations. Model Peptides. To examine the GlcNAc conformational preferences using an all-atom explicit-solvent force field, we performed molecular dynamics (MD) simulations of the model glycopeptide, AAAAA-AAANA-AAAAA-AA.
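The redundancy weighting described under "Redundancy Weighting of Glycosylation Sites" can be illustrated with a minimal sketch. It assumes, which the text does not state explicitly, that each PDB site receives a weight of 1/n, where n is the number of PDB sites mapped to the same Swiss-Prot site, so that the weights for one Swiss-Prot site sum to one. All identifiers and the feature flags below are hypothetical.

```python
from collections import Counter


def site_weights(pdb_sites):
    """Assign each PDB site a weight of 1/n (assumed weighting scheme).

    `pdb_sites` is a list of (pdb_site_id, swissprot_site_id) pairs and
    n is the number of PDB sites mapped to the same Swiss-Prot site.
    """
    counts = Counter(sp_id for _, sp_id in pdb_sites)
    return {pdb_id: 1.0 / counts[sp_id] for pdb_id, sp_id in pdb_sites}


def weighted_count(pdb_sites, has_feature):
    """Sum the weights of the PDB sites that show a given characteristic."""
    weights = site_weights(pdb_sites)
    return sum(w for pdb_id, w in weights.items() if has_feature[pdb_id])


# Hypothetical mapping: two PDB sites share one Swiss-Prot site.
sites = [
    ("pdbA:N42", "spX:42"),
    ("pdbB:N42", "spX:42"),
    ("pdbC:N10", "spY:10"),
]
# Hypothetical flag, e.g. "site forms an N-terminal alpha-helix".
is_helical = {"pdbA:N42": True, "pdbB:N42": False, "pdbC:N10": True}

total = weighted_count(sites, is_helical)
print(total)
```

In this toy input two nonredundant Swiss-Prot sites show the characteristic, but only one of the two structures for the first site does, so the weighted count is a nonwhole number below two, which mirrors the behavior noted in the text.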
Three initial configurations (α-helix, β-hairpin, and random coil) were constructed using the Chimera molecular modeling package.63 The peptide was then N-glycosylated with a GlcNAc residue using the Glycam glycoprotein builder.64

To examine the peptide−glycan energetics of protein−GlcNAc interactions identified from structural bioinformatics in α-helical glycopeptides using an all-atom explicit-solvent force field, we performed MD simulations of four model α-helical glycopeptides: AAAAA-AAANA-AAAAA-AA (N), AAAAE-AAANA-AAAAA-AA (EN), AAAAA-AAANA-AAEAA-AA (NE), and AAAAE-AAANA-AAEAA-AA (ENE). The initial structures were constructed using the Chimera molecular modeling package63 and then N-glycosylated with a GlcNAc residue using the Glycam glycoprotein builder.64

Figure 2. dPCA and cluster analysis of 9-residue α-helical glycosylation sites in the PDB. (A) 2D density plot and cluster assignments in principal component space. (B) Alignment of all 9-residue α-helices shown looking down onto the sugar with the α-helix axis parallel to the page. Structures in clusters 1, 2, 3, and 4 are colored red, blue, orange, and cyan, respectively. (C) Structure of each cluster center shown from the side and looking down onto the sugar. (D) Distributions of χ1, χ2, ψN, ϕN, ω2, and ω6 for each cluster.

Simulation Setup. MD simulations were performed using the GROMACS 4.6.1 suite.58 For all of the simulations, the Charmm22* force field59 was utilized for the amino acids and the Charmm36 force field60 was utilized for the GlcNAc. To maintain the initial secondary structures of the glycopeptides during the simulation, a dihedral restraint of 4184 kJ·mol−1·rad−2 was used on all of the backbone dihedral angles with a maximum allowed angle deviation of 30°. Each initial glycopeptide structure was first immersed in a 40 Å × 40 Å × 40 Å cubic box containing pre-equilibrated TIP3P71 water molecules. For simulations of EN, NE, and ENE, minimal numbers of sodium ions were added to neutralize the whole system. The solvated systems were then energy-minimized using the steepest-descent algorithm to remove any bad contacts. Next, the solvated systems underwent two equilibrations. The first equilibration consisted of a 100 ps NVT (isochoric−isothermal) simulation at 300 K, and the second consisted of a 100 ps NPT (isobaric−isothermal) simulation at 300 K and 1 bar. All of the production simulations were performed using an NPT ensemble at 300 K and 1 bar. The V-rescale thermostat72 was coupled to both the glycopeptide and solvent separately73,74 using a coupling time constant of 0.1 ps. The pressure was regulated via the Berendsen barostat75 using a coupling time constant of 2.0 ps and an isothermal compressibility of 4.5 × 10−5 bar−1. The leapfrog algorithm76 using a time step of 2 fs was implemented for dynamics evolution. All bonds involving hydrogen atoms were constrained using the LINCS algorithm.77 All neighbor searching, electrostatic interactions, and van der Waals interactions were truncated at 1.0 nm. Long-range electrostatics beyond the cutoff distance were calculated using the particle mesh Ewald (PME) summation78 with a Fourier spacing of 0.12 nm and a PME order of 4. A long-range dispersion correction for energy and pressure was applied to account for the 1.0 nm cutoff of Lennard-Jones interactions.79
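To give a feel for the backbone dihedral restraint quoted above, the sketch below evaluates a flat-bottomed harmonic penalty. It assumes that the 30° maximum deviation acts as a penalty-free tolerance around the reference angle and that the energy beyond it is ½·k·(excess)²; this functional form is an assumption made for illustration and may differ in detail from the expression used by the simulation package. The force constant is the value from the text (4184 kJ·mol−1·rad−2, equivalent to 1000 kcal·mol−1·rad−2).

```python
import math

K_DIHR = 4184.0                    # kJ mol^-1 rad^-2, value quoted in the text
TOLERANCE = math.radians(30.0)     # assumed flat-bottom half-width


def wrap(angle):
    """Map an angle in radians onto the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def restraint_energy(phi, phi0, k=K_DIHR, tol=TOLERANCE):
    """Flat-bottomed harmonic restraint energy in kJ/mol (assumed form).

    No penalty while |phi - phi0| <= tol; harmonic in the excess beyond it.
    """
    deviation = abs(wrap(phi - phi0))
    excess = max(deviation - tol, 0.0)
    return 0.5 * k * excess ** 2


# Reference backbone phi of an ideal alpha-helix, as used for the steric models.
phi0 = math.radians(-57.0)

inside = restraint_energy(math.radians(-37.0), phi0)    # 20 degrees off
outside = restraint_energy(math.radians(-17.0), phi0)   # 40 degrees off
print(inside, round(outside, 2))
```

With this stiffness, a dihedral that strays 10° past the tolerance already costs tens of kJ/mol, which is why the backbone stays close to its starting secondary structure.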

Bias-Exchange Metadynamics Simulations. Bias-exchange metadynamics (BE-META) simulations80,81 were performed to provide efficient conformational sampling of the angles χ1, χ2, and ϕN. All of the simulations were performed using the PLUMED 2 plugin82 for GROMACS. Backbone dihedral restraints were applied throughout the simulations. For each simulation, a 2D bias was placed on χ1 and χ2 of Asn as well as a 1D bias on ϕN of the GlcNAcβ1−Asn linkage. Additionally, for simulations of the model glycopeptides EN, NE, and ENE, a 2D bias was added on χ1 and χ2 for each Glu residue. Gaussian hills were added every 4 ps, with a height and width of 0.1 kJ·mol−1 and 0.31416 rad, respectively. Exchanges between replicas were attempted every 5 ps. To obtain the unbiased structural ensemble for analysis, five neutral replicas (i.e., with no bias) were added. The final simulation length of all BE-META simulations was 300 ns.

Clustering and Thermodynamic Decomposition. For each simulation, the last 50 ns of all neutral replicas were clustered on the basis of the conformer of the Asn χ1 dihedral angle. χ1 values of 0°−120°, 120°−240°, and 240°−360° were assigned to g+, t, and g− conformers, respectively. Once clustering was complete, a thermodynamic decomposition was performed to determine the origins of χ1 conformational preferences for each glycopeptide. From the ratios of their populations, ΔG between clusters was calculated (for two clusters A and B with populations p_A and p_B, G_A − G_B = −RT ln(p_A/p_B)). ΔG was then separated into ΔH (using the difference in potential energies between clusters) and ΔS (estimated as ΔS = (ΔH − ΔG)/T). Further decomposition of ΔH was based on (1) whether the interaction was within the glycopeptide (ΔH_P), within water (ΔH_W), or between glycopeptide and water (ΔH_PW)83 and (2) whether the interaction was for bonds (ΔH_bond), angles (ΔH_angle), dihedrals (both proper dihedrals, ΔH_dih, and improper dihedrals, ΔH_imp), Lennard-Jones interactions (ΔH_LJ), or Coulombic interactions (ΔH_EE). For (1), three potential energies were calculated: all atoms, glycopeptide only (ΔH_P), and water only (ΔH_W). The glycopeptide−water energies (ΔH_PW) were derived by subtracting the energies of glycopeptide only and water only from the energy of all atoms.83 ΔS was further decomposed into glycopeptide configurational entropy (ΔS_P^conf) and solvation entropy (ΔS_W). ΔS_P^conf was evaluated using the maximum information spanning tree (MIST) approach,84−86 and the solvation entropy was calculated using the expression ΔS_W = ΔS − ΔS_P^conf.

Unconstrained Molecular Dynamics Simulations. MD simulations of the model glycopeptides N, EN, NE, and ENE starting from an α-helical structure were performed without the backbone dihedral restraints to assess the α-helical stability of each glycopeptide. Ten sets of simulations with different initial velocities were run for each system. The length of each production run for all glycopeptides was 1 μs. Additionally, variants of model glycopeptides N, EN, NE, and ENE with a +2 Thr, a consensus sequence motif for N-glycosylation, were also simulated.

RESULTS AND DISCUSSION

Two GlcNAc Orientations Are Observed for α-Helical Glycosylation Sites in the PDB. We began our studies by performing a structural bioinformatics analysis of α-helical glycosylation sites in the PDB. Out of the 3402 PDB structures obtained, 1961 nonredundant glycosylation sites with structural information for at least the core GlcNAc of the glycan were cataloged. Of those N-glycosylation sites, 50.0 weighted PDB sites (consisting of 53 nonredundant sites) were found to form 9-residue α-helices (Table S1), which by definition have four α-helical residues on both sides of the glycosylated Asn (Figure 1). The GlcNAc adopts the preferred 4C1 chair conformation on 90% of the 9-residue α-helices (Figure S1).87 To characterize common GlcNAc orientations in 9-residue α-helical glycosylation sites in the PDB, dPCA of all dihedral angles (χ1, χ2, ψN, ϕN, ω2, and ω6) and subsequent density peak-based clustering were performed (Figure 2A). Dihedral angle distributions for χ1, χ2, ψN, ϕN, ω2, and ω6 from all 9-residue α-helices can be found in Figure S2. dPCA and cluster analyses identified four distinct clusters characterized by variations in the χ1 and ω6 dihedrals. The widest variance among structures (PC1) is attributed to different ω6 conformers and separates clusters 1 and 2 from clusters 3 and 4. Though ω6 describes the positioning of O6 on GlcNAc, it does not characterize the global positioning of the GlcNAc with respect to the α-helix. The second widest variance among structures (PC2) is attributed to different χ1 conformers and separates clusters 1 and 3 from clusters 2 and 4. In contrast to ω6, differences in χ1 do affect the global orientation of the GlcNAc (Figure 2B,C): clusters 1 and 3 (combined 52.9% of the 9-residue α-helical sites) exhibit χ1 distributed around 300°, the g− conformer (conformational mode #1); clusters 2 and 4 (combined 47.1% of the 9-residue α-helical sites) exhibit χ1 distributed around 180°, the t conformer (conformational mode #2) (Figure 2D). All of the clusters have similar χ2, ψN, ϕN, and ω2 distributions (Figure 2D). Thus, the two conformational modes result primarily from different χ1 conformers. Interestingly, we found no 9-residue α-helical glycosylation sites with χ1 = 60°, leading us to hypothesize that within an α-helical structure an Asn χ1 = 60° is sterically forbidden.

Figure 3. Structure analysis of simple model glycopeptides. (A) Results from purely steric clash analysis of model non-glycosylated (left) and glycosylated (right) peptides with α-helical (top) and β-hairpin (bottom) structures grouped according to χ1. (B) χ1 distributions for α-helical, β-hairpin, and random coil model glycopeptides from BE-META simulations.

The Existence of Only Two GlcNAc Orientations in α-Helical Glycopeptides Is Due to Steric Constraints. To test this hypothesis, we began by determining the range of non-glycosylated Asn conformations that are sterically allowed in α-helices. We built 18 model non-glycosylated α-helices using combinations of various χ1 and χ2 dihedrals (Table 1) and evaluated them for steric clashes. Consistent with prior work,66 the Asn adopted a restricted subset of conformations because of clashes with the α-helix backbone that occur even when the Asn residue is non-glycosylated (Figure 3A). Furthermore, all non-glycosylated model α-helices with χ1 = 60° have clashes between the Asn side chain and the helix backbone. We also built 18 non-glycosylated β-hairpins and observed that the conformational space of the α-helical Asn was reduced compared with that of the β-hairpin Asn (44% of α-helices have clashes vs 0% of β-hairpins) (Figure 3A). Next, we built 1728 model glycopeptides (864 α-helices and 864 β-hairpins). Addition of the GlcNAc reduced the number

of sterically allowed conformations for both structures relative to the corresponding non-glycosylated structures (Figure 3A). Fewer conformations were allowed for glycosylated α-helices relative to glycosylated β-hairpins (23% of glycosylated α-helices do not have clashes vs 54% of glycosylated β-hairpins), supporting the general consensus that glycosylation is often poorly tolerated on α-helices compared with turns and bends. All α-helical model glycopeptides with χ1 = 60° have clashes, 78% of those with χ1 = 300° have clashes, and 52% of those with χ1 = 180° have clashes (Figure 3A; see Figure S3 for the numbers of structures with clashes for each χ2, ψN, ϕN, ω2, and ω6 conformer). Thus, purely steric arguments explain the lack of χ1 = 60° in α-helical glycosylation sites in the PDB. However, in contrast with the observed preference for χ1 = 300° in the PDB (52.9% for χ1 = 300° and 47.1% for χ1 = 180°), α-helical glycopeptides with χ1 = 180° are expected to be more common when only sterics are considered (Figure 3A, top right panel).

Intraglycopeptide and Intrawater Interactions Explain the Preference for GlcNAc Conformational Mode #1. To expand upon our results from purely steric analysis and to better predict GlcNAc orientations in α-helices, we then performed BE-META simulations in explicit solvent of the model glycopeptide AAAAA-AAANA-AAAAA-AA in an α-helical conformation as well as in β-hairpin and random coil conformations. Consistent with our results using steric arguments, more glycan conformations are accessible for the β-hairpin and random coil glycopeptides than for the α-helical glycopeptide (Figures 3B, S4, and S5). Additionally, χ1 = 60° was highly thermodynamically disfavored in the α-helical glycopeptide. However, in contrast to our results using steric arguments (Figure 3A, top right panel), χ1 = 300° was more favorable than χ1 = 180° in simulations of the α-helical glycopeptide (Figure 3B) because of favorable enthalpy from intraglycopeptide and intrawater interactions (Table S2). Thus, considering the actual molecular interactions is also crucial to explain the preference for χ1 = 300° over χ1 = 180° in α-helical glycopeptides.

Table 2. Natural Abundances88 of the Amino Acids and Their Frequencies at the ±4 Positions of All Glycosylation Sites from the Swiss-Prot Database, All Glycosylation Sites from the PDB, and Glycosylation Sites That Adopt an N-/C-Terminal α-Helix

Glu at the −4 Position Is Highly Favored in Glycosylation Sites That Adopt an N-Terminal α-Helix. We subsequently sought to discover specific protein−sugar interactions that may stabilize the α-helical structure. Such interactions would be highly useful for the design of α-helical glycopeptides. To discover protein−sugar interactions that could locally stabilize the α-helix, we focused our analysis on contacts the sugar forms with near-sequence residues, which involve residues from positions −10 to +10 around the glycosylated Asn at position 0 in the sequence. Since the two conformational modes (mode #1 with χ1 = 300° and mode #2 with χ1 = 180°) place the sugar toward opposite directions of the α-helix (Figure 2B), we hypothesized that a protein−sugar interaction would utilize only one of these two modes. To evaluate protein−sugar interactions likely to utilize mode #1, in which the GlcNAc is oriented toward the N-terminus of the α-helix relative to the glycosylation site, we focused our analysis on N-terminal α-helical glycosylation sites, which by definition have four α-helical residues N-terminal to Asn (Figure 1). We cataloged 120.3 weighted PDB glycosylation sites that adopt N-terminal α-helices (consisting of 129 nonredundant sites) (Table S1). As a result of the i ± 4 hydrogen-bonding pattern of α-helices, residues at the −4 position are aligned along the same side of the α-helix as the GlcNAc and are properly positioned to interact. Therefore, we analyzed the amino acid preference at the −4 position relative to the glycosylation site. Amino acids with enhanced frequencies in N-terminal α-helices in comparison with their frequencies in all secondary structures may help stabilize α-helical glycosylation sites. Table 2 shows the frequency of observing each of the 20 natural amino acids at the −4 position in all of the glycosylation sites in the Swiss-Prot database and in all of the glycosylation sites in the PDB, in contrast to the frequency observed in the N-terminal α-helical glycosylation sites in the PDB. The frequencies of observing the 20 natural amino acids in the Swiss-Prot database and the PDB for all glycosylation sites are similar and consistent with the natural abundances of the amino acids,88 suggesting that our data set is free from significant bias. When comparing the amino acid frequencies for all glycosylation sites and those that adopt an N-terminal α-helix, one sees that the frequency of Glu at −4 is enhanced in N-terminal α-helical glycosylation sites (17.16% vs 5.94%; Table 2). We hypothesized that this enhancement could be due to favorable −4 Glu−GlcNAc interactions stabilizing N-terminal α-helices.

To test our hypothesis, we next analyzed all of the possible protein−GlcNAc interactions in N-terminal α-helical glycosylation sites. The GlcNAc interacts with near-sequence residues (residues within positions −10 to +10) on 47% of N-terminal α-helical glycosylation sites. The GlcNAc interacts with only near-sequence residue(s) in 52% of these sites, whereas in the remaining sites the GlcNAc also interacts with far-sequence residue(s) (residues outside positions −10 to +10) (Figure S6). As expected, near-sequence protein−sugar interactions occur most frequently with residues at the −4 position (Figure 4A). Of the interactions between the GlcNAc and residues at the −4 position, interactions with Glu are most


Figure 5. (A−C) Examples of N-terminal α-helical sites with favorable −4 Glu−GlcNAc interactions on (A) formylglycine-generating enzyme (PDB entry 2AFY), (B) a phytase (PDB entry 3K4P), and (C) phospholipase B-like protein (PDB entry 4BWC). (D−F) Examples of N-terminal α-helical sites with unfavorable −4 Glu−GlcNAc interactions on (D) influenza hemagglutinin (PDB entry 3ZNK), (E) α1,2-mannosidase (PDB entry 1KKT), and (F) N-acetylgalactosamine-4-sulfatase (PDB entry 1FSU). α-Helical residues from position −4 to 0 are colored red.

preference for χ2 = 180° is not necessarily due to −4 Glu−GlcNAc interactions. Instead, χ2 = 180° is more likely intrinsically favored when χ1 = 300° in α-helical glycosylation sites: Figure S8 demonstrates that when χ1 = 300° in 9-residue, N-terminal, and C-terminal α-helices, χ2 = 180° is preferred regardless of the presence of −4 Glu−GlcNAc interactions. Three example structures out of 15.6 weighted PDB glycosylation sites with favorable −4 Glu−GlcNAc interactions are shown in Figure 5A−C. The formylglycine-generating enzyme (FGE) shown in Figure 5A oxidizes a conserved cysteine residue in all sulfatases. Reduction of sulfatase activity results in severe diseases such as mucopolysaccharidosis and X-linked ichthyosis. Elimination of FGE activity leads to the fatal disease multiple sulfatase deficiency, the severity of which is determined by FGE instability.89−92 The glycosylation site shown in Figure 5A is conserved among FGE orthologues.92 Figure 5B illustrates a glycosylation site on a phytase, an enzyme that catalyzes the hydrolysis of phytate to release stored phosphorus. Because phytase activity is limited in the gastrointestinal tracts of simple-stomached animals such as swine and poultry, the design of thermostable phytases for inclusion in animal feed is of much interest.93,94 The glycans on phytases have been shown to increase their thermostability and to affect the optimum pH for catalytic activity.95,96 Another example, phospholipase B-like protein 1 (PLBD1), a lysosomal protein that aids macromolecular degradation, is shown in Figure 5C. Improper lysosomal protein function can lead to a number of inherited lysosomal storage disorders. Transporting the necessary proteins to the lysosome is thus key. Typically transport occurs via the mannose-6-phosphate-dependent pathway. 
The glycosylation site specifically shown in Figure 5C is crucial for proper transport of PLBD1 to the lysosome and is conserved among the families of PLBD1 and a PLBD1 paralogue.97 To see whether the −4 Glu is also conserved at this site, Swiss-Prot was searched for proteins of the phospholipase B-like family, and the sequences were aligned using Clustal Omega 1.2.1.98 Of the 13 proteins with this conserved glycosylation site, nine also contained a −4 Glu (Table S3). In light of our structural bioinformatics

Figure 4. (A, B) Numbers of N-terminal α-helical glycosylation sites with negative or positive interaction energies according to (A) the sequence position of the interacting residue and (B) the amino acid at the −4 position. (C) χ1 distribution for N-terminal α-helical sites with −4 Glu−GlcNAc interactions.

common (Figure 4B). Out of the 20.9 sites that show −4 Glu−GlcNAc interactions, 15.6 sites (74.6%) have favorable −4 Glu−GlcNAc interactions, while 5.3 sites (25.4%) have unfavorable −4 Glu−GlcNAc interactions. In 47% of the glycosylation sites with favorable −4 Glu−GlcNAc interactions, the GlcNAc does not form any other favorable interactions. This suggests that a −4 Glu−GlcNAc interaction alone may help maintain α-helicity. The −4 Glu−GlcNAc interactions are electrostatic in nature, involving hydrogen bonding between a carboxylate oxygen of the Glu side chain and the amide hydrogen of the GlcNAc N-acetyl group (Figure 5A−C). Such interactions provide a physical basis for the enhanced occurrence of Glu observed at the −4 position in N-terminal α-helical glycosylation sites (Table 2). The glycosylated N-terminal α-helices with −4 Glu−GlcNAc interactions have a distinct preference for conformational mode #1 with χ1 around 300° (Figure 4C; see Figure S7A for the distributions of all dihedrals), supporting our hypothesis that a given protein−sugar interaction will utilize one of the two conformational modes (χ1 = 300° or χ1 = 180°). Therefore, −4 Glu−GlcNAc interactions may help maintain α-helical structure N-terminal to the glycosylation site through conformational mode #1 (χ1 = 300°). We note that although ω6 was identified as an important differentiating factor among different glycopeptide structures (Figure 2D), all of the conformers of ω6 were found in N-terminal α-helical glycosylation sites with −4 Glu−GlcNAc interactions (Figure S7A). On the other hand, most of these N-terminal α-helical sites with −4 Glu−GlcNAc interactions show a preference for χ2 = 180°. This


identification of key −4 Glu−GlcNAc interactions in α-helices, mutational studies of the −4 Glu on these proteins could provide further insights into the effects of glycosylation on these proteins. Additional design principles that ought to be considered for an engineered −4 Glu−GlcNAc interaction to effectively stabilize a glycosylated α-helix can be discerned by examining −4 Glu−GlcNAc interactions with positive interaction energies since these glycosylation sites provide insights about when incorporating a −4 Glu likely will not effectively stabilize an α-helical glycosylation site. Three example structures out of 5.3 weighted PDB glycosylation sites with unfavorable −4 Glu−GlcNAc interactions are shown in Figure 5D−F. Additional protein residues interact with the −4 Glu or with the GlcNAc in these glycosylation sites. In the glycosylation site in Figure 5D, the −4 Glu forms a salt bridge with a −1 arginine (Arg) instead of a hydrogen bond with the N-acetyl group of GlcNAc. In contrast, all of the glycosylation sites with favorable −4 Glu−GlcNAc interactions lack Arg and lysine (Lys) at the −1 or −7 position (positions i ± 3 relative to the −4 Glu), and as a result, similar competing interactions with the −4 Glu do not occur. Thus, for −4 Glu−GlcNAc interactions to effectively mediate α-helical structure, glycopeptide sequences should be designed to limit the number of competing favorable interactions. In the glycosylation sites shown in Figure 5E,F, interactions between the GlcNAc and far-sequence residues occur. Although these α-helices may be stabilized by a Trp−GlcNAc interaction (Figure 5E) and an Asn−GlcNAc interaction (Figure 5F), these interactions are specific to these proteins and depend on the tertiary structure of each protein. 
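The sequence-level observation above (favorable sites carry a −4 Glu and lack Arg or Lys at the −1 and −7 positions) can be turned into a simple screen for candidate designs. The sketch below is illustrative only: the function name, the returned dictionary, and the test sequences are hypothetical, and passing this check says nothing about tertiary contacts such as those in Figure 5E,F.

```python
def check_minus4_glu_site(sequence, asn_index):
    """Screen a one-letter sequence around a glycosylated Asn (0-based index)."""
    if sequence[asn_index] != "N":
        raise ValueError("asn_index does not point at an Asn")

    def residue_at(offset):
        # Residue at a position relative to the Asn, or None if off the ends
        i = asn_index + offset
        return sequence[i] if 0 <= i < len(sequence) else None

    has_glu = residue_at(-4) == "E"
    # Arg/Lys at i +/- 3 of the -4 Glu (positions -7 and -1) could compete
    competing = [off for off in (-7, -1) if residue_at(off) in ("R", "K")]
    return {
        "has_minus4_glu": has_glu,
        "competing_offsets": competing,
        "passes": has_glu and not competing,
    }


# Hypothetical test sequences (Asn at index 8)
clean = check_minus4_glu_site("AAAAEAAANAAAAAAAA", 8)
blocked = check_minus4_glu_site("AAAAEAARNAAAAAAAA", 8)
```

Here `blocked` fails because the hypothetical −1 Arg could form a salt bridge with the −4 Glu, as in the site shown in Figure 5D.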
Thus, −4 Glu−GlcNAc interactions have a greater potential to help stabilize designed α-helical glycopeptides than these interactions since −4 Glu−GlcNAc interactions are found on multiple different proteins and the utility of this interaction is not conditional on tertiary structure. However, the possibility of interactions between the GlcNAc and other far-sequence residues on larger proteins should be investigated when considering incorporation of a −4 Glu−GlcNAc interaction to help stabilize a glycosylated α-helix in a crowded protein environment. Glu at the +4 Position Is Favored in Glycosylation Sites That Adopt a C-terminal α-Helix along with Other Amino Acids, and the Enhancement Is Less Clear. Thereafter, we looked for favorable protein−sugar interactions in C-terminal α-helical glycosylation sites, which by definition have four α-helical residues C-terminal to the glycosylated Asn (Figure 1), to detect interactions that might locally stabilize the portion of the α-helix C-terminal to the glycosylation site. We cataloged 77.3 weighted PDB glycosylation sites (consisting of 82 nonredundant sites) that adopt C-terminal α-helices (Table S1). First, we compared the occurrence of each amino acid at the +4 position in C-terminal α-helical sites to the respective sequence frequency in all glycosylation sites. Amino acids with higher occurrences in C-terminal α-helical glycosylation sites than in all glycosylation sites may help stabilize C-terminal α-helices. Table 2 shows the frequency of observing each of the 20 natural amino acids at the +4 position in all the glycosylation sites in the Swiss-Prot database and in all the glycosylation sites in the PDB, in contrast to the frequency observed in C-terminal α-helical glycosylation sites in the PDB. 
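The comparison described above amounts to computing, for a fixed offset from the glycosylated Asn, the amino acid frequencies in a subset of sites and dividing by the frequencies in all sites. A minimal, unweighted sketch follows; the site weighting used for the PDB counts in this work is not reproduced here, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def position_frequencies(sites, offset):
    """Fraction of each amino acid at `offset` from the Asn.

    `sites` is a list of (sequence, asn_index) pairs with 0-based indices.
    """
    counts = Counter()
    for seq, asn_index in sites:
        i = asn_index + offset
        if 0 <= i < len(seq):  # skip sites too close to a terminus
            counts[seq[i]] += 1
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()} if total else {}


def enrichment(subset_freq, background_freq):
    """Ratio of subset frequency to background frequency per amino acid."""
    return {
        aa: f / background_freq[aa]
        for aa, f in subset_freq.items()
        if background_freq.get(aa, 0.0) > 0.0
    }


# Toy data for illustration only (not sequences from the PDB or Swiss-Prot)
all_sites = [("AAANAAAEA", 3), ("AAANAAAAA", 3), ("AAANAAAEA", 3), ("AAANAAATA", 3)]
helical_sites = [all_sites[0], all_sites[2]]

background = position_frequencies(all_sites, +4)
subset = position_frequencies(helical_sites, +4)
ratios = enrichment(subset, background)
```

A ratio above 1 for a residue indicates that it is more frequent at that position in the subset than in the background set.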
The frequencies of observing the 20 natural amino acids at the +4 position in the Swiss-Prot database and in the PDB for all glycosylation sites are similar and consistent with the natural abundances of the amino acids,88 suggesting that our data set is free from

significant bias. Comparing the amino acid preferences for all glycosylation sites and those that adopt a C-terminal α-helix, one sees that the frequency at the +4 position is enhanced for several amino acids, including Ala, Thr, Gln, and Glu (Table 2). Subsequently, we analyzed all of the possible protein− GlcNAc interactions in these C-terminal α-helical glycosylation sites to determine whether protein−sugar interactions could explain the observed enhancements for Ala, Thr, Gln, and Glu at the +4 position. The GlcNAc interacts with near-sequence residues in 48% of C-terminal α-helical glycosylation sites. The GlcNAc interacts with only near-sequence residue(s) in 40% of these sites, whereas the GlcNAc also interacts with far-sequence residue(s) in the remaining sites (Figure S6). Near-sequence protein−sugar interactions occur most frequently with residues at either the −4 or +4 position (Figure 6A). All 15.6 structures

Figure 6. (A, B) Numbers of C-terminal α-helical glycosylation sites with negative or positive interaction energies according to (A) the sequence position of the interacting residue and (B) the amino acid at the +4 position. (C) χ1 distribution for C-terminal α-helical glycosylation sites with +4 Glu interactions.

(consisting of 26 nonredundant sites) with −4 Xaa−GlcNAc interactions are also α-helical toward the N-terminus relative to the glycosylation site and classified as both 9-residue and N-terminal α-helices as well. The most frequent interaction C-terminal to the glycosylation site occurs with a +4 position residue (Figure 6A). Again because of the α-helical pattern of the backbone, +4 position residues are properly positioned to interact with the GlcNAc. Of the interactions between the GlcNAc and +4 position residues, interactions with +4 Glu are the most common, followed closely by +4 Thr and Gln (Figure 6B), which all had enhanced sequence frequencies in C-terminal α-helical glycosylation sites (Table 2). Although the occurrence of +4


Ala was also enhanced in C-terminal α-helices, +4 Ala−GlcNAc interactions do not occur (Figure 6B); instead, the intrinsic α-helical propensity of Ala likely explains the enhanced sequence frequency observed in C-terminal α-helices.99,100 +4 Glu−GlcNAc interactions are more favorable than either +4 Thr−GlcNAc or +4 Gln−GlcNAc interactions: on average, +4 Glu−GlcNAc interactions have an interaction energy of −3.2 kcal/mol, whereas +4 Thr−GlcNAc and +4 Gln−GlcNAc interactions have average energies of only −0.6 and −1.6 kcal/mol, respectively. Furthermore, in 64% of glycosylation sites with +4 Glu−GlcNAc interactions that have negative interaction energies, no additional favorable protein−sugar interactions are present, suggesting that a +4 Glu−GlcNAc interaction alone may help maintain α-helicity. In contrast, in all sites with favorable +4 Thr−GlcNAc or +4 Gln−GlcNAc interactions, the GlcNAc also favorably interacts with at least one additional residue. Thus, +4 Glu−GlcNAc interactions might have a greater potential to stabilize α-helical glycopeptides compared with other common protein−sugar interactions on C-terminal α-helices. Similar to −4 Glu−GlcNAc interactions in N-terminal α-helical sites (Figure 5A−C), +4 Glu−GlcNAc interactions involve hydrogen bonding between a carboxylate oxygen of the Glu side chain and the amide hydrogen of the N-acetyl group (Figure 7). In contrast to −4 Glu−GlcNAc interactions in N-

angiotensin I and bradykinin and also is a therapeutic target for cardiovascular and renal disease. The glycans on ACE are important for its stability and activity.101,102 The particular glycosylation site shown in Figure 7A is one of two glycans that most significantly contribute to the thermostability of the enzyme.101 Another example in which glycans are important for proper protein expression, cellular localization, and activity is the α isoform of the human folate receptor (αFR) (Figure 7B). Folate receptors mediate cellular uptake of folate, or vitamin B, and are overexpressed in cancer cells. All three consensus glycosylation sites, including the site illustrated in Figure 7B, play roles in expression efficiency and help maintain the enzyme’s active conformation.103,104 Similarly to the effects of glycosylation observed for αFR, altered glycosylation of neutral endopeptidase (NEP), also known as neprilysin, affects its transport to the cell surface, protein stability, and enzyme activity (Figure 7C). NEP is a zinc metallopeptidase that degrades small regulatory peptides. NEP also cleaves amyloid β, linking the enzyme to the pathogenesis of Alzheimer disease and hereditary inclusion-body myopathy.105,106 These examples suggest that positioning a Glu at the +4 position relative to an N-linked glycan may be a general strategy for promoting the stability of α-helical glycoproteins. −4 Glu−GlcNAc Interactions Are Favored Compared with +4 Glu−GlcNAc Interactions in Glycoproteins in the PDB. The −4 Glu−GlcNAc and +4 Glu−GlcNAc interactions have similar average strengths in natural glycoproteins in the PDB when evaluated using an implicit solvent model (Figure S9). Both interactions involve the same GlcNAc moiety, the N-acetyl group (Figures 5A−C and 7). Furthermore, because each interaction utilizes a different Asn orientation (χ1 = 300° vs 180°; Figures 4C and 6C), −4 Glu− GlcNAc and +4 Glu−GlcNAc interactions likely cannot occur simultaneously. 
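Both interactions are described as a hydrogen bond between a Glu carboxylate oxygen and the amide hydrogen of the GlcNAc N-acetyl group. One simple way to flag such a contact in a structure is a geometric test on the donor, hydrogen, and acceptor coordinates. The sketch below is an illustration under assumed cutoffs (H···O distance of at most 2.5 Å and N−H···O angle of at least 120°); these generic thresholds and the function names are assumptions and are not the criteria used in this work, which evaluated interaction energies.

```python
import math


def hbond_geometry(donor, hydrogen, acceptor):
    """Return (H...acceptor distance, donor-H...acceptor angle in degrees)."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    h_to_d = sub(donor, hydrogen)
    h_to_a = sub(acceptor, hydrogen)
    distance = norm(h_to_a)
    cos_angle = sum(x * y for x, y in zip(h_to_d, h_to_a)) / (norm(h_to_d) * distance)
    # Clamp to guard against rounding slightly outside [-1, 1]
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return distance, angle


def is_hbond(donor, hydrogen, acceptor, max_dist=2.5, min_angle=120.0):
    """Apply the assumed distance and angle cutoffs (both are assumptions)."""
    distance, angle = hbond_geometry(donor, hydrogen, acceptor)
    return distance <= max_dist and angle >= min_angle


# Idealized coordinates in angstroms, for illustration only
linear = is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0))
bent = is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0))
```

In the idealized example, the linear arrangement satisfies both cutoffs, while the 90° arrangement fails the angle criterion despite the same H···O distance.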
Since many fewer C-terminal α-helical glycosylation sites were found, +4 Glu−GlcNAc interactions may not help maintain α-helicity as effectively as −4 Glu− GlcNAc interactions. Glycosylation sites with both −4 Glu and +4 Glu in their sequence allow the two interactions to be directly compared, and we found four such sites (Figure S10). −4 Glu−GlcNAc interactions are present in two of these sites (Figure S10A,B), whereas +4 Glu−GlcNAc interactions are not present on any of these sites. Both sites with −4 Glu−GlcNAc interactions also have helical secondary structures. Therefore, these results suggest that −4 Glu−GlcNAc interactions are preferred when +4 Glu−GlcNAc interactions are also possible. Limitations of Our Structural Bioinformatics Analysis. The generality of our conclusions from structural bioinformatics depends on the size of the data set and the presence of any biases. The number of glycoprotein structures available at the time of accession (3402 PDB glycoprotein structures) pales in comparison to the 105 000+ deposited PDB structures for all proteins and does not match the natural frequency of glycosylation. Difficulties in crystallizing glycoproteins and fully characterizing the complex structure of glycans, which naturally consist of many monosaccharides connected through various linkage types, may explain the limited number of glycoproteins in the PDB.13 The flexibility of glycans makes their structural characterization using crystallography, the method used to solve the majority of glycoprotein PDB structures, quite challenging.107 As a result, glycoprotein PDB structures often lack complete structural information about the full glycan, and we were unable to decipher the role of other monosaccharides linked to the core GlcNAc in stabilizing or

Figure 7. Examples of C-terminal α-helical glycosylation sites with favorable +4 Glu−GlcNAc interactions on (A) angiotensin-I-converting enzyme (PDB entry 2C6N), (B) the α isoform of the human folate receptor (PDB entry 4LRH), and (C) neprilysin (PDB entry 1DMT). α-Helical residues from position 0 to +4 are colored blue.

terminal α-helices, +4 Glu−GlcNAc interactions in C-terminal α-helices utilize mode #2 with χ1 = 180° (Figure 6C; see Figure S7B for the distributions of all dihedrals). With mode #2, the sugar is oriented toward the C-terminus of the α-helix and +4 Glu. +4 Glu−GlcNAc interactions may help maintain α-helical structure C-terminal to the glycosylation site through conformational mode #2. We note that although ω6 was identified as an important differentiating factor among different glycopeptide structures (Figure 2D), all conformers of ω6 are found in glycosylated C-terminal α-helices with +4 Glu−GlcNAc interactions (Figure S7B). On the other hand, most of these glycosylated C-terminal α-helical sites with +4 Glu−GlcNAc interactions show a preference for χ2 = 60°−120°. In 9-residue, N-terminal, and C-terminal α-helices, two χ2 conformers, either χ2 ≈ 100° or χ2 ≈ 210°, are preferred for χ1 = 180° (Figure S8). Thus, +4 Glu−GlcNAc interactions are also dependent on the χ2 conformer. Three example structures with favorable +4 Glu−GlcNAc interactions in C-terminal α-helical glycosylation sites are shown in Figure 7. A glycosylation site on angiotensin-I-converting enzyme (ACE) is shown in Figure 7A. ACE regulates blood pressure by cleaving the tensor peptides


Figure 8. Results from BE-META simulations of the three model α-helical glycopeptides. (A) χ1 distributions for glycopeptides EN (red), NE (blue), and ENE (black). (B) Structures showing the most prominent Glu−GlcNAc interactions for EN, NE, and ENE. α-Helical residues from position −4 to 0 are colored red in EN and ENE, and α-helical residues from position 0 to +4 are colored blue in NE. (C) Thermodynamic decomposition for the top two clusters of glycopeptides EN, NE, and ENE. Corresponding χ1 values and populations of each cluster are shown.

In the simulation of α-helical EN, Asn has a distinct preference for χ1 around 300° (Figure 8A), with a population of 76.8% (the distributions of all dihedral angles (χ1, χ2, ψN, ϕN, ω2, and ω6) for EN can be found in Figure S11). This χ1 conformation matches the conformation adopted by natural α-helical glycosylation sites with −4 Glu−GlcNAc interactions observed in the PDB (Figure 4C). Additionally, interactions between the amide hydrogen of the GlcNAc N-acetyl group and the carboxylate oxygen of the Glu side chain are present on 79.3% of the structures with interactions in the most populated cluster (−4 Glu−GlcNAc interactions occur on 85.2% of the structures within cluster 1) (Figure 8B). The conformer of EN with χ1 = 300° and −4 Glu−GlcNAc interactions predominates primarily because of favorable intraglycopeptide and intrawater enthalpy, in particular favorable electrostatic interactions (Figure 8C and Table S4). In the simulation of α-helical NE, the most populated cluster (51.6%) has a conformer of χ1 around 300° (the distributions of all dihedral angles (χ1, χ2, ψN, ϕN, ω2, and ω6) for NE can be found in Figure S11); however, this population was significantly decreased from the EN simulation (Figure 8A). The most populated cluster is favored primarily because of glycopeptide−water enthalpy (Figure 8C and Table S5). Interactions between the sugar and peptide are present in only 0.02% of the structures in the most populated cluster. Instead, the sugar is free to interact with the solvent. This result is consistent with our simulations of the model α-helical glycopeptide N, which lacked protein−sugar interactions and had a preference for χ1 = 300° (Figure 2B). The second most populated cluster of NE (48.3%) has χ1 = 180° (Figure 8), consistent with mode #2 utilized in +4 Glu−GlcNAc interactions in α-helical glycosylation sites in the PDB. The +4 Glu−GlcNAc interactions occur in 50.9% of structures within cluster 2. 
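To put the two NE cluster populations quoted above (51.6% and 48.3%) on an energy scale, one can use the standard Boltzmann relation ΔG = −RT ln(p2/p1). The following sketch is illustrative only; the temperature of 300 K is an assumption made here, since the simulation temperature is not restated in this section, and the function name is hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def free_energy_difference(p_ref, p_other, temperature=300.0):
    """Free energy of state 'other' relative to 'ref' in kcal/mol.

    Uses dG = -RT ln(p_other / p_ref); temperature in kelvin (assumed 300 K).
    """
    if p_ref <= 0.0 or p_other <= 0.0:
        raise ValueError("populations must be positive")
    return -R_KCAL * temperature * math.log(p_other / p_ref)


# NE clusters: chi1 ~ 300 deg (51.6%) vs chi1 = 180 deg (48.3%)
dg_ne = free_energy_difference(0.516, 0.483)
```

With these inputs the difference comes out well below 0.1 kcal/mol, which is one way of saying that the two NE clusters are nearly isoenergetic at the assumed temperature.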
However, +4 Glu interacts with the C3 hydroxyl group of the GlcNAc in 63.2% of these structures (Figure 8B) and interacts with the N-acetyl group of the GlcNAc in only 21.3% of these structures. Given the limited number of C-terminal α-helices evaluated using structural bioinformatics, it is not surprising that a different,

destabilizing α-helical structure from structural bioinformatics analysis. To check whether the observed frequency of Glu−GlcNAc interactions is over-represented in the data set because of ease of crystallization, we compared the frequencies of observing each amino acid at both the −4 and +4 positions on all glycosylation sites, regardless of their secondary structure, in the PDB to the respective sequence frequencies on all glycosylation sites in the Swiss-Prot database, which does not require structural information and is not biased by crystallization requirements (Table 2). The sequence frequency of each amino acid in the PDB reasonably agrees with the sequence frequency in Swiss-Prot and with the expected occurrence of each amino acid based on its natural abundance on all proteins.88 The similar sequence frequencies observed in the PDB and Swiss-Prot indicate that crystallization requirements likely do not bias our conclusions regarding Glu−GlcNAc interactions from structural bioinformatics analysis. MD Simulations of ±4 Glu−GlcNAc Interactions in Model α-Helical Glycopeptides Support Our Conclusions from Structural Bioinformatics. To further test our conclusions from structural bioinformatics analysis, we performed BE-META simulations of model alanine-based α-helical glycopeptides with sequences including −4 Glu (EN) or +4 Glu (NE) to more rigorously evaluate the energetics of each interaction mode using explicit solvent. Since we aimed to evaluate the potential of Glu−GlcNAc interactions for α-helical glycopeptide design generally, we used simple model glycopeptide sequences and did not optimize the amino acid sequence further. 
Therefore, the glycopeptide sequences lacked +2 Ser or Thr, part of the consensus sequence motif required for efficient enzymatic N-glycosylation.108 We did not limit the sequence space for α-helical glycopeptide design by including +2 Ser or Thr because neither residue commonly interacted with the GlcNAc in the PDB, glycopeptides can be chemically synthesized without +2 Ser or Thr, and similar simulation results were obtained when a +2 Thr was incorporated in the sequences of the model glycopeptides.
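For readers who want to enumerate the model sequences, the sketch below builds EN, NE, and ENE variants by substituting Glu at −4 and/or +4 relative to the Asn of the all-Ala glycopeptide AAAAA-AAANA-AAAAA-AA, and adds +2 Thr variants. That the EN, NE, and ENE sequences are simple point substitutions of this parent sequence is an assumption made for illustration, as are the helper names; the N-X-S/T pattern with X not Pro is the commonly cited form of the N-glycosylation sequon.

```python
import re

# Parent all-Ala model glycopeptide, hyphens removed
BASE = "AAAAA-AAANA-AAAAA-AA".replace("-", "")
ASN = BASE.index("N")  # 0-based index of the glycosylated Asn


def substitute(sequence, asn_index, substitutions):
    """Apply {offset: residue} substitutions relative to the Asn."""
    residues = list(sequence)
    for offset, aa in substitutions.items():
        i = asn_index + offset
        if not 0 <= i < len(residues):
            raise IndexError(f"offset {offset} falls outside the sequence")
        residues[i] = aa
    return "".join(residues)


def has_sequon(sequence):
    """True if the sequence contains an N-X-S/T motif with X not Pro."""
    return re.search(r"N[^P][ST]", sequence) is not None


# Assumed construction of the model glycopeptides (illustrative only)
models = {
    "N": BASE,
    "EN": substitute(BASE, ASN, {-4: "E"}),
    "NE": substitute(BASE, ASN, {+4: "E"}),
    "ENE": substitute(BASE, ASN, {-4: "E", +4: "E"}),
}
with_thr = {name: substitute(seq, ASN, {+2: "T"}) for name, seq in models.items()}
```

Only the +2 Thr variants contain the sequon, which matches the statement that the all-Ala sequences lack the consensus motif.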


ENE is approximately as α-helical as N (Figures S12B,C and S13B,C). Thus, simulations of ENE confirm our hypothesis that −4 Glu−GlcNAc interactions are more productive to pursue for α-helical glycopeptide design than +4 Glu−GlcNAc interactions and suggest that the presence of potentially competing interactions may hinder the utility of −4 Glu−GlcNAc interactions.
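Comparisons of α-helicity between glycopeptides such as N, EN, NE, and ENE are typically made from per-frame secondary-structure assignments. The sketch below assumes each frame is summarized as a string of one-letter, DSSP-style codes with "H" marking α-helix; this input format and the function names are assumptions, and the toy frames are not simulation data.

```python
def helix_fraction(ss_string, helix_codes=("H",)):
    """Fraction of residues assigned as helix in one frame."""
    if not ss_string:
        raise ValueError("empty secondary-structure string")
    return sum(1 for code in ss_string if code in helix_codes) / len(ss_string)


def per_residue_helicity(ss_frames, helix_codes=("H",)):
    """Fraction of frames in which each residue is helical."""
    if not ss_frames:
        raise ValueError("no frames given")
    n_res = len(ss_frames[0])
    if any(len(frame) != n_res for frame in ss_frames):
        raise ValueError("all frames must have the same length")
    return [
        sum(1 for frame in ss_frames if frame[i] in helix_codes) / len(ss_frames)
        for i in range(n_res)
    ]


# Toy assignments for a six-residue peptide (illustration only)
frames = ["CHHHHC", "CCHHHC"]
per_frame = [helix_fraction(f) for f in frames]
per_residue = per_residue_helicity(frames)
```

Averaging `per_frame` over a trajectory gives an overall helicity, while `per_residue` shows where along the sequence the helix frays.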

more favorable type of interaction between +4 Glu and the GlcNAc was identified from the simulations. We note that, similar to the most populated cluster of EN with −4 Glu−GlcNAc interactions, the second most populated cluster of NE has more favorable intraglycopeptide and intrawater enthalpy, primarily due to electrostatic interactions, in comparison with the first cluster (Table S5); the natures of the −4 and +4 Glu−GlcNAc interactions are energetically similar, as postulated from structural bioinformatics. In summary, we found that the BE-META simulations of EN support our conclusions from structural bioinformatics about interaction mode #1 and that the BE-META simulations of NE suggest that interaction mode #2 as observed from structural bioinformatics may not capture all favorable types of interactions between +4 Glu residues and the GlcNAc. Furthermore, we performed unconstrained MD simulations starting from a perfect α-helical structure to assess which model glycopeptide among N, EN, and NE is most stable as an α-helix. Preliminary data suggest that the glycopeptide EN maintained α-helicity most effectively (Figure S12A). In contrast, NE quickly lost much of its initial α-helical structure. At long simulation times, EN was slightly more α-helical than N, and NE was slightly less α-helical than N. Thus, +4 Glu−GlcNAc interactions may not help maintain α-helicity as well as −4 Glu−GlcNAc interactions. Nevertheless, both EN and NE begin to converge to average α-helicities similar to that of the glycopeptide without Glu (Figure S12B,C). Because the simulated glycopeptide sequence is composed of primarily Ala, which has a high preference to form α-helices, differences in α-helicity due to −4 or +4 Glu−GlcNAc interactions could not be confidently identified. We also performed unconstrained MD simulations of N, EN, and NE with +2 Thr, incorporating the consensus sequence motif for N-glycosylation. 
Similar results were obtained as for the all-Ala sequence, but all of the model glycopeptides had lower overall α-helicity (Figure S13). −4 Glu−GlcNAc Interactions Are Favored over +4 Glu−GlcNAc Interactions in MD Simulations of a Glycopeptide with Both −4 and +4 Glu. Results from both structural bioinformatics and simulations of EN and NE suggest that −4 Glu−GlcNAc interactions occur more consistently and are more favorable than +4 Glu−GlcNAc interactions in α-helices. To fully test this hypothesis, a model α-helical glycopeptide with both −4 and +4 Glu (ENE) was simulated since very few glycoprotein sequences with both −4 and +4 Glu were cataloged in the PDB. From BE-META simulations, ENE was found to have a preference for χ1 around 300° (Figure 8A), where the most populated cluster (71.1%) adopted the same conformation as the most populated cluster of EN, which is consistent with interaction mode #1 observed in glycosylation sites with −4 Glu−GlcNAc interactions in the PDB (the distributions of all dihedral angles (χ1, χ2, ψN, ϕN, ω2, and ω6) for ENE can be found in Figure S11). Interactions between the N-acetyl group of the GlcNAc and the −4 Glu side chain occur in 74.3% of the structures in the most populated cluster (−4 Glu−GlcNAc interactions occur in 79.2% of structures within cluster 1) (Figure 8B), whereas interactions between the GlcNAc and +4 Glu occur in only 34.4% of the structures in all of the analyzed clusters. Cluster 1 has favorable glycopeptide configurational entropy and favorable intrapeptide electrostatics compared with the other clusters (Figure 8C and Table S6). On the basis of preliminary unconstrained MD simulations, ENE quickly loses α-helicity, much like NE (Figures S12A and S13A). However, at long simulation times,



CONCLUSIONS

Glycans have been shown to improve protein stability and folding efficiency,16,21,27,29 characteristics desirable in protein engineering and therapeutic design. However, an incomplete understanding of the biophysical effects of glycosylation on protein structure has limited the rational design of novel glycoproteins. We thus sought to investigate principles of how protein−sugar interactions impact α-helical protein structure to expand the toolbox for α-helical glycopeptide engineering. We identified two conformations (mode #1 with χ1 = 300° and mode #2 with χ1 = 180°) of naturally glycosylated α-helices in the PDB. These two conformational modes arise from a combination of steric constraints and favorable intraglycopeptide enthalpy. From further structural bioinformatics analysis, we discovered two specific protein−sugar interactions characteristic of α-helical glycosylation sites: −4 Glu−GlcNAc interactions utilize mode #1 to mediate α-helical structure N-terminal to the glycosylation site, and +4 Glu−GlcNAc interactions utilize mode #2 to mediate α-helical structure C-terminal to the glycosylation site. Both interactions were observed in metadynamics simulations of model α-helical glycopeptides with either −4 or +4 Glu and result from favorable electrostatic intraglycopeptide interactions. Since −4 Glu−GlcNAc interactions were found to be both more prevalent than +4 Glu−GlcNAc interactions in natural glycopeptides in the PDB and more favorable in simulated α-helical glycopeptides, we suggest that positioning Glu at the −4 position, as opposed to the +4 position, may be a more productive approach for α-helical glycopeptide design. Further studies are needed to fully understand how the sequence of other neighboring amino acids may affect the utility of these interactions. In summary, we propose that incorporating Glu at either the −4 or +4 position relative to an N-linked glycan may be a useful method for promoting the stability of α-helical glycoproteins.



ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.7b00123. List of all PDB IDs that contain α-helical glycosylation sites; analysis of GlcNAc ring conformations, dihedral angle distributions, numbers of interacting residues, and distributions of χ2 for three different χ1 conformers of 9-residue, N-terminal, and C-terminal α-helical glycosylation sites in the PDB; dihedral distributions for N-terminal α-helical glycosylation sites with −4 Glu−GlcNAc interactions and for C-terminal α-helical glycosylation sites with +4 Glu−GlcNAc interactions; comparison of −4 and +4 Glu−GlcNAc interaction energies in N- and C-terminal α-helical glycosylation sites; structures of glycosylation sites in the PDB with both −4 and +4 Glu; sequence alignment of the PLBD1 family in Swiss-Prot; complete results from steric analysis




(10) Apweiler, R.; Hermjakob, H.; Sharon, N. On the Frequency of Protein Glycosylation, as Deduced from Analysis of the Swiss-Prot Database. Biochim. Biophys. Acta, Gen. Subj. 1999, 1473, 4−8.
(11) Spiro, R. G. Protein Glycosylation: Nature, Distribution, Enzymatic Formation, and Disease Implications of Glycopeptide Bonds. Glycobiology 2002, 12, 43R−56R.
(12) Varki, A.; Cummings, R. D.; Esko, J. D.; Freeze, H. H.; Stanley, P.; Bertozzi, C. R.; Hart, G. W.; Etzler, M. E. Essentials of Glycobiology; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, NY, 2009.
(13) Defaus, S.; Gupta, P.; Andreu, D.; Gutierrez-Gallego, R. Mammalian Protein Glycosylation - Structure Versus Function. Analyst 2014, 139, 2944−2967.
(14) Freeze, H. H.; Eklund, E. A.; Ng, B. G.; Patterson, M. C. Neurological Aspects of Human Glycosylation Disorders. Annu. Rev. Neurosci. 2015, 38, 105−125.
(15) Hart, G. W.; Copeland, R. J. Glycomics Hits the Big Time. Cell 2010, 143, 672−676.
(16) Hebert, D. N.; Lamriben, L.; Powers, E. T.; Kelly, J. W. The Intrinsic and Extrinsic Effects of N-Linked Glycans on Glycoproteostasis. Nat. Chem. Biol. 2014, 10, 902−910.
(17) Parodi, A. J. Role of N-Oligosaccharide Endoplasmic Reticulum Processing Reactions in Glycoprotein Folding and Degradation. Biochem. J. 2000, 348, 1−13.
(18) Imperiali, B.; Rickert, K. W. Conformational Implications of Asparagine-Linked Glycosylation. Proc. Natl. Acad. Sci. U. S. A. 1995, 92, 97−101.
(19) Rickert, K. W.; Imperiali, B. Analysis of the Conserved Glycosylation Site in the Nicotinic Acetylcholine Receptor: Potential Roles in Complex Assembly. Chem. Biol. 1995, 2, 751−759.
(20) Hanson, S. R.; Culyba, E. K.; Hsu, T. L.; Wong, C. H.; Kelly, J. W.; Powers, E. T. The Core Trisaccharide of an N-Linked Glycoprotein Intrinsically Accelerates Folding and Enhances Stability. Proc. Natl. Acad. Sci. U. S. A. 2009, 106, 3131−3136.
(21) Imperiali, B.; O’Connor, S. E. Effect of N-Linked Glycosylation on Glycopeptide and Glycoprotein Structure. Curr. Opin. Chem. Biol. 1999, 3, 643−649.
(22) Imperiali, B.; Shannon, K. L. Differences between Asn-Xaa-Thr-Containing Peptides: A Comparison of Solution Conformation and Substrate Behavior with Oligosaccharyltransferase. Biochemistry 1991, 30, 4374−4380.
(23) O’Connor, S. E.; Imperiali, B. Modulation of Protein Structure and Function by Asparagine-Linked Glycosylation. Chem. Biol. 1996, 3, 803−812.
(24) Ellis, C. R.; Noid, W. G. Deciphering the Glycosylation Code. J. Phys. Chem. B 2014, 118, 11462−11469.
(25) Bosques, C. J.; Tschampel, S. M.; Woods, R. J.; Imperiali, B. Effects of Glycosylation on Peptide Conformation: A Synergistic Experimental and Computational Study. J. Am. Chem. Soc. 2004, 126, 8421−8425.
(26) Lee, H. S.; Qi, Y.; Im, W. Effects of N-Glycosylation on Protein Conformation and Dynamics: Protein Data Bank Analysis and Molecular Dynamics Simulation Study. Sci. Rep. 2015, 5, 8926.
(27) Wormald, M. R.; Dwek, R. A. Glycoproteins: Glycan Presentation and Protein-Fold Stability. Structure 1999, 7, R155−160.
(28) Wang, X. Y.; Ji, C. G.; Zhang, J. Z. Exploring the Molecular Mechanism of Stabilization of the Adhesion Domains of Human Cd2 by N-Glycosylation. J. Phys. Chem. B 2012, 116, 11570−11577.
(29) Price, J. L.; Culyba, E. K.; Chen, W.; Murray, A. N.; Hanson, S. R.; Wong, C.-H.; Powers, E. T.; Kelly, J. W. N-Glycosylation of Enhanced Aromatic Sequons to Increase Glycoprotein Stability. Biopolymers 2012, 98, 195−211.
(30) Wyss, D. F.; Choi, J. S.; Li, J.; Knoppers, M. H.; Willis, K. J.; Arulanandam, A. R.; Smolyar, A.; Reinherz, E. L.; Wagner, G. Conformation and Function of the N-Linked Glycan in the Adhesion Domain of Human Cd2. Science 1995, 269, 1273−1278.
(31) Gavrilov, Y.; Shental-Bechor, D.; Greenblatt, H. M.; Levy, Y. Glycosylation May Reduce Protein Thermodynamic Stability by Inducing a Conformational Distortion. J. Phys. Chem. Lett. 2015, 6, 3572−3577.

of model glycopeptides; free energy profiles, dihedral distributions, and thermodynamics decomposition of model glycopeptide N in α-helical, β-hairpin, and random coil conformations from BE-META simulations; dihedral distributions and thermodynamics decomposition of model glycopeptides EN, NE, and ENE in the αhelical conformation from BE-META simulations; average number of α-helical residues and percent αhelicity per residue for unconstrained MD simulations of N, EN, NE, and ENE and N, EN, NE, and ENE with +2 Thr (PDF)

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. ORCID

Yu-Shan Lin: 0000-0001-6460-2877 Present Address

† J.R.R.: Department of Chemistry, University of California, Berkeley, CA 94720.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS We thank a Tufts startup fund and the Knez Family Faculty Investment Fund for support of Y.-S.L. J.R.R. acknowledges the support of the Beckman Scholars Program. We thank Professors Joshua Kritzer, Joshua Price, and Matthew Shoulders for valuable discussions and helpful comments.



REFERENCES

(1) Leader, B.; Baca, Q. J.; Golan, D. E. Protein Therapeutics: A Summary and Pharmacological Classification. Nat. Rev. Drug Discovery 2008, 7, 21−39. (2) Frokjaer, S.; Otzen, D. E. Protein Drug Stability: A Formulation Challenge. Nat. Rev. Drug Discovery 2005, 4, 298−306. (3) Ueda, T.; Tomita, K.; Notsu, Y.; Ito, T.; Fumoto, M.; Takakura, T.; Nagatome, H.; Takimoto, A.; Mihara, S.-I.; Togame, H.; Kawamoto, K.; Iwasaki, T.; Asakura, K.; Oshima, T.; Hanasaki, K.; Nishimura, S.-I.; Kondo, H. Chemoenzymatic Synthesis of Glycosylated Glucagon-Like Peptide 1: Effect of Glycosylation on Proteolytic Resistance and in Vivo Blood Glucose-Lowering Activity. J. Am. Chem. Soc. 2009, 131, 6237−6245. (4) Powell, M. F.; Stewart, T.; Otvos, L., Jr; Urge, L.; Gaeta, F. C. A.; Sette, A.; Arrhenius, T.; Thomson, D.; Soda, K.; Colon, S. M. Peptide Stability in Drug Development. Ii. Effect of Single Amino Acid Substitution and Glycosylation on Peptide Reactivity in Human Serum. Pharm. Res. 1993, 10, 1268−1273. (5) Tomabechi, Y.; Krippner, G.; Rendle, P. M.; Squire, M. A.; Fairbanks, A. J. Glycosylation of Pramlintide: Synthetic Glycopeptides That Display in Vitro and in Vivo Activities as Amylin Receptor Agonists. Chem. - Eur. J. 2013, 19, 15084−15088. (6) Dalziel, M.; Crispin, M.; Scanlan, C. N.; Zitzmann, N.; Dwek, R. A. Emerging Principles for the Therapeutic Exploitation of Glycosylation. Science 2014, 343, 1235681. (7) Moradi, S. V.; Hussein, W. M.; Varamini, P.; Simerska, P.; Toth, I. Glycosylation, an Effective Synthetic Strategy to Improve the Bioavailability of Therapeutic Peptides. Chem. Sci. 2016, 7, 2492− 2500. (8) Sola, R. J.; Griebenow, K. Effects of Glycosylation on the Stability of Protein Pharmaceuticals. J. Pharm. Sci. 2009, 98, 1223−1245. (9) Walsh, G.; Jefferis, R. Post-Translational Modifications in the Context of Therapeutic Proteins. Nat. Biotechnol. 2006, 24, 1241− 1252. L


(32) Helenius, A.; Aebi, M. Intracellular Functions of N-Linked Glycans. Science 2001, 291, 2364−2369.
(33) Ellis, C. R.; Maiti, B.; Noid, W. G. Specific and Nonspecific Effects of Glycosylation. J. Am. Chem. Soc. 2012, 134, 8184−8193.
(34) Lu, D.; Yang, C.; Liu, Z. How Hydrophobicity and the Glycosylation Site of Glycans Affect Protein Folding and Stability: A Molecular Dynamics Simulation. J. Phys. Chem. B 2012, 116, 390−400.
(35) Shental-Bechor, D.; Levy, Y. Effect of Glycosylation on Protein Folding: A Close Look at Thermodynamic Stabilization. Proc. Natl. Acad. Sci. U. S. A. 2008, 105, 8256−8261.
(36) Culyba, E. K.; Price, J. L.; Hanson, S. R.; Dhar, A.; Wong, C. H.; Gruebele, M.; Powers, E. T.; Kelly, J. W. Protein Native-State Stabilization by Placing Aromatic Side Chains in N-Glycosylated Reverse Turns. Science 2011, 331, 571−575.
(37) Asensio, J. L.; Ardá, A.; Cañada, F. J.; Jiménez-Barbero, J. Carbohydrate−Aromatic Interactions. Acc. Chem. Res. 2013, 46, 946−954.
(38) Chen, W.; Enck, S.; Price, J. L.; Powers, D. L.; Powers, E. T.; Wong, C.-H.; Dyson, H. J.; Kelly, J. W. Structural and Energetic Basis of Carbohydrate-Aromatic Packing Interactions in Proteins. J. Am. Chem. Soc. 2013, 135, 9877−9884.
(39) Price, J. L.; Powers, D. L.; Powers, E. T.; Kelly, J. W. Glycosylation of the Enhanced Aromatic Sequon Is Similarly Stabilizing in Three Distinct Reverse Turn Contexts. Proc. Natl. Acad. Sci. U. S. A. 2011, 108, 14127−14132.
(40) Hsu, C.-H.; Park, S.; Mortenson, D. E.; Foley, B. L.; Wang, X.; Woods, R. J.; Case, D. A.; Powers, E. T.; Wong, C.-H.; Dyson, H. J.; Kelly, J. W. The Dependence of Carbohydrate−Aromatic Interaction Strengths on the Structure of the Carbohydrate. J. Am. Chem. Soc. 2016, 138, 7636−7648.
(41) Petrescu, A. J.; Milac, A. L.; Petrescu, S. M.; Dwek, R. A.; Wormald, M. R. Statistical Analysis of the Protein Environment of N-Glycosylation Sites: Implications for Occupancy, Structure, and Folding. Glycobiology 2004, 14, 103−114.
(42) Surleac, M. D.; Spiridon, L. N.; Tacutu, R.; Milac, A. L.; Petrescu, S. M.; Petrescu, A.-J. The Structural Assessment of Glycosylation Sites Database - SAGS - An Overall View on N-Glycosylation. In Glycosylation; Petrescu, S., Ed.; InTech: Rijeka, Croatia, 2012; Chapter 1, pp 3−20.
(43) Zielinska, D. F.; Gnad, F.; Schropp, K.; Wisniewski, J. R.; Mann, M. Mapping N-Glycosylation Sites across Seven Evolutionarily Distant Species Reveals a Divergent Substrate Proteome Despite a Common Core Machinery. Mol. Cell 2012, 46, 542−548.
(44) Zielinska, D. F.; Gnad, F.; Wisniewski, J. R.; Mann, M. Precision Mapping of an in Vivo N-Glycoproteome Reveals Rigid Topological and Sequence Constraints. Cell 2010, 141, 897−907.
(45) Chen, M. M.; Bartlett, A. I.; Nerenberg, P. S.; Friel, C. T.; Hackenberger, C. P.; Stultz, C. M.; Radford, S. E.; Imperiali, B. Perturbing the Folding Energy Landscape of the Bacterial Immunity Protein Im7 by Site-Specific N-Linked Glycosylation. Proc. Natl. Acad. Sci. U. S. A. 2010, 107, 22528−22533.
(46) Magrane, M.; Consortium, U. Uniprot Knowledgebase: A Hub of Integrated Protein Data. Database 2011, 2011, bar009−bar009.
(47) UniProtKB. Swiss-Prot. http://www.uniprot.org (accessed Dec 2, 2014).
(48) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235−242.
(49) Protein Data Bank. http://www.rcsb.org (accessed Dec 15, 2015).
(50) Lütteke, T.; von der Lieth, C.-W. Pdb-Care (Pdb Carbohydrate Residue Check): A Program to Support Annotation of Complex Carbohydrate Structures in Pdb Files. BMC Bioinf. 2004, 5, 69−69.
(51) Pdb-Care. http://www.glycosciences.de/tools/pdb-care (accessed Dec 15, 2014).
(52) Kabsch, W.; Sander, C. Dictionary of Protein Secondary Structure - Pattern-Recognition of Hydrogen-Bonded and Geometrical Features. Biopolymers 1983, 22, 2577−2637.
(53) Frishman, D.; Argos, P. Knowledge-Based Protein Secondary Structure Assignment. Proteins: Struct., Funct., Genet. 1995, 23, 566−579.
(54) Mu, Y.; Nguyen, P. H.; Stock, G. Energy Landscape of a Small Peptide Revealed by Dihedral Angle Principal Component Analysis. Proteins: Struct., Funct., Genet. 2005, 58, 45−52.
(55) Sittel, F.; Jain, A.; Stock, G. Principal Component Analysis of Molecular Dynamics: On the Use of Cartesian Vs. Internal Coordinates. J. Chem. Phys. 2014, 141, 014111.
(56) Rodriguez, A.; Laio, A. Clustering by Fast Search and Find of Density Peaks. Science 2014, 344, 1492−1496.
(57) Agirre, J.; Iglesias-Fernandez, J.; Rovira, C.; Davies, G. J.; Wilson, K. S.; Cowtan, K. D. Privateer: Software for the Conformational Validation of Carbohydrate Structures. Nat. Struct. Mol. Biol. 2015, 22, 833−834.
(58) Hess, B.; Kutzner, C.; van der Spoel, D.; Lindahl, E. Gromacs 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation. J. Chem. Theory Comput. 2008, 4, 435−447.
(59) Piana, S.; Lindorff-Larsen, K.; Shaw, D. E. How Robust Are Protein Folding Simulations with Respect to Force Field Parameterization? Biophys. J. 2011, 100, L47−49.
(60) Best, R. B.; Zhu, X.; Shim, J.; Lopes, P. E. M.; Mittal, J.; Feig, M.; MacKerell, A. D. Optimization of the Additive Charmm All-Atom Protein Force Field Targeting Improved Sampling of the Backbone Φ, Ψ and Side-Chain χ1 and χ2 Dihedral Angles. J. Chem. Theory Comput. 2012, 8, 3257−3273.
(61) Onufriev, A.; Bashford, D.; Case, D. A. Exploring Protein Native States and Large-Scale Conformational Changes with a Modified Generalized Born Model. Proteins: Struct., Funct., Genet. 2004, 55, 383−394.
(62) Yanover, C.; Vanetik, N.; Levitt, M.; Kolodny, R.; Keasar, C. Redundancy-Weighting for Better Inference of Protein Structural Features. Bioinformatics 2014, 30, 2295−2301.
(63) Pettersen, E. F.; Goddard, T. D.; Huang, C. C.; Couch, G. S.; Greenblatt, D. M.; Meng, E. C.; Ferrin, T. E. UCSF Chimera - A Visualization System for Exploratory Research and Analysis. J. Comput. Chem. 2004, 25, 1605−1612.
(64) Kirschner, K. N.; Yongye, A. B.; Tschampel, S. M.; Gonzalez-Outeirino, J.; Daniels, C. R.; Foley, B. L.; Woods, R. J. Glycam06: A Generalizable Biomolecular Force Field. Carbohydrates. J. Comput. Chem. 2008, 29, 622−655.
(65) Dunbrack, R. L., Jr. Rotamer Libraries in the 21st Century. Curr. Opin. Struct. Biol. 2002, 12, 431−440.
(66) Lovell, S. C.; Word, J. M.; Richardson, J. S.; Richardson, D. C. Asparagine and Glutamine Rotamers: B-Factor Cutoff and Correction of Amide Flips Yield Distinct Clustering. Proc. Natl. Acad. Sci. U. S. A. 1999, 96, 400−405.
(67) Lovell, S. C.; Word, J. M.; Richardson, J. S.; Richardson, D. C. The Penultimate Rotamer Library. Proteins: Struct., Funct., Genet. 2000, 40, 389−408.
(68) Janin, J.; Wodak, S.; Levitt, M.; Maigret, B. Conformation of Amino Acid Side-Chains in Proteins. J. Mol. Biol. 1978, 125, 357−386.
(69) Imberty, A.; Perez, S. Stereochemistry of the N-Glycosylation Sites in Glycoproteins. Protein Eng., Des. Sel. 1995, 8, 699−709.
(70) Davis, J. T.; Hirani, S.; Bartlett, C.; Reid, B. R. 1H NMR Studies on an Asn-Linked Glycopeptide. GlcNAc-1 C2−N2 Bond Is Rigid in H2O. J. Biol. Chem. 1994, 269, 3331−3338.
(71) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983, 79, 926−935.
(72) Bussi, G.; Donadio, D.; Parrinello, M. Canonical Sampling through Velocity Rescaling. J. Chem. Phys. 2007, 126, 014101.
(73) Cheng, A.; Merz, K. M. Application of the Nosé−Hoover Chain Algorithm to the Study of Protein Dynamics. J. Phys. Chem. 1996, 100, 1927−1937.
(74) Lingenheil, M.; Denschlag, R.; Reichold, R.; Tavan, P. The ″Hot-Solvent/Cold-Solute″ Problem Revisited. J. Chem. Theory Comput. 2008, 4, 1293−1306.


(75) Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; DiNola, A.; Haak, J. R. Molecular Dynamics with Coupling to an External Bath. J. Chem. Phys. 1984, 81, 3684−3690.
(76) Hockney, R. W. The Potential Calculation and Some Applications. Methods Comput. Phys. 1970, 9, 136−211.
(77) Hess, B.; Bekker, H.; Berendsen, H. J.; Fraaije, J. G. Lincs: A Linear Constraint Solver for Molecular Simulations. J. Comput. Chem. 1997, 18, 1463−1472.
(78) Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen, L. G. A Smooth Particle Mesh Ewald Method. J. Chem. Phys. 1995, 103, 8577−8593.
(79) Allen, M. P.; Tildesley, D. J. Computer Simulation of Liquids; Clarendon Press: Oxford, U.K., 1989; p 385.
(80) Laio, A.; Parrinello, M. Escaping Free-Energy Minima. Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 12562−12566.
(81) Piana, S.; Laio, A. A Bias-Exchange Approach to Protein Folding. J. Phys. Chem. B 2007, 111, 4553−4559.
(82) Tribello, G. A.; Bonomi, M.; Branduardi, D.; Camilloni, C.; Bussi, G. Plumed 2: New Feathers for an Old Bird. Comput. Phys. Commun. 2014, 185, 604−613.
(83) Fenley, A. T.; Muddana, H. S.; Gilson, M. K. Entropy-Enthalpy Transduction Caused by Conformational Shifts Can Obscure the Forces Driving Protein-Ligand Binding. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 20006−20011.
(84) Fleck, M.; Polyansky, A. A.; Zagrovic, B. Parent: A Parallel Software Suite for the Calculation of Configurational Entropy in Biomolecular Systems. J. Chem. Theory Comput. 2016, 12, 2055−2065.
(85) King, B. M.; Silver, N. W.; Tidor, B. Efficient Calculation of Molecular Configurational Entropies Using an Information Theoretic Approximation. J. Phys. Chem. B 2012, 116, 2891−2904.
(86) King, B. M.; Tidor, B. Mist: Maximum Information Spanning Trees for Dimension Reduction of Biological Data Sets. Bioinformatics 2009, 25, 1165−1172.
(87) Sattelle, B. M.; Almond, A. Is N-Acetyl-D-Glucosamine a Rigid (4)C(1) Chair? Glycobiology 2011, 21, 1651−1662.
(88) Doolittle, R. F. Redundancies in Protein Sequences. In Prediction of Protein Structure and the Principles of Protein Conformation; Fasman, G. D., Ed.; Plenum Press: New York, 1989; pp 599−623.
(89) Dierks, T.; Dickmanns, A.; Preusser-Kunze, A.; Schmidt, B.; Mariappan, M.; von Figura, K.; Ficner, R.; Rudolph, M. G. Molecular Basis for Multiple Sulfatase Deficiency and Mechanism for Formylglycine Generation of the Human Formylglycine-Generating Enzyme. Cell 2005, 121, 541−552.
(90) Schlotawa, L.; Radhakrishnan, K.; Baumgartner, M.; Schmid, R.; Schmidt, B.; Dierks, T.; Gartner, J. Rapid Degradation of an Active Formylglycine Generating Enzyme Variant Leads to a Late Infantile Severe Form of Multiple Sulfatase Deficiency. Eur. J. Hum. Genet. 2013, 21, 1020−1023.
(91) Schlotawa, L.; Ennemann, E. C.; Radhakrishnan, K.; Schmidt, B.; Chakrapani, A.; Christen, H.-J.; Moser, H.; Steinmann, B.; Dierks, T.; Gärtner, J. Sumf1 Mutations Affecting Stability and Activity of Formylglycine Generating Enzyme Predict Clinical Outcome in Multiple Sulfatase Deficiency. Eur. J. Hum. Genet. 2011, 19, 253−261.
(92) Dierks, T.; Schmidt, B.; Borissenko, L. V.; Peng, J.; Preusser, A.; Mariappan, M.; von Figura, K. Multiple Sulfatase Deficiency Is Caused by Mutations in the Gene Encoding the Human Cα-Formylglycine Generating Enzyme. Cell 2003, 113, 435−444.
(93) Casey, A.; Walsh, G. Purification and Characterization of Extracellular Phytase from Aspergillus Niger Atcc 9142. Bioresour. Technol. 2003, 86, 183−188.
(94) Fonseca-Maldonado, R.; Maller, A.; Bonneil, E.; Thibault, P.; Botelho-Machado, C.; Ward, R. J.; Polizeli, M. d. L. T. d. M. Biochemical Properties of Glycosylation and Characterization of a Histidine Acid Phosphatase (Phytase) Expressed in Pichia Pastoris. Protein Expression Purif. 2014, 99, 43−49.
(95) Guo, M.; Hang, H.; Zhu, T.; Zhuang, Y.; Chu, J.; Zhang, S. Effect of Glycosylation on Biochemical Characterization of Recombinant Phytase Expressed in Pichia Pastoris. Enzyme Microb. Technol. 2008, 42, 340−345.

(96) Han, Y.; Lei, X. G. Role of Glycosylation in the Functional Expression of an Aspergillus Niger Phytase (Phya) in Pichia Pastoris. Arch. Biochem. Biophys. 1999, 364, 83−90.
(97) Repo, H.; Kuokkanen, E.; Oksanen, E.; Goldman, A.; Heikinheimo, P. Is the Bovine Lysosomal Phospholipase B-Like Protein an Amidase? Proteins: Struct., Funct., Genet. 2014, 82, 300−311.
(98) Sievers, F.; Wilm, A.; Dineen, D.; Gibson, T. J.; Karplus, K.; Li, W.; Lopez, R.; McWilliam, H.; Remmert, M.; Söding, J.; Thompson, J. D.; Higgins, D. G. Fast, Scalable Generation of High-Quality Protein Multiple Sequence Alignments Using Clustal Omega. Mol. Syst. Biol. 2011, 7, 539.
(99) Pace, C. N.; Scholtz, J. M. A Helix Propensity Scale Based on Experimental Studies of Peptides and Proteins. Biophys. J. 1998, 75, 422−427.
(100) Spek, E. J.; Olson, C. A.; Shi, Z.; Kallenbach, N. R. Alanine Is an Intrinsic α-Helix Stabilizing Amino Acid. J. Am. Chem. Soc. 1999, 121, 5571−5572.
(101) Anthony, C. S.; Corradi, H. R.; Schwager, S. L. U.; Redelinghuys, P.; Georgiadis, D.; Dive, V.; Acharya, K. R.; Sturrock, E. D. The N Domain of Human Angiotensin-I-Converting Enzyme: The Role of N-Glycosylation and the Crystal Structure in Complex with an N Domain Specific Phosphinic Inhibitor, Rxp407. J. Biol. Chem. 2010, 285, 35685−35693.
(102) Corradi, H. R.; Schwager, S. L. U.; Nchinda, A. T.; Sturrock, E. D.; Acharya, K. R. Crystal Structure of the N Domain of Human Somatic Angiotensin I-Converting Enzyme Provides a Structural Basis for Domain-Specific Inhibitor Design. J. Mol. Biol. 2006, 357, 964−974.
(103) Roberts, S. J.; Petropavlovskaja, M.; Chung, K. N.; Knight, C. B.; Elwood, P. C. Role of Individual N-Linked Glycosylation Sites in the Function and Intracellular Transport of the Human Alpha Folate Receptor. Arch. Biochem. Biophys. 1998, 351, 227−235.
(104) Shen, F.; Wang, H. Q.; Zheng, X.; Ratnam, M. Expression Levels of Functional Folate Receptors Alpha and Beta Are Related to the Number of N-Glycosylated Sites. Biochem. J. 1997, 327, 759−764.
(105) Broccolini, A.; Gidaro, T.; De Cristofaro, R.; Morosetti, R.; Gliubizzi, C.; Ricci, E.; Tonali, P. A.; Mirabella, M. Hyposialylation of Neprilysin Possibly Affects Its Expression and Enzymatic Activity in Hereditary Inclusion-Body Myopathy Muscle. J. Neurochem. 2008, 105, 971−981.
(106) Lafrance, M. H.; Vezina, C.; Wang, Q.; Boileau, G.; Crine, P.; Lemay, G. Role of Glycosylation in Transport and Enzymatic-Activity of Neutral Endopeptidase-24.11. Biochem. J. 1994, 302, 451−454.
(107) Kwong, P. D.; Wyatt, R.; Desjardins, E.; Robinson, J.; Culp, J. S.; Hellmig, B. D.; Sweet, R. W.; Sodroski, J.; Hendrickson, W. A. Probability Analysis of Variational Crystallization and Its Application to Gp120, the Exterior Envelope Glycoprotein of Type 1 Human Immunodeficiency Virus (Hiv-1). J. Biol. Chem. 1999, 274, 4115−4123.
(108) Lizak, C.; Gerber, S.; Numao, S.; Aebi, M.; Locher, K. P. X-Ray Structure of a Bacterial Oligosaccharyltransferase. Nature 2011, 474, 350−355.
