
Perspective

Using Chemical Synthesis to Study and Apply Protein Glycosylation Patrick K Chaffey, Xiaoyang Guan, Yaohao Li, and Zhongping Tan Biochemistry, Just Accepted Manuscript • DOI: 10.1021/acs.biochem.7b01055 • Publication Date (Web): 08 Jan 2018 Downloaded from http://pubs.acs.org on January 8, 2018


Biochemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Biochemistry

Using Chemical Synthesis to Study and Apply Protein Glycosylation

Patrick K. Chaffey, Xiaoyang Guan, Yaohao Li, Zhongping Tan*

Department of Chemistry and Biochemistry and BioFrontiers Institute, University of Colorado, Boulder, CO 80303, United States

ABSTRACT: Protein glycosylation is one of the most common post-translational modifications and has the capability to influence many properties of proteins. Abnormal protein glycosylation can lead to protein malfunction and serious disease. While appreciation of glycosylation’s importance is growing in the scientific community, especially in recent years, a lack of homogeneous glycoproteins with well-defined glycan structures has made it difficult to understand the correlation between the structure of glycoproteins and their properties at a quantitative level. This has been a significant limitation on rational applications of glycosylation and on optimizing glycoprotein properties. Through the extraordinary efforts of chemists, it is now feasible to use chemical synthesis to produce collections of homogeneous glycoforms with systematic variations in amino acid sequence, glycosidic linkage, anomeric configuration, and glycan structure. Such a technical advance has greatly facilitated the study and application of protein glycosylation. This Perspective highlights some representative work in this research area, with the goal of inspiring and encouraging more scientists to pursue the glycosciences.

1. INTRODUCTION

One of the earliest reports of carbohydrates, also known as glycans, covalently bound to proteins in living systems is a 1958 study of ovalbumin by Jevons.1 This post-translational modification, commonly called protein glycosylation, has since come to be regarded not as a possibly rare curiosity but as a ubiquitous and important feature of living systems across all branches of life.2 In recent years, protein glycosylation has attracted considerable attention for its ability to introduce an impressive amount of structural diversity (and thus functional diversity as well) into the proteins it modifies.3 Carbohydrates possess a relatively large number of stereogenic centers, and oligosaccharides can be assembled through numerous different attachment points into sometimes complicated branched architectures.4 Additionally, several different covalent linkages between glycans and proteins have been identified, including the oxygen atom of a serine, threonine, or tyrosine side chain (O-linked), the amide nitrogen of an asparagine side chain (N-linked), a carbon atom of a tryptophan side chain (C-linked), a glycosylphosphatidylinositol (GPI) anchor (glypiation), and even phosphate groups that are themselves covalently attached to proteins.5,6 This chemical variety is the primary reason protein glycosylation is seen as having such immense potential to modify the structure and function of proteins.7-11 Indeed, numerous effects of protein glycosylation have been documented.2,12 Glycosylation in the endoplasmic reticulum (ER) has long been known to act as a quality control signal for proper protein folding and to direct glycosylated proteins towards the correct Golgi compartments.13 More recent work has shown that protein glycosylation can also have intrinsic effects on a protein’s folding, independent of the quality-control chaperones.14,15 Glycans have been shown to act as specific ligands that mediate the attachment of certain cells to their target tissues or that initiate signal transduction pathways within target cells.16 Because they are often very hydrophilic and can be quite bulky, attached glycans can significantly alter the surface properties of proteins and thus affect protein-protein interactions.17 Basic physical properties, such as solubility and thermodynamic stability, can be affected as well.18

When beneficial, the effects of glycosylation can be exploited to improve a protein’s properties and performance for a specific application, for example by adding glycans to a therapeutic protein to enhance its pharmacodynamic/pharmacokinetic profile, or to an industrial enzyme to increase its stability, solubility, and activity.19,20 This process is called protein glycoengineering. A real-world example is Darbepoetin Alfa, a human erythropoietin (EPO) analog approved by both the U.S. Food and Drug Administration (FDA) and the European Medicines Agency (EMA) in 2001 for the treatment of anemia associated with chronic renal failure and/or chemotherapy.21 Darbepoetin Alfa has two additional N-glycosylation sites, and this hyperglycosylation gives the protein a 3-fold longer serum half-life than wild-type EPO as well as increased in vivo activity.22,23 The additional glycosylation thus makes it a more convenient and effective drug.24 The number of such successful examples is limited, however, mainly because there are almost no general guidelines for understanding how a certain glycan structure or glycosylation pattern might alter a protein of interest. As a result, most glycoengineering efforts center on random mutagenesis and trial-and-error investigation of randomly chosen glycans, a process that is time-consuming, expensive, and inefficient.
Naturally occurring glycosylated proteins are almost always produced as complex mixtures of many individual glycoforms, which complicates any analysis of structure-function relationships and has hampered our understanding of the phenomenon.25,26 Like most post-translational modifications, protein glycosylation is not templated; instead, it is controlled by numerous regulatory elements that integrate both environmental and genetic cues. Additionally, the large glycan structures are constructed through the action of many enzymes, none of which is perfectly selective or fully efficient. Together, these factors lead to heterogeneity on several levels.27 On proteins with multiple glycosylation sites, variation in site occupancy within a glycoform mixture is known as macroheterogeneity. Even at a single occupied site, the glycan structure can vary between individual molecules in size, number of charged carbohydrate residues, and branching, which gives rise to what is commonly termed microheterogeneity.28 It is currently very difficult, and in many cases impossible, to separate the subtly different glycoforms produced naturally or in most recombinant glycoprotein expression systems, so many studies of the effects of protein glycosylation have relied on glycoform mixtures.2,12 Such studies can yield only coarse-grained approximations of the general roles glycosylation might play in living systems. Further complicating matters, many such glycoform mixtures contain sample-specific variation resulting from differences in sample source, culture conditions, and isolation procedures.29 This has led to generally vague and occasionally contradictory conclusions in previous studies and to many gaps in our knowledge of protein glycosylation. Together, these factors have been a severe drag on the practice of glycoengineering.
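The scale of the heterogeneity problem can be made concrete with a simple count. The sketch below is illustrative only and assumes, unrealistically, that each site is independently either unoccupied or carries one of k possible glycan structures, so a protein with n sites can exist as (k + 1)^n glycoforms. The site and glycan labels are hypothetical placeholders, not data from any real glycoprotein.

```python
from itertools import product


def count_glycoforms(n_sites: int, glycans_per_site: int) -> int:
    """Number of glycoforms if every site is independently either empty
    (macroheterogeneity) or carries one of `glycans_per_site` structures
    (microheterogeneity). Real biosynthesis introduces correlations."""
    return (glycans_per_site + 1) ** n_sites


def enumerate_glycoforms(sites, glycans):
    """Yield each glycoform as a dict mapping site -> glycan (None = unoccupied)."""
    options = [None, *glycans]
    for combo in product(options, repeat=len(sites)):
        yield dict(zip(sites, combo))


if __name__ == "__main__":
    # Hypothetical example: three sites, each empty or one of four glycans.
    sites = ["site_1", "site_2", "site_3"]
    glycans = ["glycan_A", "glycan_B", "glycan_C", "glycan_D"]
    forms = list(enumerate_glycoforms(sites, glycans))
    print(len(forms), count_glycoforms(len(sites), len(glycans)))  # 125 125
```

Under the same simplifying assumption, a hypothetical protein with four sites and ten candidate structures per site could exist as 11^4 = 14,641 species, which suggests why separating individual glycoforms from natural mixtures is rarely practical.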

Figure 1. The overall strategy to study and apply protein glycosylation. It is difficult to directly determine the structure-property relationships of naturally occurring glycoproteins. This limitation can be overcome by systematically comparing many important properties across homogeneous glycoforms through the combined use of techniques from chemical synthesis, biochemistry, biophysics, molecular biology, and cell biology. More specifically, the strategy involves the following major steps: (A) based on literature data, identify the possible glycans and glycosylation sites that could occur on a glycoprotein and use this information to design and synthesize a collection of glycoforms; (B) individually characterize the properties of each glycoform; (C) compare differences in properties among the synthetic glycoforms to determine the correlations between the structures of glycoforms and their properties; (D) apply guidelines derived from these correlations to improve enzymes and therapeutic proteins through glycoengineering.

With current technology, significant discoveries related to the basic principles of protein glycosylation will only be made through the study of glycoprotein samples that are homogeneous and structurally well characterized, so production of such samples has been a central focus of research in the field for many years (Fig. 1).30-33 The importance of homogeneous samples can be seen in the advances in our understanding of nucleic acids34 and proteins35 that followed once reliable methods for their production became commonplace. To date, because heterogeneous glycoform mixtures are so difficult to separate into individual components, the most successful strategies have used synthetic methods to prepare samples. These can be further subdivided into enzymatic and chemical synthesis routes.32 Approaches that use only chemical synthesis are extremely flexible, since chemically introducing glycans does not depend on a protein’s sequence or local structure. Chemical synthesis also makes it relatively easy to construct glycoproteins carrying different glycan structures at individual sites. However, despite the effort that has gone into optimizing procedures for glycoprotein synthesis, the process remains difficult and tedious in most cases. Enzymatic synthesis is generally more convenient, although enzymes that construct specific glycan structures must still be identified, optimized, and purified. Moreover, enzymatic synthesis is constrained by the sequence specificity and activity of the enzymes used, and producing multiply glycosylated proteins with different glycans at each site remains difficult in almost all cases. While enzymatic methods are continually being refined, it is the chemical synthesis of glycosylated proteins that has contributed most significantly thus far, and it is the application of chemical synthesis that we focus on in this Perspective.
Three case studies will be discussed to illustrate the development and usefulness of chemical synthesis, its impact on studies of protein glycosylation, and how the findings of those studies can be applied in glycoengineering. The first section covers efforts to synthesize fully glycosylated human EPO, a therapeutically relevant glycoprotein notable for carrying both O- and N-linked glycans. We then devote one section each to how chemical synthesis has aided the study of N- and O-linked glycosylation. Studies of the enhanced aromatic sequon will be used to show how N-glycosylation can stabilize a wide variety of proteins, and recent efforts to understand O-mannosylation and O-GalNAcylation will be used to show the physical and functional consequences of O-glycosylation. The final section discusses the outlook for the future: we identify places for improvement in glycoprotein synthesis, important biological questions that might be answered in the near future, and additional approaches that may strengthen research on protein glycosylation.

2. CHEMICAL SYNTHESIS OF GLYCOPROTEINS

Many years of studying glycoproteins have clearly pointed to homogeneous glycoforms as essential tools for defining the roles of glycosylation in modulating protein properties,32,36 and considerable research effort has been directed towards the chemical synthesis of glycoproteins. As a result, many advanced chemical techniques have been developed for the synthesis of carbohydrates and glycopeptide fragments, and new methods for ligating small synthetic fragments into larger molecules are continually being refined.37,38 Nevertheless, extensive tailoring and optimization are still required before these techniques can be effectively applied to prepare structurally defined glycoproteins. Few examples of such an endeavor are found in the literature.39,40

Figure 2. Human erythropoietin. (A) Amino acid sequence of full-length EPO. The N- and O-linked glycosylation sites are highlighted in red and the disulfide bonds are in blue. (B) The structure of glycosylated EPO. The glycans at the three N-linked glycosylation sites (Asn24, 38, and 83) are tetraantennary complex-type oligosaccharides. The O-linked glycan at Ser126 has a disialyl-T-antigen structure. The glycans were modeled onto the protein using the GLYCAM Web-tool Glycoprotein Builder.41

Of all the previously reported work on glycoprotein synthesis, the chemical synthesis of glycosylated human EPO has been the most thoroughly investigated (Fig. 2).40 Human EPO is a typical glycoprotein, which makes it a good proof-of-concept target for total chemical synthesis. It has four glycosylation sites, and naturally produced samples are very heterogeneous mixtures of glycoforms that are almost impossible to separate.42 The protein carries three N-glycans, at Asn24, 38, and 83, which are mainly core-fucosylated, sialic acid-terminated complex-type N-glycans with di-, tri-, and tetra-antennary structures (Fig. 2).43 Its fourth glycosylation site is Ser126, which carries O-linked mucin-type core 1 glycans.44 The kinds of N- and O-linked glycans identified on EPO are thought to be the most common forms of these glycans on human glycoproteins.45,46 Human EPO is 166 amino acids long and is thus an average-sized soluble human protein. Taken together, human EPO has the representative characteristics of many glycoproteins and is therefore well suited as a model for developing the chemical synthesis of glycoproteins. Although several synthetic routes to EPO have been reported,39 only the Danishefsky group has actually achieved the total chemical synthesis of an EPO glycoform containing both N- and O-linked glycans with biologically relevant structures: α(2-3)-sialylated, core-fucosylated, complex-type biantennary dodecasaccharide N-linked glycans and a disialylated T-antigen O-linked glycan (Fig. 3A).47 These glycans are among the largest ever incorporated into a glycoprotein of this size, so the problems faced during the chemical synthesis and the solutions developed to address them are important advances in the field.

Figure 3. Research on human EPO has contributed many general principles for the chemical synthesis of glycoproteins. (A) Retrosynthetic analysis of glycosylated EPO. The native chemical ligation/metal-free desulfurization (NCL/MFD) strategy proved more generally applicable for assembling the linear polypeptide chain of the glycoprotein, while convergent aspartylation and the cassette approach proved more practical for synthesizing N- and O-linked glycopeptides. (B, C) Chemical synthesis of N- and O-linked glycans. The synthesis can be greatly simplified by using convergent strategies. PG, protecting group. SR, thioester.
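Why convergent strategies simplify the synthesis can be illustrated with the standard longest-linear-sequence argument: if every step has the same fractional yield y, material carried through a sequence of L steps survives with yield y^L, and a convergent plan shortens L. The per-step yield and step counts below are hypothetical round numbers, not values from the EPO synthesis.

```python
def overall_yield(step_yield: float, longest_linear_sequence: int) -> float:
    """Fraction of material surviving the longest linear sequence (LLS),
    assuming every step has the same fractional yield."""
    if not 0.0 <= step_yield <= 1.0:
        raise ValueError("step_yield must be between 0 and 1")
    return step_yield ** longest_linear_sequence


# Hypothetical route of 10 steps carried out one after another (linear) ...
linear = overall_yield(0.8, 10)
# ... versus two 5-step branches made in parallel and joined by one coupling,
# so the longest linear sequence is 5 + 1 = 6 steps.
convergent = overall_yield(0.8, 5 + 1)
print(f"linear: {linear:.3f}, convergent: {convergent:.3f}")
# linear: 0.107, convergent: 0.262
```

This toy model ignores that couplings of large fragments often proceed in lower yield than routine steps; it captures only why shortening the longest sequence limits cumulative loss during intermediate purifications.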

Danishefsky and co-workers evolved and fine-tuned their approach for around 10 years before they achieved the synthesis of glycosylated human EPO.40 During this time, they invented new methods to fill several critical technical gaps that emerged, and they systematically expanded existing synthetic approaches so that these could be applied more generally. As shown in Figure 3A, the final strategy for the synthesis of EPO was designed on the basis of findings from the optimization phase and centers on the convergent assembly of four long glycopeptide fragments by chemoselective ligation. Each long glycopeptide was prepared either by joining large sugars to long peptides or by joining small fragments together (Fig. 3A). Such a convergent strategy greatly reduces the number of linear synthetic steps and the resulting product loss during intermediate purifications. To apply this highly efficient strategy successfully, Danishefsky and co-workers refined the total chemical synthesis of complex N- and O-glycans, overcame diverse problems encountered in the synthesis of large glycopeptides, and dealt with many difficulties in ligating peptide/glycopeptide fragments.

2.1. Chemical Synthesis of N- and O-linked Glycans

To synthesize glycopeptide fragments that can be stitched together to form human EPO, the N- and O-linked glycans must first be synthesized. The synthesis of the N-linked glycan was challenging, mainly because of the sialic acid and fucose units. These two carbohydrates are highly labile under many conditions, which significantly limits the chemistry available when designing the synthesis of large glycans containing them. After many rounds of optimization, a route to the fully protected dodecasaccharide was developed; after deprotection and subsequent amination, the dodecasaccharide was converted to an anomeric amine for coupling to the peptide fragment (Fig. 3B).48,49 Notably, this large oligosaccharide contains almost all of the monosaccharide types found in eukaryotic N-glycans.50 This route thus has great potential as a way to access most N-glycan structures found in nature. Indeed, soon after this publication, more complex triantennary N-glycans were successfully prepared using a similar strategy.51,52 In contrast to the N-linked glycan, the O-linked glycan used in the synthesis was prepared as a glycoamino acid, often referred to as a “cassette”. As shown in Figure 3C, under optimized reaction conditions this molecule can be prepared convergently from two synthetic intermediates.53 Such a cassette can then be incorporated into a peptide fragment via an optimized chemical process (Section 2.2).54,55

2.2. Chemical Synthesis of N- and O-linked Glycopeptides

Another capability required for the chemical synthesis of full-length glycoproteins is the efficient preparation of large glycopeptide fragments. Many glycopeptides had been synthesized over years of research, but most of those targets were relatively small.56,57 Not surprisingly, the long glycopeptides used in glycoprotein synthesis bring additional complexities, including functional groups on the peptides and sugars that interfere with the reaction chemistry and more severe conformational restrictions at the reaction sites, so previously developed approaches could not be applied directly to the efficient synthesis of the large glycopeptides shown in Figure 3A.55 Substantial optimization was thus carried out to improve the convenience and efficiency of the synthesis.40,58 As mentioned above, N-linked glycopeptides can be synthesized through a convergent approach involving the direct coupling of an N-glycan bearing an anomeric amine59 to the side chain of an aspartic acid in a long peptide (Fig. 3). This coupling chemistry, known as Lansbury aspartylation, forms the natural amide linkage60 and minimizes the amount of precious N-glycan required. To make this reaction and the ensuing ligation reactions practical, Danishefsky and co-workers systematically optimized many aspects of the peptide moiety, including the amino acid composition, the reactivity of the C-terminal thioesters, and the protecting group strategy for amino acid side chains.61,62 The optimization process provided a wealth of information, which is reflected in the final synthesis of glycosylated EPO.47 First, the chosen peptide fragments should not contain an excessive number of hydrophobic amino acids, which lower the yield because of poor solubility.63 Accordingly, none of the three long N-linked glycopeptide fragments used in the final synthesis, EPO(1-28), (29-59), and (60-97), contains long stretches of hydrophobic amino acids. Second, the side chains of the C-terminal residues should not be sterically hindered, and the thioesters should not be overly reactive. In the final synthesis of EPO, Gly, Gln, and Lys were chosen as the C-terminal residues of the N-glycopeptides, and alkyl thioesters, which are less reactive than the more commonly employed aryl thioesters, were used. Third, fully protected peptides with pseudoproline dipeptide motifs at the (n+2) positions of unprotected Asp residues should be used for the synthesis of N-glycopeptides of significant length (Fig. 4A).
This lesson was derived from the many problems encountered during the synthesis of glycopeptides containing N-linked saccharides.62,64 Pseudoproline dipeptides provided many benefits for the aspartylation of EPO peptides, including operational simplicity, high yields, and fewer side reactions such as aspartimide formation. These lessons should apply to the synthesis of any sufficiently large and complex N-glycoprotein and are thus useful guiding principles for those in the field.
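The three design rules above can be expressed as a simple pre-screen for candidate N-glycopeptide fragments. This is a minimal sketch under stated assumptions: the hydrophobic residue set, the maximum allowed hydrophobic run, and the set of "hindered" C-terminal residues (β-branched Val, Ile, and Thr, plus Pro) are hypothetical choices for illustration, not thresholds reported by Danishefsky and co-workers, and the thioester-reactivity rule is not modeled. The function names and toy sequences are also hypothetical.

```python
HYDROPHOBIC = set("AVILMFW")        # assumption: simple hydrophobic residue set
HINDERED_C_TERMINAL = set("VITP")   # assumption: beta-branched residues and Pro


def longest_hydrophobic_run(seq: str) -> int:
    """Length of the longest uninterrupted stretch of hydrophobic residues."""
    best = run = 0
    for aa in seq:
        run = run + 1 if aa in HYDROPHOBIC else 0
        best = max(best, run)
    return best


def check_fragment(seq: str, asp_positions: list[int], max_run: int = 5) -> list[str]:
    """Return warnings for a candidate fragment.

    asp_positions are 1-based positions of Asp residues to be aspartylated.
    A Ser/Thr at n+2 is required so that a pseudoproline dipeptide can be
    installed there during peptide synthesis.
    """
    warnings = []
    if longest_hydrophobic_run(seq) > max_run:
        warnings.append("long hydrophobic stretch")
    if seq[-1] in HINDERED_C_TERMINAL:
        warnings.append(f"hindered C-terminal residue {seq[-1]}")
    for n in asp_positions:
        if not 1 <= n <= len(seq) or seq[n - 1] != "D":
            warnings.append(f"residue {n} is not Asp")
        elif n + 2 > len(seq) or seq[n + 1] not in "ST":
            warnings.append(f"no Ser/Thr at n+2 of Asp{n} for a pseudoproline")
    return warnings


# Hypothetical toy fragments (not EPO sequences).
print(check_fragment("EDLTKG", [2]))  # []
print(check_fragment("DAVILMFWV", [1]))
```

The second toy fragment trips all three checks: an eight-residue hydrophobic run, a β-branched C-terminal Val, and no Ser/Thr two residues after the Asp.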

Figure 4. Attempts at the total chemical synthesis of EPO have driven several advances broadly relevant to the synthesis of N- and O-linked glycopeptides. (A) A pseudoproline at the (n+2) position of an unprotected Asp residue was found to effectively suppress aspartimide formation in Lansbury aspartylation during the synthesis of N-linked glycopeptides. (B) Steric hindrance was found to effectively control selectivity during the synthesis of O-linked glycopeptides. OAct represents an activated ester species; R1 and R2, side chains; P1 and P2, protected peptides; Su, succinimide; 3° and 4°, tertiary and quaternary centers.

A cassette approach based on fully protected glycoamino acids has been shown to be broadly applicable to the synthesis of O-linked glycopeptides.58 However, the commonly used conditions for removing O-glycan protecting groups were not compatible with the extensive functionality present in the large, complex O-glycopeptides used in the human EPO synthesis. To overcome this problem, Danishefsky and co-workers extended the approach to tolerate glycoamino acids with unmasked carbohydrate moieties (Fig. 4B).55,65 This was achieved by systematically adapting the synthetic strategy to the differences between protected and unprotected O-glycoamino acids. For example, carbohydrates with free hydroxyl groups are significantly more hydrophilic than their protected counterparts and contain more functional groups that might react under peptide synthesis conditions to form inseparable by-products. This challenge was met by creatively exploiting differences in reaction rates caused by steric effects, together with the functional group tolerance of peptide ligation strategies.54,55 As shown in Figure 3, the long O-linked glycopeptide EPO(98-166) was efficiently prepared using this modified “Mask-Off” approach. Briefly, the short O-glycopeptide EPO(125-127) was prepared using highly selective peptide coupling reactions (Fig. 4B), and the desired long O-linked glycopeptide was obtained by sequentially merging the short glycopeptide with two long peptides, EPO(128-166) and EPO(98-124), through the ligation strategies highlighted in the next section.

2.3. Chemically Assembling Large Synthetic Fragments into Glycoproteins

In addition to developing a range of broadly useful methods for the synthesis of oligosaccharides and long glycopeptides, the efforts of Danishefsky et al. to devise a practical and efficient synthesis of human EPO also introduced a generally applicable and powerful new method to the field: native chemical ligation coupled with metal-free desulfurization (NCL/MFD).66-72 Traditional NCL is a chemoselective reaction that joins two peptides, one with a C-terminal thioester and one bearing an N-terminal Cys residue, through the formation of a native amide bond.73 However, the requirement for a Cys residue at each junction limits its practical application: Cys is relatively rare, is absent from many proteins, and when present is often located at sites unsuitable for ligation. In the case of human EPO, none of the four Cys residues is located at a desirable NCL site (Fig. 2A).35 To address this challenge, Danishefsky and co-workers conceived the NCL/MFD method. The basic concept of NCL/MFD is rather simple (Fig. 5). First, a thiol group is introduced into the side chain of the N-terminal amino acid of one fragment to mediate the ligation. Following ligation, this temporarily incorporated thiol is removed to regenerate the natural structure of that residue. In practice, however, many difficulties had to be overcome before this concept could be realized, mainly because the desulfurization conditions known at the time were incompatible with Cys protecting groups and/or led to product degradation in many cases.74,75 Through extensive investigation, the Danishefsky group identified reaction conditions compatible with the synthesis of many long glycopeptides and glycoproteins.66 This was clearly demonstrated in the synthesis of glycosylated EPO (Fig. 3). Applying this method not only enabled the synthesis of the long O-linked glycopeptide EPO(98-166),65 but also made the final assembly of full-length EPO possible.47
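The retrosynthetic bookkeeping described in this section can be checked mechanically. The sketch below uses only numbers stated in the text (a 166-residue chain; final fragments EPO(1-28), (29-59), (60-97), and (98-166); N-glycosylation sites Asn24, 38, and 83; the O-glycosylation site Ser126; and the sub-fragments EPO(98-124), (125-127), and (128-166)) to confirm that the fragments tile the chain without gaps or overlaps and to show which fragment carries each glycan. The junction-residue check is a simplification that assumes classical NCL needs Cys at each junction and that NCL/MFD additionally allows Ala (ligation at Cys followed by desulfurization); the function names and the toy sequence are hypothetical.

```python
FRAGMENTS = [(1, 28), (29, 59), (60, 97), (98, 166)]  # EPO fragments (from the text)
O_SUBFRAGMENTS = [(98, 124), (125, 127), (128, 166)]  # pieces of EPO(98-166)
GLYCOSITES = {24: "N", 38: "N", 83: "N", 126: "O"}    # residue -> linkage type


def tiles(fragments, first: int, last: int) -> bool:
    """True if the (start, end) ranges cover first..last with no gaps or overlaps."""
    expected = first
    for start, end in sorted(fragments):
        if start != expected or end < start:
            return False
        expected = end + 1
    return expected == last + 1


def junctions(fragments) -> list[int]:
    """1-based residues that start a C-terminal ligation partner (every start but the first)."""
    return [start for start, _ in sorted(fragments)[1:]]


def site_to_fragment(fragments, sites) -> dict[int, tuple[int, int]]:
    """Map each glycosylation site to the fragment range that contains it."""
    return {s: next(f for f in fragments if f[0] <= s <= f[1]) for s in sites}


def junction_ok(seq: str, fragments, allowed: set[str]) -> dict[int, bool]:
    """For each junction, is the residue there one the ligation method accepts?"""
    return {j: seq[j - 1] in allowed for j in junctions(fragments)}


print(tiles(FRAGMENTS, 1, 166), tiles(O_SUBFRAGMENTS, 98, 166))  # True True
print(junctions(FRAGMENTS))                                       # [29, 60, 98]
print(site_to_fragment(FRAGMENTS, GLYCOSITES))

# Hypothetical toy sequence: junction 4 is Cys, junction 8 is Ala.
toy, toy_frags = "GGGCGGGAGG", [(1, 3), (4, 7), (8, 10)]
print(junction_ok(toy, toy_frags, {"C"}))       # {4: True, 8: False}
print(junction_ok(toy, toy_frags, {"C", "A"}))  # {4: True, 8: True}
```

The EPO sequence itself is not reproduced in this text, so the junction check is run only on a toy sequence; with a real sequence, the same call would flag which disconnections require an MFD-compatible residue rather than a native Cys.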

Figure 5. Methods for ligating peptides/glycopeptides into proteins/glycoproteins were greatly expanded by Danishefsky and co-workers during their synthetic studies of human EPO. They developed an NCL/MFD method that allows ligation at amino acid residues other than Cys.66-72 SAr, arylthio group. R1 and R2, side chains. P1 and P2, peptides. PhFl, 9-(9-phenylfluorenyl). Trt, trityl.

In summary, the synthesis of EPO produced many highly efficient and broadly applicable new methods and techniques for glycoprotein synthesis. Moreover, useful general guidelines for glycoprotein synthesis emerged from developing and optimizing the EPO synthesis, including but not limited to the following: (a) the retrosynthetic disconnection of a glycoprotein should not lead to fragments that have long hydrophobic regions;63 (b) the overall synthetic strategy should be tested and optimized using fragments that carry at least small glycans.65 Most importantly, the EPO synthesis demonstrated that organic chemistry had matured to the point that complex biomolecules such as glycoproteins are feasible targets for total synthesis. Together with the demonstrated synthesis of large complex-type oligosaccharides and the flexible, precise nature of chemical synthesis, the EPO synthesis effort not only provides detailed, general directions for carrying out the chemical synthesis of glycoproteins,76 but also gives researchers confidence to attempt the chemical synthesis of large glycoform libraries for greater insight into the complex structure-property relationships of glycoproteins (Fig. 1).

3. CASE STUDIES: PROTEIN N-GLYCOSYLATION

The recent advances in glycoprotein synthesis, many of which were discussed in the previous section, have opened the door to investigations that use synthetic glycoform libraries as a tool to rigorously characterize the influence of N-glycosylation on proteins.38 With synthetic chemistry allowing scientists to precisely control subtle differences in glycan structure and amino acid sequence, it is now more possible than ever to quantitatively interrogate the structure-property relationships of N-glycosylated proteins,31,36 and in particular to gain insight into the molecular basis of such relationships. This has opened many new and exciting opportunities to rationally engineer peptides and proteins using N-glycosylation. While many laboratories have used chemical synthesis to study the underpinnings and applications of protein N-glycosylation across several systems,39 here we focus on N-glycosylation of the enhanced aromatic sequon to illustrate the usefulness of chemical synthesis and to show how the insights gained from such work can be applied to protein glycoengineering.77

Figure 6. Primary sequence and NMR structure of the Pin WW domain. The amino acids that were mutated by Kelly and coworkers to GlcNAc-β-Asn are shown in red. Chemical glycobiology research on this domain provided further support for the hypothesis that specific protein-glycan contacts play an important role in determining the properties of glycoproteins.

3.1. Identification and Characterization of the Enhanced Aromatic Sequon

Among membrane-bound and secreted proteins in eukaryotes, N-linked glycosylation is perhaps the most common post-translational modification. N-glycosylation is initiated co-translationally in the ER, when the oligosaccharyltransferase (OST) enzyme complex transfers a triantennary precursor tetradecasaccharide (Glc3Man9GlcNAc2) en bloc to the asparagine of the consensus sequence N-X-S/T, where X is any amino acid except proline.78 As the protein passes through the ER and Golgi, this glycan is trimmed to a pentasaccharide core that is common to all N-glycans and then elongated to the final, context-dependent structures found on the protein outside the cell.79 On most proteins that carry the modification, N-glycans are critical signals for the cell’s quality control machinery: they interact with the lectins calnexin and calreticulin (CNX/CRT), as well as numerous chaperones in the secretory pathway, to monitor and direct proper protein folding.80 This is probably the best-known function of protein N-glycosylation. However, recent work has shown that these glycans also have intrinsic effects on protein folding that are entirely independent of the chaperones in the ER.81,82 It is this intrinsic ability of N-glycans to alter the physical properties of proteins that holds the most potential for glycoengineering applications, and work by Kelly and co-workers has been particularly enlightening in this area.14 They have shown that the three carbohydrate residues closest to the protein, part of the core structure shared by all N-glycans, significantly accelerate the folding of glycosylated proteins and noticeably enhance their thermal stability.
These conclusions are based on in vitro studies, so the observed effects are divorced from any chaperone function that might operate in the ER during normal N-glycoprotein biosynthesis and must be intrinsic to the glycan structures themselves. Even the initial monosaccharide, β-N-acetylglucosamine (GlcNAc), is sufficient for a strong effect. Because these carbohydrate residues are so well conserved across N-glycans, it was proposed that if they were incorporated into N-glycan-naïve proteins, they would similarly accelerate the folding of the resulting unnatural, engineered N-glycoproteins. To test this important hypothesis, Kelly and co-workers synthesized a collection of homogeneous glycoforms of the WW domain of human Pin 1 (Pin WW). Each member of the collection carried a monosaccharide, the same N-linked GlcNAc found as the initial residue of all naturally occurring N-glycans, with the GlcNAc placed at different positions across the collection.83 The Pin WW domain is 34 residues long and is composed of three antiparallel β-strands joined by two loops (Fig. 6). It is found in cytosolic proteins and is not glycosylated under normal conditions. The isolated domain has a melting temperature of 57.5 °C, so it is fairly stable, and most positions in its sequence tolerate amino acid mutations.84 These features make it an attractive model for using chemical synthesis to study the effects of N-glycosylation. After quantifying the folding rate and thermodynamic stability of each Pin WW glycoform in the collection and systematically comparing the data, Kelly and co-workers found that accelerated folding and increased stability were not general consequences of glycosylation but depended on the protein sequence surrounding the introduced N-glycan.83 In particular, it appeared that specific carbohydrate–protein interactions are necessary for any beneficial effect of N-glycosylation. It is plausible that N-glycan structures and protein sequences co-evolved to stabilize N-glycoproteins. Furthermore, placing N-glycans at arbitrary positions within protein sequences often destabilized the protein, likely as a consequence of generic excluded-volume effects from bulky glycans placed within highly structured regions of the native protein.
The conclusions from this study show a clear need to better understand why certain sites are stabilizing while most are not. Once this is understood, guidelines could be developed for placing N-glycans in a protein sequence to maximize stability and folding rate, allowing far more rational glycoengineering in the future. Progress towards such guidelines was subsequently made by studying the adhesion domain of the human protein CD2, whose N-glycosylation was known to have a significant impact on its stability. Through mutagenesis and detailed biochemical analysis, Kelly and co-workers identified a five-residue sequence, Phe-Yyy-Asn(glycan)-Xxx-Thr, where Yyy is any amino acid and Xxx is any amino acid except Pro, that acts as a stabilizing micro-domain when nested within a type I β-bulge turn. Because this sequence motif contains an aromatic residue (Phe) in addition to the standard N-glycosylation sequon, it was termed an “enhanced aromatic sequon”.85 Once identified, the motif was investigated for general glycoengineering use, in the hope that it could serve as a stabilizing cassette to be plugged into type I β-bulge turns in a wide variety of proteins. Acylphosphatase, a cytoplasmic human muscle protein that is not naturally N-glycosylated, was engineered to carry the sequon and displayed noticeable improvements in stability.85 The idea was further verified using synthetic glycoforms of the Pin WW domain into which two similar structural motifs had been engineered: Phe-Yyy-Zzz-Asn(glycan)-Xxx-Thr within a type II β-turn and Phe-Asn(glycan)-Xxx-Thr within a simpler type I′ β-turn. These modified enhanced aromatic sequons were also found to serve as stabilizing motifs.85,86 More importantly, the ready availability of the WW domain by chemical synthesis allowed the origin of the stabilizing effect of enhanced aromatic sequons to be explored.77 Using chemical synthesis, Kelly and co-workers site-specifically incorporated uniformly 13C- and 15N-labeled amino acids into loop 1 of the Pin WW domain.87 The isotopically labeled residues made it possible to determine high-resolution nuclear magnetic resonance (NMR) structures of engineered variants carrying five- or six-residue enhanced aromatic sequons glycosylated with a single GlcNAc at Asn19 (Fig. 7). These NMR results largely supported previous conclusions and showed that the backbone conformations of the nonglycosylated and glycosylated variants were very similar. Significant differences between the two sets of structures were observed mostly in the vicinity of the glycosylation site. In glycosylated Pin WW, both the Phe16 and Asn19 side chains rotate so as to bring the protons on the α-face of the GlcNAc residue (H1, H3, and H5) very close to the aromatic face of the Phe16 side chain (Fig. 7). The proximity of the glycan’s axial hydrogens to the aromatic ring of Phe16 suggests that CH−π interactions may explain the stabilizing effect conferred by glycosylation of enhanced aromatic sequon motifs.
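The sequence requirements described above can be checked mechanically. The sketch below is a minimal illustration, not a tool used in the cited studies; the function names are hypothetical. It scans a one-letter amino acid string for the canonical N-X-S/T sequon (X not Pro) and for the five-residue enhanced aromatic sequon Phe-Yyy-Asn-Xxx-Thr (Xxx not Pro). A sequence match is only a necessary condition: the stabilizing effect also depends on the motif being nested in a suitable reverse turn, such as a type I β-bulge turn, which sequence alone cannot reveal.

```python
import re

# Canonical N-glycosylation sequon: Asn-Xaa-Ser/Thr, where Xaa is not Pro.
# A zero-width lookahead lets overlapping sequons (e.g. "NNST") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

# Five-residue enhanced aromatic sequon: Phe-Yyy-Asn-Xxx-Thr, Xxx not Pro.
ENHANCED_AROMATIC = re.compile(r"(?=F[A-Z]N[^P]T)")


def sequon_asn_positions(seq: str) -> list[int]:
    """Return 1-based positions of Asn residues that sit in an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


def enhanced_aromatic_asn_positions(seq: str) -> list[int]:
    """Return 1-based Asn positions of Phe-Yyy-Asn-Xxx-Thr motifs.

    Each match starts at Phe, so the Asn is two residues further along
    (0-based start + 2, then + 1 to convert to 1-based numbering).
    """
    return [m.start() + 3 for m in ENHANCED_AROMATIC.finditer(seq.upper())]


if __name__ == "__main__":
    # Loop motifs quoted in the text, written in one-letter codes.
    for name, loop in [("Phe-Ala-Asn-Gly-Thr", "FANGT"),
                       ("Phe-Arg-Ser-Asn-Gly-Thr", "FRSNGT"),
                       ("Phe-Ala-Asn-Ser-Thr", "FANST")]:
        print(name, sequon_asn_positions(loop),
              enhanced_aromatic_asn_positions(loop))
```

The six-residue Phe-Arg-Ser-Asn-Gly-Thr loop contains a canonical sequon but does not match the five-residue pattern; the four- and six-residue motif variants would each need their own pattern.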

Figure 7. Stick representations of the N-GlcNAcylated Phe-Ala-Asn-Gly-Thr and Phe-Arg-Ser-Asn-Gly-Thr enhanced aromatic sequons in loop I of Pin WW variants. N-glycosylation of such enhanced aromatic sequons increases the stability of Pin WW. The stabilizing effect is mainly due to specific CH−π interactions between the aromatic side chain and the N-GlcNAc residue.

Figure 8. Structures of Pin WW variants used to study Asn glycosylation. The strength of carbohydrate−aromatic interactions in this domain was found to be affected by many changes in the stereochemistry and identity of the substituents on the Asn side chain.

Expanding on the results of that study, and to quantify the stabilizing contributions of different molecular forces, including CH−π interactions, Kelly and co-workers synthesized more than 40 pairs of glycosylated and unglycosylated Pin WW variants with engineered enhanced aromatic sequons and mutations at the critical Phe16 residue.87 These mutations included natural amino acids, unnatural amino acids with highly hydrophobic alkane and cycloalkane side chains, and unnatural Phe analogs with various aromatic ring substituents. They then measured the thermal unfolding temperature of each variant by variable-temperature circular dichroism (CD) and systematically analyzed the results to calculate the stabilization energy due to glycosylation. This allowed them to investigate a wide range of fundamental forces that might explain glycosylation’s ability to stabilize proteins in the context of the enhanced aromatic sequon. They found that although the hydrophobic effect was significantly stabilizing, it never accounted for more than 44% of the overall energy, whereas CH−π interactions accounted for as much as 72%. Surprisingly, they also observed little variation in the strength of carbohydrate−aromatic interactions among variants carrying aromatic rings with very different electronic structures. This suggests that the strength of the CH−π interactions in the enhanced aromatic sequon arises from general dispersion forces between transient dipoles on the carbohydrate and the aromatic ring rather than from electrostatic forces.
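Converting melting temperatures into stabilization energies requires a thermodynamic model. One common approach, sketched here under the assumption of two-state unfolding and not necessarily the exact fitting procedure used in the cited work, evaluates the Gibbs–Helmholtz equation for each variant and takes the difference between the glycosylated and unglycosylated forms at a common reference temperature. The function names and all numerical parameters (melting temperatures, ΔH_m, ΔC_p, reference temperature) below are hypothetical placeholders, not data from the study.

```python
import math


def dg_unfold(T: float, Tm: float, dHm: float, dCp: float = 0.0) -> float:
    """Two-state unfolding free energy (kcal/mol) at temperature T (kelvin).

    Gibbs-Helmholtz form with a temperature-independent heat capacity change:
        dG(T) = dHm * (1 - T/Tm) + dCp * (T - Tm - T * ln(T/Tm))
    dHm is the unfolding enthalpy at Tm (kcal/mol); dCp is in kcal/(mol*K).
    Positive dG means the folded state is favored at T.
    """
    return dHm * (1.0 - T / Tm) + dCp * (T - Tm - T * math.log(T / Tm))


def ddg_glycosylation(T: float,
                      glyco: tuple[float, float, float],
                      unglyco: tuple[float, float, float]) -> float:
    """Stabilization attributed to the glycan at temperature T (kcal/mol).

    glyco and unglyco are (Tm, dHm, dCp) tuples for a paired variant.
    Positive values mean glycosylation stabilizes the fold.
    """
    return dg_unfold(T, *glyco) - dg_unfold(T, *unglyco)


if __name__ == "__main__":
    # Hypothetical pair: the glycan raises Tm by 5 K, other parameters equal.
    unglyco = (330.0, 40.0, 0.4)
    glyco = (335.0, 40.0, 0.4)
    T_ref = 330.0  # arbitrary choice: the unglycosylated variant's Tm
    print(f"ddG = {ddg_glycosylation(T_ref, glyco, unglyco):.2f} kcal/mol")
```

Repeating this calculation across many matched pairs gives a set of glycosylation energies; the percentage contributions quoted above come from a further decomposition of such energies, which this sketch does not attempt.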

Kelly and co-workers have also examined these interactions from another direction. Using their Pin WW model with a five-residue enhanced aromatic sequon located in a type I β-bulge reverse turn, they synthesized more than 10 variants bearing a selection of carbohydrates or hydrophobic cycloalkanes at the Asn position (Fig. 8).88 Comparing glycoforms carrying different monosaccharides revealed that changes in the stereochemical arrangement of the carbohydrate hydroxyl groups have a modest but measurable effect on the strength of the interaction between the carbohydrate and the aromatic ring of Phe16. Analysis of the NMR structures of several representative Pin WW glycovariants, and of the geometry of the interacting groups, helps explain the range of interaction strengths between Phe16 and different monosaccharides. Peracetylation of the carbohydrate hydroxyl groups was found to increase the strength of the CH−π interaction, independent of the stereochemistry of the carbohydrate ring. Previous work with peracetylated carbohydrates in other contexts had reached the same conclusion, suggesting that this may be a generally useful principle.89,90 Surprisingly, similar energies were measured for a variant carrying cyclohexane and for those carrying monosaccharides. These data point to similar levels of hydrophobicity for the axial protons on a carbohydrate’s α-face and those of an alkane. Extending this idea, the study supports the notion that burial of hydrophobic surface area contributes a large amount of stabilizing energy to the interaction of carbohydrates with aromatic amino acids. Overall, Kelly and co-workers were able to leverage recent advances in chemical glycobiology to identify and quantitatively characterize the N-glycosylation of enhanced aromatic sequons in several systems. Such studies have provided a deep understanding of the effects of protein N-glycosylation and their molecular basis, which improves our ability to rationally design, optimize, and apply N-glycosylation to boost the performance of industrial enzymes and therapeutic proteins. This idea is exemplified in the following section.

3.2. Application of the Enhanced Aromatic Sequon in Protein Stabilization

Monoclonal antibodies (mAbs) currently make up a fast-growing and increasingly important sector of the pharmaceutical market.91 In large part, this attention and investment have been motivated by the ability of mAbs to bind target proteins with high selectivity and effectiveness. This class of therapeutics is not without downsides, however: aggregation is a common problem that can result in loss of efficacy and even adverse immunological consequences for patients.92 The CH2 domain of immunoglobulin G (IgG) mAbs, the class that includes most therapeutic antibodies, is usually the least stable portion of the molecule (Fig. 9).93 It is thought that hydrophobic surfaces within this domain, which are normally shielded in the interior of the folded structure, become transiently exposed to the environment as a consequence of normal, stochastic conformational dynamics. Exposed hydrophobic surfaces are known to lead to protein aggregation in many contexts, and because a more stable CH2 domain should better shield its hydrophobic residues, stabilizing the domain was expected to reduce aggregation.94

Figure 9. Enhanced aromatic sequons can be used to alter large proteins. Engineering the sequence of the CH2 domain of an IgG to include an enhanced aromatic sequon at a natural glycosylation site resulted in a significant increase in thermal stability.

By replacing the N-glycosylated C′E loop with a five-residue enhanced aromatic sequon, Phe295-Ala-Asn(glycan)-Ser-Thr, Kelly and co-workers were able to strengthen the interaction between the N-glycan and the CH2 domain (Fig. 9).95 Crystal structures of the engineered antibody showed that introducing the enhanced aromatic sequon into the wild-type C′E loop encouraged the region to adopt a more rigid type I β-turn with a G1 β-bulge, which they had previously shown to be an ideal context for the enhanced aromatic sequon.86 This engineered loop conformation brought the side chain of Phe295 and the N-glycan together and introduced new protein–glycan interactions. Specifically, the initial GlcNAc residue and the core fucose attached to it were observed to interact with the aromatic ring of Phe295. These interactions were stabilizing not only within the isolated CH2 domain but also in the context of the entire antibody, as judged by melting temperatures measured by differential scanning calorimetry (DSC). As hypothesized, the more stable engineered antibodies were much more resistant to heat- and acid-induced aggregation than the wild type. Because previous studies had shown that mutations in the C′E loop can change the binding affinity of IgG antibodies for their receptors,96 the authors also measured binding constants of the engineered antibodies toward several receptors. They found no change in affinity for FcRn or FcγRI, but reduced binding to FcγRIIa, FcγRIIIa, and FcγRIIIb. In certain therapeutic contexts, discouraging antibody interaction with FcγRs is desirable,97,98 and this study reveals a novel approach to control antibody–receptor binding while also stabilizing the molecule.

4. CASE STUDIES: PROTEIN O-GLYCOSYLATION

Protein O-glycosylation, in which the carbohydrate is covalently linked to the protein via the side-chain oxygen of a serine or threonine, is the other major type of glycosylation besides N-glycosylation.99 Like N-glycosylation, it is extremely common on extracellular proteins, both membrane-bound and secreted.
By some estimates, it occurs on as many as half of all proteins that pass through the mammalian secretory pathway.6,100 Unlike N-glycosylation, it is also found inside cells, on cytosolic proteins, in the form of monomeric β-N-acetylglucosamine residues.101 Most chemical glycobiology studies of the influence of O-glycosylation have focused on one particular class of O-glycosylated proteins: mucins.102,103 These proteins are the major component of the glycocalyx, a thick layer of carbohydrate, protein, and lipid structures that surrounds and protects almost all cells.104 Mucins themselves are largely unstructured and consist of many repeated proline-, threonine-, and serine-rich sequences. In vivo, these sequences are densely O-glycosylated and adopt an extended structure that is less rigid than α-helices or β-sheets but much more ordered than random coils or intrinsically disordered protein domains.105 Although individually small, the densely packed O-glycans render mucins resistant to protease cleavage, give rise to unique bulk properties, and impart the extended structure of the molecules.106 These studies clearly show that O-glycosylation can have a dramatic effect on the structure and function of a protein, but their conclusions are hard to generalize beyond the mucin class. In some respects, O-glycosylation is a much more diverse modification than N-glycosylation, and the term covers a wide variety of chemically distinct glycan–protein linkages.5,6 Unlike N-glycosylation, in which a pentasaccharide core is common to all N-glycans, O-glycosylation can be initiated by at least seven different monosaccharides, including N-acetylgalactosamine (GalNAc), N-acetylglucosamine (GlcNAc), mannose (Man), galactose (Gal), fucose (Fuc), glucose (Glc), and xylose (Xyl). These glycans are also found attached to both serine and threonine, further diversifying the possible structures. Many research groups are using chemical synthesis to study different types of O-glycosylation.107-110 In this section, we mainly describe our recent efforts to use a library-based strategy to study the effects of the most common types of O-glycosylation: O-mannosylation and O-GalNAcylation.

4.1. Gaining New Insights into the Effects of O-mannosylation and O-GalNAcylation

From studies of mucins, it has long been known that O-glycosylation can appreciably affect a protein.111,112 However, comparable knowledge regarding protein O-mannosylation has not been explicitly documented.113 Computational studies of an O-mannosylated cellulase from the fungus Trichoderma reesei, TrCel7A, showed that the O-linked mannoses bind directly to cellulose during catalysis.114 This suggested a previously unknown role for O-mannosylation in substrate recognition and enzyme action, and pointed toward as-yet unknown roles for protein O-glycosylation in living systems. Inspired by these findings, we decided to investigate how the glycosylation pattern and glycan structures of TrCel7A influence the enzyme’s properties. TrCel7A is composed of two domains: an N-glycosylated catalytic domain and an O-mannosylated carbohydrate-binding module (CBM).115,116 Because we were most interested in uncovering roles for O-glycosylation, our efforts focused on the CBM. Since the CBM is only 36 residues long, constructing a sizable library of glycoforms carrying O-mannose glycans of various sizes in different glycosylation patterns was relatively fast and convenient (Fig. 10). Nevertheless, the CBM retains important features of larger glycoproteins, including a well-defined secondary structure stabilized by multiple disulfide bonds and three distinct glycosylation sites (Fig. 10).117

Figure 10. CBM and the structures of three O-mannosylated building blocks that were used to incorporate specific O-glycans (sites highlighted in red). Glycans with different sizes or at different sites were found to confer different effects on CBM proteolytic stability, thermal stability, and cellulose binding affinity. R1 represents a hydrogen atom in the case of serine and a methyl group in the case of threonine.

We first demonstrated that synthesizing and characterizing a glycoform library is a feasible approach to quantifying the effects of protein O-glycosylation. Twenty CBM variants with different O-mannosylation states were chemically prepared. The members of this glycoform library were then characterized, and the resulting data were compared to reveal how glycan size and site occupancy affect the thermal stability, resistance to proteolytic degradation, and binding affinity of the glycopeptide towards cellulose, its natural substrate.118 Data from this study revealed that glycosylation of the Ser3 site had a significantly stronger effect on thermal stability and resistance to protease digestion than modification at either of the other sites. Such a strong site-specific effect was not observed for binding affinity, which increased upon monosaccharide glycosylation at any site. Increasing the glycan size to a dimer or trimer further stabilized the CBM but decreased binding affinity.

With the library approach validated and these intriguing initial results in hand, more than 30 additional CBM variants were synthesized to investigate the details of glycosylation at the Ser3 site. These new variants carried systematic variations in amino acid sequence, glycopeptide linkage, glycan structure, and anomeric configuration, which made it possible to determine the importance of each of these structural and sequence elements in mediating the effects of Ser3 glycosylation (Fig. 11).119 Amino acid mutations near the glycosylation site revealed that the stabilizing influence of O-mannosylation depended on the neighboring Gln2 and Tyr5 residues. Changing the glycosylated amino acid from serine to D-Ser (D-serine), Cys, or hSer (homoserine) also abolished any stabilizing effect of glycosylation on the domain. Finally, by investigating monosaccharides beyond the α-O-mannose used in the previous study, it was found that α-linked glycans tend to stabilize the domain more than analogous β-linked glycans, and that α-linked mannose had by far the strongest effect of any monosaccharide tested. These results point to a conclusion similar to that drawn for the enhanced aromatic sequon: the glycan structure and local amino acid sequence likely co-evolved to reach a particularly useful (i.e., stabilizing) structure.77 This might explain why the most stabilizing glycan identified for the CBM was the naturally occurring α-O-mannose.
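To give a sense of the design space behind this library-based strategy, the sketch below enumerates every combination of glycosylation states across three sites, assuming for illustration only that each site is either unmodified or carries a mannose monomer, dimer, or trimer (the glycan sizes mentioned above). Thr1 and Ser3 are named in the text; the third site label ("site_C") and the state labels are hypothetical placeholders, and the function names are likewise illustrative. Under these assumptions the number of possible glycoforms grows as 4 to the power of the number of sites, which is why library members have to be chosen systematically rather than exhaustively.

```python
from itertools import product

# Hypothetical glycan states per site: unmodified, or mannose mono/di/trimer.
STATES = ("none", "Man1", "Man2", "Man3")


def enumerate_glycoforms(sites, states=STATES):
    """Return every glycoform as a dict mapping site label -> glycan state."""
    return [dict(zip(sites, combo))
            for combo in product(states, repeat=len(sites))]


def occupancy(glycoform):
    """Number of sites in a glycoform that carry any glycan."""
    return sum(state != "none" for state in glycoform.values())


if __name__ == "__main__":
    # "Thr1" and "Ser3" appear in the text; "site_C" is a placeholder label.
    library = enumerate_glycoforms(["Thr1", "Ser3", "site_C"])
    print(len(library))  # 4**3 = 64 combinations
    singly = [g for g in library if occupancy(g) == 1]
    print(len(singly))   # 3 sites x 3 glycan sizes = 9
```

Filtering by occupancy, as in the last lines, is one simple way to pick a subset (for example, all singly glycosylated forms) that isolates site and size effects.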

Figure 11. Structures of synthetic CBM variants used to study the molecular basis of the unique effects of O-glycosylation at Ser3. The results of this study revealed the collective importance of many structural features, including the Ser side chain, the mannosyl residue, and Gln2 and Tyr5, in controlling the most pronounced effects of O-glycosylation at this site.

NMR structures of a small collection of synthetic CBM glyco-variants made it possible to investigate the molecular basis of the increase in proteolytic and thermal stability upon O-mannosylation. We determined high-resolution NMR structures of six different CBM glyco-variants prepared by chemical synthesis. By systematically comparing the structural features of each molecule, we were able to identify the key elements that may account for the stabilizing effects of O-mannose at the Ser3 site. Our results suggest that direct interaction between the mannose residue and the side chain of Gln2 increases the rigidity of the glycosylated CBM, which could explain both the increased thermal stability and the resistance to protease degradation (Fig. 12).120

Figure 12. Contacts made between the carbohydrate residue and peptide in CBM that is O-mannosylated at Ser3. Experimental results suggest that, to some extent, O-mannosylation stabilizes the structure by forming interactions with key local residues.

It is reasonable to assume that many fundamental properties are affected by the change in backbone rigidity observed upon glycosylation, and the availability of synthetic glycoforms makes it possible to study such questions rigorously. For example, the disulfide-bonded structure of the CBM made it possible to monitor the folding kinetics of many glycosylated variants.15 Comparing these folding rates revealed that glycosylation at either of the two N-terminal glycosylation sites, Thr1 and Ser3, dramatically increased the rate of folding and disulfide bond formation. As in the earlier studies that quantified stability, only α-O-mannose had this effect on folding, and both Gln2 and Tyr5 were necessary for the rate acceleration. This evidence suggests that the same interactions between the glycan and local amino acids, visualized by NMR, are responsible for the increased thermal stability, protease resistance, and folding rate.15,118,119 These conclusions also agree well with those of Kelly and co-workers on the enhanced aromatic sequon and N-glycosylation, and show that the glycan moiety and the local amino acids cooperate as a kind of structural unit to affect the molecule’s properties.121 While these studies using the CBM as a model have revealed much about the nature of O-glycosylation, they focus on O-mannosylation, which, although present in mammalian systems,122 is predominantly a fungal modification.123 Moreover, because protein sequence and glycan structure likely co-evolve, studies of mammalian proteins bearing O-GalNAc, the most common form of O-glycosylation in mammals, would be more relevant to human biology.124 Cytokines and chemokines are one such class of molecules: they are critical to mammalian biology and are naturally glycosylated with O-GalNAc-type glycans.7,125 These proteins are relatively small and have defined secondary and tertiary structures, making them useful molecules for studying the roles of O-GalNAc using chemical synthesis. RANTES (Regulated on Activation, Normal T-Cell Expressed and Secreted, also known as C-C chemokine ligand 5 or CCL5) is one such molecule and serves as an excellent model.7,126 It is naturally O-GalNAcylated at two sites near the N-terminus, Ser4 and Ser5, and at only 68 residues is conveniently sized for chemical synthesis (Fig. 13).127,128 RANTES has drawn much interest over the past several decades for its ability to bind C-C chemokine receptor 5 (CCR5), a necessary co-receptor for human immunodeficiency virus (HIV) cell entry, and has been the subject of peptide engineering efforts by many groups.129

Figure 13. RANTES and the structures of two O-GalNAcylated residues that are incorporated at the Ser4 and Ser5 sites. Comparing the properties of the synthetic variants containing these two types of O-glycans revealed that O-linked glycosylation of RANTES can simultaneously decrease multiple physical properties that are closely associated with its undesired proinflammatory side-effects.

The N-terminal sequence of RANTES is directly involved in regulating many of its functional properties, including those related to HIV entry inhibition. It is therefore plausible that O-GalNAcylation in this region affects the behavior of the chemokine. As in the cases discussed above, this hypothesis was tested using a library-based method. Synthesizing a collection of glycosylated RANTES molecules and quantifying their properties and biological activities was expected to reveal important structure–function relationships. It was also anticipated that directly comparing the results with previous peptide engineering efforts on RANTES would provide a preliminary assessment of the advantages and disadvantages of glycosylation for improving the therapeutic properties of RANTES and similar molecules.129 The natural function of RANTES is to signal to circulating leukocytes and direct their migration. This is accomplished in part by binding glycosaminoglycans (GAGs) in tissues and forming a concentration gradient of chemokine along which leukocytes can travel. This natural function is closely associated with the inflammatory side effects of RANTES-based HIV inhibitors, which limit their clinical use. Characterization of a collection of RANTES glycoforms for their ability to promote leukocyte migration in a Boyden chamber assay showed that O-GalNAcylation decreases the chemotactic activity of RANTES, with larger glycans causing larger decreases.130 Binding to GAGs is also a critical component of RANTES biology. Comparison of the ability of different RANTES glycoforms to bind heparin, a common GAG, also showed that glycosylation decreases the chemokine’s GAG binding affinity and that increasing glycan size further decreases binding. Importantly, glycosylation at one of the sites, Ser4, did not drastically reduce the ability of RANTES to inhibit HIV infection by binding CCR5. Together, these findings indicate that O-GalNAcylation of the flexible and functionally important N-terminus of RANTES can reduce several properties associated with inflammation in vivo. Overall, the systematic chemical glycobiology research on the O-glycosylation of the CBM and RANTES shows that the effects of O-glycosylation are strongly influenced by even minor changes to the carbohydrate structure, local amino acid sequence, or environment. All of these elements must be taken into account during glycoengineering for the best results. Furthermore, the data point to α-linked glycans, especially mannose residues, as having the largest effects. These observations, together with those of Kelly and co-workers, led to a hypothesis that is important for simplifying the protein glycoengineering process: glycoforms with better overall properties can be generated by jointly varying glycan structures and adjacent amino acids within unstructured regions that are important for biological function and/or susceptible to proteolytic cleavage and other undesired degradation reactions.

Figure 14. O-mannosylation of insulin B-chain Thr27 reduces the peptide’s susceptibility to proteases and self-association. t1/2, halflife to α-chymotrypsin degradation. Monomer%, the amount of monomer in each insulin sample.

4.2. Harnessing the Power of O-linked Glycans to Improve the Properties of Proteins

A test of these proposed O-glycoengineering guidelines, and an interesting application of them, can be seen in a recent study of human insulin.131 Because more than 400 million people are affected by diabetes and many of them require insulin at some point in the course of treatment, human insulin attracts considerable attention from academic and industrial research programs as a model system for testing new peptide engineering strategies. Many of these strategies aim to adjust the physical properties of the molecule to address different limitations of this therapeutic peptide. Many insulin analogs currently in therapeutic use have been engineered to act quickly or are formulated for slow, controlled release.132 While these analogs have made the treatment of diabetes far more effective, more convenient and safer orally available insulin analogs have not yet been developed.133

For efficient absorption from the gastrointestinal tract, orally available insulin analogs must have high proteolytic stability and low aggregation propensity. Glycosylation, which does not naturally occur on human insulin, was explored as a means to improve the properties associated with oral bioavailability. As with the other studies highlighted here, the design and chemical synthesis of a library of insulin glycoforms bearing systematically varied glycosylation was crucial. Only by comparing a sufficient variety of glycoforms would it be possible to reveal the extent to which unnatural glycosylation could affect insulin's therapeutically relevant physical and biological properties, or to evaluate the feasibility of previously acquired O-glycoengineering guidelines. Based on what was learned from studying the O-glycosylation of CBM and RANTES, a library of insulin glycoforms containing α-linked GalNAc or mannose at several sites was designed and prepared. Characterizing these molecules and comparing the results to the unglycosylated control showed that only α-O-mannosylation at one of the five sites tested, ThrB27, has the potential to simultaneously improve multiple properties that are important for oral bioavailability (Fig. 14).131 Interestingly, ThrB27 is located in the conformationally flexible, degradation-prone, and functionally critical C-terminal region of human insulin. This result agrees well with earlier studies of O-glycosylation, suggesting that principles derived from that work may serve as useful guidelines for the O-glycoengineering of many peptides and proteins, even those that are not naturally modified with carbohydrates. Moreover, although introducing monosaccharide glycans at ThrB27 produced only a modest decrease in the rate of α-chymotrypsin-mediated proteolytic degradation, elongating the glycan to a mannose trimer doubled the half-life (Fig. 14). This mirrors the results of the studies of CBM glycosylation discussed earlier. While these data are not yet comprehensive, they indicate that the stabilizing effects of O-linked glycans might be explained predominantly by the short-range steric hindrance imparted by the glycan residues. Taken together, these glycoengineering studies suggest that it may be possible to further improve the oral bioavailability of human insulin by jointly varying the glycan and the local amino acid structures within the C-terminal region of the B-chain, so as to encourage more contact between the glycans and adjacent amino acid side chains.118
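The half-life comparison in Fig. 14 can be made concrete with a little arithmetic. The sketch below assumes, as a simplification not stated in the text, that α-chymotrypsin digestion follows pseudo-first-order kinetics, so that the intact fraction decays as exp(-kt) with k = ln 2 / t1/2. The function names and the normalized half-life values (1.0 for a reference glycoform, 2.0 for a doubled half-life) are illustrative placeholders, not data from the study.

```python
import math


def rate_constant(half_life: float) -> float:
    """Pseudo-first-order rate constant k (per unit time): k = ln 2 / t1/2."""
    return math.log(2) / half_life


def fraction_remaining(t: float, half_life: float) -> float:
    """Fraction of intact peptide left at time t, assuming exp(-k t) decay."""
    return math.exp(-rate_constant(half_life) * t)


def fit_half_life(times, fractions):
    """Estimate t1/2 from a digestion time course.

    Fits ln(fraction) = -k * t by least squares with the line forced through
    the origin (fraction = 1 at t = 0), which gives k = -sum(t ln f) / sum(t^2).
    """
    num = sum(t * math.log(f) for t, f in zip(times, fractions))
    den = sum(t * t for t in times)
    k = -num / den
    return math.log(2) / k


if __name__ == "__main__":
    # Normalized, hypothetical half-lives: 1.0 = reference, 2.0 = doubled.
    for label, t_half in [("reference", 1.0), ("doubled half-life", 2.0)]:
        left = fraction_remaining(1.0, t_half)  # evaluated at one reference half-life
        print(f"{label}: k = {rate_constant(t_half):.3f}, intact at t = 1: {left:.1%}")

    # Recover the half-life from a noise-free simulated time course.
    times = [0.5, 1.0, 2.0, 4.0]
    simulated = [fraction_remaining(t, 2.0) for t in times]
    print(f"fitted t1/2 = {fit_half_life(times, simulated):.3f}")
```

Under this assumption, doubling t1/2 halves k, so at the time point where the reference has lost half of its intact peptide, about 71% (2^-1/2) of the longer-lived glycoform would remain. Real digestion curves may deviate from single-exponential behavior, in which case a fitted t1/2 is only a summary parameter.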

5. CONCLUDING REMARKS AND OUTLOOK

In summary, more than 10 years of systematic method optimization has made the total synthesis of dauntingly complex homogeneous glycoforms possible, as exquisitely illustrated by the synthesis of glycosylated human EPO.47 These synthetic studies, together with studies from others, serve as constructive examples of the ability of chemical synthesis to address previously intractable problems in glycoscience.39 Two representative case studies, one related to N-glycosylation and one to O-glycosylation, exemplify strategies that use chemical synthesis to gain much deeper insight into the fundamental structure–property relationships of glycoproteins. These case studies have revealed that certain glycan structures have highly specific effects on protein properties, in addition to a few more general consequences of glycosylation. Glycans appear capable of both positive and negative effects, and the balance is determined by a combination
of factors including glycosylation site, peptide sequence, glycan size, and glycan structure. Glycosylation sites adjacent to unstructured regions that are important for biological function and/or susceptible to degradation and oligomerization have a greater impact on the physical and biological properties of glycoproteins. Most effects of glycosylation are already evident in glycoforms carrying small glycans, although combining glycans at multiple glycosylation sites and elongating sugar chains can further change these effects; the changes caused by adding further sugar residues can be either positive or negative. The structure of a glycan governs its interactions with the protein to which it is directly attached and with the molecules it binds, and is therefore important for these effects. In light of the quantitative results and observations discussed here, we propose a hypothesis with practical applications for simplifying the current protein and peptide glycoengineering process: glycoforms with better overall properties can be generated by jointly varying glycan structures and adjacent amino acids within conformationally flexible regions that are important for biological function and/or susceptible to degradation and self-association. Earlier glycoengineering efforts on both IgG and insulin serve as strong evidence for this hypothesis. Independent of this hypothesis, the broad selection of case studies described here establishes the feasibility, effectiveness, and advantages of chemical synthesis as a tool in glycoprotein research. It is our sincere hope that highlighting these advances, and the many fruitful studies being carried out in the field, will attract many new researchers to the glycosciences. Although chemical glycobiology is very useful for obtaining rigorous quantitative results, it is not without limitations.
For example, synthesizing a large collection of glycoforms remains relatively difficult. Because of the high complexity of such a synthesis task, the chemical synthesis approach is currently practiced by only a small number of dedicated laboratories and is not yet routine in the broader glycoscience field. Moving forward, it is thus necessary to further simplify the synthesis of glycoproteins so that scientists are less reluctant to use this method to prepare samples for analysis. The optimization should include, but is not limited to: 1) Substantially improving the availability of peptide and glycopeptide fragments that can be readily used for ligation. This can be achieved by optimizing the coupling conditions for glycans, amino acids, and glycoamino acids, by commercializing more building blocks, and by developing more useful cleavable tags that improve the handling properties of different fragments. 2) Substantially improving the efficiency of fragment ligation. This can be achieved by developing and commercializing more ligation precursors and affinity isolation tags. 3) Substantially improving the purification of glycoproteins. This can be achieved by optimizing current purification processes or by developing new separation techniques. In addition to supporting a quantitative understanding of the effects of glycosylation on protein properties and of the molecular determinants of these effects, the underlying principle of the chemical glycobiology approach can be applied to other important biological questions, such as quantifying the compositions of natural glycoprotein mixtures and correlating glycoprotein compositions with their functions. Glycoproteomic technologies have the potential to quantify alterations in glycoprotein composition.134 However,
these technologies require synthetic peptide, glycopeptide, and glycoprotein standards for optimizing assay design, for adjusting instrument parameters to achieve optimal sensitivity and resolution, and for generating calibration curves for absolute quantification. With the ability to synthesize a wide variety of glycopeptides and glycoproteins, it would be feasible to undertake mass spectrometry-based quantitative glycoproteomic studies to determine the compositions of glycoprotein mixtures. After compositional analysis, functional assays can be performed on artificial samples prepared by mixing synthetically produced and purified glycoforms in the measured ratios, so as to closely mimic the natural mixtures. Because it is generally difficult to isolate, from a relevant biological source, a glycoprotein mixture that is free of other proteins and available in sufficient quantity for both analytical and functional studies, artificial mixtures made from known components are a convenient alternative that sidesteps such contamination and scarcity issues. Directly correlating the compositions of different mixtures with their corresponding biological activities will then allow the elucidation of composition-activity relationships of glycoform mixtures. At the same time, directly comparing the biological functions of mixtures with those of homogeneous glycoforms will yield definitive insights into the advantages and disadvantages of heterogeneous glycoform mixtures in biological systems. Overall, such studies can provide a strong basis for understanding why glycoprotein mixtures are so commonly used in biology and why aberrant glycosylation is associated with disease.135 With a continuous stream of advances in our ability to chemically synthesize glycopeptides and glycoproteins, chemical glycobiology research is expected to become only more popular in the future.
Our understanding of protein glycosylation will no doubt benefit from such an expansion. In reality, of course, it is very challenging to pinpoint the role of every glycan of each glycoprotein and the function of every glycoform mixture. Therefore, complementary approaches such as computational simulation and modeling should be considered to reduce the time and effort required for chemical glycobiology studies. As with the prediction of protein properties, the development of computational approaches for predicting glycoprotein properties can be aided by chemical glycobiology experiments, which can provide large sets of experimentally derived structure and property data for constructing computational models of the effects of glycosylation.114 Conversely, chemical glycobiology approaches can experimentally test and validate the results of simulations and provide further data to address the limitations of computational approaches.
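The quantify-then-reconstitute workflow proposed earlier in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' protocol: each glycoform gets its own linear calibration curve built from a synthetic standard (different glycoforms need not give the same instrument response), sample signals fall within the calibrated range, and the artificial mixture is prepared by simple C1V1 = C2V2 dilution of purified stocks. All glycoform names, concentrations, signals, and volumes are hypothetical placeholders, not measurements.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(signal, slope, intercept):
    """Back-calculate an absolute concentration from a signal via the calibration line."""
    return (signal - intercept) / slope


def mole_fractions(concentrations):
    """Convert absolute concentrations into the relative composition of the mixture."""
    total = sum(concentrations.values())
    return {name: c / total for name, c in concentrations.items()}


def mixing_volumes(fractions, stock_conc, total_conc, total_volume):
    """Volume of each purified stock (C1 V1 = C2 V2) needed to rebuild the mixture.

    fractions:  target mole fraction of each glycoform
    stock_conc: concentration of each purified synthetic stock (same units as total_conc)
    Returns per-glycoform volumes plus the buffer volume that brings the total to total_volume.
    """
    volumes = {
        name: frac * total_conc * total_volume / stock_conc[name]
        for name, frac in fractions.items()
    }
    buffer = total_volume - sum(volumes.values())
    if buffer < 0:
        raise ValueError("stocks are too dilute for the requested total concentration")
    volumes["buffer"] = buffer
    return volumes


if __name__ == "__main__":
    # Hypothetical calibration standards (uM) and instrument responses per glycoform.
    standards = [1.0, 2.0, 5.0, 10.0]
    responses = {
        "glycoform_A": [105.0, 205.0, 505.0, 1005.0],
        "glycoform_B": [52.0, 102.0, 252.0, 502.0],
    }
    sample_signal = {"glycoform_A": 305.0, "glycoform_B": 102.0}

    conc = {}
    for name, ys in responses.items():
        slope, intercept = fit_line(standards, ys)
        conc[name] = quantify(sample_signal[name], slope, intercept)

    composition = mole_fractions(conc)
    stocks = {"glycoform_A": 100.0, "glycoform_B": 50.0}  # uM, hypothetical
    plan = mixing_volumes(composition, stocks, total_conc=10.0, total_volume=100.0)  # uM, uL
    print(conc, composition, plan)
```

Fitting a separate line per glycoform is where the synthetic standards earn their keep: without a pure reference for each structure, relative signals alone would not give absolute amounts. A real assay would also need replicate standards, linearity checks, and internal standards, which this sketch omits.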

AUTHOR INFORMATION

Corresponding Author
*E-mail: [email protected]

ORCID
Zhongping Tan: 0000-0002-9302-150X

Notes
The authors declare no competing financial interest.

Author Contributions

All authors have given approval to this version of the manuscript.

ACKNOWLEDGMENT
We would like to thank the University of Colorado Boulder (start-up fund) and the NSF CAREER Award (Grant number: CHE-1454925) for their support.

REFERENCES
(1) Jevons, F. R. (1958) Linkage of carbohydrate to protein in ovalbumin, Nature 181, 1346-1347.
(2) Varki, A. (2017) Biological roles of glycans, Glycobiology 27, 3-49.
(3) National Research Council. (2012) Transforming Glycoscience: A Roadmap for the Future. Washington, DC: The National Academies Press.
(4) Higel, F., Demelbauer, U., Seidl, A., Friess, W., and Sorgel, F. (2013) Reversed-phase liquid-chromatographic mass spectrometric N-glycan analysis of biopharmaceuticals, Anal. Bioanal. Chem. 405, 2481-2493.
(5) Spiro, R. G. (2002) Protein glycosylation: nature, distribution, enzymatic formation, and disease implications of glycopeptide bonds, Glycobiology 12, 43R-56R.
(6) Chaffey, P. K., Chi, L., and Tan, Z. (2017) Chemical Biology of Protein O-Glycosylation, In Chemical Biology of Glycoproteins, pp 48-93, The Royal Society of Chemistry.
(7) Opdenakker, G., Rudd, P. M., Wormald, M., Dwek, R. A., and Van Damme, J. (1995) Cells regulate the activities of cytokines by glycosylation, FASEB J. 9, 453-457.
(8) Van den Steen, P., Rudd, P. M., Dwek, R. A., Van Damme, J., and Opdenakker, G. (1998) Cytokine and protease glycosylation as a regulatory mechanism in inflammation and autoimmunity, Adv. Exp. Med. Biol. 435, 133-143.
(9) Dwek, R. A. (1995) Glycobiology: more functions for oligosaccharides, Science 269, 1234-1235.
(10) Dwek, R. A. (1996) Glycobiology: Toward understanding the function of sugars, Chem. Rev. 96, 683-720.
(11) Rudd, P. M., and Dwek, R. A. (1997) Glycosylation: heterogeneity and the 3D structure of proteins, Crit. Rev. Biochem. Mol. Biol. 32, 1-100.
(12) Varki, A. (1993) Biological roles of oligosaccharides: all of the theories are correct, Glycobiology 3, 97-130.
(13) Xu, C., and Ng, D. T. (2015) Glycosylation-directed quality control of protein folding, Nat. Rev. Mol. Cell Biol. 16, 742-752.
(14) Hanson, S. R., Culyba, E. K., Hsu, T. L., Wong, C. H., Kelly, J. W., and Powers, E. T. (2009) The core trisaccharide of an N-linked glycoprotein intrinsically accelerates folding and enhances stability, Proc. Natl. Acad. Sci. U S A 106, 3131-3136.
(15) Chaffey, P. K., Guan, X., Wang, X., Ruan, Y., Li, Y., Miller, S. G., Tran, A. H., Koelsch, T. N., Pass, L. F., and Tan, Z. (2017) Quantitative Effects of O-linked Glycans on Protein Folding, Biochemistry 56, 4539-4548.
(16) Marth, J. D., and Grewal, P. K. (2008) Mammalian glycosylation in immunity, Nat. Rev. Immunol. 8, 874-887.
(17) Ohtsubo, K., and Marth, J. D. (2006) Glycosylation in cellular mechanisms of health and disease, Cell 126, 855-867.
(18) Sola, R. J., and Griebenow, K. (2009) Effects of glycosylation on the stability of protein pharmaceuticals, J. Pharm. Sci. 98, 1223-1245.
(19) Dicker, M., and Strasser, R. (2015) Using glycoengineering to produce therapeutic proteins, Expert Opin. Biol. Ther. 15, 1501-1516.
(20) Greene, E. R., Himmel, M. E., Beckham, G. T., and Tan, Z. (2015) Glycosylation of cellulases: Engineering better enzymes for biofuels, Adv. Carbohydr. Chem. Biochem. 72, 63-112.

(21) Baldo, B. A. (2014) Drugs that Act on the Immune System: Cytokines and Monoclonal Antibodies, In Side Effects of Drugs Annual (Ray, S. D., Ed.), pp 561-590, Elsevier.
(22) Elliott, S., Lorenzini, T., Asher, S., Aoki, K., Brankow, D., Buck, L., Busse, L., Chang, D., Fuller, J., Grant, J., Hernday, N., Hokum, M., Hu, S., Knudten, A., Levin, N., Komorowski, R., Martin, F., Navarro, R., Osslund, T., Rogers, G., Rogers, N., Trail, G., and Egrie, J. (2003) Enhancement of therapeutic protein in vivo activities through glycoengineering, Nat. Biotechnol. 21, 414-421.
(23) Sinclair, A. M., and Elliott, S. (2005) Glycoengineering: the effect of glycosylation on the properties of therapeutic proteins, J. Pharm. Sci. 94, 1626-1635.
(24) Sinclair, A. M. (2013) Erythropoiesis stimulating agents: approaches to modulate activity, Biologics 7, 161-174.
(25) Hatton, M. W. C., März, L., and Regoeczi, E. On the significance of heterogeneity of plasma glycoproteins possessing N-glycans of the complex type: a perspective, Trends Biochem. Sci. 8, 287-291.
(26) Moremen, K. W., Tiemeyer, M., and Nairn, A. V. (2012) Vertebrate protein glycosylation: diversity, synthesis and function, Nat. Rev. Mol. Cell Biol. 13, 448-462.
(27) Bieberich, E. (2014) Synthesis, processing, and function of N-glycans in N-glycoproteins, Adv. Neurobiol. 9, 47-70.
(28) Berger, M., Kaup, M., and Blanchard, V. (2012) Protein glycosylation and its impact on biotechnology, Adv. Biochem. Eng. Biotechnol. 127, 165-185.
(29) Lasne, F., and de Ceaurriz, J. (2000) Recombinant erythropoietin in urine, Nature 405, 635.
(30) Grogan, M. J., Pratt, M. R., Marcaurelle, L. A., and Bertozzi, C. R. (2002) Homogeneous glycopeptides and glycoproteins for biological investigation, Annu. Rev. Biochem. 71, 593-634.
(31) Wang, L. X., and Amin, M. N. (2014) Chemical and chemoenzymatic synthesis of glycoproteins for deciphering functions, Chem. Biol. 21, 51-66.
(32) Rich, J. R., and Withers, S. G. (2009) Emerging methods for the production of homogeneous human glycoproteins, Nat. Chem. Biol. 5, 206-215.
(33) Koeller, K. M., and Wong, C. H. (2000) Emerging themes in medicinal glycoscience, Nat. Biotechnol. 18, 835-841.
(34) Caruthers, M. H. (2013) The chemical synthesis of DNA/RNA: our gift to science, J. Biol. Chem. 288, 1420-1427.
(35) Nilsson, B. L., Soellner, M. B., and Raines, R. T. (2005) Chemical synthesis of proteins, Annu. Rev. Biophys. Biomol. Struct. 34, 91-118.
(36) Anthony, R. M., Wermeling, F., and Ravetch, J. V. (2012) Novel roles for the IgG Fc glycan, Ann. N Y Acad. Sci. 1253, 170-180.
(37) Varki, A., Cummings, R. D., Esko, J. D., Stanley, P., Hart, G. W., Aebi, M., Darvill, A. G., Kinoshita, T., Packer, N. H., Prestegard, J. H., Schnaar, R. L., and Seeberger, P. H. (2015) Essentials of Glycobiology, 3rd ed. Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press.
(38) Tan, Z., and Wang, L. X. (2017) Chemical Biology of Glycoproteins. Royal Society of Chemistry Press.
(39) Li, H., Dao, Y., and Dong, S. (2017) Chemical Synthesis and Engineering of N-Linked Glycoproteins, In Chemical Biology of Glycoproteins, pp 150-187, The Royal Society of Chemistry Press.
(40) Wilson, R. M., Dong, S. W., Wang, P., and Danishefsky, S. J. (2013) The Winding pathway to erythropoietin along the chemistry-biology frontier: A success at last, Angew. Chem., Int. Ed. 52, 7646-7665.
(41) Woods Group. (2005-2017) GLYCAM Web. Complex Carbohydrate Research Center, University of Georgia, Athens, GA. (http://glycam.org/).

(42) Sytkowski, A. J. (2006) Erythropoietin: Blood, Brain and Beyond, Wiley-VCH, Weinheim.
(43) Jensen, P. H., Karlsson, N. G., Kolarich, D., and Packer, N. H. (2012) Structural analysis of N- and O-glycans released from glycoproteins, Nat. Protoc. 7, 1299-1310.
(44) Byeon, J., Lim, Y. R., Kim, H. H., and Suh, J. K. (2015) Structural identification of a non-glycosylated variant at ser126 for O-glycosylation site from EPO BRP, human recombinant erythropoietin by LC/MS analysis, Mol. Cells 38, 496-505.
(45) Chu, C. S., Ninonuevo, M. R., Clowers, B. H., Perkins, P. D., An, H. J., Yin, H., Killeen, K., Miyamoto, S., Grimm, R., and Lebrilla, C. B. (2009) Profile of native N-linked glycan structures from human serum using high performance liquid chromatography on a microfluidic chip and time-of-flight mass spectrometry, Proteomics 9, 1939-1951.
(46) Yabu, M., Korekane, H., and Miyamoto, Y. (2014) Precise structural analysis of O-linked oligosaccharides in human serum, Glycobiology 24, 542-553.
(47) Wang, P., Dong, S., Shieh, J. H., Peguero, E., Hendrickson, R., Moore, M. A. S., and Danishefsky, S. J. (2013) Erythropoietin derived by chemical synthesis, Science 342, 1357-1360.
(48) Wu, B., Hua, Z., Warren, J. D., Ranganathan, K., Wan, Q., Chen, G., Tan, Z., Chen, J., Endo, A., and Danishefsky, S. J. (2006) Synthesis of the fucosylated biantennary N-glycan of erythropoietin, Tetrahedron Lett. 47, 5577-5579.
(49) Nagorny, P., Fasching, B., Li, X., Chen, G., Aussedat, B., and Danishefsky, S. J. (2009) Toward fully synthetic homogeneous beta-human follicle-stimulating hormone (beta-hFSH) with a biantennary N-linked dodecasaccharide. Synthesis of beta-hFSH with chitobiose units at the natural linkage sites, J. Am. Chem. Soc. 131, 5792-5799.
(50) Corfield, A. P., and Berry, M. (2015) Glycan variation and evolution in the eukaryotes, Trends Biochem. Sci. 40, 351-359.
(51) Walczak, M. A., and Danishefsky, S. J. (2012) Solving the convergence problem in the synthesis of triantennary N-glycan relevant to prostate-specific membrane antigen (PSMA), J. Am. Chem. Soc. 134, 16430-16433.
(52) Walczak, M. A., Hayashida, J., and Danishefsky, S. J. (2013) Building biologics by chemical synthesis: practical preparation of di- and triantennary N-linked glycoconjugates, J. Am. Chem. Soc. 135, 4700-4703.
(53) Chen, J., Chen, G., Wu, B., Wan, Q., Tan, Z., Hua, Z., and Danishefsky, S. J. (2006) Mature homogeneous erythropoietin-level building blocks by chemical synthesis: The EPO 114-166 glycopeptide domain, presenting the O-linked glycophorin, Tetrahedron Lett. 47, 8013-8016.
(54) Marcaurelle, L. A., and Bertozzi, C. R. (2002) Recent advances in the chemical synthesis of mucin-like glycoproteins, Glycobiology 12, 69R-77R.
(55) Tan, Z., Shang, S., Halkina, T., Yuan, Y., and Danishefsky, S. J. (2009) Toward homogeneous erythropoietin: non-NCL-based chemical synthesis of the Gln78-Arg166 glycopeptide domain, J. Am. Chem. Soc. 131, 5424-5431.
(56) Hojo, H., and Nakahara, Y. (2007) Recent progress in the field of glycopeptide synthesis, Biopolymers 88, 308-324.
(57) Guo, Z., and Shao, N. (2005) Glycopeptide and glycoprotein synthesis involving unprotected carbohydrate building blocks, Med. Res. Rev. 25, 655-678.
(58) Fernandez-Tejada, A., Brailsford, J., Zhang, Q., Shieh, J. H., Moore, M. A., and Danishefsky, S. J. (2015) Total synthesis of glycosylated proteins, Top. Curr. Chem. 362, 1-26.
(59) Likhosherstov, L. M., Novikova, O. S., Derevitskaja, V. A., and Kochetkov, N. K. (1986) A new simple synthesis of amino sugar β-d-glycosylamines, Carbohydr. Res. 146, C1-C5.

(60) Cohen-Anisfeld, S. T., and Lansbury, P. T. (1993) A practical, convergent method for glycopeptide synthesis, J. Am. Chem. Soc. 115, 10531-10537.
(61) Wang, P., Dong, S., Brailsford, J. A., Iyer, K., Townsend, S. D., Zhang, Q., Hendrickson, R. C., Shieh, J., Moore, M. A., and Danishefsky, S. J. (2012) At last: erythropoietin as a single glycoform, Angew. Chem., Int. Ed. 51, 11576-11584.
(62) Wang, P., Aussedat, B., Vohra, Y., and Danishefsky, S. J. (2012) An advance in the chemical synthesis of homogeneous N-linked glycopolypeptides by convergent aspartylation, Angew. Chem., Int. Ed. 51, 11571-11575.
(63) Tan, Z., Shang, S., and Danishefsky, S. J. (2011) Rational development of a strategy for modifying the aggregatibility of proteins, Proc. Natl. Acad. Sci. U S A 108, 4297-4302.
(64) Ullmann, V., Radisch, M., Boos, I., Freund, J., Pohner, C., Schwarzinger, S., and Unverzagt, C. (2012) Convergent solid-phase synthesis of N-glycopeptides facilitated by pseudoprolines at consensus-sequence Ser/Thr residues, Angew. Chem., Int. Ed. 51, 11566-11570.
(65) Dong, S., Shang, S., Tan, Z., and Danishefsky, S. J. (2011) Toward homogeneous erythropoietin: Application of metal free dethiylation in the chemical synthesis of the Ala79-Arg166 glycopeptide domain, Isr. J. Chem. 51, 968-976.
(66) Wan, Q., and Danishefsky, S. J. (2007) Free-radical-based, specific desulfurization of cysteine: a powerful advance in the synthesis of polypeptides and glycopolypeptides, Angew. Chem., Int. Ed. 46, 9248-9252.
(67) Tan, Z., Shang, S., and Danishefsky, S. J. (2010) Insights into the finer issues of native chemical ligation: an approach to cascade ligations, Angew. Chem., Int. Ed. 49, 9500-9503.
(68) Shang, S., Tan, Z., and Danishefsky, S. J. (2011) Application of the logic of cysteine-free native chemical ligation to the synthesis of Human Parathyroid Hormone (hPTH), Proc. Natl. Acad. Sci. U S A 108, 5986-5989.
(69) Shang, S., Tan, Z., Dong, S., and Danishefsky, S. J. (2011) An advance in proline ligation, J. Am. Chem. Soc. 133, 10784-10786.
(70) Townsend, S. D., Tan, Z., Dong, S., Shang, S., Brailsford, J. A., and Danishefsky, S. J. (2012) Advances in proline ligation, J. Am. Chem. Soc. 134, 3912-3916.
(71) Guan, X., Chaffey, P. K., Zeng, C., and Tan, Z. (2015) New methods for chemical protein synthesis, Top. Curr. Chem. 363, 155-192.
(72) Chen, J., Wan, Q., Yuan, Y., Zhu, J., and Danishefsky, S. J. (2008) Native chemical ligation at valine: a contribution to peptide and glycopeptide synthesis, Angew. Chem., Int. Ed. 47, 8521-8524.
(73) Camarero, J. A., and Muir, T. W. (2001) Native chemical ligation of polypeptides, Curr. Protoc. Protein Sci. Chapter 18, Unit 18.4.
(74) Rohde, H., and Seitz, O. (2010) Ligation-desulfurization: a powerful combination in the synthesis of peptides and glycopeptides, Biopolymers 94, 551-559.
(75) Ma, J., Zeng, J., and Wan, Q. (2015) Postligation-desulfurization: a general approach for chemical protein synthesis, Top. Curr. Chem. 363, 57-101.
(76) Murakami, M., Kiuchi, T., Nishihara, M., Tezuka, K., Okamoto, R., Izumi, M., and Kajihara, Y. (2016) Chemical synthesis of erythropoietin glycoforms for insights into the relationship between glycosylation pattern and bioactivity, Sci. Adv. 2, e1500678.
(77) Price, J. L., Culyba, E. K., Chen, W., Murray, A. N., Hanson, S. R., Wong, C. H., Powers, E. T., and Kelly, J. W. (2012) N-glycosylation of enhanced aromatic sequons to increase glycoprotein stability, Biopolymers 98, 195-211.

(78) Schwarz, F., and Aebi, M. (2011) Mechanisms and principles of N-linked protein glycosylation, Curr. Opin. Struct. Biol. 21, 576-582.
(79) Reynders, E., Foulquier, F., Annaert, W., and Matthijs, G. (2011) How Golgi glycosylation meets and needs trafficking: the case of the COG complex, Glycobiology 21, 853-863.
(80) Ferris, S. P., Kodali, V. K., and Kaufman, R. J. (2014) Glycoprotein folding and quality-control mechanisms in protein-folding diseases, Dis. Model Mech. 7, 331-341.
(81) Wormald, M. R., and Dwek, R. A. (1999) Glycoproteins: glycan presentation and protein-fold stability, Structure 7, R155-R160.
(82) Hebert, D. N., Lamriben, L., Powers, E. T., and Kelly, J. W. (2014) The intrinsic and extrinsic effects of N-linked glycans on glycoproteostasis, Nat. Chem. Biol. 10, 902-910.
(83) Price, J. L., Shental-Bechor, D., Dhar, A., Turner, M. J., Powers, E. T., Gruebele, M., Levy, Y., and Kelly, J. W. (2010) Context-dependent effects of asparagine glycosylation on Pin WW folding kinetics and thermodynamics, J. Am. Chem. Soc. 132, 15359-15367.
(84) Jager, M., Dendle, M., and Kelly, J. W. (2009) Sequence determinants of thermodynamic stability in a WW domain—An all-β-sheet protein, Protein Sci. 18, 1806-1813.
(85) Culyba, E. K., Price, J. L., Hanson, S. R., Dhar, A., Wong, C. H., Gruebele, M., Powers, E. T., and Kelly, J. W. (2011) Protein native-state stabilization by placing aromatic side chains in N-glycosylated reverse turns, Science 331, 571-575.
(86) Price, J. L., Powers, D. L., Powers, E. T., and Kelly, J. W. (2011) Glycosylation of the enhanced aromatic sequon is similarly stabilizing in three distinct reverse turn contexts, Proc. Natl. Acad. Sci. U S A 108, 14127-14132.
(87) Chen, W., Enck, S., Price, J. L., Powers, D. L., Powers, E. T., Wong, C. H., Dyson, H. J., and Kelly, J. W. (2013) Structural and energetic basis of carbohydrate-aromatic packing interactions in proteins, J. Am. Chem. Soc. 135, 9877-9884.
(88) Hsu, C. H., Park, S., Mortenson, D. E., Foley, B. L., Wang, X., Woods, R. J., Case, D. A., Powers, E. T., Wong, C. H., Dyson, H. J., and Kelly, J. W. (2016) The dependence of carbohydrate-aromatic interaction strengths on the structure of the carbohydrate, J. Am. Chem. Soc. 138, 7636-7648.
(89) Laughrey, Z. R., Kiehna, S. E., Riemen, A. J., and Waters, M. L. (2008) Carbohydrate-pi interactions: what are they worth?, J. Am. Chem. Soc. 130, 14625-14633.
(90) Kiehna, S. E., Laughrey, Z. R., and Waters, M. L. (2007) Evaluation of a carbohydrate-pi interaction in a peptide model system, Chem. Commun. 4026-4028.
(91) Weiner, G. J. (2015) Building better monoclonal antibody-based therapeutics, Nat. Rev. Cancer 15, 361-370.
(92) Vazquez-Rey, M., and Lang, D. A. (2011) Aggregates in monoclonal antibody manufacturing processes, Biotechnol. Bioeng. 108, 1494-1508.
(93) Jefferis, R. (1990) Molecular Structure of Human IgG Subclasses, In The Human IgG Subclasses, pp 15-30, Pergamon Press, Amsterdam.
(94) Chennamsetty, N., Voynov, V., Kayser, V., Helk, B., and Trout, B. L. (2009) Design of therapeutic proteins with enhanced stability, Proc. Natl. Acad. Sci. U S A 106, 11937-11942.
(95) Chen, W., Kong, L., Connelly, S., Dendle, J. M., Liu, Y., Wilson, I. A., Powers, E. T., and Kelly, J. W. (2016) Stabilizing the CH2 Domain of an Antibody by Engineering in an Enhanced Aromatic Sequon, ACS Chem. Biol. 11, 1852-1861.
(96) Shields, R. L., Namenuk, A. K., Hong, K., Meng, Y. G., Rae, J., Briggs, J., Xie, D., Lai, J., Stadlen, A., Li, B., Fox, J. A., and Presta, L. G. (2001) High resolution mapping of the binding site on human IgG1 for Fc gamma RI, Fc gamma RII, Fc gamma RIII, and FcRn and design of IgG1 variants with improved binding to the Fc gamma R, J. Biol. Chem. 276, 6591-6604.

(97) Stewart, R., Hammond, S. A., Oberst, M., and Wilkinson, R. W. (2014) The role of Fc gamma receptors in the activity of immunomodulatory antibodies for cancer, J. Immunother. Cancer 2, 29. (98) Kaneko, E., and Niwa, R. (2011) Optimizing therapeutic antibody function: progress with Fc domain engineering, BioDrugs 25, 1-11. (99) Minguez, P., Parca, L., Diella, F., Mende, D. R., Kumar, R., Helmer-Citterich, M., Gavin, A. C., van Noort, V., and Bork, P. (2012) Deciphering a global network of functionally associated post-translational modifications, Mol. Syst. Biol. 8, 599. (100) Gill, D. J., Clausen, H., and Bard, F. Location, location, location: new insights into O-GalNAc protein glycosylation, Trends Cell Biol. 21, 149-158. (101) Bond, M. R., and Hanover, J. A. (2015) A little sugar goes a long way: the cell biology of O-GlcNAc, J. Cell Biol. 208, 869-880. (102) Kiessling, L. L., and Splain, R. A. (2010) Chemical approaches to glycobiology, Annu. Rev. Biochem. 79, 619-653. (103) Bertozzi, C. R., and Kiessling, L. L. (2001) Chemical glycobiology, Science 291, 2357-2364. (104) Johansson, M. E. V., and Hansson, G. C. (2016) Immunological aspects of intestinal mucus and mucins, Nat. Rev. Immunol. 16, 639-649. (105) Tran, D. T., and Ten Hagen, K. G. (2013) Mucin-type Oglycosylation during development, J. Biol. Chem. 288, 69216929. (106) Live, D. H., Kumar, R. A., Beebe, X., and Danishefsky, S. J. (1996) Conformational influences of glycosylation of a peptide: a possible model for the effect of glycosylation on the rate of protein folding, Proc. Natl. Acad. Sci. U S A 93, 1275912761. (107) Sames, D., Chen, X. T., and Danishefsky, S. J. (1997) Convergent total synthesis of a tumour-associated mucin motif, Nature 389, 587-591. (108) Marcaurelle, L. A., Mizoue, L. S., Wilken, J., Oldham, L., Kent, S. B., Handel, T. M., and Bertozzi, C. R. (2001) Chemical synthesis of lymphotactin: a glycosylated chemokine with a C-terminal mucin-like domain, Chemistry 7, 1129-1132. 
(109) Hsieh, Y. S., Taleski, D., Wilkinson, B. L., Wijeyewickrema, L. C., Adams, T. E., Pike, R. N., and Payne, R. J. (2012) Effect of O-glycosylation and tyrosine sulfation of leech-derived peptides on binding and inhibitory activity against thrombin, Chem. Commun. 48, 1547-1549. (110) Asahina, Y., Fujimoto, R., Suzuki, A., and Hojo, H. (2015) Synthesis of Fmoc-Thr unit carrying core 1 O-linked sugar with acid-sensitive O-protecting group and its application to the synthesis of glycosylated peptide thioester, J. Carbohydr. Chem. 34, 12-27. (111) Tian, E., and Ten Hagen, K. G. (2009) Recent insights into the biological roles of mucin-type O-glycosylation, Glycoconj. J. 26, 325-334. (112) Barb, A. W., Borgert, A. J., Liu, M., Barany, G., and Live, D. (2010) Intramolecular glycan-protein interactions in glycoproteins, Methods Enzymol. 478, 365-388. (113) Strahl-Bolsinger, S., Gentzsch, M., and Tanner, W. (1999) Protein O-mannosylation, Biochim. Biophys. Acta. 1426, 297-307. (114) Payne, C. M., Resch, M. G., Chen, L., Crowley, M. F., Himmel, M. E., Taylor, L. E., 2nd, Sandgren, M., Stahlberg, J., Stals, I., Tan, Z., and Beckham, G. T. (2013) Glycosylated linkers in multimodular lignocellulose-degrading enzymes dynamically bind to cellulose, Proc. Natl. Acad. Sci. U S A 110, 14646-14651. (115) Stals, I., Sandra, K., Geysens, S., Contreras, R., Van Beeumen, J., and Claeyssens, M. (2004) Factors influencing glycosylation of Trichoderma reesei cellulases. I: Postsecretorial

ACS Paragon Plus Environment

Biochemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

changes of the O- and N-glycosylation pattern of Cel7A, Glycobiology 14, 713-724.
(116) Stals, I., Sandra, K., Devreese, B., Van Beeumen, J., and Claeyssens, M. (2004) Factors influencing glycosylation of Trichoderma reesei cellulases. II: N-glycosylation of Cel7A core protein isolated from different strains, Glycobiology 14, 725-737.
(117) Kraulis, J., Clore, G. M., Nilges, M., Jones, T. A., Pettersson, G., Knowles, J., and Gronenborn, A. M. (1989) Determination of the three-dimensional solution structure of the C-terminal domain of cellobiohydrolase I from Trichoderma reesei. A study using nuclear magnetic resonance and hybrid distance geometry-dynamical simulated annealing, Biochemistry 28, 7241-7257.
(118) Chen, L., Drake, M. R., Resch, M. G., Greene, E. R., Himmel, M. E., Chaffey, P. K., Beckham, G. T., and Tan, Z. (2014) Specificity of O-glycosylation in enhancing the stability and cellulose binding affinity of Family 1 carbohydrate-binding modules, Proc. Natl. Acad. Sci. U.S.A. 111, 7612-7617.
(119) Guan, X., Chaffey, P. K., Zeng, C., Greene, E. R., Chen, L., Drake, M. R., Chen, C., Groobman, A., Resch, M. G., Himmel, M. E., Beckham, G. T., and Tan, Z. (2015) Molecular-scale features that govern the effects of O-glycosylation on a carbohydrate-binding module, Chem. Sci. 6, 7185-7189.
(120) Chaffey, P. K., Guan, X., Chen, C., Ruan, Y., Wang, X., Tran, A. H., Koelsch, T. N., Cui, Q., Feng, Y., and Tan, Z. (2017) Structural insight into the stabilizing effect of O-glycosylation, Biochemistry 56, 2897-2906.
(121) Ardejani, M. S., Powers, E. T., and Kelly, J. W. (2017) Using cooperatively folded peptides to measure interaction energies and conformational propensities, Acc. Chem. Res. 50, 1875-1882.
(122) Praissman, J. L., and Wells, L. (2014) Mammalian O-mannosylation pathway: glycan structures, enzymes, and protein substrates, Biochemistry 53, 3066-3078.
(123) Loibl, M., and Strahl, S. (2013) Protein O-mannosylation: What we have learned from baker's yeast, Biochim. Biophys. Acta 1833, 2438-2446.
(124) Van den Steen, P., Rudd, P. M., Dwek, R. A., and Opdenakker, G. (1998) Concepts and principles of O-linked glycosylation, Crit. Rev. Biochem. Mol. Biol. 33, 151-208.
(125) Chamorey, A. L., Magne, N., Pivot, X., and Milano, G. (2002) Impact of glycosylation on the effect of cytokines. A special focus on oncology, Eur. Cytokine Netw. 13, 154-160.
(126) Schall, T. J., Jongstra, J., Dyer, B. J., Jorgensen, J., Clayberger, C., Davis, M. M., and Krensky, A. M. (1988) A human T cell-specific molecule is a member of a new gene family, J. Immunol. 141, 1018-1025.
(127) Kameyoshi, Y., Dorschner, A., Mallet, A. I., Christophers, E., and Schroder, J. M. (1992) Cytokine RANTES released by thrombin-stimulated platelets is a potent attractant for human eosinophils, J. Exp. Med. 176, 587-592.
(128) Oran, P. E., Sherma, N. D., Borges, C. R., Jarvis, J. W., and Nelson, R. W. (2010) Intrapersonal and populational heterogeneity of the chemokine RANTES, Clin. Chem. 56, 1432-1441.
(129) Vangelista, L., Secchi, M., and Lusso, P. (2008) Rational design of novel HIV-1 entry inhibitors by RANTES engineering, Vaccine 26, 3008-3015.
(130) Guan, X., Chaffey, P. K., Chen, H., Feng, W., Wei, X., Yang, L., Ruan, Y., Wang, X., Li, Y., Barosh, K. B., Tran, A. H., Zhu, J., Liang, W., Zheng, Y., Wang, X., and Tan, Z. (2017) O-GalNAcylation of RANTES improves its properties as a human immunodeficiency virus type 1 entry inhibitor, Biochemistry, doi: 10.1021/acs.biochem.7b00875.
(131) Guan, X., Chaffey, P. K., Wei, X., Gulbranson, D. R., Ruan, Y., Wang, X., Li, Y., Ouyang, Y., Chen, L., Zeng, C., Koelsch, T. N., Tran, A. H., Liang, W., Shen, J., and Tan, Z. (2017) Chemically precise glycoengineering improves human insulin, ACS Chem. Biol., doi: 10.1021/acschembio.7b00794.
(132) Grunberger, G. (2014) Insulin analogs-are they worth it? Yes!, Diabetes Care 37, 1767-1770.
(133) Arbit, E., and Kidron, M. (2009) Oral insulin: the rationale for this approach and current developments, J. Diabetes Sci. Technol. 3, 562-567.
(134) Pan, S., Chen, R., Aebersold, R., and Brentnall, T. A. (2011) Mass spectrometry based glycoproteomics—from a proteomics perspective, Mol. Cell. Proteomics 10, R110.003251.
(135) Meany, D. L., and Chan, D. W. (2011) Aberrant glycosylation associated with enzymes as cancer biomarkers, Clin. Proteomics 8, 7.
