Glycosylation is vital for industrial performance of hyper-active cellulases

Daehwan Chung, Nicholas S. Sarai, Brandon C Knott, Neal Hengge, Jordan Russell, John M. Yarbrough, Roman Brunecky, Jenna Young, Nitin Supekar, Todd Vander Wall, Deanne W Sammond, Michael F. Crowley, Christine Szymanski, Lance Wells, Parastoo Azadi, Janet Westpheling, Michael E. Himmel, and Yannick J Bomble

ACS Sustainable Chem. Eng., Just Accepted Manuscript • DOI: 10.1021/acssuschemeng.8b05049 • Publication Date (Web): 01 Feb 2019


Daehwan Chung1§, Nicholas S. Sarai1§, Brandon C. Knott1, Neal Hengge1, Jordan F. Russell2, John M. Yarbrough1, Roman Brunecky1, Jenna Young2, Nitin Supekar3, Todd VanderWall1, Deanne W. Sammond1, Michael F. Crowley1, Christine M. Szymanski3, Lance Wells3, Parastoo Azadi3, Janet Westpheling2, Michael E. Himmel1, and Yannick J. Bomble1*

1Biosciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, CO 80401, USA
2Department of Genetics, University of Georgia, Athens, GA 30602, USA
3Complex Carbohydrate Research Center, University of Georgia, Athens, GA 30602

§ These authors contributed equally to this work.
*Correspondence: [email protected]

Abstract: In the terrestrial biosphere, biomass deconstruction is conducted by microbes employing a variety of complementary strategies, many of which remain to be discovered. Moreover, the biofuels industry seeks more efficient (and less costly) cellulase formulations upon which to launch the nascent sustainable bioenergy economy. The glycan decoration of fungal cellulases has been shown to protect these enzymes from protease action and to enhance binding to cellulose. We show here that thermotolerant bacterial cellulases are glycosylated as well, although the types and extents of decoration differ from those of their eukaryotic counterparts. Our major findings are that glycosylation of CelA is uniform across its three linker peptides and is composed mainly of galactose disaccharides (which is unique), and that this glycosylation dramatically affects the hydrolysis of insoluble substrates, proteolytic and thermal stability, and substrate binding, and changes the dynamics of the enzyme. This study suggests that the glycosylation of CelA is crucial for its exceptionally high cellulolytic activity on biomass and provides the robustness needed for this enzyme to function in harsh environments, including industrial settings.

Keywords: glycosylation, enzyme stability, biofuels, galactose, CAZymes, Caldicellulosiruptor bescii, cellulolytic anaerobes


Introduction

In the terrestrial biosphere, biomass deconstruction is conducted by various microbes employing a variety of complementary strategies. The hyperthermophilic anaerobic bacterium Caldicellulosiruptor bescii, isolated from hot springs in the geothermally active Valley of Geysers in Siberia, can efficiently solubilize biomass without pretreatment. C. bescii relies primarily on a suite of complex multi-catalytic domain and multifunctional gene products to deconstruct biomass (1-3). The secretome of C. bescii displays high cellulolytic activity, and the combination of the four most highly expressed enzymes in the secretome is enough to reproduce the activity of the entire secretome, making these enzymes appealing for the biofuels industry (2). One of these enzymes, CelA, has been shown to be one of the most efficient biomass degrading enzymes on several biomass substrates (4). CelA is a complex, thermally stable cellulase, containing an N-terminal GH9A-CBM3c processive endoglucanase, two family 3 carbohydrate-binding modules (CBM3b), and a C-terminal GH48 exo-β-1,4-glucanase domain linked by Pro/Thr rich linkers (Figure 1) (4-6). Recent characterization of C. bescii carbohydrate-active enzymes (CAZymes), and especially of native CelA, has shown that the enzyme is glycosylated upon secretion from the cell (7). Glycosylation is one of the most common post-translational modifications (PTMs) of proteins and is thought to permit microorganisms to expand the combinatorial complexity of their gene products at a level beyond sequence space alone, opening new routes to structural, catalytic, and thermodynamic diversity (8). Glycosylation as a PTM in nature has a variety of proposed roles that include enhancing protein solubility, biasing protein folding pathways, providing stability against proteolysis, and modulating signaling and molecular recognition pathways (9, 10).
Eukaryotic cellulolytic enzymes are often glycosylated, and this PTM has been shown to be important for both function and stability. For example, the CBMs, linker peptides, and catalytic domains (CDs) of fungal cellulases are decorated by N-linked and O-linked glycosylation (11-13). Glycosylation of the CD of T. reesei Cel7A (TrCel7A), a processive cellulase that is a key component of commercial cellulase preparations, has been shown to affect substrate binding affinity, stability, and activity (11, 14, 15). Moreover, O-linked glycosylation of the isolated family-1 CBM of TrCel7A has been shown to enhance the binding affinity for
crystalline cellulose (13). The O-linked glycosylation of Ser/Pro/Thr rich linkers connecting CDs and CBMs in fungal cellulases (16, 17) has also been found to impart a degree of resistance against proteolytic cleavage, both to folded domain backbones and to extended linkers, by increasing steric hindrance (i.e., decreasing accessibility) (9, 13, 16, 18). It has also been shown that glycosylation of linker peptides induces a more extended conformation of the molecule via excluded volume and steric effects (15-17), and glycans may also shield hydrophobic regions of the protein from aqueous solvent, thus enhancing protein solubility and reducing aggregation (9, 19, 20).

Despite the recognition of its important role in cellulolytic eukaryotes, detailed studies of bacterial cellulase glycosylation have not been reported. Gerwig and co-workers demonstrated that cellulosome subunits from Clostridium thermocellum and Bacteroides cellulosolvens exhibit glycosylation (21-23), which is primarily O-linked galactosylation of the linker peptides. Glycosylation was also observed by Pages and coworkers for proteins produced by Clostridium cellulolyticum (24). However, these studies focused on the elucidation of the glycan structures and did not describe the role of these glycans in cellulase functionality. Langsford and coworkers identified protection against proteolysis as one possible role for glycosylation of a Cellulomonas fimi cellulase (18).

Here, we report a detailed glycomics analysis of the industrially relevant CAZyme, CelA, from the hyperthermophilic bacterium C. bescii. Using a combination of biochemical assays, circular dichroism, and molecular modeling, we assess the role of these glycans in the proteolytic and thermal stability of CelA. Then, with biomass binding and deconstruction assays, we determine the roles and impacts of glycosylation on hydrolysis of insoluble substrates and substrate binding.
Finally, we conduct molecular modeling using molecular dynamics and hydrophobic patch prediction to try to explain the difference in substrate binding of the native glycosylated and non-glycosylated forms of CelA. Taken together, these data suggest that the glycosylation of CelA is crucial for its exceptionally high cellulolytic activity on biomass, providing the robustness needed for this enzyme to function in harsh environments and in industrial settings.


Results and Discussion

The CelA glycans are unusual

To determine the identities and positioning of the glycans decorating CelA, we investigated the full-length enzyme and three truncations (Figures 1, S1, and S2). These constructs were selected based on their architectures. The CT (CBM3b-GH48) and NT1 (GH9-CBM3c-CBM3b) constructs are very similar to efficient gene products from other cellulolytic bacteria such as Thermobifida fusca (Cel48A and Cel9A) (25, 26). They both include a CBM3b that is needed for binding to cellulose. NT2 (GH9-CBM3c) probably lacks the ability to bind strongly to cellulose, but this construct was selected to evaluate the impact of expression in both E. coli and C. bescii with minimal changes in glycosylation, as we expected it to have minimal to no glycosylation. These constructs were expressed in C. bescii to obtain enzymes with native glycosylation, as well as in E. coli to provide non-glycosylated versions of each construct (supplementary information). A protease deficient E. coli strain was used to limit protease degradation of the exposed linker regions. After purification, we observed that, except for NT2, the C. bescii expressed constructs had significantly higher apparent molecular weights than their E. coli counterparts and could be identified using the glycan-specific periodic acid-Schiff staining protocol (Figures 2B and S3). Differences in electrophoretic mobility are most likely due to glycosylation of the linker peptides, which was confirmed by the glycan stain. In all these constructs, there are long Pro/Thr repeats that are predicted as potential O-glycosylation attachment sites. NT2 may be an exception in that it possesses only a three residue (Thr-Pro-Thr) linker sequence (Figure S2). Detailed glycomics analyses revealed that most of the glycans on CelA and its truncations are galactose disaccharides O-linked to either Thr or Ser.
O-glycans were released by β-elimination; subsequent MALDI/TOF-MS analysis demonstrated that Hex2 was the predominant occupancy, with minor Hex1 and Hex3 (a small amount of Hex4 was also detected, but only in full-length CelA) (Figure S4). Monosaccharide analysis with high-performance anion exchange chromatography (HPAEC) revealed the presence of galactose (Figure S5). Glucose was also detected as a minor component but is most likely a contaminating artifact. Finally, glycosyl linkage analysis by GC-MS demonstrated that the glycans produced by C. bescii CelA are O-linked -1,2 galactose
disaccharides (Figure S6). This degree of glycosylation constitutes roughly ten percent of the enzyme’s molecular weight. This level of glycosylation is higher than that in other thermophilic cellulolytic bacteria such as B. cellulosolvens and C. thermocellum, where it was found to represent between 4 and 7% of the mass of cellulosomes (21-23). Glycosylation in these microorganisms was also located on the long linker regions of their secreted proteins, but the composition of the glycans was much more heterogeneous, with a diverse array of oligosaccharides composed primarily of galactose (Gal) but also many other glycan moieties such as methyl-D-glucopyranose (GlcpNAc) and N-acetylglucosamine (GlcNAc) in a variety of occupancies with several linkage types (21). The main glycosylation patterns for these other cellulolytic bacteria involve four branched glycans and represent close to 50% of the glycosylation in these microorganisms. There are also other glycosylation patterns that are simpler in their architectures but are still more heterogeneous than the galactose disaccharide pattern found in CelA. Given the redundancy in the linker sequence (Figure S2), precise determination of the location of these disaccharides is difficult. However, given that CelA has 85 Thr and Ser residues in its linker peptides and displays 84 galactose residues (i.e., 42 disaccharides), the simplest model involves digalactose attached to 50% of the Thr and Ser residues in the linker peptides. Significantly, the glycosylation occupancy is very similar for the full-length enzyme and the two glycan-containing truncations (NT1 and CT), indicating that the three long linkers are relatively homogeneous in their glycosylation coverage (Figures S6-10).

Overall protein stability is increased by glycosylation

Glycosylation has been demonstrated to increase protein thermal stability (11, 13, 27). We utilized circular dichroism (CD) spectroscopy to evaluate the melting transition of natively glycosylated and non-glycosylated versions of CelA. We observed that non-glycosylated CelA has a melting transition that begins near 85°C, whereas the melting transition of natively glycosylated CelA is around 95°C (Figure 3A). In addition to thermal stability, glycosylation has also been demonstrated to confer proteolytic protection to many proteins and could thus be a commonly used mechanism to
protect cellulases, especially those with long linker peptides. Longer linker regions are especially susceptible to cleavage by protease and we have shown that O-linked glycosylation is concentrated in the linker regions of CelA. These flexible, extended regions are likely to be highly prone to proteolytic cleavage due simply to greatly increased accessibility. We conducted proteolytic digestion experiments with a thermostable metalloprotease, thermolysin from Bacillus thermoproteolyticus, to assess the potential role of glycans in the proteolytic stability of CelA. Thermolysin is a relatively promiscuous enzyme in that it has a recognition motif requiring a bulky hydrophobic or aromatic residue C-terminal to the scissile bond (Figures 4A and S11); motifs of this kind occur throughout the entire CelA sequence (28) (Figure S2). The presence of threonine-valine and threonine-alanine motifs throughout the linker sequences could allow for several possible scissile bonds in the exposed linkers themselves (Figure 4A). SDS-PAGE gel analysis at various time points during incubation with thermolysin demonstrated enhanced proteolytic resistance of the natively glycosylated CelA (Figures 4B and S12). Over the course of four hours, most of the enzyme remained in its native full-length conformation when glycosylated, whereas the non-glycosylated version of this enzyme was degraded within a few minutes of protease incubation. Additionally, the protease cleavage profile was dramatically different between these two enzymes. In both the C. bescii CelA (CbCelA) and E. coli CelA (EcCelA) protease cleavage reactions, regions of the protein remain visible on the SDS gel, indicating that the protein is not fully cleaved into short oligomeric peptides. It is likely that in the EcCelA, the folded protein domains are shielded from attack due to their lower degree of accessibility. 
The main electrophoretic bands, given their molecular weights, may correspond to the GH48, GH9-CBM3c, and the two CBM3b domains in EcCelA, but apparently correspond to more complex structures for CbCelA (Figures 4B and S3).

To understand in greater detail the mechanisms by which glycosylation increases CelA stability, we performed molecular dynamics (MD) simulations on (1) an isolated CelA linker in solution, and (2) two CBM3b domains connected by a CelA linker bound to a hydrophobic face of cellulose. CelA has three long linkers that are almost identical in composition (Figures 2A and S2). We focused on the middle linker between the two CBM3b domains (Linker 3). We considered the
linker in three different states: (1) non-glycosylated, corresponding to an E. coli expressed CelA, (2) the “base case” corresponding to the best approximation of native glycosylation in CbCelA, given the aforementioned glycomics data (with the assumption that every other Thr or Ser is an attachment site for -1,2 linked di-galactose), and (3) “doubly glycosylated”, wherein digalactose is attached to every Thr and Ser in the linker (Figure 4A, Figure S13). On the surface of cellulose, the fluctuations of CBM3b (as measured by root mean squared fluctuations, RMSF) tend to be dampened with increased linker glycan content, an effect likely to promote increased thermostability. Figure 3C demonstrates this trend for the first CBM3b. This trend is less dramatic (though it still holds) for the second CBM3b (Figure S14).

Regarding proteolytic stability, linker glycans have two competing impacts. First, linker glycans tend to extend the linker region, as demonstrated by both increased end-to-end length and radius of gyration (both in solution and on the cellulose surface, Figure S15). This extending effect has been previously demonstrated via MD simulations with the fungal cellulase T. reesei Cel7A, both in solution (16) and on the surface of cellulose (15). The current result is informative because it demonstrates that the effect holds with different glycans (di-galactose in CelA vs. mannose and N-acetylglucosamine in Cel7A) and with the differing chemical nature of bacterial linkers (longer and richer in alternating Pro and Thr) and fungal linkers (shorter, richer in Thr, often displaying several consecutively, and Ser). The second effect is that of “shielding” the linker peptide from exposure to solvent and, more importantly, to proteases. This can be quantified by a reduced solvent-accessible surface area (SASA, Figure S16).
This trend also holds when one considers a probe size corresponding to the width of the active site of thermolysin (~3.36 Å, see Figure 4C inset), thus producing an “enzyme accessible surface area” (EASA). In both calculations, only the linkages susceptible to protease attack are included: the backbone C atoms of the cut sites (Thr162, Thr168, Thr182, Thr186, Pro199, and Val200; Figure 4A) and the backbone nitrogen of the following residue. Figure 4C quantifies the proteolytic protection afforded by the linker glycans. This indicates that the shielding effect of the linker glycans seems to predominate over their seemingly “opposite” propensity to extend the molecule. However, the extension of the linker could also protect against proteolytic cleavage, as inferred from a recent computational study suggesting that extension of the linker impairs binding by the catalytic site of proteases (29).
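The two extension measures used in this section, end-to-end length and radius of gyration, can each be computed from a single set of linker coordinates. The following is a minimal sketch under stated assumptions: the function names, the unweighted (equal-mass) form of the radius of gyration, and the toy coordinates are illustrative only and are not taken from the simulations reported here.

```python
import math

def end_to_end_distance(coords):
    """Distance between the first and last points of a chain (units of coords)."""
    return math.dist(coords[0], coords[-1])

def radius_of_gyration(coords):
    """Unweighted radius of gyration: RMS distance of the points from their centroid."""
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    mean_sq = sum(math.dist(p, centroid) ** 2 for p in coords) / n
    return math.sqrt(mean_sq)

# Hypothetical coordinates (not simulation data): an extended chain and a compact one.
extended = [(float(i), 0.0, 0.0) for i in range(5)]  # five points on a straight line
compact = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0),
           (0.0, 1.0, 0.0), (0.0, 0.0, 0.0)]        # chain folded back onto its start

for name, chain in (("extended", extended), ("compact", compact)):
    print(name, end_to_end_distance(chain), round(radius_of_gyration(chain), 3))
```

In an actual analysis these quantities would be averaged over trajectory frames, and the radius of gyration would normally be mass-weighted; both refinements are omitted here for brevity.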


Glycosylation is essential for the high activity of CelA

We performed enzymatic digestions of the model substrate, Avicel, with CbCelA, EcCelA, and their respective truncations. Glycosylation appears vital to the high cellulolytic activity of CbCelA: natively glycosylated CelA (CbCelA) converts 77% of Avicel to simple sugars after 96 h, whereas CelA expressed in E. coli (EcCelA) reaches only 55% conversion (Figure 5A). The improvement in the final extent of conversion was facilitated by the much faster rate of conversion in the initial hours of the digestion. Similarly, NT1 (a processive endoglucanase and the most active fragment of CelA) demonstrates higher conversion when glycosylated (66%) than its non-glycosylated counterpart (38%) after 96 h (Figure 5C). In this case, the initial rates of conversion of the glycosylated and non-glycosylated variants were more similar than for full-length CelA, with the difference in final extent of conversion driven by a higher rate of conversion for glycosylated NT1 throughout the course of digestion. Enzymatic digestions by CT1 (possessing only exoglucanase activity) show distinctly lower conversions than the full-length enzyme and NT1 (Figure 5B), possibly resulting from an inability to produce new chain ends. Interestingly, glycosylated CbCT1 performed more poorly (11% conversion) than EcCT1 (23% conversion) after 96 h (Figure 5B), which shows that the impact of glycosylation depends on the mode of action of the enzyme. The NT2 (GH9-CBM3c) construct displays the least variation due to expression host and shows similar levels of performance, most likely because this region of CelA does not appear to be natively glycosylated (Figures 5D and 2B).
Given that the glycosylated and non-glycosylated CelA enzymes are still stable at 75°C, there are several factors that could explain the difference in activity: (1) differences in the dynamics of folded domains and the linker peptides, (2) differences in the separation of the domains – a characteristic likely important for optimal activity, and/or (3) differences in substrate binding. MD simulations indicate that the dynamics and the elongation of CelA change with the degree of glycosylation; however, the possibility remains that activity is also impacted by differential substrate binding by the glycosylated and non-glycosylated forms of CelA.
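The 96 h Avicel conversions quoted in this section can be tabulated to make the effect of the expression host explicit. The sketch below only restates the percentages reported in the text; the dictionary layout and the function name are illustrative assumptions.

```python
# Reported Avicel conversion (%) after 96 h:
# (C. bescii expressed, E. coli expressed), as quoted in the text.
CONVERSION_96H = {
    "CelA": (77, 55),
    "NT1": (66, 38),
    "CT1": (11, 23),
}

def glycosylation_effect(conversions):
    """Return {construct: (difference in percentage points, Cb/Ec ratio)}."""
    return {
        name: (cb - ec, round(cb / ec, 2))
        for name, (cb, ec) in conversions.items()
    }

for name, (delta, ratio) in glycosylation_effect(CONVERSION_96H).items():
    print(f"{name}: {delta:+d} percentage points, Cb/Ec ratio {ratio}")
```

A ratio above 1 indicates that the natively glycosylated construct reached a higher final conversion; CT1 is the only construct listed for which the ratio falls below 1.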


Glycosylation modulates binding to cellulose and lignin

Efficient binding to biomass is necessary, especially at high temperatures and low substrate loadings. However, at high solids loadings, higher substrate binding affinity can decrease overall activity, as it is thought to slow the dissociation of the unproductively bound cellulase (and its subsequent productive reattachment to substrate). Additionally, if binding affinity to non-substrate moieties, such as lignin, is too high, this may also result in attenuated activity as the enzyme becomes stuck in a non-productive binding mode. We conducted binding assays with the insoluble substrates Avicel and lignin and found that non-glycosylated versions of CelA had a greater propensity to remain associated with both substrates after incubation (Figure 6). After incubation with lignin, 86% of glycosylated CelA was released, whereas only 45% of non-glycosylated CelA was released. The same trend holds for Avicel, where the released fractions were 26% and 10%, respectively. This indicates that some of the decrease in hydrolytic activity displayed by non-glycosylated CelA may be attributed to non-productive, inhibitory binding to cellulose and lignin. We believe that rather than being a detrimental feature, this decreased adsorption propensity likely reflects more dynamic binding by glycosylated CelA, in which the cellulase can hydrolyze substrate, dissociate, and search for more substrate. It has been observed experimentally that cellulases with reduced affinity for substrate do not necessarily display decreased activity (30), and it has more recently been shown that there is an optimal degree of binding affinity of cellulases for their substrate, consistent with the Sabatier principle (31). This is most likely why glycosylated CelA outperforms its non-glycosylated counterpart in both initial rate of conversion and final extent of substrate conversion (Figure 5).
These results, for cellulose, differ from previous studies conducted on T. reesei Cel7A, where glycosylation was shown to increase binding affinity and most likely activity (15, 32). However, the location and composition of the glycosylation in that enzyme are very different from those identified in our study. Indeed, the glycans found on CelA are uniquely galactose disaccharides, whereas the glycosylation of other cellulases reported in the literature was composed of mixed glycans. Additionally, we hypothesized that another reason for this difference could be the inherent hydrophobicity of the protein backbone of CelA. Therefore, to further investigate the differences in binding, especially to lignin, we also
conducted simulations using the protein design software Rosetta (33-35) to determine the potential presence of hydrophobic patches on CelA, focusing on the linker peptides where the glycosylation is found. We followed our previous work correlating the presence of hydrophobic patches on biomass degrading enzymes with their propensity to bind non-specifically to lignin (36, 37). Since CelA is a multifunctional flexible protein, we generated several distinct conformations of the enzyme and focused on four of them to conduct the analysis. Figure S19 shows the location of the hydrophobic patches on the linkers for the different conformations. These locations change as a function of the conformations that the linkers adopt and can be, in some cases, quite substantial. We also report the highest hydrophobic patch score for each linker and the total of the highest scores for each linker. The total hydrophobic patch scores are in the range of those we have previously found sufficient to induce non-productive binding to lignin (Figure S20). All of the patches located on the linkers could be more exposed in the absence of glycosylation, which could lead to increased binding to lignin or cellulose through hydrophobic interactions.

Conclusion

Detailed understanding of glycosylation by bacteria has lagged behind that of eukaryotic systems. To address this gap, we present a detailed investigation into the identities and roles of glycosylation in the most active cellulase yet discovered. The pattern of glycosylation discovered here is unique among previous studies of eukaryotic and bacterial cellulase glycosylation systems. Combining experiments and computation, we find that glycosylation on the three intrinsically disordered regions of CelA plays key roles in modulating its proteolytic stability, thermodynamic stability, substrate binding, hydrolytic activity, and overall tertiary structure.
We also found, using a set of truncations of the full-length enzyme, that the glycosylation is evenly distributed on its long linker peptides. The collective effects of glycosylation provide this multifunctional cellulase with the ability to function optimally in the harsh and substrate-limited geothermal hot spring ecosystem. The understanding of bacterial cellulase glycosylation developed in this study opens the door to further studies of glycosylation in these systems, including more detailed analysis of
binding to biomass with varying degrees of lignin content or studies on the effect of different glycan decorations (with CelA expressed in heterologous hosts). It also motivates studies on the roles of glycosylation in more complex CAZyme systems such as the cellulosome of Clostridium thermocellum. Additionally, this study is relevant for applications in the biofuels and biomaterials industries, where better biomass degrading enzymes are needed. For example, nearly all industrial applications of secreted enzymes require access to hosts capable of large scale production yielding high titers of proteins. Challenges with the expression of bacterial glycosyl hydrolases, many of which are highly active or display diverse and important specificities, continue to limit the introduction of these enzymes into commercial markets. Enhancing our understanding of the roles played by glycans decorating bacterial enzymes will greatly facilitate the use of these diverse catalysts at large scale and help enable the sustainable production of biofuels and biochemicals.

Materials and Methods

Details about strains, media, and growth conditions, construction and transformation of CelA and CelA derivative expression vectors, and protein expression can be found in the supplementary information.

Protein purification and glycoprotein staining

Lysate from E. coli and concentrated broth from C. bescii were purified via immobilized metal affinity purification using a 5 mL HisTrap FF Crude column (GE Healthcare, Piscataway, NJ, U.S.A.). Once the protein was bound, Buffer B (Buffer A with 200 mM imidazole) was used to elute it. The recovered fractions were then purified further via size exclusion chromatography using a HiLoad 16/600 Superdex 200 prep grade column (GE Healthcare, Piscataway, NJ, U.S.A.). This step buffer exchanged the fractions into SEC Buffer (200 mM acetate, 100 mM NaCl, 10 mM CaCl2, pH 5.5).
Once collected, the recovered protein was concentrated using a 10 kDa spin concentrator (Sartorius, Stonehouse, UK). Final protein concentration was determined using a Pierce BCA protein assay (Pierce, Rockford, IL).


Purified proteins from C. bescii and E. coli were analyzed by SDS-PAGE using a 4-12% NuPAGE Bis-Tris Gel (Invitrogen, Carlsbad, CA, U.S.A.) run at 150 V for 55 min in MOPS SDS buffer. Glycosylated proteins were visualized with a Glycoprotein Staining Kit (Pierce Biotechnology, Waltham, MA, U.S.A.). Based on the periodic acid-Schiff method, this kit detects glycans by oxidizing their sugar moieties to aldehydes, which are then specifically stained magenta by formation of a Schiff base (38). Following glycoprotein staining, the gel was stained with Colloidal Blue (Invitrogen, Carlsbad, CA, U.S.A.) to visualize non-glycosylated proteins, which are stained blue.

Analysis of released O-glycans (β-elimination)

The peptides and glycopeptides were treated with a mixture of 50 mM NaOH solution and sodium borohydride (NaBH4) solution in 50 mM NaOH. The samples were heated to 45 °C for 18 h, then cooled, neutralized with 10% acetic acid, passed through a Dowex H+ resin column (Dow Chemical, Midland, MI, U.S.A.), and lyophilized. Borates were removed under a stream of nitrogen. The glycans were permethylated for structural characterization by mass spectrometry as reported previously (39, 40). Briefly, the dried eluate was dissolved in dimethyl sulfoxide and methylated using methyl iodide in a DMSO/NaOH mixture. The reaction was quenched with water, and the reaction mixture was extracted with methylene chloride and dried. The permethylated glycans were dissolved in methanol and crystallized with α-dihydroxybenzoic acid (DHBA, 20 mg/mL in 50% v/v methanol:water) matrix. Analysis of glycans present in the samples was performed in the positive ion mode by MALDI-TOF/TOF-MS using an AB SCIEX TOF/TOF 5800 mass spectrometer (Applied Biosystems, Foster City, CA, U.S.A.).

Monosaccharide composition analysis by HPAEC-PAD

The samples were dialyzed overnight against nanopure water in a cold room. The water was completely replaced four times during the dialysis period. After lyophilization, the glycoproteins were hydrolyzed with trifluoroacetic acid (2 N TFA for 4 h at 100 °C) for neutral and amino sugar analysis. A mix of neutral and amino sugar standards (Fuc, GalNAc, GlcNAc, Gal, Glc, and Man) in known amounts was hydrolyzed in the


same manner and at the same time as the samples. Four concentrations of the standard mixture were prepared to establish a calibration equation. The monosaccharides were analyzed by HPAEC using a Dionex ICS3000 system (Dionex, Sunnyvale, CA, U.S.A.) equipped with a gradient pump, an electrochemical detector, and an autosampler. The residues were separated on a Dionex CarboPac analytical column with an amino trap using nanopure H2O and NaOH as eluents. All methods were based on protocols described by Hardy and Townsend (41).

Glycosyl linkage analysis

For determination of sugar linkages, partially methylated alditol acetates were prepared from permethylated O-glycans with some modifications to the reported procedure (42). Briefly, permethylated glycans were hydrolyzed with 2 M TFA at 100 °C for 4 h, followed by reduction with 1% NaBH4 in 30 mM NaOH and acetylation with acetic anhydride/pyridine (1:1, v/v) at 100 °C for 15 min. The partially methylated alditol acetates thus obtained were analyzed by GC-MS.

Circular dichroism (CD)

CD measurements were carried out using a Jasco J-715 spectropolarimeter (Jasco, Easton, MD, U.S.A.) with a jacketed quartz cell with a 1.0 mm path length. The cell temperature was controlled to within ±0.1 °C by circulating 90% ethylene glycol through the CD cell jacket using a Neslab R-111m water bath (NESLAB Instruments, Portsmouth, NH, U.S.A.). The results were expressed as mean residue ellipticity [θ]mrw. The spectra obtained were averages of five scans and were smoothed using an internal algorithm in the Jasco software package, J-715 for Windows. Protein samples were studied in 20 mM sodium acetate buffer, pH 5.0, with 100 mM NaCl and 15 mM CaCl2 at a protein concentration of 0.5 mg/mL for all samples. For the analysis of thermostability, the temperature was increased from 25 to 105 °C with a step size of 0.2 °C and the signal was monitored at a wavelength of 222 nm.
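The conversion of a raw CD reading to mean residue ellipticity can be sketched as follows. This is a minimal illustration based on the standard textbook relation, not the routine in the Jasco software. The mean residue weight of 110 Da and the example readings are assumed placeholder values, and the midpoint helper (linear interpolation to the value halfway between the first and last signal) is only one simple way to summarize a melt at 222 nm, not necessarily the analysis used in this work.

```python
def mean_residue_ellipticity(theta_mdeg, mean_residue_weight, path_cm, conc_mg_per_ml):
    """Convert observed ellipticity (mdeg) to [theta]_mrw in deg cm^2 dmol^-1.

    Standard relation: [theta]_mrw = theta_mdeg * MRW / (10 * l_cm * c_mg_per_mL).
    """
    return theta_mdeg * mean_residue_weight / (10.0 * path_cm * conc_mg_per_ml)


def apparent_midpoint(temps_c, signal):
    """Temperature at which the signal crosses the value halfway between
    its first and last points, found by linear interpolation.

    Returns None if no crossing is found.
    """
    half = 0.5 * (signal[0] + signal[-1])
    for i in range(len(temps_c) - 1):
        s0, s1 = signal[i], signal[i + 1]
        if s0 != s1 and (s0 - half) * (s1 - half) <= 0:
            t0, t1 = temps_c[i], temps_c[i + 1]
            return t0 + (half - s0) * (t1 - t0) / (s1 - s0)
    return None


# Path length (1.0 mm = 0.1 cm) and concentration (0.5 mg/mL) follow the text;
# the reading of -20 mdeg and MRW of 110 Da are hypothetical.
mre = mean_residue_ellipticity(-20.0, 110.0, 0.1, 0.5)

# Hypothetical coarse melt curve at 222 nm (the real data use 0.2 °C steps).
tm = apparent_midpoint([25, 45, 65, 85, 105], [-10.0, -10.0, -8.0, -2.0, -2.0])
```

The same conversion applies point by point to a full temperature scan, so glycosylated and non-glycosylated samples measured at the same concentration and path length can be compared on a common scale.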
Thermolysin digestions

Digestions were performed at 70 °C in 200 µL solution (50 mM Tris-HCl, 0.5 mM CaCl2, pH 8.0) with an initial CelA concentration of 0.5 mg/mL and an initial Bacillus thermoproteolyticus thermolysin (Promega, Madison, WI, U.S.A.) concentration of 0.05 mg/mL. Twenty µL samples were taken at denoted time intervals and all samples were immediately quenched with 8 µL of


50 mM EDTA. Samples for each time point were analyzed by SDS-PAGE using a 4-12% NuPAGE Bis-Tris Gel (Invitrogen, Carlsbad, CA, U.S.A.) run at 150 V for 55 min in MOPS SDS buffer. The gel was visualized using a Power Stainer (Pierce Biotechnology, Waltham, MA, U.S.A.).

Enzymatic digestion assays and sugar release analysis

The activities of CelA and its relevant derivatives were determined through 1% Avicel digestions. Avicel was loaded into 2 mL screw-top vials. Hydrolysis was carried out in SEC buffer at a constant temperature of 75 °C with rotation. A final enzyme loading of 15 mg/g cellulose was used for CelA and the truncation mutants. All digestions were supplemented with 1 mg/g of β-glucosidase from Thermotoga maritima. Digestions were run for 96 h with sampling at 8, 24, 48, 72, and 96 h in triplicate. At each time point, samples were diluted by a factor of ten, immediately filtered through a 0.22 µm filter, and then refrigerated until subjected to monomeric sugar analysis. Monomeric sugars (glucose and xylose) and cellobiose were measured by HPLC/RID using an HPX-87H 7.8 x 300 mm i.d., 9 µm column (BioRad, Hercules, CA, U.S.A.) with an isocratic flow of 0.01 N H2SO4 at 0.6 mL/min for a total run time of 27 min using standard protocols (43). Standards and samples were injected onto the column at a volume of 20 µL, while the column and detector were maintained at 55 °C. Sugar standards used to construct calibration curves were purchased from Absolute Standards (Hamden, CT, U.S.A.).

SDS-PAGE binding assays

Pre-cast 3-8% SDS-PAGE gels (Life Technologies, Carlsbad, CA) were used for CelA to visualize proteins bound and unbound to both Avicel and lignin extracted from corn stover (the lignin extraction procedure can be found in Reference (6)). All gels were run at 200 V constant for 50 min in MOPS-SDS buffer.
For binding studies, 800 μg of protein was incubated with 20 mg biomass, corresponding to a process loading of 20 mg protein/g biomass, with biomass theoretically consisting of 50% cellulose; we therefore used 10 mg of total cellulose in the form of Avicel. For the lignin binding work, the same protein loading was used with 6 mg of total lignin (representing 30% of the mass associated with the biomass). The protein/lignin combinations were incubated at 75 °C for 30 min in 20 mM sodium acetate buffer with 5 mM calcium chloride at pH 5.5. After the incubation, both the Avicel and


lignin samples were centrifuged at 14,000 x g for five min and the supernatant containing the unbound protein was collected, while the Avicel and lignin pellets were washed an additional four times with the buffer described above. The desalted starting enzyme, the supernatant containing unbound proteins, and the lignin pellet containing protein bound to insoluble lignin were diluted with 4X LDS sample buffer (3:1 sample:buffer) and held at 100 °C for 10 min in preparation for SDS-PAGE. For the SDS-PAGE gel, CelA from either C. bescii (Figure S17a) or E. coli (Figure S17b) was electrophoresed after incubation, and the amount of protein lost through adsorption to these two substrates was quantified by densitometry against a concentration curve (Figure S18).

Molecular simulations

Molecular dynamics (MD) simulations were performed with two basic setups: linker 3 (see Figure 2A in main text for nomenclature) in solution (Figure S13A-C) and CBM3b - linker 3 - CBM3b (see Figure 1 in main text) on the surface of the hydrophobic face of cellulose (Figure S13D-G). The cellulose slab is three layers thick and twelve chains wide, and each chain has a degree of polymerization of forty. For each setup, three different levels of glycosylation were simulated: non-glycosylated (corresponding to the linker in E. coli expressed CelA variants), “base case” glycosylation (corresponding to the glycan content in natively expressed C. bescii CelA), and “doubly glycosylated” (containing twice as many glycans as the “base case”). In all cases, the glycans are α-1,2 linked di-galactose, as determined experimentally in this work. Base case glycosylation involves the attachment of glycans to every other threonine side chain in the linker region. Doubly glycosylated systems have glycans attached to every threonine in the linker region. The structures of the carbohydrate binding domains were obtained from homology models using SWISS-MODEL (44-48).
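The three glycosylation levels described above amount to a simple site-selection rule, which can be sketched as follows. This is illustrative only: the linker sequence is a made-up placeholder rather than the CelA linker 3 sequence, the level names are hypothetical labels, and starting the every-other-threonine pattern at the first threonine is an assumption.

```python
def threonine_positions(sequence):
    """Return 0-based indices of all threonine (T) residues."""
    return [i for i, residue in enumerate(sequence) if residue == "T"]


def glycosylation_sites(sequence, level):
    """Select di-galactose attachment sites for a given glycosylation level.

    level: "none"   -> no glycans (E. coli expressed variant)
           "base"   -> every other threonine (assumed to start at the first)
           "double" -> every threonine
    """
    thr = threonine_positions(sequence)
    if level == "none":
        return []
    if level == "base":
        return thr[::2]
    if level == "double":
        return thr
    raise ValueError(f"unknown glycosylation level: {level}")


# Hypothetical proline/threonine-rich linker, for illustration only.
linker = "PTPTPTATPTVTPTPT"
base_sites = glycosylation_sites(linker, "base")
double_sites = glycosylation_sites(linker, "double")
```

With an even number of threonines, as in this placeholder, the doubly glycosylated system carries exactly twice as many glycans as the base case; with an odd number the ratio is only approximately two.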
PDB entry 4b9f, a structure of the CBM3a in the main scaffoldin from Clostridium thermocellum with 54% identity, was used as a template for the CBM3b. PDB entry 2xfg, a structure of the CBM3c in CelI from Clostridium thermocellum with 60% identity, was used as a template for the CBM3c. The starting configurations for the surface simulations were constructed in the following manner. Two key tryptophan residues exposed on the exterior of each CBM3b (Trp775 and Trp830 on the first CBM3 and Trp974 and Trp1029 on the second) were situated with their side chains stacked on the hydrophobic face of cellulose. At time zero, the linker in each case is in contact with the cellulose surface. As a control, a fourth surface simulation


was also constructed with base case glycosylation wherein the linker is far from the cellulose surface at time zero (Figure S13F). We included this simulation 1) to ensure that starting the simulation with the linker on the surface of cellulose does not artificially affect the dynamics of the system and 2) to gain insight into the time scale for linker binding. Each system was solvated in a box of explicit water molecules. All systems were built utilizing CHARMM version 42b1 and the CHARMM36 force field for carbohydrate and protein associated with that version. Water was described with the TIP3P model (49). The systems were then minimized and equilibrated in CHARMM (50), and subsequent production runs were performed with constant volume and temperature (348 K) in NAMD (51). Linkers in solution were simulated for 500 ns of unrestrained molecular dynamics. Surface simulations were run for 340 ns, with the exception of the “base case” simulation that began with the linker unbound, which was run for 270 ns. The linker in this “control” simulation binds to the surface of cellulose approximately 180 ns into the simulation. From this point on, it behaves similarly to the “base case” simulation in which the linker began bound to the cellulose surface (more details about the molecular simulations can be found in the supplementary information).

Hydrophobic patch analysis

These analyses were conducted using the protein design software Rosetta (33-35), as done in Sammond et al. (36). The patches were then mapped onto the corresponding structures generated in the molecular simulations described above.

Abbreviations

(GlcpNAc) Methyl-D-glucopyranose
(GlcNAc) N-Acetylglucosamine
(Gal) Galactose
(CBM) Carbohydrate-binding module
(CAZymes) Carbohydrate-active enzymes
(PTM) Post-translational modifications
(CD) Catalytic domain
(CbCelA) C. bescii CelA
(EcCelA) E. coli CelA
(MD) Molecular dynamics


(CD) Circular dichroism

Supplementary information

Additional Materials and Methods, Tables S1-S3, Figures S1-S20.

Conflicts of interest

There are no conflicts to declare.

Acknowledgments

Funding was provided by the BioEnergy Science Center (BESC) and the Center for Bioenergy Innovation (CBI), from the U.S. Department of Energy Bioenergy Research Centers supported by the Office of Biological and Environmental Research in the DOE Office of Science. This research was also supported in part by the National Institutes of Health grants 1S10OD018530 and P41GM10349010 and the Chemical Sciences, Geosciences and Biosciences Division, Office of Basic Energy Sciences, U.S. Department of Energy grant (DE-FG02-93ER20097) at the Complex Carbohydrate Research Center. This work was authored in part by Alliance for Sustainable Energy, LLC, the Manager and Operator of the National Renewable Energy Laboratory for the U.S. Department of Energy (DOE) under Contract No. DE-AC36-08GO28308. The views expressed in the article do not necessarily represent the views of the DOE or the U.S. Government. The U.S. Government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for U.S. Government purposes.


Figures

Figure 1: C. bescii CelA bound to cellulose (green/red “sticks” representation). The catalytic domains (orange and maroon), carbohydrate binding modules (cyan and purple), and proline/threonine-rich linker peptides (gray) decorated with O-glycans (yellow). Domain organization of CelA and three functional truncation mutants (bottom). These constructs were expressed in both C. bescii and E. coli to obtain natively glycosylated and non-glycosylated variants, respectively. FL, full-length; CT, C-terminal truncation; NT1, N-terminal truncation with 2 CBM domains; NT2, N-terminal truncation with 1 CBM domain.

Figure 2: The linker peptides of C. bescii CelA are decorated with O-linked glycans. (A) Model of CbCelA in solution (domain coloring identical to Fig. 1), with di-galactose residues (yellow) decorating linkers 2-4. (B) CelA and most truncation mutants produced in C. bescii are glycosylated, as indicated by greater MW compared to E. coli variants and specific staining of glycoproteins (magenta) by periodic acid-Schiff staining on SDS-PAGE.

Figure 3: Glycosylation stabilizes CelA against thermal unfolding. (A) Characterization of secondary structure loss of “base case” (expressed in C. bescii) and “non-glycosylated” (expressed in E. coli) CelA at 222 nm as a function of temperature. (B) To probe the effects of glycosylation computationally, models of “Linker 3” (defined in Fig. 2A) and two CBM3b domains were constructed both in solution and bound to the surface of cellulose (shown). Three states of these models with varying degrees of glycosylation were constructed: “non-glycosylated” (corresponding to CelA expressed in E. coli), “base-case” approximating native glycosylation in C. bescii as deduced by glycomics, and “2x glycosylated” wherein every threonine on the linker is decorated with di-galactose. Shown here is the “base case” model. (C) Root mean square fluctuation (RMSF) of residues in the N-terminal CBM3b of the three models on the surface of cellulose demonstrates a stabilizing influence of the linker glycans.


Figure 4: Glycosylation protects CelA from protease attack. (A) The sequence of “Linker 3” depicting scissile bonds susceptible to Bacillus thermoproteolyticus thermolysin. Galactose disaccharides modeled in the “base case” are shown in yellow, whereas the “2x glycosylated” system includes di-galactose at every threonine (both yellow (“base case”) and maroon). (B) SDS-PAGE depicting CelA expressed in C. bescii and E. coli at varying time points after incubation at 70 °C with thermolysin. X is CelA subjected only to heat in the absence of thermolysin. (C) The “enzyme-accessible surface area” was calculated for the scissile bonds of the three states of the model both in solution and bound to the surface of cellulose. Inset: surface representation of thermolysin (PDB code 5UN3) with the active site residues in sticks.


Figure 5: (A-D) Avicel hydrolysis performance of CelA and functional truncation mutants expressed in C. bescii (“base case”) and E. coli (“non-glycosylated”). Error bars are computed as the standard error of the mean.

Figure 6: Fraction of CelA expressed in C. bescii (“base case”) and E. coli (“non-glycosylated”) left in the supernatant after incubation with lignin and Avicel at 75 °C for 30 minutes.
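The fraction of protein left in the supernatant, as quantified by densitometry against a concentration curve in the binding assays, could be computed along the following lines. This is a sketch under stated assumptions: a linear relation between band intensity and protein amount, and entirely hypothetical standards and band intensities. It does not reproduce the actual densitometry workflow or data behind Figures S17 and S18.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def protein_from_band(intensity, slope, intercept):
    """Invert the calibration line to estimate protein amount from a band intensity."""
    return (intensity - intercept) / slope


def fraction_unbound(band_supernatant, band_start, slope, intercept):
    """Protein remaining in the supernatant relative to the starting enzyme."""
    unbound = protein_from_band(band_supernatant, slope, intercept)
    start = protein_from_band(band_start, slope, intercept)
    return unbound / start


# Hypothetical concentration curve: protein loaded (µg) vs. band intensity (a.u.).
slope, intercept = fit_line([1.0, 2.0, 4.0, 8.0], [100.0, 200.0, 400.0, 800.0])

# Hypothetical band intensities for the supernatant and the starting enzyme.
remaining = fraction_unbound(300.0, 600.0, slope, intercept)
```

Working in estimated protein amounts rather than raw intensities matters whenever the calibration line has a non-zero intercept, since a ratio of raw intensities would then be biased.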


References

1. A. Lochner et al., Use of label-free quantitative proteomics to distinguish the secreted cellulolytic systems of Caldicellulosiruptor bescii and Caldicellulosiruptor obsidiansis. Appl Environ Microbiol 77, 4042-4054 (2011).
2. R. Brunecky et al., High activity CAZyme cassette for improving biomass degradation in thermophiles. Biotechnol Biofuels 11, 22 (2018).
3. Y. J. Bomble et al., Lignocellulose deconstruction in the biosphere. Curr Opin Chem Biol 41, 61-70 (2017).
4. R. Brunecky et al., Revealing nature’s cellulase diversity: the digestion mechanism of Caldicellulosiruptor bescii CelA. Science 342, 1513-1516 (2013).
5. J. Young, D. Chung, Y. J. Bomble, M. E. Himmel, J. Westpheling, Deletion of Caldicellulosiruptor bescii CelA reveals its crucial role in the deconstruction of lignocellulosic biomass. Biotechnol Biofuels 7, 142 (2014).
6. R. Brunecky et al., The Multi Domain Caldicellulosiruptor bescii CelA Cellulase Excels at the Hydrolysis of Crystalline Cellulose. Sci Rep 7, 9622 (2017).
7. D. Chung et al., Homologous expression of the Caldicellulosiruptor bescii CelA reveals that the extracellular protein is glycosylated. PLoS One 10, e0119508 (2015).
8. D. Shental-Bechor, Y. Levy, Effect of glycosylation on protein folding: a close look at thermodynamic stabilization. Proceedings of the National Academy of Sciences of the United States of America 105, 8256-8261 (2008).
9. A. Varki, Biological roles of glycans. Glycobiology 27, 3-49 (2017).
10. H. Nothaft, C. M. Szymanski, Protein glycosylation in bacteria: sweeter than ever. Nat Rev Microbiol 8, 765-778 (2010).
11. G. T. Beckham et al., Harnessing glycosylation to improve cellulase activity. Curr Opin Biotechnol 23, 338-345 (2012).
12. W. S. Adney et al., Probing the role of N-linked glycans in the stability and activity of fungal cellobiohydrolases by mutational analysis. Cellulose 16, 699-709 (2009).
13. L. Chen et al., Specificity of O-glycosylation in enhancing the stability and cellulose binding affinity of Family 1 carbohydrate-binding modules. Proceedings of the National Academy of Sciences of the United States of America 111, 7612-7617 (2014).
14. T. Jeoh, W. Michener, M. E. Himmel, S. R. Decker, W. S. Adney, Implications of cellobiohydrolase glycosylation for use in biomass conversion. Biotechnol Biofuels 1, 10 (2008).
15. A. Amore et al., Distinct roles of N- and O-glycans in cellulase activity and stability. Proceedings of the National Academy of Sciences of the United States of America 114, 13667-13672 (2017).
16. G. T. Beckham et al., The O-glycosylated linker from the Trichoderma reesei Family 7 cellulase is a flexible, disordered protein. Biophys J 99, 3773-3781 (2010).
17. R. Shogren, T. A. Gerken, N. Jentoft, Role of glycosylation on the conformation and chain dimensions of O-linked glycoproteins: light-scattering studies of ovine submaxillary mucin. Biochemistry 28, 5525-5536 (1989).
18. M. Langsford et al., Glycosylation of bacterial cellulases prevents proteolytic cleavage between functional domains. FEBS Letters 225, 163-167 (1987).
19. R. Hoiberg-Nielsen, P. Westh, L. K. Skov, L. Arleth, Interrelationship of steric stabilization and self-crowding of a glycosylated protein. Biophys J 97, 1445-1453 (2009).


20. V. Kayser et al., Glycosylation influences on the aggregation propensity of therapeutic monoclonal antibodies. Biotechnology Journal 6, 38-44 (2011).
21. G. J. Gerwig et al., The nature of the carbohydrate-peptide linkage region in glycoproteins from the cellulosomes of Clostridium thermocellum and Bacteroides cellulosolvens. Journal of Biological Chemistry 268, 26956-26960 (1993).
22. G. J. Gerwig, J. P. Kamerling, J. F. Vliegenthart, R. Lamed, E. A. Bayer, Primary structure of O-linked carbohydrate chains in the cellulosome of different Clostridium thermocellum strains. The FEBS Journal 196, 115-122 (1991).
23. G. J. Gerwig et al., Novel oligosaccharide constituents of the cellulase complex of Bacteroides cellulosolvens. The FEBS Journal 205, 799-808 (1992).
24. S. Pagès et al., Sequence analysis of scaffolding protein CipC and ORFXp, a new cohesin-containing protein in Clostridium cellulolyticum: Comparison of various cohesin domains and subcellular localization of ORFXp. Journal of Bacteriology 181, 1801-1810 (1999).
25. D. C. Irwin, S. Zhang, D. B. Wilson, Cloning, expression and characterization of a family 48 exocellulase, Cel48A, from Thermobifida fusca. Eur J Biochem 267, 4988-4997 (2000).
26. D. L. Watson, D. B. Wilson, L. P. Walker, Synergism in binary mixtures of Thermobifida fusca cellulases Cel6B, Cel9A, and Cel5A on BMCC and Avicel. Appl Biochem Biotechnol 101, 97-111 (2002).
27. A. N. Shirke et al., Comparative thermal inactivation analysis of Aspergillus oryzae and Thiellavia terrestris cutinase: role of glycosylation. Biotechnology and Bioengineering 114, 63-73 (2017).
28. B. Keil, Specificity of proteolysis. (Springer Science & Business Media, 2012).
29. E. T. Prates et al., The impact of O-glycan chemistry on the stability of intrinsically disordered proteins. Chemical Science 9, 3710-3715 (2018).
30. D. Gao et al., Increased enzyme binding to substrate is not necessary for more efficient cellulose hydrolysis. Proceedings of the National Academy of Sciences of the United States of America 110, 10922-10927 (2013).
31. J. Kari et al., Sabatier Principle for Interfacial (Heterogeneous) Enzyme Catalysis. ACS Catalysis, 11966-11972 (2018).
32. C. M. Payne et al., Glycosylated linkers in multimodular lignocellulose-degrading enzymes dynamically bind to cellulose. Proceedings of the National Academy of Sciences of the United States of America 110, 14646-14651 (2013).
33. R. Jacak, A. Leaver-Fay, B. Kuhlman, Computational protein design with explicit consideration of surface hydrophobic patches. Proteins: Structure, Function, and Bioinformatics 80, 825-838 (2011).
34. B. Kuhlman, D. Baker, Native protein sequences are close to optimal for their structures. Proceedings of the National Academy of Sciences 97, 10383 (2000).
35. C. A. Rohl, C. E. M. Strauss, K. M. S. Misura, D. Baker, in Methods in Enzymology. (Elsevier, 2004), vol. 383, pp. 66-93.
36. D. W. Sammond et al., Predicting enzyme adsorption to lignin films by calculating enzyme surface hydrophobicity. J Biol Chem 289, 20960-20969 (2014).
37. J. M. Yarbrough et al., New perspective on glycoside hydrolase binding to lignin from pretreated corn stover. Biotechnol Biofuels 8, 214 (2015).
38. M. Mantle, A. Allen. (Portland Press Limited, 1978).
39. A. Shajahan, C. Heiss, M. Ishihara, P. Azadi, Glycomic and glycoproteomic analysis of glycoproteins-a tutorial. Anal Bioanal Chem 409, 4483-4505 (2017).
40. A. Shajahan, N. T. Supekar, C. Heiss, M. Ishihara, P. Azadi, Tool for Rapid Analysis of Glycopeptide by Permethylation via One-Pot Site Mapping and Glycan Analysis. Anal Chem 89, 10734-10743 (2017).


41. M. R. Hardy, R. R. Townsend, [12] High-pH anion-exchange chromatography of glycoprotein-derived carbohydrates. Methods in Enzymology 230, 208-225 (1994).
42. W. S. York, A. G. Darvill, M. McNeil, T. T. Stevenson, P. Albersheim, in Methods in Enzymology. (Academic Press, 1986), vol. 118, pp. 3-40.
43. A. Sluiter et al., Determination of sugars, byproducts, and degradation products in liquid fraction process samples. National Renewable Energy Laboratory (2008).
44. S. Bienert et al., The SWISS-MODEL Repository—new features and functionality. Nucleic Acids Research 45, D313-D319 (2017).
45. N. Guex, M. C. Peitsch, T. Schwede, Automated comparative protein structure modeling with SWISS-MODEL and Swiss-PdbViewer: A historical perspective. Electrophoresis 30, S162-S173 (2009).
46. A. Waterhouse et al., SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Research (2018).
47. P. Benkert, M. Biasini, T. Schwede, Toward the estimation of the absolute quality of individual protein structure models. Bioinformatics 27, 343-350 (2010).
48. M. Bertoni, F. Kiefer, M. Biasini, L. Bordoli, T. Schwede, Modeling protein quaternary structure of homo- and hetero-oligomers beyond binary interactions by homology. Scientific Reports 7, 10480 (2017).
49. W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 79, 926-935 (1983).
50. B. R. Brooks et al., CHARMM: the biomolecular simulation program. J. Comput. Chem. 30, 1545-1614 (2009).
51. J. C. Phillips et al., Scalable molecular dynamics with NAMD. J. Comput. Chem. 26, 1781-1802 (2005).


Abstract Graphic

Synopsis: This article highlights the role of glycosylation in biomass degrading enzymes from cellulolytic bacteria and shows that it is essential to retain their peak activity and robustness needed for industrial applications.
