Article pubs.acs.org/ac
Characterizing the Range of Extracellular Protein Post-Translational Modifications in a Cellulose-Degrading Bacteria Using a Multiple Proteolytic Digestion/Peptide Fragmentation Approach
Andrew B. Dykstra,†,‡ Miguel Rodriguez, Jr.,† Babu Raman,§ Kelsey D. Cook,‡,∥ and Robert L. Hettich*,†
† Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
‡ University of Tennessee, Knoxville, Tennessee 37996, United States
§ Dow AgroSciences, Indianapolis, Indiana 46268, United States
*S Supporting Information
ABSTRACT: Post-translational modifications (PTMs) are known to play a significant role in many biological functions. The focus of this study is to optimize an integrated experimental/informatics approach to more confidently characterize the range of post-translational modifications of the cellulosome protein complex used by the bacterium Clostridium thermocellum to better understand how this protein machine is tuned for enzymatic cellulose solubilization. To enhance comprehensive characterization, the extracellular cellulosome proteins were analyzed using multiple proteolytic digests (trypsin, Lys-C, Glu-C) and multiple fragmentation techniques (collisionally activated dissociation, electron transfer dissociation, decision tree). As expected, peptide and protein identifications were increased by utilizing alternate proteases and fragmentation methods, in addition to the increase in protein sequence coverage. The complementarity of these experiments also allowed for a global exploration of PTMs associated with the cellulosome based upon a set of defined PTMs that included methylation, oxidation, acetylation, phosphorylation, and signal peptide cleavage. In these experiments, 85 modified peptides corresponding to 28 cellulosome proteins were identified. Many of these modifications were located in active cellulolytic or structural domains of the cellulosome proteins, suggesting a level of possible regulatory control of protein function in various cellulolytic conditions. The use of complementary proteolytic digestion/peptide fragmentation processes allowed for independent verification of PTMs in different experiments, thus leading to increased confidence in PTM identifications.
Received: November 18, 2012 Accepted: February 14, 2013 Published: February 14, 2013
© 2013 American Chemical Society
dx.doi.org/10.1021/ac3032838 | Anal. Chem. 2013, 85, 3144−3151
The role of post-translational modifications (PTMs) in influencing and controlling protein function has become increasingly evident as studies have linked PTMs to a wide variety of cellular activities, including protein activation,1 tumorigenesis,2 chemotaxis,3 redox signaling,4 and protein secretion into the extracellular matrix.5 Despite the increasing awareness of the importance of PTMs in biological systems, large-scale identification of protein modifications remains difficult. Traditional methods of PTM identification have relied on gel stains specific to particular modifications,6,7 immunochemistry,8 and selective mass spectrometry.9 However, these methods are typically limited by the requirement of purified proteins in quantities that are often not available,10 the limited specificity and availability of antibodies,11,12 and/or the capability to identify only a single targeted modification per experiment.13 Strides have been made in proteomics with respect to the identification of PTMs10 but in most cases are still limited in the range and confidence of PTM characterizations. While the genome sequences of many organisms have been completed, database searching is complicated by the wide variety of known PTMs, exemplified by the 955 known modifications currently deposited in the Unimod database.14 The addition of multiple PTMs to search parameters exponentially increases search space,15 which in turn increases search time as well as the false discovery rate of the results. To probe the PTM signature of a particular organism, a multifaceted approach must be utilized in order to increase the number of protein identifications, the number of modified peptide identifications, and confidence in these results. The value of using multiple proteases in proteomic experiments has been demonstrated in multiple studies.16−19 In addition to multiple proteases, the introduction of alternate peptide fragmentation methods has also given rise to another dimension in PTM identification. For example, electron-transfer dissociation (ETD)20,21 induces peptide fragmentation
that is distinct compared to collisionally activated dissociation (CAD).22 In addition to producing distinctive ions, ETD has also been shown to more effectively preserve labile PTMs that may be removed from a modified peptide when fragmented by CAD.23 Combination of these methods, either separately or in “decision tree” experiments24 can provide both redundancy and complementarity, both of which enhance the confidence of assignments. Even when a particular assignment derives from only one fragmentation method (CAD or ETD), the assignment can be considered to be confirmed by multiple fragmentation techniques if it is observed in both decision tree and single-mode (CAD or ETD) experiments because of the protocol differences. Though this distinction is subtle, it is important for increasing overall confidence. Clostridium thermocellum is an organism of particular interest to bioenergy research due to its ability to solubilize cellulosic biomass into carbohydrates used for biofuel production.25 Key to microbial cellulose utilization in C. thermocellum is the cellulosome, an extracellular protein machine consisting of proteins with a variety of enzymatic functions such as endoglucanases, exoglucanases, cellobiohydrolases, xylanases, etc.26−28 During exponential growth, the cellulosomes are thought to be attached to C. thermocellum; however, as the cells approach stationary growth phase, the cellulosomes begin releasing from the cell surface.29 Up to nine different enzymes can attach to a single scaffoldin protein CipA in the cellulosome through high-affinity type I dockerin−cohesin interactions.30,31 In turn, CipA binds to proteins anchored into the cell surface through type II dockerin−cohesin interactions.32 The utilization of these cellulosome complexes by C. 
thermocellum results in one of the highest observed growth rates on cellulosic biomass.33 While the use of multiple proteases and fragmentation methods has been previously demonstrated in proteomic studies, many of these experiments have been focused on particular subsets of PTMs, such as phosphorylation. Studies directed toward understanding the biological implications of PTMs are far from routine, and PTM analysis ranges considerably in quality and depth in typical proteomics workflows. In this study, complementary proteases and fragmentation methods were used as an enabling platform for integration with advanced bioinformatics methods to more confidently and robustly identify PTMs. Herein we present a systematically optimized and integrated experimental/informatic approach for the broad characterization of PTMs associated with C. thermocellum cellulosome proteins, with the aim of investigating the range of modifications present in this important protein complex. Better knowledge of the PTMs associated with the cellulosome is needed to improve understanding of both the structure and function of the protein machine. This in turn would have broad implications in bioenergy research, potentially enabling more detailed computational cellulosome modeling,34 engineering of artificial cellulosomes,35−37 structural characterization of the cellulosome,38 and cellulosome metabolic activity.39 To that end, this study seeks first to optimize and implement multiple fragmentation methods and complementary enzymes for a deeper dive into the organism’s proteome and/or a significant increase in sequence coverage of the extracellular proteins. Once the range of extracellular proteins has been identified, PTMs will be mined from these results to assess the utility of multiple fragmentation methods and enzymes for probing the PTM signature of the C. thermocellum cellulosome.
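The search-space growth noted in this introduction can be made concrete with a small counting sketch. The Python below is only an illustration under stated assumptions: the residue-to-modification tables loosely follow the modification classes named in this work, while the function `count_forms`, the dictionary layout, and the example peptide are hypothetical and are not part of the authors' workflow or of any search engine.

```python
from math import prod

# Number of variable modification states per residue for each class.
# The dict layout is an illustrative assumption, not a search-engine format.
METHYLATION = {"K": 3, "R": 3, "E": 1}          # mono/di/tri on K and R, mono on E
OXIDATION = {"C": 3, "M": 2, "W": 1, "Y": 1}    # mono/di/tri on C, mono/di on M
ACETYLATION = {"K": 1}
PHOSPHORYLATION = {"H": 1, "S": 1, "T": 1, "Y": 1}


def count_forms(peptide, *mod_classes):
    """Count candidate forms of a peptide (unmodified form included) when
    all listed modification classes are searched at the same time."""
    states = []
    for residue in peptide:
        options = 1  # the unmodified residue
        for mods in mod_classes:
            options += mods.get(residue, 0)
        states.append(options)
    return prod(states)


ALL_CLASSES = (METHYLATION, OXIDATION, ACETYLATION, PHOSPHORYLATION)
peptide = "MKYSEWTRK"  # hypothetical peptide, not taken from the study

# Searching one class at a time versus all classes together.
one_at_a_time = sum(count_forms(peptide, mods) for mods in ALL_CLASSES)
all_at_once = count_forms(peptide, *ALL_CLASSES)
print(one_at_a_time, all_at_once)
```

For this toy peptide, the four separate searches together consider 152 candidate forms (counting the unmodified form once per search), whereas a single combined search would consider 14 400, which is one way to see why restricting each search to a single modification class keeps the search space manageable.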
■ EXPERIMENTAL SECTION
Fermentation. C. thermocellum (wild-type strain ATCC 27405) was grown anaerobically on crystalline cellulose (Avicel PH105, FMC BioPolymer, Philadelphia, PA) as previously described.40 Details pertaining to fractionation of fermentation broth and protein isolation are described in the Supporting Information.
Protein Digestion. Isolated proteins were denatured and reduced in a solution of 8 M urea and 5 mM DTT in 100 mM Tris buffer for 1 h at room temperature with intermittent vortexing. The sample was then adjusted to 20 mM iodoacetamide and incubated in the dark for 15 min at room temperature to alkylate cysteine residues.41 Denaturing, reducing, and alkylating agents were then diluted, and proteins were digested with sequencing grade modified trypsin (Promega, Madison, WI), Lys-C (Roche, Mannheim, Germany), or Glu-C (Roche, Mannheim, Germany) for approximately 20 h. Further details pertaining to protein digestion are described in the Supporting Information.
Two-Dimensional Separations. Columns and nanospray emitters were prepared in-house, and details of these preparations are presented in the Supporting Information. An Ultimate 3000 HPLC (Dionex, Sunnyvale, CA) was employed for two-dimensional separations using three solvents: solvent A (95% HPLC grade water, 5% acetonitrile, 0.1% formic acid), solvent B (30% HPLC grade water, 70% acetonitrile, 0.1% formic acid), and solvent C (500 mM ammonium acetate in solvent A). A four-step, 7 h MudPIT42,43 gradient was utilized for separations. Following the MudPIT elution, two 30 min wash cycles were applied prior to running the next sample. The back column and emitter were connected to the other components of the MudPIT system using standard low-volume unions (Upchurch Scientific, Oak Harbor, WA).
LTQ-Orbitrap Hybrid Mass Spectrometry. An LTQ-Orbitrap-XL hybrid mass spectrometer with ETD44,45 capabilities (Thermo Scientific, Waltham, MA) was operated in data-dependent mode in these experiments.
The instrument was operated using a nanospray source (Proxeon Biosystems, Odense, Denmark) with a spray voltage of 3.80 kV. Precursor scans were performed in the Orbitrap in positive ion mode over the range of 400−1700 m/z at resolution 30 000. Fragment scans were performed in the LTQ with five data-dependent scans per MS1 scan. The dynamic exclusion46 list had a repeat count of 1, repeat duration of 30.0 s, an exclusion list size of 100, an exclusion duration of 60.0 s, and an exclusion mass width of 1.5 Da. For CAD experiments, the isolation width was 3.0 Da, the normalized collision energy was 35.0, the activation Q was 0.250, and the activation time was 30 ms. For ETD experiments, preview mode was enabled for high-resolution master scans, charge state screening was enabled, charge-state-dependent ETD time was enabled, and precursors with a +1 charge state were rejected. Isolation width was 3.0 m/z, activation time was 100 ms, and supplemental activation energy47 was enabled. Each digest was subjected to a CAD only run, an ETD only run, and a decision tree run. Technical replicates of each sample were measured in all cases.
Data Analysis. DTA Generator48 was used to extract .dta files from .RAW files and generate merged DTAs for database searching. Data were searched using OMSSA49 version 2.1.8 with average precursor ion m/z values, monoisotopic product ion m/z values, a precursor ion mass tolerance of 3 Da, a product ion mass tolerance of 0.5 Da, and three missed
cleavages allowed. For CAD searches, b- and y-ions were searched, and for ETD searches, c-, z•-, and y-ions were searched. A reversed database was concatenated to the C. thermocellum ATCC 27405 database (GenBank Accession No. CP000568.1) along with common contaminants, and peptides were filtered at a 1% false discovery rate (FDR) based on E-values.50 Detection of at least two peptides was required for each protein identification.51 All searches were performed with a fixed modification of carbamidomethylation.41 Six searches were performed with each data set: no modifications (other than carbamidomethylation of cysteine); mono- (K, R, E), di- (K, R), and tri- (K, R) methylation; mono- (C, M, W, Y), di- (C, M), and tri- (C) oxidation; acetylation (K); phosphorylation (H, S, T, Y); and computationally predicted cleavage of signal peptides using a database generated with SignalP-3.0.52,53 Identification of the new protein N-terminus was required for the identification of a protein as having its signal peptide cleaved. Modifications other than carbamidomethylation of cysteine and signal peptide cleavage were searched as variable modifications. Though searching for signal peptide cleavage is computationally different than searching for variable modifications, signal peptide cleavage is treated as a PTM in this discussion because identification of this modification would not be possible using conventional database searches. These targeted modifications were selected to represent those prevalent in bacterial systems.54 A strategic exception was phosphorylation, which was included as a test for false positives; the degree of phosphorylation should be minimal for these extracellular fractions.55 Modifications were searched one at a time to minimize search space. Ambiguous identifications (i.e., cases where a single peptide tandem mass spectrometry (MS/MS) spectrum was found to be consistent with two or more possible modifications) were excluded from the results. Surviving identifications were merged, and a modification site was confirmed if it was identified in experiments using at least two of the nine different conditions (three fragmentation methods each performed on three different digests). Venn diagrams presented in the Results and Discussion section were created using Venn Diagram Plotter from Pacific Northwest National Laboratory, available at omics.pnl.gov.
■ RESULTS AND DISCUSSION
Identification of Unmodified Peptides and Proteins. In order to establish that the use of multiple fragmentation methods and proteolytic digests was a viable approach for deepening the proteome measurement and thereby enhancing characterization of possible PTMs, the identification of unique unmodified peptides was compared initially across the different conditions. The Venn diagrams in Supporting Information Figure S1 illustrate the 18 688 unmodified peptides identified as a function of fragmentation technique and enzymatic digest, respectively. Supporting Information Figure S1A reveals that 63% of these unmodified peptides were identified with more than one fragmentation technique. While this high degree of redundancy is somewhat expected for the use of CAD and ETD in the decision tree experiments, there is also clear complementarity. The identification of peptides with CAD (or ETD) in decision tree experiments but not in a CAD- (or ETD)-only experiment has been previously reported24 and may be the result of sampling variability during chromatographic separations of different experiments. Notable in these results was the fact that, although only 20% of all peptides were identified with ETD (reflecting its intrinsic selectivity), this includes 193 (1%) identified only with ETD. That so few peptides are identified only with ETD is most likely due to a combination of lower duty cycle with ETD experiments, relatively low efficiency of the chemical fragmentation, and scrambling of c- and z•-fragments to c•- and z-fragments due to supplemental activation energy,56 which could confound the peptide identifications.
Much less redundancy was observed when analyzing peptides identified with different enzymatic digests (Supporting Information Figure S1B). The redundancy between trypsin and Lys-C is reasonable considering that both enzymes cleave peptides at the C-terminus of lysine. That this redundancy is relatively small, whereas many peptides are identified with only one or the other enzyme, is also reasonable in light of the fact that trypsin also cleaves at the C-terminus of arginine. Similarly, the orthogonality of Glu-C digested peptides is consistent with cleavage only by Glu-C at the C-terminus of glutamic acid.
When results from all fragmentation methods and enzymes were merged, a total of 1058 distinct unmodified proteins were identified. Considering the fragmentation techniques separately, the number of proteins identified averaged for all enzymatic digests was 543, 210, and 587 for CAD, ETD, and decision tree, respectively. Some of these are intracellular proteins found in the extracellular fraction, presumably as a result of unavoidable cellular lysis. Corresponding Venn diagrams for protein identifications as a function of fragmentation or digestion method are presented as Supporting Information Figure S2. Note that we detected significant levels of 60 proteins with cohesin or dockerin domains associated with the extracellular cellulosome complex in these samples, suggesting that there is a reasonable amount of cellulosome release from active cells even during exponential growth. These cellulosome-linked proteins are the focus of the remainder of the discussion.
The detailed sequence coverage for the detected cellulosome proteins is presented in Supporting Information Table S1. The sequence coverage for the 48 cellulosome proteins identified with conventional CAD/trypsin experiments ranged from 3% to 72%, with an average of 29 ± 17% (averaged across the 48 proteins). When results from all fragmentation methods and digests were considered, substantial gains in sequence coverages were achieved (averaged across the 48 proteins). Significantly, an additional 12 proteins were identified, so that overall 60 cellulosome proteins were identified, with an average sequence coverage of 40 ± 23%. When sequence coverage of the 48 cellulosome proteins identified with CAD/trypsin experiments is compared to results for the same proteins from Lys-C and Glu-C experiments, sequence overlap (defined as the percentage of amino acid sequence identified with at least two enzymatic digests) ranging up to 63% was observed, with an average sequence overlap of 21 ± 16% (averaged across the 48 proteins). This enhanced coverage, combined with the substantial sequence coverage overlap afforded by multiple enzymes, should increase the probability of identifying PTMs.
Identification of Modified Peptides and Proteins. Having established the viability of the multimethod approach for unmodified proteins from C. thermocellum, the next step was to search the data to identify PTMs. A total of 1353 MS/MS spectra from all experiments were identified with modified peptides that corresponded to cellulosome proteins. Of these, 1239 scans (∼92%) could be assigned to the 85 modified peptides (each was manually verified) presented in Supporting Information Table S2. The remaining 114 peptides (∼8%) also
passed the 1% peptide-level FDR filter and showed correspondingly low, OMSSA-assigned E-values (ranging as low as 9.3060 × 10⁻¹³), indicating high confidence in the peptide identification. Nevertheless, these scans did not meet the demanding redundancy criterion for confirmation as a modified peptide outlined above (detection by at least two methods). The fact that such a small percentage of scans identified as modified were not corroborated by additional experiments suggests high confidence in the assignment of peptides used to identify the modified proteins in Supporting Information Table S2.
Figures 1 and 2 provide two illustrations of the multimethod confirmation of modifications. In Figure 1, the decision tree method identified dimethylation at residue K652 of the CelK enzyme from a Lys-C digest. The +4 form of the peptide was identified with ETD (Figure 1A), and the +2 form of the peptide was identified with CAD (Figure 1B). This modification was also observed in a straight CAD experiment on the same digest (data not shown), thus providing confirmation under at least two different experimental conditions. Figure 2 illustrates identification of an oxidized residue, M1833 of protein CipA, in both an ETD run of a Glu-C digest (Figure 2A) and a CAD run of a trypsin digest (Figure 2B). In this case, the charge states, the fragmentation methods, the enzymes used for digestion, and the lengths of the peptides were different, but the same modification was identified. The identification and verification of this modification in independent experiments greatly increases the confidence that this particular methionine was oxidized.
Figure 1. Modified peptide (CelK dimethylation of K652) identified with both ETD (A) and CAD (B) in a decision tree experiment.
Figure 2. Modified peptide (CipA M1833) identified with both a Glu-C/ETD experiment (A) and a trypsin/CAD experiment (B).
Figure 3 summarizes the overall performance of the methods for the cellulosome protein CipA. Overall, the sequence coverage increased from 32% using conventional CAD/trypsin experiments to 48% when data from additional fragmentation methods and digests were included. Particularly noteworthy are the eight identified PTMs (highlighted by circling). The modification at residue K663 in Figure 3B is marked “Acetylation/Trimethylation” due to the uncertainty of unambiguously distinguishing these two species under the experimental conditions used here. It is interesting to note that for CipA, all peptides containing modification sites (excluding the new N-terminal peptide resulting from signal peptide cleavage) were identified both as unmodified peptides in CAD/trypsin experiments as well as modified peptides in subsequent searches. This suggests that CipA exists as isoforms in a single sample, possibly indicating the presence of different cellulosome combinations containing either modified or unmodified CipA in the cellular sample; however, the methodology used here was not designed to examine this matter at a higher level of detail.
Supporting Information Table S2 summarizes the methods used to confirm the eight CipA PTM sites described above and provides similar information for an additional 77 PTMs in 27 other cellulosome proteins. As indicated in the table, all but one of these proteins (Cthe_0660) was also detected in unmodified form. An additional 33 proteins detected only without modification are listed in Supporting Information Table S3. The 61 cellulosome proteins identified comprise ∼73% of the known cellulosome proteins. Nearly half of these (28) were identified with at least one modification. Signal peptide cleavage was prevalent across this set of proteins, consistent with the fact that signal peptides are indicative of proteins marked for export into the extracellular matrix, where these proteins function. With regard to the full set of modified peptides, methionine oxidation was especially prevalent, as expected since oxidation
is easily induced during sample preparation57 or by the voltage applied to the nanospray emitter.58 Consequently, the majority of these oxidations likely are not biologically relevant. Under these experimental conditions, it was not possible to parse out biologically functional methionine oxidations from those induced by experimental conditions; however, many (9 of the 26 detected instances) would have been undetected in a conventional one-method experiment. Oxidations observed on cysteine, tyrosine, and tryptophan residues, on the other hand, are less likely to be due to sample processing and most likely are related to biological function;59−61 11 of 25 detected instances would have been missed by conventional analysis. No phosphorylations were identified in any of the cellulosome proteins, as expected. In view of the extracellular function of cellulosome proteins, it may be surprising that there was no evidence of signal peptide cleavage for so many (six from Supporting Information Table S2, plus all 33 in Supporting Information Table S3). However, in order to verify signal peptide cleavage, identification of the protein’s computationally predicted new N-terminus was necessary. Failing to identify the new N-terminus could be the result of inaccurate prediction by the SignalP algorithm, poor ionization efficiency of the peptide containing the protein’s new N-terminus, or N-terminal PTMs that were not taken into consideration in this study. As a result, signal peptide cleavage could easily be missed. Considering the results above, one might wonder whether the large number of PTMs detected results from either the relative size or abundance of proteins. Larger proteins yield
more peptide fragments following digestion, and this could lead to the biased identification of more modified peptides. In addition, peptides corresponding to more abundant proteins will be sampled more during the course of an experiment, so the identification of a modification could be more likely for a peptide sampled more often. In fact, those cellulosome proteins with higher abundances in a previous report40 such as CipA, CelS, CelK, Cthe_0821, and XynC exhibit high degrees of modification in Supporting Information Table S2. However, as illustrated in Supporting Information Table S3, more proteins are identified without modifications than those identified with modifications in Supporting Information Table S2. The vast majority of proteins in Supporting Information Table S3 were identified using both multiple fragmentation methods and enzymatic digestions with no modifications, and combining these results with the high rate of modification observed in Supporting Information Table S2 and the 1% peptide-level FDR used for all searches indicates that the random identification of modifications was minimal. Though proteins identified as modified are primarily the more abundant cellulosome proteins,40 the impact of protein size was most likely minimal (modified proteins ranged in size from ∼28.5 to 248.0 kDa with an average of 95 ± 49 kDa compared to unmodified proteins, which ranged in size from 37.9 to 236.0 kDa with an average of 77 ± 36 kDa).
Further confidence in the assignment of the PTMs can be surmised from consideration of methionine oxidation. As noted above, this is a relatively common artifact deriving from sample treatment and/or the electrospray process itself. Nevertheless, only 26 of the 423 methionine amino acids found in the 28 proteins identified in Supporting Information Table S2 (average number of methionine sites per protein 15 ± 7) experience oxidation. In contrast, none of the 414 methionines in the 33 proteins listed in Supporting Information Table S3 was modified, even though the methionine content (averaging 13 ± 6 per protein) is similar to that for the proteins of Supporting Information Table S2. Were PTMs being identified randomly by the search algorithm, one would expect a similar degree of methionine oxidation in the two sets of proteins. Confidence in this assertion must be balanced by the observation that the average sequence coverage for the proteins in Supporting Information Table S3 (average sequence coverage 27 ± 18%, using all enzymes and fragmentation techniques) was less than half that for the proteins in Supporting Information Table S2 (average sequence coverage 56 ± 17%).
In total, the data presented in this manuscript provide a wealth of biological information for the C. thermocellum cellulosome. Of the 85 modification sites identified in Supporting Information Table S2, 22 correspond to signal peptide cleavage. The remaining 63 modification sites were compared to protein domains determined using the Pfam protein families database.62 Four of these modifications, including three tryptophan oxidations and one methionine oxidation, were unassigned to a particular domain. The remaining 59 modification sites corresponded to active cellulolytic domains of the protein such as glycosyl hydrolase, carbohydrate binding, cohesin, and dockerin domains. These results are presented in Supporting Information Table S2. Interestingly, modifications in structural proteins such as CipA, OlpB, Cthe_0735, and Cthe_0736 are found primarily in the cohesin domain, while modifications in cellulolytic enzymes such as CelA, CelS, CelK, CbhA, XynC, etc., are found primarily in the glycosyl hydrolase domains. These results suggest that many of the modifications identified in Supporting Information Table S2 may be custom-tuned for enzymatic activity in cellulose degradation, perhaps through modulating either enzymatic integration into the cellulosome’s scaffold or cellulase activity in general.
Specific focus on modifications of the key structural cellulosome protein CipA (presented in Figure 3) suggests an interesting biological role for controlling structure and/or function of this vital metabolic player. The tyrosine and tryptophan oxidations identified at Y1571 and W485, respectively, are likely the result of endogenous oxidative stress. Oxidation of these amino acids has also been linked to protein conformational changes, which may be linked to functionality of CipA.63 More intriguing CipA modifications identified in Figure 3 are the glutamic acid methylation at E1267 as well as the lysine trimethylation and/or acetylations identified at K80 and K663. Glutamic acid methylation has been shown to play a key role in chemotaxis for bacterial proteins,64−66 and it is possible that this particular modification directs CipA toward either carbohydrate substrates or enzymes related to carbohydrate degradation. Although trimethylations and acetylations could not be absolutely distinguished, lysine acetylation has been linked to a number of functions in bacterial proteins, including substrate binding and energy metabolism,67 and lysine trimethylation has been linked to changes in protein flexibility.68,69 Though additional experiments would be necessary to confidently assign this particular modification, it is likely that these two lysine residues play a key role in the functionality of CipA in cellulose degradation.
Figure 3. Sequence coverage of CipA, as identified by trypsin/CAD (A) and all fragmentations and enzymes (B). Also indicated are PTMs identified with at least two fragmentation methods and/or enzymes.
Notes
The authors declare no competing financial interest.
■ ACKNOWLEDGMENTS
Special thanks to Richard Giannone, Rachel Adams, Adriane Lochner, and Paul Abraham for technical and computational advice. This research was sponsored by the U.S. DOE BER, Bioenergy Research Program. Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the U.S. Department of Energy. Participation by K.D.C. while at the National Science Foundation was supported through the NSF Independent Research and Development program. This manuscript has been authored by UT-Battelle, LLC, under contract with the U.S. Department of Energy.
■ CONCLUSIONS
The use of complementary proteolytic digestion/peptide fragmentation processes, when integrated with modern informatics tools, provides a robust and more definitive platform for broad PTM characterization in microbial proteome measurements. Although two distinct peptides are generally adequate to identify a protein with database search algorithms, the small percentage of sequence afforded by two tryptic peptides yields little insight into protein function. The increase in sequence coverage resulting from the use of multiple fragmentation methods allows for the high-confidence identification of multiple PTMs, including methylations and/or acetylations that may have critical structural or functional significance to the cellulosome scaffoldin protein and the protein complex in general. Some of the more heavily modified proteins identified in this study such as CipA, CelK, and CelS are critical to the overall function of the cellulosome, and the extent of modification observed in these experiments suggests these modifications may impact the utilization of cellulose by the protein complex. Such results may ultimately guide biochemical validation and/or custom molecular design of engineered cellulosomes.
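As a rough illustration of how the sequence coverage and sequence overlap figures used throughout this work might be computed, the Python sketch below takes the peptides identified in each digest and reports the percentage of residues seen in any digest and in at least two digests (the definition of overlap given in the Results). It is an assumption-laden sketch rather than the authors' pipeline: the function names, the toy protein, and the peptide lists are hypothetical, and peptides are located by exact string matching, which counts every occurrence of a repeated sequence.

```python
def covered_positions(protein, peptides):
    """Return the set of 0-based residue indices covered by any peptide."""
    covered = set()
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            covered.update(range(start, start + len(pep)))
            start = protein.find(pep, start + 1)
    return covered


def coverage_and_overlap(protein, peptides_by_digest):
    """Percent of residues identified in >= 1 digest and in >= 2 digests."""
    per_digest = [covered_positions(protein, peps)
                  for peps in peptides_by_digest.values()]
    # For each residue, count how many digests covered it.
    counts = [sum(i in cov for cov in per_digest) for i in range(len(protein))]
    coverage = 100.0 * sum(c >= 1 for c in counts) / len(protein)
    overlap = 100.0 * sum(c >= 2 for c in counts) / len(protein)
    return coverage, overlap


# Toy 20-residue protein and hypothetical identifications (not study data).
protein = "ACDEFGHIKLMNPQRSTVWY"
peptides_by_digest = {
    "trypsin": ["ACDEFGHIK", "LMNPQR"],   # cleavage after K and R
    "Lys-C": ["ACDEFGHIK"],               # cleavage after K
    "Glu-C": ["FGHIKLMNPQRSTVWY"],        # cleavage after E
}
coverage, overlap = coverage_and_overlap(protein, peptides_by_digest)
print(coverage, overlap)
```

In this toy case every residue is seen in at least one digest, while the C-terminal five residues are seen only in the Glu-C digest, so coverage is 100% and overlap is 75%.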
■
ASSOCIATED CONTENT
S Supporting Information *
Supplemental methods; Figure S1, Venn diagram illustrating peptides identified as a function of fragmentation method and enzymatic digest; Figure S2, Venn diagram illustrating proteins identified as a function of fragmentation method and enzymatic digest; Table S1, sequence coverage of identified proteins; Table S2, modified cellulosome proteins identified, their function, experiments in which particular PTMs were identified, and corresponding Pfam domains; Table S3, unmodified cellulosome proteins identified, their function, and experiments in which they were identified. This material is available free of charge via the Internet at http://pubs.acs.org.
■
REFERENCES
(1) Appella, E.; Anderson, C. W. Eur. J. Biochem. 2001, 268 (10), 2764−2772.
(2) Edberg, D. D.; Bruce, J. E.; Siems, W. F.; Reeves, R. Biochemistry 2004, 43 (36), 11500−11515.
(3) Blair, D. F. Annu. Rev. Microbiol. 1995, 49, 489−522.
(4) Levonen, A. L.; Landar, A.; Ramachandran, A.; Ceaser, E. K.; Dickinson, D. A.; Zanoni, G.; Morrow, J. D.; Darley-Usmar, V. M. Biochem. J. 2004, 378, 373−382.
(5) von Heijne, G. J. Membr. Biol. 1990, 115 (3), 195−201.
(6) Steinberg, T. H.; Top, K. P. O.; Berggren, K. N.; Kemper, C.; Jones, L.; Diwu, Z. J.; Haugland, R. P.; Patton, W. F. Proteomics 2001, 1 (7), 841−855.
(7) Steinberg, T. H.; Agnew, B. J.; Gee, K. R.; Leung, W. Y.; Goodman, T.; Schulenberg, B.; Hendrickson, J.; Beechem, J. M.; Haugland, R. P.; Patton, W. F. Proteomics 2003, 3 (7), 1128−1144.
(8) Makita, Z.; Vlassara, H.; Cerami, A.; Bucala, R. J. Biol. Chem. 1992, 267 (8), 5133−5138.
(9) Gibson, B. W.; Cohen, P. Liquid secondary ion mass spectrometry of phosphorylated and sulfated peptides and proteins. In Methods in Enzymology; McCloskey, J. A., Ed.; Academic Press: San Diego, CA, 1990; Vol. 193, pp 480−501.
(10) Mann, M.; Jensen, O. N. Nat. Biotechnol. 2003, 21 (3), 255−261.
(11) Ivanov, S. S.; Chung, A. S.; Yuan, Z. L.; Guan, Y. J.; Sachs, K. V.; Reichner, J. S.; Chin, Y. E. Mol. Cell. Proteomics 2004, 3 (8), 788−795.
(12) Sheehan, K. M.; Calvert, V. S.; Kay, E. W.; Lu, Y. L.; Fishman, D.; Espina, V.; Aquino, J.; Speer, R.; Araujo, R.; Mills, G. B.; Liotta, L. A.; Petricoin, E. F.; Wulfkuhle, J. D. Mol. Cell. Proteomics 2005, 4 (4), 346−355.
(13) Hoffman, M. D.; Sniatynski, M. J.; Kast, J. Anal. Chim. Acta 2008, 627 (1), 50−61.
(14) Unimod Protein Modifications for Mass Spectrometry. www.unimod.org (accessed March 15, 2011).
(15) Schubert, P.; Hoffman, M. D.; Sniatynski, M. J.; Kast, J. Anal. Bioanal. Chem. 2006, 386 (3), 482−493.
(16) Dykstra, A. B.; Chen, M. L.; Cook, K. D. J. Am. Soc. Mass Spectrom. 2009, 20 (11), 1983−1987.
(17) Swaney, D. L.; Wenger, C. D.; Coon, J. J. J. Proteome Res. 2010, 9 (3), 1323−1329.
(18) van Montfort, B. A.; Doeven, M. K.; Canas, B.; Veenhoff, L. M.; Poolman, B.; Robillard, G. T. Biochim. Biophys. Acta, Bioenerg. 2002, 1555 (1−3), 111−115.
(19) Choudhary, G.; Wu, S. L.; Shieh, P.; Hancock, W. S. J. Proteome Res. 2003, 2 (1), 59−67.
(20) Coon, J. J.; Syka, J. E. P.; Schwartz, J. C.; Shabanowitz, J.; Hunt, D. F. Int. J. Mass Spectrom. 2004, 236 (1−3), 33−42.
(21) Syka, J. E. P.; Coon, J. J.; Schroeder, M. J.; Shabanowitz, J.; Hunt, D. F. Proc. Natl. Acad. Sci. U.S.A. 2004, 101 (26), 9528−9533.
(22) Kaiser, R. E.; Cooks, R. G.; Syka, J. E. P.; Stafford, C. J. Rapid Commun. Mass Spectrom. 1990, 4 (1), 30−33.
AUTHOR INFORMATION
Corresponding Author
*E-mail: [email protected]. Phone: 865-574-4968. Fax: 865-576-8559.
Present Address
∥National Science Foundation, Arlington, Virginia 22230, United States.
Author Contributions
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.
dx.doi.org/10.1021/ac3032838 | Anal. Chem. 2013, 85, 3144−3151
(23) Mikesh, L. M.; Ueberheide, B.; Chi, A.; Coon, J. J.; Syka, J. E. P.; Shabanowitz, J.; Hunt, D. F. Biochim. Biophys. Acta, Proteins Proteomics 2006, 1764 (12), 1811−1822.
(24) Swaney, D. L.; McAlister, G. C.; Coon, J. J. Nat. Methods 2008, 5 (11), 959−964.
(25) Demain, A. L.; Newcomb, M.; Wu, J. H. D. Microbiol. Mol. Biol. Rev. 2005, 69 (1), 124−154.
(26) Bayer, E. A.; Belaich, J. P.; Shoham, Y.; Lamed, R. Annu. Rev. Microbiol. 2004, 58, 521−554.
(27) Schwarz, W. H. Appl. Microbiol. Biotechnol. 2001, 56 (5−6), 634−649.
(28) Lamed, R.; Kenig, R.; Setter, E.; Bayer, E. A. Enzyme Microb. Technol. 1985, 7 (1), 37−41.
(29) Bayer, E. A.; Kenig, R.; Lamed, R. J. Bacteriol. 1983, 156 (2), 818−827.
(30) Bayer, E. A.; Shimon, L. J. W.; Shoham, Y.; Lamed, R. J. Struct. Biol. 1998, 124 (2−3), 221−234.
(31) Bayer, E. A.; Setter, E.; Lamed, R. J. Bacteriol. 1985, 163 (2), 552−559.
(32) Lemaire, M.; Ohayon, H.; Gounon, P.; Fujino, T.; Beguin, P. J. Bacteriol. 1995, 177 (9), 2451−2459.
(33) Lynd, L. R.; Weimer, P. J.; van Zyl, W. H.; Pretorius, I. S. Microbiol. Mol. Biol. Rev. 2002, 66 (3), 506 ff.
(34) Bomble, Y. J.; Beckham, G. T.; Matthews, J. F.; Nimlos, M. R.; Himmel, M. E.; Crowley, M. F. J. Biol. Chem. 2011, 286 (7), 5614−5623.
(35) Fierobe, H. P.; Bayer, E. A.; Tardif, C.; Czjzek, M.; Mechaly, A.; Belaich, A.; Lamed, R.; Shoham, Y.; Belaich, J. P. J. Biol. Chem. 2002, 277 (51), 49621−49630.
(36) Fierobe, H. P.; Mechaly, A.; Tardif, C.; Belaich, A.; Lamed, R.; Shoham, Y.; Belaich, J. P.; Bayer, E. A. J. Biol. Chem. 2001, 276 (24), 21257−21261.
(37) Fierobe, H. P.; Mingardon, F.; Mechaly, A.; Belaich, A.; Rincon, M. T.; Pages, S.; Lamed, R.; Tardif, C.; Belaich, J. P.; Bayer, E. A. J. Biol. Chem. 2005, 280 (16), 16325−16334.
(38) Xu, J.; Crowley, M. F.; Smith, J. C. Protein Sci. 2009, 18 (5), 949−959.
(39) Olson, D. G.; Tripathi, S. A.; Giannone, R. J.; Lo, J.; Caiazza, N. C.; Hogsett, D. A.; Hettich, R. L.; Guss, A. M.; Dubrovsky, G.; Lynd, L. R. Proc. Natl. Acad. Sci. U.S.A. 2010, 107 (41), 17727−17732.
(40) Raman, B.; Pan, C.; Hurst, G. B.; Rodriguez, M., Jr.; McKeown, C. K.; Lankford, P. K.; Samatova, N. F.; Mielenz, J. R. PLoS One 2009, 4 (4), e5271.
(41) Sechi, S.; Chait, B. T. Anal. Chem. 1998, 70 (24), 5150−5158.
(42) Washburn, M. P.; Wolters, D.; Yates, J. R. Nat. Biotechnol. 2001, 19 (3), 242−247.
(43) Wolters, D. A.; Washburn, M. P.; Yates, J. R. Anal. Chem. 2001, 73 (23), 5683−5690.
(44) McAlister, G. C.; Phanstiel, D.; Good, D. M.; Berggren, W. T.; Coon, J. J. Anal. Chem. 2007, 79 (10), 3525−3534.
(45) McAlister, G. C.; Berggren, W. T.; Griep-Raming, J.; Horning, S.; Makarov, A.; Phanstiel, D.; Stafford, G.; Swaney, D. L.; Syka, J. E. P.; Zabrouskov, V.; Coon, J. J. J. Proteome Res. 2008, 7 (8), 3127−3136.
(46) Spahr, C. S.; Davis, M. T.; McGinley, M. D.; Robinson, J. H.; Bures, E. J.; Beierle, J.; Mort, J.; Courchesne, P. L.; Chen, K.; Wahl, R. C.; Yu, W.; Luethy, R.; Patterson, S. D. Proteomics 2001, 1 (1), 93−107.
(47) Swaney, D. L.; McAlister, G. C.; Wirtala, M.; Schwartz, J. C.; Syka, J. E. P.; Coon, J. J. Anal. Chem. 2007, 79 (2), 477−485.
(48) Wenger, C. D.; Phanstiel, D.; Lee, M. V.; Bailey, D. J.; Coon, J. J. Proteomics 2011, 11, 1064−1074.
(49) Geer, L. Y.; Markey, S. P.; Kowalak, J. A.; Wagner, L.; Xu, M.; Maynard, D. M.; Yang, X. Y.; Shi, W. Y.; Bryant, S. H. J. Proteome Res. 2004, 3 (5), 958−964.
(50) Elias, J. E.; Gygi, S. P. Nat. Methods 2007, 4 (3), 207−214.
(51) Carr, S.; Aebersold, R.; Baldwin, M.; Burlingame, A.; Clauser, K.; Nesvizhskii, A. Mol. Cell. Proteomics 2004, 3 (6), 531−533.
(52) Erickson, B. K.; Mueller, R. S.; VerBerkmoes, N. C.; Shah, M.; Singer, S. W.; Thelen, M. P.; Banfield, J. F.; Hettich, R. L. J. Proteome Res. 2010, 9 (5), 2148−2159.
(53) Nielsen, H.; Engelbrecht, J.; Brunak, S.; von Heijne, G. Protein Eng. 1997, 10 (1), 1−6.
(54) Thompson, M. R.; Thompson, D. K.; Hettich, R. L. J. Proteome Res. 2008, 7 (2), 648−658.
(55) Stock, J. B.; Ninfa, A. J.; Stock, A. M. Microbiol. Rev. 1989, 53 (4), 450−490.
(56) Ledvina, A. R.; McAlister, G. C.; Gardner, M. W.; Smith, S. I.; Madsen, J. A.; Schwartz, J. C.; Stafford, G. C.; Syka, J. E. P.; Brodbelt, J. S.; Coon, J. J. Angew. Chem., Int. Ed. 2009, 48 (45), 8526−8528.
(57) MacCoss, M. J.; McDonald, W. H.; Saraf, A.; Sadygov, R.; Clark, J. M.; Tasto, J. J.; Gould, K. L.; Wolters, D.; Washburn, M.; Weiss, A.; Clark, J. I.; Yates, J. R., III. Proc. Natl. Acad. Sci. U.S.A. 2002, 99 (12), 7900−7905.
(58) Morand, K.; Talbo, G.; Mann, M. Rapid Commun. Mass Spectrom. 1993, 7 (8), 738−743.
(59) Miki, H.; Funato, Y. J. Biochem. 2012, 151 (3), 255−261.
(60) Rinalducci, S.; Murgiano, L.; Zolla, L. J. Exp. Bot. 2008, 59 (14), 3781−3801.
(61) Stadtman, E. R.; Levine, R. L. Ann. N. Y. Acad. Sci. 2000, 899, 191−208.
(62) Finn, R. D.; Mistry, J.; Tate, J.; Coggill, P.; Heger, A.; Pollington, J. E.; Gavin, O. L.; Gunasekaran, P.; Ceric, G.; Forslund, K.; Holm, L.; Sonnhammer, E. L.; Eddy, S. R.; Bateman, A. Nucleic Acids Res. 2010, 38 (Database Issue), D211−D222.
(63) Bourdon, E.; Blache, D. Antioxid. Redox Signaling 2001, 3 (2), 293−311.
(64) Ahlgren, J. A.; Ordal, G. W. Biochem. J. 1983, 213 (3), 759−763.
(65) Kleene, S. J.; Toews, M. L.; Adler, J. J. Biol. Chem. 1977, 252 (10), 3214−3218.
(66) Muppirala, U. K.; Desensi, S.; Lybrand, T. P.; Hazelbauer, G. L.; Li, Z. Protein Sci. 2009, 18 (8), 1702−1714.
(67) Zhang, J.; Sprung, R.; Pei, J.; Tan, X.; Kim, S.; Zhu, H.; Liu, C. F.; Grishin, N. V.; Zhao, Y. Mol. Cell. Proteomics 2009, 8 (2), 215−225.
(68) Taverna, S. D.; Li, H.; Ruthenburg, A. J.; Allis, C. D.; Patel, D. J. Nat. Struct. Mol. Biol. 2007, 14 (11), 1025−1040.
(69) Walter, T. S.; Meier, C.; Assenberg, R.; Au, K. F.; Ren, J.; Verma, A.; Nettleship, J. E.; Owens, R. J.; Stuart, D. I.; Grimes, J. M. Structure 2006, 14 (11), 1617−1622.