
Article pubs.acs.org/jpr

Automated Data Extraction from In Situ Protein-Stable Isotope Probing Studies

Gordon W. Slysz,† Laurey Steinke,‡ David M. Ward,§ Christian G. Klatt,§ Therese R. W. Clauss,† Samuel O. Purvine,† Samuel H. Payne,† Gordon A. Anderson,† Richard D. Smith,† and Mary S. Lipton*,†

† Pacific Northwest National Laboratory, Richland, Washington 99354, United States
‡ University of Nebraska Medical Center, Omaha, Nebraska 68182, United States
§ Montana State University, Bozeman, Montana 59715, United States

Supporting Information

ABSTRACT: Protein-stable isotope probing (protein-SIP) has strong potential for revealing key metabolizing taxa in complex microbial communities. While most protein-SIP work to date has been performed under controlled laboratory conditions to allow extensive isotope labeling of the target organism(s), a key application will be in situ studies of microbial communities for short periods of time under natural conditions that result in small degrees of partial labeling. One hurdle restricting large-scale in situ protein-SIP studies is the lack of algorithms and software for automated data processing of the massive data sets resulting from such studies. In response, we developed Stable Isotope Probing Protein Extraction Resources software (SIPPER) and applied it for large-scale extraction and visualization of data from short-term (3 h) protein-SIP experiments performed in situ on phototrophic bacterial mats isolated from Yellowstone National Park. Several metrics incorporated into the software allow it to support exhaustive analysis of the complex composite isotopic envelope observed as a result of low amounts of partial label incorporation. SIPPER also enables the detection of labeled molecular species without the need for any prior identification.

KEYWORDS: stable isotope probing, 13C labeling, carbon metabolism, proteomics, bioinformatics, metaproteomics



© 2014 American Chemical Society

INTRODUCTION

Stable isotope probing (SIP) is an analytical technique that follows the incorporation of heavy isotopes, typically 13C or 15N, into microbial cell components such as DNA, rRNA, mRNA, fatty acids, and proteins to gain insight into the taxa that metabolize different metabolites. The technique has proven useful in a number of research areas, such as bioenergy, bioremediation, and carbon cycling, seeking to exploit the functional roles of specific phylogenies in microbial communities to enhance fuel production and remediation strategies, among others.1−3 Recent reviews of the comparative advantages and disadvantages of the different types of SIP experiments1,4 show that protein-SIP has the potential for detecting very low label incorporation. Indeed, current evidence has suggested limits of 13C and 15N detection as low as 0.1%,5,6 more than 2 orders of magnitude more sensitive than DNA-SIP and RNA-SIP, which require at least 20% label incorporation.1

Processing data from protein-SIP studies is tedious, owing to the time-consuming task of manually extracting and validating isotopic profile information obtained from liquid chromatography−tandem mass spectrometry (LC−MS/MS) analyses of tryptic digest samples.7−9 A typical bioinformatic workflow9,10 begins with careful manual extraction of the observed isotopic profile for every peptide studied, ensuring that extracted

isotopic profiles are free of coeluting peptides that would otherwise confound 13C-incorporation calculations. Third-party software or Excel scripts are then used to calculate theoretical isotopic profiles, and Perl scripts9,11 or Excel-based macros10 perform the 13C-incorporation calculation on a per peptide basis.

The literature contains very few reports of in situ protein-SIP studies. Herbst et al.7 describe application of protein-SIP to elucidate in situ polycyclic aromatic hydrocarbon degradation. Microcosms consisting of carbon pellets were loaded with either 13C-naphthalene or 13C-fluorene and incubated for 100 days and 68 days, respectively, to achieve high 13C incorporation that could be quantified using methods and software previously established for processing 15N-labeled peptides that exhibit similar isotopic distributions. In another study, Pan et al.12 labeled acid-mine drainage biofilms grown in laboratory bioreactors with 15N-ammonium sulfate for 105−274 h to track ammonia flux into various taxa. In that study, the 15N-containing growth medium was carefully controlled, resulting in >80% 15N atom incorporation, giving a labeled isotopic profile that was clearly distinct from its unlabeled

Received: June 28, 2013. Published: January 27, 2014.

dx.doi.org/10.1021/pr400633j | J. Proteome Res. 2014, 13, 1200−1210

Journal of Proteome Research

Article

June 19. Cores were combined into vials (five per vial) in order to give ample amounts of material for mass spectral analysis, giving a total of four vials. Incubation solution was added to the cores (two vials with 12C and two vials with 13C) at 5:50 AM. Photosynthetically active radiation (in the 400−700 nm waveband) was monitored with a LI-192 irradiance sensor (LiCor, Lincoln, NE, USA). Light was 16.69 μM photons m−2 s−1 and increased by approximately 0.01 μEinstein per minute. Incubation was started at 6:01 AM, in an area of the pool measured to be 63 °C. At 6:57 AM light in this area was measured at 157.4 μM photons m−2 s−1. Replicates were stopped at 9:00 AM (light 895.5 μM photons m−2 s−1) by freezing in dry ice.

counterpart. This was followed by software-automated identification and quantification of unlabeled and labeled profiles to reveal 15N incorporation values.

While the level of sensitivity afforded by protein-SIP opens the door for short-term experiments performed in situ on uncultivated organisms at a field site, a major roadblock to such large-scale application of protein-SIP to microbial communities is the lack of automated software1,13 for processing the data. For the in situ short-term studies described here, a large fraction of the peptides remain unlabeled while a small percentage of peptides are labeled with small amounts of the heavy isotope. Such a scenario, as we discuss in this paper, leads to complex overlapping isotopic profiles composed of a significant portion of unlabeled peptide and smaller percentages of labeled peptides at various levels of isotope incorporation. Previous software by Pan et al., though capable of analyzing low amounts of labeling, did not meet our needs for processing data containing complex mixtures of labeled and unlabeled peptides.12,14

To process such data, we introduce a new stable isotope probing protein extraction resources tool (“SIPPER”) for large-scale automated data extraction from protein-SIP experiments, particularly those that occur on short time scales and involve overlapping composite labeled isotopic profiles. SIPPER provides a console for batch processing data sets, as well as a graphical user interface for visualizing results. The software automates a set of metrics that were determined by manually investigating the nature of 13C-labeling observed in one LC−MS data set from a short-term (3 h) in situ protein-SIP study of a phototrophic bacterial mat from Yellowstone National Park, which has been studied extensively with respect to community composition, structure and function,3,15 most recently using metagenomic,16 metatranscriptomic17,18 and metaproteomic19 approaches. These metrics allow SIPPER to support exhaustive analysis of the complex composite isotopic envelope observed as a result of low amounts of partial label incorporation, subsequently enabling the detection of labeled molecular species without the need for any prior identification. In addition to these metrics, we report initial demonstration of SIPPER for automated large-scale analysis of label incorporation in 120 data sets from the Yellowstone in situ protein-SIP study.



Protein Extraction, Digestion and Mass Spectral Analysis

All cores incubated with 12C buffer were combined and all cores incubated with 13C buffer were combined. The two sets were treated identically. Cores were lyophilized and ∼0.05 g of dry material was placed in a buffer consisting of 0.1 M triethyl ammonium bicarbonate (TEAB), 0.1% Triton X-100, 6 M guanidine, pH 8.5 with one Roche mini-protease inhibitor tablet/10 mL. Samples were then subjected to bead beating for two 30-s bursts in a Mini bead beater 8 (Biospec Products) set to maximize homogenization (3200 oscillations/min) using 425−600 μm acid-washed glass beads (Sigma). Homogenates were centrifuged at 11000 × g for 10 min to pellet insoluble material. Proteins from the resulting supernatants were precipitated with 5 volumes of 0.1 M ammonium acetate in 100% methanol overnight at −20 °C. Samples were spun 10 min at 800 × g at 4 °C, and the protein pellets were washed once in 0.08 M ammonium acetate in 80% methanol, 20% ddH2O. After centrifugation, the pellets were washed in 80:20 acetone:H2O (v/v). Pellets were resuspended in 5 mM TEAB, 8 M urea, 2% Triton X-100, 0.1% SDS, pH 8.5 and an aliquot was subjected to amino acid analysis to determine protein concentration.

Trypsin Digestion

Briefly, 250 μg of protein (in 40 μL) was reduced by treating for 1 h at room temperature with 50 mM Tris(2-carboxyethyl)phosphine (1 mM final) and the cysteines were blocked by treating for 10 min at room temperature with 200 mM methyl methane-thiosulfonate (0.125 mM final concentration). These samples were diluted 1:10 (up to 400 μL) with 50 mM TEAB, pH 8.5, and 25 μg of TPCK (L-1-tosylamido-2-phenylethyl chloromethyl ketone)-treated trypsin (Applied Biosystems) was added to each sample. Samples were incubated at 37 °C for 2.5 h, after which an additional aliquot of 50 μg of trypsin was added to the samples. The samples continued to incubate overnight at 37 °C, and then were dried using a Speed Vac.

EXPERIMENTAL METHODS

Field Site

Samples were collected from Mushroom Spring, Yellowstone National Park, MT (UTM Coordinates 463352 Easting, 4931811 Northing). The temperature of the source pool fluctuated from 67.1 to 66.4 °C when measured at 18:30 at the source pool on June 18, 2009.

Multidimensional Column Chromatography

Buffers

Prior to chromatography, a tryptic sample was resuspended in 250 μL of 10 mM KH2PO4 containing 25% acetonitrile, pH 3 (buffer A) and the pH adjusted to 2.7−3 with phosphoric acid. The sample was loaded onto a polySULFOETHYL A SCX column (100 × 4.6 mm; 5 μm; 300 Å; PolyLC Inc., Columbia, MD) and analyzed using a Magic 2002 HPLC (Michrom Biosciences Inc., Auburn, CA). Buffer B consisted of buffer A with 500 mM KCl, pH 3.0. The gradient was 135 min (200 μL/min) and consisted of a 45 min equilibration with buffer A, followed by 0−12% B for 15 min, 12% B−50% B for 45 min, 50% B−100% B for 15 min, then 100% B for 15 min. Fractions were collected every minute (200 μL/fraction). The majority of

NaH13CO3 and NaH12CO3 (58 mmol) in anoxic ddH2O were prepared in advance as stock solutions. On site, unfiltered spring water was made anoxic with nitrogen gas, after which the pH was adjusted to 0.95) elution correlation, with the exception of the final peak whose intensity is less than 0.2% of the most abundant peak of the isotopic profile. For QALAEEVAAEIK, shown in the lower half of Figure 3c, the chromatogram correlation data clearly shows

To be classified as unlabeled, the mass spectrum of the peptide was visually inspected by an expert and had to closely resemble the normalized theoretical profile generated by either DeconTools27,28 or similar software.37,38 Higher m/z ranges were visually searched for any isotopic peaks that may be members of the profile under evaluation. If no higher m/z peaks were observed at the m/z spacing expected for the isotopic profile, then the peptide was classified as “unlabeled”. Otherwise, if peaks were observed at higher m/z at the expected m/z spacing and these peaks displayed poor coelution with the other peak members of the isotopic profile, then the profile was classified as “inconclusive.”

Of the 1248 LC−MS features, 60 showed clear evidence of 13C labeling, 295 were clearly not labeled, and 893 were classified as inconclusive. The large number of inconclusive features reflects the rigidity of the manual analysis and the frequent presence of very low abundance peaks at intervals relevant to the peptide being evaluated. While the lack of a clear labeling trend in most of the inconclusive peptides suggests that most of these members were unlabeled, the spectra were not definitive enough for the peptide to be categorized as such.

Developing Metric Criteria for Automated Analysis of Protein-SIP Data

Given that manual analysis of identified LC−MS features of a single data set is labor intensive and tedious, the next step was to develop metrics that mimic many of the tasks involved in the manual analysis to enable automated implementation on a proteome scale. A central theme for these metrics is the exhaustive analysis of the fine structure of composite isotopic profiles observed in protein-SIP experiments. For example, in Figure 1 the simulated isotopic profile for the 4:1 mixture of 0 and 8% 13C-labeled peptide with the amino acid sequence “SAMPLERSAMPLER” has several key isotopic profile peaks at an intensity 0.75)


mixture of multiple pools of labeled peptide of a range of labeling amounts. The distribution of least-squares fit scores for unlabeled and labeled peptides (after subtracting unlabeled data; see Figure 6) were analyzed for the manually curated data (Supplemental Figure 2, Supporting Information). Fit scores averaged 0.25 for the 60 labeled peptides and the distribution was skewed toward lower (i.e., better) least-squares fit scores. For unlabeled peptides, least-squares fit scores were assigned only to peptides registering a sum of ratios score >0, which happens when the observed isotopic profile has peaks whose relative intensities measure greater than what is expected in the unlabeled theoretical isotopic profile. For these peptides, fit scores averaged 0.63, and the score distributions were centered around 0.5. The differences in the fit score distributions for unlabeled and labeled peptides provide an additional means of resolving the two groups.

Consecutiveness Score. The final metric is the “consecutiveness” score, which is the number of consecutive isotopic profile peaks of the labeled isotopic profile that coelute with the most abundant peak of the unlabeled profile. The consecutiveness score is most easily demonstrated by referring to Figure 6C, which displays a labeled composite isotopic profile after the unlabeled isotopic profile was subtracted out. In this example, there are seven contiguous isotopic peaks that have a chromatogram correlation >0.75. The average consecutiveness score for fraction 70 was 4.7 for the manually curated 13C-labeled peptides and 0.2 for manually confirmed unlabeled peptides.
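A minimal sketch of the consecutiveness score follows. It assumes the score is the longest run of adjacent isotopic peaks whose chromatogram correlation exceeds 0.75; the text does not say whether the run must start at a particular peak, so that reading is an assumption. The function name and the example correlation values are hypothetical and are not taken from SIPPER.

```python
from typing import Sequence


def consecutiveness_score(correlations: Sequence[float], threshold: float = 0.75) -> int:
    """Return the length of the longest run of adjacent peaks with correlation > threshold.

    `correlations` holds one chromatogram-correlation value per isotopic peak of the
    labeled profile, ordered by increasing m/z (hypothetical input layout).
    """
    best = 0
    run = 0
    for value in correlations:
        if value > threshold:
            run += 1
            best = max(best, run)
        else:
            run = 0  # a poorly coeluting peak breaks the run
    return best


# Illustrative values only: three coeluting peaks, one outlier, then two more.
example = [0.98, 0.95, 0.91, 0.40, 0.88, 0.80]
print(consecutiveness_score(example))
```

Under this reading, a single poorly coeluting peak resets the count, so an isolated interfering species cannot inflate the score.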

species, which were particularly prevalent in our data. Also, our implementation of label distribution analysis does not score the quality of the deconvoluted mixtures of profiles. Therefore, to extract information regarding the quality of the labeled component of the mixed isotopic profile, as well as quantify the amount of label incorporation, we implemented the fitting approaches used in quantifying deuterium labeling44 and 15N-incorporation in peptides.36 Figure 6 illustrates the steps involved in fitting and extracting the label incorporation amounts. First, the observed isotopic

SIPPER: Software for Automated Processing of Protein SIP Data

SIPPER is designed to implement the protein-SIP metrics in a user-friendly, fully automated resource. Figure 7 displays the overall workflow for processing both identified and unidentified LC−MS features. Automatic processing requires three inputs: (1) the raw LC−MS data file, (2) a text file listing the LC−MS targets, and (3) an .xml file containing parameters for the automated workflow. SIPPER uses the information provided on the peptide or LC−MS target (m/z, peptide sequence, empirical formula, charge state, and scan in which it was identified) and mines the data to first extract the full isotopic profile. This is a notable difference from Sipros protein-SIP software,12,14 which uses novel peptide identification algorithms and significant computational resources to identify both unlabeled and labeled peptides. Here, we use the unlabeled LC−MS features, identified or unidentified, as anchors for gathering and extracting the composite labeled isotopic profile, which is scored and analyzed using the metrics described above. Once LC−MS targets are imported, SIPPER gathers all extracted ion chromatograms for every peak of the observed isotopic profile, performs elution correlation analysis for all peaks relative to the most intense peak of the profile, and finally calculates iScore, sum of ratios score, labeled profile fit, consecutiveness score, percent carbon incorporation, and the label distribution. Results are written to a tab-delimited text file that can be readily imported into other software packages such as Excel for further data analysis.

Large-Scale Automated Detection of Labeled Peptides. While manual processing of the representative data set (fraction 70) took two days, fully automated processing of the fraction 70 data set was accomplished in just 4.3 min (200 ms per peptide) on a standard quad-core Dell workstation. SIPPER

Figure 6. Calculation of the least-squares fit score for peptide DSEIGDLIAEVMEK (m/z 774.8792, +2). The unlabeled theoretical isotopic profile is subtracted from the observed mass spectrum (A) to give a composite labeled isotopic profile (B), which is least-squares fitted by an array of theoretical labeled profiles. (Inset) Best fitting theoretical labeled profile (8.25% 13C).
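The subtraction and fitting procedure summarized in Figure 6 can be sketched as follows. This is an illustration under stated assumptions, not SIPPER's implementation: theoretical profiles come from a carbon-only binomial model (contributions from H, N, O, and S are ignored), profiles are normalized to their tallest peak, and the fit score is a sum of squared differences divided by the summed squared theoretical intensities. The 0−20% grid in 0.25% steps and the optional coelution mask follow the text of this section; all function names are hypothetical.

```python
from math import comb

NATURAL_13C = 0.0107  # approximate natural abundance of 13C (assumed constant)


def scale_to_max(values):
    """Scale a profile so its tallest peak equals 1 (all-zero input is returned as-is)."""
    top = max(values)
    return [v / top for v in values] if top > 0 else list(values)


def carbon_profile(n_carbons, frac_13c, n_peaks):
    """Carbon-only binomial isotopic profile, scaled to its tallest peak (simplification)."""
    raw = [comb(n_carbons, k) * frac_13c ** k * (1 - frac_13c) ** (n_carbons - k)
           for k in range(n_peaks)]
    return scale_to_max(raw)


def fit_labeled_component(observed, n_carbons, coelutes=None):
    """Subtract the unlabeled profile, then grid-search 0-20% 13C in 0.25% steps.

    `coelutes` is an optional list of booleans marking peaks whose elution
    correlation passed the threshold; other peaks are excluded from the fit.
    Returns (best fit score, best percent 13C); lower scores are better.
    """
    n_peaks = len(observed)
    unlabeled = carbon_profile(n_carbons, NATURAL_13C, n_peaks)
    residual = [max(o - u, 0.0) for o, u in zip(scale_to_max(observed), unlabeled)]
    if coelutes is not None:
        residual = [r if ok else 0.0 for r, ok in zip(residual, coelutes)]
    residual = scale_to_max(residual)

    best_score, best_percent = None, None
    for step in range(81):  # 0.00, 0.25, ..., 20.00 percent
        percent = step * 0.25
        theo = carbon_profile(n_carbons, percent / 100.0, n_peaks)
        score = sum((r - t) ** 2 for r, t in zip(residual, theo)) / sum(t * t for t in theo)
        if best_score is None or score < best_score:
            best_score, best_percent = score, percent
    return best_score, best_percent


# Synthetic 4:1 mixture of unlabeled and 8% 13C-labeled peptide with 60 carbons.
n_c, n_pk = 60, 16
unl = [comb(n_c, k) * NATURAL_13C ** k * (1 - NATURAL_13C) ** (n_c - k) for k in range(n_pk)]
lab = [comb(n_c, k) * 0.08 ** k * 0.92 ** (n_c - k) for k in range(n_pk)]
mixture = [0.8 * u + 0.2 * l for u, l in zip(unl, lab)]
print(fit_labeled_component(mixture, n_c))
```

On the synthetic mixture, the recovered incorporation should land near the simulated 8%. With real data, overlapping species and heterogeneous labeling would make the returned value less reliable.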

profile for a given peptide is extracted (Figure 6, step A). Next, the normalized theoretical unlabeled isotopic profile is subtracted from the normalized observed profile (Figure 6, step B), resulting in a remaining group of peaks that represent the labeled component of the original composite isotopic profile. Only isotopic peaks whose elution correlation is greater than 0.75 as compared to the most abundant peak of the isotopic profile are included in subsequent least-squares fitting. Next, an array of theoretical isotopic profiles spanning 0−20% 13C incorporation are fitted onto remaining observed data in 0.25% increments and the best theoretical labeled profile is selected based on its least-squares fit to the observed (Figure 6C). For details on the fit score calculation, see Jaitly et al.27 The algorithm outputs the “labeled fit score” and the 13C percent incorporation that gave the best fit. As discussed earlier, it is unknown whether there are only two pools of peptides (i.e., unlabeled and labeled) or whether there are many due to the potential for heterogeneous labeling and the heterogeneous mixture of unlabeled and labeled amino acids available during protein synthesis. As a result, the returned percent incorporation value must be used cautiously since the remaining labeled isotopic profile component may itself represent a


filters and summary statistics of the results passing these filters are presented in Table 1. SIPPER detected a total of 875 13C-enriched LC−MS features matching to peptides across all 51 data sets using tight filtering and 2242 using loose filtering. A ROC curve for this data is presented in Supplemental Figure 4, Supporting Information.

Table 1. Summary of SIPPER-Automated Extraction of Labeled Peptides from 120 LC−MS Data Sets

                                                                      12C-control data sets    13C-enrichment data sets
Total number                                                          69                       51
Number of LC−MS features identified                                   50 142                   37 372
Number of autodetected labeled features (tight^a/loose^b filter)      44/559                   875/2242

^a Tight filter: labeled fit score ≤ 0.4; iScore ≤ 0.4; SumOfRatios ≥ 2.0; Consecutiveness score ≥ 3; Percent carbon incorporated ≥ 0; Percent peptides labeled ≥ 0.
^b Loose filter: labeled fit score ≤ 0.8; iScore ≤ 0.6; SumOfRatios ≥ 0; Consecutiveness score ≥ 0; Percent carbon incorporated ≥ 0.5; Percent peptides labeled ≥ 0.5.

To help validate the optimized filters, the manually processed data set (fraction 70) was automatically processed using SIPPER and filtered using tight and loose criteria. The data, presented in Table 2, show that under tight filtering, SIPPER

Table 2. Auto-Detection of Labeled LC−MS Features using Tight and Loose Filtering, as Applied to the Manually Curated Data Set (Fraction 70)^a

Figure 7. Workflow for automated processing of protein-SIP data using SIPPER.
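One step of this workflow, the elution correlation of each isotopic peak against the most intense peak of the profile, can be sketched as follows. The source does not state which correlation statistic SIPPER uses, so Pearson correlation is an assumption here, as is choosing the reference peak by summed chromatogram intensity. The function names and example chromatograms are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length intensity traces (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def elution_correlations(xics):
    """Correlate every extracted ion chromatogram with that of the most intense peak.

    `xics` is a list of chromatograms, one per isotopic peak, all sampled on the
    same scans. The reference is the chromatogram with the largest summed intensity.
    """
    totals = [sum(trace) for trace in xics]
    reference = xics[totals.index(max(totals))]
    return [pearson(reference, trace) for trace in xics]


# Illustrative traces: the first two peaks coelute, the third does not.
traces = [
    [0, 1, 4, 9, 4, 1, 0],
    [0, 2, 8, 18, 8, 2, 0],
    [5, 1, 0, 2, 7, 3, 9],
]
print(elution_correlations(traces))
```

Peaks whose correlation falls below the 0.75 threshold used in the text would be treated as belonging to a different, coeluting species.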

was subsequently applied to process the 120 data sets collected from the Yellowstone phototrophic mat samples; 69 data sets were from ion exchange fractions from a control experiment using natural carbon sources and 51 data sets were from fractions from the 13C-enrichment study. We balanced the filtering thresholds for SIPPER metrics to return the greatest number of labeled peptides (i.e., true positives) while minimizing the number of unlabeled peptides falsely reported as labeled (i.e., false positives). This balancing was accomplished in an automated fashion using the “FilterOptimizer” feature in SIPPER (Supplemental Figure 3, Supporting Information). The FilterOptimizer takes two sets of results: one set that contains unlabeled results, and a second set that contains peptides that may be labeled. The program iterates over combinations of SIPPER metrics (iScore, sum of ratios score, the least-squares label fit score, the “consecutiveness” score, the percent carbon incorporation, and the label distribution) to give the number of peptides from each experiment (unlabeled vs labeled) that pass the filtering criteria. The user inputs a maximum false discovery rate to allow, and the software lists the top filters that will give the most labeled peptides with the fewest unlabeled results (see screenshot in Supplemental Figure 3, Supporting Information). SIPPER also outputs the data for all parameter combinations into a flat text file for further analysis. For this study, all results from the control experiment (69 data sets) were merged into one flat text file and this served as the unlabeled input for the FilterOptimizer. Similarly, all results from the labeled experiments (51 data sets) were merged into one file and used as the labeled peptide input into the FilterOptimizer. Two sets of filter combinations were selected by setting the maximum allowed false discovery rate to either 0.05 (“tight” filtering) or 0.2 (“loose” filtering). Details of these

                     tight    loose
true positives       43       55
false positives      3        50
false negatives      17       5

^a The number of true negatives is not calculated since SIPPER is geared to detecting the presence of labeling and does not attempt to classify peptides as unlabeled.
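The sensitivity and false discovery rate figures quoted for fraction 70 follow directly from the counts in Table 2. The short calculation below uses the standard definitions, sensitivity = TP/(TP + FN) and false discovery rate = FP/(TP + FP); the function names are illustrative.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly labeled features that were detected."""
    return true_pos / (true_pos + false_neg)


def false_discovery_rate(true_pos: int, false_pos: int) -> float:
    """Fraction of reported features that were not actually labeled."""
    return false_pos / (true_pos + false_pos)


# Counts from Table 2 (manually curated fraction 70 data set).
table2 = {
    "tight": {"tp": 43, "fp": 3, "fn": 17},
    "loose": {"tp": 55, "fp": 50, "fn": 5},
}

for name, c in table2.items():
    sens = sensitivity(c["tp"], c["fn"])
    fdr = false_discovery_rate(c["tp"], c["fp"])
    print(f"{name}: sensitivity = {sens:.1%}, FDR = {fdr:.1%}")
```

Both filters are evaluated against the same 60 manually confirmed labeled features, since TP + FN equals 60 in each column.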

extracted labeled features at 72% sensitivity and at a false discovery rate of less than 10%, while under loose filtering SIPPER functioned at 92% sensitivity, at the expense of a high false discovery rate. Given that the total number of LC−MS features was high and the overall number of labeled features low, it may be beneficial to filter loosely and thus increase sensitivity, followed by manual verification using SIPPER’s graphical interface to resolve the incorrectly annotated results.

We examined the false negatives following tight filtering and found that more than half were excluded because the least-squares fit score of the labeled profile was too high even though the observed isotopic profile visually displayed the typical “tailing” pattern common to most labeled peptide signatures (see Supplemental Figure 5 for three examples, Supporting Information). The extraction of the labeled profile is a key step in resolving labeled from unlabeled and therefore represents a focus for refinement in future work.

Summary histograms depicting label distribution and amount of label incorporation for autodetected (tight filtering) labeled peptides from the 120 data sets are presented in Figure 8. Label distribution analysis of all peptides showed that an average of 12.1% of a given peptide is labeled with 13C. The amount of label incorporation averaged 6.4% for all peptides. SIPPER showed excellent sensitivity for detecting the presence of label. The lowest amount of label incorporation automatically


Figure 9. Mass spectrum of an unidentified feature automatically determined by SIPPER to exhibit evidence of 13C incorporation. (Inset) Elution correlation of each isotopic peak with the elution of the most abundant peak of the isotopic profile.
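For unidentified features such as the one in Figure 9, no sequence is available, so the theoretical isotopic profile has to be derived from mass alone using averagine. The sketch below estimates an elemental composition from m/z and charge using the commonly cited averagine unit (C4.9384 H7.7583 N1.3577 O1.4773 S0.0417, average mass 111.1254 Da). Simple rounding of each element count is an assumption; SIPPER's exact handling is not described in this section, and the function name is hypothetical.

```python
PROTON_MASS = 1.007276  # Da

# Commonly cited averagine unit: average amino acid residue composition.
AVERAGINE = {"C": 4.9384, "H": 7.7583, "N": 1.3577, "O": 1.4773, "S": 0.0417}
AVERAGINE_MASS = 111.1254  # Da per averagine unit


def averagine_formula(mz: float, charge: int) -> dict:
    """Estimate an elemental composition for a feature of unknown sequence.

    The neutral mass is derived from m/z and charge, divided into averagine
    units, and each element count is rounded to the nearest integer.
    """
    neutral_mass = (mz - PROTON_MASS) * charge
    units = neutral_mass / AVERAGINE_MASS
    return {element: round(count * units) for element, count in AVERAGINE.items()}


# Example using the m/z and charge of a peptide mentioned in the text,
# treated here as if its sequence were unknown.
estimate = averagine_formula(768.9553, 2)
print(estimate)
```

The estimated carbon count then sets the size of the theoretical profile used for subtraction and fitting. Because it is only an estimate, some loss of agreement with sequence-based processing is to be expected.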

of a potentially informative peptide that might be identified in separate experiments, and later could be matched to the current data sets. Capturing such data may provide a lead for identifying metabolizing organisms with incomplete genomes, which escape peptide identification but may play important and as yet unidentified roles in their microbial community.



Figure 8. Histogram analysis of (A) label distribution and (B) amount of label incorporation for all autodetected labeled peptides following full-scale analysis of all data sets.

CONCLUSIONS

The metrics collectively captured in the SIPPER software package enable automated proteome-scale analysis of complex data from protein-SIP experiments, as well as targeted reanalysis of the sample to increase the depth of labeled protein discovery. The software’s demonstrated ability to reveal small changes in isotopic profiles of thousands of peptides from complex mixtures following 3-h exposure to label allows for large-scale application of protein-SIP to microbial communities in their natural environment. In particular, the short exposure time to label reduces complications inherent with long incubation times whereby organisms are subjected to constantly changing conditions, such as light intensity that varies from low in the early morning to more intense in the afternoon. Importantly, the information gained from protein-SIP labeling of organisms under a defined set of conditions in their natural environment can help differentiate the functional mechanisms of protein synthesis in these organisms. While the software is presently geared toward low extents of 13C labeling in which the observed isotopic profile is a composite of multiple labeling states, the modularity of the software facilitates its adaptation for other scenarios, including 15N and deuterium labeling under conditions promoting greater incorporation.

The optimization of SIPPER metrics for resolving labeled from unlabeled data is a key step. Within this manuscript, we describe an automated brute force approach of iterating over combinations of metrics. Future efforts would greatly benefit from more advanced treatments of the data, such as adjusting metrics according to intensity, m/z range, or chromatographic elution time. Application of machine learning methodologies should therefore be a focus of continued research in improving filter optimization. Another focus of future work will be to combine all metrics to give an overall probability that a given peptide is labeled.
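The brute force approach of iterating over combinations of metric thresholds can be sketched as below. This is an illustration, not the FilterOptimizer code. The dictionary keys, the threshold grid, and the example rows are hypothetical. The false discovery rate is assumed to be the number of control features passing divided by all features passing. That definition is consistent with the counts in Table 1, where 44/(44 + 875) ≈ 0.048 and 559/(559 + 2242) ≈ 0.200, but the source does not state it explicitly.

```python
from itertools import product


def passes(row, f):
    """True if one result row satisfies every threshold in filter `f`."""
    return (row["fit"] <= f["max_fit"]
            and row["iscore"] <= f["max_iscore"]
            and row["sum_of_ratios"] >= f["min_sum_of_ratios"]
            and row["consecutiveness"] >= f["min_consecutiveness"])


def optimize_filters(control_rows, enriched_rows, grid, max_fdr):
    """Try every threshold combination; keep the one passing the most enriched
    features while the estimated false discovery rate stays within `max_fdr`.

    Returns (filter dict, enriched features passing, estimated FDR) or None.
    """
    names = list(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        f = dict(zip(names, values))
        n_control = sum(passes(r, f) for r in control_rows)
        n_enriched = sum(passes(r, f) for r in enriched_rows)
        total = n_control + n_enriched
        if total == 0:
            continue
        fdr = n_control / total  # assumed FDR definition (see lead-in)
        if fdr <= max_fdr and (best is None or n_enriched > best[1]):
            best = (f, n_enriched, fdr)
    return best


# Hypothetical threshold grid and toy result rows, for illustration only.
grid = {
    "max_fit": [0.4, 0.8],
    "max_iscore": [0.4, 0.6],
    "min_sum_of_ratios": [0.0, 2.0],
    "min_consecutiveness": [0, 3],
}
control = [
    {"fit": 0.7, "iscore": 0.5, "sum_of_ratios": 0.5, "consecutiveness": 0},
    {"fit": 0.3, "iscore": 0.3, "sum_of_ratios": 2.5, "consecutiveness": 1},
]
enriched = [
    {"fit": 0.2, "iscore": 0.2, "sum_of_ratios": 3.0, "consecutiveness": 5},
    {"fit": 0.35, "iscore": 0.3, "sum_of_ratios": 2.2, "consecutiveness": 3},
    {"fit": 0.6, "iscore": 0.5, "sum_of_ratios": 1.0, "consecutiveness": 1},
]
print(optimize_filters(control, enriched, grid, max_fdr=0.05))
```

An exhaustive grid grows multiplicatively with each added metric, which is one reason the authors point to machine learning as a direction for future work.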

detected using tight filtering was 0.14% for peptide VELVPVAIEEGLR (m/z 768.9553, +2), with 99.86% of the population of this peptide remaining unlabeled. The mass spectral and chromatogram correlation data were manually confirmed and are presented in Supplemental Figure 6, Supporting Information.

Recovery of Unidentified Features. SIPPER also enables evaluation of unidentified features for evidence of 13C-incorporation (Figure 7). To validate this mode of data extraction, we first reprocessed previously identified features from fraction 70 using averagine as a basis for calculating the theoretical isotopic profile at a given mass,45 thereby mimicking the approach to be used with unidentified features. SIPPER metrics were calculated for each peptide from fraction 70 and filtering thresholds for metrics were optimized for the averagine-based analysis using the same brute force testing of combinations of metrics described above. Applying a tight filter (5−10% false positive) resulted in 64% of features being shared between averagine-based and standard processing, while a loose filter (∼10−15% false positive) resulted in 93% of features being shared. The same procedures were applied to the 14 101 unidentified LC−MS features from the fraction 70 data set; 29 passed tight filtering criteria for labeling, while 182 passed loose filtering criteria. Compared to the 59 LC−MS features identified as labeled (loose filter) from 1248 identified LC−MS features, the recovery of 182 potentially labeled features represents a substantial gain in species that could be manually reanalyzed (e.g., using de novo sequencing methods) and/or targeted in subsequent experiments. Figure 9 shows the mass spectrum and chromatogram correlation data for one of the unidentified LC−MS features that SIPPER scored as being 13C enriched. MSGF+ based analysis of the MS/MS spectra for this species did not result in any strong peptide/spectrum matches (data not shown). By detecting 13C enrichment in this species, SIPPER prevents loss



ASSOCIATED CONTENT

Supporting Information

Supplemental figures and mass spectrometry data. This material is available free of charge via the Internet at http://pubs.acs.org.


(10) Taubert, M.; Jehmlich, N.; Vogt, C.; Richnow, H. H.; Schmidt, F.; von Bergen, M.; Seifert, J. Time resolved protein-based stable isotope probing (Protein-SIP) analysis allows quantification of induced proteins in substrate shift experiments. Proteomics 2011, 11 (11), 2265−2274.
(11) Snijders, A. P. L.; de Koning, B.; Wright, P. C. Perturbation and interpretation of nitrogen isotope distribution patterns in proteomics. J. Proteome Res. 2005, 4 (6), 2185−2191.
(12) Pan, C. L.; Fischer, C. R.; Hyatt, D.; Bowen, B. P.; Hettich, R. L.; Banfield, J. F. Quantitative tracking of isotope flows in proteomes of microbial communities. Mol. Cell. Proteomics 2011, 10 (4), No. M110.006049.
(13) Seifert, J.; Herbst, F. A.; Nielsen, P. H.; Planes, F. J.; Ferrer, M.; Bergen, M. Bioinformatic progress and applications in metaproteogenomics for bridging the gap between genomic sequences and metabolic functions in microbial communities. Proteomics 2013, 13 (18−19), 2786−2804.
(14) Wang, Y.; Ahn, T. H.; Li, Z.; Pan, C. Sipros/ProRata: a versatile informatics system for quantitative community proteomics. Bioinformatics 2013, 29 (16), 2064−2065.
(15) Ward, D. M.; Castenholz, R. W.; Miller, S. R. Cyanobacteria in geothermal habitats. In Ecology of Cyanobacteria II; Springer: New York, 2012; pp 39−63.
(16) Klatt, C. G.; Liu, Z.; Ludwig, M.; Kuhl, M.; Jensen, S. I.; Bryant, D. A.; Ward, D. M. Temporal metatranscriptomic patterning in phototrophic Chloroflexi inhabiting a microbial mat in a geothermal spring. ISME J. 2013, 7 (9), 1775−1789.
(17) Liu, Z. F.; Klatt, C. G.; Ludwig, M.; Rusch, D. B.; Jensen, S. I.; Kuhl, M.; Ward, D. M.; Bryant, D. A. ‘Candidatus Thermochlorobacter aerophilum’: an aerobic chlorophotoheterotrophic member of the phylum Chlorobi defined by metagenomics and metatranscriptomics. ISME J. 2012, 6 (10), 1869−1882.
(18) Klatt, C. G.; Wood, J. M.; Rusch, D. B.; Bateson, M. M.; Hamamura, N.; Heidelberg, J. F.; Grossman, A. R.; Bhaya, D.; Cohan, F. M.; Kuhl, M.; Bryant, D. A.; Ward, D. M. Community ecology of hot spring cyanobacterial mats: predominant populations and their functional potential. ISME J. 2011, 5 (8), 1262−1278.
(19) Schaffert, C. S.; Klatt, C. G.; Ward, D. M.; Pauley, M.; Steinke, L. Identification and distribution of high-abundance proteins in the octopus spring microbial mat community. Appl. Environ. Microb. 2012, 78 (23), 8481−8484.
(20) Livesay, E. A.; Tang, K.; Taylor, B. K.; Buschbach, M. A.; Hopkins, D. F.; LaMarche, B. L.; Zhao, R.; Shen, Y.; Orton, D. J.; Moore, R. J. Fully automated four-column capillary LC-MS system for maximizing throughput in proteomic analyses. Anal. Chem. 2008, 80 (1), 294−302.
(21) Maiolica, A.; Borsotti, D.; Rappsilber, J. Self-made frits for nanoscale columns in proteomics. Proteomics 2005, 5 (15), 3847−3850.
(22) Kelly, R. T.; Page, J. S.; Luo, Q. Z.; Moore, R. J.; Orton, D. J.; Tang, K. Q.; Smith, R. D. Chemically etched open tubular and monolithic emitters for nanoelectrospray ionization mass spectrometry. Anal. Chem. 2006, 78 (22), 7796−7801.
(23) Mayampurath, A. M.; Jaitly, N.; Purvine, S. O.; Monroe, M. E.; Auberry, K. J.; Adkins, J. N.; Smith, R. D. DeconMSn: a software tool for accurate parent ion monoisotopic mass determination for tandem mass spectra. Bioinformatics 2008, 24 (7), 1021−1023.
(24) Kim, S.; Mischerikow, N.; Bandeira, N.; Navarro, J. D.; Wich, L.; Mohammed, S.; Heck, A. J. R.; Pevzner, P. A. The generating function of CID, ETD, and CID/ETD pairs of tandem mass spectra: Applications to database search. Mol. Cell. Proteomics 2010, 9 (12), 2840−2852.
(25) Kim, S.; Pevzner, P. A. MS-GF+: Universal database search tool for mass spectrometry. Submitted.
(26) Zimmer, J. S. D.; Monroe, M. E.; Qian, W. J.; Smith, R. D. Advances in proteomics data analysis and display using an accurate mass and time tag approach. Mass Spectrom. Rev. 2006, 25 (3), 450−482.

AUTHOR INFORMATION

Corresponding Author

*Tel: 509-371-6589. Fax: 509-371-6564. E-mail: mary.lipton@ pnl.gov. Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS Portions of this research were supported by the U.S. Department of Energy Office of Biological and Environmental Research (DOE/BER) Genome Sciences Program under the Pan-omics and Fundamental Science Focus Area projects. Work was performed in the Environmental Molecular Science Laboratory, a DOE/BER national scientific user facility at Pacific Northwest National Laboratory in Richland, Washington. L.S. and D.M.W. acknowledge support by the National Science Foundation (EF 0805385). D.M.W. also acknowledges support from the NASA Exobiology Program and the NSF IGERT program (DGE 0654336). We appreciate the assistance of Tracy Cheever during the field expedition, and the technical assistance of Michele Fontaine. The UNMC Protein Structure Core Facility, supported by the Nebraska Research Initiative, was instrumental in the completion of this work. This study was conducted under Yellowstone National Park permits YELL0129 (D.M.W.) and YELL-0567 (L.S.). The authors gratefully acknowledge the support and assistance of National Park Service Personnel at Yellowstone National Park.



REFERENCES

(1) Uhlik, O.; Leewis, M. C.; Strejcek, M.; Musilova, L.; Mackova, M.; Leigh, M. B.; Macek, T. Stable isotope probing in the metagenomics era: A bridge towards improved bioremediation. Biotechnol. Adv. 2013, 31 (2), 154−165. (2) Hettich, R. L.; Pan, C. L.; Chourey, K.; Giannone, R. J. Metaproteomics: Harnessing the power of high performance mass spectrometry to identify the suite of proteins that control metabolic activities in microbial communities. Anal. Chem. 2013, 85 (9), 4203− 4214. (3) Ward, D. M.; Klatt, C. G.; Wood, J.; Cohan, F. M.; Bryant, D. A. Functional genomics in an ecological and evolutionary context: maximizing the value of genomes in systems biology. In Functional Genomics and Evolution of Photosynthetic Systems; Springer: New York, 2012; pp 1−16. (4) Jehmlich, N.; Schmidt, F.; Taubert, M.; Seifert, J.; Bastida, F.; von Bergen, M.; Richnow, H. H.; Vogt, C. Protein-based stable isotope probing. Nature Protoc. 2010, 5 (12), 1957−1966. (5) Taubert, M.; von Bergen, M.; Seifert, J. Limitations in detection of 15N incorporation by mass spectrometry in protein-based stable isotope probing (protein-SIP). Anal. Bioanal. Chem. 2013, 1−8. (6) Taubert, M.; Baumann, S.; von Bergen, M.; Seifert, J. Exploring the limits of robust detection of incorporation of C-13 by mass spectrometry in protein-based stable isotope probing (protein-SIP). Anal. Bioanal. Chem. 2011, 401 (6), 1975−1982. (7) Herbst, F. A.; Bahr, A.; Duarte, M.; Pieper, D. H.; Richnow, H. H.; Bergen, M.; Seifert, J.; Bombach, P. Elucidation of in situ polycyclic aromatic hydrocarbon degradation by functional metaproteomics (protein-SIP). Proteomics 2013, 13 (18−19), 2910−2920. (8) Bozinovski, D.; Herrmann, S.; Richnow, H. H.; von Bergen, M.; Seifert, J.; Vogt, C. Functional analysis of an anaerobic m-xylenedegrading enrichment culture using protein-based stable isotope probing. FEMS Microbiol. Ecol. 2012, 81 (1), 134−144. (9) Jehmlich, N.; Schmidt, F.; von Bergen, M.; Richnow, H. H.; Vogt, C. 
Protein-based stable isotope probing (Protein-SIP) reveals active species within anoxic mixed cultures. ISME J. 2008, 2 (11), 1122− 1133. 1209

dx.doi.org/10.1021/pr400633j | J. Proteome Res. 2014, 13, 1200−1210


(27) Jaitly, N.; Mayampurath, A.; Littlefield, K.; Adkins, J. N.; Anderson, G. A.; Smith, R. D. Decon2LS: An open-source software package for automated processing and visualization of high resolution mass spectrometry data. BMC Bioinform. 2009, 10, 87.
(28) Slysz, G. W.; Baker, E. S.; Shah, A. R.; Jaitly, N.; Anderson, G. A.; Smith, R. D. In The DeconTools framework: an application programming interface enabling flexibility in accurate mass and time tag workflows for proteomics and metabolomics; American Society for Mass Spectrometry: Salt Lake City, UT, 2010.
(29) Monroe, M. E.; Tolic, N.; Jaitly, N.; Shaw, J. L.; Adkins, J. N.; Smith, R. D. VIPER: An advanced software package to support high-throughput LC-MS peptide identification. Bioinformatics 2007, 23 (15), 2021−2023.
(30) Stanley, J. R.; Adkins, J. N.; Slysz, G. W.; Monroe, M. E.; Purvine, S. O.; Karpievitch, Y. V.; Anderson, G. A.; Smith, R. D.; Dabney, A. R. A statistical method for assessing peptide identification confidence in accurate mass and time tag proteomics. Anal. Chem. 2011, 83 (16), 6135−6140.
(31) Chik, J. K.; Vande Graaf, J. L.; Schriemer, D. C. Quantitating the statistical distribution of deuterium incorporation to extend the utility of H/D exchange MS data. Anal. Chem. 2006, 78 (1), 207−214.
(32) Auberry, K. J.; Kiebel, G. R.; Monroe, M. E.; Adkins, J. N.; Anderson, G. A.; Smith, R. D. Omics.pnl.gov: A portal for the distribution and sharing of multi-disciplinary Pan-omics information. J. Proteomics Bioinform. 2010, 3 (1), 1.
(33) Angel, T. E.; Aryal, U. K.; Hengel, S. M.; Baker, E. S.; Kelly, R. T.; Robinson, E. W.; Smith, R. D. Mass spectrometry-based proteomics: existing capabilities and future directions. Chem. Soc. Rev. 2012, 41 (10), 3912−3928.
(34) van der Meer, M. T. J.; Schouten, S.; Bateson, M. M.; Nubel, U.; Wieland, A.; Kuhl, M.; de Leeuw, J. W.; Damste, J. S. S.; Ward, D. M. Diel variations in carbon metabolism by green nonsulfur-like bacteria in alkaline siliceous hot spring microbial mats from Yellowstone National Park. Appl. Environ. Microb. 2005, 71 (7), 3978−3986.
(35) Slysz, G. W.; Percy, A. J.; Schriemer, D. C. Restraining expansion of the peak envelope in H/D exchange-MS and its application in detecting perturbations of protein structure/dynamics. Anal. Chem. 2008, 80 (18), 7004−11.
(36) Huttlin, E. L.; Hegeman, A. D.; Harms, A. C.; Sussman, M. R. Comparison of full versus partial metabolic labeling for quantitative proteomics analysis in Arabidopsis thaliana. Mol. Cell. Proteomics 2007, 6 (5), 860−881.
(37) MacCoss, M. J. IDCalc - Isotope Distribution Calculator; http://proteome.gs.washington.edu/software/IDCalc/.
(38) ProteinProspector (MS-Isotope); http://prospector.ucsf.edu/.
(39) MacCoss, M. J.; Wu, C. C.; Liu, H.; Sadygov, R.; Yates, J. R., 3rd. A correlation algorithm for the automated quantitative analysis of shotgun proteomics data. Anal. Chem. 2003, 75 (24), 6912−6921.
(40) Ting, L.; Cowley, M. J.; Hoon, S. L.; Guilhaus, M.; Raftery, M. J.; Cavicchioli, R. Normalization and statistical analysis of quantitative proteomics data generated by metabolic labeling. Mol. Cell. Proteomics 2009, 8 (10), 2227−2242.
(41) Mayor, T.; Graumann, J.; Bryan, J.; MacCoss, M. J.; Deshaies, R. J. Quantitative profiling of ubiquitylated proteins reveals proteasome substrates and the substrate repertoire influenced by the Rpn10 receptor pathway. Mol. Cell. Proteomics 2007, 6 (11), 1885−1895.
(42) Wenger, C. D.; Lee, M. V.; Hebert, A. S.; McAlister, G. C.; Phanstiel, D. H.; Westphall, M. S.; Coon, J. J. Gas-phase purification enables accurate, multiplexed proteome quantification with isobaric tagging. Nat. Methods 2011, 8 (11), 933−935.
(43) Slysz, G. W.; Baker, C. A.; Bozsa, B. M.; Dang, A.; Percy, A. J.; Bennett, M.; Schriemer, D. C. Hydra: Software for tailored processing of H/D exchange data from MS or tandem MS analyses. BMC Bioinform. 2009, 10, 162.
(44) Pascal, B. D.; Chalmers, M. J.; Busby, S. A.; Griffin, P. R. HD Desktop: An integrated platform for the analysis and visualization of H/D exchange data. J. Am. Soc. Mass Spectrom. 2009, 20 (4), 601−610.
(45) Senko, M. W.; Beu, S. C.; McLafferty, F. W. Determination of monoisotopic masses and ion populations for large biomolecules from resolved isotopic distributions. J. Am. Soc. Mass Spectrom. 1995, 6 (4), 229−233.
