Rescuing Those Left Behind: Recovering and Characterizing

Article pubs.acs.org/ac

Rescuing Those Left Behind: Recovering and Characterizing Underdigested Membrane and Hydrophobic Proteins To Enhance Proteome Measurement Depth

Richard J. Giannone,*,† Louie L. Wurch,‡,§ Mircea Podar,‡,∥ and Robert L. Hettich†

† Chemical Sciences Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, Tennessee 37831, United States
‡ Biosciences Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, Tennessee 37831, United States
§ Department of Biology, James Madison University, Harrisonburg, Virginia 22807, United States
∥ Department of Microbiology, University of Tennessee, Knoxville, Tennessee 37996, United States

S Supporting Information *

ABSTRACT: The marine archaeon Nanoarchaeum equitans is dependent on direct physical contact with its host, the hyperthermophile Ignicoccus hospitalis. As this interaction is thought to be membrane-associated, involving a myriad of membrane-anchored proteins, proteomic efforts to better characterize this difficult to analyze interface are paramount to uncovering the mechanism of their association. By extending multienzyme digestion strategies that use sample filtration to recover underdigested proteins for reprocessing/consecutive proteolytic digestion, we applied chymotrypsin to redigest the proteinaceous material left over after initial proteolysis with trypsin of sodium dodecyl sulfate (SDS)-extracted I. hospitalis-N. equitans proteins. Using this method, we show that proteins with increased hydrophobic character, including membrane proteins with multiple transmembrane helices, are enriched and recovered in the underdigested fraction. Chymotryptic reprocessing provided significant sequence coverage gains in both soluble and hydrophobic proteins alike, with the latter benefiting more so in terms of membrane protein representation. These gains were despite a large proportion of high-quality peptide spectra remaining unassigned in the underdigested fraction suggesting high levels of protein modification on these often surface-exposed proteins. Importantly, these gains were achieved without applying extensive fractionation strategies usually required for thorough characterization of membrane-associated proteins and were facilitated by the generation of a distinct, complementary set of peptides that aid in both the identification and quantitation of this important, under-represented class of proteins.

Analytical Chemistry

Received: March 29, 2015 Accepted: June 25, 2015

DOI: 10.1021/acs.analchem.5b01187 Anal. Chem. XXXX, XXX, XXX−XXX

© XXXX American Chemical Society

Though LC-MS/MS-based shotgun proteomics provides comprehensive data sets that detail changes in protein abundance over time or across different conditions, it is often the case that many proteins remain largely under-represented in the final analysis, especially those that are integral to cellular membranes or contain large patches of regional hydrophobicity.1,2 This under-representation can lead to sample bias with regard to the set of identified proteins and may lead to erroneous conclusions or missed opportunities to recognize significant and important changes to the proteome. As these types of proteins are largely comprised of hydrophobic amino acid residues/secondary structures and are generally of low abundance, their soluble extraction and amenability toward proteolytic cleavage often prevent their robust detection in routine proteomic inquiries.3 Though strategies do exist to enrich and specifically target these classes of proteins, these protocols often require extensive fractionation that can make it difficult to compare their abundances in context with the more soluble portion of a proteome.1−4 Thus, it is more desirable to increase their global representation in order to more accurately quantify them with respect to and in direct context with their more soluble counterparts.

Without question, the increased use of ionic detergents like sodium dodecyl sulfate (SDS) and other potent detergents/denaturants/chaotropes5−9 in proteomic sample preparation has aided in the extraction and subsequent identification of proteins with increased hydrophobic character.10 Though SDS is considered a severe ion-suppressor in ESI-MS, its removal through FASP,11 TCA/acetone precipitation,12−14 or detergent-removal resins10,15 has become routine. However, despite detergent application, the continued under-representation of important hydrophobic proteins may be largely limited to the degree of protease-specific cleavage sites available to generate measurable peptide ions of optimal size and charge.16,17 In fact, many of the commonly used proteases for proteomic sample preparation have cleavage specificities that target hydrophilic amino acids and are thus inefficient at digesting membrane proteins and/or proteins with expansive hydrophobic regions that lack these charged residues. If not properly targeted with a compatible protease such as chymotrypsin, large portions of these proteins would remain untouched by protease, leading to coverage deficiencies especially to important proteins wholly embedded within the cellular membrane.

The underdigestion of membrane and other hydrophobic proteins, though seemingly detrimental, can be used to one’s advantage, specifically with regard to their facile enrichment relative to other fractionation methodologies. Taking cues from other multienzyme, filter-based digestion strategies,18,19 which use molecular weight cutoff (MWCO) filters as reaction chambers to perform consecutive, orthogonal protein digests to enhance proteome coverage, we posited that un(der)digested material retained atop the filter after initial proteolytic digests, usually via trypsin and/or Lys-C, would be enriched in membrane/hydrophobic proteins. Though application of multiple proteases was included in the original MED-FASP study,18 including the use of chymotrypsin, focus was not placed on characterizing the proteins that remained atop the filter nor optimizing for their digestion. However, as membrane proteins are extractible via SDS, they are presumably part of each tryptic digest and thus likely un(der)digested and enriched following proteolysis and filtration. This filtration step is integral to FASP,11 MED-FASP,18,19 and other variations20,21 and easily included after most proteolytic digests.22

In this study, proteins extracted from an archaeal binary partnership (Ignicoccus hospitalis-Nanoarchaeum equitans) were used to investigate the nature of this un(der)digested material and assess the applicability of a dual protease digestion strategy to augment the identification of under-represented classes of proteins such as membrane and/or proteins with increased hydrophobic character. As membrane proteins are integral to both the I. hospitalis-N. equitans interaction14,22 and important players in cellular import/export, communication, signaling, protection, etc.,1,2 enhancing their representation in proteomic data sets ensures valid quantitative comparisons by enhancing their measurement probability and frequency. In addition, spectral quality analysis was performed across both the soluble and un(der)digested proteome sets to identify the degree to which high-quality peptide spectra were assigned to their respective proteins. Perturbations in assignment frequency, especially among residual un(der)digested proteins, could indicate high degrees of post-translational modification (PTM), such as glycosylation, which occur at increased frequencies on membrane-associated cell-surface proteins including in archaea.21,23,24

MATERIALS AND METHODS

Cultivation of Ignicoccus hospitalis and Nanoarchaeum equitans. I. hospitalis KIN4/I (host) was cultured together with N. equitans (ectosymbiont/parasite) at 90 °C in a 300 L bioreactor at the University of Regensburg Archaea Center, as previously described.22 This current study focuses on time point S1, when there were roughly 7 N. equitans per I. hospitalis cell. This time point in the coculture was described to be particularly rich in host membrane proteins and was posited to be a response to the propagation of N. equitans on its surface.22

Protein Sample Processing with Trypsin for LC-MS/MS. Conjoined I. hospitalis-N. equitans cells were lysed by sodium dodecyl sulfate (SDS) with sample sonication, as previously detailed.14,22 The isolated proteins were resolubilized in 8 M urea plus 5 mM dithiothreitol (DTT), cysteines blocked with 15 mM iodoacetamide (IAA), and digested to peptides with sequencing-grade trypsin. The proteolyzed samples were adjusted to 200 mM NaCl and 0.1% formic acid and filtered through a prewashed 10 kDa MWCO centrifugal filter (Vivaspin 2, GE Healthcare) to collect 1 mL of tryptic peptides of optimal size.11 The visible (Figure S-1, Supporting Information) and once entirely soluble proteinaceous material (as determined by BCA assay) filtered away from the tryptic peptide solution was then washed with 250 μL of Tris buffer (100 mM Tris-HCl, pH 8.0), recollected by centrifugation and added to the tryptic peptide solution collected earlier. These samples represent the tryptic fractions (herein “T”) of each respective methodological replicate. The remaining un(der)processed, proteinaceous material still atop the filter was given a final rinse/vortex with another 1 mL of Tris buffer, filtered again by centrifugation, and discarded.

Reprocessing Un(der)digested Proteins with Chymotrypsin for LC-MS/MS. The washed proteinaceous material collected above the 10 kDa MWCO spin filter was resuspended/vortexed in 100 μL of 1% RapiGest SF Surfactant (Waters) prepared in 50 mM NH4HCO3, incubated at RT for 10 min, and vortexed again. Solubilized material was then collected from the spin filter and boiled for 10 min. Two microliters were removed for BCA assay prior to adjusting samples to 5 mM DTT (10 min, RT), followed by 15 mM IAA (20 min, RT in dark). The un(der)digested protein sample, now isolated and in solution, was digested overnight at 37 °C with α-chymotrypsin (10 μg/mL in LC-MS-grade water with 1 mM HCl; Sigma) at a 1:25 (w/w) protease/sample ratio as assessed by BCA assay. Following proteolysis, chymotryptic and composite tryptic/chymotryptic peptides (due to the preprocessing with trypsin as described above) of appropriate size were collected via filtration with another 10 kDa MWCO, prewashed spin filter. The visibly reduced material atop the filter membrane was then resuspended/vortexed with 500 μL of 50 mM NH4HCO3 and repassed through the spin filter to recover even more appropriately sized peptides. RapiGest SF was degraded/removed by adjusting the sample to 0.5% trifluoroacetic acid followed by incubation at 37 °C for 45 min and centrifugation. These samples represent the chymotryptic fractions (herein “C”) of each respective methodological replicate and contain peptides with both tryptic and chymotryptic ends. Peptide recoveries were then assessed by BCA assay across all sample fractions.

Data Acquisition via MudPIT 2D LC-MS/MS. Samples from each subfraction (T and C) were analyzed both independently (25 μg peptides per run) and as a 50/50 mixture of peptides (25 μg per fraction − 50 μg total; herein “TC”) for a total of nine, 2D LC-MS/MS runs with a biphasic MudPIT back column containing both strong-cation exchange and reversed-phase resins.22,25,26 As previously described, data was collected over 11 independent salt cuts with subsequent organic gradient elution (∼120 min per cut). Eluting peptides were sampled in real-time by a hybrid LTQ-Orbitrap Pro mass spectrometer (Thermo Scientific) with the following parameters: one full scan at 30K resolution (Orbitrap; 1 μscan) followed by 20 data-dependent, CID MS/MS scans (LTQ; 1 μscan); peptide isolation window = 2.2 m/z, dynamic exclusion

residence time = 15 s, window = −1.0 to +2.0 m/z, and max list size = 500.

Spectral Assignment by Database Searching. Peptides were matched to MS/MS spectra using MyriMatch v.2.1.27 For database searching, the I. hospitalis and N. equitans proteomes were combined and appended with common contaminants and then concatenated with reversed entries to assess false-discovery rates (FDR). For T fractions, peptides were required to contain at least one tryptic end (semispecific; K or R). However, for C and TC samples, peptides were required to contain at least one chymotryptic or tryptic end (semispecific; K, R, F, L, Y, or W) as both sample fractions contained un(der)processed proteins and peptides that had been predigested with trypsin. Peptide modifications considered in each database search were a static +57.0214 Da on cysteines (carboxamidomethyl by IAA), a dynamic +43.0058 Da on peptide N-termini (carbamylation via urea breakdown), and a dynamic +15.9949 Da on methionine to account for sample oxidation.

Bioinformatic Tools Employed for Data Analysis and Interpretation. IDPicker v.3.028 was used to filter and assemble peptide spectral matches (PSM) to proteins. Metrics for individual sample runs were tabulated after adjusting filters to maintain FDR at acceptable rates (Table S-1, Supporting Information), mainly by adjusting the minimum PSM per protein while maintaining at least 2 distinct peptides per protein and a q-value ≤0.02. Assignment frequencies for each sample were assessed and compared across fractions and included a deeper analysis of high-quality peptide spectral assignment by ScanRanker.29 For semiquantitative analyses and comparisons, peptide and protein matched-ion intensities were calculated for each PSM and tabulated at both peptide and protein levels.
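The run-specific filtering described above can be illustrated with a minimal sketch. It assumes one common estimator for concatenated target-decoy searches, FDR = 2 × decoys / (targets + decoys); the exact estimator and filter logic used by IDPicker are not restated in this section, so the function names `decoy_fdr` and `min_psm_cutoff` and the example counts are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def decoy_fdr(n_target: int, n_decoy: int) -> float:
    """Estimate FDR for a concatenated target-decoy search.

    Assumed estimator: 2 * decoys / (targets + decoys).
    """
    total = n_target + n_decoy
    return 2.0 * n_decoy / total if total else 0.0


def min_psm_cutoff(proteins: Iterable[Tuple[int, bool]],
                   max_fdr: float = 0.05) -> Optional[int]:
    """Return the smallest minimum-PSM-per-protein cutoff meeting max_fdr.

    `proteins` holds (psm_count, is_decoy) pairs for one LC-MS/MS run.
    Returns None if no cutoff satisfies the threshold.
    """
    proteins = list(proteins)
    if not proteins:
        return None
    for cutoff in range(1, max(n for n, _ in proteins) + 1):
        kept = [is_decoy for n, is_decoy in proteins if n >= cutoff]
        n_decoy = sum(kept)
        n_target = len(kept) - n_decoy
        if n_target and decoy_fdr(n_target, n_decoy) <= max_fdr:
            return cutoff
    return None


# Hypothetical run: 100 target and 10 decoy (reversed) protein entries.
example = ([(5, False)] * 50 + [(2, False)] * 30 + [(1, False)] * 20
           + [(2, True)] + [(1, True)] * 9)
print(min_psm_cutoff(example))
```

Raising the cutoff trades protein identifications for a lower estimated error rate, which is why the threshold is chosen per run rather than fixed globally.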
Sequence coverage analyses including helical overlap propensities in predicted membrane proteins and protein-level hydrophobicity assessments were aided by TMHMM transmembrane domain prediction30 and Kyte-Doolittle hydropathy scores.31 Peptide coverage maps, proteolytic cleavage propensities, and amino acid frequencies were assessed by combining peptide data from IDPicker with TMHMM. Venn diagrams for peptide-level comparisons of each proteolytic fraction were created using eulerAPE.32
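The semispecific search rules described for the T, C, and TC samples can be sketched as a simple terminus check. This is an illustration only: it ignores proline-blocking and other refinements, it is not MyriMatch's implementation, and the function name `is_semispecific` and the example sequence are hypothetical.

```python
TRYPTIC = frozenset("KR")            # residues used for T searches
CHYMO_TRYPTIC = frozenset("KRFLYW")  # residues used for C and TC searches


def is_semispecific(protein: str, start: int, end: int,
                    residues: frozenset = TRYPTIC) -> bool:
    """Check whether the peptide protein[start:end] has at least one
    terminus consistent with cleavage after one of `residues`.

    Indices are 0-based and half-open, as in Python slicing. Protein
    termini are treated as valid cleavage boundaries.
    """
    if not 0 <= start < end <= len(protein):
        raise ValueError("peptide coordinates fall outside the protein")
    n_term_ok = start == 0 or protein[start - 1] in residues
    c_term_ok = end == len(protein) or protein[end - 1] in residues
    return n_term_ok or c_term_ok


protein = "MAGKLLVFAWTRSEE"  # made-up sequence for illustration
print(is_semispecific(protein, 5, 8))                  # peptide "LVF", tryptic rules
print(is_semispecific(protein, 5, 8, CHYMO_TRYPTIC))   # same peptide, C/TC rules
```

Under the tryptic rule set the peptide "LVF" is rejected, whereas the expanded residue set accepts it because it ends in F and follows an L.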

Figure 1. BCA analysis of proteolytic fractions. Protein/peptide abundance tracked across the course of MS sample preparation for each methodological replicate. Proteinaceous material remaining atop the filter membrane after tryptic digestion was roughly 39% of the original load (61% tryptic peptide recovery; Tpep) and represents the un(der)digested complement of the sample. Of the remaining 39%, roughly 30% was recovered as chymotryptic peptides (Cpep) leaving ∼9% of the original starting material atop a second filter. Error bars represent standard deviation.
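The mass balance summarized in Figure 1 reduces to simple arithmetic. The sketch below uses only the rounded percentages quoted in the caption; the function name `mass_balance` is hypothetical, and the last quantity (share of the retained material rescued by chymotrypsin) is derived here rather than reported in the text.

```python
def mass_balance(tryptic_pct: float, chymotryptic_pct: float) -> dict:
    """Track protein mass through both digests.

    All inputs and outputs are percentages of the original protein load,
    except `rescued_share_of_retained`, which is relative to the material
    retained atop the first filter.
    """
    retained_after_trypsin = 100.0 - tryptic_pct
    total_recovered = tryptic_pct + chymotryptic_pct
    return {
        "retained_after_trypsin": retained_after_trypsin,
        "total_recovered": total_recovered,
        "residual_on_second_filter": 100.0 - total_recovered,
        "rescued_share_of_retained": 100.0 * chymotryptic_pct / retained_after_trypsin,
    }


balance = mass_balance(tryptic_pct=61.0, chymotryptic_pct=30.0)
for name, value in balance.items():
    print(f"{name}: {value:.1f}%")
```

With the caption's values, the 30% recovered as chymotryptic peptides corresponds to roughly three-quarters of the material that trypsin left behind.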

RESULTS

Chymotryptic Reprocessing of Proteinaceous “Debris”. As the un(der)processed material is visible (Figure S-1, Supporting Information), proteinaceous (Figure 1), and potentially enriched in trypsin-incompatible proteins/protein regions, it was resolubilized in 1% RapiGest SF and redigested with chymotrypsin, a complementary protease that cleaves after hydrophobic amino acids F, L, Y, and W. These amino acids are enriched in hydrophobic protein regions and are integral to the transmembrane helical domains (TMD) of membrane-associated proteins (Figure S-2; Table S-2, Supporting Information). Following proteolysis and filter-aided collection of chymotryptic peptides (or tryptic-chymotryptic hybrids), a marked visual reduction in the original un(der)processed material that remained atop the filter was noted: an observation corroborated by the mass balance analysis presented in Figure 1. Protein concentrations tracked across sample processing found that reprocessing with chymotrypsin rescued an additional 30% of the original input as analyzable peptides. Considering both T and C peptide fractions, roughly 91% of the input was accounted for. These data suggest the left over “debris” likely contain proteins with sufficient chymotrypsin-specific amino acid residues.

Overall Proteome Sampling Metrics. As reported in Table S-1, Supporting Information, protein-level FDRs were assessed on an individual run basis and kept under 5% by applying a run-specific PSM cutoff. Averaged across all three replicates, the greatest number of proteins were identified in the T fractions (x̅ = 1479; RSD = 2.5%) relative to C fractions (x̅ = 1125; RSD = 4.7%) or when both fractions (TC) were analyzed together (x̅ = 1408; RSD = 0.6%). TC samples, which were comprised of a 1:1 peptide load of both T and C samples, showed a consistent increase in the number of peptides identified (x̅ = 24 894; RSD = 1.5%) with a slight depression in assigned spectra (x̅ = 223 817; RSD = 2.7%) relative to T, but were more or less in line with T, especially when compared to C, where the number of PSMs was 50% lower than in T or TC, which likely explains the reduction in overall peptide and protein numbers observed in C.

Spectral Quality Analysis. The large difference in PSMs between the T and C sample fractions prompted further inquiry into the overall quality of the spectra collected. Given the well-known constraints of database search algorithms, specifically with regard to their inability to match real, quality peptide fragmentation spectra to sequences not explicit in the proteome database (i.e., improper gene calls/open reading frames, sequence polymorphisms, post-translational modifications, etc.),33,34 we sought to identify high-quality, peptide-derived spectra regardless of whether they ultimately matched a peptide using ScanRanker. This analysis is unbiased: a high quality (HQ) spectrum can be either matched or unmatched to a proteome database. This provides a unique perspective on database quality or, as in this case, whether the low PSM rate observed in C was a function of general protease underperformance, reduced sample complexity, or because these proteins contain a disproportionate number of PTMs or other mass perturbations relative to those observed in T. As follows, all collected MS/MS spectra were scored, ranked by overall quality, and labeled as either matched or unmatched to a peptide sequence identified by MyriMatch (Table S-3, Supporting Information). To compare the T and C LC-MS/MS runs, the ratio of matched/assigned spectra (blue bars) to unmatched/unassigned spectra (red bars) was calculated per ScanRanker score bin across all replicates and plotted as stacked histograms (Figure 2). As depicted, a much larger proportion of high-scoring MS/MS spectra remained unassigned in C relative to T despite there being roughly the same number of spectra collected per run (Table S-1, Supporting Information). On average, 59% of top ranked (≥50%) spectra were “converted” or matched to a peptide sequence in the T fractions while there was just 27% in the C fractions. These lower rates of PSM have been observed previously with proteases other than trypsin16 but, to our knowledge, have not been explained in the context of overall spectral quality. In this regard, ScanRanker analysis indicates an unusually large number of high-quality peptide spectra remain unassigned (HQU) in the un(der)digested protein fraction.

Figure 2. Spectral quality analysis of both proteolytic fractions. MS/MS spectral quality analysis indicates that a larger proportion of high-quality, probable peptide spectra remained unassigned in the un(der)digested, chymotrypsin reprocessed (C) protein fraction relative to the initial trypsin (T) fraction. Blue and red bars represent the proportion of assigned or unassigned spectra per ScanRanker score bin, respectively. Higher scores are an indication of peptide spectral quality.

Cleavage Frequency of Identified Peptides. To assess the cleavage fidelity of each protease, identified peptides were compared across all fractions (Table S-3, Supporting Information) with their C-terminal residues used as a proxy to estimate the frequency of a particular cut. As shown in Figure S-3A, Supporting Information, the C-terminal cleavage frequencies clearly demonstrate: (1) trypsin’s preference to cut after K and R (97.5%; col. 3), (2) chymotrypsin’s cleavage preference for hydrophobic amino acids F, L, W, and Y (90.5%; col. 2), (3) the prevalence of proteins/peptides un(der)digested by trypsin (i.e., peptides only identified in the C that contain a tryptic end; 27.1%; cols. 1 and 2), (4) residual peptides identified in both protease fractions are minimal and likely due to carry-over as they are almost entirely tryptic (96.5%; col. 4), and (5) that the in silico combined data (col. 5) mimics the actual data obtained when 25 μg of each fraction are combined and analyzed in a single 24 h, LC-MS/MS run (col. 6). This suggests the combination sample may actually aid in semiquantitative robustness since each protein would be more thoroughly represented in the data set, including trypsin-incompatible proteins/regions.

The data in Figure S-3B, Supporting Information, presents a similar analysis, but rather based on summed match-ion intensity (MIT) for each peptide across all of the runs. MIT assigns an intensity dimension to a spectral count and provides more informative, abundance weighted values that better represent peptide intensity profiles and is in line with other similar label-free quantitation strategies.35,36 In columns 1 and 2, one noticeable difference is the abundance of C-terminal K and R peptides, which are reduced by 1.6× and 1.4×, respectively, when compared to Figure S-3A, Supporting Information. Meanwhile, the abundance of C-terminal F peptides is increased by 1.8× with relatively no change observed for L, W, and Y peptides. These data further suggest that the majority of peptide abundance (78.2%) in C contain C-terminal residues processed by chymotrypsin. These residues are generally infrequent and occur mostly in hydrophobic or TMD regions of membrane proteins (Table S-2, Supporting Information), suggesting their enrichment atop the filter following trypsin digestion. This second view of the data also highlights an important difference between the two protease fractions. Although the in silico combined data (col. 5) matches the TC combined run data (col. 6), chymotrypsin-derived peptides remain under-represented. Considering the spectral quality analysis presented above, this under-representation is perhaps exacerbated by the large difference in HQ peptide assignment frequency observed between the T and C fractions.

Peptides Identified in Each Fraction Are Distinct from One Another. Peptides were grouped by their replicate persistence and labeled as being either liberally or conservatively identified. The liberal group includes all identified peptides, as long as they appear at least once across all six (T vs C) or nine (T vs C vs TC) LC-MS/MS runs. The conservative group requires a peptide to be identified in all three replicates for a particular sample fraction. Peptides of the conservative set represent the more persistent (and likely abundant) identifications, representing 96.6% and 98.7% of the total MIT across replicates for chymotrypsin and trypsin, respectively (Table S-3, Supporting Information). Across all nine runs (T vs C vs TC), 47 445 distinct peptides were identified (liberal), 17 841 of which were identified in all three replicates of a particular fraction (conservative). When considering only the T and C fractions, the number of overall peptide identifications falls to 43 429 (liberal) but more, 21 978, pass the three-replicate criterion (conservative).


Venn diagrams (Figure 3) indicate that each protease fraction is quite distinct from one another, sharing only 1018 (4.6%) conservative peptides or 2679 (6.1%) liberal peptides between the two analyses, corroborating results from other multienzyme digestion strategies.16,18 This low level of peptide redundancy suggests that peptides identified from the un(der)digested sample portion provide additional sequence information that has the potential to bolster sequence coverage as well as overall protein identification, especially to those proteins thoroughly incompatible with trypsin. It should be noted that many of these peptides sequentially overlap with already identified tryptic peptides and thereby gains in sequence coverage were highly dependent on trypsin compatibility (see below). In contrast, peptides identified in T and C exhibit strong overlap when compared with the combination sample (TC). Here, a consistent and rich mixture of both sets of proteolytic peptides were represented, particularly among the conservative set, suggesting that gains in sequence coverage, protein identification, and localization of PTMs by sequence stacking can be realized within a single combination run where both proteolytic fractions are analyzed together.

Figure 3. Venn diagrams of identified peptides across fractions. T and C fractions were analyzed independently (top) or in combination with TC (bottom) using either conservative (left) or liberal (right) filters. The area of each individual ellipse is weighted by the number of peptides identified per fraction. Peptides from either T or C are remarkably unique showing only a slight intersection between fractions. The combination load (TC) identifies a large majority of peptides specific to T or C in a single LC-MS/MS run.

Peptides Map to Predicted Transmembrane Helices. As these un(der)digested proteins are hypothesized to be more hydrophobic, an increase in the identification rate of TMD-associated peptides in C was expected. Using both liberal and conservative data sets, identified peptide sequences were cross-referenced with predicted TMDs for every protein in the I. hospitalis-N. equitans proteome using TMHMM. As depicted in Figure 4, a large majority of TMD-overlapping peptides were identified in C (>90%) relative to T across both liberal and conservative peptide sets. The degree of peptide/TMD sequence overlap was substantially higher in C as well, especially when normalizing for the overall number of peptides identified in each fraction. Although C peptides dominate the TMD landscape, T peptides were sporadically identified, though only a few reach an overlap of 10 or greater. This result is expected as TMDs are comprised of more hydrophobic amino acid residues (Table S-2, Supporting Information) and are thus more amenable to identification via chymotrypsin. Although unique, nonredundant, helix-mapped peptides identified in the above analysis represent a very small fraction of the total number of peptides identified for each proteolytic fraction, it is important to consider that only 5.2% of amino acid residues in the I. hospitalis-N. equitans proteome participate in a predicted TMD. Evaluated accordingly and considering the high-quality peptide assignment bias reported earlier (Figure 2), the observed differences are perhaps even more significant and strongly suggest the un(der)digested proteins recovered

Figure 4. Frequency and degree of transmembrane helix overlap by fraction. Identified peptides mapping to TMDs were binned according to their degree of amino acid overlap and plotted as a stacked histogram. Both liberally (A) and conservatively (B) identified TMD peptides were compared across T (red) and C (blue) fractions. Peptides identified in C more frequently mapped to and spanned deeper into TMDs relative to T. Similar patterns are observed for both the liberal and conservative peptide sets.
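The degree of peptide/TMD overlap binned in Figure 4 can be computed from interval coordinates. The following is a minimal sketch, assuming 1-based inclusive residue coordinates such as those reported by a TMD predictor; the function names and the example coordinates are hypothetical and do not come from the study's data.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Span = Tuple[int, int]  # (start, end), 1-based and inclusive


def tmd_overlap(peptide: Span, tmds: Iterable[Span]) -> int:
    """Return the largest number of residues a peptide shares with any
    single predicted transmembrane domain (0 if it touches none)."""
    pep_start, pep_end = peptide
    best = 0
    for tmd_start, tmd_end in tmds:
        shared = min(pep_end, tmd_end) - max(pep_start, tmd_start) + 1
        best = max(best, shared)
    return best


def overlap_histogram(peptides: Iterable[Span], tmds: List[Span]) -> Counter:
    """Count TMD-overlapping peptides per degree of overlap."""
    overlaps = (tmd_overlap(p, tmds) for p in peptides)
    return Counter(o for o in overlaps if o > 0)


tmds = [(10, 32), (50, 72)]               # made-up helix predictions
peptides = [(5, 15), (33, 49), (20, 60)]  # made-up peptide positions
print(overlap_histogram(peptides, tmds))
```

Peptides that fall entirely in loop regions, such as (33, 49) here, contribute nothing to the histogram, which matches how only helix-mapped peptides are plotted.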


Table 1. Membrane Protein Identification across Fractions Shows Increased Representation in C Relative to T, Especially Those Comprised of More TMDs

Liberal membrane proteins

              Mem(a)  TMD(b)  TMD/Mem  PUq(c)  MUq(d)  MUq TMD(e)  ratio(f)
fraction T    144     490     3.4      377     35      106         3.0
fraction C    154     659     4.3      70      45      275         6.1
union         189     765     4.0      --      --      --          --
proteome      336     1364    4.1      --      --      --          --
coverage, %   56      56      --       --      --      --          --

Conservative membrane proteins

              Mem     TMD     TMD/Mem  PUq     MUq     MUq TMD     ratio
fraction T    115     365     3.2      273     20      56          2.8
fraction C    123     534     4.3      38      28      193         6.9
union         152     617     4.1      --      --      --          --
proteome      336     1364    4.1      --      --      --          --
coverage, %   45      45      --       --      --      --          --

(a) Mem = number of membrane proteins. (b) TMD = transmembrane domains in those proteins. (c) PUq = number of proteins unique to fraction. (d) MUq = number of membrane proteins unique to fraction. (e) MUq TMD = total number of TMDs represented by the identified unique membrane proteins. (f) Ratio = MUq TMD/MUq.
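The derived columns of Table 1 follow directly from the counts, as the short sketch below shows. The counts are transcribed from the table; the dictionary layout and the function name `derived_metrics` are hypothetical, and the percentage of unique proteins that are membrane proteins (MUq/PUq) is the quantity quoted in the text rather than a table column.

```python
TABLE1 = {
    "liberal": {
        "T": {"mem": 144, "tmd": 490, "puq": 377, "muq": 35, "muq_tmd": 106},
        "C": {"mem": 154, "tmd": 659, "puq": 70, "muq": 45, "muq_tmd": 275},
    },
    "conservative": {
        "T": {"mem": 115, "tmd": 365, "puq": 273, "muq": 20, "muq_tmd": 56},
        "C": {"mem": 123, "tmd": 534, "puq": 38, "muq": 28, "muq_tmd": 193},
    },
}


def derived_metrics(row: dict) -> dict:
    """Recompute the ratio columns of Table 1 from the raw counts."""
    return {
        # TMDs per identified membrane protein
        "tmd_per_mem": round(row["tmd"] / row["mem"], 1),
        # TMDs per membrane protein unique to the fraction (footnote f)
        "ratio": round(row["muq_tmd"] / row["muq"], 1),
        # share of fraction-unique proteins that are membrane proteins
        "pct_unique_membrane": round(100 * row["muq"] / row["puq"]),
    }


for filter_name, fractions in TABLE1.items():
    for fraction, row in fractions.items():
        print(filter_name, fraction, derived_metrics(row))
```

Recomputing the ratios this way is a quick consistency check on a reconstructed table: each derived value should match the printed column after rounding.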

Figure 5. Fractional contributions to protein sequence coverage across the hydrophobicity range. Identified proteins were binned by TMD content (left) or hydrophobicity index (right), and average sequence coverage per bin was calculated. Overall sequence coverage was partitioned between C (blue), T (red), and their shared overlap (purple) to depict the coverage contribution per fraction. Normalized sequence coverage contributions were also provided (inset). As protein TMD content or hydrophobicity increases, the overall sequence coverage contributions from T are reduced, especially relative to C where contributions remain roughly stable across all hydrophobicities.
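A protein-level hydrophobicity index of the kind used for binning in Figure 5 can be sketched as the mean Kyte-Doolittle hydropathy over all residues (often called GRAVY). Whether the study used exactly this average is an assumption; the scale values are the standard Kyte-Doolittle values, and the function name `gravy` and the example sequences are hypothetical.

```python
# Standard Kyte-Doolittle hydropathy values per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Return the mean Kyte-Doolittle hydropathy of a protein sequence.

    Positive values indicate a more hydrophobic protein overall.
    Raises KeyError for residues outside the 20 standard amino acids.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence.upper()) / len(sequence)


print(f"{gravy('ILVFILVFAAL'):.2f}")   # helix-like, made-up sequence
print(f"{gravy('KRDEKRDESTN'):.2f}")   # charged, made-up sequence
```

Sequences rich in I, L, V, and F score well above zero, whereas sequences dominated by K, R, D, and E score strongly negative, which is the contrast that separates trypsin-compatible from trypsin-poor regions.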

after the trypsin digest contain proteins enriched in TMDs (membrane proteins).

Large Proportion of Un(der)digested Proteins Have Increased Hydrophobic Character. Impressive proteome coverage was achieved for both I. hospitalis (76%) and N. equitans (86%) independently, with a combined coverage of 79% (Table S-4, Supporting Information), metrics that corroborate our previous studies.14,22 This high proteome coverage was buoyed by the identification of 189 membrane proteins out of 336 (56%) containing at least one predicted TMD (Table 1). This recovery is on par with, if not slightly improved over, other studies utilizing chymotrypsin to enhance membrane protein representation where reported identification rates range from 38% to 50%.37,38 Though similar numbers of membrane proteins were identified across both fractions, more were identified in C despite a 20% reduction in overall protein identification relative to T as well as reduced peptide identification and PSM rates described above (Table S-1, Supporting Information). In fact, C showed significant gains to proteins possessing greater numbers of TMDs. Overall, membrane proteins identified in C contained 34% more TMDs than in T, representing a TMD per membrane protein ratio closer to that of the proteome as a whole. This ratio is further biased toward C when considering only proteins uniquely identified in each fraction. Furthermore, most proteins unique to C are membrane-associated, 64% (liberal) to 74% (conservative). In comparison, only 9% (liberal) and 7% (conservative) of the unique proteins in T were membrane-associated. Though membrane proteins were sampled and identified in T, these proteins contain fewer TMDs per protein and thus a proportionally larger soluble/trypsin-compatible region, which effectively increases their chance of LC-MS/MS identification. Taken together, these data indicate that membrane proteins with increased hydrophobicity (i.e., increased density of TMDs per unit length) tend to be located above the filter membrane following trypsin digestion and are likely un(der)digested (and thus under-represented) due mainly to trypsin incompatibility.

Sequence Coverage Analysis Reveals Steady Gain in Coverage. To complement the above findings, the sequence coverage gains/contributions per proteolytic fraction were investigated across membrane proteins (containing ≥1 TMD) and proteins of increasing hydrophobicity which might not contain TMDs but have a higher prevalence of hydrophobic amino acids. Contributions to overall sequence coverage were thusly compared by TMD content or overall protein hydrophobicity across fractions (Figure 5). As expected, T contribution to overall sequence coverage diminishes as proteins become more hydrophobic, an effect that is evident in both plots. Meanwhile, C contributions trend roughly stable (10−13%) across all hydrophobicities, sequestering a larger portion of overall sequence coverage as hydrophobicity increases. Similar trends were observed in two bacterial systems, Escherichia coli and Rhodopseudomonas palustris, suggesting these results are not an artifact of this unique archaeal pairing (Figure S-4, Supporting Information). Interestingly, T contribution to sequence coverage falls precipitously with the presence of only one or two TMDs

DOI: 10.1021/acs.analchem.5b01187 Anal. Chem. XXXX, XXX, XXX−XXX

Article

Analytical Chemistry

Figure 6. Peptide coverage maps of select membrane proteins by fraction. Membrane proteins were represented as TMD traces to visualize predicted helical domains (purple trace; TMD probability) in the context of identified peptides. Peptides identified in either C (blue) or T (red) were mapped below each trace. In general, peptides in the T fraction avoid predicted TMDs, while peptides in the C fraction contribute sequence coverage regardless of TMD propensity.
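The kind of comparison drawn in Figure 6 can be approximated computationally. The following is a minimal sketch under stated assumptions, not the workflow used in this study: peptides are located by exact substring matching, predicted TMDs are supplied as 0-based, half-open residue intervals, and the function names (`covered_positions`, `tmd_positions`, `tmd_coverage`) as well as the toy sequence and peptides in the usage note are hypothetical.

```python
from typing import Iterable, Set, Tuple


def covered_positions(protein: str, peptides: Iterable[str]) -> Set[int]:
    """Return 0-based residue indices covered by any peptide (exact match)."""
    covered: Set[int] = set()
    for pep in peptides:
        if not pep:
            continue  # ignore empty strings
        start = protein.find(pep)
        while start != -1:
            covered.update(range(start, start + len(pep)))
            start = protein.find(pep, start + 1)  # also catch repeated matches
    return covered


def tmd_positions(tmds: Iterable[Tuple[int, int]]) -> Set[int]:
    """Expand half-open (start, end) TMD intervals into residue indices."""
    positions: Set[int] = set()
    for start, end in tmds:
        positions.update(range(start, end))
    return positions


def tmd_coverage(protein: str, peptides: Iterable[str],
                 tmds: Iterable[Tuple[int, int]]) -> float:
    """Fraction of predicted TMD residues covered by identified peptides."""
    tmd = tmd_positions(tmds)
    if not tmd:
        return 0.0
    return len(covered_positions(protein, peptides) & tmd) / len(tmd)
```

As a usage illustration with invented data, take the toy sequence `MSKDERLLVVAALLFFGGWKTEDR` with one assumed TMD at `(6, 18)`. Peptides such as `DER` and `TEDR` (standing in for T) cover none of the TMD residues, whereas `VVAALLF` and `FGGW` (standing in for C) cover most of them.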

conditions more compatible with the un(der)digested hydrophobic proteins that remain atop the filter membrane. As presented here, reprocessing these un(der)digested proteins with chymotrypsin confirms that the proteinaceous debris left behind contains a higher proportion of TMD-containing membrane proteins compared to the tryptic fraction. Both fractions contributed a unique set of peptides that together increased the sequence coverage of a number of proteins. Peptides mapping to TMDs were especially prevalent in C, leading to significant coverage gains for hydrophobic proteins. In fact, the vast majority of proteins uniquely identified in C were membrane proteins with numerous TMDs, highlighting the importance of reprocessing this debris, particularly if membrane protein representation is relevant. Furthermore, reprocessing sometimes recovers exclusive proteotypic peptides per protein, many of which were previously unobtainable by trypsin alone. This corroborates similar multienzyme digestion strategies16,18,19 and translates to improved digestion yields where >90% of the original protein input was represented by analyzable peptides, a more complete processing of the proteome that no doubt enhances measurement robustness without the need for upfront enrichment that could skew quantitative comparisons. As shown, combining T and C in a single LC-MS/MS run provides a consistent and rich mixture of both sets of proteolytic peptides to more properly represent (and quantify) a given proteome without an increase in measurement time. The methodology presented here was successfully used to analyze the temporal dynamics of the I. hospitalis-N. equitans association, specifically as the ectosymbiont/parasite propagates on its host’s surface.22 As reported, most of the proteomic changes occurred at the membrane level, whereby membrane protein abundance increased in lockstep with N. equitans abundance, an observation that was more apparent due to this methodology. This is especially true for proteins composed of a large majority of TMDs, many of which are likely mechanistically important

even as these proteins presumably contain soluble, digestible regions (Figure 5). Normalizing protease-specific contributions to 100% (inset) further highlights these trends and clearly illustrates trypsin’s inability to completely digest proteins with increased hydrophobic character. These trends are perhaps best visualized in Figure 6. This view of sequence coverage corroborates the data presented above, whereby integral membrane proteins like Igni_0545 and Igni_0546, which are almost entirely composed of TMDs, remain un(der)digested in T and are therefore not reliably identified by trypsin use alone. This observation highlights not only the importance of choosing the right protease (or combination of proteases) to successfully sample proteins of interest but also that the un(der)digested material filtered away from the initial tryptic digest is naturally enriched in membrane/hydrophobic proteins that can be easily recovered and redigested in more suitable conditions.
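The normalization of protease-specific contributions to 100% can be illustrated with a short calculation. This is a minimal sketch and not the authors' implementation: it assumes coverage is tracked as sets of residue indices per fraction, and it treats residues covered by both fractions as a separate `shared` category, which is an assumption made here because the text does not state how overlapping coverage was apportioned. The function names are hypothetical.

```python
from typing import Dict, Set


def coverage_contributions(length: int, t_cov: Set[int],
                           c_cov: Set[int]) -> Dict[str, float]:
    """Percent of the protein covered only by T, only by C, or by both."""
    if length <= 0:
        raise ValueError("protein length must be positive")
    return {
        "T_only": 100 * len(t_cov - c_cov) / length,
        "C_only": 100 * len(c_cov - t_cov) / length,
        "shared": 100 * len(t_cov & c_cov) / length,
    }


def normalize_to_100(contrib: Dict[str, float]) -> Dict[str, float]:
    """Rescale contributions so they sum to 100% of the observed coverage."""
    total = sum(contrib.values())
    if total == 0:
        return {key: 0.0 for key in contrib}
    return {key: 100 * value / total for key, value in contrib.items()}
```

For an invented 200-residue protein with T covering residues 0 to 39 and C covering residues 30 to 59, the raw contributions are 15% (T only), 10% (C only), and 5% (shared) of the sequence; after normalization, T-only coverage accounts for half of the observed coverage.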



DISCUSSION

Global proteome characterization via LC-MS/MS-based shotgun approaches has quickly become a standard tool in the systems biology workflow, providing robust, protein-level perspectives that detail the myriad of biological processes and systems that exist in nature. Though reasonably deep, proteome measurements utilizing standard “bottom-up” sample preparation methodologies are often biased toward soluble proteins.2 Though membrane protein identification is still possible, the number of LC-MS analyzable peptides is strongly dependent on the overall hydrophobic character of the protein, with a protein’s chance of being identified inversely proportional to the number of TMDs across its sequence, a phenomenon that largely hinges on protease compatibility.1−3 Fortunately, these proteins can be rescued, and subsequently sampled, by pairing SDS extraction with sample filtration and consecutive, orthogonal proteolytic digestion using proteases and digestion


Research and Development Program of Oak Ridge National Laboratory (ORNL). ORNL is managed by UT-Battelle, LLC, for the U.S. Department of Energy. The authors would like to thank Harald Huber, Thomas Heimerl, and Reinhard Rachel for providing samples and Paul Abraham for his valuable suggestions and critical reading of this manuscript. Raw LC-MS/MS data evaluated in this study is available upon request. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

to the association but would have been missed due to trypsin incompatibility. These include a substantial number of hypothetical membrane proteins that were identified as up-regulated specifically in the presence of N. equitans. The up-regulation of enzymes/pathways involved in protein glycosylation was another interesting finding that coincided with N. equitans’ growth progression on I. hospitalis’ surface.22 As glycosylation is important in archaea24 and often decorates membrane-associated, surface-exposed proteins,21,23 exactly the type of proteins enriched in the un(der)digested fraction, it is possible that protein glycosylation prevents the assignment of numerous peptide spectra. In fact, preliminary results from a 2D-PAGE separation of an I. hospitalis-N. equitans membrane fraction (data not shown) indicate the existence of a tremendous number of protein isoforms, many of which are likely due to glycosylation. This is relevant to our current study, as the overall gains realized in C were achieved despite a large proportion of HQ peptide spectra remaining unassigned. Further analysis of these HQU spectra with PTM interrogation software provided only marginal recovery/assignment of these spectra. This may be due to the difficulty in identifying glycosylated peptides, especially as this study was not originally designed to sample these types of modified proteins. In light of these and other data, interrogation of surface protein glycosylation in the I. hospitalis-N. equitans system is currently underway.



(1) Speers, A. E.; Wu, C. C. Chem. Rev. 2007, 107, 3687−3714.
(2) Vuckovic, D.; Dagley, L. F.; Purcell, A. W.; Emili, A. Proteomics 2013, 13, 404−423.
(3) Helbig, A. O.; Heck, A. J.; Slijper, M. J. Proteomics 2010, 73, 868−878.
(4) Savas, J. N.; Stein, B. D.; Wu, C. C.; Yates, J. R., 3rd. Trends Biochem. Sci. 2011, 36, 388−396.
(5) Lin, Y.; Huo, L.; Liu, Z.; Li, J.; Liu, Y.; He, Q.; Wang, X.; Liang, S. PLoS One 2013, 8, e59779.
(6) Lin, Y.; Wang, K.; Yan, Y.; Lin, H.; Peng, B.; Liu, Z. J. Sep Sci. 2013, 36, 3026−3034.
(7) Zhou, J.; Zhou, T.; Cao, R.; Liu, Z.; Shen, J.; Chen, P.; Wang, X.; Liang, S. J. Proteome Res. 2006, 5, 2547−2553.
(8) Wu, F.; Sun, D.; Wang, N.; Gong, Y.; Li, L. Anal. Chim. Acta 2011, 698, 36−43.
(9) Tanca, A.; Biosa, G.; Pagnozzi, D.; Addis, M. F.; Uzzau, S. Proteomics 2013, 13, 2597−2607.
(10) Bereman, M. S.; Egertson, J. D.; MacCoss, M. J. Proteomics 2011, 11, 2931−2935.
(11) Wisniewski, J. R.; Zougman, A.; Nagaraj, N.; Mann, M. Nat. Methods 2009, 6, 359−362.
(12) Abraham, P.; Adams, R.; Giannone, R. J.; Kalluri, U.; Ranjan, P.; Erickson, B.; Shah, M.; Tuskan, G. A.; Hettich, R. L. J. Proteome Res. 2012, 11, 449−460.
(13) Chourey, K.; Jansson, J.; VerBerkmoes, N.; Shah, M.; Chavarria, K. L.; Tom, L. M.; Brodie, E. L.; Hettich, R. L. J. Proteome Res. 2010, 9, 6615−6622.
(14) Giannone, R. J.; Huber, H.; Karpinets, T.; Heimerl, T.; Kuper, U.; Rachel, R.; Keller, M.; Hettich, R. L.; Podar, M. PLoS One 2011, 6, e22942.
(15) Antharavally, B. S.; Mallia, K. A.; Rosenblatt, M. M.; Salunkhe, A. M.; Rogers, J. C.; Haney, P.; Haghdoost, N. Anal. Biochem. 2011, 416, 39−44.
(16) Swaney, D. L.; Wenger, C. D.; Coon, J. J. J. Proteome Res. 2010, 9, 1323−1329.
(17) Tran, B. Q.; Hernandez, C.; Waridel, P.; Potts, A.; Barblan, J.; Lisacek, F.; Quadroni, M. J. Proteome Res. 2011, 10, 800−811.
(18) Wisniewski, J. R.; Mann, M. Anal. Chem. 2012, 84, 2631−2637.
(19) Wisniewski, J. R.; Rakus, D. J. Proteomics 2014, 109C, 322−331.
(20) Erde, J.; Loo, R. R.; Loo, J. A. J. Proteome Res. 2014, 13, 1885−1895.
(21) Deeb, S. J.; Cox, J.; Schmidt-Supprian, M.; Mann, M. Mol. Cell. Proteomics 2014, 13, 240−251.
(22) Giannone, R. J.; Wurch, L. L.; Heimerl, T.; Martin, S.; Yang, Z.; Huber, H.; Rachel, R.; Hettich, R. L.; Podar, M. ISME J. 2015, 9, 101.
(23) Han, D.; Moon, S.; Kim, Y.; Min, H.; Kim, Y. BMC Genomics 2014, 15, 95.



CONCLUSIONS

The results presented here show that membrane and other hydrophobic proteins can be enriched through subtractive digestion whereby their extraction and recovery are facilitated by strong detergents, initial digestion with trypsin, and sample filtration. This allows for a subsequent proteolysis in conditions better suited to this residual material, i.e., an acid-labile detergent and chymotrypsin, to improve upon their representation in collected data. Issues with membrane protein identification have not been conclusively solved, however. Though further optimization of the digestion conditions could prove quite fruitful, one must also seriously consider the degree to which protein modification will preclude their identification. Without planning accordingly, many high-quality peptide spectra will remain unassigned, framing a limited, biased view of the proteome under investigation. Thus, the development and application of strategies that address not only membrane protein solubilization and digestion but also their frequent modification are paramount to their robust measurement.



ASSOCIATED CONTENT

Supporting Information

Additional information as noted in text. The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.analchem.5b01187.



REFERENCES

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Phone: (865) 241-3507.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

This research was supported by a grant from the U.S. Department of Energy, Office of Biological and Environmental Research (DE-SC0006654) and by the Laboratory Directed


(24) Eichler, J.; Adams, M. W. Microbiol Mol. Biol. Rev. 2005, 69, 393−425.
(25) Delahunty, C. M.; Yates, J. R., 3rd. Biotechniques 2007, 43, 563 565, 567 passim.
(26) McDonald, W. H.; Ohi, R.; Miyamoto, D. T.; Mitchison, T. J.; Yates, J. R. Int. J. Mass Spectrom. 2002, 219, 245−251.
(27) Tabb, D. L.; Fernando, C. G.; Chambers, M. C. J. Proteome Res. 2007, 6, 654−661.
(28) Ma, Z. Q.; Dasari, S.; Chambers, M. C.; Litton, M. D.; Sobecki, S. M.; Zimmerman, L. J.; Halvey, P. J.; Schilling, B.; Drake, P. M.; Gibson, B. W.; Tabb, D. L. J. Proteome Res. 2009, 8, 3872−3881.
(29) Ma, Z. Q.; Chambers, M. C.; Ham, A. J.; Cheek, K. L.; Whitwell, C. W.; Aerni, H. R.; Schilling, B.; Miller, A. W.; Caprioli, R. M.; Tabb, D. L. J. Proteome Res. 2011, 10, 2896−2904.
(30) Moller, S.; Croning, M. D.; Apweiler, R. Bioinformatics 2001, 17, 646−653.
(31) Kyte, J.; Doolittle, R. F. J. Mol. Biol. 1982, 157, 105−132.
(32) Micallef, L.; Rodgers, P. PLoS One 2014, 9, e101717.
(33) Bern, M.; Goldberg, D.; McDonald, W. H.; Yates, J. R., 3rd. Bioinformatics 2004, 20 (Suppl 1), i49−i54.
(34) Nesvizhskii, A. I.; Roos, F. F.; Grossmann, J.; Vogelzang, M.; Eddes, J. S.; Gruissem, W.; Baginsky, S.; Aebersold, R. Mol. Cell. Proteomics 2006, 5, 652−670.
(35) Cox, J.; Hein, M. Y.; Luber, C. A.; Paron, I.; Nagaraj, N.; Mann, M. Mol. Cell. Proteomics 2014, 13, 2513−2526.
(36) Griffin, N. M.; Yu, J.; Long, F.; Oh, P.; Shore, S.; Li, Y.; Koziol, J. A.; Schnitzer, J. E. Nat. Biotechnol. 2010, 28, 83−89.
(37) Fischer, F.; Wolters, D.; Rogner, M.; Poetsch, A. Mol. Cell. Proteomics 2006, 5, 444−453.
(38) Franzel, B.; Wolters, D. A. Proteomics 2011, 11, 3651−3656.
