Effect of Immunoaffinity Depletion of Human Serum during Proteomic Investigations Anastasia K. Yocum, Kenneth Yu, Tomoyuki Oe, and Ian A Blair* Center for Cancer Pharmacology, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104-6160 Received June 11, 2005
Controversy exists regarding the proper mining of the human serum proteome. Because of the analytical challenges of accurately measuring samples containing a very large dynamic range of protein concentrations, current practices have employed depletion of the highly abundant housekeeping serum proteins, such as albumin and immunoglobulins. The selectivity of depletion is in question: are other, nonabundant serum proteins lost along with albumin, haptoglobin, and other commonly depleted proteins? In this study, human serum was analyzed with and without immunoaffinity depletion of the six most abundant proteins by multidimensional liquid chromatography tandem mass spectrometry. Two replicates of each experiment were conducted and compared against one another. The depleted and nondepleted replicates showed 73% and 72% overlap of identified peptides and 64% and 78% overlap of identified proteins, respectively. Of 262 unique proteins identified in the four experiments, 82 were found in common to all four experiments, 142 were unique to the depleted serum, and 38 were unique to the nondepleted serum. Although serum depletion of highly abundant proteins significantly increased the number of proteins identified, both the degree of sample complexity and this depletion method resulted in a nonselective loss of other proteins.
Keywords: serum • biomarker • mass spectrometry • albumin-depletion • INTERACT • ProteinProphet/Peptide Prophet • proteomics • 2D-LC-MS/MS
Introduction Serum plays a central role in clinical diagnosis. It is hoped that one day, given sufficient biomarkers, clinical proteomics will provide a network of information allowing for the early diagnosis of disease and individualized treatment of patients.1 This aspiration rests on the fundamental hypothesis that every cell in the body leaves a record of its physiological state as products shed into the local milieu either as waste or as messages to neighboring or distant cells.2 It is estimated that 20-25% of all cellular proteins are secreted or shed. Since all tissues are perfused by blood and lymphatics, proteins and protein fragments shed or secreted by cells can enter circulation.3 Theoretically, blood contains a treasure trove of biomarkers that could reflect the ongoing physiological and pathophysiological states.2 Serum is thought to contain tens of thousands of proteins along with their cleaved or modified forms. These proteins are a reflection of ongoing physiological or pathological events. For a protein to serve as a biomarker for monitoring a pathological process such as cancer, it must conform to the following three rules: it must be secreted or shed from the cancer cell, have the capacity to be carried or diffuse into the circulation, and be present throughout tumor development and progression.3
* To whom correspondence should be addressed. Center for Cancer Pharmacology, University of Pennsylvania School of Medicine, 854 BRB II/III, 421 Curie Boulevard, Philadelphia PA 19104-6160. Tel: (215) 573-9880. Fax: (215) 573-9889. E-mail: [email protected].
Journal of Proteome Research 2005, 4, 1722-1731
Published on Web 09/07/2005
One current serum biomarker in use for detection and monitoring of prostate cancer is prostate specific antigen (PSA). There is no specific normal or abnormal PSA level, although the higher a man’s PSA level, the more likely it is that cancer is present. Unfortunately, various benign factors can cause PSA levels to fluctuate, which can lead to unnecessary, more invasive testing. In addition, due to the known complexity of the molecular mechanisms underlying oncogenesis, the quest for a single biomarker makes little sense from a biological perspective. Hence, there is a need to analyze the entire complex mixture of proteins from serum. Highly abundant proteins significantly interfere with the detection of less abundant proteins.4 Controversy within the field of proteomics has emerged on the best way to obtain the greatest number of protein identifications from human serum;5 that is, whether or not to deplete the major components (e.g., albumin and immunoglobulins). Additionally, the high sequence variability within immunoglobulins complicates protein assignment.6 As a result, these highly abundant proteins are routinely removed from serum. This removal of high abundance proteins reduces the range of protein concentrations present, allowing lower abundance proteins to be analyzed and quantified. However, immunoaffinity depletion may specifically or even nonspecifically remove low abundance proteins that are bound to the high abundance proteins.7 This is particularly problematic for the already complicated quantitative measurements due to not only the normally sophisticated serum protein kinetics but also
10.1021/pr0501721 CCC: $30.25
2005 American Chemical Society
the variable and selective protein losses along with the depleted proteins.4 The size of a protein determines how fast it is cleared from the blood. Free-flowing low molecular weight proteins and peptides (i.e., less than 30 kDa) should be rapidly cleared by the kidneys.8 Abundant high molecular weight proteins that exist above the filtration range of the kidneys will exist in serum until they are proteolytically cleaved into smaller fragments and excreted. Half-lives of proteins in serum can vary widely: the half-life of albumin is approximately 20 days as compared with that of complement factors, which are in the range of several minutes. Carrier proteins such as albumin act like magnets to accumulate and amplify lower abundance species such as bilirubin, functioning as a molecular sponge.9 Therefore, at any point the concentration of a low molecular weight protein or peptide is a function of its entry into and exit from the blood compartment and its binding affinity to a carrier protein.2 If a biomarker is a low molecular weight protein that has a high binding affinity toward a carrier protein, then its existence is thus dependent on its production rate, clearance/excretion rate, and the clearance/excretion rates of the bound carrier protein.8 Because a carrier protein has a longer half-life than the unbound biomarker, the total concentration of the biomarker at steady state can become elevated due to its association with the carrier protein.9 This amplification can in theory result in an increase of several orders of magnitude. Existing technologies that often discard abundant carrier proteins could thus fail to capture these protein-bound low abundance peptides and proteins.2 Currently, there are several technologies to remove the higher abundance carrier proteins from serum, including ultracentrifugal filtration and immunoaffinity depletion.
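The carrier-protein amplification argument can be made concrete with a simple steady-state estimate. The sketch below assumes a one-compartment model with constant production and first-order clearance, and assumes that a fully bound biomarker is cleared at the carrier's rate. The 20-day albumin half-life is taken from the text; the 30-minute half-life for the unbound biomarker and the function name are illustrative assumptions, not measured values.

```python
import math

def steady_state_amount(production_rate, half_life):
    """Steady-state amount for constant production and first-order clearance.

    At steady state, production_rate = k * amount, with k = ln(2) / half_life.
    production_rate is in amount per day and half_life is in days.
    """
    k = math.log(2) / half_life
    return production_rate / k

ALBUMIN_HALF_LIFE_DAYS = 20.0             # from the text
FREE_HALF_LIFE_DAYS = 30.0 / (60 * 24)    # assumed: 30 min for the unbound biomarker

# Same (arbitrary) production rate in both cases; only the clearance differs.
free = steady_state_amount(1.0, FREE_HALF_LIFE_DAYS)
bound = steady_state_amount(1.0, ALBUMIN_HALF_LIFE_DAYS)

# The fold change reduces to the ratio of the two half-lives.
amplification = bound / free
print(f"fold amplification: {amplification:.0f}")
```

Under these assumptions the bound biomarker accumulates roughly a thousandfold relative to the free form, which is in line with the "several orders of magnitude" argued above; a different assumed free half-life scales the result proportionally.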
Laboratories conducting human serum proteomics have completed their work relying on one method of depletion or another, or not depleting at all, further causing variability in results from different laboratories. The absence or presence of a given protein in any given study brings into question not only the biology of the experiment but also the possibility of human error. This study was undertaken to explore the utility of one method of depletion while keeping every other variable essentially the same. Differences in protein identification in serum were determined, with or without immunoaffinity depletion.
Experimental Section Sample Preparation. Human serum was purchased from Sigma (St. Louis, MO). The human serum was immunoaffinity depleted of highly abundant proteins according to the manufacturer’s instructions using a 4.6 × 100 mm Multiple Affinity Removal Column (Agilent Technologies, Palo Alto, CA). This column is designed to selectively remove albumin, IgG, IgA, anti-trypsin, transferrin, and haptoglobin from a serum sample. Immunoaffinity chromatography was conducted on a Hitachi EZChrome Elite HPLC (Hitachi High Technologies America, San Jose, CA). For each sample, the fraction containing the low abundance proteins was collected. Both immunoaffinity depleted and nondepleted human serum were concentrated and buffer exchanged into 100 mM ammonium bicarbonate (Sigma, St. Louis, MO) using 5 kDa molecular weight cutoff centrifugal filters (Millipore, Billerica, MA). The serum UV absorbance at 260 and 280 nm was measured and the protein concentration was estimated by the following equation: concentration (mg/mL) = (1.55 × A280 nm) − (0.76 × A260 nm).10 An aliquot of 1 mg of immunoaffinity depleted and nondepleted human serum
proteins were placed in a boiling water bath for 5 min. The sample was allowed to cool in a 37 °C water bath for 15 min. Dithiothreitol (DTT) (Bio-Rad Life Sciences, Hercules, CA) was added to a final concentration of 5 mM and incubated at 60 °C for 1 h to reduce the disulfide bonds. The reduced cysteines were then alkylated with the addition of iodoacetamide (IAA) (Bio-Rad Life Sciences, Hercules, CA) at a final concentration of 15 mM and incubated at room temperature in the dark for 30 min. The reduced and alkylated samples were digested using sequencing grade trypsin (Promega, Madison, WI) at a 1:50 enzyme:protein (w/w) ratio and incubated overnight at 37 °C. The pH of the digested protein sample was then lowered to 3 with 1% formic acid solution. An equal volume of strong cation exchange (SCX) mobile phase A (10 mM ammonium formate, 25% acetonitrile, pH 3) was added and the solution was incubated at 4 °C for 4 h to allow for all peptides to take on maximal positive charges.
Off-Line Fractionation. SCX separation of the peptides was carried out on a PolySulfoethyl A column (100 × 4.6 mm, 5 µm, 300 Å, PolyLC, Columbia, MD) at a flow rate of 0.2 mL/min. The separation was performed by HPLC using a Hitachi L-7100 pump (Hitachi, San Jose, CA) with a Waters model 996 photodiode array detector (PDA) and a Rheodyne model 7725 manual injector with a 1 mL sample loop (Rohnert Park, CA). The sample was loaded for 10 min with mobile phase A composed of 10 mM ammonium formate, 25% acetonitrile, pH 3. A linear gradient for 60 min was run to 100% mobile phase B composed of 500 mM ammonium formate, 25% acetonitrile, pH 6.8 to elute the peptides. Forty 2-min fractions were collected with a Foxy Jr (Dionex, Sunnyvale, CA) automated fraction collector, lyophilized and stored at -80 °C until further analysis.
On-Line Reversed-Phase Liquid Chromatography/Tandem Mass Spectrometry (RP-LC-MS/MS) Analysis.
Lyophilized peptides were reconstituted with 0.1% formic acid, 5% acetonitrile for reversed phase separation on-line to the Finnigan LTQ ion trap mass spectrometer (Thermo Electron, San Jose, CA) using ESI. Each fraction was loaded using a Surveyor autosampler (Thermo Electron, San Jose, CA) and injected onto a Vydac C18 (100 × 1 mm i.d., 300 Å, 5 µm; Bodmann, Aston, PA) column with 0.1% formic acid in water for 5 min before a linear gradient was run over 60 min to 70% mobile phase B composed of 0.1% formic acid in acetonitrile. Nitrogen was used as the sheath (75 psi) and auxiliary (10 units) gas with the heated capillary at 180 °C. Collision-induced dissociation (CID) experiments employed helium with collision energy at 25% of 1 V. To identify the eluting peptides, the ion trap mass spectrometer was operated in a data-dependent MS/MS mode (m/z 300-2000), in which the top five ions were subjected to MS/MS with 25% normalized CID. Dynamic mass exclusion was enabled with a repeat count of 2 every 30 s for a list size of 200.
Database Searching and Statistical Analysis. The raw MS/MS data were submitted to Bioworks Browser (Thermo Electron, San Jose, CA) and batch searched through TurboSEQUEST against an indexed human RefSeq database (version updated 12/04). The database was indexed using strict trypsin cleavage rules with a maximum of two internal cleavage sites and differential modifications of methionine oxidation and cysteine carboxyamidomethylation. The SEQUEST output files were analyzed and validated by PeptideProphet as an automated method to assign peptides to MS/MS spectra. INTERACT was used to restrict the datasets by filtering at a cutoff probability
pcomp of 0.5. INTERACT differential (IADIFF) (version v.1) was used for side-by-side comparison of identified peptide sequences contained within multiple INTERACT files. The pcomp filtered data files were submitted to ProteinProphet (version 7.9.03), which determines a minimal list of proteins that explains the observed data and computes protein probabilities. ProtDiff, a software program developed in-house, was used for side-by-side comparison of lists of proteins. More information on the open source applications PeptideProphet, INTERACT, IADIFF, and ProteinProphet can be found at www.systemsbiology.org/.
Results and Discussion
Figure 1. Redundancy in two replicate 2D-LC-MS/MS analyses. Blue represents depleted and red represents nondepleted human serum: (a) 1352 and 1392 unique peptides identified from depleted serum with an average overlap of 73%; (b) 743 and 731 unique peptides identified from nondepleted serum with an average overlap of 72%; (c) 171 and 158 unique proteins identified from depleted serum with an average overlap of 64%; (d) 102 and 93 unique proteins identified from nondepleted serum with an average overlap of 78%.
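The average overlaps quoted in Figure 1 can be checked with a short calculation. The sketch below assumes that "average overlap" means the number of shared identifications divided by the mean of the two replicate totals; that definition and the function name are assumptions, since the text does not state the formula. The replicate totals come from Figure 1 and the shared counts (998, 532, 105, 75) from the Results text.

```python
def average_overlap(n_rep1, n_rep2, n_shared):
    """Shared identifications as a percentage of the mean replicate total."""
    mean_total = (n_rep1 + n_rep2) / 2
    return 100.0 * n_shared / mean_total

# (replicate 1, replicate 2, shared) as reported in Figure 1 and the text
counts = {
    "depleted peptides": (1352, 1392, 998),
    "nondepleted peptides": (743, 731, 532),
    "depleted proteins": (171, 158, 105),
    "nondepleted proteins": (102, 93, 75),
}

overlaps = {name: average_overlap(*c) for name, c in counts.items()}
for name, pct in overlaps.items():
    print(f"{name}: {pct:.1f}%")
```

With this definition the calculation reproduces the reported 73%, 72%, and 64%, and gives about 77% for the nondepleted proteins against the reported 78%, so the exact definition or rounding used for that value may differ slightly.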
Individually, 1352 and 1392 unique peptide sequences were identified in the two depleted serum replicate experiments. This represents an average of 1372 unique peptides, with 998 (73%) peptides found in common to both replicates (Figure 1a). Not surprisingly, the replicates with the nondepleted serum resulted in significantly fewer uniquely identified peptide sequences. There were 743 and 731 unique peptides sequenced in each replicate, of which 532 (72%) were found to be in common (Figure 1b). Figures 2 and 3 show the LC-SCX/UV chromatogram and histogram of the number of peptides analyzed per LC-SCX fraction collected for nondepleted and depleted experiments, respectively. Although the depleted and the nondepleted LC-SCX chromatograms look considerably different, the replicate chromatograms were extremely similar (data not shown). Additionally, MS data correlated with the charge state and retention time in cation exchange chromatography. Higher fraction numbers contained higher charge state peptides (Figure 4), which was expected with this chromatography method. Although these figures may seem small in contrast to other studies,4,6,11 we have used a newer statistical model to gain more stringent validation of peptide sequence assignments.12,13 In the past, MS/MS data were searched via SEQUEST, which generates scores relating to the quality of the peptide sequence assigned to each spectrum. The individually defined “correctness” of a peptide assignment was based on a “threshold approach,” i.e., cross correlation (Xcorr), delta correlation value (∆Cn) or ranking preliminary score (RSp). Identifications below an arbitrary threshold for one or more of these criteria were discarded, and identifications that were close to the set threshold were manually inspected for validation. This threshold approach only adds to the already highly variable process of identifying proteins in complex mixtures. In this experiment
Figure 2. LC-SCX chromatogram and histogram showing number of peptides with pcomp ≥ 0.5 analyzed in the mass spectrometer per off-line SCX fraction collected for nondepleted serum.
Figure 3. LC-SCX chromatogram and histogram showing number of peptides with pcomp ≥ 0.5 analyzed in the mass spectrometer per off-line SCX fraction collected for immunoaffinity depleted serum.
we submitted the datasets to PeptideProphet, a statistical algorithm that generates its own discriminant score based on weighting a number of parameters for the peptide, including the various SEQUEST scores mentioned above and the mass difference between calculated and observed values. PeptideProphet calculates the population distribution of the discriminant score for all peptide matches, then learns the underlying distributions of correct and incorrect identifications and performs an iterative process of refining the model to better fit the observed data.12 Figure 5 shows the statistical modeling of one depleted serum dataset. This dataset provided MS/MS spectra from 43 760 singly charged, 53 204 doubly charged and 53 204 triply charged peptides. After 63, 27, and 28 iterations, the estimated total numbers of correct peptide assignments were 0.2, 11 182.1, and 4576.3, respectively, for each charge state. There were 15 670 correct peptide assignments out of the collected 150 168 MS/MS spectra. Therefore, only approximately 10% of the peptide assignments made by SEQUEST were indeed correct. Further, all of the correct assignments were for doubly or triply charged ions. This finding was expected due to the mode of trypsin hydrolysis. Every tryptic peptide should have at least two basic sites that can be readily protonated: the N-terminal amino group and the C-terminal arginine or lysine side chain. PeptideProphet computes a probability (pcomp) for each of the peptide assignments, in which pcomp = 0.5 occurs at the point where the incorrect and correct distributions intersect. Figure 6a shows this calculation graphically, where the red line represents sensitivity and the green line error. For this joined charge state dataset, pcomp ≥ 0.5 is a level that is sensitive enough to correctly identify 94% of the peptides that are indeed correct but also identify
the approximately 5% that are false positives. Therefore, we used ProteinProphet to determine the proteins that these peptides collectively represented. ProteinProphet is another advanced statistical algorithm that, using Occam’s razor, derives the simplest list of proteins that can explain the peptide data. It assigns its own probability, Pcomp, of a given identification to be correct. Figure 6b shows this calculation graphically where, again, sensitivity and error can be plotted against probability (Pcomp). In this same dataset, a Pcomp ≥ 0.3 was chosen and provided a sensitivity of 98% with a false positive rate of 5%. It is important to note that each dataset was analyzed independently. Respective pcomp and Pcomp values were independently chosen for a false positive rate ≤5%, allowing for maximum sensitivity. Using this method of statistical analysis, 171 and 158 unique proteins were identified in the two depleted serum replicate experiments. This represents an average of 165 unique proteins, of which 105 (64%) were identified in common to both replicates (Figure 1c). Analysis of nondepleted serum replicates resulted in the identification of 102 and 93 unique proteins in each replicate, of which 75 (78%) were found to be in common (Figure 1d). It is known that human serum contains the proteases elastase and chymotrypsin. These enzymes result in endogenously digested protein fragments. Therefore, it was hypothesized that we were missing the identification of peptides and proteins by requiring two tryptic termini. To investigate this possible loss of identifications, and additionally differences in computational analysis, one replicate of depleted serum was analyzed with SEQUEST using five additional enzyme parameters: elastase, chymotrypsin, each with two missed cleavages,
Figure 4. Representative MS/MS spectrum from Apolipoprotein A4 (ApoA4). 60% of ApoA4 was identified by sequencing 32 unique peptides 313 times in one depleted serum replicate. The top panel shows an earlier fraction that has a lower charge state consistent with a corresponding shorter retention time within SCX chromatography. The bottom panel shows the same peptide which has a higher charge state and longer SCX chromatographic retention time and its different fragmentation pattern.
Figure 5. Example of statistical modeling of MS/MS data by PeptideProphet for one replicate dataset. Singly (a), doubly (b), and triply (c) charged MS/MS spectra are calculated individually.
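The sensitivity and error curves used to choose pcomp and Pcomp cutoffs can be illustrated with a small calculation. The sketch below assumes that each computed probability is the chance that an assignment is correct, so that the sum of probabilities estimates the number of correct assignments; sensitivity and error then follow the definitions given in the Figure 6 caption. The function name and the probabilities are made up for illustration and are not taken from the study or from the PeptideProphet source code.

```python
def sensitivity_and_error(probabilities, cutoff):
    """Estimate sensitivity and error rate for a probability cutoff.

    Sensitivity: expected correct assignments passing the cutoff divided by
    expected correct assignments overall.
    Error: expected incorrect assignments among those passing the cutoff.
    """
    accepted = [p for p in probabilities if p >= cutoff]
    expected_correct_total = sum(probabilities)
    expected_correct_accepted = sum(accepted)
    if not accepted or expected_correct_total == 0:
        return 0.0, 0.0
    sensitivity = expected_correct_accepted / expected_correct_total
    error = 1.0 - expected_correct_accepted / len(accepted)
    return sensitivity, error

# Made-up probabilities, for illustration only
probs = [0.99, 0.95, 0.90, 0.60, 0.40, 0.10, 0.05, 0.01]
sens, err = sensitivity_and_error(probs, cutoff=0.5)
print(f"sensitivity = {sens:.2f}, error = {err:.2f}")
```

Raising the cutoff lowers the error at the cost of sensitivity, which is the trade-off made when each dataset's cutoff is chosen for an error rate of at most 5%.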
the combination of elastase, trypsin, and chymotrypsin with two, four and five missed cleavages. All five additional enzyme parameters were then analyzed through INTERACT, PeptideProphet and ProteinProphet. With the data analyzed using elastase and chymotrypsin, PeptideProphet could not distinguish between correct and incorrect peptide sequences, and the data therefore could not be fully analyzed by ProteinProphet. This inability to determine correct and incorrect peptide sequences also occurred with the combination digestion of elastase, chymotrypsin and trypsin, but only with two missed
cleavages. When we increased the missed cleavages to four and then five, PeptideProphet was able to discriminate between correct and incorrect peptide sequences and therefore analysis was completed with ProteinProphet. This is speculated to result from the lack of fidelity of endogenous elastase and chymotrypsin and from their nonspecific cleavage at numerous amino acids. Elastase cleaves C-terminal to G, A, S, V, L, and I, while chymotrypsin cleaves C-terminal to F, Y, W, and L. Including trypsin, which cleaves C-terminal to K and R, these three enzymes collectively would cleave at 12 of the 20 amino
Figure 6. (a) Peptide probability (pcomp) or (b) protein probability (Pcomp) for acceptance of peptide and protein identifications, respectively. Sensitivity (red line) is the fraction of all correct assignments passing the pcomp or Pcomp filter, while the error (green line) is the fraction of assignments passing the pcomp or Pcomp filter that are incorrect.
Table 1. Number of Protein and Peptide Identifications for One Experiment Analyzed by Different Statistical Thresholdsᵃ
Xcorr +1 ≥ 1.9, +2 ≥ 3, +3 ≥ 3.75, ∆Cn ≥ 0.1 (protease: unique IDs; peptides; single peptide protein hits)
Trypsin strict 2MC: 109; 5245; 18
Elastase only 2MC: 17; 30; 11
Chymotrypsin only 2MC: 23; 47; 14
Chymotrypsin-Elastase-Trypsin 2MC: 38; 245; 14
Chymotrypsin-Elastase-Trypsin 4MC: 104; 2386; 20
Chymotrypsin-Elastase-Trypsin 5MC: 121; 3275; 30
Xcorr +1 ≥ 1.9, +2 ≥ 2.2, +3 ≥ 3.2, ∆Cn ≥ 0.1 (protease: unique IDs; peptides; single peptide protein hits)
Trypsin strict 2MC: 265; 7887; 120
Elastase only 2MC: 237; 690; 147
Chymotrypsin only 2MC: 238; 627; 140
Chymotrypsin-Elastase-Trypsin 2MC: 255; 1105; 146
Chymotrypsin-Elastase-Trypsin 4MC: 380; 4936; 189
Chymotrypsin-Elastase-Trypsin 5MC: 374; 6028; 194
Xcorr +1 ≥ 1.9, +2 ≥ 2.2, +3 ≥ 3.3, ∆Cn ≥ 0.08 (protease: unique IDs; peptides; single peptide protein hits)
Trypsin strict 2MC: 289; 7909; 138
Elastase only 2MC: 275; 770; 162
Chymotrypsin only 2MC: 274; 723; 167
Chymotrypsin-Elastase-Trypsin 2MC: 295; 1211; 167
Chymotrypsin-Elastase-Trypsin 4MC: 460; 5132
Chymotrypsin-Elastase-Trypsin 5MC: 454; 6249
peptide and Protein Prophet error rate 5% with greatest sensitivity: unique IDs, peptides, single peptide protein hits
416 170 10118 (P > 0.5) (p > 0.5)
235
224
278 238 5017 (P > 0.7) (p > 0.75)
109
238
301 212 6619 (P > 0.7) (p > 0.7)
160
ᵃ MC = missed cleavages.
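The "threshold approach" compared in Table 1 amounts to a charge-dependent score filter. The sketch below applies the first threshold set listed in Table 1 (Xcorr ≥ 1.9, 3, and 3.75 for +1, +2, and +3 ions; ∆Cn ≥ 0.1). The function name, the tuple layout, and the example matches are hypothetical and are used only for illustration; they are not SEQUEST or INTERACT output.

```python
# Minimum Xcorr per charge state and minimum delta Cn (first set in Table 1)
XCORR_MIN = {1: 1.9, 2: 3.0, 3: 3.75}
DELTA_CN_MIN = 0.1

def passes_threshold(charge, xcorr, delta_cn,
                     xcorr_min=XCORR_MIN, delta_cn_min=DELTA_CN_MIN):
    """Return True if a match meets the charge-specific score cutoffs."""
    if charge not in xcorr_min:
        return False  # charge states without a defined cutoff are rejected
    return xcorr >= xcorr_min[charge] and delta_cn >= delta_cn_min

# Hypothetical matches: (peptide, charge, Xcorr, delta Cn)
matches = [
    ("PEPTIDEK", 2, 3.4, 0.25),   # passes both cutoffs
    ("SAMPLER", 2, 2.5, 0.30),    # Xcorr too low for a +2 ion
    ("ANOTHERK", 3, 3.9, 0.05),   # delta Cn too low
    ("SINGLYR", 1, 2.0, 0.12),    # passes both cutoffs
]

accepted = [m[0] for m in matches if passes_threshold(*m[1:])]
print(accepted)
```

Because every cutoff is a hard boundary, small changes in any one value can move many borderline matches in or out of the accepted list, which is one way to read the large differences between the threshold sets in Table 1.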
acids. If all three enzymes operated with high fidelity with a maximum of two missed cleavages, the result, theoretically, would be tripeptides or dipeptides that may not be retained on the reversed-phase column and therefore would not be detected by the mass spectrometer. Larger peptides that were indeed analyzed by the mass spectrometer would not follow the rules of theoretical digestion and therefore would not be identified by PeptideProphet. Nevertheless, ProteinProphet results with four and five missed cleavages in the combination digestion enzyme search yielded an additional 108 and 131 identified proteins, respectively, confirming our hypothesis. Armed with the discovery of more identified proteins, we analyzed all our data sets with these new enzyme parameters, but found very low concordance between replicates of approximately 45%. This low concordance between replicates was also seen when the data were analyzed purely through INTERACT and filtered with common thresholds, i.e., Xcorr and ∆Cn (Table 1). A more comprehensive investigation of these results
was not completed for this study although a cursory look causes some concern. The thresholds shown in Table 1 have been found in the literature and were subsequently not used for this study due to the lack of concordance between replicates.4,6,7,14 Using the Institute for Systems Biology analysis tools, INTERACT, PeptideProphet, and ProteinProphet, a combined
Figure 7. Combined (262) unique proteins identified from each replicate compared overall between depleted (224) and nondepleted (120) serum.
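The totals in Figure 7 can be cross-checked against the abstract with inclusion-exclusion. The sketch below takes the 82 proteins reported as common to all four experiments as the intersection of the depleted and nondepleted sets; the function and variable names are illustrative.

```python
def union_size(n_a, n_b, n_both):
    """Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|."""
    return n_a + n_b - n_both

depleted_total = 224      # Figure 7
nondepleted_total = 120   # Figure 7
shared = 82               # reported in the abstract as common to all four experiments

combined = union_size(depleted_total, nondepleted_total, shared)
only_depleted = depleted_total - shared
only_nondepleted = nondepleted_total - shared
print(combined, only_depleted, only_nondepleted)
```

The result agrees with the reported 262 combined proteins, 142 unique to depleted serum, and 38 unique to nondepleted serum.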
Table 2. Proteins Found in Both Replicates in Common to Both Immunoaffinity Depleted and Non-depleted Samples afamin precursor albumin precursor alpha 1B-glycoprotein alpha-1-microglobulin/bikunin precursor alpha-2-glycoprotein 1, zinc alpha-2-HS-glycoprotein alpha-2-macroglobulin precursor alpha-2-plasmin inhibitor angiotensinogen precursor apolipoprotein A-II precursor apolipoprotein A-IV precursor apolipoprotein B precursor; apoB-100; apoB-48 apolipoprotein C-I precursor apolipoprotein C-II precursor apolipoprotein D precursor apolipoprotein E precursor apolipoprotein H precursor beta globin biotinidase precursor CD5 antigen-like (scavenger receptor cysteine rich family) ceruloplasmin (ferroxidase) clusterin isoform 1 coagulation factor II precursor coagulation factor IX coagulation factor XII precursor complement component 1, r subcomponent complement component 2 precursor complement component 3 precursor complement component 4 binding protein, alpha complement component 4B preproprotein complement component 5 Complement component 6 precursor complement component 8, alpha polypeptide precursor complement component 8, beta polypeptide complement component 8, gamma polypeptide complement component 9 complement factor B preproprotein complement factor H complement factor H-related 1 corticosteroid binding globulin precursor fibrinogen, alpha chain isoform alpha preproprotein fibrinogen, beta chain preproprotein
fibrinogen, gamma chain isoform gamma-A precursor glycosylphosphatidylinositol specific phospholipase D1 isoform 1 precursor H factor (complement)-like 3 haptoglobin hemopexin heparin cofactor II histidine-rich glycoprotein precursor I factor (complement) insulin-like growth factor binding protein, acid labile subunit inter-alpha (globulin) inhibitor H1 inter-alpha (globulin) inhibitor H2 inter-alpha (globulin) inhibitor H3 inter-alpha (globulin) inhibitor H4 (plasma Kallikrein-sensitive glycoprotein) kininogen 1 leader-binding protein 32 isoform 1 leucine-rich alpha-2-glycoprotein 1 lymphocyte antigen 64 homolog, radioprotective 105kDa orosomucoid 1 precursor orosomucoid 2 paraoxonase 1 peptidoglycan recognition protein L precursor plasma carboxypeptidase B2 isoform a preproprotein plasma kallikrein B1 precursor plasminogen Apolipoprotein A-I precursor (Apo-AI) properdin P factor, complement protein S (alpha) retinol-binding protein 4, plasma precursor serine (or cysteine) proteinase inhibitor, clade A, member 3 precursor serine (or cysteine) proteinase inhibitor, clade A, member 4 serine (or cysteine) proteinase inhibitor, clade C, member 1 serine (or cysteine) proteinase inhibitor, clade F, member 1 serum amyloid A4, constitutive serum amyloid P component precursor transferrin transthyretin vitamin D-binding protein precursor vitronectin precursor
Table 3. Proteins Found in Both Replicates Unique to Non-immunoaffinity Depleted Samples
adenosine deaminase, tRNA-specific 1
apolipoprotein L1 isoform a
brother of CDO
calpastatin isoform d
DEAD (Asp-Glu-Ala-Asp) box polypeptide 24
DHHC1 protein
dicer1
dishevelled associated activator of morphogenesis 2
dynein, axonemal, heavy polypeptide 9 isoform 2
fibrillin 2
fibulin 1 isoform D
gelsolin isoform b
Jumonji, AT rich interactive domain 1B (RBP2-like)
KIAA1501 protein
membrane associated guanylate kinase interacting protein-like 1
myosin 18A isoform a
NIMA (never in mitosis gene a)-related kinase 11
peroxisomal biogenesis factor 19
262 unique proteins were identified in four separate two-dimensional LC-MS/MS experiments from 1 mg of human serum proteins (Figure 7). Table 2 lists those unique proteins found in common to both depleted and nondepleted serum replicates. Analysis of replicates of the same serum sample under identical conditions led to the identification of many of the same proteins. However, these sets were not identical. The most comprehensive data set published to date resulted in approximately 800 unique protein identifications from 28 separate experiments, each replicate contributing approximately 10-20% unique identifications.4 Additionally, the HUPO
propionyl-Coenzyme A carboxylase, alpha polypeptide precursor protein p205; archvillin regulatory solute carrier protein, family 1, member 1 serine (or cysteine) proteinase inhibitor, clade A member 1 supervillin isoform 2; membrane-associated F-actin binding zinc finger protein 180
Plasma Proteome Project (PPP) reports that 9504 proteins have been identified across 18 different laboratories, with 7138 identified in only one laboratory.15 It is logical to expect that the degree of overlap between replicate samples is a function of sample complexity. As the level of sample complexity increases, so does the need for pre-fractionation methods due to limitations in the sampling rate of the mass spectrometer. Competition for ionization and variation in fragmentation act to reduce the magnitude of the correlation between replicates. Similarly, all correctly matching peptides between replicates will not necessarily have the same discriminatory scores unless the same
Table 4. Proteins Found in Both Replicates Unique to Immunoaffinity Depleted Samples 5′ nucleotidase, cytosolic IB isoform 2 adaptor-related protein complex 3, delta 1 subunit additional sex combs like 2 adenylate cyclase 9 adiponectin precursor AF-1 specific protein phosphatase angiogenin, ribonuclease, RNase A family, 5 precursor Apolipoprotein C-III precursor (Apo-CIII) apolipoprotein F precursor apolipoprotein L1 isoform b precursor apolipoprotein M attractin isoform 1 attractin isoform 3 beta-2-microglobulin precursor bile acid beta-glucosidase breast carcinoma amplified sequence 1 calcium channel, voltage-dependent, alpha 1E subunit carboxypeptidase N, polypeptide 1, 50kD precursor carnosinase 1 catenin, alpha 3 CD14 antigen precursor cell division cycle 10 isoform 1 centromere protein F (350/400kD) centromere protein J chromodomain helicase DNA binding protein 6 chromosome 1 open reading frame 9 protein chromosome 16 open reading frame 50 coagulation factor V precursor coagulation factor X precursor coagulation factor XIII A1 subunit precursor coagulation factor XIII B subunit precursor complement component 1 inhibitor precursor complement component 1, q subcomponent, alpha polypeptide precursor complement component 1, q subcomponent, beta polypeptide precursor complement component 1, q subcomponent, gamma polypeptide complement component 1, r subcomponent-like precursor complement component 1, s subcomponent complement component 7 precursor complement factor D preproprotein c-src tyrosine kinase C-type lectin domain family 3, member B cytochrome P450, family 24 precursor cytosolic phosphoenolpyruvate carboxykinase 1 dante DEP domain containing 2 isoform a dopamine beta-hydroxylase precursor dopamine receptor D1 DPCD protein dynein, axonemal, heavy polypeptide 8 E1A binding protein p400 eomesodermin ephrin receptor EphB6 precursor eukaryotic translation initiation factor 3, subunit 4 delta, 44kDa FAT gene product
F-box only protein 25 fibrinogen, gamma chain isoform gamma-B precursor fibronectin 1 isoform 1 preproprotein fibulin 1 isoform C precursor G protein-coupled receptor 124 galectin 3 binding protein gamma-glutamyl hydrolase precursor ganglioside induced differentiation associated protein 2 gelsolin isoform a hephaestin
heterogeneous nuclear ribonucleoprotein L-like histidyl-tRNA synthetase homeo box D12 HS1-binding protein 3 isoform 1 hyaluronan binding protein 2 importin 9 inositol polyphosphate-5-phosphatase F isoform 1 keratin, hair, basic, 1 KIAA0493 protein KIAA1217 KIAA1529 KIAA1754 KIAA1853 protein l(3)mbt-like 3 isoform a liprin-beta 1 LKB1 interacting protein low affinity immunoglobulin gamma Fc region receptor III-B precursor L-plastin lumican precursor macrophage stimulating 1 (hepatocyte growth factor-like) mannosidase, alpha, class 2A, member 1 matrix metalloproteinase 11 preproprotein mitochondrial ribosomal protein L40 mitogen-activated protein kinase kinase kinase 5 mixed-lineage leukemia; translocated to, 4 NACHT, leucine rich repeat and PYD containing 11 ninein isoform 5 peroxisomal short-chain alcohol dehydrogenase phosphoglucomutase 1 progestin-induced protein prolactin regulatory element binding protein promyelocytic leukemia protein isoform 10 promyelocytic leukemia protein isoform 5 protein kinase D3 protein phosphatase 1, catalytic subunit, beta isoform 1 RAN binding protein 2 RAN binding protein 5 rearranged L-myc fusion sequence regulator of nonsense transcripts 1 Rho guanine nucleotide exchange factor 7 isoform a ribosome binding protein 1 RIKEN cDNA A430025D11 ryanodine receptor 1 (skeletal) serine (or cysteine) proteinase inhibitor, clade A, member 10 serine (or cysteine) proteinase inhibitor, clade A, member 7 serine active site containing 1 Serine/threonine-protein kinase Nek1 serum amyloid A1 isoform 2 sodium channel, voltage gated, type VIII, alpha solute carrier family 12 (potassium/chloride transporters), member 7 SRY (sex determining region Y)-box 6 stage-specific S antigen homolog TAR RNA binding protein 1 testisin isoform 1 thyroid hormone receptor interactor 11 titin isoform N2-A transcription factor 19 (SC1) transient receptor potential cation channel, subfamily M, member 7 tumor necrosis factor, alpha-induced protein 3 ubiquitin protein 
ligase E3 component n-recognin 1 zinc finger protein 101
Journal of Proteome Research • Vol. 4, No. 5, 2005 1729
number of product ions are generated for a sequence and all of those product ions are present in the mass spectrum.16

Reassuringly, serum depletion of highly abundant proteins allowed for almost double the number of peptide assignments and protein identifications. However, total removal of albumin was not achieved. In the depleted samples, 29 peptides (7 unique) were identified, representing approximately 7% of albumin by mass. By comparison, in the nondepleted sample, 9582 peptides (117 unique) were identified, representing 86% of albumin by mass. This finding may be the result of inadequate or inefficient affinity removal, or of the presence of albumin fragments containing epitopes not recognized by the antibodies on the affinity column. Regardless, a decrease of at least 9553 data-dependent MS/MS scans allowed the instrument to interrogate other, lower abundance peptides. It also allowed for higher sequence coverage of identified proteins. For example, Apolipoprotein A IV (ApoA4) was found in all experiments; however, approximately 60% of the protein was sequenced in the depleted samples, compared with only approximately 30% in the nondepleted replicate samples (Figure 4).

Surprisingly, 38 unique proteins were identified in the nondepleted serum that were not found in the depleted serum. On closer analysis, 14 of these proteins were removed from the list because they were either extremely homologous to other proteins found in both the nondepleted and depleted datasets or were members of the immunoglobulin families. The final 24 proteins are listed in Table 3, while Table 4 lists the proteins unique to depleted serum that were identified in both depleted replicates. Serum depletion of high abundance proteins did not result in the loss of a large number of proteins. Although this is reassuring, we are cognizant of the fact that our methods may simply lack the sensitivity to detect additional proteins or protein fragments that are present at low concentrations.
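The counts quoted in this discussion can be checked with simple arithmetic. The short Python sketch below is illustrative only: the variable names are hypothetical, and it assumes that each albumin peptide identification corresponds to at least one data-dependent MS/MS scan, which is why the difference is read as a minimum.

```python
# Counts quoted in the text and abstract; variable names are illustrative only.
albumin_peptides_nondepleted = 9582
albumin_peptides_depleted = 29

# Assumption: one identified peptide corresponds to at least one MS/MS scan,
# so this difference is a lower bound on the scans freed by depletion.
scans_freed = albumin_peptides_nondepleted - albumin_peptides_depleted

# Proteins seen only in nondepleted serum, before and after removing
# close homologs and immunoglobulin family members.
unique_to_nondepleted = 38
removed_on_closer_analysis = 14
retained_in_table_3 = unique_to_nondepleted - removed_on_closer_analysis

# Abstract: common to all four experiments + unique to depleted + unique to nondepleted.
total_unique_proteins = 82 + 142 + 38

print(scans_freed, retained_in_table_3, total_unique_proteins)
```

The three values agree with the 9553 scans, 24 proteins, and 262 total unique proteins stated in the text.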
Conclusion

Current proteomic strategies continue to wrestle with the paradox of obtaining both breadth and depth. Clinical proteomics is at the forefront of this problem. With current technologies, finding the greatest number of protein identifications in serum requires narrowing the dynamic range of the sample. One way of accomplishing this is by depleting the most abundant proteins, such as albumin, immunoglobulins, transferrin, and haptoglobin. Many of these proteins, however, act specifically as carriers for other proteins. By depleting the most abundant proteins, one may be eliminating with them a large number of far more interesting proteins. Additionally, because of nonselective losses, the ability to quantitate protein levels may be compromised.

In this study, it appears that two types of nonselective loss of protein identifications are occurring. The first source of nonselective loss becomes apparent during replicate analysis. Concordance in both replicate analyses was only approximately 72% based on the peptides sequenced. The approximately 28% of peptides that were not identified in both replicates were missed because of either absent or poor MS/MS spectra. As MS/MS acquisition was carried out in a data-dependent mode, an absent MS/MS spectrum is due to a lack of intensity of the parent ion, which can depend on the degree of ionization. In complex samples such as serum, the degree of ionization for a parent ion is highly variable because of competition for ionization. The lack of 100% concordance between replicates could simply be a function of competitive ionization and differences in fragmentation within the mass spectrometer, and would contribute to the nonselective loss in protein identification.

The second source of nonselective loss may be attributed to immunoaffinity depletion of albumin, transferrin, haptoglobin, IgG, IgA, and anti-trypsin. When the replicates of depleted and nondepleted serum were compared, there were 24 proteins identified in both replicates of nondepleted serum that were not found in the depleted serum replicates. These proteins, which may be lost because of serum depletion, are located in several different subcellular locations and have various biological and molecular functions (Figure 8a and b, respectively). Loss of these and other proteins may hinder the ability to quantify possible biomarkers for disease progression or pharmacological efficacy and toxicity.

Figure 8. Gene Ontology (GO) cellular component classification (a) and functional classification (b) of the 24 proteins identified in both replicates of nondepleted serum that were not found in depleted serum replicates.

One cannot ignore, however, that depleting several of the most abundant proteins in the serum allowed for a much deeper analysis. The present study has demonstrated that serum depletion of abundant proteins results in both the uncovering of a large number of proteins and the loss of a smaller number of serum proteins. While the ability to identify more lower-abundance serum proteins is a welcome finding, the loss of even a few proteins is cause for concern. One implication of such nonselective loss is inaccurate quantitative analysis of serum biomarkers. The need, therefore, remains for development of new methods for simplifying complex biological samples to prevent competitive ionization while minimizing the nonselective loss of protein identifications observed here.
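The two comparisons described in this section, concordance between replicates and identifications found in both replicates of one condition but in neither replicate of the other, can be expressed as set operations. The Python sketch below is a minimal illustration, not the procedure used in this study: the function names and protein labels are hypothetical, and because the text does not state how percent overlap was calculated, the shared fraction is assumed here to be the intersection divided by the union.

```python
def concordance(rep_a, rep_b):
    """Shared identifications as a fraction of all identifications in either
    replicate (intersection over union; an assumed definition)."""
    a, b = set(rep_a), set(rep_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def unique_to_condition(reps, other_reps):
    """Identifications present in every replicate of one condition and absent
    from every replicate of the other condition."""
    in_all = set(reps[0]).intersection(*reps[1:])
    in_any_other = set().union(*other_reps)
    return in_all - in_any_other


# Hypothetical toy data; the labels are placeholders, not results from this study.
depleted_1 = {"protein_A", "protein_B", "protein_C", "protein_D"}
depleted_2 = {"protein_A", "protein_B", "protein_C", "protein_E"}
nondepleted_1 = {"protein_A", "protein_B", "protein_F", "protein_G"}
nondepleted_2 = {"protein_A", "protein_F", "protein_G", "protein_H"}

depleted_concordance = concordance(depleted_1, depleted_2)
lost_on_depletion = unique_to_condition(
    [nondepleted_1, nondepleted_2], [depleted_1, depleted_2]
)
gained_on_depletion = unique_to_condition(
    [depleted_1, depleted_2], [nondepleted_1, nondepleted_2]
)

print(depleted_concordance)         # 3 shared of 5 total identifications
print(sorted(lost_on_depletion))    # seen only in both nondepleted replicates
print(sorted(gained_on_depletion))  # seen only in both depleted replicates
```

Requiring presence in both replicates before calling an identification unique to a condition is the stricter of the possible choices; it reduces the chance that a protein missed through replicate-to-replicate variability is attributed to depletion.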
Acknowledgment. We thank Drs. Richard Smith, Heather Mottaz, Ronald Moore, and Harold Udseth at the Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA for their helpful insights and discussions. Financial support from NIH Grant Nos. R01 CA95586, S10 RR019221, and R25 CA101871 is gratefully acknowledged.
References

(1) Stanley, B. A.; Gundry, R. L.; Cotter, R. J.; Van Eyk, J. E. Dis. Markers 2004, 20, 167-178.
(2) Liotta, L. A.; Ferrari, M.; Petricoin, E. Nature 2003, 425, 905.
(3) Diamandis, E. P. Mol. Cell. Proteomics 2004, 3, 367-378.
(4) Shen, Y.; Jacobs, J. M.; Camp, D. G., 2nd; Fang, R.; Moore, R. J.; Smith, R. D.; Xiao, W.; Davis, R. W.; Tompkins, R. G. Anal. Chem. 2004, 76, 1134-1144.
(5) Hanash, S. Mol. Cell. Proteomics 2004, 3, 298-301.
(6) Adkins, J. N.; Varnum, S. M.; Auberry, K. J.; Moore, R. J.; Angell, N. H.; Smith, R. D.; Springer, D. L.; Pounds, J. G. Mol. Cell. Proteomics 2002, 1, 947-955.
(7) Qian, W. J.; Jacobs, J. M.; Camp, D. G., 2nd; Monroe, M. E.; Moore, R. J.; Gritsenko, M. A.; Calvano, S. E.; Lowry, S. F.; Xiao, W.; Moldawer, L. L.; Davis, R. W.; Tompkins, R. G.; Smith, R. D. Proteomics 2005, 5, 572-584.
(8) Mehta, A. I.; Ross, S.; Lowenthal, M. S.; Fusaro, V.; Fishman, D. A.; Petricoin, E. F., 3rd; Liotta, L. A. Dis. Markers 2003, 19, 1-10.
(9) Reed, R. G.; Davidson, L. K.; Burrington, C. M.; Peters, T., Jr. Clin. Chem. 1988, 34, 1992-1994.
(10) Stoscheck, C. Quant. Protein Methods Enzymol. 1990; Vol. 182, p 50-69.
(11) Qian, W. J.; Monroe, M. E.; Liu, T.; Jacobs, J. M.; Anderson, G. A.; Shen, Y.; Moore, R. J.; Anderson, D. J.; Zhang, R.; Calvano, S. E.; Lowry, S. F.; Xiao, W.; Moldawer, L. L.; Davis, R. W.; Tompkins, R. G.; Camp, D. G., 2nd; Smith, R. D. Mol. Cell. Proteomics 2005, 4, 700-709.
(12) Von Haller, P. D.; Yi, E.; Donohoe, S.; Vaughn, K.; Keller, A.; Nesvizhskii, A. I.; Eng, J.; Li, X. J.; Goodlett, D. R.; Aebersold, R.; Watts, J. D. Mol. Cell. Proteomics 2003, 2, 428-442.
(13) Keller, A.; Nesvizhskii, A. I.; Kolker, E.; Aebersold, R. Anal. Chem. 2002, 74, 5383-5392.
(14) Mayya, V.; Rezaul, K.; Cong, Y. S.; Han, D. Mol. Cell. Proteomics 2005, 4, 214-223.
(15) HUPO (Human Proteome Organization) 3rd Annual World Congress. Beijing, China, October 25-27, 2004. Mol. Cell. Proteomics 2004, 3, S1-352.
(16) MacCoss, M. J.; Wu, C. C.; Yates, J. R., 3rd. Anal. Chem. 2002, 74, 5593-5599.
PR0501721