Evaluation of Three Principally Different Intact Protein Prefractionation Methods for Plasma Biomarker Discovery
Maria Pernemalm,† Lukas M. Orre,† Johan Lengqvist,† Pernilla Wikström,‡ Rolf Lewensohn,† and Janne Lehtiö*,†
Karolinska Biomics Center, Karolinska University Hospital, Karolinska Institutet, Z5:02, 171 76 Stockholm, Sweden, and Medical Biosciences, Pathology, Umeå University, 90185 Umeå, Sweden
Received December 6, 2007
The aim of this study was to evaluate three principally different top-down protein prefractionation methods for plasma: high-abundance protein depletion, size fractionation and peptide ligand affinity beads, focusing in particular on compatibility with downstream analysis, reproducibility and analytical depth. Our data clearly demonstrate the benefit of high-abundance protein depletion. However, MS/MS analysis of the proteins eluted from the high-abundance protein depletion column shows that more proteins than aimed for are removed and, in addition, that the depletion efficacy varies between the different high-abundance proteins. Although a smaller number of proteins were identified per fraction using the peptide ligand affinity beads, this technique proved to be both robust and versatile. Size fractionation, as performed in this study, focusing on the low molecular weight proteome using a combination of gel filtration chromatography and molecular weight cutoff filters, showed limitations in the molecular weight cutoff precision, leading to detection of high molecular weight proteins and, in the case of the cutoff filters, high variability. GeLC-MS/MS analysis of the fractionation methods in combination with pathway analysis demonstrates that increased fractionation primarily leads to high proteome coverage of pathways related to biological functions of plasma, such as acute phase reaction, complement cascade and coagulation. Further, the prefractionation methods in this study have limited effect on the proportion of tissue proteins detected, thereby highlighting the importance of extensive or targeted downstream fractionation.
Keywords: plasma • proteomics • prefractionation • depletion • affinity beads • biomarker discovery • size exclusion
* To whom correspondence should be addressed. E-mail, janne.lehtio@ki.se; tel, +46-8-51776391; fax, +46-8-51776099. † Karolinska University Hospital. ‡ Umeå University.
research articles
Journal of Proteome Research 2008, 7, 2712–2722. Published on Web 06/13/2008.
10.1021/pr700821k CCC: $40.75 2008 American Chemical Society
Introduction
Plasma proteomics has gained much attention during the past few years. The human plasma proteome project (http://www.hupo.org/research/hppp/) was launched in 2002, and since then, the field has grown exponentially. From a clinical point of view, it is easy to understand why plasma biomarker discovery is so attractive. First, plasma is routinely sampled in the clinic. Second, the sampling is minimally invasive and can be performed repeatedly (for example, before, during or after a treatment). Third, plasma itself is a very rich source of proteins since it is in contact with all tissues in the body and thereby, in theory, contains traces of all activities in the body.1 However, plasma has proved to be a very challenging starting point for discovery proteomics. This is mainly due to the high dynamic range of concentrations in plasma, where a few high-abundance proteins completely dominate the protein content, making transient low-abundance proteins extremely hard to detect. To facilitate the discovery of low-abundance proteins, a vast number of prefractionation methods have been developed. In general, prefractionation for plasma proteomics can be performed on the protein level, on the peptide level or using a combination of the two. Lately, intact protein separation, top-down proteomics, has gained increased interest even in mass spectrometry based workflows.2 In this study, only intact protein fractionation methods will be discussed. The prefractionation methods evaluated in this study should be seen as first-line prefractionation, meaning that downstream analysis should contain a second level of fractionation. Important aspects to be evaluated are therefore compatibility with downstream analyses, reproducibility and throughput. Most first-line prefractionation methods can be assigned to one of the following categories: high-abundance protein depletion, affinity enrichment and biophysical fractionation. The underlying principle of all these methods is to make the sample less complex, thereby increasing the chance of detecting low-abundance proteins. High-abundance protein depletion is most commonly performed using antibodies toward one or several high-abundance proteins in plasma. This has almost become a standard first step in plasma proteomics and several comparative studies have been made evaluating a variety of depletion systems.3–6
One of the most commonly used systems is the Multiple Affinity Removal System (MARS) from Agilent Technologies, which is also one of the techniques evaluated in this study. Although it is widely used, no evaluation has been done of the MARS 7 column, as all previous studies have focused on the MARS 6 column, and in addition, very few studies have analyzed both the depleted sample and the eluate from the column. The MARS 7 column is designed specifically for plasma and contains antibodies to remove Albumin, Transferrin, IgG, IgA, Antitrypsin, Haptoglobin and Fibrinogen (fibrinogen being present in plasma samples but not in serum samples). The main criticism of depletion strategies has been that more proteins than aimed for are removed from the sample, partly due to unspecific binding to the column and antibodies, but also due to protein interactions with the removed proteins.4,7,8 Therefore, analysis of the eluate from the MARS 7 column is also included in this study. Affinity enrichment has been used to investigate different subproteomes within the plasma proteome, thereby performing a more targeted discovery. Recently, several studies have demonstrated good results using enrichment of glycosylated proteins in plasma.9–12 Other strategies include autoantibodies13–15 and proteins bound to albumin.7,16 Another, broader, affinity enrichment strategy is to use a combinatorial peptide library coupled to porous beads. The so-called ProteoMiner affinity beads (Bio-Rad) are designed to have a combinatorial peptide ligand library with different affinities in equal proportions so that theoretically all proteins in an unknown sample will be sequestered by a bead.17–19 The beads contain equivalent binding capacity, resulting in saturation of high-abundance proteins and concentration of low-abundance proteins, an approach that would be very appealing for plasma proteomics.
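The saturation principle described above can be illustrated with a toy calculation. The sketch below is not the manufacturer's binding model: it assumes, purely for illustration, that every protein species has its own ligand with the same fixed capacity, so the captured amount is the smaller of the amount present and that capacity. The function names, the capacity value and the three example abundances are hypothetical.

```python
# Toy model of an equal-capacity ligand library (illustrative assumption,
# not the vendor's binding model): each protein species is captured up to
# the same fixed capacity, so abundant species saturate while scarce
# species are captured in full.


def capture(abundances, capacity):
    """Return the captured amount per protein: min(amount present, capacity)."""
    return {name: min(amount, capacity) for name, amount in abundances.items()}


def dynamic_range(amounts):
    """Ratio between the most and least abundant non-zero entries."""
    values = [v for v in amounts.values() if v > 0]
    return max(values) / min(values)


def relative_share(amounts, name):
    """Fraction of the total amount contributed by one protein."""
    return amounts[name] / sum(amounts.values())


# Hypothetical abundances in arbitrary units, spanning six orders of magnitude.
sample = {
    "high_abundance": 1_000_000.0,
    "medium_abundance": 1_000.0,
    "low_abundance": 1.0,
}
bound = capture(sample, capacity=100.0)

if __name__ == "__main__":
    print("dynamic range before:", dynamic_range(sample))
    print("dynamic range after: ", dynamic_range(bound))
    print("low-abundance share before:", relative_share(sample, "low_abundance"))
    print("low-abundance share after: ", relative_share(bound, "low_abundance"))
```

In this toy setting the two more abundant species are both capped at the assumed capacity, so the dynamic range of the captured material is smaller than that of the input and the relative share of the scarce species increases. Real bead behavior depends on ligand affinities and loading conditions that the sketch ignores.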
At present, no thorough comparative study has been published using ProteoMiner beads, and in particular, no comparison with the MARS-7 depletion system or with the size fractionation described below has been made. Further, to our knowledge, no reproducibility studies reporting the coefficient of variation of this method have previously been published. Different biophysical methods such as chromatography,20–22 isoelectric focusing23–25 or size exclusion7,26,27 are also commonly used as prefractionation techniques. One of the big drawbacks of many of the biophysical methods is the generation of a large number of fractions, which increases the number of samples to be analyzed. One way to circumvent this is to limit the analysis to the low molecular weight proteome, and a modified version of size-exclusion prefractionation is therefore evaluated in this study. Analyzing proteins below the kidney filtration limit (100 times with MilliQ grade water.
Size Fractionation. Plasma was diluted 1:2 in 50 mM ammonium acetate, pH 6, and centrifuged at 10 000g for 10 min at 4 °C. Quadruplicates of 250 µL of diluted plasma were injected by partial loop injection and fractionated using a Superdex 200 10/300 GL column (GE Healthcare) at a flow rate of 400 µL/min. One milliliter fractions were collected from 20 to 60 min. Fractions were freeze-dried and dissolved in 100 µL of 50% acetonitrile, 0.1% trifluoroacetic acid (TFA) (fractions 1-8) or 100 µL of MilliQ grade water (fractions 9-18). Fractions 1-8 and 9-18, respectively, were pooled, generating two pooled fractions. Fractions dissolved in 50% acetonitrile, 0.1% TFA were centrifuged briefly, and the supernatant was applied to a 30 kDa cutoff filter (Millipore) and centrifuged for 30 min at 10 000 rpm at 4 °C. The flowthrough was collected, freeze-dried and redissolved in MilliQ grade water. Pooled fractions 1-8 will hereafter be designated cutoff and pooled fractions 9-18 low.
SDS-PAGE.
Two micrograms of protein per fraction from each fractionation method, as well as crude plasma, was loaded on a 1.0 mm 12% Bis-Tris gel (Invitrogen). The gel was stained using Silver Quest silver staining kit (Invitrogen) according to the manufacturer’s instructions.
SELDI-TOF. CM10 arrays (Bio-Rad) were equilibrated with 50 mM ammonium acetate, pH 4.5, 0.1% Triton X100. Five microliters of each ProteoMiner fraction was diluted in 45 µL of 50 mM ammonium acetate, pH 4.5, and 0.1% Triton X100. A total of 250 ng from each of the other samples was dissolved in 50 µL of 50 mM ammonium acetate, pH 4.5, and 0.1% Triton X100. All samples were then incubated on the CM10 array for 1 h at 4 °C with shaking. The spots were then washed with 50 mM ammonium acetate, pH 4.5, 0.1% Triton X100, and 50 mM ammonium acetate, followed by a brief rinse with MilliQ grade water. Each spot then received two 1 µL applications of 50% saturated sinapinic acid in 50% acetonitrile and 0.5% TFA. The arrays were then analyzed in a ProteinChip Reader IIc (Bio-Rad), optimized for the detection of ions between 3000 and 10 000 Da, averaging 225 shots per spot. Baseline reduction was performed for each spectrum after data collection. Data were calibrated using external calibration against the following calibrants: [Arg8]Vasopressin (1084.25 + 1H), Somatostatin (1637.9 + 1H), Dynorphin A [209-225] (porcine) (2147.5 + 1H), ACTH [1-24] (human) (2933.5 + 1H), Insulin B-chain (bovine) (3495.94 + 1H), Insulin (human recombinant) (5807.65 + 1H) and Hirudin BKHV (7033.61 + 1H). Mass accuracy of SELDI-TOF analysis is 10 kDa), SDS-PAGE was used (Figure 3). Two micrograms of each sample was applied to the gel. Lanes 3 and 4 show the eluate and flowthrough fractions from MARS-7 immunodepletion. More than seven bands can be seen in the eluate (lane 3), indicating that more than the seven proteins that should be depleted by the MARS-7 column are removed. However, this could also be due to removal of truncated versions of the seven proteins.
When comparing the flowthrough (lane 4) and the eluate (lane 3) with the crude plasma (lane 2), one can clearly see the effect of the depletion: several bands present in crude plasma and eluate are undetectable in the flowthrough. Lanes 5-8 show the ProteoMiner beads fractions. It is noteworthy that the fractions show a high number of thin, sharp bands, indicating an even distribution of protein concentrations. However, some overlap between the individual ProteoMiner beads fractions can also be seen. The last two lanes (9 and 10), with the two fractions from size fractionation, clearly show the presence of proteins above 30 kDa, indicating an incomplete size cutoff. Overlap between the lanes can also be detected, especially in the mass range between 17 and 62 kDa.
Figure 3. SDS-PAGE showing protein distribution. (1) Molecular weight marker, (2) crude plasma, (3) MARS-7 eluate, (4) MARS-7 flowthrough, (5) ProteoMiner beads fraction eluted with NaCl, (6) ProteoMiner beads fraction eluted with Glycine, (7) ProteoMiner beads fraction eluted with Ethylene Glycol, (8) ProteoMiner beads fraction eluted with Organic solvent, (9) size fractionation cutoff, (10) size fractionation low.
Table 1. Approximate Yield from Each Fractionation Method, Calculated Based on Protein Concentration Measurements Using Dc Protein Assay (Bio-Rad) and Approximation of Volume
| method | load volume (µL) | load (mg) | yield (mg) | yield % |
|---|---|---|---|---|
| MARS-7 | 60 | 6 | 4.28 | 71 |
| Eluate | | | 3.7 | 62 |
| Flowthrough | | | 0.58 | 10 |
| ProteoMiner | 900 | 90 | 1.90 | 2.12 |
| NaCl | | | 0.34 | 0.38 |
| Glycine | | | 1.18 | 1.31 |
| Ethylene Glycol | | | 0.24 | 0.27 |
| Organic Solvent | | | 0.14 | 0.16 |
| Size fractionation | 125 | 12.5 | 0.64 | 5 |
| Cutoff | | | 0.54 | 4 |
| Low | | | 0.1 | 1 |
To assess compatibility with an LC-MS/MS workflow, all samples were analyzed using both direct LC-MALDI-TOF/TOF and GeLC-MALDI-TOF/TOF. In the GeLC-MALDI-TOF/TOF workflow, each sample was first separated on the protein level using SDS-PAGE. Each lane was cut into five sections, followed by in-gel digestion and peptide elution. Peptides were separated by reversed phase chromatography over a linear gradient and spotted onto a MALDI target. An ABI 4800 MALDI-TOF/TOF instrument was used for MS/MS analysis and all peptides with S/N > 100 were chosen for fragmentation. Mass accuracy was calculated for all nonredundant peptides (≥95% confidence limit) from the GeLC-MS/MS experiment as suggested in ref 29. Average mass accuracy
over the entire experiment was 68 ppm. A summary of the number of identified proteins is shown in Figure 4. Comparing the direct LC-MS/MS results with the results from the GeLC-MS/MS experiment, on average about 4 times as many proteins are identified in the latter experiment. Most pronounced was the increased number of proteins identified from the size fractionation method, in particular from the low fraction, where 123 proteins were identified in the GeLC-MS/MS experiment as compared with 14 in the direct LC-MS/MS analysis. The cutoff fraction generated 46 identified proteins, and in a merged search the two fractions generated 135 identified proteins. The number of identified proteins in the ProteoMiner fractions increased from between 14 and 28 to between 77 and 93. A merged search of the four fractions generated a total of 150 identified proteins. The fact that the merged number of identified proteins in the size fractionation and the ProteoMiner beads is less than the sum of identities in the individual fractions is in line with the overlap seen on the gel and in the SELDI-TOF experiment. MARS-7 showed the smallest difference in number of identified proteins between the direct LC-MS/MS and the GeLC-MS/MS, about twice as many in the flowthrough (56 versus 116) and only 32 versus 45 in the eluate. Analyzed together, a total of 138 proteins were identified from the MARS-7 column. A list of all proteins identified in this study is available as Supporting Information.
Another important aspect regarding the compatibility of the prefractionation method with downstream analysis is yield. The amount of material needed for the prefractionation and how much protein one can expect after prefractionation are important parameters in any proteomics experiment. Yield was calculated based on protein concentration measurements using Dc Protein Assay (Bio-Rad) and approximation of volume.
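The yield percentages reported in Table 1 follow from a simple ratio of recovered to loaded protein. The sketch below reproduces that arithmetic from the Table 1 values; the function names and data layout are illustrative choices rather than the authors' own calculation, and any rounding is applied only when comparing against the table.

```python
# Yield (%) = recovered protein (mg) / loaded protein (mg) * 100.
# Load and yield values are copied from Table 1; names and layout are
# illustrative, not the authors' own calculation script.


def percent_yield(yield_mg, load_mg):
    """Percentage of the loaded protein recovered in a fraction."""
    if load_mg <= 0:
        raise ValueError("load_mg must be positive")
    return 100.0 * yield_mg / load_mg


# method -> (load in mg, {fraction: yield in mg})
TABLE_1 = {
    "MARS-7": (6.0, {"Eluate": 3.7, "Flowthrough": 0.58}),
    "ProteoMiner": (90.0, {"NaCl": 0.34, "Glycine": 1.18,
                           "Ethylene Glycol": 0.24, "Organic Solvent": 0.14}),
    "Size fractionation": (12.5, {"Cutoff": 0.54, "Low": 0.1}),
}


def method_yields(table):
    """Total and per-fraction percent yield for every method."""
    result = {}
    for method, (load_mg, fractions) in table.items():
        per_fraction = {
            name: percent_yield(mg, load_mg) for name, mg in fractions.items()
        }
        total = percent_yield(sum(fractions.values()), load_mg)
        result[method] = (total, per_fraction)
    return result


if __name__ == "__main__":
    for method, (total, per_fraction) in method_yields(TABLE_1).items():
        print(f"{method}: total {total:.2f}%")
        for name, pct in per_fraction.items():
            print(f"  {name}: {pct:.2f}%")
```

Computed this way, the ProteoMiner total comes out at roughly 2.11%, whereas Table 1 lists 2.12%, which equals the sum of the four rounded per-fraction percentages; the difference is presumably a rounding effect. The load values in Table 1 also correspond to 0.1 mg of protein per µL of plasma for all three methods.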
Yield from ProteoMiner beads was calculated based on the commercially available spin cartridge kit (Bio-Rad). Results from the yield calculations are shown in Table 1. The yield (in mg) from each of the methods is quite similar, ranging from 0.1 to 1.18 mg, except for the eluate from the MARS-7 column, which contains 3.7 mg. The high amount of protein in the eluate is expected, as it should contain seven extremely high-abundance plasma proteins. The load volume, on the other hand, differs considerably: for ProteoMiner, 900 µL of plasma is needed, for size fractionation 125 µL and for MARS-7 60 µL.
Assessment of Reproducibility. In this study, quantitative reproducibility was evaluated for all prefractionation methods using SELDI peak intensity. MARS-7 immunodepletion and size fractionation were performed four times and ProteoMiner fractionation was performed three times. All replicates were analyzed with SELDI-TOF. The average coefficient of variation (CV) was calculated for all fractionation methods based on the ion intensities of the peaks detected in SELDI-TOF. All peaks detected were used in the CV calculations. For MARS-7 immunodepletion, size fractionation and crude plasma, reproducibility was also calculated based on iTRAQ reporter ion intensities. Every nonredundant peptide (≥95% confidence limit) containing the iTRAQ labels was included in the CV calculation. The number of SELDI-TOF peaks, MALDI-TOF/TOF precursors and CVs are shown in Table 2. As a reference sample, unfractionated crude plasma showed 19% CV at low molecular weights and 23% at high molecular weights in SELDI-TOF. We have previously reported experimental SELDI CVs on CM10 arrays between 18 and 19%,35 consistent with the present results. Using iTRAQ reporter ion
intensities, 10% CV was measured for crude plasma, highlighting the difference in reproducibility between the two analytical methods.
Figure 4. LC-MS/MS and GeLC-MS/MS summary. Number of identified proteins (95% confidence limit) on the y-axis and fractionation method on the x-axis.
Table 2. Number of SELDI Peaks and MALDI Precursors Used in the CV Calculations^a
| method | n | SELDI low mass range: peaks | SELDI low mass range: CV | SELDI high mass range: peaks | SELDI high mass range: CV | iTRAQ: n | iTRAQ: precursors | iTRAQ: CV |
|---|---|---|---|---|---|---|---|---|
| Crude | 8 | 39 | 19 | 37 | 23 | 4 | 748 | 10 |
| MARS-7 | 4 | | | | | | | |
| Eluate | 4 | 17 | 65 | 18 | 42 | 4 | 1000 | 39 |
| Flowthrough | 4 | 20 | 24 | 29 | 30 | 4 | 1199 | 24 |
| ProteoMiner | 3 | | | | | | | |
| NaCl | 3 | 37 | 27 | 41 | 26 | n.a. | n.a. | n.a. |
| Glycine | 3 | 41 | 23 | 42 | 30 | n.a. | n.a. | n.a. |
| Ethylene Glycol | 3 | 46 | 12 | 42 | 13 | n.a. | n.a. | n.a. |
| Organic Solvent | 3 | 41 | 18 | 45 | 20 | n.a. | n.a. | n.a. |
| Size frac. | 4 | | | | | | | |
| Cutoff | 4 | 67 | 55 | 40 | 52 | 4 | 508 | 80 |
| Low | 4 | 35 | 22 | 34 | 26 | n.a. | n.a. | n.a. |
^a Table showing the average CV (%) for each of the prefractionation methods. Calculations are based on peak intensity values in the replicates. Number of peaks used in the SELDI calculations is shown in the 'peaks' columns. Low mass range 3-10 kDa, high mass range 10-150 kDa. Number of precursors (95% confidence limit) used in the iTRAQ calculations is shown in the 'precursors' column.
Particularly high variability (