Evaluation of Shotgun Sequencing for Proteomic Analysis of Human Plasma Using HPLC Coupled with Either Ion Trap or Fourier Transform Mass Spectrometry

Shiaw-Lin Wu,*,† Gargi Choudhary,† Margareta Ramström,‡ Jonas Bergquist,‡,§ and William S. Hancock|

ThermoFinnigan, 355 River Oaks Parkway, San Jose, California; Institute of Chemistry, Department of Analytical Chemistry, Uppsala University, Uppsala, Sweden; Institute of Clinical Neuroscience, Department of Neurochemistry, Sahlgrenska University Hospital, Göteborg University, Mölndal, Sweden; and Barnett Institute and Department of Chemistry, Northeastern University, Boston, Massachusetts

Received February 26, 2003

This paper reports on studies directed at characterizing the proteome of human plasma by the shotgun sequencing approach, namely the use of HPLC coupled to mass spectrometry (MS). The report presents data from two laboratories that allow the comparison of peptide and protein identifications made either by accurate mass measurement on a Fourier transform mass spectrometer or by MS/MS fragmentation on an ion trap mass spectrometer. Because the dynamic range of the protein components of plasma is one of the largest of any biological sample, the analysis of such a challenging sample was aided by the use of these two MS approaches. The major classes of proteins observed were transport proteins, enzymes and enzyme inhibitors, blood-clotting factors, membrane-associated proteins including soluble forms of receptors, hormones, immunoglobulins, and other glycoproteins. The protein identifications were also highly consistent with results obtained from 2D gel studies, although a larger number of additional proteins were observed with the shotgun sequencing approach. The quantitation of low- to medium-level proteins was explored in the ion trap with an add-back of a known amount of human growth hormone (hGH) at a clinically relevant level (5 µg/L). The isotope-coded affinity tag (ICAT) approach was used to successfully quantitate different levels of hGH in replicate analyses via the disulfide-linked tryptic peptide (T6-T16). These studies suggest that the shotgun sequencing approach can be used to characterize part of the plasma proteome and can serve as a starting point for the use of multidimensional analytical approaches in the analysis of complex biological samples.

Keywords: human plasma proteins • ICAT • differential quantitation • human growth hormone • Ion Trap • LC-MS • FTMS • proteomics • 2D-LC • multidimensional separation

Introduction

A goal of this report and other studies is to begin to estimate the size of the plasma proteome, which may contain tens of thousands of proteins, and to guide the development of improved platforms for proteomic analysis. In addition to this complexity, any proteomic study will be further challenged by the variety of post-translational modifications (PTMs) present in plasma proteins. Phosphorylation has been identified as a key PTM in intracellular studies of membrane-associated protein pathways. In plasma, glycosylation will also be of major significance, particularly given its close association with secretion pathways in the liver and other organs.

* Corresponding author. † ThermoFinnigan. ‡ Institute of Chemistry, Department of Analytical Chemistry, Uppsala University. § Institute of Clinical Neuroscience, Department of Neurochemistry, Sahlgrenska University Hospital, Göteborg University. | Barnett Institute and Department of Chemistry, Northeastern University.

10.1021/pr034015i CCC: $25.00

© 2003 American Chemical Society

Journal of Proteome Research 2003, 2, 383-393. Published on Web 06/12/2003.

One popular approach to the complexity of the plasma proteome is to use high-resolution 2D gels, which can resolve several thousand components, including various isoforms of the same protein.1,2 The effectiveness of such an analysis in identifying low-level proteins by mass spectrometry after in-gel digestion is limited, however, by a restricted loading range.3 Another approach is to remove high-abundance proteins with specific affinity columns, which contain ligands such as antibodies, lectins, or other moieties. The removal of albumin and other transport proteins may, however, also remove specifically bound ligands.4,5 Such columns may also fail to remove fragments or variants that lack the binding site used to capture the abundant protein.6 Another concern is nonspecific interaction with the affinity column, which may inadvertently remove nontargeted proteins, as was observed when growth hormone was removed from human plasma by a soluble receptor affinity column.7 Thus, prefractionation steps add complexity to a proteomic study and raise issues of recovery of certain components. For these reasons, this study has explored the potential of direct analysis of plasma by several LC/MS approaches. In this and related studies,7-10 reversed-phase HPLC (RPLC) was used for all single-dimension separations, whereas in 2D separations ion exchange chromatography was added as the first dimension, prefractionating the peptide mixture by charge before the hydrophobic separation step (RPLC). The use of an additional dimension of chromatography for peptide fractionation has been shown to improve the characterization of low-level proteins in plasma,6,11,12 because it reduces the complexity of the peptide mixture being ionized at any given time point in the LC/MS measurement. This study explores the analysis of an enzyme digest of unfractionated plasma using either one or two dimensions of HPLC, with peptide identification by either accurate mass (Fourier transform mass spectrometry) or MS/MS fragmentation patterns (ion trap mass spectrometry). The results are compared with protein identifications from 2D gel studies and reveal a high degree of complementarity among these approaches for the identification of major plasma proteins. The LC/MS approaches were, however, able to detect significant numbers of lower-level proteins that could not be observed in MS studies of 2D gel spots, whereas the 2D gel approach allowed the visualization of families of protein isoforms. This study also explores the challenge of quantitating low-level proteins with the ICAT approach, using human growth hormone (hGH) as a model system.

Methods

Accurate Mass Measurement Approach by Fourier Transform Mass Spectrometry.

Human Plasma Sample Preparation. Human plasma samples were collected from healthy blood donors. A volume of 10 mL of EDTA-blood was centrifuged at 400 × g for 10 min (21 °C), and the plasma was aliquoted into 1-mL fractions and kept at -80 °C until further processing. Prior to tryptic digestion, 4 µL (approximately 250 µg of total protein) of the thawed sample was centrifuged to dryness in a small Eppendorf tube using a SpeedVac system ISS110 (Savant, Holbrook, NY).

Tryptic Digestion of Proteins. The dry protein pellet was dissolved in 250 µL of 8 M urea, 0.4 M NH4HCO3, followed by the addition of 25 µL of 45 mM dithiothreitol. The mixture was incubated at 50 °C for 15 min and cooled to ambient temperature. Next, 25 µL of 100 mM iodoacetamide was added, and the mixture was incubated for an additional 15 min in darkness. After this second incubation, 700 µL of deionized water was added together with 5% (w/w) trypsin. Digestion was performed at 37 °C overnight in darkness.

Desalting. A volume of 20 µL of the tryptic digest, corresponding to 5 µg of total protein, was desalted either on a C18 column (Biobasic, 2.1 mm × 2 cm, ThermoHypersil) or on a ZipTip C18 (Millipore Corporation, Bedford, USA) using the following protocol. The tip was first wetted in 50 µL of 50% acetonitrile (ACN) and equilibrated with 50 µL of 1% acetic acid (HAc). The sample was acidified to 2.5% HAc, after which the peptides were adsorbed onto the medium using 30 repeated cycles of sample loading. The tip was washed with 5 × 10 µL of 1% HAc, and the peptides were then eluted in 10 µL of 50% ACN, 1% HAc, followed by 10 µL of 100% ACN. This procedure was repeated twice for every 20 µL of sample, giving a total eluate volume of 40 µL. After desalting, the eluate was vacuum centrifuged to dryness, and the peptides were redissolved in 20 µL of 1% HAc.
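As a quick consistency check on the volumes above, the following Python sketch tracks total protein through the digestion. It assumes (our assumption, not stated by the authors) that protein is evenly mixed in the final digest and that the trypsin addition, given only as 5% w/w, contributes no volume.

```python
# Back-of-the-envelope protein bookkeeping for the tryptic digest.
# Assumptions (ours): protein evenly mixed; trypsin addition adds no volume.

PLASMA_UL = 4            # plasma taken for digestion (µL)
TOTAL_PROTEIN_UG = 250   # approximate protein in that plasma (µg)

# Volumes added after drying (µL): urea/NH4HCO3, DTT, iodoacetamide, water
ADDITIONS_UL = {"urea buffer": 250, "DTT": 25, "iodoacetamide": 25, "water": 700}


def protein_in_aliquot(aliquot_ul, total_ug=TOTAL_PROTEIN_UG, additions=ADDITIONS_UL):
    """Protein (µg) expected in an aliquot of the final digest."""
    final_volume = sum(additions.values())
    return total_ug * aliquot_ul / final_volume


# µg/µL is numerically the same as g/L
plasma_protein_g_per_l = TOTAL_PROTEIN_UG / PLASMA_UL
digest_volume_ul = sum(ADDITIONS_UL.values())
desalting_load_ug = protein_in_aliquot(20)

print(f"plasma protein: {plasma_protein_g_per_l:.1f} g/L")
print(f"digest volume: {digest_volume_ul} µL")
print(f"protein in 20 µL aliquot: {desalting_load_ug:.1f} µg")
```

The 20 µL aliquot works out to 5 µg, matching the Desalting step, and the implied plasma protein concentration (62.5 g/L) lies within the usual range for human plasma.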


Packed Capillary Liquid Chromatography. The experimental setup is described in detail elsewhere.14 Reversed-phase liquid chromatography was performed on a 10-cm in-house-packed C18 column (i.d. 200 µm). The packing material was ODS-AQ, 5 µm (YMC Europe GmbH, Schermbeck, Germany). Two HPLC pumps (JASCO 1580, JASCO, Japan) delivered the mobile phase program using solvent A, ACN/H2O/HAc (5:94.5:0.5, v/v/v), and solvent B, ACN/H2O/HAc (94.5:5:0.5, v/v/v), at a flow rate of approximately 1 µL/min through the column. The program started with isocratic elution at 100% A for 10 min, followed by a gradient from 100 to 50% A over 54 min and then from 50 to 0% A over 6 min. A volume of 10 µL of sample was injected onto the column using a six-port injector valve (Valco Instruments Co. Inc., Schenkon, Switzerland). The separated peptides passed through a UV detector before being electrosprayed on-line into a Bruker Daltonics BioAPEX-94e 9.4 T Fourier transform ion cyclotron resonance mass spectrometer (Bruker Daltonics, Billerica, MA).15 The UV detector was used for optimization studies with standard peptide digests. The emitter end of the capillary was mechanically tapered and coated with “Black Dust” (polyimide-graphite) to form a sheathless electrospray emitter.16 The emitter end was inserted into an in-house-designed ESI interface consisting of a 22-gauge stainless steel needle mounted in a Valco connector. The connector was fitted into a brass plate in the Analytica atmosphere-vacuum interface (Analytica, Branford, CT).

FTICR Mass Spectrometry and Data Analysis. Experiments were performed on a Bruker Daltonics BioAPEX-94e FTICR mass spectrometer with a passively shielded 9.4 T superconducting magnet, equipped with an Analytica (Branford, CT) ESI source. In the experiments described in this paper, the Analytica atmosphere-vacuum interface was used together with the home-built sprayer described above. Primary data analysis was performed on a PC running XMASS.
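The database matching described in the next paragraph (in silico tryptic digestion with DIG, carbamidomethylated cysteines, one missed cleavage, and a 7 ppm tolerance) can be illustrated with a short Python sketch. It is not the DIG or DATACOMP code: the function names, the residue-mass table, and the toy sequence are our own choices, and it compares neutral monoisotopic masses directly, assuming the isotopic-cluster reduction has already been done.

```python
# Accurate-mass peptide matching, sketched after the DIG/DATACOMP workflow.
# All names here are hypothetical; this is not the authors' software.

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056            # added once per peptide (N- and C-termini)
CARBAMIDOMETHYL = 57.02146  # fixed mass shift on every Cys after iodoacetamide


def tryptic_fragments(seq):
    """Cleave after K or R unless the next residue is P."""
    fragments, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR" and (i + 1 == len(seq) or seq[i + 1] != "P"):
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


def digest(seq, missed=1):
    """All tryptic peptides with up to `missed` missed cleavages."""
    frags = tryptic_fragments(seq)
    peptides = set()
    for i in range(len(frags)):
        for j in range(i, min(i + missed + 1, len(frags))):
            peptides.add("".join(frags[i:j + 1]))
    return peptides


def neutral_mass(peptide):
    """Monoisotopic neutral mass, all Cys carbamidomethylated."""
    residues = sum(MONO[aa] for aa in peptide)
    return residues + WATER + CARBAMIDOMETHYL * peptide.count("C")


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match(observed_masses, proteins, tol_ppm=7.0):
    """Return (observed, protein, peptide, ppm) for every hit within tol_ppm."""
    theoretical = [
        (name, pep, neutral_mass(pep))
        for name, seq in proteins.items()
        for pep in digest(seq)
    ]
    hits = []
    for obs in observed_masses:
        for name, pep, theo in theoretical:
            err = ppm_error(obs, theo)
            if abs(err) <= tol_ppm:
                hits.append((obs, name, pep, round(err, 2)))
    return hits


# Toy sequence and "observed" mass are hypothetical illustrations.
TOY = "DALLKNYGLLYCFRKDMDK"
for obs, protein, peptide, err in match([1204.5759], {"toy": TOY}):
    print(f"{obs:.4f} Da -> {protein}: {peptide} ({err:+.2f} ppm)")
```

The hypothetical observed mass of 1204.5759 Da lies about 5 ppm from the carbamidomethylated peptide NYGLLYCFR and is therefore accepted, whereas a mass 16 ppm away would be rejected at the 7 ppm cut-off.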
In total, 256 spectra were collected during the experiment, and each spectrum was recorded for 10 s. Good mass calibration is vital in peptide mapping of complex mixtures. The mass spectra were initially calibrated externally, and tryptic peptides from HSA were identified. The spectra were then calibrated internally using 6 HSA fragments distributed throughout the chromatographic separation and the mass-to-charge window. After calibration, the calibrant masses differed from their theoretically calculated values by less than 5 ppm. An AURA macro was written that automatically calibrated the 256 spectra, apodized them by time-domain convolution with a sine-square function, and generated peak lists. The experimental masses were reduced to isotopic clusters using the ESIMSA routine [www.angstrom.uu.se/ionphysics/software.html] and then compared, using the in-house-written program DATACOMP, with masses from a database of almost 200 putative proteins16 digested in silico by the DIG program [www.angstrom.uu.se/ionphysics/software.html]. In DIG, all cysteines were assumed to be carbamidomethylated, and 1 missed cleavage site was allowed. The experimental masses were not allowed to differ from the theoretical values by more than 7 ppm. Putative plasma proteins were those with high sequence coverage (more than 7 peptides identified).

Ion Trap MS/MS Analysis Approach.

HPLC and Mass Spectrometry Measurements. The HPLC separation was performed on a Surveyor LC system (ThermoFinnigan, San Jose, CA). The flow rate was maintained at 150 µL/min before splitting and at 1.5 µL/min after the flow split. The gradient started at 2% AcCN in 0.1% formic acid for 3 min, was ramped to 60% AcCN over 180 min, and was finally ramped to 80% AcCN over another 20 min. The trypsin-digested sample (in a 20 µL sample loop) was injected from the autosampler (in no-waste mode) onto a C-18 capillary column for MS analysis. A ThermoHypersil C-18 column (Biobasic, 180 µm × 10 cm) was connected to the ion source chamber (orthogonal) with a sheath gas flow of 3 units. The temperature of the ion transfer tube was set at 140 °C. The spray voltage was set at 3.6 kV, and the normalized collision energy was set at 35% for MS/MS. Dynamic exclusion was used with an exclusion duration of 5 min. Data-dependent ion selection was set to trap the precursor ions of interest (based on predicted masses for hGH tryptic fragment ions) from the preceding MS scan.

Bioinformatics. The sequences of the uninterpreted CID spectra were identified by correlation with the peptide sequences present in the nonredundant protein sequence database (OWL Version 30.3) using the SEQUEST algorithm (Version C1) incorporated into the ThermoFinnigan BioWorks software (Version 3.1).17-19 The SEQUEST search results were initially assessed by examination of the Xcorr (cross-correlation) and ∆Cn (delta normalized correlation) scores. The Xcorr function measures the similarity between the mass-to-charge ratios (m/z) of the fragment ions predicted from amino acid sequences in the database and those of the fragment ions observed in the MS/MS spectrum. The ∆Cn score is obtained by normalizing the Xcorr values to 1.0 and taking the difference between the first- and second-ranked amino acid sequences.17 Thus, the ∆Cn score discriminates between high-quality and noisy spectra, although both may match a theoretical spectrum. As a general rule, Xcorr values greater than 2.5 for triply charged, 2.0 for doubly charged, and 1.5 for singly charged ions, together with a ∆Cn greater than 0.1, were accepted as positive identifications.17,20 Key spectra were also inspected manually to confirm the SEQUEST results.
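The acceptance rule above can be written as a small filter. This Python sketch assumes that ∆Cn is computed as (Xcorr of the first-ranked hit minus Xcorr of the second-ranked hit) divided by the first-ranked Xcorr, one common reading of "normalizing to 1.0"; the class and function names are hypothetical, charge states other than 1+, 2+, and 3+ are simply rejected, and the combined BioWorks ranking that also uses Sp is not modelled.

```python
from dataclasses import dataclass

# Charge-dependent Xcorr cut-offs and the ΔCn cut-off quoted in the text.
XCORR_MIN = {1: 1.5, 2: 2.0, 3: 2.5}
DELTA_CN_MIN = 0.1


@dataclass
class PeptideMatch:
    """Top SEQUEST-style hit for one MS/MS spectrum (hypothetical record)."""
    peptide: str
    charge: int
    xcorr: float         # Xcorr of the first-ranked sequence
    xcorr_second: float  # Xcorr of the second-ranked sequence

    @property
    def delta_cn(self) -> float:
        # Assumed definition: gap between first and second hit,
        # normalized so that the first hit scores 1.0.
        return (self.xcorr - self.xcorr_second) / self.xcorr


def accepted(m: PeptideMatch) -> bool:
    """Apply the Xcorr-by-charge and ΔCn rule; other charges are rejected."""
    cutoff = XCORR_MIN.get(m.charge)
    if cutoff is None:
        return False
    return m.xcorr > cutoff and m.delta_cn > DELTA_CN_MIN


# Xcorr 3.318 for the 2+ ion of NYGLLYCFR is from the Results section;
# the second-ranked scores and the other two records are hypothetical.
matches = [
    PeptideMatch("NYGLLYCFR", 2, 3.318, 2.500),
    PeptideMatch("PEPTIDEK", 2, 2.200, 2.100),   # ΔCn too small
    PeptideMatch("SAMPLER", 3, 2.400, 1.000),    # Xcorr below the 3+ cut-off
]
for m in matches:
    print(m.peptide, m.charge, f"{m.delta_cn:.3f}", accepted(m))
```

Only the first record passes: the second fails on ∆Cn (about 0.05) and the third on the 3+ Xcorr cut-off, which shows why both scores are checked together.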
BioWorks 3.1 is a new version of TurboSequest in which the three matching factors (Sp, Xcorr, and ∆Cn) are combined into a unified ranking score.21,22

ICAT Procedure. A sample of human plasma containing femtomole-level hGH (approximately 5 µg/L) was divided into two aliquots: the original and a 3-fold dilution of the original. Both samples were dissolved in 6 M GnHCl, reduced with DTT, alkylated (the original with D0-ICAT and the 3-fold dilution with D8-ICAT), and buffer exchanged.23 Note that the hGH was reduced and ICAT-labeled separately; the D0-hGH was then added to the D0-aliquot of plasma and the D8-hGH to the D8-aliquot. The samples were then combined, digested with trypsin, desalted, injected onto a strong cation exchange column (BioBasic SCX, ThermoHypersil), and step-eluted with NH4Cl steps of increasing molarity onto alternating reversed-phase columns.22 All of the 2D LC-MS/MS operations were performed automatically with a ProteomeX Workstation. The HPLC separation was performed on a Surveyor LC system (ThermoFinnigan, San Jose, CA). The flow rate was maintained at 150 µL/min before splitting and at 1.5 µL/min after the flow split. The ProteomeX system comprises an autosampler, two HPLC pumps, a 10-port column-switching valve, and a Deca XP Plus ion trap mass spectrometer with a microelectrospray interface. The 10-port valve allows a subsequent ion exchange fraction to be loaded onto the second reversed-phase column while the first one is performing the LC-MS/MS analysis. The microelectrospray interface consists of a 30-µm metal needle positioned orthogonal to the inlet of the ion trap mass spectrometer. For the capillary separations, a flow rate of 1-2 µL/min was used. In the first dimension, a strong cation exchanger (BioBasic SCX, 0.32 mm × 10 cm, ThermoHypersil, Allentown, PA) was used (salt steps of 0, 40, 100, 150, 200, 250, 300, and 500 mM ammonium chloride), and a reversed-phase capillary column (BioBasic C18, 300 Å, 5 µm silica, 180 µm × 10 cm, ThermoHypersil, Allentown, PA) was used for the second dimension. Relevant hGH peptides (Cys-containing peptides bound to D0 or D8 ICAT) were identified from their MS/MS spectra. Differential quantitation was performed by calculating the ratios of these peptides from their MS spectra with XPRESS, embedded in the BioWorks 3.1 software (see Table 1).

Table 1. Differential Display of 8 High- to Medium-Abundance Proteins and of hGH (the Low-Abundance Protein), Using the Approach Shown in Figure 2C(a)

Ratios are given as 1st plasma pool (D0) : 2nd plasma pool (D8). The theoretical ratio for every protein is 3 : 1.

protein                                        experimental ratio (D0 : D8)
serum albumin                                  3.13 : 1
IgG gamma-1 chain C region                     2.78 : 1
IgG kappa chain C region                       2.78 : 1
IgG lambda chain C region                      3.51 : 1
tubulin beta-4Q chain                          3.63 : 1
human interleukin-16                           3.37 : 1
ARP 2/3 complex (P20-ARC)                      3.01 : 1
SCF_Human ligand precursor (c-kit ligand)      2.78 : 1
average (8 proteins above)                     3.12 (±0.35 or 11.2%) : 1
growth hormone (hGH)                           2.3 : 1

(a) As shown in the table, the differential quantitation of hGH is still within a 2 standard deviation error of the other proteins.
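The spread of the ratios in Table 1 can be checked with Python's standard `statistics` module. The values are copied from the table; using the sample standard deviation (n - 1) is our choice, since the table does not say which estimator was used.

```python
from statistics import mean, stdev

# D0/D8 ratios copied from Table 1 (D8 pool = 3-fold dilution, so 3 is expected).
RATIOS = {
    "serum albumin": 3.13,
    "IgG gamma-1 chain C region": 2.78,
    "IgG kappa chain C region": 2.78,
    "IgG lambda chain C region": 3.51,
    "tubulin beta-4Q chain": 3.63,
    "human interleukin-16": 3.37,
    "ARP 2/3 complex (P20-ARC)": 3.01,
    "SCF_Human ligand precursor (c-kit ligand)": 2.78,
}
HGH_RATIO = 2.3
EXPECTED_RATIO = 3.0

values = list(RATIOS.values())
avg = mean(values)
sd = stdev(values)                    # sample standard deviation (n - 1)
cv_percent = 100 * sd / avg           # coefficient of variation
z_hgh = (HGH_RATIO - avg) / sd        # hGH offset in units of sd
hgh_error_percent = 100 * (HGH_RATIO - EXPECTED_RATIO) / EXPECTED_RATIO

print(f"abundant proteins: {avg:.2f} ± {sd:.2f} (CV {cv_percent:.1f}%)")
print(f"hGH: {z_hgh:+.1f} SD from the mean, {hgh_error_percent:+.0f}% vs expected")
```

Run as written, this reproduces the mean of 3.12 and a sample standard deviation of about 0.35 (a CV of about 11.0%; the 11.2% in the table corresponds to the rounded values 0.35/3.12). By this calculation the hGH ratio lies about 2.4 sample standard deviations below the mean, and 23% below the expected value of 3.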

Results and Discussion

Identification and Quantitation of a Low-Level Plasma Protein, Human Growth Hormone, Using the Isotope-Coded Affinity Tag (ICAT). The quantitation of individual proteins in a proteomic study is important because it allows differential studies, and a number of approaches have been published.3,31-6 An advantage of the ICAT technique as a quantitative approach is that it produces tagged cysteine-containing peptides that can be isolated by affinity chromatography, greatly reducing the complexity of the proteomic analysis.31 The large dynamic range of plasma proteins is not altered significantly by such a process, as the majority of plasma proteins contain cysteine residues and thus produce corresponding ICAT-labeled peptides. However, the removal of untagged peptides in the affinity step may remove information about post-translational modifications in regions of the protein not covered by the Cys-peptide. Human growth hormone (hGH) is present in plasma at a level of a few micrograms per liter, significantly below the level of the more abundant plasma proteins (Tables 2 and 3). It was therefore decided to perform a model study on the quantitation of hGH, as an example of a lower-level plasma protein, without the affinity isolation step normally used with the ICAT reagent. A sample of plasma containing femtomole-level hGH was divided into two aliquots, the original and a 3-fold dilution of the original, before tagging with the ICAT reagent (note: hGH was reduced, and the two aliquots were labeled with ICAT separately; the D0-hGH was then added to the D0-aliquot of plasma and the D8-hGH to the D8-aliquot). After the two aliquots were combined, the digested peptides were resolved by 2D chromatography (ion exchange with 8 salt steps followed by reversed-phase HPLC), and the results were compared (Figure 1). The tryptic peptides eluted from the 2D separation were identified by BioWorks (SEQUEST) with the consensus report software, which assigned the peptides to their parent proteins. Plasma proteins in the sample were thus identified and listed with the corresponding peptide sequence coverage. Figure 2A shows part of this list, focused on the protein of interest, hGH. As shown in Figure 2A, 5 peptides eluting in different salt fractions (2 at 0 mM, 1 at 40 mM, and 2 at 100 mM) were identified and assigned to hGH. Because 5 unique peptides were found, the confidence of the assignment is high. In addition, the ICAT-labeled, cysteine-containing peptide (NYGLLYCFR) was identified as a doubly charged ion with an Xcorr of 3.318, as shown in the list. Figure 2B shows the MS/MS spectrum of this cysteine-containing peptide. In this spectrum, the most intense ions were well matched (e.g., the singly charged y3, y4, and y5 ions), which further confirmed the assignment. In this measurement, the mass spectrometer was set to perform an MS scan followed by 3 rounds of MS/MS measurements. The ICAT-containing peptides were identified by BioWorks (SEQUEST) from the MS/MS spectra, as shown in Figure 2A and B. Differential quantitation was performed by calculating the ratios of the paired peptides (e.g., the peak area of the D0 peptide vs that of the D8 peptide) from their MS spectra. As shown in Figure 2C, the quantitation was performed by extracting (integrating) the ion chromatograms of the D0 form (top profile, dark peak area, m/z 795.5, 2+) and the D8 form (bottom profile, dark peak area, m/z 799.5, 2+) in the region where the peptide was identified. Thus, the MS/MS spectra were used to identify the ICAT-containing peptides, and the MS spectra to determine the ICAT ratio in that region of the chromatogram (e.g., 30 scans before and after the location). By this approach, plasma proteins can be both identified and differentially quantitated. Table 1 lists 8 high- to medium-abundance proteins, along with hGH, quantitated by this approach. As shown in the table, the differential ratio for hGH was 2.3 instead of the theoretical value of 3.0, whereas the differential quantitation of the abundant plasma proteins was consistent with the expected value (average of 3.12). Nevertheless, the differential quantitation of hGH was still within a 2 standard deviation error of these abundant proteins. In the context of these results, it should be noted that this shotgun approach can analyze a sample dynamic range of approximately 40 000.6 The observed range is limited by the resolving power of the separation, which imposes a time constraint on performing MS/MS measurements of minor peaks in a flowing system. In another study6 using different HPLC or gel-based strategies, low levels of hGH in a complex plasma sample resulted in the identification of only one or a few peptides, although the 2D HPLC method used here gave the best result, with 5 peptides identified. The same constraints apparently apply to these quantitation studies, in which only a single ICAT-peptide was observed (only the T16 peptide was detected in this analysis, out of a possible 4 such peptides in hGH); this may represent an approximate limit for such measurements in complex samples without improvements in sample handling or LC/MS protocols (e.g., sensitivity).

Figure 1. 2D separation of a tryptic digest of a mixture of reduced plasma proteins containing an add-back of human growth hormone (hGH) at approximately 5 µg/L. In addition, the disulfide-containing peptides had been tagged with the ICAT reagent (D0 and D8, 3-fold difference) to allow quantitative measurements. The tryptic digest of the plasma protein mixture (approximately 100 µg) was loaded onto a capillary BioBasic SCX column and then step-eluted with NH4Cl solutions of increasing concentration onto a BioBasic C18 LC column. The peptides were eluted with a 0.1% formic acid/acetonitrile gradient for LC-MS analysis (see the Experimental Section for details). The resulting elution profiles are shown for the following NH4Cl concentrations (from the top): 0, 40, 100, 150, 200, 250, 300, and 500 mM.

Figure 2. (A) Tryptic peptides eluted from the 2D separation shown in Figure 1 were identified by BioWorks (SEQUEST) with consensus report software, which assigned the peptides to their parent proteins. For example, 5 peptides eluting in different salt fractions were identified and assigned to growth hormone (hGH), as indicated in the figure by the label and arrow. The identified proteins are displayed with their assigned peptide sequences, molecular weights, charge states, Xcorr, and ∆Cn scores. (B) MS/MS spectrum of the cysteine-containing peptide of growth hormone shown in panel A, with the matched y and b ions (some a ions were also matched, as shown, but were not taken into account in the Xcorr score). The most intense ions were matched (e.g., the singly charged y3, y4, and y5 ions). (C) Quantitation of femtomole levels of hGH in plasma by the ICAT approach. In this measurement, the mass spectrometer was set to perform an MS scan followed by 3 rounds of MS/MS measurements. The ICAT-containing peptides were identified by BioWorks (SEQUEST) from the MS/MS spectra, as shown in panels A and B. Differential quantitation was performed by calculating the ratio of the two paired peptides (e.g., the peak area of the D0 peptide vs that of the D8 peptide) from their MS spectra. The quantitation was performed by extracting (integrating) the ion chromatograms of the D0 form (top profile, dark peak area, m/z 795.5, 2+) and the D8 form (bottom profile, dark peak area, m/z 799.5, 2+) in the region of the chromatogram where the peptide was identified (±30 scans).

Issues of Characterization of Major vs Minor Plasma Proteins. The disparate functions of blood range from bulk processes, such as the supply of nutrients to tissues and organs, to the transport of highly active factors such as lymphokines and hormones.24 It is not surprising, therefore, that the concentration range of blood proteins is estimated to be greater than 10^10, from albumin to low-level proteins such as interleukin 6 (>50 g/L and 5 ng/L, respectively). Such a large dynamic range is a concern in a shotgun sequencing study, where enzymatic digestion is performed on an unfractionated protein mixture before the LC/MS measurement. This complexity can prevent the identification of a given low-level peptide because of overloading effects that interfere with the separation, suppression of the ionization of that peptide, or insufficient time for the mass spectrometer to acquire its MS/MS spectrum in a flowing separation. As shown in the previous section, we used a two-dimensional rather than a one-dimensional separation, in which an ion exchange column is coupled to a reversed-phase column; this arrangement has been shown8-11,22 to significantly increase the number of protein identifications vs a single dimension of reversed-phase HPLC. In this study, the 2D approach does indeed give an increase in preliminary protein identifications over a corresponding 1D separation (448 vs 260, respectively), but at the expense of a longer analysis (24 vs 6 h). The quality of protein identification is also often improved by increased sequence coverage (more peptides per protein).6,8-11 As might be expected, increasing either the number of ion exchange steps (from 5 to 10) or the length of the reversed-phase gradient (from 1 to 2 h) increased the number of peptide identifications (data not shown). The number of initial protein identifications (448) is comparable to that of the study of Adkins et al.,11 where 490 serum proteins were identified with a 2D HPLC system in which 60 ion exchange fractions were each separated in an 80 min RPLC step. The other approach explored in this study, LC-FTICR, has the potential advantages of greater speed of analysis (approximately 70 min/sample in this study) and of providing orthogonal peptide identification (accurate mass in full MS mode vs MS/MS fragmentation). The long lists generated by both of these studies are not particularly informative and are not reported here except in the context of evaluating the analytical technology. A common approach, however, is to sort the identified proteins by variables such as function, concentration, disease state, or commonality with other fluids of the same species. In this study, and in terms of plasma function, the following groups of proteins were observed: transport (albumin; apolipoproteins AI, II, III, IV, B, CI, CII, CIII, D, and E; transferrin; transthyretin; ceruloplasmin; hemopexin; myoglobin; hemoglobin alpha and beta; haptoglobin); immunoglobulins (A, D, E, G, M, macroglobulin alpha and beta2, Bence Jones protein); glycoproteins (alpha 1- and 2- and beta 1- and 2-acid glycoprotein, fetuin, orosomucoid 1 and 2); coagulation factors (fibrinogen alpha, beta, and gamma, properdin, thrombin, Factors III, IX, V, VI, VII, VIII, X, XI, XII, and XIII, clusterin); complement (C1q, C1r, C1s, C2, C3, C4a, C4b, C5, C6, C7, C8, C9); enzymes (angiotensinogen, amylase, aspartate aminotransferase, carbonic anhydrase, creatine kinase, enolase, glutathione S-transferase, lactate dehydrogenase, lactoperoxidase, lysozyme, superoxide dismutase, matrix metalloproteinase, NADH dehydrogenase, pepsinogen, glutathione peroxidase, plasminogen, prostaglandin D synthase, prostaglandin dehydrogenase, prostaglandin isomerase, paraoxonase, telomerase); inhibitors (antiplasmin, anti-chymotrypsin, anti-thrombin, anti-trypsin inhibitor, protein kinase C inhibitor); hormones/growth factors (chorionic gonadotropin, interleukin 6, tumor growth factor, vitronectin, epidermal growth factor precursor, vascular endothelial growth factor (VEGF), platelet-derived growth factor); and structural/membrane-associated proteins (actin, tropomyosin, fibulin, myosin, calcium binding protein, retinal binding protein, interleukin 2 receptor).

Table 2. List of Plasma Proteins Identified in Common 2D Gel Studies(a) and Compared with the LC/MS Studies with Either Ion Trap MS/MS or FTICR-MS(b)

(a) Results from the website: www.expasy. (b) See the Experimental Section for details of LC and MS measurements and ref 27. (c) Green denotes protein identifications common to all three proteomic approaches, while blue denotes a missing identification with one of the HPLC approaches. The uncolored box denotes a missing protein identification.

Table 3. Comparison of the Results of 2D Gel and LC/MS Identification of Plasma Proteins

(a) The proteins listed in this column were identified by both LC/MS approaches; proteins identified only by ion trap MS are denoted by * and only by FTMS by **. The number of peptides identified in the ion trap study is given in parentheses. (b) The plasma concentrations are approximate average values obtained from the scientific literature; in the interest of space, the extensive bibliography is not listed but can be obtained from the authors on request. (c) The 2D gel identifications are a typical result obtained from a single high-resolution gel followed by MS identification of enzyme digests of extracted gel spots.
Comparison of the Results of Plasma Analysis by 2D Gels and LC-FTICR/LC-Ion Trap. Table 2 presents a compilation of 43 proteins identified in 2D gel studies25 and compares this list with the corresponding identifications by the two LC/MS approaches. Although additional spots, such as prothrombin, antiplasmin, and angiotensinogen, have been characterized in other studies,1 the results shown here are typical of individual 2D gel studies. Table 2 shows a high degree of commonality between the different studies, particularly if one allows for differences in methodology or sample variability. For example, the ion trap studies did not identify actin, alpha-fetoprotein, C-reactive protein, and glutathione S-transferase, whereas the FTMS study did not identify endothelin-1 and paraoxonase. Although the variability in the detection of low-level plasma proteins is addressed later in this paper, these results clearly demonstrate the complementary nature of the different approaches. To examine the dynamic range of these proteomic measurements, Table 3 lists the plasma concentrations of 95 high- and medium-abundance proteins that have been quantitated in nonproteomic studies by a variety of methodologies. All of the proteins in Table 3 were identified by one or both of the LC/MS approaches. The proteins identified in standard 2D gel studies are marked in the last column of the table by a red box. As could be expected, the proteins common to all three studies are predominantly of high to medium abundance (see the top section of Table 3, above 500 mg/L). In addition, the LC/MS identifications were in most cases achieved with a large number of peptide identifications, ranging from 44 peptides for albumin to 4 peptides for properdin.
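The cross-platform comparison behind Table 2 reduces to set operations. This Python sketch uses only the six proteins named in the preceding paragraph, not the full 43-protein table, and assumes, as the text implies, that each of them was identified by the LC/MS method not listed as missing it.

```python
# The six Table 2 proteins named in the text; all were identified on 2D gels.
GEL = {
    "actin", "alpha-fetoprotein", "C-reactive protein",
    "glutathione S-transferase", "endothelin-1", "paraoxonase",
}
# Assumption: each protein missed by one LC/MS method was found by the other.
ION_TRAP = GEL - {"actin", "alpha-fetoprotein", "C-reactive protein",
                  "glutathione S-transferase"}
FTMS = GEL - {"endothelin-1", "paraoxonase"}


def compare(gel, ion_trap, ftms):
    """Group 2D-gel identifications by which LC/MS methods also found them."""
    return {
        "all three": gel & ion_trap & ftms,
        "missed by ion trap only": (gel - ion_trap) & ftms,
        "missed by FTMS only": (gel - ftms) & ion_trap,
        "missed by both LC/MS methods": gel - (ion_trap | ftms),
    }


summary = compare(GEL, ION_TRAP, FTMS)
for category, proteins in summary.items():
    print(f"{category}: {sorted(proteins)}")
```

For these six proteins nothing is common to all three lists, yet nothing is missed by both LC/MS methods, which is the complementarity described above.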

Journal of Proteome Research • Vol. 2, No. 4, 2003

Wu et al.

In Table 3, ferritin (20 µg/L) is the lowest level protein listed for the purposes of this comparison. This protein, however, must be considered a medium abundance protein relative to previously characterized factors such as endothelin (100 ng/L), tumor necrosis factor (12 ng/L), islet amyloid polypeptide (13 ng/L), and interleukin 6 (2 ng/L). In the intermediate section of Table 3 (down to 20 mg/L), only some of the proteins were also identified by 2D gels. The proteins in the bottom part of Table 3 (below 20 mg/L) were identified only in the LC/MS studies, with substantial complementarity between the two MS methods, as each approach identified some proteins that the other did not. In this preliminary study, the results from two independent laboratories are presented together with a description of the procedures used for the statistical analysis of the data (see the Experimental Section). The issues associated with integrating data sets from the two MS platforms will be addressed in future studies on a hybrid platform that combines an ion trap with an FTMS. Table 3 is not intended to claim the detection of novel plasma proteins, as all of the proteins listed have been reported and quantitated in previous studies (see footnotes to Table 3). Confirming the identifications in this table would require an extensive study, as 45 of the 95 identifications are based on a single peptide. Such observations reflect the difficulty of characterizing low level proteins in such a complex mixture, as described elsewhere.6 However, 24 of these 45 identifications were made on both the ion trap and the FTMS, whereas 21 were made on only one platform. New hybrid systems such as the ion trap-FTMS can be expected to greatly aid the identification of low level peptides by augmenting MS/MS measurements with high mass accuracy measurements of precursor or fragment ions.
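The tally above (45 single-peptide identifications, 24 of them seen on both platforms) can be derived from peptide-level observations. The sketch below assumes input as (platform, protein, peptide) tuples; the function `single_peptide_summary` and the example proteins and peptides are hypothetical placeholders, not data from this study. A protein counts as a single-peptide identification when only one distinct peptide supports it across both platforms.

```python
from collections import defaultdict


def single_peptide_summary(observations):
    """observations: iterable of (platform, protein, peptide) tuples.

    Distinct peptides are pooled across platforms. A protein supported by
    exactly one distinct peptide is a single-peptide identification; it is
    counted under 'both_platforms' when that peptide was seen on more than
    one platform.
    """
    peptides = defaultdict(set)
    platforms = defaultdict(set)
    for platform, protein, peptide in observations:
        peptides[protein].add(peptide)
        platforms[protein].add(platform)

    single = [prot for prot, peps in peptides.items() if len(peps) == 1]
    multi_platform = [prot for prot in single if len(platforms[prot]) > 1]
    return {
        "total": len(peptides),
        "single_peptide": len(single),
        "both_platforms": len(multi_platform),
        "one_platform": len(single) - len(multi_platform),
    }


# Hypothetical observations, for illustration only.
obs = [
    ("ion trap", "protein X", "pep1"),
    ("FTMS", "protein X", "pep1"),
    ("ion trap", "protein Y", "pep2"),
    ("FTMS", "protein Z", "pep3"),
    ("FTMS", "protein Z", "pep4"),
]
print(single_peptide_summary(obs))
```

Here protein X is a single-peptide identification confirmed on both platforms, protein Y rests on one peptide from one platform, and protein Z is supported by two peptides.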
The observation of a low level protein is often via a single peptide, whose detection depends on factors such as elution of the peptide in a relatively open retention time window and the ability of the mass spectrometer to detect it. Such factors can explain the different observations made with the two LC/MS approaches. As was shown in another study,6 when a low level protein is detected by a single peptide, the preliminary assignment should be followed by alternative strategies, such as sample pretreatment, that allow additional peptides to be observed. In this study, a lower limit of MS detection for a focused study of a plasma protein was established by an add-back procedure, in which human growth hormone (hGH) was characterized by LC/MS/MS down to a level of 5 µg/L in plasma, slightly below the level of ferritin shown in Table 3. Although the add-back procedure may skew the observation of low abundance proteins in a complex mixture, it does provide an internal standard for exploring the limit of ICAT quantitation with this LC/MS approach. The identification of additional proteins by shotgun sequencing relative to 2D gels in Table 3 is consistent with several reports6,8,9,11 indicating that the LC/MS approach can identify more low level proteins in a complex sample than the 2D gel approach, albeit with lower sequence coverage.1 In typical 2D gel studies,26 most of the observed proteins are present as a family of spots, indicating a substantial degree of heterogeneity caused by post-translational modifications. Such information is difficult to obtain from an LC/MS study of an enzymatic digest, which typically gives low sequence coverage for an individual protein in a complex mixture. The high abundance proteins listed in Table 3 that were not identified in the gel studies chosen for reference25 include complement factors B, H, and C4a, vitamin D binding protein, glycine-rich 1-beta-glycoprotein, inter-alpha-trypsin inhibitor, and alpha1-beta glycoproteins. While alternative strategies can extend the dynamic range of 2D gel studies,3 one cause for the lack of ready identification of these proteins is the presence of complex post-translational modifications (PTMs), such as glycosylation. Studies of purified protein isoforms, such as rDNA-derived glycoproteins, have shown27,28 that such complexity also greatly complicates full characterization by peptide mapping. In the shotgun sequencing approach to proteomics, however, the characterization is much less sensitive to this heterogeneity. Digesting the protein mixture before the HPLC separation generates, for a given protein, many unmodified peptides at a much greater concentration than any PTM-modified peptide, and the protein identification can be based on the analysis of these unmodified peptides. It has been noted, however, that detecting post-translational modifications in low level proteins by the shotgun sequencing approach requires other approaches, such as isolation of protein complexes by specific affinity steps prior to digestion.8,9
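In the isotope coded affinity tag (ICAT) approach, cysteine-containing peptides from two samples are labeled with isotopically light and heavy forms of the same reagent, so each peptide appears as a pair of peaks whose intensity ratio reports relative abundance. The sketch below assumes the original d0/d8 reagent, in which the heavy tag carries eight deuterium atoms per labeled cysteine; the function names are hypothetical, and a real workflow would integrate extracted ion chromatograms over the elution profile and all observed charge states rather than take two numbers as given.

```python
# Assumes the original d0/d8 ICAT reagent: the heavy form carries eight
# deuterium atoms in place of hydrogen on each labeled cysteine.
H_MASS = 1.007825032   # monoisotopic mass of 1H (Da)
D_MASS = 2.014101778   # monoisotopic mass of 2H (Da)


def icat_pair_spacing(n_labels, charge, n_deuterium=8):
    """Expected m/z separation between the light and heavy forms of a
    peptide carrying n_labels ICAT tags, observed at the given charge."""
    if n_labels < 1 or charge < 1:
        raise ValueError("n_labels and charge must be positive integers")
    return n_labels * n_deuterium * (D_MASS - H_MASS) / charge


def heavy_to_light_ratio(light_area, heavy_area):
    """Relative abundance from integrated peak areas of the labeled pair."""
    if light_area <= 0:
        raise ValueError("light peak area must be positive")
    return heavy_area / light_area


print(round(icat_pair_spacing(1, 1), 3))   # one label, singly charged
print(round(icat_pair_spacing(1, 2), 3))   # one label, doubly charged
print(heavy_to_light_ratio(1.0e6, 2.0e6))  # hypothetical areas
```

Dividing the mass difference by the charge matters when searching for pairs: a doubly charged, singly labeled peptide shows a partner peak about 4 m/z units away, not 8.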

Conclusions
This study has explored the characterization of the plasma proteome by LC/MS analysis of a tryptic digest of unfractionated plasma, with peptide identification by either accurate mass measurement or MS/MS. Such studies produce a large number of spectra for interpretation and allow an initial exploration of the plasma proteome. Such a protocol also has the advantage of minimal sample manipulation, which can reduce both contamination with adventitious proteins and the loss of minor components. The quantitation of a low level protein by the ICAT procedure demonstrated that the LC/MS procedure can be used for the relative quantitation of lower level proteins. This preliminary study can serve as a guide to the methodological improvements necessary for a more complete evaluation of the plasma proteome, so that lower level proteins can be both identified and characterized. The limit of the current study was approximately 5 µg/L for the test protein hGH, in contrast to concentrations of around 5 ng/L for known low level proteins such as the interleukins. Improvements such as removal of high abundance proteins, enrichment of low level protein families, and HPLC and mass spectrometry with higher selectivity and sensitivity will be necessary before rare plasma proteins can be detected. Improved methodologies will also increase the sequence coverage of identified proteins, which will improve the certainty of a given assignment as well as the prospects for characterizing post-translational modifications.
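The gap between the hGH detection limit and the concentrations of low level proteins is easiest to judge in orders of magnitude. The short sketch below uses only concentrations quoted in the text; the helper names and the ASCII unit keys ("ug/L" for µg/L) are illustrative choices.

```python
import math

# Unit factors to grams per liter ("ug/L" stands for µg/L).
TO_G_PER_L = {"mg/L": 1e-3, "ug/L": 1e-6, "ng/L": 1e-9}


def to_g_per_l(value, unit):
    return value * TO_G_PER_L[unit]


def log10_gap(high, low):
    """Orders of magnitude separating two (value, unit) concentrations."""
    return math.log10(to_g_per_l(*high) / to_g_per_l(*low))


hgh_limit = (5, "ug/L")  # hGH detection limit reported in this study
print(round(log10_gap(hgh_limit, (5, "ng/L")), 2))       # interleukins: 3.0
print(round(log10_gap(hgh_limit, (2, "ng/L")), 2))       # interleukin 6: 3.4
print(round(log10_gap((500, "mg/L"), (20, "ug/L")), 2))  # Table 3 top section vs ferritin: 4.4
```

By this measure, reaching the interleukins requires roughly a thousandfold gain in effective sensitivity, while the proteins in Table 3 alone already span more than four orders of magnitude.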

Acknowledgment. The authors are grateful for the following collaborations: software development for FTICR (Dr. Magnus Palmblad) and Ion Trap (Jim Shoftstahl and Guanghui Wang), and the guidance of Dr. Ian Jardine, Prof. Per Håkansson, and Prof. Karin Markides. The authors also recognize the assistance of Dr. Iain Mylchreest and the other members of Product Development. The financial support of Knut and Alice Wallenberg, the Swedish Society for Medical Research, and the Swedish Research Council (Grant 13123) is acknowledged.

Jonas Bergquist has a senior research position at the Swedish Research Council (VR). This manuscript is contribution number 823 from the Barnett Institute, Northeastern University. Note Added after ASAP Posting: This manuscript was originally published on the Web (06/12/2003) with an incomplete version of Table 3. The complete version was published on the Web (08/01/2003) and in print (08/01/2003).

References
(1) Anderson, N. L.; Anderson, N. G. Proc. Natl. Acad. Sci. U.S.A. 1977, 74, 5421-5425.
(2) Malmstrom, L.; Malmstrom, J.; Marko-Varga, G.; Westergren-Thorsson, G. J. Proteome Res. 2002, 2, 135-138.
(3) Gygi, S. P.; Rist, B.; Gerber, S. A.; Turecek, F.; Gelb, M. H.; Aebersold, R. Nature Biotechnology 1999, 17, 994-999.
(4) Scopes, R. K. Protein Purification: Principles and Practice, 3rd ed.; Springer-Verlag: New York, 1994.
(5) Apffel, A.; Chakel, J. A.; Hancock, W. S.; Souders, C.; M'Timkulu, T.; Pungor, E. J. Chromatogr. 1996, 734, 35-42.
(6) Wu, S.-L.; Amato, H.; Biringer, R.; Choudhary, G.; Shieh, P.; Hancock, W. S. J. Proteome Res. 2002, 1, 459-465.
(7) Battersby, J. E.; Mukku, V. R.; Clark, R. G.; Hancock, W. S. Anal. Chem. 1995, 67, 447-455.
(8) Haynes, P. A.; Yates, J. R., III. Yeast 2000, 17, 81-87.
(9) Gygi, S. P.; Rist, B.; Griffin, T. J.; Eng, J.; Aebersold, R. J. Proteome Res. 2002, 2, 47-54.
(10) Delinsky, D. C.; Gries, K. D. J. Proteome Res. 2002, 3, 279-284.
(11) Adkins, J. N.; Varnum, S. M.; Auberry, K. J.; Moore, R. J.; Angell, N. H.; Smith, R. D.; Springer, D. L.; Pounds, J. G. Mol. Cell. Proteomics 2002.
(12) Smith, R. D. Int. J. Mass Spectrom. 2000, 200, 509-544.
(13) Wetterhall, M.; Palmblad, M.; Håkansson, P.; Markides, K. E.; Bergquist, J. J. Proteome Res. 2002, 4, 361-367.
(14) Ramström, M.; Palmblad, M.; Markides, K. E.; Håkansson, P.; Bergquist, J. Proteomics, in press.
(15) Palmblad, M.; Håkansson, K.; Håkansson, P.; Feng, X.; et al. Eur. J. Mass Spectrom. 2000, 6, 267-275.
(16) Nilsson, S.; Wetterhall, M.; Bergquist, J.; Nyholm, L.; Markides, K. E. Rapid Commun. Mass Spectrom. 2001, 15, 1997-2000.
(17) Eng, J. K.; McCormack, A. L.; Yates, J. R., III. J. Am. Soc. Mass Spectrom. 1994, 5, 976-989.
(18) Ducret, A.; Van Oostveen, I.; Eng, J. K.; Yates, J. R., III; Aebersold, R. Protein Science 1998, 7, 706-719.
(19) Tabb, D. L.; McDonald, W. H.; Yates, J. R. J. Proteome Res. 2002, 1, 21-26.
(20) Sadygov, R. G.; Eng, J.; Durr, E.; Saraf, A.; McDonald, H.; MacCoss, M. J.; Yates, J. R., III. J. Proteome Res. 2002, 3, 211-216.
(21) ThermoFinnigan Product Bulletin, B-1033, Feb 2002, "Xcalibur BioWorks 3.0".
(22) Hancock, W. S.; Choudhary, G.; Wu, S.-L.; Shieh, P. American Laboratory 2000, 32, 20-22.
(23) Gygi, S. P.; Corthals, G. L.; Zhang, Y.; Rochon, Y.; Aebersold, R. Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 9390-9395.
(24) Anderson, N. L.; Anderson, N. G. Mol. Cell. Proteomics 2002, 1, 845-867.
(25) Swiss Institute of Bioinformatics, see http://us.expasy.org/ch2d.
(26) Anderson, N. L.; Anderson, N. G. Electrophoresis 1991, 12, 883-906.
(27) Hancock, W. S.; Apffel, A. J.; Chakel, J. A.; Hahnenberger, K.; Choudhary, G.; Traina, J.; Pungor, E. Anal. Chem. 1999, 742A-748A.
(28) Udiavar, S.; Apffel, A.; Chakel, J.; Swedberg, S.; Hancock, W. S.; Pungor, E. Anal. Chem. 1998, 70, 3572-3578.
(29) Sell, S. Clin. Lab. Medicine 1990, 10, 27-31.
(30) Martell, R. E.; Xu, F. J.; Davis, W. Z.; Anselmino, L.; Yu, Y. H.; Daly, L.; Bast, R. C. Int. J. Biol. Markers 1998, 13, 145-149.
(31) Gygi, S. P.; Rist, B.; Griffith, T. J.; Eng, J.; Aebersold, R. J. Proteome Res. 2002, 1, 47-54.
(32) Reynolds, K. J.; Yao, X.; Fenselau, C. J. Proteome Res. 2002, 1, 27-34.
(33) Zhang, R.; Regnier, F. J. Proteome Res. 2002, 2, 139-148.
(34) Chelius, D.; Bondarenko, P. J. Proteome Res. 2002, 4, 317-324.
(35) Jiang, H.; English, A. M. J. Proteome Res. 2002, 4, 345-350.
(36) Liu, P.; Regnier, F. J. Proteome Res. 2002, 5, 443-450.
(37) Bergquist, J.; Palmblad, M.; Wetterhall, M.; Håkansson, P.; Markides, K. Mass Spectrom. Rev. 2002, 21, 2-15.
