High-Sensitivity Analysis of Human Plasma Proteome by Immobilized Isoelectric Focusing Fractionation Coupled to Mass Spectrometry Identification Cheng-Jian Tu, Jie Dai, Su-Jun Li, Quan-Hu Sheng, Wen-Jun Deng, Qi-Chang Xia, and Rong Zeng* Research Center for Proteome Analysis, Key Lab of Proteomics, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China Received December 24, 2004
Immobilized pH gradient isoelectric focusing (IPG-IEF) is the first dimension typically used in two-dimensional gel electrophoresis (2-DE). It can also be used on its own in conjunction with tandem mass spectrometry (MS/MS) for the analysis of proteins. Here, we describe a strategy that combines isoelectric focusing in immobilized pH gradient strips with mass spectrometry to create a new high-throughput and sensitive detection method. The protein mixture is separated by in-gel IEF, and the entire strip is then cut into a set of gel sections. Proteins in each gel section are digested with trypsin, and the resulting peptides are subjected to reversed-phase high-performance liquid chromatography followed by electrospray linear ion trap tandem mass spectrometry. Using this optimized strategy, we identified 744 distinct human proteins from an IPG strip loaded with only 300 µg of plasma proteins. Compared with previously published work, this study offers a more convenient and sensitive gel-to-mass-spectrometry method for the separation and identification of proteins in complex biological samples.
Keywords: plasma • immobilized pH gradients • isoelectric focusing • proteomics • linear ion trap mass spectrometry
Journal of Proteome Research 2005, 4, 1265-1273 (Vol. 4, No. 4; research articles). Published on Web 06/18/2005. 10.1021/pr0497529 CCC: $30.25 © 2005 American Chemical Society
* To whom correspondence should be addressed. Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 YueYang Road, Shanghai, 200031. Tel: +86-21-54920170. Fax: +86-21-54920171. E-mail: [email protected].

Introduction
Prefractionation techniques in proteome analysis offer a strong step forward in “mining below the tip of the iceberg” to detect the “unseen proteome” and have been a major focus of proteome researchers.1 Several fractionation strategies have been used, including liquid chromatography based on reversed phase or ion exchange and liquid-phase isoelectric focusing.1,2 Isoelectric focusing separates proteins according to their pI values and has become one of the major fractionation methods. However, most current isoelectric focusing fractionation is performed in the liquid phase.2-5 In recent years, interest in immobilized pH gradient gels (IPG strips) has resurged because of their excellent resolution, high loading capacity, and good reproducibility as the first dimension of 2-DE.6-8 IPG gels were originally introduced in the early 1980s as an alternative to carrier ampholyte tube gels. The ability to rehydrate these gels in a variety of chaotropes or detergents, such as thiourea and CHAPS, respectively, should eliminate many of the problems of protein precipitation at the pI position and ensure excellent resolution. In addition, the positional reproducibility of protein spots in two-dimensional polyacrylamide gel electrophoresis using IPG-IEF in the first dimension was discussed by Corbett, J. M.9 In our lab, similar work was done, which indicated that the positional variability in the IEF dimension is 0.73 mm.10 Thus, IPG-IEF in the first dimension ensures good reproducibility of protein separation. Finally, IPG strips can accept higher protein loads (up to 5 mg) than carrier ampholyte gels without a significant loss of resolution.8 IPG-IEF can also be used on its own for the analysis of proteins.11,12 However, isoelectric focusing in immobilized pH gradient gels followed by mass spectrometry has rarely been used in proteome analysis and needs optimization. Giorgianni et al., using this approach, identified only 127 proteins from 750 µg of human pituitary proteins.13
Human blood plasma is a complex body fluid and is believed to harbor thousands of proteins, which originate from a variety of cells and tissues through either active secretion or leakage from blood cells or tissues.14 It has an extraordinary dynamic range: more than 10 orders of magnitude in relative abundance separate albumin and the rarest proteins now measured clinically.15 Many plasma proteins have been functionally characterized and associated with disease processes.16-21 Researchers now pay close attention to the human plasma proteome because of the central role plasma plays in clinical diagnostics. However, the complexity and enormous dynamic range of plasma make it the most difficult specimen to deal with and pose major challenges in proteomics. The most commonly used conventional proteomics technology platform consists of two-dimensional electrophoresis (2-DE), mass spectrometry, and bioinformatic analysis. Using this 2-DE based approach, Pieper et al. recently identified 325 distinct proteins from 1800 distinct serum protein spots.14 These relatively poor results are caused by the following disadvantages: the very high abundance of a few proteins, the high heterogeneity of many proteins, which results in long charge trains, and the crowding of 2-DE separated protein spots in the isoelectric point range between 4.5 and 6 and the molecular mass range between 45 and 80 kDa.22 Therefore, other high-throughput proteomic technologies were developed to address these problems and to identify less abundant proteins in human plasma on a large scale. For example, Anderson et al. merged four different views of the human plasma proteome, based on different methodologies, into a single nonredundant list of 1175 distinct gene products;23 Shen et al. identified 800 and 1682 proteins from a total of 365 µg of human plasma, using high-efficiency nanoscale reversed-phase liquid chromatography and strong cation exchange LC to obtain ultrahigh-efficiency separations in conjunction with tandem mass spectrometry.24
In this study, we applied isoelectric focusing in immobilized pH gradient gels followed by mass spectrometry to analyze the human plasma proteome. The excellent resolution of IPG-IEF and effective peptide extraction from the IPG gel sections ensure sensitive analysis of plasma proteins. Peptides were identified by comparing MS/MS spectra with predicted spectra under strict criteria (Xcorr ≥1.9, ≥2.2, and ≥3.75 for +1, +2, and +3 tryptic peptides, respectively; ∆Cn ≥0.1; and SP rank ≤4) using BioWorks 3.0 software (Thermo Finnigan). This powerful and convenient method resulted in the identification of 744 human plasma proteins from ∼300 µg (∼3 µL) of plasma, without the need to deplete highly abundant serum albumin or immunoglobulin.
This also indicates that in-gel IEF-LC-MS/MS is high-throughput and sensitive enough to analyze complicated samples such as human plasma, whose proteins span a dynamic range of 10 orders of magnitude in relative abundance. Compared with previously published work, it provides a more powerful and convenient method for the separation and identification of human plasma proteins or other complicated samples.
Experimental Section
Chemicals and Materials. Unless stated otherwise, all reagents and chemicals were of the highest purity available. Water was purified using a Milli-Q system (Millipore, Bedford, MA). Formic acid was from Sigma-Aldrich (St. Louis, MO). Trypsin (sequencing grade) was obtained from Promega (Madison, WI). 1,4-Dithiothreitol (DTT), acrylamide, agarose, ammonium bicarbonate, Bis, 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS), glycine, iodoacetamide, SDS, and Tris were from Bio-Rad (Hercules, CA). Acetonitrile (ACN) and trifluoroacetic acid (TFA) were from Merck (Darmstadt, Germany). Methanol was from Fisher (Fair Lawn, NJ). Bromophenol blue, glycerol, and IPG Buffer 3-10 NL were from Amersham Biosciences (Piscataway, NJ).
Human Plasma Preparation. The human blood plasma was obtained from a healthy female donor (age 27, blood type O) and provided by the Shanghai Blood Station. An initial protein concentration of ∼100 mg/mL of plasma was determined using the Bradford method, after which 300 µg of the plasma sample was added directly to the rehydration buffer (8 M urea, 2% CHAPS, 18 mM DTT, 0.5% IPG Buffer, trace bromophenol blue), followed by isoelectric focusing in immobilized pH gradient strips with the IPGphor system (Amersham Pharmacia Biotech).
Isoelectric Focusing and SDS-PAGE Analysis. 2-DE was performed with the IPGphor system (Amersham Pharmacia Biotech) and the PROTEAN II xi system (Bio-Rad, Hercules, CA), as previously described.25,26 A total of 300 µg of plasma proteins was focused on an 18 cm pH 3-10 NL ReadyStrip (Amersham Pharmacia Biotech). IEF parameters were 70 µA/strip at 20 °C, with a rehydration step of 12 h followed by 500 V for 1 h, 1000 V for 1 h, and 8000 V for 8 h. The total was about 60 000 Vh. SDS-PAGE was run at a constant current of 10 mA/gel for 20 min; the current was then switched to 25 mA/gel until the bromophenol blue front was within 0.5 cm of the bottom of the gel. The proteins were detected with silver staining. The gel was scanned using a GS710 imaging densitometer (Bio-Rad) and analyzed with PDQuest software (Bio-Rad).
In-Gel Digestion. The in-gel digestion was performed using the following protocol. To reduce disulfide bonds, the IPG strip was incubated in 1% DTT in SDS equilibration buffer (50 mM Tris-Cl pH 8.8, 6 M urea, 30% glycerol, 2% SDS, bromophenol blue) for 15 min. The free sulfhydryl groups were then alkylated with 2.5% iodoacetamide in SDS equilibration buffer for another 15 min in the dark at room temperature. The IPG strip was then cut into 18 gel sections (each about 1.0 cm in length). Each gel section was washed three times alternately with ACN and 100 mM ammonium bicarbonate. During the last wash, the gel slices were incubated in 100 mM ammonium bicarbonate for 15 min at 4 °C. The gel slices were dried by vacuum centrifugation and allowed to swell in 50 µL of trypsin solution (20 µg/mL trypsin in 50 mM ammonium bicarbonate) at 4 °C for 45 min. After another 50 µL of trypsin solution was added, the gel slices were kept at 37 °C for 20 h.
The supernatant was transferred to another vial, and the gel slices were extracted three times for 15 min each with 0.1% formic acid in 60% ACN. The recovered peptide solutions were dried by vacuum centrifugation, then desalted and cleaned using a ZipTip (Millipore Corp., Bedford, MA).
LC-MS/MS Analysis. The peptide mixtures from each section of the IPG strip were separated by reversed-phase HPLC followed by tandem mass spectrometry. RP-HPLC was performed on a Surveyor LC system (Thermo Finnigan, San Jose, CA). The C18 column (RP, 180 µm × 150 mm) was obtained from Column Technology Inc. (Fremont, CA). The pump flow was split 1:120 to achieve a column flow rate of 1.5 µL/min. Mobile phase A was 0.1% formic acid in water, and mobile phase B was 0.1% formic acid in acetonitrile. The tryptic peptide mixtures were eluted using a gradient of 2-98% B over 180 min. MS/MS was performed on an LTQ linear ion trap mass spectrometer (Thermo Finnigan, San Jose, CA) equipped with an electrospray interface and operated in positive ion mode. The capillary temperature was set to 170 °C and the spray voltage to 3.4 kV. The normalized collision energy was 35.0%. Automatic gain control was used to obtain the maximal signal in each scan. The mass spectrometer was set so that one full MS scan was followed by ten MS/MS scans on the 10 most intense ions. Dynamic exclusion was set at repeat count 2, repeat duration 30 s, and exclusion duration 90 s.
Database Searching. The acquired MS/MS spectra were searched against the IPI human database using BioWorks 3.0 software (Thermo Finnigan) on an 8-node Dell PowerEdge 2650 cluster. An accepted SEQUEST result had a ∆Cn score of at least 0.1 (regardless of charge state), a value known to give high confidence in a SEQUEST search.27,28 All output results were combined using in-house software named Build Summary (manuscript submitted to Proteomics) to remove redundant data. To make sure that each MS/MS spectrum was of good quality, with fragment ions clearly above baseline noise, we referred to the parameters reported in previous studies29-31 and applied stricter criteria for peptide identification. Peptides were validated after meeting the following criteria: the SEQUEST cross-correlation score (Xcorr) must be ≥1.9 for a +1 tryptic peptide, ≥2.2 for a +2 tryptic peptide, and ≥3.75 for a +3 tryptic peptide. In addition, ∆Cn must be ≥0.1 and the SP rank of the peptide must be ≤4. Peptides passing these filters showed some continuity in the b or y ion series, so we are fairly confident in the peptides identified with these strict criteria.
Bioinformatics Analysis. The theoretical isoelectric point (pI) was defined by the pI table and the algorithm used in the Compute pI/Mw program at ExPASy (www.expasy.ch). Molecular weight (MW) values of proteins were computed from the average amino acid weight table. The grand average of hydropathicity (GRAVY) values were calculated as the arithmetic mean of the hydropathic indices of the amino acids in each sequence.32 The TMHMM version 2.0 algorithm was used to predict protein transmembrane (TM) regions.33 The total number of TM helices predicted per sequence was reported for each protein sequence. When a predicted TM region overlapped a predicted signal sequence, this was interpreted as a signal sequence only. SignalP version 3.0 was used for signal peptide/nonsignal peptide prediction based on a combination of several artificial neural network (NN) and hidden Markov model (HMM) algorithms.34-36
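The GRAVY calculation described in the Bioinformatics Analysis paragraph can be sketched as follows. This is a minimal illustration, not the authors' software: it assumes that the hydropathic indices of reference 32 are the standard Kyte-Doolittle scale, and the function name `gravy` is hypothetical.

```python
# Kyte-Doolittle hydropathy indices (assumed here to be the scale of ref 32).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Arithmetic mean of the hydropathy indices over all residues."""
    residues = sequence.upper()
    if not residues:
        raise ValueError("empty sequence")
    # A KeyError here signals a non-standard residue code.
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


# Peptide sequence quoted in this paper (tetranectin precursor); a negative
# value indicates a hydrophilic sequence on this scale.
print(round(gravy("GGTLSTPQTGSENDALYEYLR"), 3))
```

In practice the function would be applied to full protein sequences rather than to a single peptide; the peptide is used here only because its sequence appears in the text.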
Figure 1. 2-DE map of the human plasma proteins. The human blood plasma was obtained from one healthy female donor (age 27, blood type O) and provided by the Shanghai Blood Station. A total of 300 µg of plasma proteins was separated by IEF on an immobilized pH gradient (pH 3-10 NL) in the first dimension and then by SDS-PAGE in the second dimension. Finally, the proteins were detected with silver staining.
Results and Discussion
Fractionation of Human Plasma by IPG-IEF. We studied isoelectric focusing of human plasma proteins using the IPGphor system with an 18 cm pH 3-10 NL ReadyStrip. Conventional 2-DE with ammoniacal silver staining37 showed the result of IEF (Figure 1). As expected, the proteins in the mixture migrated to positions in the established pH gradient equivalent to their respective isoelectric points (pI) and appeared as individual spots in the 2-DE map. Immobilized pH gradient gels (IPG strips) have demonstrated good reproducibility, high loading capacity, and excellent resolution. We tested the reproducibility of the in-gel IEF-LC-MS/MS method using parallel gel sections from two 18 cm pH 3-10 NL ReadyStrips focused as above. We selected the gel sections at the same position in the two parallel experiments and identified 169 unique proteins from the two sections, of which 52 were identified in both sections (61.54%, a value that corresponds to 2 × 52/169). For comparison, we tested the reproducibility of LC-MS/MS alone with an unfractionated tryptic digest of 300 µg of plasma proteins in each run. A total of 367 unique proteins were identified from the two experiments, of which 125 were identified in both (68.12%, a value that corresponds to 2 × 125/367). The reproducibility of in-gel IEF-LC-MS/MS was therefore close to that of LC-MS/MS and acceptable. We also found that adjacent gel sections overlapped in their identified proteins, which indicates that many plasma proteins have several isoforms with different pI values and that the most abundant proteins are not confined to a single gel section. In the 2-DE map, there are trains of highly heterogeneous proteins and horizontal streaks of the abundant proteins (Figure 1).

Figure 2. Number of identified proteins from each strip section. About 93 ± 15 proteins were detected in each section of the IPG strip.

Human Plasma Protein Identification. In this study, we used a 2D linear ion trap to identify the human plasma proteome. The data produced by linear ion trap mass spectrometry were filtered using both Xcorr and ∆Cn for the tryptic peptides to achieve a low false-positive rate.31 Compared with previous work,13,38 we applied stricter criteria for peptide identification. Using the accepted reliable criteria for peptide search (Xcorr ≥1.9 for a +1 fully tryptic peptide, ≥2.2 for a +2 fully or partially tryptic peptide, and ≥3.75 for a +3 fully or partially tryptic peptide; ∆Cn ≥0.1), a total of 2255 unique peptides were identified, corresponding to 1142 distinct human proteins. Of these, 201 (17.6%) proteins were identified with 2 or more unique tryptic peptides. To further restrict the number of false positives, Shen et al.24 used a neural network (NET) to remove between 10% and 23% of the proteins identified by SEQUEST after a two-dimensional separation based on LC-RPLC-MS/MS.
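The two overlap percentages quoted in the reproducibility test above (61.54% and 68.12%) are reproduced by the ratio 2 × shared/total. The sketch below only checks that arithmetic; the formula is inferred from the reported numbers rather than stated in the text, and the function name is hypothetical.

```python
def reported_overlap_percent(shared: int, total: int) -> float:
    """Return 2 * shared / total as a percentage.

    `shared` is the number of proteins found in both runs and `total` is the
    protein count reported for the two runs together. This ratio is an
    assumption inferred from the published figures.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * 2 * shared / total


# Parallel gel sections: 52 shared proteins, 169 reported in total.
print(f"{reported_overlap_percent(52, 169):.2f}")
# Replicate LC-MS/MS runs: 125 shared proteins, 367 reported in total.
print(f"{reported_overlap_percent(125, 367):.2f}")
```

Other overlap measures, such as the plain fraction shared/total, would give different values, so a comparison with other studies should first confirm which definition they use.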
Table 1. Secreted and Transmembrane Proteins of the Human Plasma
rank	reference	peptide hits	unique peptides	sequence coverage (%)	MW (Da)	pI	no. of TMHs
#1	IPI:IPI00022229.1|SWISS-PROT:P04114|REFSEQ_NP:NP_000375|TREMBL:P78479;Q13779;Q9UMN0;P78480;P78481;Q9UE53;Q13828;Q9UE52;Q13786;Q9UE51;Q13788;Q7Z7Q0;Q7Z600|ENSEMBL:ENSP00000233242 Tax_Id=9606 Apolipoprotein B-100 precursor	183	55	19.90	515562.59	6.61	1
#16	IPI:IPI00021891.5|SWISS-PROT:P02679-1|REFSEQ_NP:NP_068656|ENSEMBL:ENSP00000336829 Tax_Id=9606 Splice isoform γ-B of P02679 Fibrinogen γ chain precursor	187	18	47.46	51511.61	5.37	1
#17	IPI:IPI00292530.1|SWISS-PROT:P19827|REFSEQ_NP:NP_002206|ENSEMBL:ENSP00000273283 Tax_Id=9606 Inter-α-trypsin inhibitor heavy chain H1 precursor	86	18	30.52	101389.05	6.31	1
#108	IPI:IPI00218746.1|SWISS-PROT:P02746|REFSEQ_NP:NP_000482|ENSEMBL:ENSP00000313967 Tax_Id=9606 complement component 1, q subcomponent, β polypeptide precursor	6	3	20.95	26703.68	8.83	1
#131	IPI:IPI00163446.1|SWISS-PROT:P01880|TREMBL:Q8WU38;Q8NF20|ENSEMBL:ENSP00000312052 Tax_Id=9606 Hypothetical protein	6	2	8.03	62967.29	6.84	1
#137	IPI:IPI00107117.1|SWISS-PROT:P23284|REFSEQ_NP:NP_000933|TREMBL:Q9BVK5|ENSEMBL:ENSP00000300026 Tax_Id=9606 Peptidylprolyl isomerase B	4	2	12.04	23742.45	9.42	1
#145	IPI:IPI00177869.4|SWISS-PROT:O14791-1|REFSEQ_NP:NP_663319;NP_003652|ENSEMBL:ENSP00000216178;ENSP00000317625 Tax_Id=9606 Splice isoform 1 of O14791 Apolipoprotein L1 precursor	3	2	9.55	44026.34	5.99	2
#162	IPI:IPI00017648.1|SWISS-PROT:P09848|REFSEQ_NP:NP_002290|ENSEMBL:ENSP00000264162 Tax_Id=9606 Lactase-phlorizin hydrolase precursor	2	2	1.66	218603.27	5.9	1
#213	IPI:IPI00001432.1|SWISS-PROT:Q9Y5E7|REFSEQ_NP:NP_061759|ENSEMBL:ENSP00000194155 Tax_Id=9606 Protocadherin β 2 precursor	3	1	2.88	87254.05	4.76	1
#224	IPI:IPI00410429.1|REFSEQ_NP:NP_997234 Tax_Id=9606 hypothetical protein FLJ33674	3	1	5.94	44109.61	5.07	1
#229	IPI:IPI00066317.1|SWISS-PROT:Q96RL6-1|REFSEQ_NP:NP_443116|ENSEMBL:ENSP00000293386 Tax_Id=9606 Splice isoform 1 of Q96RL6 Sialic acid binding Ig-like lectin 11 precursor	2	1	2.62	74544.47	7.57	1
#231	IPI:IPI00401164.1|REFSEQ_XP:XP_376600;XP_379818 Tax_Id=9606 similar to RAS p21 protein activator 4	2	1	3.89	38884.32	6.36	1
#264	IPI:IPI00304504.5|REFSEQ_NP:NP_071931|TREMBL:Q9H6V2;Q96DM9;Q9NTA7;Q8WU83;Q8WX60|ENSEMBL:ENSP00000297886 Tax_Id=9606 sushi domain containing 1	2	1	2.68	82709.9	6.02	1
#270	IPI:IPI00295832.1|SWISS-PROT:P23515|REFSEQ_NP:NP_002535|ENSEMBL:ENSP00000247271 Tax_Id=9606 Oligodendrocyte-myelin glycoprotein precursor	2	1	4.77	49607.7	8.06	1
#334	IPI:IPI00409566.1|SWISS-PROT:P98161-1|REFSEQ_NP:NP_000287|TREMBL:O75276|ENSEMBL:ENSP00000262304 Tax_Id=9606 Splice isoform 1 of P98161 Polycystin 1 precursor	1	1	0.35	462569.52	6.27	10
#341	IPI:IPI00023197.1|SWISS-PROT:Q92729 Tax_Id=9606 Receptor-type protein-tyrosine phosphatase U precursor	1	1	1.40	160228.14	6.31	2
#346	IPI:IPI00001592.1|SWISS-PROT:Q14956|REFSEQ_NP:NP_002501|TREMBL:Q8N1A1|ENSEMBL:ENSP00000258733 Tax_Id=9606 Putative transmembrane protein NMB precursor	1	1	3.21	62643.13	6.17	1
#354	IPI:IPI00328716.1|REFSEQ_NP:NP_849161|TREMBL:Q86UE6|ENSEMBL:ENSP00000295057 Tax_Id=9606 Leucine-rich repeat transmembrane neuronal 1 protein	1	1	1.53	58613.61	7.01	1
#385	IPI:IPI00001893.2|SWISS-PROT:O60245-1|REFSEQ_NP:NP_002580|ENSEMBL:ENSP00000303175 Tax_Id=9606 Splice isoform A of O60245 Protocadherin 7 precursor	1	1	1.96	116104.97	5.05	2
#391	IPI:IPI00011651.1|SWISS-PROT:P23470|REFSEQ_NP:NP_002832|TREMBL:O60420|ENSEMBL:ENSP00000295874 Tax_Id=9606 Protein-tyrosine phosphatase γ precursor	1	1	1.31	162058.69	6.02	1
Table 1. (Continued)
rank	reference	peptide hits	unique peptides	sequence coverage (%)	MW (Da)	pI	no. of TMHs
#398	IPI:IPI00295525.4|SWISS-PROT:Q9H3T3-1|REFSEQ_NP:NP_115484|ENSEMBL:ENSP00000301292 Tax_Id=9606 Splice isoform 1 of Q9H3T3 Semaphorin 6B precursor	1	1	2.14	95284.88	8.83	1
#412	IPI:IPI00375552.1|SWISS-PROT:P58335-1 Tax_Id=9606 Splice isoform 1 of P58335 Anthrax toxin receptor 2 precursor	1	1	2.25	53692.31	7.42	1
#420	IPI:IPI00024566.1|REFSEQ_NP:NP_064541|TREMBL:Q96QH4;Q9BTU1;Q9NS00|ENSEMBL:ENSP00000223122 Tax_Id=9606 Core1 UDP-galactose:N-acetylgalactosamine-α-R β 1,3-galactosyltransferase	1	1	3.86	42202.83	6.17	1
#485	IPI:IPI00375836.1|REFSEQ_NP:NP_945185 Tax_Id=9606 cancer-associated nucleoprotein	1	1	2.72	84674.27	8.83	1
#487	IPI:IPI00023152.3|SWISS-PROT:Q9UQQ1-1|REFSEQ_NP:NP_005459|ENSEMBL:ENSP00000301884 Tax_Id=9606 Splice isoform 1 of Q9UQQ1 N-acetylated-α-linked acidic dipeptidase like protein	1	1	1.22	80510.95	5.28	1
#557	IPI:IPI00007709.1|SWISS-PROT:Q9UKQ2-1|REFSEQ_NP:NP_068548 Tax_Id=9606 Splice isoform 1 of Q9UKQ2 ADAM 28 precursor	1	1	2.32	87208.64	6.46	1
#573	IPI:IPI00006372.1|SWISS-PROT:Q9Y6I9|REFSEQ_NP:NP_057010|ENSEMBL:ENSP00000296478 Tax_Id=9606 Putative secreted protein ZSIG11 precursor	1	1	3.83	34188.7	4.79	1
#605	IPI:IPI00026237.1|SWISS-PROT:P20916|REFSEQ_NP:NP_002352|TREMBL:Q15489|ENSEMBL:ENSP00000262624 Tax_Id=9606 Myelin-associated glycoprotein precursor	1	1	3.04	69068.77	4.97	1
#611	IPI:IPI00003773.2|REFSEQ_NP:NP_009192|TREMBL:Q9UBK4;O95100;Q9HD97|ENSEMBL:ENSP00000301570 Tax_Id=9606 CMRF-35-H9	1	1	4.32	33658.48	5.47	1
#617	IPI:IPI00302641.1|SWISS-PROT:Q9NYQ8|REFSEQ_NP:NP_001438|ENSEMBL:ENSP00000261800 Tax_Id=9606 Protocadherin Fat 2 precursor	1	1	0.21	479392.93	5.01	1
#621	IPI:IPI00033569.1|REFSEQ_NP:NP_054799|TREMBL:Q9NRD8|ENSEMBL:ENSP00000267837 Tax_Id=9606 NADPH thyroid oxidase 2	1	1	1.10	175364	8.02	6
#657	IPI:IPI00021010.1|SWISS-PROT:Q9NYJ7|REFSEQ_NP:NP_058637|TREMBL:Q8NBS4|ENSEMBL:ENSP00000205143 Tax_Id=9606 δ-like protein 3 precursor	1	1	1.94	64618.14	7.86	1
#659	IPI:IPI00395859.1|SWISS-PROT:Q9BZ76-2|TREMBL:Q96MJ5|REFSEQ_XP:XP_372088|ENSEMBL:ENSP00000338896 Tax_Id=9606 Splice isoform 2 of Q9BZ76 Contactin associated protein-like 3 precursor	1	1	1.30	143252.9	8.2	1
#675	IPI:IPI00240956.4|REFSEQ_XP:XP_290385;XP_379840|ENSEMBL:ENSP00000333166 Tax_Id=9606 similar to solute carrier family 29 (nucleoside transporters), member 4	1	1	3.10	49470.44	8.71	5
#677	IPI:IPI00140833.2|TREMBL:Q96Q04|REFSEQ_XP:XP_055866|ENSEMBL:ENSP00000270238 Tax_Id=9606 lemur tyrosine kinase 3	1	1	1.28	157098.2	4.83	2
#710	IPI:IPI00215836.1|SWISS-PROT:Q9Y5H1-1|REFSEQ_NP:NP_061738 Tax_Id=9606 Splice isoform 1 of Q9Y5H1 Protocadherin γ A2 precursor	1	1	1.82	101484.6	4.85	1
#721	IPI:IPI00178212.1|ENSEMBL:ENSP00000305526 Tax_Id=9606	1	1	4.62	49201.53	5.98	1
#723	IPI:IPI00215835.1|SWISS-PROT:Q9Y5H2-1|REFSEQ_NP:NP_061737 Tax_Id=9606 Splice isoform 1 of Q9Y5H2 Protocadherin γ A11 precursor	1	1	1.82	101543.12	4.81	2

Here, we added the criterion SP rank ≤4 to the peptide search and found that 35% of the proteins were removed as false positives, while the percentage of proteins identified with 2 or more unique tryptic peptides increased to 21.8%. In short, using these strict criteria (Xcorr ≥1.9 for a +1 fully tryptic peptide, ≥2.2 for a +2 fully or partially tryptic peptide, and ≥3.75 for a +3 fully or partially tryptic peptide; ∆Cn ≥0.1; and SP rank ≤4), a total of 1779 unique peptides were identified in this work, corresponding
to 744 distinct human proteins, which are listed in Supplemental Table 1 in the Supporting Information. Of these, 162 (21.8%) proteins were identified with 2 or more unique tryptic peptides. In another study from our lab39 that used 2D-LC fractionation of proteins, 138 (10.7%) proteins were identified with 2 or more unique tryptic peptides under the same criteria for peptide identification, and only 147 proteins were found in both studies. Both the number and the percentage of proteins identified with 2 or more unique tryptic peptides were higher in this study, which makes the proteins identified here more reliable.

Figure 3. Percentage of identified peptides in each section for the most abundant proteins in human plasma, such as serum albumin precursor, apolipoprotein B-100 precursor, serotransferrin precursor, haptoglobin precursor, α-1-antitrypsin precursor, fibrinogen β chain precursor, and Ig γ-4 chain C region.
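The acceptance criteria restated above (charge-dependent Xcorr cutoffs, ∆Cn ≥0.1, and SP rank ≤4) can be sketched as a simple filter. This is an illustration under stated assumptions, not the BioWorks or Build Summary implementation: the `PeptideHit` record and `passes_filter` function are hypothetical, the tryptic-status conditions are omitted, and the SP rank of the first example hit is assumed because the text does not report it.

```python
from dataclasses import dataclass

# Minimum Xcorr per precursor charge state, as listed in the text.
MIN_XCORR = {1: 1.9, 2: 2.2, 3: 3.75}
MIN_DELTA_CN = 0.1
MAX_SP_RANK = 4


@dataclass
class PeptideHit:
    """Hypothetical record for one SEQUEST peptide-spectrum match."""
    sequence: str
    charge: int
    xcorr: float
    delta_cn: float
    sp_rank: int


def passes_filter(hit: PeptideHit) -> bool:
    """Apply the Xcorr, delta-Cn, and SP rank cutoffs to one hit."""
    threshold = MIN_XCORR.get(hit.charge)
    if threshold is None:
        # Charge states other than +1 to +3 are not covered by the criteria.
        return False
    return (
        hit.xcorr >= threshold
        and hit.delta_cn >= MIN_DELTA_CN
        and hit.sp_rank <= MAX_SP_RANK
    )


hits = [
    # Xcorr and delta-Cn as reported for this peptide; SP rank is assumed.
    PeptideHit("GGTLSTPQTGSENDALYEYLR", 2, 4.65, 0.62, 1),
    # Invented hit that fails the +3 Xcorr cutoff of 3.75.
    PeptideHit("PEPTIDEK", 3, 3.10, 0.25, 2),
]
accepted = [h.sequence for h in hits if passes_filter(h)]
print(accepted)
```

Keeping the thresholds in module-level constants makes it easy to compare the looser criteria (without the SP rank cutoff) against the stricter set by changing a single value.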
The number of identified proteins from each strip section is shown in Figure 2. About 93 ± 15 proteins were detected in each section of the IPG strip. Most proteins appeared in multiple sections; this is expected because many proteins exist as isoforms with different pI values. A certain degree of diffusion of high-abundance proteins is unavoidable during IEF, as also revealed by the silver staining in Figure 1. The IPG strip was divided into gel sections of arbitrary size.15 Data from several gel sections were dominated by highly abundant proteins, such as serum albumin precursor, apolipoprotein B-100 precursor, and serotransferrin precursor. These results were anticipated because these are the most abundant proteins in human plasma. Furthermore, most human plasma proteins carry microheterogeneous glycan chains and various other modifications, so many plasma proteins have several isoforms with different pI values. Although the most abundant proteins were dominant in a few gel sections, as shown in Figure 3, IEF could enrich the low-abundance proteins by focusing them to their pI positions on the IPG strip. From only ∼300 µg (∼3 µL) of plasma, 744 distinct human proteins were identified. Other plasma proteome analyses have commonly used milligram amounts of plasma to identify more plasma proteins.5,39 That we obtained 744 proteins under such strict criteria from only 300 µg of plasma proteins indicates that this method is sensitive enough to identify relatively low-level (approximately pg/mL) proteins (e.g., cytokines) coexisting with high-abundance proteins (e.g., mg/mL-level serum albumin). It can therefore be applied to complex samples that are available only in small amounts. In this study, we found some low-level protein markers, such as the ovarian cancer related tumor marker CA125, serum amyloid A-4 protein precursor, and tetranectin precursor, as well as some tissue leakage products, which include many of the most important diagnostic markers. Many of them were identified by a single unique peptide. To support the reliability of these identifications, we show the MS/MS spectra of two unique peptides from proteins that were not identified in another study from our lab.39 Tetranectin precursor, an independent prognostic marker in colorectal cancer,40 was identified by one unique peptide, R.GGTLSTPQTGSENDALYEYLR.Q (M + 2, Xcorr 4.65, ∆Cn 0.62). Electron-transfer flavoprotein α-subunit, mitochondrial precursor, defects in which cause glutaric acidemia type II (GAII), was also identified by one unique peptide, K.DPEAPIFQVADYGIVADLFK.V (M + 2, Xcorr 6.93, ∆Cn 0.71). The MS/MS spectra of these two peptides are shown in Figure 4. This also illustrates the efficiency with which IPG-IEF enriches low-abundance proteins and the ultrahigh sensitivity and resolution of linear ion trap mass spectrometry.

Figure 4. MS/MS spectra of the peptides R.GGTLSTPQTGSENDALYEYLR.Q and K.DPEAPIFQVADYGIVADLFK.V. (a) The MS/MS spectrum of the peptide R.GGTLSTPQTGSENDALYEYLR.Q (M + 2, Xcorr 4.65, ∆Cn 0.62). (b) The MS/MS spectrum of the peptide K.DPEAPIFQVADYGIVADLFK.V (M + 2, Xcorr 6.93, ∆Cn 0.71).
The in-gel IEF-LC-MS/MS method can analyze samples with a high dynamic range, such as human plasma, whose protein abundances span more than 10 orders of magnitude. Furthermore, if IPG strips of different lengths and pH ranges are used, resolution can be improved further.41 In this study, 162 (21.8%) proteins were identified on the basis of 2 or more unique tryptic peptides, which makes the results more reliable. In one experiment in which 300 µg of plasma proteins was analyzed directly by LC-MS/MS, 73 (28.4%) proteins were identified by at least 2 unique peptides. This number reached 86 (23.4%) and 102 (20.5%) when the results of two and three parallel LC-MS/MS experiments without fractionation were combined. Thus, directly combining the output of repeated parallel experiments decreased, rather than increased, the percentage of proteins identified by at least 2 unique peptides, although their absolute number rose. With IEF-LC-MS/MS, we identified more proteins on the basis of at least 2 unique peptides, which shows that the in-gel IEF-LC-MS/MS method can identify more unique proteins with confidence.

Physicochemical Characteristics of the Identified Proteins. The 744 identified proteins were classified according to their physicochemical characteristics: molecular weight (MW), isoelectric point (pI), hydrophobicity (GRAVY value), signal peptides, and transmembrane domains. The distribution patterns of these five characteristics are shown in Figure 5. Here, we use these characteristics to describe the human plasma proteins identified by the in-gel IEF-LC-MS/MS method.

Figure 5. Physicochemical characteristics of the 744 identified plasma proteins. (a) MW distribution. (b) pI distribution. (c) GRAVY value distribution. (d) TM domain distribution. (e) Signal peptide distribution. Proteins were ranked by their number of unique peptides identified, which reflects protein abundance to a certain degree. Among the first 50 proteins, 36 (72%) were predicted to be secreted, whereas among all 744 proteins only 214 (29.1%) have signal peptides.

Of the 744 proteins, 356 (47.7%) have a MW of 10∼60 kDa, which is compatible with general 1D-PAGE or 2D-PAGE methods. Another 213 (28.6%) proteins have MW < 10 kDa or > 100 kDa, beyond the general separation limits of 1D-PAGE or 2D-PAGE. The largest protein has a MW of 3816.2 kDa and was identified by 2 unique peptides, K.EAEKTAVTKVVVAADK.A (M + 2, Xcorr 3.62, ∆Cn 0.3) and K.KEEAPPAKVPEVPK.K (M + 2, Xcorr 2.22, ∆Cn 0.1). The classical plasma proteins are largely secreted by the liver and intestines. A key feature of plasma proteins is a native molecular mass larger than the kidney filtration cutoff (∼45 kDa).2 In this study, 494 (66.4%) of the proteins identified in human plasma have MW > 45 kDa and are candidates for classical plasma proteins. The 744 proteins are distributed across a wide pI range (4.2∼12.2). Nearly all of them (97%) fall within pI 3∼10; only 22 (3%) have pI > 10, beyond the separation boundary of the 3-10 NL IPG strip. This supports the effective separation achieved by isoelectric focusing on immobilized pH gradient strips. Because human blood plasma is a body fluid, its proteins are generally hydrophilic and therefore have negative GRAVY values. The GRAVY values of the 744 identified proteins range from -2.03 to 0.84, as shown in Figure 5; most (93.5%) are negative, and only 48 proteins (6.5%) have positive values. To predict classically secreted proteins, we used SignalP (version 3.0), which comprises two updated predictors, one based on neural networks (NN) and one on hidden Markov models (HMM).34 Of the 736 (98.9%) proteins for which SignalP returned a prediction, 214 (29.1%) have signal peptides. Proteins were ranked by the number of peptide hits identified, which reflects protein abundance to a certain degree. Interestingly, 36 (72%) of the first 50 proteins were predicted to be secreted, compared with 69 (69%) of the first 100, 104 (52%) of the first 200, and 141 (35.3%) of the first 400, as shown in Figure 5.
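The cumulative counts above can also be read bin by bin, which makes the trend easier to see. The following is a minimal sketch, not part of the original analysis: the checkpoint values are taken from the text, while the function name `per_bin_fractions` and the choice of rank bins are illustrative assumptions.

```python
# Cumulative counts of predicted secreted proteins among the top-ranked
# proteins, as quoted in the text: (rank cutoff, secreted count so far).
CHECKPOINTS = [(50, 36), (100, 69), (200, 104), (400, 141)]


def per_bin_fractions(checkpoints):
    """Convert cumulative counts into per-bin counts and fractions.

    Returns one tuple (first_rank, last_rank, secreted, size, fraction)
    for each consecutive rank bin. Hypothetical helper for illustration.
    """
    bins = []
    prev_rank, prev_count = 0, 0
    for rank, count in checkpoints:
        size = rank - prev_rank          # number of proteins in this bin
        secreted = count - prev_count    # secreted proteins in this bin
        bins.append((prev_rank + 1, rank, secreted, size, secreted / size))
        prev_rank, prev_count = rank, count
    return bins


if __name__ == "__main__":
    for first, last, secreted, size, frac in per_bin_fractions(CHECKPOINTS):
        print(f"ranks {first}-{last}: {secreted}/{size} = {frac:.1%}")
```

Under these figures, the share of predicted secreted proteins within each bin falls from 72% (ranks 1-50) through 66% (ranks 51-100) and 35% (ranks 101-200) to 18.5% (ranks 201-400).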
This indicates that more tissue leakage products, which include many of the most important diagnostic markers (e.g., cardiac troponins, creatine kinase, or myoglobin, used in the diagnosis of myocardial infarction),15 are identified as the resolution of the protein separation and identification strategy increases. Transmembrane (TM) domains were predicted by TMHMM. When a predicted TM region overlapped a predicted signal sequence, it was interpreted as a signal sequence only. Of the 744 proteins, 84 (11.3%) have one or more predicted TM domains, including 27 (3.6% of the total) with three or more TM domains. Of these 84 proteins, 38 were also predicted to be secreted, as shown in Table 1. Recently, a large-scale effort termed the Secreted Protein Discovery Initiative (SPDI) was undertaken to identify novel secreted and transmembrane proteins. Such proteins are known to have key roles in important biological processes such as morphogenesis, cellular differentiation, angiogenesis, apoptosis, and modulation of the immune response, as well as in disease processes such as cancer progression.42
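For readers unfamiliar with the GRAVY values quoted in this section, the sketch below shows how such a value can be computed, assuming GRAVY is the mean Kyte-Doolittle hydropathy index over all residues of a sequence (ref 32). The program actually used to obtain the reported values is not stated in this section, so the function name `gravy` and its error handling are illustrative assumptions.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids (ref 32).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Return the grand average of hydropathy (GRAVY) of a sequence.

    Hypothetical helper: sums the Kyte-Doolittle index of every residue
    and divides by the sequence length. Negative values indicate an
    overall hydrophilic sequence, positive values a hydrophobic one.
    """
    residues = sequence.upper()
    if not residues:
        raise ValueError("empty sequence")
    unknown = set(residues) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


if __name__ == "__main__":
    # Peptide sequence quoted in the text, used here only as sample input;
    # the reported GRAVY values refer to whole proteins, not peptides.
    print(round(gravy("EAEKTAVTKVVVAADK"), 3))
```

Applied to a full-length protein sequence, a negative result would place the protein in the hydrophilic majority (93.5%) described above.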
Conclusion The use of isoelectric focusing to fractionate human plasma proteins in the first dimension of separation has been demonstrated. Owing to its high resolving power, IEF-LC-MS/MS in conjunction with several prefractionation schemes could become the method of choice for multidimensional proteome separation and analysis. Immobilized pH gradient gels (IPG strips) have excellent resolution, high loading capacity, and good reproducibility. Moreover, when longer IPG strips with various pH ranges are used, or when the strips are cut into more sections, resolution can be improved drastically. The high reproducibility of commercially produced stationary pH gradients means that the exact pI of the separated proteins can be acquired without external or internal standards. This would be of interest for separating protein isoforms that arise from posttranslational modifications. Compared with other methods for analyzing the plasma proteome, this in-gel IEF-LC-MS/MS method is the most sensitive, consuming only 300 µg of protein. The method is also open to further modification: it could be combined with other sample prefractionation methods, such as subcellular fractionation, to reduce the overall complexity of the protein mixture prior to analysis. The complexity could also be addressed at the peptide level by combining the method with various separation approaches prior to IEF-LC-MS/MS. An exciting application of this technique is protein quantification by incorporating stable isotope labeling, such as the isotope-coded affinity tag (ICAT) and stable isotope labeling with amino acids in cell culture (SILAC). We are continuing to develop the in-gel IEF-LC-MS/MS approach and are testing various modifications to increase its utility. Because it is a powerful and convenient method for the separation and identification of complicated samples such as human plasma, we are currently investigating it as a high-throughput and sensitive strategy for proteome profiling.
Acknowledgment. This work was supported by National High-Technology Project (2002BA711A11) and Basic Research Foundation (2002CB713807).

Supporting Information Available: 1779 unique peptides identified in this work, corresponding to 744 distinct human proteins listed in Supplemental Table 1. This material is available free of charge via the Internet at http://pubs.acs.org.

References
(1) Righetti, P. G.; Castagna, A.; Herbert, B.; Reymond, F.; Rossier, J. S. Proteomics 2003, 3, 1397-1407.
(2) Issaq, H. J.; Conrads, T. P.; Janini, G. M.; Veenstra, T. D. Electrophoresis 2002, 23, 3048-3061.
(3) Shang, T. Q.; Ginter, J. M.; Johnston, M. V.; Larsen, B. S.; McEwen, C. N. Electrophoresis 2003, 24, 2359-2368.
(4) Wang, M. Z.; Howard, B.; Campa, M. J.; Patz, E. F., Jr.; Fitzgerald, M. C. Proteomics 2003, 3, 1661-1666.
(5) Xiao, Z.; Conrads, T. P.; Lucas, D. A.; Janini, G. M.; Schaefer, C. F.; Buetow, K. H.; Issaq, H. J.; Veenstra, T. D. Electrophoresis 2004, 25, 128-133.
(6) Westermeier, R.; Postel, W.; Weser, J.; Gorg, A. J. Biochem. Biophys. Methods 1983, 8, 321-330.
(7) Bjellqvist, B.; Ek, K.; Righetti, P. G.; Gianazza, E.; Gorg, A.; Westermeier, R.; Postel, W. J. Biochem. Biophys. Methods 1982, 6, 317-339.
(8) Bjellqvist, B.; Sanchez, J. C.; Pasquali, C.; Ravier, F.; Paquet, N.; Frutiger, S.; Hughes, G. J.; Hochstrasser, D. Electrophoresis 1993, 14, 1375-1378.
(9) Corbett, J. M.; Dunn, M. J.; Posch, A.; Gorg, A. Electrophoresis 1994, 15, 1205-1211.
(10) Yu, L. R.; Wang, N.; Wu, G. D.; Xu, Y. H.; Xia, Q. C. Chin. Sci. Bull. 2000, 45, 1113-1122.
(11) Poland, J.; Bohme, A.; Schubert, K.; Sinha, P. Electrophoresis 2002, 23, 4067-4071.
(12) Castellanos-Serra, L.; Vallin, A.; Proenza, W.; Le Caer, J. P.; Rossier, J. Electrophoresis 2001, 22, 1677-1685.
(13) Giorgianni, F.; Desiderio, D. M.; Beranova-Giorgianni, S. Electrophoresis 2003, 24, 253-259.
(14) Pieper, R.; Gatlin, C. L.; Makusky, A. J.; Russo, P. S.; Schatz, C. R.; Miller, S. S.; Su, Q.; McGrath, A. M.; Estock, M. A.; Parmar, P. P.; Zhao, M.; Huang, S. T.; Zhou, J.; Wang, F.; Esquer-Blasco, R.; Anderson, N. L.; Taylor, J.; Steiner, S. Proteomics 2003, 3, 1345-1364.
(15) Anderson, N. L.; Anderson, N. G. Mol. Cell Proteomics 2002, 1, 845-867.
(16) Petricoin, E. F.; Ardekani, A. M.; Hitt, B. A.; Levine, P. J.; Fusaro, V. A.; Steinberg, S. M.; Mills, G. B.; Simone, C.; Fishman, D. A.; Kohn, E. C.; Liotta, L. A. Lancet 2002, 359, 572-577.
(17) Petricoin, E. F., 3rd; Ornstein, D. K.; Paweletz, C. P.; Ardekani, A.; Hackett, P. S.; Hitt, B. A.; Velassco, A.; Trucco, C.; Wiegand, L.; Wood, K.; Simone, C. B.; Levine, P. J.; Linehan, W. M.; Emmert-Buck, M. R.; Steinberg, S. M.; Kohn, E. C.; Liotta, L. A. J. Natl. Cancer Inst. 2002, 94, 1576-1578.
(18) Xiao, Z.; Luke, B. T.; Izmirlian, G.; Umar, A.; Lynch, P. M.; Phillips, R. K.; Patterson, S.; Conrads, T. P.; Veenstra, T. D.; Greenwald, P.; Hawk, E. T.; Ali, I. U. Cancer Res 2004, 64, 2904-2909.
(19) Pusztai, L.; Gregory, B. W.; Baggerly, K. A.; Peng, B.; Koomen, J.; Kuerer, H. M.; Esteva, F. J.; Symmans, W. F.; Wagner, P.; Hortobagyi, G. N.; Laronga, C.; Semmes, O. J.; Wright, G. L., Jr.; Drake, R. R.; Vlahou, A. Cancer 2004, 100, 1814-1822.
(20) Joo, W. A.; Kang, M. J.; Son, W. K.; Lee, H. J.; Lee, D. Y.; Lee, E.; Kim, C. W. Proteomics 2003, 3, 2402-2411.
(21) Diamandis, E. P. J. Natl. Cancer Inst. 2004, 96, 353-356.
(22) Pieper, R.; Su, Q.; Gatlin, C. L.; Huang, S. T.; Anderson, N. L.; Steiner, S. Proteomics 2003, 3, 422-432.
(23) Anderson, N. L.; Polanski, M.; Pieper, R.; Gatlin, T.; Tirumalai, R. S.; Conrads, T. P.; Veenstra, T. D.; Adkins, J. N.; Pounds, J. G.; Fagan, R.; Lobley, A. Mol. Cell Proteomics 2004, 3, 311-326.
(24) Shen, Y.; Jacobs, J. M.; Camp, D. G., 2nd; Fang, R.; Moore, R. J.; Smith, R. D.; Xiao, W.; Davis, R. W.; Tompkins, R. G. Anal. Chem. 2004, 76, 1134-1144.
(25) Ding, S. J.; Li, Y.; Shao, X. X.; Zhou, H.; Zeng, R.; Tang, Z. Y.; Xia, Q. C. Proteomics 2004, 4, 982-994.
(26) Jiang, X. S.; Zhou, H.; Zhang, L.; Sheng, Q. H.; Li, S. J.; Li, L.; Hao, P.; Li, Y. X.; Xia, Q. C.; Wu, J. R.; Zeng, R. Mol. Cell Proteomics 2004.
(27) Yates, J. R., 3rd; Eng, J. K.; McCormack, A. L.; Schieltz, D. Anal. Chem. 1995, 67, 1426-1436.
(28) Yates, J. R., 3rd; Carmack, E.; Hays, L.; Link, A. J.; Eng, J. K. Methods Mol. Biol. 1999, 112, 553-569.
(29) Link, A. J.; Eng, J.; Schieltz, D. M.; Carmack, E.; Mize, G. J.; Morris, D. R.; Garvik, B. M.; Yates, J. R., 3rd Nat. Biotechnol. 1999, 17, 676-682.
(30) Yates, J. R., 3rd; Link, A. J.; Schieltz, D. Methods Mol. Biol. 2000, 146, 17-26.
(31) Peng, J.; Elias, J. E.; Thoreen, C. C.; Licklider, L. J.; Gygi, S. P. J. Proteome Res. 2003, 2, 43-50.
(32) Kyte, J.; Doolittle, R. F. J. Mol. Biol. 1982, 157, 105-132.
(33) Krogh, A.; Larsson, B.; von Heijne, G.; Sonnhammer, E. L. J. Mol. Biol. 2001, 305, 567-580.
(34) Dyrlov Bendtsen, J.; Nielsen, H.; Von Heijne, G.; Brunak, S. J. Mol. Biol. 2004, 340, 783-795.
(35) Nielsen, H.; Krogh, A. Proc. Int. Conf. Intell. Syst. Mol. Biol. 1998, 6, 122-130.
(36) Nielsen, H.; Engelbrecht, J.; Brunak, S.; von Heijne, G. Protein Eng. 1997, 10, 1-6.
(37) Rabilloud, T.; Kieffer, S.; Procaccio, V.; Louwagie, M.; Courchesne, P. L.; Patterson, S. D.; Martinez, P.; Garin, J.; Lunardi, J. Electrophoresis 1998, 19, 1006-1014.
(38) Le Bihan, T.; Pinto, D.; Figeys, D. Anal. Chem. 2001, 73, 1307-1315.
(39) Jin, W. H.; Dai, J.; Li, S. J.; Xia, Q. C.; Zou, H. F.; Zeng, R. J. Proteome Res. 2005, 4, in press.
(40) Hogdall, C. K.; Christensen, I. J.; Stephens, R. W.; Sorensen, S.; Norgaard-Pedersen, B.; Nielsen, H. J. Apmis 2002, 110, 630-638.
(41) Gorg, A.; Obermaier, C.; Boguth, G.; Harder, A.; Scheibe, B.; Wildgruber, R.; Weiss, W. Electrophoresis 2000, 21, 1037-1053.
(42) Clark, H. F.; Gurney, A. L.; Abaya, E.; Baker, K.; Baldwin, D.; Brush, J.; Chen, J.; Chow, B.; Chui, C.; Crowley, C.; Currell, B.; Deuel, B.; Dowd, P.; Eaton, D.; Foster, J.; Grimaldi, C.; Gu, Q.; Hass, P. E.; Heldens, S.; Huang, A.; Kim, H. S.; Klimowski, L.; Jin, Y.; Johnson, S.; Lee, J.; Lewis, L.; Liao, D.; Mark, M.; Robbie, E.; Sanchez, C.; Schoenfeld, J.; Seshagiri, S.; Simmons, L.; Singh, J.; Smith, V.; Stinson, J.; Vagts, A.; Vandlen, R.; Watanabe, C.; Wieand, D.; Woods, K.; Xie, M. H.; Yansura, D.; Yi, S.; Yu, G.; Yuan, J.; Zhang, M.; Zhang, Z.; Goddard, A.; Wood, W. I.; Godowski, P.; Gray, A. Genome Res. 2003, 13, 2265-2270.
PR0497529
Journal of Proteome Research • Vol. 4, No. 4, 2005 1273